Training: 2022-01-07 20:14:38,618-rank_id: 0
Training: 2022-01-07 20:15:06,564-: loss cosface
Training: 2022-01-07 20:15:06,568-: network r50
Training: 2022-01-07 20:15:06,568-: resume False
Training: 2022-01-07 20:15:06,568-: output work_dirs/webface42m_r50_lr01_pfc02
Training: 2022-01-07 20:15:06,568-: embedding_size 512
Training: 2022-01-07 20:15:06,568-: sample_rate 0.2
Training: 2022-01-07 20:15:06,568-: fp16 True
Training: 2022-01-07 20:15:06,568-: momentum 0.9
Training: 2022-01-07 20:15:06,568-: weight_decay 0.0005
Training: 2022-01-07 20:15:06,568-: batch_size 512
Training: 2022-01-07 20:15:06,569-: lr 0.4
Training: 2022-01-07 20:15:06,569-: dali True
Training: 2022-01-07 20:15:06,569-: verbose 5000
Training: 2022-01-07 20:15:06,569-: frequent 10
Training: 2022-01-07 20:15:06,569-: if_hard_scale False
Training: 2022-01-07 20:15:06,569-: score None
Training: 2022-01-07 20:15:06,569-: rec /train_tmp/WebFace42M
Training: 2022-01-07 20:15:06,569-: num_classes 2059906
Training: 2022-01-07 20:15:06,569-: num_image 42474557
Training: 2022-01-07 20:15:06,569-: num_epoch 20
Training: 2022-01-07 20:15:06,569-: warmup_epoch 2
Training: 2022-01-07 20:15:06,569-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-07 20:15:06,569-: warmup_step 20738
Training: 2022-01-07 20:15:06,569-: total_step 207380
Training: 2022-01-07 20:16:14,076-Reducer buckets have been rebuilt in this iteration.
Training: 2022-01-07 20:16:27,104-Speed 5994.74 samples/sec Loss 42.5193 LearningRate 0.0004 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-01-07 20:16:33,951-Speed 5984.21 samples/sec Loss 42.4755 LearningRate 0.0006 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-01-07 20:16:40,804-Speed 5979.20 samples/sec Loss 42.4921 LearningRate 0.0008 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-01-07 20:16:47,708-Speed 5933.39 samples/sec Loss 42.4951 LearningRate 0.0010 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 32768 Required: 48 hours
Training: 2022-01-07 20:16:54,551-Speed 5989.99 samples/sec Loss 42.4792 LearningRate 0.0012 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 32768 Required: 46 hours
Training: 2022-01-07 20:17:01,389-Speed 5991.95 samples/sec Loss 42.4620 LearningRate 0.0014 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-01-07 20:17:08,285-Speed 5941.03 samples/sec Loss 42.4866 LearningRate 0.0015 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-01-07 20:17:15,142-Speed 5978.06 samples/sec Loss 42.4729 LearningRate 0.0017 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-01-07 20:17:22,018-Speed 5958.27 samples/sec Loss 42.4625 LearningRate 0.0019 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-01-07 20:17:28,860-Speed 5988.37 samples/sec Loss 42.4581 LearningRate 0.0021 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-01-07 20:17:35,742-Speed 5953.60 samples/sec Loss 42.4277 LearningRate 0.0023 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-01-07 20:17:42,639-Speed 5940.10 samples/sec Loss 42.4295 LearningRate 0.0025 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-01-07 20:17:49,488-Speed 5981.67 samples/sec Loss 42.4274 LearningRate 0.0027 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 20:17:56,372-Speed 5952.38 samples/sec Loss 42.4029 LearningRate 0.0029 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 20:18:03,314-Speed 5902.03 samples/sec Loss 42.4134 LearningRate 0.0031 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 20:18:10,155-Speed 5988.17 samples/sec Loss 42.3723 LearningRate 0.0033 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 20:18:17,007-Speed 5981.73 samples/sec Loss 42.3449 LearningRate 0.0035 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 20:18:23,854-Speed 5983.36 samples/sec Loss 42.3115 LearningRate 0.0037 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 20:18:30,720-Speed 5966.77 samples/sec Loss 42.2991 LearningRate 0.0039 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 20:18:37,558-Speed 5992.28 samples/sec Loss 42.2836 LearningRate 0.0041 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:18:44,410-Speed 5979.41 samples/sec Loss 42.2356 LearningRate 0.0042 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:18:51,256-Speed 5983.55 samples/sec Loss 42.1778 LearningRate 0.0044 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:18:58,092-Speed 5993.67 samples/sec Loss 42.1115 LearningRate 0.0046 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:19:04,964-Speed 5964.23 samples/sec Loss 42.0501 LearningRate 0.0048 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:19:11,795-Speed 5996.75 samples/sec Loss 41.9939 LearningRate 0.0050 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:19:18,657-Speed 5970.68 samples/sec Loss 41.9613 LearningRate 0.0052 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:19:25,491-Speed 5994.74 samples/sec Loss 41.8977 LearningRate 0.0054 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:19:32,338-Speed 5983.68 samples/sec Loss 41.8052 LearningRate 0.0056 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:19:39,209-Speed 5962.38 samples/sec Loss 41.7056 LearningRate 0.0058 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-01-07 20:19:46,058-Speed 5982.47 samples/sec Loss 41.6454 LearningRate 0.0060 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:19:52,907-Speed 5980.92 samples/sec Loss 41.5677 LearningRate 0.0062 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:19:59,752-Speed 5985.32 samples/sec Loss 41.4773 LearningRate 0.0064 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:20:06,627-Speed 5959.60 samples/sec Loss 41.4134 LearningRate 0.0066 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:20:13,488-Speed 5970.97 samples/sec Loss 41.3312 LearningRate 0.0068 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:20:20,331-Speed 5987.49 samples/sec Loss 41.2454 LearningRate 0.0069 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:20:27,208-Speed 5958.68 samples/sec Loss 41.1775 LearningRate 0.0071 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:20:34,057-Speed 5981.81 samples/sec Loss 41.1244 LearningRate 0.0073 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:20:40,923-Speed 5966.25 samples/sec Loss 41.0575 LearningRate 0.0075 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-07 20:20:47,798-Speed 5962.23 samples/sec Loss 40.9563 LearningRate 0.0077 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:20:54,624-Speed 6001.49 samples/sec Loss 40.9201 LearningRate 0.0079 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:21:01,479-Speed 5978.91 samples/sec Loss 40.8457 LearningRate 0.0081 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:21:08,338-Speed 5972.58 samples/sec Loss 40.8103 LearningRate 0.0083 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:21:15,210-Speed 5962.13 samples/sec Loss 40.7411 LearningRate 0.0085 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:21:22,061-Speed 5979.62 samples/sec Loss 40.6518 LearningRate 0.0087 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:21:28,903-Speed 5987.84 samples/sec Loss 40.6232 LearningRate 0.0089 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:21:35,753-Speed 5980.73 samples/sec Loss 40.5723 LearningRate 0.0091 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:21:42,620-Speed 5966.04 samples/sec Loss 40.5179 LearningRate 0.0093 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:21:49,512-Speed 5944.77 samples/sec Loss 40.4562 LearningRate 0.0095 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:21:56,370-Speed 5974.20 samples/sec Loss 40.4086 LearningRate 0.0096 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:22:03,228-Speed 5973.28 samples/sec Loss 40.3730 LearningRate 0.0098 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:22:10,086-Speed 5974.06 samples/sec Loss 40.2950 LearningRate 0.0100 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:22:16,940-Speed 5977.79 samples/sec Loss 40.2581 LearningRate 0.0102 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:22:23,788-Speed 5982.33 samples/sec Loss 40.1983 LearningRate 0.0104 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:22:30,636-Speed 5982.02 samples/sec Loss 40.1660 LearningRate 0.0106 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:22:37,491-Speed 5976.74 samples/sec Loss 40.1292 LearningRate 0.0108 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:22:44,333-Speed 5987.73 samples/sec Loss 40.0748 LearningRate 0.0110 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:22:51,176-Speed 5986.45 samples/sec Loss 40.0370 LearningRate 0.0112 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:22:58,048-Speed 5961.44 samples/sec Loss 39.9905 LearningRate 0.0114 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:23:04,889-Speed 5988.89 samples/sec Loss 39.9674 LearningRate 0.0116 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:23:11,732-Speed 5986.92 samples/sec Loss 39.8930 LearningRate 0.0118 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:23:18,598-Speed 5968.27 samples/sec Loss 39.8293 LearningRate 0.0120 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:23:25,434-Speed 5992.49 samples/sec Loss 39.8242 LearningRate 0.0122 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:23:32,290-Speed 5975.48 samples/sec Loss 39.7918 LearningRate 0.0123 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:23:39,154-Speed 5969.05 samples/sec Loss 39.7562 LearningRate 0.0125 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:23:46,003-Speed 5981.58 samples/sec Loss 39.7265 LearningRate 0.0127 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:23:52,847-Speed 5985.79 samples/sec Loss 39.6804 LearningRate 0.0129 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:23:59,727-Speed 5968.73 samples/sec Loss 39.6473 LearningRate 0.0131 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:24:06,594-Speed 5966.43 samples/sec Loss 39.5964 LearningRate 0.0133 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:24:13,536-Speed 5903.19 samples/sec Loss 39.5511 LearningRate 0.0135 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:24:20,396-Speed 5971.88 samples/sec Loss 39.5207 LearningRate 0.0137 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:24:27,325-Speed 5914.70 samples/sec Loss 39.5184 LearningRate 0.0139 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:24:34,169-Speed 5985.56 samples/sec Loss 39.4845 LearningRate 0.0141 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:24:41,025-Speed 5975.83 samples/sec Loss 39.4615 LearningRate 0.0143 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:24:47,877-Speed 5979.55 samples/sec Loss 39.4317 LearningRate 0.0145 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:24:54,735-Speed 5972.60 samples/sec Loss 39.3674 LearningRate 0.0147 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:25:01,588-Speed 5978.31 samples/sec Loss 39.3763 LearningRate 0.0149 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:25:08,463-Speed 5959.78 samples/sec Loss 39.3582 LearningRate 0.0150 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:25:15,332-Speed 5963.57 samples/sec Loss 39.3231 LearningRate 0.0152 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:25:22,193-Speed 5971.00 samples/sec Loss 39.2798 LearningRate 0.0154 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:25:29,057-Speed 5968.94 samples/sec Loss 39.2505 LearningRate 0.0156 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:25:35,906-Speed 5980.91 samples/sec Loss 39.2465 LearningRate 0.0158 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-01-07 20:25:42,781-Speed 5959.54 samples/sec Loss 39.2197 LearningRate 0.0160 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:25:49,654-Speed 5961.44 samples/sec Loss 39.2087 LearningRate 0.0162 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:25:56,529-Speed 5958.39 samples/sec Loss 39.1596 LearningRate 0.0164 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:26:03,402-Speed 5960.83 samples/sec Loss 39.1576 LearningRate 0.0166 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:26:10,260-Speed 5973.59 samples/sec Loss 39.1433 LearningRate 0.0168 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:26:17,122-Speed 5970.61 samples/sec Loss 39.1540 LearningRate 0.0170 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:26:24,008-Speed 5949.25 samples/sec Loss 39.1294 LearningRate 0.0172 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:26:30,864-Speed 5975.66 samples/sec Loss 39.1101 LearningRate 0.0174 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:26:37,813-Speed 5895.48 samples/sec Loss 39.0989 LearningRate 0.0176 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:26:44,677-Speed 5969.08 samples/sec Loss 39.0847 LearningRate 0.0177 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:26:51,550-Speed 5961.07 samples/sec Loss 39.0651 LearningRate 0.0179 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:26:58,393-Speed 5986.52 samples/sec Loss 39.0690 LearningRate 0.0181 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:27:05,255-Speed 5972.38 samples/sec Loss 39.0476 LearningRate 0.0183 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:27:12,107-Speed 5978.94 samples/sec Loss 39.0540 LearningRate 0.0185 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:27:18,970-Speed 5969.43 samples/sec Loss 39.0580 LearningRate 0.0187 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:27:25,853-Speed 5952.76 samples/sec Loss 39.0359 LearningRate 0.0189 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:27:32,716-Speed 5969.20 samples/sec Loss 39.0380 LearningRate 0.0191 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:27:39,568-Speed 5979.10 samples/sec Loss 39.0184 LearningRate 0.0193 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:27:46,427-Speed 5972.38 samples/sec Loss 39.0182 LearningRate 0.0195 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:27:53,279-Speed 5979.27 samples/sec Loss 38.9937 LearningRate 0.0197 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:28:00,130-Speed 5978.91 samples/sec Loss 39.0072 LearningRate 0.0199 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:28:06,987-Speed 5977.02 samples/sec Loss 39.0168 LearningRate 0.0201 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:28:13,833-Speed 5984.66 samples/sec Loss 38.9930 LearningRate 0.0203 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:28:20,682-Speed 5981.07 samples/sec Loss 38.9955 LearningRate 0.0204 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:28:27,537-Speed 5976.42 samples/sec Loss 38.9939 LearningRate 0.0206 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:28:34,420-Speed 5951.39 samples/sec Loss 38.9994 LearningRate 0.0208 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:28:41,272-Speed 5978.54 samples/sec Loss 39.0005 LearningRate 0.0210 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:28:48,153-Speed 5953.91 samples/sec Loss 38.9745 LearningRate 0.0212 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:28:55,010-Speed 5974.78 samples/sec Loss 38.9775 LearningRate 0.0214 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:29:01,881-Speed 5962.39 samples/sec Loss 38.9924 LearningRate 0.0216 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:29:08,730-Speed 5981.46 samples/sec Loss 38.9674 LearningRate 0.0218 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:29:15,591-Speed 5971.48 samples/sec Loss 38.9701 LearningRate 0.0220 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:29:22,446-Speed 5975.44 samples/sec Loss 38.9810 LearningRate 0.0222 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:29:29,353-Speed 5931.76 samples/sec Loss 38.9791 LearningRate 0.0224 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:29:36,303-Speed 5894.62 samples/sec Loss 38.9628 LearningRate 0.0226 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:29:43,245-Speed 5901.19 samples/sec Loss 38.9661 LearningRate 0.0228 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:29:50,170-Speed 5915.68 samples/sec Loss 38.9722 LearningRate 0.0230 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:29:57,069-Speed 5938.80 samples/sec Loss 38.9813 LearningRate 0.0231 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:30:03,930-Speed 5971.07 samples/sec Loss 38.9934 LearningRate 0.0233 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:30:10,802-Speed 5960.84 samples/sec Loss 38.9674 LearningRate 0.0235 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:30:17,674-Speed 5961.32 samples/sec Loss 38.9816 LearningRate 0.0237 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:30:24,541-Speed 5968.30 samples/sec Loss 38.9865 LearningRate 0.0239 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:30:31,412-Speed 5962.77 samples/sec Loss 38.9702 LearningRate 0.0241 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:30:38,276-Speed 5968.05 samples/sec Loss 38.9805 LearningRate 0.0243 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:30:45,135-Speed 5973.83 samples/sec Loss 38.9873 LearningRate 0.0245 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:30:51,995-Speed 5971.66 samples/sec Loss 38.9819 LearningRate 0.0247 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:30:58,859-Speed 5968.71 samples/sec Loss 38.9628 LearningRate 0.0249 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:31:05,709-Speed 5981.14 samples/sec Loss 38.9647 LearningRate 0.0251 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:31:12,571-Speed 5970.11 samples/sec Loss 38.9823 LearningRate 0.0253 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:31:19,438-Speed 5966.28 samples/sec Loss 38.9873 LearningRate 0.0255 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:31:26,290-Speed 5979.52 samples/sec Loss 38.9673 LearningRate 0.0257 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:31:33,141-Speed 5979.45 samples/sec Loss 38.9920 LearningRate 0.0258 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:31:40,005-Speed 5969.00 samples/sec Loss 38.9698 LearningRate 0.0260 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:31:46,854-Speed 5980.60 samples/sec Loss 38.9964 LearningRate 0.0262 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:31:53,727-Speed 5960.97 samples/sec Loss 38.9925 LearningRate 0.0264 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:32:00,600-Speed 5961.23 samples/sec Loss 38.9887 LearningRate 0.0266 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-01-07 20:32:07,463-Speed 5968.66 samples/sec Loss 39.0125 LearningRate 0.0268 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:32:14,319-Speed 5978.56 samples/sec Loss 38.9861 LearningRate 0.0270 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:32:21,165-Speed 5983.78 samples/sec Loss 38.9947 LearningRate 0.0272 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:32:28,054-Speed 5946.90 samples/sec Loss 38.9721 LearningRate 0.0274 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:32:34,990-Speed 5906.85 samples/sec Loss 38.9875 LearningRate 0.0276 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:32:41,965-Speed 5873.81 samples/sec Loss 38.9687 LearningRate 0.0278 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:32:48,925-Speed 5886.13 samples/sec Loss 38.9819 LearningRate 0.0280 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:32:55,804-Speed 5955.89 samples/sec Loss 38.9673 LearningRate 0.0282 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:02,713-Speed 5929.27 samples/sec Loss 38.9776 LearningRate 0.0284 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:09,578-Speed 5967.85 samples/sec Loss 38.9723 LearningRate 0.0285 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:16,433-Speed 5976.53 samples/sec Loss 39.0120 LearningRate 0.0287 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:23,307-Speed 5958.70 samples/sec Loss 38.9859 LearningRate 0.0289 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:30,180-Speed 5961.46 samples/sec Loss 38.9910 LearningRate 0.0291 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:37,059-Speed 5955.37 samples/sec Loss 38.9745 LearningRate 0.0293 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:43,922-Speed 5968.70 samples/sec Loss 38.9936 LearningRate 0.0295 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:50,808-Speed 5950.01 samples/sec Loss 38.9866 LearningRate 0.0297 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:33:57,690-Speed 5952.91 samples/sec Loss 38.9934 LearningRate 0.0299 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:34:04,574-Speed 5951.60 samples/sec Loss 38.9996 LearningRate 0.0301 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:34:11,459-Speed 5950.49 samples/sec Loss 38.9841 LearningRate 0.0303 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:34:18,310-Speed 5979.97 samples/sec Loss 38.9631 LearningRate 0.0305 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:34:25,165-Speed 5975.76 samples/sec Loss 38.9840 LearningRate 0.0307 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 262144 Required: 40 hours
Training: 2022-01-07 20:34:32,018-Speed 5978.76 samples/sec Loss 38.9842 LearningRate 0.0309 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:34:38,868-Speed 5980.21 samples/sec Loss 38.9848 LearningRate 0.0311 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-07 20:34:45,725-Speed 5974.35 samples/sec Loss 39.0005 LearningRate 0.0312 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:34:52,611-Speed 5949.27 samples/sec Loss 39.0077 LearningRate 0.0314 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:34:59,513-Speed 5935.78 samples/sec Loss 38.9841 LearningRate 0.0316 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:35:06,417-Speed 5934.42 samples/sec Loss 38.9946 LearningRate 0.0318 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:35:13,368-Speed 5893.91 samples/sec Loss 38.9825 LearningRate 0.0320 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:35:20,244-Speed 5957.73 samples/sec Loss 38.9760 LearningRate 0.0322 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:35:27,106-Speed 5973.03 samples/sec Loss 38.9981 LearningRate 0.0324 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:35:34,002-Speed 5943.96 samples/sec Loss 38.9854 LearningRate 0.0326 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:35:40,856-Speed 5977.55 samples/sec Loss 38.9892 LearningRate 0.0328 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:35:47,707-Speed 5979.54 samples/sec Loss 38.9947 LearningRate 0.0330 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:35:54,564-Speed 5976.54 samples/sec Loss 38.9936 LearningRate 0.0332 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:36:01,440-Speed 5962.39 samples/sec Loss 39.0219 LearningRate 0.0334 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:36:08,294-Speed 5976.97 samples/sec Loss 39.0060 LearningRate 0.0336 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:36:15,188-Speed 5941.97 samples/sec Loss 39.0153 LearningRate 0.0338 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:36:22,066-Speed 5957.38 samples/sec Loss 38.9944 LearningRate 0.0339 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:36:28,946-Speed 5954.29 samples/sec Loss 38.9938 LearningRate 0.0341 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 20:36:35,799-Speed 5980.33 samples/sec Loss 38.9996 LearningRate 0.0343 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:36:42,652-Speed 5978.25 samples/sec Loss 38.9743 LearningRate 0.0345 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:36:49,516-Speed 5968.25 samples/sec Loss 39.0266 LearningRate 0.0347 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:36:56,390-Speed 5960.19 samples/sec Loss 39.0039 LearningRate 0.0349 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:03,254-Speed 5968.43 samples/sec Loss 38.9995 LearningRate 0.0351 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:10,131-Speed 5957.24 samples/sec Loss 38.9976 LearningRate 0.0353 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:16,998-Speed 5966.42 samples/sec Loss 39.0207 LearningRate 0.0355 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:23,862-Speed 5968.51 samples/sec Loss 39.0049 LearningRate 0.0357 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:30,735-Speed 5960.15 samples/sec Loss 38.9927 LearningRate 0.0359 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:37,596-Speed 5970.89 samples/sec Loss 38.9925 LearningRate 0.0361 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:44,458-Speed 5970.18 samples/sec Loss 39.0170 LearningRate 0.0363 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:51,325-Speed 5966.14 samples/sec Loss 38.9966 LearningRate 0.0365 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:37:58,195-Speed 5962.77 samples/sec Loss 38.9791 LearningRate 0.0366 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:38:05,066-Speed 5962.20 samples/sec Loss 38.9851 LearningRate 0.0368 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:38:11,997-Speed 5911.79 samples/sec Loss 38.9756 LearningRate 0.0370 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:38:18,856-Speed 5972.90 samples/sec Loss 39.0197 LearningRate 0.0372 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:38:25,739-Speed 5952.25 samples/sec Loss 38.9839 LearningRate 0.0374 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:38:32,608-Speed 5966.03 samples/sec Loss 38.9746 LearningRate 0.0376 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:38:39,470-Speed 5969.82 samples/sec Loss 39.0004 LearningRate 0.0378 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:38:46,351-Speed 5954.11 samples/sec Loss 38.9728 LearningRate 0.0380 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:38:53,198-Speed 5983.69 samples/sec Loss 38.9963 LearningRate 0.0382 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:00,076-Speed 5956.53 samples/sec Loss 39.0164 LearningRate 0.0384 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:06,947-Speed 5962.53 samples/sec Loss 38.9966 LearningRate 0.0386 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:13,812-Speed 5968.27 samples/sec Loss 38.9955 LearningRate 0.0388 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:20,679-Speed 5965.57 samples/sec Loss 38.9830 LearningRate 0.0390 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:27,577-Speed 5942.10 samples/sec Loss 39.0091 LearningRate 0.0392 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:34,448-Speed 5962.44 samples/sec Loss 38.9936 LearningRate 0.0393 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:41,321-Speed 5960.94 samples/sec Loss 38.9832 LearningRate 0.0395 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:48,189-Speed 5964.92 samples/sec Loss 38.9807 LearningRate 0.0397 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:39:55,104-Speed 5923.93 samples/sec Loss 38.9631 LearningRate 0.0399 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:40:02,045-Speed 5902.40 samples/sec Loss 38.9817 LearningRate 0.0401 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-07 20:40:08,967-Speed 5918.78 samples/sec Loss 38.9700 LearningRate 0.0403 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:40:15,874-Speed 5931.38 samples/sec Loss 38.9316 LearningRate 0.0405 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:40:22,728-Speed 5977.14 samples/sec Loss 38.9720 LearningRate 0.0407 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:40:29,600-Speed 5961.56 samples/sec Loss 38.9500 LearningRate 0.0409 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:40:36,462-Speed 5970.60 samples/sec Loss 38.9639 LearningRate 0.0411 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:40:43,324-Speed 5970.83 samples/sec Loss 38.9435 LearningRate 0.0413 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:40:50,188-Speed 5970.73 samples/sec Loss 38.9259 LearningRate 0.0415 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:40:57,043-Speed 5975.85 samples/sec Loss 38.9296 LearningRate 0.0417 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:41:03,897-Speed 5977.55 samples/sec Loss 38.9367 LearningRate 0.0419 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:41:10,752-Speed 5976.39 samples/sec Loss 38.9845 LearningRate 0.0420 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-07 20:41:17,607-Speed 5976.15 samples/sec Loss 38.9526 LearningRate 0.0422 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-07 20:41:24,467-Speed 5972.22 samples/sec Loss 38.9232 LearningRate 0.0424 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-07 20:41:31,323-Speed 5974.93 samples/sec Loss 38.9251 LearningRate 0.0426 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-07 20:41:38,182-Speed 5973.49 samples/sec Loss 38.9275 LearningRate 0.0428 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-07 20:41:45,050-Speed 5966.95 samples/sec Loss 38.8835 LearningRate 0.0430 Epoch: 0 Global Step:
2230 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:41:51,913-Speed 5971.99 samples/sec Loss 38.9009 LearningRate 0.0432 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:41:58,805-Speed 5944.24 samples/sec Loss 38.8685 LearningRate 0.0434 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:42:05,677-Speed 5963.05 samples/sec Loss 38.8964 LearningRate 0.0436 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:42:12,535-Speed 5973.44 samples/sec Loss 38.8614 LearningRate 0.0438 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:42:19,404-Speed 5964.55 samples/sec Loss 38.9026 LearningRate 0.0440 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:42:26,290-Speed 5948.95 samples/sec Loss 38.8550 LearningRate 0.0442 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:42:33,149-Speed 5972.57 samples/sec Loss 38.8624 LearningRate 0.0444 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:42:40,046-Speed 5940.62 samples/sec Loss 38.8621 LearningRate 0.0446 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:42:46,916-Speed 5962.71 samples/sec Loss 38.8372 LearningRate 0.0447 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:42:53,758-Speed 5987.97 samples/sec Loss 38.8022 LearningRate 0.0449 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:43:00,624-Speed 5966.72 samples/sec Loss 38.8195 LearningRate 0.0451 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:43:07,481-Speed 5974.43 samples/sec Loss 38.8139 LearningRate 0.0453 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 39 
hours Training: 2022-01-07 20:43:14,372-Speed 5944.93 samples/sec Loss 38.7939 LearningRate 0.0455 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:43:21,228-Speed 5975.97 samples/sec Loss 38.7874 LearningRate 0.0457 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:43:28,096-Speed 5964.68 samples/sec Loss 38.7595 LearningRate 0.0459 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:43:34,948-Speed 5979.26 samples/sec Loss 38.7434 LearningRate 0.0461 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:43:41,816-Speed 5965.46 samples/sec Loss 38.7468 LearningRate 0.0463 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:43:48,676-Speed 5972.28 samples/sec Loss 38.7252 LearningRate 0.0465 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:43:55,530-Speed 5977.09 samples/sec Loss 38.7193 LearningRate 0.0467 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:44:02,388-Speed 5973.16 samples/sec Loss 38.7122 LearningRate 0.0469 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:44:09,262-Speed 5960.45 samples/sec Loss 38.6918 LearningRate 0.0471 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:44:16,122-Speed 5971.47 samples/sec Loss 38.6639 LearningRate 0.0473 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:44:22,980-Speed 5973.35 samples/sec Loss 38.6667 LearningRate 0.0474 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:44:29,848-Speed 5965.41 samples/sec Loss 38.6632 LearningRate 0.0476 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 
20:44:36,737-Speed 5946.59 samples/sec Loss 38.6093 LearningRate 0.0478 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:44:43,607-Speed 5965.16 samples/sec Loss 38.6256 LearningRate 0.0480 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:44:50,470-Speed 5969.16 samples/sec Loss 38.5960 LearningRate 0.0482 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:44:57,355-Speed 5951.86 samples/sec Loss 38.5823 LearningRate 0.0484 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:04,223-Speed 5965.52 samples/sec Loss 38.5503 LearningRate 0.0486 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:11,083-Speed 5971.56 samples/sec Loss 38.5346 LearningRate 0.0488 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:17,947-Speed 5968.23 samples/sec Loss 38.5030 LearningRate 0.0490 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:24,841-Speed 5942.75 samples/sec Loss 38.4794 LearningRate 0.0492 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:31,693-Speed 5981.49 samples/sec Loss 38.5025 LearningRate 0.0494 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:38,587-Speed 5942.91 samples/sec Loss 38.4676 LearningRate 0.0496 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:45,456-Speed 5963.82 samples/sec Loss 38.4305 LearningRate 0.0498 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:52,309-Speed 5977.81 samples/sec Loss 38.4588 LearningRate 0.0500 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:45:59,161-Speed 5979.32 samples/sec Loss 
38.4301 LearningRate 0.0501 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:46:06,027-Speed 5966.46 samples/sec Loss 38.3997 LearningRate 0.0503 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:46:12,988-Speed 5884.96 samples/sec Loss 38.3583 LearningRate 0.0505 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:46:19,851-Speed 5970.05 samples/sec Loss 38.3270 LearningRate 0.0507 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:46:26,714-Speed 5971.09 samples/sec Loss 38.3352 LearningRate 0.0509 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:46:33,587-Speed 5960.52 samples/sec Loss 38.3132 LearningRate 0.0511 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:46:40,444-Speed 5975.41 samples/sec Loss 38.3094 LearningRate 0.0513 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:46:47,317-Speed 5960.01 samples/sec Loss 38.2479 LearningRate 0.0515 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:46:54,197-Speed 5958.34 samples/sec Loss 38.2519 LearningRate 0.0517 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:47:01,050-Speed 5978.47 samples/sec Loss 38.2418 LearningRate 0.0519 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:47:07,901-Speed 5980.02 samples/sec Loss 38.2228 LearningRate 0.0521 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:47:14,751-Speed 5980.48 samples/sec Loss 38.1595 LearningRate 0.0523 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:47:21,609-Speed 5973.26 samples/sec Loss 38.1601 LearningRate 0.0525 Epoch: 0 Global 
Step: 2720 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:47:28,503-Speed 5943.38 samples/sec Loss 38.1444 LearningRate 0.0527 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:47:35,362-Speed 5972.79 samples/sec Loss 38.1072 LearningRate 0.0528 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:47:42,243-Speed 5953.13 samples/sec Loss 38.0634 LearningRate 0.0530 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:47:49,097-Speed 5977.58 samples/sec Loss 38.0635 LearningRate 0.0532 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:47:55,958-Speed 5970.77 samples/sec Loss 38.0383 LearningRate 0.0534 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:48:02,844-Speed 5949.90 samples/sec Loss 38.0355 LearningRate 0.0536 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:48:09,716-Speed 5960.67 samples/sec Loss 38.0058 LearningRate 0.0538 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:48:16,567-Speed 5980.30 samples/sec Loss 37.9565 LearningRate 0.0540 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:48:23,414-Speed 5982.77 samples/sec Loss 37.9802 LearningRate 0.0542 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:48:30,258-Speed 5985.97 samples/sec Loss 37.9396 LearningRate 0.0544 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:48:37,120-Speed 5972.37 samples/sec Loss 37.9317 LearningRate 0.0546 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:48:43,971-Speed 5979.81 samples/sec Loss 37.8873 LearningRate 0.0548 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-07 20:48:50,823-Speed 5978.55 samples/sec Loss 37.8206 LearningRate 0.0550 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:48:57,675-Speed 5978.95 samples/sec Loss 37.8006 LearningRate 0.0552 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:04,527-Speed 5978.80 samples/sec Loss 37.8381 LearningRate 0.0554 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:11,373-Speed 5984.45 samples/sec Loss 37.7962 LearningRate 0.0556 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:18,257-Speed 5953.61 samples/sec Loss 37.7702 LearningRate 0.0557 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:25,112-Speed 5977.13 samples/sec Loss 37.7147 LearningRate 0.0559 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:31,980-Speed 5965.34 samples/sec Loss 37.6737 LearningRate 0.0561 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:38,832-Speed 5978.24 samples/sec Loss 37.7134 LearningRate 0.0563 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:45,700-Speed 5965.77 samples/sec Loss 37.6625 LearningRate 0.0565 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:52,564-Speed 5967.94 samples/sec Loss 37.6372 LearningRate 0.0567 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:49:59,412-Speed 5982.88 samples/sec Loss 37.5586 LearningRate 0.0569 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:50:06,280-Speed 5967.44 samples/sec Loss 37.6291 LearningRate 0.0571 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 
20:50:13,131-Speed 5982.19 samples/sec Loss 37.5445 LearningRate 0.0573 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:50:19,996-Speed 5967.45 samples/sec Loss 37.5529 LearningRate 0.0575 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:50:26,869-Speed 5960.64 samples/sec Loss 37.5017 LearningRate 0.0577 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:50:33,715-Speed 5984.18 samples/sec Loss 37.5059 LearningRate 0.0579 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:50:40,696-Speed 5868.38 samples/sec Loss 37.4628 LearningRate 0.0581 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:50:47,559-Speed 5969.46 samples/sec Loss 37.4272 LearningRate 0.0583 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:50:54,506-Speed 5897.03 samples/sec Loss 37.4054 LearningRate 0.0584 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:51:01,355-Speed 5980.67 samples/sec Loss 37.3898 LearningRate 0.0586 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:51:08,343-Speed 5863.02 samples/sec Loss 37.3195 LearningRate 0.0588 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:51:15,196-Speed 5977.90 samples/sec Loss 37.3005 LearningRate 0.0590 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:51:22,070-Speed 5962.61 samples/sec Loss 37.2665 LearningRate 0.0592 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:51:28,925-Speed 5976.11 samples/sec Loss 37.2734 LearningRate 0.0594 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:51:35,773-Speed 5982.24 samples/sec Loss 
37.2562 LearningRate 0.0596 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:51:42,622-Speed 5983.00 samples/sec Loss 37.1916 LearningRate 0.0598 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:51:49,469-Speed 5983.61 samples/sec Loss 37.1790 LearningRate 0.0600 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:51:56,321-Speed 5978.35 samples/sec Loss 37.1711 LearningRate 0.0602 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:52:03,196-Speed 5958.78 samples/sec Loss 37.1202 LearningRate 0.0604 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:52:10,039-Speed 5986.35 samples/sec Loss 37.0971 LearningRate 0.0606 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:52:16,879-Speed 5991.28 samples/sec Loss 37.0651 LearningRate 0.0608 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:52:23,748-Speed 5964.49 samples/sec Loss 37.0270 LearningRate 0.0610 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:52:30,602-Speed 5977.70 samples/sec Loss 36.9910 LearningRate 0.0611 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:52:37,466-Speed 5969.93 samples/sec Loss 36.9284 LearningRate 0.0613 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:52:44,322-Speed 5974.60 samples/sec Loss 36.9566 LearningRate 0.0615 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:52:51,189-Speed 5966.37 samples/sec Loss 36.8858 LearningRate 0.0617 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:52:58,040-Speed 5980.39 samples/sec Loss 36.8811 LearningRate 0.0619 Epoch: 0 Global 
Step: 3210 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:53:04,906-Speed 5966.51 samples/sec Loss 36.8102 LearningRate 0.0621 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:53:11,772-Speed 5972.62 samples/sec Loss 36.8026 LearningRate 0.0623 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:53:18,626-Speed 5979.33 samples/sec Loss 36.8288 LearningRate 0.0625 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:53:25,479-Speed 5980.60 samples/sec Loss 36.7670 LearningRate 0.0627 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:53:32,347-Speed 5964.99 samples/sec Loss 36.7325 LearningRate 0.0629 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:53:39,188-Speed 5988.94 samples/sec Loss 36.7056 LearningRate 0.0631 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:53:46,054-Speed 5969.45 samples/sec Loss 36.6984 LearningRate 0.0633 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:53:52,906-Speed 5979.08 samples/sec Loss 36.6671 LearningRate 0.0635 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:53:59,758-Speed 5979.15 samples/sec Loss 36.6037 LearningRate 0.0637 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:54:06,623-Speed 5967.68 samples/sec Loss 36.6051 LearningRate 0.0638 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:54:13,502-Speed 5955.37 samples/sec Loss 36.5251 LearningRate 0.0640 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:54:20,358-Speed 5976.06 samples/sec Loss 36.4742 LearningRate 0.0642 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-07 20:54:27,216-Speed 5973.72 samples/sec Loss 36.4280 LearningRate 0.0644 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:54:34,082-Speed 5966.71 samples/sec Loss 36.4358 LearningRate 0.0646 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:54:40,937-Speed 5976.38 samples/sec Loss 36.4167 LearningRate 0.0648 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:54:47,791-Speed 5977.81 samples/sec Loss 36.3575 LearningRate 0.0650 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:54:54,645-Speed 5977.04 samples/sec Loss 36.3310 LearningRate 0.0652 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:55:01,508-Speed 5968.80 samples/sec Loss 36.2775 LearningRate 0.0654 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:55:08,356-Speed 5983.13 samples/sec Loss 36.2646 LearningRate 0.0656 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:55:15,221-Speed 5967.96 samples/sec Loss 36.2401 LearningRate 0.0658 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:55:22,090-Speed 5963.55 samples/sec Loss 36.1861 LearningRate 0.0660 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:55:28,954-Speed 5968.67 samples/sec Loss 36.1800 LearningRate 0.0662 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:55:35,827-Speed 5960.44 samples/sec Loss 36.0995 LearningRate 0.0664 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:55:42,694-Speed 5966.29 samples/sec Loss 36.1242 LearningRate 0.0665 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 
20:55:49,590-Speed 5941.10 samples/sec Loss 36.0335 LearningRate 0.0667 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:55:56,459-Speed 5964.30 samples/sec Loss 35.9773 LearningRate 0.0669 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:03,322-Speed 5969.28 samples/sec Loss 36.0212 LearningRate 0.0671 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:10,175-Speed 5978.15 samples/sec Loss 35.9200 LearningRate 0.0673 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:17,037-Speed 5970.32 samples/sec Loss 35.8686 LearningRate 0.0675 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:23,893-Speed 5975.41 samples/sec Loss 35.9164 LearningRate 0.0677 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:30,768-Speed 5959.34 samples/sec Loss 35.8524 LearningRate 0.0679 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:37,628-Speed 5971.48 samples/sec Loss 35.7934 LearningRate 0.0681 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:44,488-Speed 5972.11 samples/sec Loss 35.7463 LearningRate 0.0683 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:51,360-Speed 5961.59 samples/sec Loss 35.7327 LearningRate 0.0685 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:56:58,211-Speed 5980.33 samples/sec Loss 35.6677 LearningRate 0.0687 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:57:05,065-Speed 5977.64 samples/sec Loss 35.6797 LearningRate 0.0689 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:57:11,908-Speed 5986.24 samples/sec Loss 
35.6242 LearningRate 0.0691 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:57:18,773-Speed 5969.99 samples/sec Loss 35.5924 LearningRate 0.0692 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:57:25,622-Speed 5983.30 samples/sec Loss 35.5608 LearningRate 0.0694 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:57:32,504-Speed 5953.51 samples/sec Loss 35.4761 LearningRate 0.0696 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:57:39,367-Speed 5970.04 samples/sec Loss 35.4137 LearningRate 0.0698 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:57:46,216-Speed 5983.49 samples/sec Loss 35.4462 LearningRate 0.0700 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:57:53,081-Speed 5967.91 samples/sec Loss 35.3985 LearningRate 0.0702 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:57:59,947-Speed 5966.44 samples/sec Loss 35.3569 LearningRate 0.0704 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:58:06,813-Speed 5966.51 samples/sec Loss 35.3448 LearningRate 0.0706 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:58:13,692-Speed 5955.77 samples/sec Loss 35.2712 LearningRate 0.0708 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:58:20,562-Speed 5963.61 samples/sec Loss 35.2860 LearningRate 0.0710 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:58:27,411-Speed 5981.36 samples/sec Loss 35.1319 LearningRate 0.0712 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:58:34,264-Speed 5978.02 samples/sec Loss 35.1569 LearningRate 0.0714 Epoch: 0 Global 
Step: 3700 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:58:41,123-Speed 5973.68 samples/sec Loss 35.0992 LearningRate 0.0716 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:58:47,994-Speed 5962.36 samples/sec Loss 35.0593 LearningRate 0.0718 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:58:54,857-Speed 5970.11 samples/sec Loss 35.0803 LearningRate 0.0719 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:59:01,704-Speed 5983.01 samples/sec Loss 35.0500 LearningRate 0.0721 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:59:08,555-Speed 5980.50 samples/sec Loss 34.9635 LearningRate 0.0723 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:59:15,417-Speed 5969.60 samples/sec Loss 34.8776 LearningRate 0.0725 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 20:59:22,277-Speed 5972.11 samples/sec Loss 34.8599 LearningRate 0.0727 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:59:29,125-Speed 5983.10 samples/sec Loss 34.8668 LearningRate 0.0729 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:59:35,993-Speed 5965.99 samples/sec Loss 34.7957 LearningRate 0.0731 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:59:42,843-Speed 5979.86 samples/sec Loss 34.7567 LearningRate 0.0733 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:59:49,705-Speed 5970.53 samples/sec Loss 34.7522 LearningRate 0.0735 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 20:59:56,555-Speed 5981.13 samples/sec Loss 34.6789 LearningRate 0.0737 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-07 21:00:03,425-Speed 5963.31 samples/sec Loss 34.6225 LearningRate 0.0739 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:00:10,281-Speed 5974.49 samples/sec Loss 34.6152 LearningRate 0.0741 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:00:17,150-Speed 5964.77 samples/sec Loss 34.5266 LearningRate 0.0743 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:00:24,003-Speed 5977.75 samples/sec Loss 34.5414 LearningRate 0.0745 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:00:30,861-Speed 5974.35 samples/sec Loss 34.4448 LearningRate 0.0746 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:00:37,706-Speed 5985.01 samples/sec Loss 34.4667 LearningRate 0.0748 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:00:44,588-Speed 5953.31 samples/sec Loss 34.4353 LearningRate 0.0750 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:00:51,436-Speed 5981.73 samples/sec Loss 34.3496 LearningRate 0.0752 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:00:58,336-Speed 5937.93 samples/sec Loss 34.3870 LearningRate 0.0754 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:01:05,204-Speed 5967.61 samples/sec Loss 34.2986 LearningRate 0.0756 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:01:12,067-Speed 5969.12 samples/sec Loss 34.2485 LearningRate 0.0758 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:01:18,925-Speed 5974.41 samples/sec Loss 34.1365 LearningRate 0.0760 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 
21:01:25,772-Speed 5983.44 samples/sec Loss 34.1696 LearningRate 0.0762 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:01:32,625-Speed 5981.55 samples/sec Loss 34.0986 LearningRate 0.0764 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:01:39,480-Speed 5976.29 samples/sec Loss 34.1165 LearningRate 0.0766 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:01:46,327-Speed 5983.15 samples/sec Loss 34.1329 LearningRate 0.0768 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:01:53,199-Speed 5961.82 samples/sec Loss 33.9972 LearningRate 0.0770 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:02:00,068-Speed 5965.05 samples/sec Loss 33.9902 LearningRate 0.0772 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:02:06,944-Speed 5957.74 samples/sec Loss 33.9399 LearningRate 0.0773 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:02:13,807-Speed 5971.05 samples/sec Loss 33.8481 LearningRate 0.0775 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:02:20,684-Speed 5956.95 samples/sec Loss 33.8311 LearningRate 0.0777 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:02:27,577-Speed 5945.30 samples/sec Loss 33.8118 LearningRate 0.0779 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:02:34,432-Speed 5976.44 samples/sec Loss 33.7787 LearningRate 0.0781 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:02:41,277-Speed 5984.91 samples/sec Loss 33.7604 LearningRate 0.0783 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:02:48,147-Speed 5963.97 samples/sec Loss 
33.6753 LearningRate 0.0785 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:02:55,002-Speed 5976.21 samples/sec Loss 33.6727 LearningRate 0.0787 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:03:01,850-Speed 5982.80 samples/sec Loss 33.5546 LearningRate 0.0789 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:03:08,706-Speed 5975.08 samples/sec Loss 33.5452 LearningRate 0.0791 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:03:15,576-Speed 5963.37 samples/sec Loss 33.4464 LearningRate 0.0793 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:03:22,426-Speed 5980.36 samples/sec Loss 33.4349 LearningRate 0.0795 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:03:29,274-Speed 5983.54 samples/sec Loss 33.4476 LearningRate 0.0797 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:03:36,128-Speed 5977.07 samples/sec Loss 33.4153 LearningRate 0.0799 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:03:42,970-Speed 5987.89 samples/sec Loss 33.2608 LearningRate 0.0800 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:03:49,825-Speed 5976.81 samples/sec Loss 33.3099 LearningRate 0.0802 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:03:56,671-Speed 5983.93 samples/sec Loss 33.2322 LearningRate 0.0804 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:03,568-Speed 5940.69 samples/sec Loss 33.1833 LearningRate 0.0806 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:10,442-Speed 5959.74 samples/sec Loss 33.0915 LearningRate 0.0808 Epoch: 0 Global 
Step: 4190 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:17,328-Speed 5949.03 samples/sec Loss 33.0803 LearningRate 0.0810 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:24,201-Speed 5963.81 samples/sec Loss 33.1097 LearningRate 0.0812 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:31,072-Speed 5962.64 samples/sec Loss 33.0498 LearningRate 0.0814 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:37,930-Speed 5973.53 samples/sec Loss 32.9804 LearningRate 0.0816 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:44,793-Speed 5969.70 samples/sec Loss 32.9122 LearningRate 0.0818 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:51,641-Speed 5982.37 samples/sec Loss 32.8876 LearningRate 0.0820 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:04:58,524-Speed 5952.09 samples/sec Loss 32.9156 LearningRate 0.0822 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:05:05,374-Speed 5980.94 samples/sec Loss 32.8686 LearningRate 0.0824 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:05:12,228-Speed 5976.41 samples/sec Loss 32.7162 LearningRate 0.0826 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:05:19,102-Speed 5960.49 samples/sec Loss 32.7185 LearningRate 0.0827 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:05:25,986-Speed 5950.86 samples/sec Loss 32.6298 LearningRate 0.0829 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:05:32,860-Speed 5961.38 samples/sec Loss 32.6307 LearningRate 0.0831 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-07 21:05:39,728-Speed 5965.15 samples/sec Loss 32.5682 LearningRate 0.0833 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:05:46,649-Speed 5919.26 samples/sec Loss 32.5532 LearningRate 0.0835 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:05:53,510-Speed 5972.44 samples/sec Loss 32.5038 LearningRate 0.0837 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:00,366-Speed 5975.48 samples/sec Loss 32.4039 LearningRate 0.0839 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:07,228-Speed 5970.33 samples/sec Loss 32.3771 LearningRate 0.0841 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:14,083-Speed 5977.00 samples/sec Loss 32.4050 LearningRate 0.0843 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:20,935-Speed 5978.90 samples/sec Loss 32.3070 LearningRate 0.0845 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:27,818-Speed 5952.15 samples/sec Loss 32.2742 LearningRate 0.0847 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:34,683-Speed 5967.68 samples/sec Loss 32.2404 LearningRate 0.0849 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:41,546-Speed 5969.57 samples/sec Loss 32.1633 LearningRate 0.0851 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:48,413-Speed 5966.08 samples/sec Loss 32.1344 LearningRate 0.0853 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:06:55,270-Speed 5974.86 samples/sec Loss 32.1704 LearningRate 0.0854 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 
21:07:02,139-Speed 5964.65 samples/sec Loss 31.9536 LearningRate 0.0856 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:07:09,026-Speed 5948.66 samples/sec Loss 31.9404 LearningRate 0.0858 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:07:15,893-Speed 5965.24 samples/sec Loss 31.9361 LearningRate 0.0860 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:07:22,762-Speed 5964.68 samples/sec Loss 31.9429 LearningRate 0.0862 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:07:29,612-Speed 5980.47 samples/sec Loss 31.7962 LearningRate 0.0864 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:07:36,477-Speed 5970.08 samples/sec Loss 31.8327 LearningRate 0.0866 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:07:43,337-Speed 5972.30 samples/sec Loss 31.8090 LearningRate 0.0868 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:07:50,225-Speed 5947.39 samples/sec Loss 31.7536 LearningRate 0.0870 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:07:57,082-Speed 5974.19 samples/sec Loss 31.6869 LearningRate 0.0872 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:08:03,944-Speed 5970.80 samples/sec Loss 31.7174 LearningRate 0.0874 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:08:10,806-Speed 5970.32 samples/sec Loss 31.5951 LearningRate 0.0876 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:08:17,682-Speed 5958.79 samples/sec Loss 31.4843 LearningRate 0.0878 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:08:24,552-Speed 5965.15 samples/sec Loss 
31.4488 LearningRate 0.0880 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:08:31,413-Speed 5970.41 samples/sec Loss 31.4314 LearningRate 0.0881 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:08:38,305-Speed 5944.07 samples/sec Loss 31.4752 LearningRate 0.0883 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:08:45,161-Speed 5975.71 samples/sec Loss 31.3671 LearningRate 0.0885 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:08:52,031-Speed 5963.13 samples/sec Loss 31.2846 LearningRate 0.0887 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:08:58,911-Speed 5955.22 samples/sec Loss 31.2923 LearningRate 0.0889 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:09:05,767-Speed 5976.88 samples/sec Loss 31.2495 LearningRate 0.0891 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:09:12,644-Speed 5957.93 samples/sec Loss 31.2091 LearningRate 0.0893 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:09:19,498-Speed 5976.95 samples/sec Loss 31.1343 LearningRate 0.0895 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:09:26,366-Speed 5965.13 samples/sec Loss 31.0259 LearningRate 0.0897 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:09:33,225-Speed 5972.71 samples/sec Loss 31.0299 LearningRate 0.0899 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:09:40,083-Speed 5973.40 samples/sec Loss 31.0379 LearningRate 0.0901 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:09:46,952-Speed 5964.57 samples/sec Loss 30.9790 LearningRate 0.0903 Epoch: 0 Global 
Step: 4680 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:09:53,825-Speed 5961.31 samples/sec Loss 30.8526 LearningRate 0.0905 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:10:00,688-Speed 5969.34 samples/sec Loss 30.8566 LearningRate 0.0907 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:10:07,529-Speed 5988.72 samples/sec Loss 30.8156 LearningRate 0.0908 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:10:14,401-Speed 5961.45 samples/sec Loss 30.7997 LearningRate 0.0910 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:10:21,267-Speed 5967.00 samples/sec Loss 30.7015 LearningRate 0.0912 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:10:28,136-Speed 5963.75 samples/sec Loss 30.6229 LearningRate 0.0914 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:10:34,994-Speed 5973.04 samples/sec Loss 30.6299 LearningRate 0.0916 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:10:41,846-Speed 5978.69 samples/sec Loss 30.5255 LearningRate 0.0918 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:10:48,709-Speed 5969.83 samples/sec Loss 30.4690 LearningRate 0.0920 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:10:55,563-Speed 5977.00 samples/sec Loss 30.4642 LearningRate 0.0922 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:11:02,420-Speed 5974.44 samples/sec Loss 30.4304 LearningRate 0.0924 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:11:09,316-Speed 5941.27 samples/sec Loss 30.4013 LearningRate 0.0926 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-07 21:11:16,176-Speed 5971.88 samples/sec Loss 30.3322 LearningRate 0.0928 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:11:23,045-Speed 5964.77 samples/sec Loss 30.2091 LearningRate 0.0930 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:11:29,904-Speed 5972.29 samples/sec Loss 30.2256 LearningRate 0.0932 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:11:36,775-Speed 5965.95 samples/sec Loss 30.1388 LearningRate 0.0934 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:11:43,623-Speed 5981.86 samples/sec Loss 30.1461 LearningRate 0.0935 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:11:50,496-Speed 5961.49 samples/sec Loss 30.0141 LearningRate 0.0937 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:11:57,350-Speed 5977.54 samples/sec Loss 30.0586 LearningRate 0.0939 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:12:04,213-Speed 5968.98 samples/sec Loss 30.0525 LearningRate 0.0941 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:12:11,090-Speed 5957.79 samples/sec Loss 29.9427 LearningRate 0.0943 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:12:17,939-Speed 5981.20 samples/sec Loss 29.8670 LearningRate 0.0945 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:12:24,918-Speed 5870.47 samples/sec Loss 29.7991 LearningRate 0.0947 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:12:31,889-Speed 5879.80 samples/sec Loss 29.7880 LearningRate 0.0949 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 
21:12:38,772-Speed 5952.50 samples/sec Loss 29.8002 LearningRate 0.0951 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:12:45,632-Speed 5972.23 samples/sec Loss 29.6866 LearningRate 0.0953 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:12:52,493-Speed 5970.86 samples/sec Loss 29.6787 LearningRate 0.0955 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:12:59,362-Speed 5964.51 samples/sec Loss 29.5911 LearningRate 0.0957 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:13:06,229-Speed 5965.52 samples/sec Loss 29.5268 LearningRate 0.0959 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:13:13,104-Speed 5959.25 samples/sec Loss 29.4482 LearningRate 0.0961 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:13:19,974-Speed 5963.51 samples/sec Loss 29.3663 LearningRate 0.0962 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:13:26,840-Speed 5966.98 samples/sec Loss 29.4447 LearningRate 0.0964 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:13:53,821-[lfw][5000]XNorm: 22.570552 Training: 2022-01-07 21:13:53,822-[lfw][5000]Accuracy-Flip: 0.98167+-0.00695 Training: 2022-01-07 21:13:53,822-[lfw][5000]Accuracy-Highest: 0.98167 Training: 2022-01-07 21:14:24,946-[cfp_fp][5000]XNorm: 20.072500 Training: 2022-01-07 21:14:24,947-[cfp_fp][5000]Accuracy-Flip: 0.89171+-0.00985 Training: 2022-01-07 21:14:24,948-[cfp_fp][5000]Accuracy-Highest: 0.89171 Training: 2022-01-07 21:14:51,791-[agedb_30][5000]XNorm: 22.088100 Training: 2022-01-07 21:14:51,792-[agedb_30][5000]Accuracy-Flip: 0.85750+-0.01241 Training: 2022-01-07 21:14:51,793-[agedb_30][5000]Accuracy-Highest: 0.85750 Training: 2022-01-07 21:14:58,670-Speed 446.05 samples/sec 
Loss 29.3444 LearningRate 0.0966 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:15:05,524-Speed 5976.56 samples/sec Loss 29.3053 LearningRate 0.0968 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:15:12,382-Speed 5973.87 samples/sec Loss 29.1861 LearningRate 0.0970 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:15:19,252-Speed 5966.44 samples/sec Loss 29.2594 LearningRate 0.0972 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:15:26,140-Speed 5948.35 samples/sec Loss 29.0620 LearningRate 0.0974 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:15:33,026-Speed 5949.21 samples/sec Loss 29.1541 LearningRate 0.0976 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:15:39,910-Speed 5951.21 samples/sec Loss 29.0856 LearningRate 0.0978 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:15:46,804-Speed 5942.84 samples/sec Loss 28.9425 LearningRate 0.0980 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:15:53,693-Speed 5947.67 samples/sec Loss 29.0103 LearningRate 0.0982 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:16:00,595-Speed 5935.11 samples/sec Loss 28.8262 LearningRate 0.0984 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-07 21:16:07,489-Speed 5942.36 samples/sec Loss 28.8606 LearningRate 0.0986 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-07 21:16:14,380-Speed 5945.20 samples/sec Loss 28.7762 LearningRate 0.0988 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-07 21:16:21,255-Speed 5959.55 samples/sec Loss 28.7434 LearningRate 0.0989 Epoch: 0 
Global Step: 5130 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:16:28,143-Speed 5947.61 samples/sec Loss 28.6580 LearningRate 0.0991 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:16:35,045-Speed 5935.38 samples/sec Loss 28.5890 LearningRate 0.0993 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:16:41,923-Speed 5956.87 samples/sec Loss 28.5379 LearningRate 0.0995 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:16:48,806-Speed 5955.93 samples/sec Loss 28.5709 LearningRate 0.0997 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:16:55,671-Speed 5967.75 samples/sec Loss 28.5357 LearningRate 0.0999 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:17:02,522-Speed 5980.25 samples/sec Loss 28.5321 LearningRate 0.1001 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:17:09,371-Speed 5981.13 samples/sec Loss 28.3559 LearningRate 0.1003 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:17:16,245-Speed 5959.54 samples/sec Loss 28.3976 LearningRate 0.1005 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:17:23,133-Speed 5948.74 samples/sec Loss 28.3089 LearningRate 0.1007 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:17:30,019-Speed 5949.51 samples/sec Loss 28.3166 LearningRate 0.1009 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-07 21:17:36,888-Speed 5964.24 samples/sec Loss 28.1921 LearningRate 0.1011 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:17:43,825-Speed 5905.12 samples/sec Loss 28.1205 LearningRate 0.1013 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 131072 
Required: 40 hours Training: 2022-01-07 21:17:50,680-Speed 5977.08 samples/sec Loss 28.0629 LearningRate 0.1015 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:17:57,548-Speed 5965.33 samples/sec Loss 28.1142 LearningRate 0.1016 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:18:04,408-Speed 5971.56 samples/sec Loss 28.0000 LearningRate 0.1018 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:18:11,282-Speed 5960.10 samples/sec Loss 28.0388 LearningRate 0.1020 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:18:18,235-Speed 5893.06 samples/sec Loss 27.8527 LearningRate 0.1022 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:18:25,198-Speed 5883.83 samples/sec Loss 27.8429 LearningRate 0.1024 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:18:32,081-Speed 5952.28 samples/sec Loss 27.8267 LearningRate 0.1026 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:18:38,956-Speed 5961.28 samples/sec Loss 27.8109 LearningRate 0.1028 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:18:45,849-Speed 5943.23 samples/sec Loss 27.6715 LearningRate 0.1030 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-07 21:18:52,707-Speed 5973.83 samples/sec Loss 27.7718 LearningRate 0.1032 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:18:59,586-Speed 5955.68 samples/sec Loss 27.6352 LearningRate 0.1034 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 21:19:06,468-Speed 5952.31 samples/sec Loss 27.5857 LearningRate 0.1036 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-07 
21:19:13,348-Speed 5955.21 samples/sec Loss 27.5943 LearningRate 0.1038 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:19:20,225-Speed 5957.64 samples/sec Loss 27.4644 LearningRate 0.1040 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:19:27,116-Speed 5945.65 samples/sec Loss 27.4641 LearningRate 0.1042 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:19:33,970-Speed 5976.93 samples/sec Loss 27.4321 LearningRate 0.1043 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:19:40,857-Speed 5948.91 samples/sec Loss 27.3412 LearningRate 0.1045 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:19:47,719-Speed 5970.70 samples/sec Loss 27.3061 LearningRate 0.1047 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:19:54,591-Speed 5961.07 samples/sec Loss 27.1689 LearningRate 0.1049 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:20:01,463-Speed 5961.92 samples/sec Loss 27.1390 LearningRate 0.1051 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:20:08,325-Speed 5970.10 samples/sec Loss 27.2063 LearningRate 0.1053 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:20:15,182-Speed 5974.22 samples/sec Loss 27.0826 LearningRate 0.1055 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:20:22,065-Speed 5951.22 samples/sec Loss 27.1802 LearningRate 0.1057 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:20:28,937-Speed 5962.28 samples/sec Loss 27.0110 LearningRate 0.1059 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:20:35,797-Speed 5971.84 samples/sec Loss 
26.8824 LearningRate 0.1061 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:20:42,667-Speed 5963.04 samples/sec Loss 26.8471 LearningRate 0.1063 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:20:49,564-Speed 5951.44 samples/sec Loss 26.8067 LearningRate 0.1065 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:20:56,425-Speed 5971.03 samples/sec Loss 26.8032 LearningRate 0.1067 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:21:03,309-Speed 5951.82 samples/sec Loss 26.7403 LearningRate 0.1069 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:21:10,185-Speed 5957.98 samples/sec Loss 26.7607 LearningRate 0.1070 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:21:17,043-Speed 5974.78 samples/sec Loss 26.6883 LearningRate 0.1072 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:21:23,917-Speed 5959.76 samples/sec Loss 26.6191 LearningRate 0.1074 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:21:30,796-Speed 5956.01 samples/sec Loss 26.4842 LearningRate 0.1076 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:21:37,662-Speed 5966.31 samples/sec Loss 26.5450 LearningRate 0.1078 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:21:44,520-Speed 5974.10 samples/sec Loss 26.5283 LearningRate 0.1080 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:21:51,394-Speed 5959.67 samples/sec Loss 26.3961 LearningRate 0.1082 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:21:58,252-Speed 5974.38 samples/sec Loss 26.3914 LearningRate 0.1084 Epoch: 0 Global 
Step: 5620 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:22:05,134-Speed 5953.02 samples/sec Loss 26.4419 LearningRate 0.1086 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:22:12,031-Speed 5939.70 samples/sec Loss 26.3171 LearningRate 0.1088 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:22:18,894-Speed 5969.72 samples/sec Loss 26.2565 LearningRate 0.1090 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:22:25,758-Speed 5970.91 samples/sec Loss 26.1823 LearningRate 0.1092 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:22:32,607-Speed 5982.23 samples/sec Loss 26.1081 LearningRate 0.1094 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:22:39,475-Speed 5964.74 samples/sec Loss 26.0931 LearningRate 0.1096 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:22:46,335-Speed 5972.58 samples/sec Loss 25.9846 LearningRate 0.1098 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:22:53,235-Speed 5937.42 samples/sec Loss 25.8910 LearningRate 0.1099 Epoch: 0 Global Step: 5700 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:23:00,095-Speed 5971.53 samples/sec Loss 25.8874 LearningRate 0.1101 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:23:06,948-Speed 5978.22 samples/sec Loss 25.8290 LearningRate 0.1103 Epoch: 0 Global Step: 5720 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:23:13,807-Speed 5972.87 samples/sec Loss 25.8547 LearningRate 0.1105 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:23:20,688-Speed 5953.47 samples/sec Loss 25.6778 LearningRate 0.1107 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 65536 Required: 39 
hours Training: 2022-01-07 21:23:27,556-Speed 5964.94 samples/sec Loss 25.7239 LearningRate 0.1109 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:23:34,435-Speed 5955.05 samples/sec Loss 25.6553 LearningRate 0.1111 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:23:41,301-Speed 5967.44 samples/sec Loss 25.5779 LearningRate 0.1113 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:23:48,160-Speed 5972.73 samples/sec Loss 25.5174 LearningRate 0.1115 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:23:55,026-Speed 5966.79 samples/sec Loss 25.5147 LearningRate 0.1117 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:24:01,920-Speed 5943.66 samples/sec Loss 25.4931 LearningRate 0.1119 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:24:08,790-Speed 5963.33 samples/sec Loss 25.4098 LearningRate 0.1121 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:24:15,659-Speed 5963.51 samples/sec Loss 25.3802 LearningRate 0.1123 Epoch: 0 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:24:22,531-Speed 5961.83 samples/sec Loss 25.2667 LearningRate 0.1125 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:24:29,436-Speed 5933.39 samples/sec Loss 25.3373 LearningRate 0.1126 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:24:36,287-Speed 5979.96 samples/sec Loss 25.1256 LearningRate 0.1128 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:24:43,152-Speed 5967.87 samples/sec Loss 25.2071 LearningRate 0.1130 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 
21:24:50,035-Speed 5951.68 samples/sec Loss 25.1097 LearningRate 0.1132 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:24:56,904-Speed 5965.81 samples/sec Loss 25.0595 LearningRate 0.1134 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:25:03,761-Speed 5973.98 samples/sec Loss 25.0161 LearningRate 0.1136 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:25:10,630-Speed 5965.01 samples/sec Loss 24.9846 LearningRate 0.1138 Epoch: 0 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:25:17,521-Speed 5944.68 samples/sec Loss 24.9442 LearningRate 0.1140 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:25:24,400-Speed 5955.74 samples/sec Loss 24.9069 LearningRate 0.1142 Epoch: 0 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:25:31,276-Speed 5958.85 samples/sec Loss 24.7856 LearningRate 0.1144 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:25:38,148-Speed 5966.43 samples/sec Loss 24.8231 LearningRate 0.1146 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:25:45,018-Speed 5966.49 samples/sec Loss 24.7873 LearningRate 0.1148 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:25:51,882-Speed 5968.29 samples/sec Loss 24.7062 LearningRate 0.1150 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:25:58,737-Speed 5975.70 samples/sec Loss 24.6227 LearningRate 0.1152 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:26:05,606-Speed 5964.97 samples/sec Loss 24.5703 LearningRate 0.1153 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:26:12,490-Speed 5955.15 samples/sec Loss 
24.5106 LearningRate 0.1155 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:26:19,347-Speed 5973.92 samples/sec Loss 24.5200 LearningRate 0.1157 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:26:26,209-Speed 5972.50 samples/sec Loss 24.4210 LearningRate 0.1159 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:26:33,068-Speed 5972.55 samples/sec Loss 24.3314 LearningRate 0.1161 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:26:39,929-Speed 5971.43 samples/sec Loss 24.3441 LearningRate 0.1163 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:26:46,811-Speed 5953.07 samples/sec Loss 24.2755 LearningRate 0.1165 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:26:53,676-Speed 5967.62 samples/sec Loss 24.3109 LearningRate 0.1167 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:27:00,548-Speed 5962.48 samples/sec Loss 24.2201 LearningRate 0.1169 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:27:07,403-Speed 5978.60 samples/sec Loss 24.1402 LearningRate 0.1171 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:27:14,283-Speed 5956.24 samples/sec Loss 24.2096 LearningRate 0.1173 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:27:21,154-Speed 5962.07 samples/sec Loss 24.0298 LearningRate 0.1175 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:27:28,051-Speed 5940.53 samples/sec Loss 23.9816 LearningRate 0.1177 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:27:34,909-Speed 5973.11 samples/sec Loss 23.9674 LearningRate 0.1179 Epoch: 0 Global 
Step: 6110 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:27:41,777-Speed 5965.59 samples/sec Loss 23.9150 LearningRate 0.1180 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:27:48,636-Speed 5972.94 samples/sec Loss 23.9063 LearningRate 0.1182 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:27:55,506-Speed 5963.04 samples/sec Loss 23.7114 LearningRate 0.1184 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:28:02,359-Speed 5977.83 samples/sec Loss 23.7432 LearningRate 0.1186 Epoch: 0 Global Step: 6150 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:28:09,214-Speed 5976.32 samples/sec Loss 23.7052 LearningRate 0.1188 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:28:16,091-Speed 5957.14 samples/sec Loss 23.6870 LearningRate 0.1190 Epoch: 0 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:28:22,948-Speed 5974.73 samples/sec Loss 23.6274 LearningRate 0.1192 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:28:29,831-Speed 5953.22 samples/sec Loss 23.5379 LearningRate 0.1194 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:28:36,695-Speed 5969.51 samples/sec Loss 23.5904 LearningRate 0.1196 Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:28:43,562-Speed 5966.37 samples/sec Loss 23.4785 LearningRate 0.1198 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:28:50,430-Speed 5964.85 samples/sec Loss 23.4325 LearningRate 0.1200 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:28:57,302-Speed 5961.17 samples/sec Loss 23.4735 LearningRate 0.1202 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-07 21:29:04,168-Speed 5966.88 samples/sec Loss 23.2518 LearningRate 0.1204 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:29:11,027-Speed 5973.08 samples/sec Loss 23.2595 LearningRate 0.1206 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:29:17,900-Speed 5960.56 samples/sec Loss 23.2640 LearningRate 0.1207 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:29:24,758-Speed 5973.71 samples/sec Loss 23.2695 LearningRate 0.1209 Epoch: 0 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:29:31,623-Speed 5967.48 samples/sec Loss 23.1880 LearningRate 0.1211 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:29:38,483-Speed 5972.32 samples/sec Loss 23.0638 LearningRate 0.1213 Epoch: 0 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:29:45,342-Speed 5973.39 samples/sec Loss 23.0802 LearningRate 0.1215 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:29:52,208-Speed 5966.15 samples/sec Loss 23.1020 LearningRate 0.1217 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:29:59,078-Speed 5963.70 samples/sec Loss 23.1017 LearningRate 0.1219 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:30:05,931-Speed 5977.81 samples/sec Loss 22.9111 LearningRate 0.1221 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:30:12,802-Speed 5964.39 samples/sec Loss 22.9085 LearningRate 0.1223 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:30:19,656-Speed 5977.29 samples/sec Loss 22.8651 LearningRate 0.1225 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 
21:30:26,519-Speed 5969.52 samples/sec Loss 22.8757 LearningRate 0.1227 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:30:33,382-Speed 5969.18 samples/sec Loss 22.8368 LearningRate 0.1229 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:30:40,239-Speed 5973.87 samples/sec Loss 22.8134 LearningRate 0.1231 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:30:47,102-Speed 5969.56 samples/sec Loss 22.7087 LearningRate 0.1233 Epoch: 0 Global Step: 6390 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:30:53,973-Speed 5961.51 samples/sec Loss 22.5714 LearningRate 0.1234 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:31:00,859-Speed 5949.39 samples/sec Loss 22.6576 LearningRate 0.1236 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:31:07,729-Speed 5963.62 samples/sec Loss 22.5552 LearningRate 0.1238 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:31:14,603-Speed 5959.63 samples/sec Loss 22.4687 LearningRate 0.1240 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:31:21,460-Speed 5975.35 samples/sec Loss 22.5224 LearningRate 0.1242 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:31:28,320-Speed 5972.62 samples/sec Loss 22.4048 LearningRate 0.1244 Epoch: 0 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:31:35,193-Speed 5960.36 samples/sec Loss 22.2972 LearningRate 0.1246 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:31:42,058-Speed 5967.67 samples/sec Loss 22.4571 LearningRate 0.1248 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:31:48,928-Speed 5963.65 samples/sec Loss 
22.2989 LearningRate 0.1250 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:31:55,799-Speed 5962.06 samples/sec Loss 22.2457 LearningRate 0.1252 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:32:02,688-Speed 5946.66 samples/sec Loss 22.1900 LearningRate 0.1254 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:32:09,553-Speed 5967.20 samples/sec Loss 22.1462 LearningRate 0.1256 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:32:16,422-Speed 5964.21 samples/sec Loss 22.1257 LearningRate 0.1258 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:32:23,284-Speed 5969.97 samples/sec Loss 22.0367 LearningRate 0.1260 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:32:30,139-Speed 5976.83 samples/sec Loss 22.0739 LearningRate 0.1261 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:32:36,994-Speed 5976.29 samples/sec Loss 22.0852 LearningRate 0.1263 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:32:43,864-Speed 5964.73 samples/sec Loss 21.8911 LearningRate 0.1265 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:32:50,717-Speed 5977.08 samples/sec Loss 21.9507 LearningRate 0.1267 Epoch: 0 Global Step: 6570 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:32:57,573-Speed 5976.07 samples/sec Loss 21.9305 LearningRate 0.1269 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:33:04,430-Speed 5974.25 samples/sec Loss 21.7735 LearningRate 0.1271 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:33:11,274-Speed 5985.83 samples/sec Loss 21.8241 LearningRate 0.1273 Epoch: 0 Global 
Step: 6600 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:33:18,153-Speed 5957.60 samples/sec Loss 21.7262 LearningRate 0.1275 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:33:25,018-Speed 5969.68 samples/sec Loss 21.6330 LearningRate 0.1277 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:33:31,874-Speed 5975.31 samples/sec Loss 21.7573 LearningRate 0.1279 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:33:38,727-Speed 5977.96 samples/sec Loss 21.5427 LearningRate 0.1281 Epoch: 0 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:33:45,605-Speed 5959.25 samples/sec Loss 21.5805 LearningRate 0.1283 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:33:52,471-Speed 5967.40 samples/sec Loss 21.5272 LearningRate 0.1285 Epoch: 0 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:33:59,342-Speed 5963.36 samples/sec Loss 21.5388 LearningRate 0.1287 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:34:06,213-Speed 5961.99 samples/sec Loss 21.4154 LearningRate 0.1288 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:34:13,071-Speed 5977.10 samples/sec Loss 21.3313 LearningRate 0.1290 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:34:19,931-Speed 5971.91 samples/sec Loss 21.3393 LearningRate 0.1292 Epoch: 0 Global Step: 6700 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:34:26,785-Speed 5976.91 samples/sec Loss 21.2535 LearningRate 0.1294 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:34:33,653-Speed 5965.61 samples/sec Loss 21.3516 LearningRate 0.1296 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-07 21:34:40,518-Speed 5967.19 samples/sec Loss 21.1714 LearningRate 0.1298 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:34:47,377-Speed 5972.80 samples/sec Loss 21.1649 LearningRate 0.1300 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:34:54,231-Speed 5977.58 samples/sec Loss 21.1721 LearningRate 0.1302 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:35:01,109-Speed 5957.00 samples/sec Loss 21.0945 LearningRate 0.1304 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:35:07,973-Speed 5968.32 samples/sec Loss 21.0659 LearningRate 0.1306 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:35:14,839-Speed 5966.84 samples/sec Loss 21.0576 LearningRate 0.1308 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:35:21,690-Speed 5979.67 samples/sec Loss 20.9320 LearningRate 0.1310 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:35:28,556-Speed 5966.68 samples/sec Loss 20.9309 LearningRate 0.1312 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:35:35,428-Speed 5961.89 samples/sec Loss 21.0435 LearningRate 0.1314 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:35:42,283-Speed 5976.00 samples/sec Loss 20.8224 LearningRate 0.1315 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:35:49,144-Speed 5971.06 samples/sec Loss 20.9089 LearningRate 0.1317 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:35:56,014-Speed 5963.26 samples/sec Loss 20.7361 LearningRate 0.1319 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 
21:36:02,890-Speed 5959.69 samples/sec Loss 20.6986 LearningRate 0.1321 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:36:09,731-Speed 5988.61 samples/sec Loss 20.7084 LearningRate 0.1323 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:36:16,588-Speed 5974.47 samples/sec Loss 20.6970 LearningRate 0.1325 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:36:23,462-Speed 5959.62 samples/sec Loss 20.7671 LearningRate 0.1327 Epoch: 0 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:36:30,324-Speed 5970.42 samples/sec Loss 20.6050 LearningRate 0.1329 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:36:37,175-Speed 5979.94 samples/sec Loss 20.5372 LearningRate 0.1331 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:36:44,044-Speed 5963.80 samples/sec Loss 20.5397 LearningRate 0.1333 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:36:50,912-Speed 5965.13 samples/sec Loss 20.5260 LearningRate 0.1335 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:36:57,793-Speed 5954.23 samples/sec Loss 20.4918 LearningRate 0.1337 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:37:04,650-Speed 5974.83 samples/sec Loss 20.4581 LearningRate 0.1339 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:37:11,509-Speed 5972.35 samples/sec Loss 20.4610 LearningRate 0.1341 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:37:18,364-Speed 5976.89 samples/sec Loss 20.3642 LearningRate 0.1342 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:37:25,222-Speed 5973.30 samples/sec Loss 
20.2534 LearningRate 0.1344 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:37:32,069-Speed 5983.65 samples/sec Loss 20.2551 LearningRate 0.1346 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:37:38,921-Speed 5978.87 samples/sec Loss 20.2410 LearningRate 0.1348 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:37:45,804-Speed 5958.83 samples/sec Loss 20.2647 LearningRate 0.1350 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:37:52,676-Speed 5962.48 samples/sec Loss 20.0806 LearningRate 0.1352 Epoch: 0 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:37:59,529-Speed 5977.51 samples/sec Loss 20.0678 LearningRate 0.1354 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:38:06,407-Speed 5957.03 samples/sec Loss 20.0957 LearningRate 0.1356 Epoch: 0 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:38:13,279-Speed 5962.34 samples/sec Loss 20.1535 LearningRate 0.1358 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:38:20,129-Speed 5980.39 samples/sec Loss 19.9917 LearningRate 0.1360 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:38:27,005-Speed 5958.36 samples/sec Loss 20.0354 LearningRate 0.1362 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:38:33,860-Speed 5976.83 samples/sec Loss 19.9335 LearningRate 0.1364 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:38:40,734-Speed 5959.76 samples/sec Loss 19.9244 LearningRate 0.1366 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:38:47,593-Speed 5972.79 samples/sec Loss 19.7635 LearningRate 0.1368 Epoch: 0 Global 
Step: 7090 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:38:54,472-Speed 5955.24 samples/sec Loss 19.8679 LearningRate 0.1369 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:39:01,346-Speed 5960.10 samples/sec Loss 19.8105 LearningRate 0.1371 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:39:08,216-Speed 5963.18 samples/sec Loss 19.7583 LearningRate 0.1373 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:39:15,105-Speed 5946.76 samples/sec Loss 19.8394 LearningRate 0.1375 Epoch: 0 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:39:21,973-Speed 5967.61 samples/sec Loss 19.6697 LearningRate 0.1377 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:39:28,982-Speed 5844.81 samples/sec Loss 19.6214 LearningRate 0.1379 Epoch: 0 Global Step: 7150 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:39:35,847-Speed 5967.86 samples/sec Loss 19.5544 LearningRate 0.1381 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:39:42,726-Speed 5955.97 samples/sec Loss 19.5964 LearningRate 0.1383 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:39:49,598-Speed 5960.85 samples/sec Loss 19.5708 LearningRate 0.1385 Epoch: 0 Global Step: 7180 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:39:56,473-Speed 5959.79 samples/sec Loss 19.4787 LearningRate 0.1387 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:40:03,345-Speed 5961.18 samples/sec Loss 19.4966 LearningRate 0.1389 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:40:10,227-Speed 5953.62 samples/sec Loss 19.4631 LearningRate 0.1391 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-07 21:40:17,090-Speed 5969.86 samples/sec Loss 19.3594 LearningRate 0.1393 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:40:23,955-Speed 5979.01 samples/sec Loss 19.3658 LearningRate 0.1395 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:40:30,819-Speed 5968.43 samples/sec Loss 19.3937 LearningRate 0.1396 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:40:37,683-Speed 5968.47 samples/sec Loss 19.2979 LearningRate 0.1398 Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:40:44,554-Speed 5963.18 samples/sec Loss 19.2577 LearningRate 0.1400 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:40:51,437-Speed 5951.93 samples/sec Loss 19.1587 LearningRate 0.1402 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:40:58,305-Speed 5964.80 samples/sec Loss 19.2299 LearningRate 0.1404 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:41:05,167-Speed 5971.42 samples/sec Loss 19.0937 LearningRate 0.1406 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:41:12,028-Speed 5970.96 samples/sec Loss 19.0973 LearningRate 0.1408 Epoch: 0 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:41:18,885-Speed 5974.82 samples/sec Loss 19.1337 LearningRate 0.1410 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:41:25,741-Speed 5975.31 samples/sec Loss 19.0723 LearningRate 0.1412 Epoch: 0 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:41:32,594-Speed 5977.63 samples/sec Loss 19.0455 LearningRate 0.1414 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 
21:41:39,452-Speed 5974.18 samples/sec Loss 18.9989 LearningRate 0.1416 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:41:46,316-Speed 5969.69 samples/sec Loss 18.9994 LearningRate 0.1418 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:41:53,192-Speed 5958.16 samples/sec Loss 19.0271 LearningRate 0.1420 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:42:00,065-Speed 5961.22 samples/sec Loss 18.8798 LearningRate 0.1422 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:42:06,921-Speed 5975.60 samples/sec Loss 18.9624 LearningRate 0.1423 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:42:13,799-Speed 5956.61 samples/sec Loss 18.7277 LearningRate 0.1425 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:42:20,681-Speed 5952.99 samples/sec Loss 18.8655 LearningRate 0.1427 Epoch: 0 Global Step: 7400 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:42:27,582-Speed 5942.45 samples/sec Loss 18.7466 LearningRate 0.1429 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:42:34,456-Speed 5958.96 samples/sec Loss 18.7740 LearningRate 0.1431 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:42:41,321-Speed 5968.31 samples/sec Loss 18.7195 LearningRate 0.1433 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:42:48,197-Speed 5958.09 samples/sec Loss 18.7071 LearningRate 0.1435 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:42:55,056-Speed 5972.48 samples/sec Loss 18.6442 LearningRate 0.1437 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:43:01,927-Speed 5961.91 samples/sec Loss 
18.5876 LearningRate 0.1439 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:43:08,784-Speed 5974.95 samples/sec Loss 18.6559 LearningRate 0.1441 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:43:15,670-Speed 5949.03 samples/sec Loss 18.5337 LearningRate 0.1443 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:43:22,547-Speed 5958.75 samples/sec Loss 18.4889 LearningRate 0.1445 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:43:29,421-Speed 5960.04 samples/sec Loss 18.5127 LearningRate 0.1447 Epoch: 0 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:43:36,290-Speed 5964.25 samples/sec Loss 18.4916 LearningRate 0.1449 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:43:43,140-Speed 5979.95 samples/sec Loss 18.4371 LearningRate 0.1450 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:43:50,003-Speed 5972.43 samples/sec Loss 18.4683 LearningRate 0.1452 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:43:56,882-Speed 5955.65 samples/sec Loss 18.4851 LearningRate 0.1454 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:03,758-Speed 5958.74 samples/sec Loss 18.4679 LearningRate 0.1456 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:10,609-Speed 5979.44 samples/sec Loss 18.3436 LearningRate 0.1458 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:17,498-Speed 5946.89 samples/sec Loss 18.3011 LearningRate 0.1460 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:24,357-Speed 5975.65 samples/sec Loss 18.3175 LearningRate 0.1462 Epoch: 0 Global 
Step: 7580 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:31,239-Speed 5952.19 samples/sec Loss 18.2334 LearningRate 0.1464 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:38,129-Speed 5946.84 samples/sec Loss 18.2187 LearningRate 0.1466 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:44,988-Speed 5972.74 samples/sec Loss 18.1654 LearningRate 0.1468 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:51,861-Speed 5961.32 samples/sec Loss 18.1970 LearningRate 0.1470 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:44:58,721-Speed 5971.95 samples/sec Loss 18.1375 LearningRate 0.1472 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:45:05,587-Speed 5966.77 samples/sec Loss 18.0956 LearningRate 0.1474 Epoch: 0 Global Step: 7640 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:45:12,454-Speed 5966.29 samples/sec Loss 18.1329 LearningRate 0.1476 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:45:19,314-Speed 5972.35 samples/sec Loss 18.0499 LearningRate 0.1477 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:45:26,194-Speed 5955.82 samples/sec Loss 17.9180 LearningRate 0.1479 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:45:33,069-Speed 5958.69 samples/sec Loss 17.9841 LearningRate 0.1481 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:45:39,933-Speed 5968.69 samples/sec Loss 17.9670 LearningRate 0.1483 Epoch: 0 Global Step: 7690 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:45:46,805-Speed 5961.84 samples/sec Loss 17.9624 LearningRate 0.1485 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 262144 
Required: 39 hours Training: 2022-01-07 21:45:53,658-Speed 5978.53 samples/sec Loss 17.9092 LearningRate 0.1487 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:46:00,510-Speed 5978.47 samples/sec Loss 18.0158 LearningRate 0.1489 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:46:07,373-Speed 5968.92 samples/sec Loss 17.8455 LearningRate 0.1491 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:46:14,226-Speed 5978.20 samples/sec Loss 17.9041 LearningRate 0.1493 Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:46:21,083-Speed 5974.76 samples/sec Loss 17.7961 LearningRate 0.1495 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:46:27,948-Speed 5970.29 samples/sec Loss 17.8780 LearningRate 0.1497 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:46:34,831-Speed 5951.61 samples/sec Loss 17.8372 LearningRate 0.1499 Epoch: 0 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:46:41,731-Speed 5937.88 samples/sec Loss 17.7995 LearningRate 0.1501 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:46:48,668-Speed 5906.01 samples/sec Loss 17.6803 LearningRate 0.1503 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:46:55,536-Speed 5965.47 samples/sec Loss 17.5854 LearningRate 0.1504 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:47:02,421-Speed 5950.40 samples/sec Loss 17.6792 LearningRate 0.1506 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:47:09,322-Speed 5936.17 samples/sec Loss 17.6787 LearningRate 0.1508 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 
21:47:16,170-Speed 5982.13 samples/sec Loss 17.5801 LearningRate 0.1510 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:47:23,097-Speed 5917.63 samples/sec Loss 17.6034 LearningRate 0.1512 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:47:29,961-Speed 5968.49 samples/sec Loss 17.5990 LearningRate 0.1514 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:47:36,837-Speed 5957.94 samples/sec Loss 17.5066 LearningRate 0.1516 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:47:43,716-Speed 5956.29 samples/sec Loss 17.5724 LearningRate 0.1518 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:47:50,591-Speed 5958.70 samples/sec Loss 17.4354 LearningRate 0.1520 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:47:57,454-Speed 5969.46 samples/sec Loss 17.4540 LearningRate 0.1522 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:48:04,307-Speed 5980.55 samples/sec Loss 17.4825 LearningRate 0.1524 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:48:11,196-Speed 5946.72 samples/sec Loss 17.4668 LearningRate 0.1526 Epoch: 0 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:48:18,079-Speed 5951.71 samples/sec Loss 17.3536 LearningRate 0.1528 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:48:24,955-Speed 5960.21 samples/sec Loss 17.3835 LearningRate 0.1530 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:48:31,820-Speed 5968.52 samples/sec Loss 17.2276 LearningRate 0.1531 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:48:38,676-Speed 5975.15 samples/sec Loss 
17.3389 LearningRate 0.1533 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:48:45,542-Speed 5966.90 samples/sec Loss 17.3307 LearningRate 0.1535 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:48:52,391-Speed 5980.98 samples/sec Loss 17.2087 LearningRate 0.1537 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:48:59,251-Speed 5974.68 samples/sec Loss 17.2716 LearningRate 0.1539 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:49:06,109-Speed 5974.09 samples/sec Loss 17.1665 LearningRate 0.1541 Epoch: 0 Global Step: 7990 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:49:12,964-Speed 5975.92 samples/sec Loss 17.1761 LearningRate 0.1543 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:49:19,829-Speed 5967.77 samples/sec Loss 17.1844 LearningRate 0.1545 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:49:26,685-Speed 5975.90 samples/sec Loss 17.1919 LearningRate 0.1547 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:49:33,548-Speed 5969.10 samples/sec Loss 17.1077 LearningRate 0.1549 Epoch: 0 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:49:40,402-Speed 5977.73 samples/sec Loss 17.1908 LearningRate 0.1551 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:49:47,262-Speed 5971.62 samples/sec Loss 17.1776 LearningRate 0.1553 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:49:54,140-Speed 5956.84 samples/sec Loss 17.1030 LearningRate 0.1555 Epoch: 0 Global Step: 8060 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:50:01,040-Speed 5937.50 samples/sec Loss 17.0069 LearningRate 0.1557 Epoch: 0 Global 
Step: 8070 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:50:07,909-Speed 5964.25 samples/sec Loss 17.0455 LearningRate 0.1558 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:50:14,767-Speed 5973.01 samples/sec Loss 16.9300 LearningRate 0.1560 Epoch: 0 Global Step: 8090 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:50:21,611-Speed 5986.61 samples/sec Loss 16.9528 LearningRate 0.1562 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:50:28,461-Speed 5980.24 samples/sec Loss 16.9615 LearningRate 0.1564 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:50:35,317-Speed 5975.78 samples/sec Loss 16.9693 LearningRate 0.1566 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:50:42,185-Speed 5964.57 samples/sec Loss 16.9228 LearningRate 0.1568 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:50:49,073-Speed 5947.21 samples/sec Loss 16.9636 LearningRate 0.1570 Epoch: 0 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:50:55,946-Speed 5960.55 samples/sec Loss 16.8588 LearningRate 0.1572 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:51:02,787-Speed 5988.86 samples/sec Loss 16.7942 LearningRate 0.1574 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:51:09,645-Speed 5973.32 samples/sec Loss 16.7800 LearningRate 0.1576 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:51:16,521-Speed 5958.17 samples/sec Loss 16.8076 LearningRate 0.1578 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:51:23,375-Speed 5977.03 samples/sec Loss 16.7639 LearningRate 0.1580 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 65536 Required: 39 hours 
Training: 2022-01-07 21:51:30,230-Speed 5976.28 samples/sec Loss 16.8448 LearningRate 0.1582 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:51:37,080-Speed 5982.97 samples/sec Loss 16.7525 LearningRate 0.1584 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:51:43,921-Speed 5988.13 samples/sec Loss 16.7533 LearningRate 0.1585 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:51:50,796-Speed 5958.71 samples/sec Loss 16.6894 LearningRate 0.1587 Epoch: 0 Global Step: 8230 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:51:57,641-Speed 5985.46 samples/sec Loss 16.5745 LearningRate 0.1589 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:52:04,490-Speed 5981.53 samples/sec Loss 16.6495 LearningRate 0.1591 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 21:52:11,358-Speed 5966.40 samples/sec Loss 16.5943 LearningRate 0.1593 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:52:18,238-Speed 5954.73 samples/sec Loss 16.6616 LearningRate 0.1595 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:52:25,118-Speed 5954.69 samples/sec Loss 16.5690 LearningRate 0.1597 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:52:32,001-Speed 5952.08 samples/sec Loss 16.5439 LearningRate 0.1599 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:52:38,858-Speed 5975.25 samples/sec Loss 16.5347 LearningRate 0.1601 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:52:45,721-Speed 5968.65 samples/sec Loss 16.5297 LearningRate 0.1603 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:52:52,566-Speed 5985.52 
samples/sec Loss 16.4169 LearningRate 0.1605 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:52:59,412-Speed 5984.15 samples/sec Loss 16.4549 LearningRate 0.1607 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:53:06,271-Speed 5974.67 samples/sec Loss 16.4238 LearningRate 0.1609 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:53:13,130-Speed 5972.61 samples/sec Loss 16.4734 LearningRate 0.1611 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:53:19,988-Speed 5974.36 samples/sec Loss 16.4284 LearningRate 0.1612 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:53:26,873-Speed 5950.03 samples/sec Loss 16.4370 LearningRate 0.1614 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:53:33,735-Speed 5972.10 samples/sec Loss 16.4053 LearningRate 0.1616 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:53:40,596-Speed 5973.49 samples/sec Loss 16.3828 LearningRate 0.1618 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:53:47,478-Speed 5953.03 samples/sec Loss 16.3081 LearningRate 0.1620 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:53:54,348-Speed 5963.56 samples/sec Loss 16.3545 LearningRate 0.1622 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:54:01,206-Speed 5973.75 samples/sec Loss 16.3820 LearningRate 0.1624 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:54:08,110-Speed 5934.46 samples/sec Loss 16.2807 LearningRate 0.1626 Epoch: 0 Global Step: 8430 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:54:14,975-Speed 5967.41 samples/sec Loss 16.2073 LearningRate 
0.1628 Epoch: 0 Global Step: 8440 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:54:21,873-Speed 5939.34 samples/sec Loss 16.2582 LearningRate 0.1630 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:54:28,761-Speed 5948.37 samples/sec Loss 16.2800 LearningRate 0.1632 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 524288 Required: 39 hours Training: 2022-01-07 21:54:35,635-Speed 5959.67 samples/sec Loss 16.2336 LearningRate 0.1634 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:54:42,498-Speed 5968.84 samples/sec Loss 16.1895 LearningRate 0.1636 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:54:49,359-Speed 5972.03 samples/sec Loss 16.1190 LearningRate 0.1638 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:54:56,231-Speed 5961.46 samples/sec Loss 16.2384 LearningRate 0.1640 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:55:03,130-Speed 5938.22 samples/sec Loss 16.1496 LearningRate 0.1641 Epoch: 0 Global Step: 8510 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:55:09,997-Speed 5966.15 samples/sec Loss 16.1222 LearningRate 0.1643 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:55:16,885-Speed 5949.13 samples/sec Loss 16.0454 LearningRate 0.1645 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:55:23,731-Speed 5984.32 samples/sec Loss 16.0432 LearningRate 0.1647 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:55:30,585-Speed 5977.50 samples/sec Loss 16.0517 LearningRate 0.1649 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:55:37,437-Speed 5978.66 samples/sec Loss 16.0492 LearningRate 0.1651 Epoch: 0 Global Step: 8560 Fp16 Grad 
Scale: 131072 Required: 39 hours Training: 2022-01-07 21:55:44,311-Speed 5959.58 samples/sec Loss 16.0011 LearningRate 0.1653 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:55:51,160-Speed 5981.20 samples/sec Loss 16.0361 LearningRate 0.1655 Epoch: 0 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:55:58,009-Speed 5981.78 samples/sec Loss 16.0504 LearningRate 0.1657 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:56:04,860-Speed 5979.13 samples/sec Loss 16.0179 LearningRate 0.1659 Epoch: 0 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:56:11,744-Speed 5951.00 samples/sec Loss 15.9946 LearningRate 0.1661 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:56:18,591-Speed 5983.49 samples/sec Loss 15.9184 LearningRate 0.1663 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:56:25,459-Speed 5964.86 samples/sec Loss 15.9306 LearningRate 0.1665 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 21:56:32,310-Speed 5979.75 samples/sec Loss 15.9924 LearningRate 0.1667 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:56:39,167-Speed 5974.73 samples/sec Loss 15.8669 LearningRate 0.1668 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 21:56:46,035-Speed 5965.17 samples/sec Loss 15.8896 LearningRate 0.1670 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 21:56:52,905-Speed 5964.67 samples/sec Loss 15.8988 LearningRate 0.1672 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 21:56:59,760-Speed 5976.00 samples/sec Loss 15.9095 LearningRate 0.1674 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 38 hours Training: 
2022-01-07 21:57:06,619-Speed 5972.50 samples/sec Loss 15.7799 LearningRate 0.1676 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 21:57:13,482-Speed 5970.02 samples/sec Loss 15.8239 LearningRate 0.1678 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 21:57:20,333-Speed 5979.37 samples/sec Loss 15.8823 LearningRate 0.1680 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 21:57:27,193-Speed 5971.60 samples/sec Loss 15.8293 LearningRate 0.1682 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 21:57:34,058-Speed 5967.45 samples/sec Loss 15.8105 LearningRate 0.1684 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 21:57:40,921-Speed 5970.35 samples/sec Loss 15.6523 LearningRate 0.1686 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:57:47,770-Speed 5980.90 samples/sec Loss 15.7051 LearningRate 0.1688 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:57:54,640-Speed 5963.30 samples/sec Loss 15.6780 LearningRate 0.1690 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:01,497-Speed 5974.01 samples/sec Loss 15.7037 LearningRate 0.1692 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:08,343-Speed 5984.34 samples/sec Loss 15.6728 LearningRate 0.1694 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:15,206-Speed 5972.38 samples/sec Loss 15.6778 LearningRate 0.1695 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:22,071-Speed 5967.51 samples/sec Loss 15.6115 LearningRate 0.1697 Epoch: 0 Global Step: 8800 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:28,921-Speed 5980.68 
samples/sec Loss 15.6828 LearningRate 0.1699 Epoch: 0 Global Step: 8810 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:35,797-Speed 5958.19 samples/sec Loss 15.5774 LearningRate 0.1701 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:42,691-Speed 5944.92 samples/sec Loss 15.5663 LearningRate 0.1703 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:49,550-Speed 5972.61 samples/sec Loss 15.6027 LearningRate 0.1705 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:58:56,427-Speed 5957.80 samples/sec Loss 15.5618 LearningRate 0.1707 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:03,291-Speed 5968.42 samples/sec Loss 15.5316 LearningRate 0.1709 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:10,172-Speed 5953.36 samples/sec Loss 15.5202 LearningRate 0.1711 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:17,062-Speed 5946.22 samples/sec Loss 15.6317 LearningRate 0.1713 Epoch: 0 Global Step: 8880 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:23,936-Speed 5961.51 samples/sec Loss 15.4633 LearningRate 0.1715 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:30,825-Speed 5946.84 samples/sec Loss 15.5509 LearningRate 0.1717 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:37,678-Speed 5978.01 samples/sec Loss 15.5200 LearningRate 0.1719 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:44,549-Speed 5962.87 samples/sec Loss 15.5216 LearningRate 0.1721 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:51,409-Speed 5971.65 samples/sec Loss 15.5121 LearningRate 
0.1722 Epoch: 0 Global Step: 8930 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 21:59:58,251-Speed 5987.55 samples/sec Loss 15.4882 LearningRate 0.1724 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:00:05,137-Speed 5949.23 samples/sec Loss 15.3887 LearningRate 0.1726 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:00:12,060-Speed 5917.53 samples/sec Loss 15.3911 LearningRate 0.1728 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:00:18,920-Speed 5973.38 samples/sec Loss 15.4255 LearningRate 0.1730 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:00:25,800-Speed 5955.86 samples/sec Loss 15.4783 LearningRate 0.1732 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:00:32,659-Speed 5972.04 samples/sec Loss 15.4224 LearningRate 0.1734 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:00:39,622-Speed 5886.68 samples/sec Loss 15.4192 LearningRate 0.1736 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:00:46,478-Speed 5974.87 samples/sec Loss 15.3679 LearningRate 0.1738 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:00:53,436-Speed 5889.08 samples/sec Loss 15.3293 LearningRate 0.1740 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:00,413-Speed 5871.70 samples/sec Loss 15.4114 LearningRate 0.1742 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:07,338-Speed 5916.54 samples/sec Loss 15.2618 LearningRate 0.1744 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:14,208-Speed 5963.45 samples/sec Loss 15.3206 LearningRate 0.1746 Epoch: 0 Global Step: 9050 Fp16 Grad 
Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:21,061-Speed 5978.01 samples/sec Loss 15.2302 LearningRate 0.1748 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:27,913-Speed 5981.51 samples/sec Loss 15.2747 LearningRate 0.1749 Epoch: 0 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:34,759-Speed 5983.42 samples/sec Loss 15.2203 LearningRate 0.1751 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:41,607-Speed 5982.45 samples/sec Loss 15.2309 LearningRate 0.1753 Epoch: 0 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:48,460-Speed 5978.22 samples/sec Loss 15.2254 LearningRate 0.1755 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:01:55,332-Speed 5961.88 samples/sec Loss 15.1388 LearningRate 0.1757 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:02:02,184-Speed 5979.45 samples/sec Loss 15.1683 LearningRate 0.1759 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:02:09,045-Speed 5970.56 samples/sec Loss 15.1888 LearningRate 0.1761 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:02:15,896-Speed 5979.68 samples/sec Loss 15.2158 LearningRate 0.1763 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:02:22,752-Speed 5976.27 samples/sec Loss 15.2421 LearningRate 0.1765 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:02:29,621-Speed 5964.50 samples/sec Loss 15.2884 LearningRate 0.1767 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:02:36,499-Speed 5956.47 samples/sec Loss 15.1859 LearningRate 0.1769 Epoch: 0 Global Step: 9170 Fp16 Grad Scale: 262144 Required: 38 hours Training: 
2022-01-07 22:02:43,353-Speed 5976.57 samples/sec Loss 15.1560 LearningRate 0.1771 Epoch: 0 Global Step: 9180 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:02:50,226-Speed 5961.16 samples/sec Loss 15.1291 LearningRate 0.1773 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:02:57,084-Speed 5973.72 samples/sec Loss 15.1348 LearningRate 0.1775 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:03:03,963-Speed 5955.48 samples/sec Loss 15.1061 LearningRate 0.1776 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:03:10,805-Speed 5988.06 samples/sec Loss 15.0376 LearningRate 0.1778 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:03:19,506-Speed 4707.76 samples/sec Loss 15.0068 LearningRate 0.1780 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:03:26,370-Speed 5971.57 samples/sec Loss 15.0558 LearningRate 0.1782 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:03:33,237-Speed 5965.14 samples/sec Loss 15.0669 LearningRate 0.1784 Epoch: 0 Global Step: 9250 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:03:40,089-Speed 5979.11 samples/sec Loss 15.0292 LearningRate 0.1786 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:03:46,938-Speed 5980.95 samples/sec Loss 15.0814 LearningRate 0.1788 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:03:53,805-Speed 5966.18 samples/sec Loss 15.0248 LearningRate 0.1790 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:04:00,657-Speed 5978.95 samples/sec Loss 15.1007 LearningRate 0.1792 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:04:07,533-Speed 5957.96 
samples/sec Loss 15.0560 LearningRate 0.1794 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:04:14,411-Speed 5958.42 samples/sec Loss 15.0078 LearningRate 0.1796 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:04:21,287-Speed 5958.40 samples/sec Loss 14.9496 LearningRate 0.1798 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:04:28,191-Speed 5933.90 samples/sec Loss 15.0092 LearningRate 0.1800 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:04:35,043-Speed 5979.85 samples/sec Loss 15.0581 LearningRate 0.1802 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:04:41,915-Speed 5960.88 samples/sec Loss 14.9066 LearningRate 0.1803 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:04:48,772-Speed 5975.30 samples/sec Loss 14.9653 LearningRate 0.1805 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:04:55,616-Speed 5984.97 samples/sec Loss 14.9535 LearningRate 0.1807 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:05:02,490-Speed 5959.58 samples/sec Loss 14.9437 LearningRate 0.1809 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:05:09,358-Speed 5965.26 samples/sec Loss 14.9131 LearningRate 0.1811 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:05:16,212-Speed 5977.78 samples/sec Loss 14.8616 LearningRate 0.1813 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:05:23,099-Speed 5947.67 samples/sec Loss 14.8862 LearningRate 0.1815 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:05:29,972-Speed 5961.38 samples/sec Loss 14.8848 LearningRate 
0.1817 Epoch: 0 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:05:36,832-Speed 5974.62 samples/sec Loss 14.9471 LearningRate 0.1819 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:05:43,717-Speed 5950.09 samples/sec Loss 14.7083 LearningRate 0.1821 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:05:50,617-Speed 5946.64 samples/sec Loss 14.8556 LearningRate 0.1823 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:05:57,466-Speed 5981.99 samples/sec Loss 14.7837 LearningRate 0.1825 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:06:04,331-Speed 5967.87 samples/sec Loss 14.7805 LearningRate 0.1827 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:06:11,198-Speed 5966.21 samples/sec Loss 14.7731 LearningRate 0.1829 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:06:18,043-Speed 5984.43 samples/sec Loss 14.7736 LearningRate 0.1830 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:06:24,896-Speed 5978.25 samples/sec Loss 14.6963 LearningRate 0.1832 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:06:31,769-Speed 5961.05 samples/sec Loss 14.7555 LearningRate 0.1834 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:06:38,632-Speed 5969.31 samples/sec Loss 14.8794 LearningRate 0.1836 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:06:45,503-Speed 5962.42 samples/sec Loss 14.7393 LearningRate 0.1838 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:06:52,371-Speed 5964.55 samples/sec Loss 14.6987 LearningRate 0.1840 Epoch: 0 Global Step: 9540 Fp16 Grad Scale: 
65536 Required: 38 hours Training: 2022-01-07 22:06:59,231-Speed 5972.35 samples/sec Loss 14.8191 LearningRate 0.1842 Epoch: 0 Global Step: 9550 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:07:06,118-Speed 5949.69 samples/sec Loss 14.7644 LearningRate 0.1844 Epoch: 0 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:07:12,970-Speed 5979.39 samples/sec Loss 14.7596 LearningRate 0.1846 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:07:19,842-Speed 5960.79 samples/sec Loss 14.6877 LearningRate 0.1848 Epoch: 0 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:07:26,735-Speed 5944.03 samples/sec Loss 14.6889 LearningRate 0.1850 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:07:33,607-Speed 5961.82 samples/sec Loss 14.7247 LearningRate 0.1852 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:07:40,473-Speed 5966.84 samples/sec Loss 14.6631 LearningRate 0.1854 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:07:47,337-Speed 5968.04 samples/sec Loss 14.6609 LearningRate 0.1856 Epoch: 0 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:07:54,211-Speed 5960.12 samples/sec Loss 14.7423 LearningRate 0.1857 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:08:01,080-Speed 5964.56 samples/sec Loss 14.6019 LearningRate 0.1859 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:08:07,942-Speed 5969.92 samples/sec Loss 14.7086 LearningRate 0.1861 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:08:14,808-Speed 5966.55 samples/sec Loss 14.7254 LearningRate 0.1863 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 262144 Required: 38 hours Training: 
2022-01-07 22:08:21,671-Speed 5970.03 samples/sec Loss 14.6023 LearningRate 0.1865 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:08:28,576-Speed 5932.58 samples/sec Loss 14.6784 LearningRate 0.1867 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:08:35,432-Speed 5974.64 samples/sec Loss 14.6009 LearningRate 0.1869 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:08:42,294-Speed 5970.17 samples/sec Loss 14.6094 LearningRate 0.1871 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:08:49,151-Speed 5974.48 samples/sec Loss 14.6191 LearningRate 0.1873 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:08:56,052-Speed 5937.18 samples/sec Loss 14.5669 LearningRate 0.1875 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:02,974-Speed 5917.40 samples/sec Loss 14.4762 LearningRate 0.1877 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:09,859-Speed 5950.68 samples/sec Loss 14.5649 LearningRate 0.1879 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:16,707-Speed 5982.81 samples/sec Loss 14.6165 LearningRate 0.1881 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:23,556-Speed 5981.32 samples/sec Loss 14.5682 LearningRate 0.1883 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:30,404-Speed 5982.34 samples/sec Loss 14.5815 LearningRate 0.1884 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:37,260-Speed 5974.96 samples/sec Loss 14.5272 LearningRate 0.1886 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:44,112-Speed 5978.71 
samples/sec Loss 14.4669 LearningRate 0.1888 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:51,012-Speed 5938.28 samples/sec Loss 14.5058 LearningRate 0.1890 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:09:57,865-Speed 5978.57 samples/sec Loss 14.5419 LearningRate 0.1892 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:10:04,699-Speed 5994.10 samples/sec Loss 14.5219 LearningRate 0.1894 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:10:11,560-Speed 5971.40 samples/sec Loss 14.5208 LearningRate 0.1896 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:10:18,425-Speed 5968.15 samples/sec Loss 14.4499 LearningRate 0.1898 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:10:25,277-Speed 5978.81 samples/sec Loss 14.4971 LearningRate 0.1900 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:10:32,121-Speed 5986.56 samples/sec Loss 14.4689 LearningRate 0.1902 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:10:38,968-Speed 5983.50 samples/sec Loss 14.3801 LearningRate 0.1904 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:10:45,821-Speed 5978.25 samples/sec Loss 14.6264 LearningRate 0.1906 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:10:52,679-Speed 5973.39 samples/sec Loss 14.4772 LearningRate 0.1908 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:10:59,531-Speed 5979.28 samples/sec Loss 14.4862 LearningRate 0.1910 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:11:06,380-Speed 5981.58 samples/sec Loss 14.4451 LearningRate 0.1911 Epoch: 
0 Global Step: 9910 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-01-07 22:11:13,240-Speed 5977.03 samples/sec Loss 14.3839 LearningRate 0.1913 Epoch: 0 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-07 22:11:20,110-Speed 5963.20 samples/sec Loss 14.4148 LearningRate 0.1915 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-07 22:11:26,966-Speed 5975.77 samples/sec Loss 14.2782 LearningRate 0.1917 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-07 22:11:33,827-Speed 5971.22 samples/sec Loss 14.4280 LearningRate 0.1919 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-07 22:11:40,674-Speed 5982.52 samples/sec Loss 14.4327 LearningRate 0.1921 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-07 22:11:47,548-Speed 5960.80 samples/sec Loss 14.3736 LearningRate 0.1923 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-07 22:11:54,408-Speed 5974.15 samples/sec Loss 14.3185 LearningRate 0.1925 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-01-07 22:12:01,279-Speed 5962.80 samples/sec Loss 14.3071 LearningRate 0.1927 Epoch: 0 Global Step: 9990 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-01-07 22:12:08,154-Speed 5958.91 samples/sec Loss 14.3788 LearningRate 0.1929 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-01-07 22:12:35,258-[lfw][10000]XNorm: 23.939884
Training: 2022-01-07 22:12:35,259-[lfw][10000]Accuracy-Flip: 0.99600+-0.00238
Training: 2022-01-07 22:12:35,259-[lfw][10000]Accuracy-Highest: 0.99600
Training: 2022-01-07 22:13:06,053-[cfp_fp][10000]XNorm: 21.784641
Training: 2022-01-07 22:13:06,054-[cfp_fp][10000]Accuracy-Flip: 0.96643+-0.01007
Training: 2022-01-07 22:13:06,055-[cfp_fp][10000]Accuracy-Highest: 0.96643
Training: 2022-01-07 22:13:32,717-[agedb_30][10000]XNorm: 23.338620
Training: 2022-01-07 22:13:32,718-[agedb_30][10000]Accuracy-Flip: 0.94517+-0.01275
Training: 2022-01-07 22:13:32,719-[agedb_30][10000]Accuracy-Highest: 0.94517
Training: 2022-01-07 22:13:39,562-Speed 448.12 samples/sec Loss 14.3137 LearningRate 0.1931 Epoch: 0 Global Step: 10010 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-01-07 22:13:46,413-Speed 5979.83 samples/sec Loss 14.2814 LearningRate 0.1933 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-01-07 22:13:53,246-Speed 5995.44 samples/sec Loss 14.3279 LearningRate 0.1935 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-01-07 22:14:00,098-Speed 5979.38 samples/sec Loss 14.2759 LearningRate 0.1937 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-01-07 22:14:06,961-Speed 5969.44 samples/sec Loss 14.3463 LearningRate 0.1938 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-01-07 22:14:13,807-Speed 5983.98 samples/sec Loss 14.2758 LearningRate 0.1940 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-01-07 22:14:20,674-Speed 5969.06 samples/sec Loss 14.3367 LearningRate 0.1942 Epoch: 0 Global Step: 10070 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-01-07 22:14:27,537-Speed 5971.23 samples/sec Loss 14.3381 LearningRate 0.1944 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-01-07 22:14:34,396-Speed 5975.83 samples/sec Loss 14.2366 LearningRate 0.1946 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 22:14:41,245-Speed 5980.45 samples/sec Loss 14.3585 LearningRate 0.1948 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 22:14:48,120-Speed 5959.91 samples/sec Loss 14.2282 LearningRate 0.1950 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-01-07 22:14:54,996-Speed 5957.39 samples/sec Loss 14.2740 LearningRate 0.1952 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 22:15:01,879-Speed 5955.00 samples/sec Loss 14.2033 LearningRate 0.1954 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 22:15:08,723-Speed 5985.40 samples/sec Loss 14.3284 LearningRate 0.1956 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 22:15:15,580-Speed 5975.23 samples/sec Loss 14.3477 LearningRate 0.1958 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 22:15:22,441-Speed 5970.54 samples/sec Loss 14.2149 LearningRate 0.1960 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 22:15:29,319-Speed 5957.12 samples/sec Loss 14.2125 LearningRate 0.1962 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 22:15:36,166-Speed 5984.80 samples/sec Loss 14.2637 LearningRate 0.1964 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-07 22:15:43,079-Speed 5925.52 samples/sec Loss 14.1972 LearningRate 0.1965 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:15:50,332-Speed 5649.18 samples/sec Loss 14.1581 LearningRate 0.1967 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:15:57,945-Speed 5381.06 samples/sec Loss 14.2759 LearningRate 0.1969 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:16:04,809-Speed 5968.26 samples/sec Loss 14.1730 LearningRate 0.1971 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:16:11,648-Speed 5991.02 samples/sec Loss 14.2360 LearningRate 0.1973 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 
22:16:18,494-Speed 5984.15 samples/sec Loss 14.2386 LearningRate 0.1975 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:16:25,340-Speed 5984.11 samples/sec Loss 14.1822 LearningRate 0.1977 Epoch: 0 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:16:32,189-Speed 5981.18 samples/sec Loss 14.1233 LearningRate 0.1979 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:16:39,027-Speed 5991.48 samples/sec Loss 14.1420 LearningRate 0.1981 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:16:45,892-Speed 5966.98 samples/sec Loss 14.1991 LearningRate 0.1983 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:16:52,750-Speed 5973.97 samples/sec Loss 14.1614 LearningRate 0.1985 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:16:59,599-Speed 5982.00 samples/sec Loss 14.1625 LearningRate 0.1987 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:17:06,462-Speed 5968.66 samples/sec Loss 14.2343 LearningRate 0.1989 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:17:13,346-Speed 5951.54 samples/sec Loss 14.1250 LearningRate 0.1991 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:17:20,232-Speed 5950.19 samples/sec Loss 14.1839 LearningRate 0.1992 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:17:27,104-Speed 5961.20 samples/sec Loss 14.2074 LearningRate 0.1994 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:17:33,951-Speed 5983.55 samples/sec Loss 14.1740 LearningRate 0.1996 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:17:40,828-Speed 5957.19 
samples/sec Loss 14.1352 LearningRate 0.1998 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:17:47,689-Speed 5971.16 samples/sec Loss 14.1692 LearningRate 0.2000 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:18:13,103-Speed 1611.83 samples/sec Loss 14.1098 LearningRate 0.2002 Epoch: 1 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:18:19,936-Speed 5995.96 samples/sec Loss 14.1282 LearningRate 0.2004 Epoch: 1 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:18:26,769-Speed 5995.23 samples/sec Loss 14.0333 LearningRate 0.2006 Epoch: 1 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:18:33,613-Speed 5985.93 samples/sec Loss 14.0302 LearningRate 0.2008 Epoch: 1 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:18:40,454-Speed 5988.31 samples/sec Loss 14.1824 LearningRate 0.2010 Epoch: 1 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:18:47,317-Speed 5970.12 samples/sec Loss 14.0695 LearningRate 0.2012 Epoch: 1 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:18:54,191-Speed 5960.42 samples/sec Loss 14.0892 LearningRate 0.2014 Epoch: 1 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:19:01,063-Speed 5961.11 samples/sec Loss 14.0349 LearningRate 0.2016 Epoch: 1 Global Step: 10450 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:19:07,983-Speed 5920.33 samples/sec Loss 14.0302 LearningRate 0.2018 Epoch: 1 Global Step: 10460 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:19:14,907-Speed 5917.19 samples/sec Loss 13.9972 LearningRate 0.2019 Epoch: 1 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:19:21,816-Speed 5930.00 samples/sec Loss 14.0190 
LearningRate 0.2021 Epoch: 1 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:19:28,710-Speed 5942.67 samples/sec Loss 13.9766 LearningRate 0.2023 Epoch: 1 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:19:35,592-Speed 5953.25 samples/sec Loss 13.9245 LearningRate 0.2025 Epoch: 1 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:19:42,472-Speed 5953.85 samples/sec Loss 14.0812 LearningRate 0.2027 Epoch: 1 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:19:49,340-Speed 5967.27 samples/sec Loss 14.0322 LearningRate 0.2029 Epoch: 1 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:19:56,199-Speed 5972.80 samples/sec Loss 13.9873 LearningRate 0.2031 Epoch: 1 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:20:03,048-Speed 5981.00 samples/sec Loss 13.9751 LearningRate 0.2033 Epoch: 1 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:20:09,899-Speed 5980.57 samples/sec Loss 14.0550 LearningRate 0.2035 Epoch: 1 Global Step: 10550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:20:16,770-Speed 5961.62 samples/sec Loss 14.0107 LearningRate 0.2037 Epoch: 1 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:20:23,640-Speed 5963.87 samples/sec Loss 14.0037 LearningRate 0.2039 Epoch: 1 Global Step: 10570 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:20:30,510-Speed 5963.50 samples/sec Loss 14.0178 LearningRate 0.2041 Epoch: 1 Global Step: 10580 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:20:37,376-Speed 5967.09 samples/sec Loss 13.9391 LearningRate 0.2043 Epoch: 1 Global Step: 10590 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:20:44,241-Speed 5967.57 samples/sec Loss 13.9715 LearningRate 0.2045 Epoch: 1 
Global Step: 10600 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:20:51,099-Speed 5973.60 samples/sec Loss 13.9977 LearningRate 0.2046 Epoch: 1 Global Step: 10610 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:20:57,968-Speed 5964.44 samples/sec Loss 14.0022 LearningRate 0.2048 Epoch: 1 Global Step: 10620 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:21:04,834-Speed 5966.62 samples/sec Loss 13.8824 LearningRate 0.2050 Epoch: 1 Global Step: 10630 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:21:11,684-Speed 5980.75 samples/sec Loss 13.9032 LearningRate 0.2052 Epoch: 1 Global Step: 10640 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:21:18,555-Speed 5962.36 samples/sec Loss 13.8106 LearningRate 0.2054 Epoch: 1 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:21:25,419-Speed 5968.56 samples/sec Loss 13.9920 LearningRate 0.2056 Epoch: 1 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:21:32,292-Speed 5960.82 samples/sec Loss 13.9742 LearningRate 0.2058 Epoch: 1 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:21:39,161-Speed 5963.47 samples/sec Loss 14.0033 LearningRate 0.2060 Epoch: 1 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:21:46,025-Speed 5971.03 samples/sec Loss 13.9077 LearningRate 0.2062 Epoch: 1 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:21:52,893-Speed 5965.37 samples/sec Loss 13.9826 LearningRate 0.2064 Epoch: 1 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:21:59,748-Speed 5976.12 samples/sec Loss 13.9030 LearningRate 0.2066 Epoch: 1 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:22:06,611-Speed 5969.69 samples/sec Loss 13.9293 LearningRate 0.2068 Epoch: 1 Global Step: 10720 Fp16 Grad 
Scale: 131072 Required: 39 hours Training: 2022-01-07 22:22:13,463-Speed 5978.29 samples/sec Loss 13.9585 LearningRate 0.2070 Epoch: 1 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:22:20,351-Speed 5947.61 samples/sec Loss 13.8371 LearningRate 0.2072 Epoch: 1 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:22:27,215-Speed 5968.53 samples/sec Loss 13.8993 LearningRate 0.2073 Epoch: 1 Global Step: 10750 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:22:34,077-Speed 5970.22 samples/sec Loss 13.9453 LearningRate 0.2075 Epoch: 1 Global Step: 10760 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-07 22:22:40,953-Speed 5958.02 samples/sec Loss 13.8497 LearningRate 0.2077 Epoch: 1 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:22:47,817-Speed 5967.72 samples/sec Loss 13.8728 LearningRate 0.2079 Epoch: 1 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:22:54,670-Speed 5978.14 samples/sec Loss 13.9127 LearningRate 0.2081 Epoch: 1 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:23:01,526-Speed 5975.35 samples/sec Loss 13.8699 LearningRate 0.2083 Epoch: 1 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:23:08,392-Speed 5966.19 samples/sec Loss 13.9828 LearningRate 0.2085 Epoch: 1 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-07 22:23:15,249-Speed 5976.57 samples/sec Loss 13.8497 LearningRate 0.2087 Epoch: 1 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:23:22,107-Speed 5973.53 samples/sec Loss 13.8933 LearningRate 0.2089 Epoch: 1 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:23:28,970-Speed 5969.43 samples/sec Loss 13.8995 LearningRate 0.2091 Epoch: 1 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 38 
hours Training: 2022-01-07 22:23:35,833-Speed 5969.54 samples/sec Loss 13.8354 LearningRate 0.2093 Epoch: 1 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:23:42,688-Speed 5976.34 samples/sec Loss 13.9544 LearningRate 0.2095 Epoch: 1 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:23:49,548-Speed 5971.00 samples/sec Loss 13.8224 LearningRate 0.2097 Epoch: 1 Global Step: 10870 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:23:56,425-Speed 5956.78 samples/sec Loss 13.8277 LearningRate 0.2099 Epoch: 1 Global Step: 10880 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:24:03,331-Speed 5932.72 samples/sec Loss 13.8511 LearningRate 0.2100 Epoch: 1 Global Step: 10890 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:24:10,288-Speed 5888.49 samples/sec Loss 13.8512 LearningRate 0.2102 Epoch: 1 Global Step: 10900 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:24:17,139-Speed 5980.03 samples/sec Loss 13.8121 LearningRate 0.2104 Epoch: 1 Global Step: 10910 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:24:24,029-Speed 5946.24 samples/sec Loss 13.7923 LearningRate 0.2106 Epoch: 1 Global Step: 10920 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:24:30,902-Speed 5960.13 samples/sec Loss 13.7511 LearningRate 0.2108 Epoch: 1 Global Step: 10930 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:24:37,788-Speed 5949.03 samples/sec Loss 13.7990 LearningRate 0.2110 Epoch: 1 Global Step: 10940 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:24:44,658-Speed 5963.98 samples/sec Loss 13.7352 LearningRate 0.2112 Epoch: 1 Global Step: 10950 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:24:51,519-Speed 5971.69 samples/sec Loss 13.7823 LearningRate 0.2114 Epoch: 1 Global Step: 10960 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 
22:24:58,422-Speed 5934.40 samples/sec Loss 13.7727 LearningRate 0.2116 Epoch: 1 Global Step: 10970 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:25:05,293-Speed 5962.18 samples/sec Loss 13.8628 LearningRate 0.2118 Epoch: 1 Global Step: 10980 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:25:12,166-Speed 5960.59 samples/sec Loss 13.8466 LearningRate 0.2120 Epoch: 1 Global Step: 10990 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:25:19,018-Speed 5980.37 samples/sec Loss 13.8052 LearningRate 0.2122 Epoch: 1 Global Step: 11000 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:25:25,878-Speed 5971.92 samples/sec Loss 13.7807 LearningRate 0.2124 Epoch: 1 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:25:32,757-Speed 5955.87 samples/sec Loss 13.8322 LearningRate 0.2126 Epoch: 1 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:25:39,625-Speed 5964.38 samples/sec Loss 13.7183 LearningRate 0.2127 Epoch: 1 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:25:46,490-Speed 5967.75 samples/sec Loss 13.7025 LearningRate 0.2129 Epoch: 1 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:25:53,364-Speed 5961.81 samples/sec Loss 13.7508 LearningRate 0.2131 Epoch: 1 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:26:00,223-Speed 5973.37 samples/sec Loss 13.8307 LearningRate 0.2133 Epoch: 1 Global Step: 11060 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:26:07,096-Speed 5960.46 samples/sec Loss 13.7563 LearningRate 0.2135 Epoch: 1 Global Step: 11070 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:26:13,962-Speed 5966.99 samples/sec Loss 13.7717 LearningRate 0.2137 Epoch: 1 Global Step: 11080 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:26:20,833-Speed 5962.38 
samples/sec Loss 13.7175 LearningRate 0.2139 Epoch: 1 Global Step: 11090 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:26:27,706-Speed 5960.15 samples/sec Loss 13.7015 LearningRate 0.2141 Epoch: 1 Global Step: 11100 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:26:34,564-Speed 5973.76 samples/sec Loss 13.7830 LearningRate 0.2143 Epoch: 1 Global Step: 11110 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:26:41,442-Speed 5955.80 samples/sec Loss 13.6656 LearningRate 0.2145 Epoch: 1 Global Step: 11120 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:26:48,308-Speed 5967.57 samples/sec Loss 13.7459 LearningRate 0.2147 Epoch: 1 Global Step: 11130 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:26:55,167-Speed 5972.09 samples/sec Loss 13.7614 LearningRate 0.2149 Epoch: 1 Global Step: 11140 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:02,030-Speed 5969.55 samples/sec Loss 13.7711 LearningRate 0.2151 Epoch: 1 Global Step: 11150 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:08,916-Speed 5952.13 samples/sec Loss 13.6937 LearningRate 0.2153 Epoch: 1 Global Step: 11160 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:15,783-Speed 5965.31 samples/sec Loss 13.7464 LearningRate 0.2154 Epoch: 1 Global Step: 11170 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:22,639-Speed 5975.79 samples/sec Loss 13.7480 LearningRate 0.2156 Epoch: 1 Global Step: 11180 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:29,524-Speed 5950.25 samples/sec Loss 13.7492 LearningRate 0.2158 Epoch: 1 Global Step: 11190 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:36,377-Speed 5978.14 samples/sec Loss 13.6757 LearningRate 0.2160 Epoch: 1 Global Step: 11200 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:43,252-Speed 5958.29 samples/sec Loss 13.7197 
LearningRate 0.2162 Epoch: 1 Global Step: 11210 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:50,120-Speed 5964.44 samples/sec Loss 13.7666 LearningRate 0.2164 Epoch: 1 Global Step: 11220 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:27:56,993-Speed 5963.02 samples/sec Loss 13.6873 LearningRate 0.2166 Epoch: 1 Global Step: 11230 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:28:03,875-Speed 5954.47 samples/sec Loss 13.7739 LearningRate 0.2168 Epoch: 1 Global Step: 11240 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:28:10,729-Speed 5976.34 samples/sec Loss 13.7123 LearningRate 0.2170 Epoch: 1 Global Step: 11250 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:28:17,606-Speed 5957.70 samples/sec Loss 13.7345 LearningRate 0.2172 Epoch: 1 Global Step: 11260 Fp16 Grad Scale: 524288 Required: 38 hours Training: 2022-01-07 22:28:24,456-Speed 5980.81 samples/sec Loss 13.7241 LearningRate 0.2174 Epoch: 1 Global Step: 11270 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:28:31,338-Speed 5952.07 samples/sec Loss 13.6636 LearningRate 0.2176 Epoch: 1 Global Step: 11280 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:28:38,214-Speed 5958.97 samples/sec Loss 13.6032 LearningRate 0.2178 Epoch: 1 Global Step: 11290 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:28:45,095-Speed 5953.01 samples/sec Loss 13.7058 LearningRate 0.2180 Epoch: 1 Global Step: 11300 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:28:51,940-Speed 5984.94 samples/sec Loss 13.6561 LearningRate 0.2182 Epoch: 1 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:28:58,805-Speed 5968.09 samples/sec Loss 13.6144 LearningRate 0.2183 Epoch: 1 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:29:05,679-Speed 5963.07 samples/sec Loss 13.6600 LearningRate 0.2185 Epoch: 1 
Global Step: 11330 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:29:12,558-Speed 5955.44 samples/sec Loss 13.6718 LearningRate 0.2187 Epoch: 1 Global Step: 11340 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:29:19,423-Speed 5967.43 samples/sec Loss 13.6897 LearningRate 0.2189 Epoch: 1 Global Step: 11350 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:29:26,284-Speed 5971.75 samples/sec Loss 13.6609 LearningRate 0.2191 Epoch: 1 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:29:33,138-Speed 5976.50 samples/sec Loss 13.5745 LearningRate 0.2193 Epoch: 1 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:29:40,008-Speed 5963.34 samples/sec Loss 13.6890 LearningRate 0.2195 Epoch: 1 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:29:46,882-Speed 5959.90 samples/sec Loss 13.6210 LearningRate 0.2197 Epoch: 1 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:29:53,730-Speed 5981.62 samples/sec Loss 13.5843 LearningRate 0.2199 Epoch: 1 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:30:00,584-Speed 5977.12 samples/sec Loss 13.6330 LearningRate 0.2201 Epoch: 1 Global Step: 11410 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:30:07,448-Speed 5968.12 samples/sec Loss 13.6152 LearningRate 0.2203 Epoch: 1 Global Step: 11420 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:30:14,312-Speed 5969.20 samples/sec Loss 13.6855 LearningRate 0.2205 Epoch: 1 Global Step: 11430 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:30:21,189-Speed 5957.01 samples/sec Loss 13.6872 LearningRate 0.2207 Epoch: 1 Global Step: 11440 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:30:28,061-Speed 5961.52 samples/sec Loss 13.6349 LearningRate 0.2209 Epoch: 1 Global Step: 11450 Fp16 Grad 
Scale: 262144 Required: 38 hours Training: 2022-01-07 22:30:34,936-Speed 5959.30 samples/sec Loss 13.6502 LearningRate 0.2210 Epoch: 1 Global Step: 11460 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:30:41,779-Speed 5987.51 samples/sec Loss 13.5804 LearningRate 0.2212 Epoch: 1 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:30:48,633-Speed 5976.37 samples/sec Loss 13.5876 LearningRate 0.2214 Epoch: 1 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:30:55,499-Speed 5967.24 samples/sec Loss 13.7085 LearningRate 0.2216 Epoch: 1 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:31:02,348-Speed 5983.46 samples/sec Loss 13.6242 LearningRate 0.2218 Epoch: 1 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:31:09,206-Speed 5973.30 samples/sec Loss 13.5619 LearningRate 0.2220 Epoch: 1 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:31:16,060-Speed 5976.67 samples/sec Loss 13.6250 LearningRate 0.2222 Epoch: 1 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:31:22,926-Speed 5968.93 samples/sec Loss 13.6700 LearningRate 0.2224 Epoch: 1 Global Step: 11530 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:31:29,814-Speed 5947.58 samples/sec Loss 13.6171 LearningRate 0.2226 Epoch: 1 Global Step: 11540 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:31:36,679-Speed 5981.81 samples/sec Loss 13.5750 LearningRate 0.2228 Epoch: 1 Global Step: 11550 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:31:43,550-Speed 5963.28 samples/sec Loss 13.5104 LearningRate 0.2230 Epoch: 1 Global Step: 11560 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:31:50,419-Speed 5964.11 samples/sec Loss 13.7035 LearningRate 0.2232 Epoch: 1 Global Step: 11570 Fp16 Grad Scale: 262144 Required: 38 
hours Training: 2022-01-07 22:31:57,293-Speed 5960.50 samples/sec Loss 13.6351 LearningRate 0.2234 Epoch: 1 Global Step: 11580 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:32:04,168-Speed 5963.96 samples/sec Loss 13.5961 LearningRate 0.2236 Epoch: 1 Global Step: 11590 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:32:11,129-Speed 5885.91 samples/sec Loss 13.6058 LearningRate 0.2237 Epoch: 1 Global Step: 11600 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:32:18,001-Speed 5961.66 samples/sec Loss 13.5534 LearningRate 0.2239 Epoch: 1 Global Step: 11610 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:32:24,854-Speed 5978.06 samples/sec Loss 13.5400 LearningRate 0.2241 Epoch: 1 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:32:31,731-Speed 5957.53 samples/sec Loss 13.5385 LearningRate 0.2243 Epoch: 1 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:32:38,605-Speed 5959.84 samples/sec Loss 13.4874 LearningRate 0.2245 Epoch: 1 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:32:45,466-Speed 5973.12 samples/sec Loss 13.5296 LearningRate 0.2247 Epoch: 1 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:32:52,335-Speed 5963.84 samples/sec Loss 13.6287 LearningRate 0.2249 Epoch: 1 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:32:59,185-Speed 5980.59 samples/sec Loss 13.5643 LearningRate 0.2251 Epoch: 1 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:33:06,038-Speed 5977.60 samples/sec Loss 13.5808 LearningRate 0.2253 Epoch: 1 Global Step: 11680 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:33:13,036-Speed 5854.65 samples/sec Loss 13.5032 LearningRate 0.2255 Epoch: 1 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 
22:33:19,981-Speed 5899.26 samples/sec Loss 13.5169 LearningRate 0.2257 Epoch: 1 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:33:26,928-Speed 5896.85 samples/sec Loss 13.5284 LearningRate 0.2259 Epoch: 1 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:33:33,819-Speed 5945.87 samples/sec Loss 13.6204 LearningRate 0.2261 Epoch: 1 Global Step: 11720 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:33:40,679-Speed 5971.84 samples/sec Loss 13.6002 LearningRate 0.2263 Epoch: 1 Global Step: 11730 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:33:48,159-Speed 5478.21 samples/sec Loss 13.6105 LearningRate 0.2264 Epoch: 1 Global Step: 11740 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:33:54,997-Speed 5991.91 samples/sec Loss 13.5823 LearningRate 0.2266 Epoch: 1 Global Step: 11750 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:01,846-Speed 5981.34 samples/sec Loss 13.5175 LearningRate 0.2268 Epoch: 1 Global Step: 11760 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:08,798-Speed 5893.69 samples/sec Loss 13.5645 LearningRate 0.2270 Epoch: 1 Global Step: 11770 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:15,664-Speed 5966.18 samples/sec Loss 13.5154 LearningRate 0.2272 Epoch: 1 Global Step: 11780 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:22,556-Speed 5944.54 samples/sec Loss 13.5497 LearningRate 0.2274 Epoch: 1 Global Step: 11790 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:29,404-Speed 5981.89 samples/sec Loss 13.5809 LearningRate 0.2276 Epoch: 1 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:36,263-Speed 5972.61 samples/sec Loss 13.5916 LearningRate 0.2278 Epoch: 1 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:43,131-Speed 5965.27 
samples/sec Loss 13.4983 LearningRate 0.2280 Epoch: 1 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:50,003-Speed 5961.51 samples/sec Loss 13.5457 LearningRate 0.2282 Epoch: 1 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:34:56,861-Speed 5973.31 samples/sec Loss 13.5313 LearningRate 0.2284 Epoch: 1 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:35:03,760-Speed 5938.58 samples/sec Loss 13.5980 LearningRate 0.2286 Epoch: 1 Global Step: 11850 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:35:10,630-Speed 5963.31 samples/sec Loss 13.6129 LearningRate 0.2288 Epoch: 1 Global Step: 11860 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:35:17,505-Speed 5959.77 samples/sec Loss 13.5270 LearningRate 0.2290 Epoch: 1 Global Step: 11870 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:35:24,390-Speed 5950.37 samples/sec Loss 13.4769 LearningRate 0.2291 Epoch: 1 Global Step: 11880 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:35:31,250-Speed 5971.91 samples/sec Loss 13.5047 LearningRate 0.2293 Epoch: 1 Global Step: 11890 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:35:38,098-Speed 5982.49 samples/sec Loss 13.5638 LearningRate 0.2295 Epoch: 1 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:35:44,986-Speed 5947.88 samples/sec Loss 13.6013 LearningRate 0.2297 Epoch: 1 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:35:51,842-Speed 5976.81 samples/sec Loss 13.5010 LearningRate 0.2299 Epoch: 1 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:35:58,790-Speed 5895.58 samples/sec Loss 13.5016 LearningRate 0.2301 Epoch: 1 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:36:05,661-Speed 5963.04 samples/sec Loss 13.4794 
LearningRate 0.2303 Epoch: 1 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:36:12,543-Speed 5952.62 samples/sec Loss 13.4979 LearningRate 0.2305 Epoch: 1 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:36:19,403-Speed 5972.04 samples/sec Loss 13.5020 LearningRate 0.2307 Epoch: 1 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:36:26,252-Speed 5982.15 samples/sec Loss 13.4064 LearningRate 0.2309 Epoch: 1 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:36:33,104-Speed 5978.89 samples/sec Loss 13.4889 LearningRate 0.2311 Epoch: 1 Global Step: 11980 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:36:40,002-Speed 5939.28 samples/sec Loss 13.6197 LearningRate 0.2313 Epoch: 1 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:36:46,860-Speed 5974.12 samples/sec Loss 13.4884 LearningRate 0.2315 Epoch: 1 Global Step: 12000 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:36:53,703-Speed 5986.62 samples/sec Loss 13.5918 LearningRate 0.2317 Epoch: 1 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:00,563-Speed 5971.28 samples/sec Loss 13.5334 LearningRate 0.2318 Epoch: 1 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:07,414-Speed 5980.76 samples/sec Loss 13.5287 LearningRate 0.2320 Epoch: 1 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:14,279-Speed 5976.89 samples/sec Loss 13.4388 LearningRate 0.2322 Epoch: 1 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:21,174-Speed 5981.20 samples/sec Loss 13.4080 LearningRate 0.2324 Epoch: 1 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:28,025-Speed 5979.86 samples/sec Loss 13.5209 LearningRate 0.2326 Epoch: 1 
Global Step: 12060 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:34,880-Speed 5976.64 samples/sec Loss 13.3983 LearningRate 0.2328 Epoch: 1 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:41,731-Speed 5979.33 samples/sec Loss 13.3840 LearningRate 0.2330 Epoch: 1 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:48,663-Speed 5910.43 samples/sec Loss 13.3153 LearningRate 0.2332 Epoch: 1 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:37:55,529-Speed 5967.04 samples/sec Loss 13.4354 LearningRate 0.2334 Epoch: 1 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:38:02,389-Speed 5971.70 samples/sec Loss 13.5573 LearningRate 0.2336 Epoch: 1 Global Step: 12110 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:38:09,258-Speed 5963.76 samples/sec Loss 13.5579 LearningRate 0.2338 Epoch: 1 Global Step: 12120 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:38:16,113-Speed 5977.32 samples/sec Loss 13.4677 LearningRate 0.2340 Epoch: 1 Global Step: 12130 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:38:22,976-Speed 5969.89 samples/sec Loss 13.4408 LearningRate 0.2342 Epoch: 1 Global Step: 12140 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:38:29,833-Speed 5974.16 samples/sec Loss 13.4713 LearningRate 0.2344 Epoch: 1 Global Step: 12150 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:38:36,695-Speed 5970.32 samples/sec Loss 13.4582 LearningRate 0.2345 Epoch: 1 Global Step: 12160 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:38:44,289-Speed 5395.19 samples/sec Loss 13.4835 LearningRate 0.2347 Epoch: 1 Global Step: 12170 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:38:51,146-Speed 5974.77 samples/sec Loss 13.4743 LearningRate 0.2349 Epoch: 1 Global Step: 12180 Fp16 Grad 
Scale: 262144 Required: 38 hours Training: 2022-01-07 22:38:58,023-Speed 5957.12 samples/sec Loss 13.4596 LearningRate 0.2351 Epoch: 1 Global Step: 12190 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:04,893-Speed 5962.84 samples/sec Loss 13.4159 LearningRate 0.2353 Epoch: 1 Global Step: 12200 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:11,741-Speed 5982.10 samples/sec Loss 13.4644 LearningRate 0.2355 Epoch: 1 Global Step: 12210 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:18,702-Speed 5885.35 samples/sec Loss 13.4724 LearningRate 0.2357 Epoch: 1 Global Step: 12220 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:25,556-Speed 5977.63 samples/sec Loss 13.4566 LearningRate 0.2359 Epoch: 1 Global Step: 12230 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:32,416-Speed 5971.38 samples/sec Loss 13.4031 LearningRate 0.2361 Epoch: 1 Global Step: 12240 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:39,310-Speed 5942.73 samples/sec Loss 13.4809 LearningRate 0.2363 Epoch: 1 Global Step: 12250 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:46,174-Speed 5971.69 samples/sec Loss 13.4125 LearningRate 0.2365 Epoch: 1 Global Step: 12260 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:53,045-Speed 5964.48 samples/sec Loss 13.4082 LearningRate 0.2367 Epoch: 1 Global Step: 12270 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:39:59,897-Speed 5978.60 samples/sec Loss 13.3818 LearningRate 0.2369 Epoch: 1 Global Step: 12280 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:40:06,779-Speed 5953.38 samples/sec Loss 13.3761 LearningRate 0.2371 Epoch: 1 Global Step: 12290 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:40:13,645-Speed 5966.96 samples/sec Loss 13.4921 LearningRate 0.2372 Epoch: 1 Global Step: 12300 Fp16 Grad Scale: 262144 Required: 38 
hours Training: 2022-01-07 22:40:20,534-Speed 5947.11 samples/sec Loss 13.4065 LearningRate 0.2374 Epoch: 1 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:40:27,414-Speed 5954.63 samples/sec Loss 13.4754 LearningRate 0.2376 Epoch: 1 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:40:34,292-Speed 5955.89 samples/sec Loss 13.3684 LearningRate 0.2378 Epoch: 1 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:40:41,219-Speed 5914.70 samples/sec Loss 13.4700 LearningRate 0.2380 Epoch: 1 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:40:48,082-Speed 5969.56 samples/sec Loss 13.5197 LearningRate 0.2382 Epoch: 1 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:40:54,942-Speed 5972.24 samples/sec Loss 13.4776 LearningRate 0.2384 Epoch: 1 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:41:01,797-Speed 5975.76 samples/sec Loss 13.4367 LearningRate 0.2386 Epoch: 1 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:41:08,643-Speed 5984.10 samples/sec Loss 13.4597 LearningRate 0.2388 Epoch: 1 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:41:15,483-Speed 5989.75 samples/sec Loss 13.4099 LearningRate 0.2390 Epoch: 1 Global Step: 12390 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:41:22,344-Speed 5971.40 samples/sec Loss 13.4023 LearningRate 0.2392 Epoch: 1 Global Step: 12400 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:41:29,208-Speed 5968.43 samples/sec Loss 13.3859 LearningRate 0.2394 Epoch: 1 Global Step: 12410 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:41:36,073-Speed 5967.26 samples/sec Loss 13.4308 LearningRate 0.2396 Epoch: 1 Global Step: 12420 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 
22:41:42,926-Speed 5978.23 samples/sec Loss 13.4157 LearningRate 0.2398 Epoch: 1 Global Step: 12430 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:41:49,778-Speed 5979.00 samples/sec Loss 13.4859 LearningRate 0.2399 Epoch: 1 Global Step: 12440 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:41:56,672-Speed 5942.50 samples/sec Loss 13.4966 LearningRate 0.2401 Epoch: 1 Global Step: 12450 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:42:03,527-Speed 5976.23 samples/sec Loss 13.3919 LearningRate 0.2403 Epoch: 1 Global Step: 12460 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:42:10,411-Speed 5951.07 samples/sec Loss 13.5871 LearningRate 0.2405 Epoch: 1 Global Step: 12470 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:42:17,273-Speed 5970.62 samples/sec Loss 13.5010 LearningRate 0.2407 Epoch: 1 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:42:24,128-Speed 5975.31 samples/sec Loss 13.3839 LearningRate 0.2409 Epoch: 1 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:42:31,057-Speed 5915.63 samples/sec Loss 13.2821 LearningRate 0.2411 Epoch: 1 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:42:40,241-Speed 4461.10 samples/sec Loss 13.3748 LearningRate 0.2413 Epoch: 1 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:42:47,135-Speed 5942.57 samples/sec Loss 13.3135 LearningRate 0.2415 Epoch: 1 Global Step: 12520 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:42:53,981-Speed 5983.55 samples/sec Loss 13.4362 LearningRate 0.2417 Epoch: 1 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:43:00,830-Speed 5982.56 samples/sec Loss 13.3551 LearningRate 0.2419 Epoch: 1 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:43:07,703-Speed 5960.75 
samples/sec Loss 13.3940 LearningRate 0.2421 Epoch: 1 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:43:14,548-Speed 5984.80 samples/sec Loss 13.3821 LearningRate 0.2423 Epoch: 1 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:43:21,397-Speed 5981.28 samples/sec Loss 13.4084 LearningRate 0.2425 Epoch: 1 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:43:28,253-Speed 5975.92 samples/sec Loss 13.4886 LearningRate 0.2426 Epoch: 1 Global Step: 12580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:43:35,097-Speed 5985.71 samples/sec Loss 13.3540 LearningRate 0.2428 Epoch: 1 Global Step: 12590 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:43:41,958-Speed 5971.84 samples/sec Loss 13.4508 LearningRate 0.2430 Epoch: 1 Global Step: 12600 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:43:48,824-Speed 5967.34 samples/sec Loss 13.2850 LearningRate 0.2432 Epoch: 1 Global Step: 12610 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:43:55,678-Speed 5976.82 samples/sec Loss 13.3951 LearningRate 0.2434 Epoch: 1 Global Step: 12620 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:44:02,621-Speed 5900.97 samples/sec Loss 13.3448 LearningRate 0.2436 Epoch: 1 Global Step: 12630 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:44:09,478-Speed 5974.67 samples/sec Loss 13.3857 LearningRate 0.2438 Epoch: 1 Global Step: 12640 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:44:16,375-Speed 5940.30 samples/sec Loss 13.4477 LearningRate 0.2440 Epoch: 1 Global Step: 12650 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:44:23,238-Speed 5968.79 samples/sec Loss 13.3595 LearningRate 0.2442 Epoch: 1 Global Step: 12660 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:44:30,089-Speed 5980.02 samples/sec Loss 13.3928 LearningRate 
0.2444 Epoch: 1 Global Step: 12670 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:44:36,945-Speed 5976.07 samples/sec Loss 13.3819 LearningRate 0.2446 Epoch: 1 Global Step: 12680 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:44:43,797-Speed 5978.72 samples/sec Loss 13.4128 LearningRate 0.2448 Epoch: 1 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:44:50,658-Speed 5970.78 samples/sec Loss 13.4026 LearningRate 0.2450 Epoch: 1 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:44:57,518-Speed 5972.35 samples/sec Loss 13.4589 LearningRate 0.2452 Epoch: 1 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:04,370-Speed 5978.92 samples/sec Loss 13.4362 LearningRate 0.2453 Epoch: 1 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:11,263-Speed 5943.73 samples/sec Loss 13.4275 LearningRate 0.2455 Epoch: 1 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:18,142-Speed 5966.25 samples/sec Loss 13.4018 LearningRate 0.2457 Epoch: 1 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:25,005-Speed 5969.37 samples/sec Loss 13.3887 LearningRate 0.2459 Epoch: 1 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:31,853-Speed 5984.42 samples/sec Loss 13.3583 LearningRate 0.2461 Epoch: 1 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:38,697-Speed 5985.73 samples/sec Loss 13.4276 LearningRate 0.2463 Epoch: 1 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:45,549-Speed 5978.64 samples/sec Loss 13.4564 LearningRate 0.2465 Epoch: 1 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:52,396-Speed 5982.71 samples/sec Loss 13.3388 LearningRate 0.2467 Epoch: 1 Global Step: 
12790 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:45:59,270-Speed 5963.51 samples/sec Loss 13.3192 LearningRate 0.2469 Epoch: 1 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:46:06,123-Speed 5977.80 samples/sec Loss 13.3994 LearningRate 0.2471 Epoch: 1 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:46:12,993-Speed 5964.22 samples/sec Loss 13.3300 LearningRate 0.2473 Epoch: 1 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:46:19,844-Speed 5979.87 samples/sec Loss 13.4093 LearningRate 0.2475 Epoch: 1 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:46:26,697-Speed 5978.11 samples/sec Loss 13.4447 LearningRate 0.2477 Epoch: 1 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:46:33,586-Speed 5946.54 samples/sec Loss 13.3710 LearningRate 0.2479 Epoch: 1 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:46:40,428-Speed 5988.21 samples/sec Loss 13.3203 LearningRate 0.2480 Epoch: 1 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:46:47,283-Speed 5977.66 samples/sec Loss 13.4673 LearningRate 0.2482 Epoch: 1 Global Step: 12870 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:46:54,132-Speed 5981.52 samples/sec Loss 13.4230 LearningRate 0.2484 Epoch: 1 Global Step: 12880 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:47:00,983-Speed 5979.98 samples/sec Loss 13.3872 LearningRate 0.2486 Epoch: 1 Global Step: 12890 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:47:07,838-Speed 5976.80 samples/sec Loss 13.3727 LearningRate 0.2488 Epoch: 1 Global Step: 12900 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:47:14,695-Speed 5977.19 samples/sec Loss 13.3786 LearningRate 0.2490 Epoch: 1 Global Step: 12910 Fp16 Grad Scale: 262144 
Required: 38 hours Training: 2022-01-07 22:47:21,545-Speed 5980.22 samples/sec Loss 13.3260 LearningRate 0.2492 Epoch: 1 Global Step: 12920 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:47:28,390-Speed 5984.45 samples/sec Loss 13.4114 LearningRate 0.2494 Epoch: 1 Global Step: 12930 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:47:35,242-Speed 5979.37 samples/sec Loss 13.3893 LearningRate 0.2496 Epoch: 1 Global Step: 12940 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:47:42,090-Speed 5983.28 samples/sec Loss 13.3332 LearningRate 0.2498 Epoch: 1 Global Step: 12950 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:47:48,933-Speed 5988.42 samples/sec Loss 13.3690 LearningRate 0.2500 Epoch: 1 Global Step: 12960 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:47:55,793-Speed 5972.33 samples/sec Loss 13.3407 LearningRate 0.2502 Epoch: 1 Global Step: 12970 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:48:02,640-Speed 5983.01 samples/sec Loss 13.3842 LearningRate 0.2504 Epoch: 1 Global Step: 12980 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:48:09,495-Speed 5975.89 samples/sec Loss 13.3845 LearningRate 0.2506 Epoch: 1 Global Step: 12990 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:48:16,381-Speed 5949.62 samples/sec Loss 13.4218 LearningRate 0.2507 Epoch: 1 Global Step: 13000 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:48:23,241-Speed 5972.12 samples/sec Loss 13.3755 LearningRate 0.2509 Epoch: 1 Global Step: 13010 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:48:30,109-Speed 5964.75 samples/sec Loss 13.4353 LearningRate 0.2511 Epoch: 1 Global Step: 13020 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:48:36,978-Speed 5965.01 samples/sec Loss 13.3840 LearningRate 0.2513 Epoch: 1 Global Step: 13030 Fp16 Grad Scale: 262144 Required: 38 hours Training: 
2022-01-07 22:48:43,836-Speed 5973.93 samples/sec Loss 13.3657 LearningRate 0.2515 Epoch: 1 Global Step: 13040 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:48:50,735-Speed 5937.95 samples/sec Loss 13.4059 LearningRate 0.2517 Epoch: 1 Global Step: 13050 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:48:57,591-Speed 5976.07 samples/sec Loss 13.4297 LearningRate 0.2519 Epoch: 1 Global Step: 13060 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:04,454-Speed 5968.96 samples/sec Loss 13.4385 LearningRate 0.2521 Epoch: 1 Global Step: 13070 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:11,310-Speed 5977.29 samples/sec Loss 13.3193 LearningRate 0.2523 Epoch: 1 Global Step: 13080 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:18,172-Speed 5969.52 samples/sec Loss 13.3120 LearningRate 0.2525 Epoch: 1 Global Step: 13090 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:25,026-Speed 5978.31 samples/sec Loss 13.4094 LearningRate 0.2527 Epoch: 1 Global Step: 13100 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:31,872-Speed 5983.77 samples/sec Loss 13.3162 LearningRate 0.2529 Epoch: 1 Global Step: 13110 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:38,725-Speed 5977.23 samples/sec Loss 13.4074 LearningRate 0.2531 Epoch: 1 Global Step: 13120 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:45,624-Speed 5938.81 samples/sec Loss 13.2848 LearningRate 0.2533 Epoch: 1 Global Step: 13130 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:52,504-Speed 5955.16 samples/sec Loss 13.3886 LearningRate 0.2534 Epoch: 1 Global Step: 13140 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:49:59,463-Speed 5887.96 samples/sec Loss 13.3250 LearningRate 0.2536 Epoch: 1 Global Step: 13150 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:50:06,319-Speed 
5975.67 samples/sec Loss 13.3919 LearningRate 0.2538 Epoch: 1 Global Step: 13160 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:50:13,151-Speed 5996.27 samples/sec Loss 13.4338 LearningRate 0.2540 Epoch: 1 Global Step: 13170 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:50:20,031-Speed 5954.95 samples/sec Loss 13.4173 LearningRate 0.2542 Epoch: 1 Global Step: 13180 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:50:26,909-Speed 5956.13 samples/sec Loss 13.4546 LearningRate 0.2544 Epoch: 1 Global Step: 13190 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:50:33,769-Speed 5972.57 samples/sec Loss 13.4122 LearningRate 0.2546 Epoch: 1 Global Step: 13200 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:50:40,619-Speed 5980.19 samples/sec Loss 13.4007 LearningRate 0.2548 Epoch: 1 Global Step: 13210 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:50:47,467-Speed 5982.65 samples/sec Loss 13.3444 LearningRate 0.2550 Epoch: 1 Global Step: 13220 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:50:54,323-Speed 5975.62 samples/sec Loss 13.3490 LearningRate 0.2552 Epoch: 1 Global Step: 13230 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:51:01,166-Speed 5986.04 samples/sec Loss 13.3728 LearningRate 0.2554 Epoch: 1 Global Step: 13240 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:51:08,015-Speed 5981.84 samples/sec Loss 13.3331 LearningRate 0.2556 Epoch: 1 Global Step: 13250 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:51:14,893-Speed 5957.57 samples/sec Loss 13.3943 LearningRate 0.2558 Epoch: 1 Global Step: 13260 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 22:51:21,747-Speed 5977.01 samples/sec Loss 13.2752 LearningRate 0.2560 Epoch: 1 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:51:28,603-Speed 5976.15 samples/sec Loss 13.2742 
LearningRate 0.2561 Epoch: 1 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:51:35,460-Speed 5974.04 samples/sec Loss 13.3283 LearningRate 0.2563 Epoch: 1 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:51:42,309-Speed 5981.66 samples/sec Loss 13.2997 LearningRate 0.2565 Epoch: 1 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:51:49,151-Speed 5987.25 samples/sec Loss 13.4128 LearningRate 0.2567 Epoch: 1 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:51:56,010-Speed 5973.04 samples/sec Loss 13.2805 LearningRate 0.2569 Epoch: 1 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:52:02,865-Speed 5976.49 samples/sec Loss 13.3333 LearningRate 0.2571 Epoch: 1 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:52:09,721-Speed 5975.21 samples/sec Loss 13.4299 LearningRate 0.2573 Epoch: 1 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:52:16,581-Speed 5972.14 samples/sec Loss 13.3732 LearningRate 0.2575 Epoch: 1 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:52:23,430-Speed 5981.56 samples/sec Loss 13.3717 LearningRate 0.2577 Epoch: 1 Global Step: 13360 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:52:30,285-Speed 5976.44 samples/sec Loss 13.4000 LearningRate 0.2579 Epoch: 1 Global Step: 13370 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:52:37,149-Speed 5968.72 samples/sec Loss 13.3502 LearningRate 0.2581 Epoch: 1 Global Step: 13380 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:52:44,024-Speed 5961.92 samples/sec Loss 13.3221 LearningRate 0.2583 Epoch: 1 Global Step: 13390 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:52:50,879-Speed 5975.53 samples/sec Loss 13.3590 LearningRate 0.2585 Epoch: 1 
Global Step: 13400 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:52:57,736-Speed 5975.28 samples/sec Loss 13.3513 LearningRate 0.2587 Epoch: 1 Global Step: 13410 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:53:04,581-Speed 5985.10 samples/sec Loss 13.3000 LearningRate 0.2588 Epoch: 1 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:53:11,432-Speed 5979.22 samples/sec Loss 13.3401 LearningRate 0.2590 Epoch: 1 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:53:18,306-Speed 5961.32 samples/sec Loss 13.4202 LearningRate 0.2592 Epoch: 1 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:53:25,159-Speed 5978.05 samples/sec Loss 13.3750 LearningRate 0.2594 Epoch: 1 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:53:32,010-Speed 5979.46 samples/sec Loss 13.3453 LearningRate 0.2596 Epoch: 1 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:53:38,881-Speed 5962.76 samples/sec Loss 13.3172 LearningRate 0.2598 Epoch: 1 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:53:45,740-Speed 5972.15 samples/sec Loss 13.3339 LearningRate 0.2600 Epoch: 1 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:53:52,601-Speed 5971.21 samples/sec Loss 13.3951 LearningRate 0.2602 Epoch: 1 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:53:59,463-Speed 5970.62 samples/sec Loss 13.3484 LearningRate 0.2604 Epoch: 1 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:54:06,327-Speed 5968.48 samples/sec Loss 13.3612 LearningRate 0.2606 Epoch: 1 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:54:13,188-Speed 5971.19 samples/sec Loss 13.4038 LearningRate 0.2608 Epoch: 1 Global Step: 13520 Fp16 Grad 
Scale: 262144 Required: 38 hours Training: 2022-01-07 22:54:20,047-Speed 5972.76 samples/sec Loss 13.3616 LearningRate 0.2610 Epoch: 1 Global Step: 13530 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:54:26,917-Speed 5969.03 samples/sec Loss 13.3115 LearningRate 0.2612 Epoch: 1 Global Step: 13540 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:54:33,800-Speed 5951.65 samples/sec Loss 13.3324 LearningRate 0.2614 Epoch: 1 Global Step: 13550 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:54:40,639-Speed 5990.33 samples/sec Loss 13.3609 LearningRate 0.2615 Epoch: 1 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:54:47,502-Speed 5969.59 samples/sec Loss 13.3426 LearningRate 0.2617 Epoch: 1 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:54:54,360-Speed 5972.99 samples/sec Loss 13.4526 LearningRate 0.2619 Epoch: 1 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:55:01,217-Speed 5974.74 samples/sec Loss 13.3280 LearningRate 0.2621 Epoch: 1 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:55:08,131-Speed 5924.90 samples/sec Loss 13.3066 LearningRate 0.2623 Epoch: 1 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:55:14,982-Speed 5980.39 samples/sec Loss 13.2876 LearningRate 0.2625 Epoch: 1 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:55:21,857-Speed 5964.24 samples/sec Loss 13.3827 LearningRate 0.2627 Epoch: 1 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:55:28,720-Speed 5971.12 samples/sec Loss 13.3172 LearningRate 0.2629 Epoch: 1 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:55:35,585-Speed 5967.75 samples/sec Loss 13.2808 LearningRate 0.2631 Epoch: 1 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 38 
hours Training: 2022-01-07 22:55:42,535-Speed 5894.48 samples/sec Loss 13.3140 LearningRate 0.2633 Epoch: 1 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:55:49,391-Speed 5976.33 samples/sec Loss 13.3583 LearningRate 0.2635 Epoch: 1 Global Step: 13660 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:55:56,276-Speed 5949.08 samples/sec Loss 13.3435 LearningRate 0.2637 Epoch: 1 Global Step: 13670 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:56:03,155-Speed 5958.29 samples/sec Loss 13.2707 LearningRate 0.2639 Epoch: 1 Global Step: 13680 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:56:10,010-Speed 5976.05 samples/sec Loss 13.3229 LearningRate 0.2641 Epoch: 1 Global Step: 13690 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:56:16,856-Speed 5984.32 samples/sec Loss 13.3930 LearningRate 0.2642 Epoch: 1 Global Step: 13700 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:56:23,710-Speed 5977.13 samples/sec Loss 13.3522 LearningRate 0.2644 Epoch: 1 Global Step: 13710 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:56:30,570-Speed 5971.74 samples/sec Loss 13.3831 LearningRate 0.2646 Epoch: 1 Global Step: 13720 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:56:37,420-Speed 5981.00 samples/sec Loss 13.2492 LearningRate 0.2648 Epoch: 1 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:56:44,282-Speed 5970.47 samples/sec Loss 13.4487 LearningRate 0.2650 Epoch: 1 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:56:51,130-Speed 5982.35 samples/sec Loss 13.3583 LearningRate 0.2652 Epoch: 1 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:56:58,018-Speed 5947.17 samples/sec Loss 13.3249 LearningRate 0.2654 Epoch: 1 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 
22:57:04,888-Speed 5964.85 samples/sec Loss 13.3623 LearningRate 0.2656 Epoch: 1 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:57:11,774-Speed 5950.40 samples/sec Loss 13.3219 LearningRate 0.2658 Epoch: 1 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:57:18,633-Speed 5972.47 samples/sec Loss 13.3856 LearningRate 0.2660 Epoch: 1 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:57:25,498-Speed 5967.84 samples/sec Loss 13.3582 LearningRate 0.2662 Epoch: 1 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:57:32,350-Speed 5978.57 samples/sec Loss 13.3538 LearningRate 0.2664 Epoch: 1 Global Step: 13810 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:57:39,204-Speed 5977.68 samples/sec Loss 13.3269 LearningRate 0.2666 Epoch: 1 Global Step: 13820 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:57:46,074-Speed 5962.95 samples/sec Loss 13.3918 LearningRate 0.2668 Epoch: 1 Global Step: 13830 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:57:52,924-Speed 5979.96 samples/sec Loss 13.2970 LearningRate 0.2669 Epoch: 1 Global Step: 13840 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:57:59,812-Speed 5950.85 samples/sec Loss 13.3519 LearningRate 0.2671 Epoch: 1 Global Step: 13850 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:58:06,677-Speed 5967.41 samples/sec Loss 13.4387 LearningRate 0.2673 Epoch: 1 Global Step: 13860 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:58:13,545-Speed 5965.19 samples/sec Loss 13.3872 LearningRate 0.2675 Epoch: 1 Global Step: 13870 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:58:20,423-Speed 5956.67 samples/sec Loss 13.3093 LearningRate 0.2677 Epoch: 1 Global Step: 13880 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:58:27,267-Speed 5985.41 
samples/sec Loss 13.4046 LearningRate 0.2679 Epoch: 1 Global Step: 13890 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:58:34,129-Speed 5970.57 samples/sec Loss 13.3765 LearningRate 0.2681 Epoch: 1 Global Step: 13900 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:58:40,988-Speed 5973.44 samples/sec Loss 13.3732 LearningRate 0.2683 Epoch: 1 Global Step: 13910 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:58:47,845-Speed 5974.23 samples/sec Loss 13.2572 LearningRate 0.2685 Epoch: 1 Global Step: 13920 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:58:54,707-Speed 5970.40 samples/sec Loss 13.4115 LearningRate 0.2687 Epoch: 1 Global Step: 13930 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:59:01,571-Speed 5968.04 samples/sec Loss 13.2762 LearningRate 0.2689 Epoch: 1 Global Step: 13940 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 22:59:08,478-Speed 5931.91 samples/sec Loss 13.3598 LearningRate 0.2691 Epoch: 1 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:59:15,401-Speed 5919.65 samples/sec Loss 13.3261 LearningRate 0.2693 Epoch: 1 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:59:22,332-Speed 5910.08 samples/sec Loss 13.3511 LearningRate 0.2695 Epoch: 1 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:59:29,247-Speed 5925.25 samples/sec Loss 13.3789 LearningRate 0.2696 Epoch: 1 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:59:36,108-Speed 5970.95 samples/sec Loss 13.4042 LearningRate 0.2698 Epoch: 1 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:59:42,972-Speed 5968.56 samples/sec Loss 13.5184 LearningRate 0.2700 Epoch: 1 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:59:49,834-Speed 5969.72 samples/sec Loss 13.3888 
LearningRate 0.2702 Epoch: 1 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 22:59:56,728-Speed 5943.13 samples/sec Loss 13.3770 LearningRate 0.2704 Epoch: 1 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:00:03,678-Speed 5895.47 samples/sec Loss 13.3109 LearningRate 0.2706 Epoch: 1 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:00:10,624-Speed 5900.27 samples/sec Loss 13.4707 LearningRate 0.2708 Epoch: 1 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:00:17,528-Speed 5938.79 samples/sec Loss 13.3500 LearningRate 0.2710 Epoch: 1 Global Step: 14050 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:00:24,408-Speed 5955.28 samples/sec Loss 13.3226 LearningRate 0.2712 Epoch: 1 Global Step: 14060 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:00:31,297-Speed 5946.97 samples/sec Loss 13.3664 LearningRate 0.2714 Epoch: 1 Global Step: 14070 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:00:38,219-Speed 5918.91 samples/sec Loss 13.4132 LearningRate 0.2716 Epoch: 1 Global Step: 14080 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:00:45,147-Speed 5913.33 samples/sec Loss 13.3968 LearningRate 0.2718 Epoch: 1 Global Step: 14090 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:00:52,010-Speed 5969.48 samples/sec Loss 13.2517 LearningRate 0.2720 Epoch: 1 Global Step: 14100 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:00:58,863-Speed 5979.34 samples/sec Loss 13.3662 LearningRate 0.2722 Epoch: 1 Global Step: 14110 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:01:05,719-Speed 5978.00 samples/sec Loss 13.3320 LearningRate 0.2724 Epoch: 1 Global Step: 14120 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:01:12,566-Speed 5985.14 samples/sec Loss 13.3620 LearningRate 0.2725 Epoch: 1 
Global Step: 14130 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:01:19,416-Speed 5980.40 samples/sec Loss 13.4128 LearningRate 0.2727 Epoch: 1 Global Step: 14140 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:01:26,284-Speed 5967.00 samples/sec Loss 13.3255 LearningRate 0.2729 Epoch: 1 Global Step: 14150 Fp16 Grad Scale: 524288 Required: 38 hours Training: 2022-01-07 23:01:33,136-Speed 5979.39 samples/sec Loss 13.3650 LearningRate 0.2731 Epoch: 1 Global Step: 14160 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:01:40,007-Speed 5962.45 samples/sec Loss 13.3501 LearningRate 0.2733 Epoch: 1 Global Step: 14170 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:01:46,860-Speed 5978.11 samples/sec Loss 13.2552 LearningRate 0.2735 Epoch: 1 Global Step: 14180 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:01:53,719-Speed 5972.93 samples/sec Loss 13.3569 LearningRate 0.2737 Epoch: 1 Global Step: 14190 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:00,574-Speed 5977.67 samples/sec Loss 13.3420 LearningRate 0.2739 Epoch: 1 Global Step: 14200 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:07,447-Speed 5960.96 samples/sec Loss 13.3168 LearningRate 0.2741 Epoch: 1 Global Step: 14210 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:14,328-Speed 5954.29 samples/sec Loss 13.2723 LearningRate 0.2743 Epoch: 1 Global Step: 14220 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:21,188-Speed 5971.33 samples/sec Loss 13.2500 LearningRate 0.2745 Epoch: 1 Global Step: 14230 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:28,075-Speed 5948.88 samples/sec Loss 13.3454 LearningRate 0.2747 Epoch: 1 Global Step: 14240 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:34,927-Speed 5980.99 samples/sec Loss 13.3337 LearningRate 0.2749 Epoch: 1 Global Step: 14250 Fp16 Grad 
Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:41,824-Speed 5940.56 samples/sec Loss 13.4041 LearningRate 0.2751 Epoch: 1 Global Step: 14260 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:48,681-Speed 5973.95 samples/sec Loss 13.3859 LearningRate 0.2752 Epoch: 1 Global Step: 14270 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:02:55,550-Speed 5965.70 samples/sec Loss 13.2703 LearningRate 0.2754 Epoch: 1 Global Step: 14280 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:03:02,404-Speed 5976.98 samples/sec Loss 13.3658 LearningRate 0.2756 Epoch: 1 Global Step: 14290 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:03:09,252-Speed 5982.64 samples/sec Loss 13.3290 LearningRate 0.2758 Epoch: 1 Global Step: 14300 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:03:16,109-Speed 5973.53 samples/sec Loss 13.3825 LearningRate 0.2760 Epoch: 1 Global Step: 14310 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:03:22,975-Speed 5967.11 samples/sec Loss 13.3303 LearningRate 0.2762 Epoch: 1 Global Step: 14320 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:03:29,829-Speed 5977.51 samples/sec Loss 13.4136 LearningRate 0.2764 Epoch: 1 Global Step: 14330 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:03:36,684-Speed 5976.37 samples/sec Loss 13.3776 LearningRate 0.2766 Epoch: 1 Global Step: 14340 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:03:43,534-Speed 5980.95 samples/sec Loss 13.4049 LearningRate 0.2768 Epoch: 1 Global Step: 14350 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:03:50,394-Speed 5971.83 samples/sec Loss 13.3844 LearningRate 0.2770 Epoch: 1 Global Step: 14360 Fp16 Grad Scale: 524288 Required: 38 hours Training: 2022-01-07 23:03:57,264-Speed 5963.03 samples/sec Loss 13.3966 LearningRate 0.2772 Epoch: 1 Global Step: 14370 Fp16 Grad Scale: 262144 Required: 38 
hours Training: 2022-01-07 23:04:04,126-Speed 5970.21 samples/sec Loss 13.3063 LearningRate 0.2774 Epoch: 1 Global Step: 14380 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:04:10,987-Speed 5971.88 samples/sec Loss 13.2916 LearningRate 0.2776 Epoch: 1 Global Step: 14390 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:04:17,874-Speed 5948.83 samples/sec Loss 13.3528 LearningRate 0.2778 Epoch: 1 Global Step: 14400 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:04:24,724-Speed 5980.23 samples/sec Loss 13.3707 LearningRate 0.2779 Epoch: 1 Global Step: 14410 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:04:31,587-Speed 5969.25 samples/sec Loss 13.3708 LearningRate 0.2781 Epoch: 1 Global Step: 14420 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:04:38,438-Speed 5979.61 samples/sec Loss 13.3413 LearningRate 0.2783 Epoch: 1 Global Step: 14430 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:04:45,289-Speed 5980.49 samples/sec Loss 13.4444 LearningRate 0.2785 Epoch: 1 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:04:52,139-Speed 5981.53 samples/sec Loss 13.3996 LearningRate 0.2787 Epoch: 1 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:04:59,001-Speed 5970.35 samples/sec Loss 13.4518 LearningRate 0.2789 Epoch: 1 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:05:05,868-Speed 5966.08 samples/sec Loss 13.3741 LearningRate 0.2791 Epoch: 1 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:05:12,712-Speed 5985.21 samples/sec Loss 13.3311 LearningRate 0.2793 Epoch: 1 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:05:19,567-Speed 5976.82 samples/sec Loss 13.2830 LearningRate 0.2795 Epoch: 1 Global Step: 14490 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 
23:05:26,424-Speed 5975.02 samples/sec Loss 13.4134 LearningRate 0.2797 Epoch: 1 Global Step: 14500 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:05:33,271-Speed 5983.38 samples/sec Loss 13.4203 LearningRate 0.2799 Epoch: 1 Global Step: 14510 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:05:40,142-Speed 5962.42 samples/sec Loss 13.4172 LearningRate 0.2801 Epoch: 1 Global Step: 14520 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:05:47,014-Speed 5964.25 samples/sec Loss 13.4270 LearningRate 0.2803 Epoch: 1 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:05:53,859-Speed 5985.22 samples/sec Loss 13.3049 LearningRate 0.2805 Epoch: 1 Global Step: 14540 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:06:00,743-Speed 5950.34 samples/sec Loss 13.3042 LearningRate 0.2806 Epoch: 1 Global Step: 14550 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:06:07,602-Speed 5973.45 samples/sec Loss 13.2945 LearningRate 0.2808 Epoch: 1 Global Step: 14560 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:06:14,455-Speed 5978.40 samples/sec Loss 13.3187 LearningRate 0.2810 Epoch: 1 Global Step: 14570 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:06:21,305-Speed 5980.94 samples/sec Loss 13.4000 LearningRate 0.2812 Epoch: 1 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:06:28,163-Speed 5974.67 samples/sec Loss 13.3307 LearningRate 0.2814 Epoch: 1 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:06:35,021-Speed 5973.21 samples/sec Loss 13.2845 LearningRate 0.2816 Epoch: 1 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:06:41,898-Speed 5957.56 samples/sec Loss 13.3296 LearningRate 0.2818 Epoch: 1 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:06:48,757-Speed 5973.07 
samples/sec Loss 13.3222 LearningRate 0.2820 Epoch: 1 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:06:55,637-Speed 5955.27 samples/sec Loss 13.3750 LearningRate 0.2822 Epoch: 1 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:07:02,486-Speed 5981.38 samples/sec Loss 13.3350 LearningRate 0.2824 Epoch: 1 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:07:09,360-Speed 5959.96 samples/sec Loss 13.3443 LearningRate 0.2826 Epoch: 1 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:07:16,214-Speed 5977.38 samples/sec Loss 13.3923 LearningRate 0.2828 Epoch: 1 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:07:23,080-Speed 5967.25 samples/sec Loss 13.3356 LearningRate 0.2830 Epoch: 1 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:07:29,933-Speed 5978.03 samples/sec Loss 13.3880 LearningRate 0.2832 Epoch: 1 Global Step: 14680 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:07:36,795-Speed 5970.68 samples/sec Loss 13.4061 LearningRate 0.2833 Epoch: 1 Global Step: 14690 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:07:43,654-Speed 5973.10 samples/sec Loss 13.4369 LearningRate 0.2835 Epoch: 1 Global Step: 14700 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:07:50,561-Speed 5931.12 samples/sec Loss 13.3604 LearningRate 0.2837 Epoch: 1 Global Step: 14710 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:07:57,447-Speed 5953.43 samples/sec Loss 13.4217 LearningRate 0.2839 Epoch: 1 Global Step: 14720 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:08:04,313-Speed 5966.96 samples/sec Loss 13.3687 LearningRate 0.2841 Epoch: 1 Global Step: 14730 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:08:11,182-Speed 5964.29 samples/sec Loss 13.4091 
LearningRate 0.2843 Epoch: 1 Global Step: 14740 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:08:18,054-Speed 5961.96 samples/sec Loss 13.3638 LearningRate 0.2845 Epoch: 1 Global Step: 14750 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:08:24,915-Speed 5971.29 samples/sec Loss 13.3735 LearningRate 0.2847 Epoch: 1 Global Step: 14760 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:08:31,789-Speed 5959.53 samples/sec Loss 13.3975 LearningRate 0.2849 Epoch: 1 Global Step: 14770 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:08:38,639-Speed 5982.71 samples/sec Loss 13.4342 LearningRate 0.2851 Epoch: 1 Global Step: 14780 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:08:45,501-Speed 5969.96 samples/sec Loss 13.4340 LearningRate 0.2853 Epoch: 1 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:08:52,356-Speed 5977.72 samples/sec Loss 13.3240 LearningRate 0.2855 Epoch: 1 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:08:59,218-Speed 5970.49 samples/sec Loss 13.3655 LearningRate 0.2857 Epoch: 1 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:09:06,069-Speed 5979.70 samples/sec Loss 13.3573 LearningRate 0.2859 Epoch: 1 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:09:12,937-Speed 5965.23 samples/sec Loss 13.4520 LearningRate 0.2860 Epoch: 1 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:09:19,804-Speed 5966.25 samples/sec Loss 13.4964 LearningRate 0.2862 Epoch: 1 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:09:26,657-Speed 5979.21 samples/sec Loss 13.4579 LearningRate 0.2864 Epoch: 1 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:09:33,502-Speed 5985.06 samples/sec Loss 13.3500 LearningRate 0.2866 Epoch: 1 
Global Step: 14860 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:09:40,367-Speed 5967.27 samples/sec Loss 13.3725 LearningRate 0.2868 Epoch: 1 Global Step: 14870 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:09:47,228-Speed 5973.96 samples/sec Loss 13.3645 LearningRate 0.2870 Epoch: 1 Global Step: 14880 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:09:54,089-Speed 5971.72 samples/sec Loss 13.4489 LearningRate 0.2872 Epoch: 1 Global Step: 14890 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:10:00,947-Speed 5975.93 samples/sec Loss 13.3522 LearningRate 0.2874 Epoch: 1 Global Step: 14900 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:10:07,796-Speed 5981.50 samples/sec Loss 13.3079 LearningRate 0.2876 Epoch: 1 Global Step: 14910 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:10:14,667-Speed 5962.57 samples/sec Loss 13.4566 LearningRate 0.2878 Epoch: 1 Global Step: 14920 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:10:21,530-Speed 5969.77 samples/sec Loss 13.4482 LearningRate 0.2880 Epoch: 1 Global Step: 14930 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:10:28,380-Speed 5980.73 samples/sec Loss 13.3768 LearningRate 0.2882 Epoch: 1 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:10:35,285-Speed 5932.44 samples/sec Loss 13.4382 LearningRate 0.2884 Epoch: 1 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:10:42,141-Speed 5976.12 samples/sec Loss 13.3566 LearningRate 0.2886 Epoch: 1 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:10:48,991-Speed 5980.74 samples/sec Loss 13.4080 LearningRate 0.2887 Epoch: 1 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:10:55,844-Speed 5977.83 samples/sec Loss 13.4090 LearningRate 0.2889 Epoch: 1 Global Step: 14980 Fp16 Grad 
Scale: 131072 Required: 37 hours Training: 2022-01-07 23:11:02,689-Speed 5985.54 samples/sec Loss 13.3826 LearningRate 0.2891 Epoch: 1 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:11:09,554-Speed 5970.02 samples/sec Loss 13.4175 LearningRate 0.2893 Epoch: 1 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:11:36,147-[lfw][15000]XNorm: 21.573219 Training: 2022-01-07 23:11:36,148-[lfw][15000]Accuracy-Flip: 0.99550+-0.00317 Training: 2022-01-07 23:11:36,148-[lfw][15000]Accuracy-Highest: 0.99600 Training: 2022-01-07 23:12:06,961-[cfp_fp][15000]XNorm: 19.733546 Training: 2022-01-07 23:12:06,962-[cfp_fp][15000]Accuracy-Flip: 0.96500+-0.00806 Training: 2022-01-07 23:12:06,963-[cfp_fp][15000]Accuracy-Highest: 0.96643 Training: 2022-01-07 23:12:33,658-[agedb_30][15000]XNorm: 21.743631 Training: 2022-01-07 23:12:33,659-[agedb_30][15000]Accuracy-Flip: 0.94533+-0.01211 Training: 2022-01-07 23:12:33,659-[agedb_30][15000]Accuracy-Highest: 0.94533 Training: 2022-01-07 23:12:40,515-Speed 450.31 samples/sec Loss 13.4131 LearningRate 0.2895 Epoch: 1 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:12:47,347-Speed 5997.50 samples/sec Loss 13.4261 LearningRate 0.2897 Epoch: 1 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:12:54,183-Speed 5992.56 samples/sec Loss 13.4464 LearningRate 0.2899 Epoch: 1 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:13:01,037-Speed 5976.75 samples/sec Loss 13.4657 LearningRate 0.2901 Epoch: 1 Global Step: 15040 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:13:07,889-Speed 5979.19 samples/sec Loss 13.4950 LearningRate 0.2903 Epoch: 1 Global Step: 15050 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:13:14,762-Speed 5959.80 samples/sec Loss 13.4056 LearningRate 0.2905 Epoch: 1 Global Step: 15060 Fp16 Grad Scale: 262144 Required: 
38 hours Training: 2022-01-07 23:13:21,634-Speed 5961.86 samples/sec Loss 13.3735 LearningRate 0.2907 Epoch: 1 Global Step: 15070 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:13:28,505-Speed 5962.57 samples/sec Loss 13.3847 LearningRate 0.2909 Epoch: 1 Global Step: 15080 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:13:35,358-Speed 5977.74 samples/sec Loss 13.4357 LearningRate 0.2911 Epoch: 1 Global Step: 15090 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:13:42,224-Speed 5969.71 samples/sec Loss 13.3643 LearningRate 0.2913 Epoch: 1 Global Step: 15100 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:13:49,095-Speed 5962.51 samples/sec Loss 13.3815 LearningRate 0.2914 Epoch: 1 Global Step: 15110 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:13:55,978-Speed 5952.51 samples/sec Loss 13.5038 LearningRate 0.2916 Epoch: 1 Global Step: 15120 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:14:02,874-Speed 5940.68 samples/sec Loss 13.3523 LearningRate 0.2918 Epoch: 1 Global Step: 15130 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:14:09,713-Speed 5990.20 samples/sec Loss 13.3673 LearningRate 0.2920 Epoch: 1 Global Step: 15140 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:14:16,567-Speed 5977.31 samples/sec Loss 13.3770 LearningRate 0.2922 Epoch: 1 Global Step: 15150 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:14:23,467-Speed 5937.10 samples/sec Loss 13.3916 LearningRate 0.2924 Epoch: 1 Global Step: 15160 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:14:30,321-Speed 5978.08 samples/sec Loss 13.3713 LearningRate 0.2926 Epoch: 1 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:14:37,178-Speed 5974.28 samples/sec Loss 13.3934 LearningRate 0.2928 Epoch: 1 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 
23:14:44,036-Speed 5974.36 samples/sec Loss 13.3804 LearningRate 0.2930 Epoch: 1 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:14:50,897-Speed 5979.88 samples/sec Loss 13.4129 LearningRate 0.2932 Epoch: 1 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:14:57,759-Speed 5974.89 samples/sec Loss 13.3840 LearningRate 0.2934 Epoch: 1 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:15:04,632-Speed 5961.07 samples/sec Loss 13.3880 LearningRate 0.2936 Epoch: 1 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:15:11,506-Speed 5960.41 samples/sec Loss 13.4143 LearningRate 0.2938 Epoch: 1 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:15:18,378-Speed 5965.24 samples/sec Loss 13.4197 LearningRate 0.2940 Epoch: 1 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:15:25,239-Speed 5971.13 samples/sec Loss 13.4558 LearningRate 0.2941 Epoch: 1 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:15:32,098-Speed 5973.24 samples/sec Loss 13.4598 LearningRate 0.2943 Epoch: 1 Global Step: 15260 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:15:38,969-Speed 5965.00 samples/sec Loss 13.3531 LearningRate 0.2945 Epoch: 1 Global Step: 15270 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:15:45,829-Speed 5971.41 samples/sec Loss 13.4199 LearningRate 0.2947 Epoch: 1 Global Step: 15280 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:15:52,708-Speed 5957.22 samples/sec Loss 13.4127 LearningRate 0.2949 Epoch: 1 Global Step: 15290 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:15:59,559-Speed 5981.56 samples/sec Loss 13.4942 LearningRate 0.2951 Epoch: 1 Global Step: 15300 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:16:06,455-Speed 5942.51 
samples/sec Loss 13.4453 LearningRate 0.2953 Epoch: 1 Global Step: 15310 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:16:13,318-Speed 5971.63 samples/sec Loss 13.4102 LearningRate 0.2955 Epoch: 1 Global Step: 15320 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:16:20,186-Speed 5964.73 samples/sec Loss 13.4260 LearningRate 0.2957 Epoch: 1 Global Step: 15330 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:16:27,050-Speed 5967.87 samples/sec Loss 13.5112 LearningRate 0.2959 Epoch: 1 Global Step: 15340 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:16:33,918-Speed 5965.06 samples/sec Loss 13.4856 LearningRate 0.2961 Epoch: 1 Global Step: 15350 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:16:40,766-Speed 5982.06 samples/sec Loss 13.4819 LearningRate 0.2963 Epoch: 1 Global Step: 15360 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:16:47,638-Speed 5961.57 samples/sec Loss 13.4270 LearningRate 0.2965 Epoch: 1 Global Step: 15370 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:16:54,506-Speed 5964.99 samples/sec Loss 13.4680 LearningRate 0.2967 Epoch: 1 Global Step: 15380 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:17:01,371-Speed 5967.44 samples/sec Loss 13.4062 LearningRate 0.2968 Epoch: 1 Global Step: 15390 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:17:08,249-Speed 5957.15 samples/sec Loss 13.6104 LearningRate 0.2970 Epoch: 1 Global Step: 15400 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:17:15,110-Speed 5970.69 samples/sec Loss 13.4143 LearningRate 0.2972 Epoch: 1 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:17:21,957-Speed 5983.14 samples/sec Loss 13.4278 LearningRate 0.2974 Epoch: 1 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:17:28,810-Speed 5980.43 samples/sec Loss 13.5008 
LearningRate 0.2976 Epoch: 1 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:17:35,679-Speed 5964.67 samples/sec Loss 13.4946 LearningRate 0.2978 Epoch: 1 Global Step: 15440 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:17:42,558-Speed 5955.70 samples/sec Loss 13.4833 LearningRate 0.2980 Epoch: 1 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:17:49,422-Speed 5969.08 samples/sec Loss 13.4625 LearningRate 0.2982 Epoch: 1 Global Step: 15460 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:17:56,300-Speed 5955.36 samples/sec Loss 13.4593 LearningRate 0.2984 Epoch: 1 Global Step: 15470 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:18:03,165-Speed 5973.39 samples/sec Loss 13.5186 LearningRate 0.2986 Epoch: 1 Global Step: 15480 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:18:10,024-Speed 5973.52 samples/sec Loss 13.4385 LearningRate 0.2988 Epoch: 1 Global Step: 15490 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:18:16,863-Speed 5989.69 samples/sec Loss 14.1746 LearningRate 0.2990 Epoch: 1 Global Step: 15500 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:18:23,723-Speed 5972.11 samples/sec Loss 14.1723 LearningRate 0.2992 Epoch: 1 Global Step: 15510 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:18:30,568-Speed 5985.79 samples/sec Loss 13.8978 LearningRate 0.2994 Epoch: 1 Global Step: 15520 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:18:37,440-Speed 5961.89 samples/sec Loss 13.6282 LearningRate 0.2995 Epoch: 1 Global Step: 15530 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:18:44,288-Speed 5981.99 samples/sec Loss 13.6761 LearningRate 0.2997 Epoch: 1 Global Step: 15540 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:18:51,145-Speed 5976.44 samples/sec Loss 13.6871 LearningRate 0.2999 Epoch: 1 Global 
Step: 15550 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:18:57,998-Speed 5981.19 samples/sec Loss 13.6745 LearningRate 0.3001 Epoch: 1 Global Step: 15560 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:19:04,852-Speed 5978.38 samples/sec Loss 13.5055 LearningRate 0.3003 Epoch: 1 Global Step: 15570 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:19:11,748-Speed 5941.38 samples/sec Loss 13.5512 LearningRate 0.3005 Epoch: 1 Global Step: 15580 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:19:18,608-Speed 5971.74 samples/sec Loss 13.3739 LearningRate 0.3007 Epoch: 1 Global Step: 15590 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-07 23:19:25,477-Speed 5964.39 samples/sec Loss 13.5494 LearningRate 0.3009 Epoch: 1 Global Step: 15600 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:19:32,344-Speed 5967.47 samples/sec Loss 13.6070 LearningRate 0.3011 Epoch: 1 Global Step: 15610 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:19:39,198-Speed 5977.78 samples/sec Loss 13.4097 LearningRate 0.3013 Epoch: 1 Global Step: 15620 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:19:46,049-Speed 5979.22 samples/sec Loss 13.5055 LearningRate 0.3015 Epoch: 1 Global Step: 15630 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:19:52,940-Speed 5945.49 samples/sec Loss 13.4548 LearningRate 0.3017 Epoch: 1 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:19:59,795-Speed 5976.42 samples/sec Loss 13.5019 LearningRate 0.3019 Epoch: 1 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:20:06,665-Speed 5963.69 samples/sec Loss 13.5871 LearningRate 0.3021 Epoch: 1 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:20:13,515-Speed 5980.35 samples/sec Loss 13.4935 LearningRate 0.3022 Epoch: 1 Global Step: 15670 Fp16 Grad Scale: 65536 
Required: 38 hours Training: 2022-01-07 23:20:20,394-Speed 5955.97 samples/sec Loss 13.5190 LearningRate 0.3024 Epoch: 1 Global Step: 15680 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:20:27,241-Speed 5983.10 samples/sec Loss 13.4510 LearningRate 0.3026 Epoch: 1 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-07 23:20:34,127-Speed 5951.19 samples/sec Loss 13.5674 LearningRate 0.3028 Epoch: 1 Global Step: 15700 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:20:41,083-Speed 5891.30 samples/sec Loss 13.5876 LearningRate 0.3030 Epoch: 1 Global Step: 15710 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:20:47,930-Speed 5982.97 samples/sec Loss 13.4881 LearningRate 0.3032 Epoch: 1 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:20:54,786-Speed 5975.25 samples/sec Loss 13.5346 LearningRate 0.3034 Epoch: 1 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:21:01,657-Speed 5963.54 samples/sec Loss 13.4661 LearningRate 0.3036 Epoch: 1 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:21:08,546-Speed 5946.69 samples/sec Loss 13.5104 LearningRate 0.3038 Epoch: 1 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:21:15,470-Speed 5916.65 samples/sec Loss 13.5576 LearningRate 0.3040 Epoch: 1 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:21:22,399-Speed 5912.49 samples/sec Loss 13.4516 LearningRate 0.3042 Epoch: 1 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:21:29,255-Speed 5974.85 samples/sec Loss 13.4786 LearningRate 0.3044 Epoch: 1 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:21:36,162-Speed 5931.53 samples/sec Loss 13.4427 LearningRate 0.3046 Epoch: 1 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 38 hours Training: 
2022-01-07 23:21:43,028-Speed 5967.27 samples/sec Loss 13.5567 LearningRate 0.3048 Epoch: 1 Global Step: 15800 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:21:49,879-Speed 5979.34 samples/sec Loss 13.5347 LearningRate 0.3049 Epoch: 1 Global Step: 15810 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:21:56,731-Speed 5979.05 samples/sec Loss 13.5721 LearningRate 0.3051 Epoch: 1 Global Step: 15820 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-07 23:22:03,574-Speed 5989.17 samples/sec Loss 13.5376 LearningRate 0.3053 Epoch: 1 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-07 23:22:10,428-Speed 5977.51 samples/sec Loss 13.5272 LearningRate 0.3055 Epoch: 1 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:22:17,284-Speed 5975.45 samples/sec Loss 13.5826 LearningRate 0.3057 Epoch: 1 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:22:24,138-Speed 5977.43 samples/sec Loss 13.5269 LearningRate 0.3059 Epoch: 1 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:22:30,994-Speed 5975.48 samples/sec Loss 13.6547 LearningRate 0.3061 Epoch: 1 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:22:37,865-Speed 5962.32 samples/sec Loss 13.5035 LearningRate 0.3063 Epoch: 1 Global Step: 15880 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:22:44,743-Speed 5956.23 samples/sec Loss 13.5744 LearningRate 0.3065 Epoch: 1 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:22:51,621-Speed 5957.09 samples/sec Loss 13.4497 LearningRate 0.3067 Epoch: 1 Global Step: 15900 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:22:58,482-Speed 5972.61 samples/sec Loss 13.5382 LearningRate 0.3069 Epoch: 1 Global Step: 15910 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:23:05,341-Speed 
5973.30 samples/sec Loss 13.4659 LearningRate 0.3071 Epoch: 1 Global Step: 15920 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:23:12,204-Speed 5969.72 samples/sec Loss 13.5750 LearningRate 0.3073 Epoch: 1 Global Step: 15930 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:23:19,098-Speed 5942.82 samples/sec Loss 13.5698 LearningRate 0.3075 Epoch: 1 Global Step: 15940 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:23:25,966-Speed 5964.64 samples/sec Loss 13.5265 LearningRate 0.3076 Epoch: 1 Global Step: 15950 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:23:32,847-Speed 5955.68 samples/sec Loss 13.5113 LearningRate 0.3078 Epoch: 1 Global Step: 15960 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:23:39,707-Speed 5971.85 samples/sec Loss 13.5685 LearningRate 0.3080 Epoch: 1 Global Step: 15970 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:23:46,607-Speed 5937.67 samples/sec Loss 13.5519 LearningRate 0.3082 Epoch: 1 Global Step: 15980 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:23:53,468-Speed 5971.01 samples/sec Loss 13.5371 LearningRate 0.3084 Epoch: 1 Global Step: 15990 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:24:00,328-Speed 5974.47 samples/sec Loss 13.4763 LearningRate 0.3086 Epoch: 1 Global Step: 16000 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:24:07,183-Speed 5976.30 samples/sec Loss 13.6106 LearningRate 0.3088 Epoch: 1 Global Step: 16010 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:24:14,031-Speed 5982.22 samples/sec Loss 13.6386 LearningRate 0.3090 Epoch: 1 Global Step: 16020 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:24:20,875-Speed 5985.85 samples/sec Loss 13.6359 LearningRate 0.3092 Epoch: 1 Global Step: 16030 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:24:27,728-Speed 5978.32 samples/sec Loss 
13.5770 LearningRate 0.3094 Epoch: 1 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:24:34,701-Speed 5875.46 samples/sec Loss 13.5464 LearningRate 0.3096 Epoch: 1 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:24:41,636-Speed 5907.45 samples/sec Loss 13.5736 LearningRate 0.3098 Epoch: 1 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:24:48,541-Speed 5933.08 samples/sec Loss 13.4884 LearningRate 0.3100 Epoch: 1 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:24:55,458-Speed 5922.63 samples/sec Loss 13.4975 LearningRate 0.3102 Epoch: 1 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:25:02,335-Speed 5957.24 samples/sec Loss 13.6263 LearningRate 0.3103 Epoch: 1 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:25:09,204-Speed 5963.77 samples/sec Loss 13.5937 LearningRate 0.3105 Epoch: 1 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:25:16,070-Speed 5967.29 samples/sec Loss 13.4952 LearningRate 0.3107 Epoch: 1 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:25:22,947-Speed 5957.46 samples/sec Loss 13.6262 LearningRate 0.3109 Epoch: 1 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:25:29,799-Speed 5978.94 samples/sec Loss 13.4778 LearningRate 0.3111 Epoch: 1 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:25:36,655-Speed 5975.74 samples/sec Loss 13.5095 LearningRate 0.3113 Epoch: 1 Global Step: 16140 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:25:43,543-Speed 5948.87 samples/sec Loss 13.5937 LearningRate 0.3115 Epoch: 1 Global Step: 16150 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:25:50,403-Speed 5971.76 samples/sec Loss 13.5965 LearningRate 0.3117 
Epoch: 1 Global Step: 16160 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:25:57,253-Speed 5980.60 samples/sec Loss 13.5081 LearningRate 0.3119 Epoch: 1 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:04,149-Speed 5941.57 samples/sec Loss 13.4948 LearningRate 0.3121 Epoch: 1 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:11,043-Speed 5942.82 samples/sec Loss 13.5404 LearningRate 0.3123 Epoch: 1 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:17,928-Speed 5950.59 samples/sec Loss 13.5181 LearningRate 0.3125 Epoch: 1 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:24,779-Speed 5979.14 samples/sec Loss 13.5478 LearningRate 0.3127 Epoch: 1 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:31,630-Speed 5980.48 samples/sec Loss 13.6522 LearningRate 0.3129 Epoch: 1 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:38,477-Speed 5982.90 samples/sec Loss 13.5650 LearningRate 0.3130 Epoch: 1 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:45,339-Speed 5970.72 samples/sec Loss 13.6006 LearningRate 0.3132 Epoch: 1 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:52,189-Speed 5982.50 samples/sec Loss 13.4791 LearningRate 0.3134 Epoch: 1 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:26:59,035-Speed 5983.46 samples/sec Loss 13.6188 LearningRate 0.3136 Epoch: 1 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:27:05,884-Speed 5981.57 samples/sec Loss 13.5324 LearningRate 0.3138 Epoch: 1 Global Step: 16270 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:27:12,731-Speed 5983.51 samples/sec Loss 13.5976 LearningRate 0.3140 Epoch: 1 Global Step: 16280 
Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:27:19,590-Speed 5972.82 samples/sec Loss 13.5397 LearningRate 0.3142 Epoch: 1 Global Step: 16290 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:27:26,448-Speed 5973.64 samples/sec Loss 13.6068 LearningRate 0.3144 Epoch: 1 Global Step: 16300 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:27:33,306-Speed 5973.04 samples/sec Loss 13.5860 LearningRate 0.3146 Epoch: 1 Global Step: 16310 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:27:40,168-Speed 5971.17 samples/sec Loss 13.5445 LearningRate 0.3148 Epoch: 1 Global Step: 16320 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:27:47,022-Speed 5977.94 samples/sec Loss 13.5715 LearningRate 0.3150 Epoch: 1 Global Step: 16330 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:27:53,866-Speed 5985.52 samples/sec Loss 13.4907 LearningRate 0.3152 Epoch: 1 Global Step: 16340 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:28:00,713-Speed 5983.52 samples/sec Loss 13.6210 LearningRate 0.3154 Epoch: 1 Global Step: 16350 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:28:07,568-Speed 5978.21 samples/sec Loss 13.6631 LearningRate 0.3156 Epoch: 1 Global Step: 16360 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:28:14,412-Speed 5986.32 samples/sec Loss 13.5691 LearningRate 0.3157 Epoch: 1 Global Step: 16370 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:28:21,265-Speed 5977.75 samples/sec Loss 13.5907 LearningRate 0.3159 Epoch: 1 Global Step: 16380 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:28:28,134-Speed 5963.72 samples/sec Loss 13.6887 LearningRate 0.3161 Epoch: 1 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:28:34,976-Speed 5987.99 samples/sec Loss 13.6896 LearningRate 0.3163 Epoch: 1 Global Step: 16400 Fp16 Grad Scale: 131072 
Required: 37 hours Training: 2022-01-07 23:28:41,866-Speed 5946.28 samples/sec Loss 13.5651 LearningRate 0.3165 Epoch: 1 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:28:48,710-Speed 5985.83 samples/sec Loss 13.4916 LearningRate 0.3167 Epoch: 1 Global Step: 16420 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:28:55,557-Speed 5982.92 samples/sec Loss 13.5781 LearningRate 0.3169 Epoch: 1 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:29:02,408-Speed 5982.72 samples/sec Loss 13.6492 LearningRate 0.3171 Epoch: 1 Global Step: 16440 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:29:09,362-Speed 5890.89 samples/sec Loss 13.6221 LearningRate 0.3173 Epoch: 1 Global Step: 16450 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:29:16,315-Speed 5892.81 samples/sec Loss 13.5626 LearningRate 0.3175 Epoch: 1 Global Step: 16460 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:29:23,169-Speed 5977.36 samples/sec Loss 13.5413 LearningRate 0.3177 Epoch: 1 Global Step: 16470 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:29:30,035-Speed 5966.59 samples/sec Loss 13.5598 LearningRate 0.3179 Epoch: 1 Global Step: 16480 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:29:36,899-Speed 5967.91 samples/sec Loss 13.6273 LearningRate 0.3181 Epoch: 1 Global Step: 16490 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:29:43,765-Speed 5967.29 samples/sec Loss 13.6014 LearningRate 0.3183 Epoch: 1 Global Step: 16500 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:29:50,615-Speed 5982.18 samples/sec Loss 13.5867 LearningRate 0.3184 Epoch: 1 Global Step: 16510 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:29:57,513-Speed 5939.29 samples/sec Loss 13.5899 LearningRate 0.3186 Epoch: 1 Global Step: 16520 Fp16 Grad Scale: 262144 Required: 37 hours Training: 
2022-01-07 23:30:04,372-Speed 5972.14 samples/sec Loss 13.6879 LearningRate 0.3188 Epoch: 1 Global Step: 16530 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:30:11,234-Speed 5970.92 samples/sec Loss 13.5037 LearningRate 0.3190 Epoch: 1 Global Step: 16540 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-07 23:30:18,136-Speed 5935.88 samples/sec Loss 13.5907 LearningRate 0.3192 Epoch: 1 Global Step: 16550 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:30:25,007-Speed 5962.66 samples/sec Loss 13.5221 LearningRate 0.3194 Epoch: 1 Global Step: 16560 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:30:31,867-Speed 5972.02 samples/sec Loss 13.5928 LearningRate 0.3196 Epoch: 1 Global Step: 16570 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:30:38,751-Speed 5953.91 samples/sec Loss 13.5408 LearningRate 0.3198 Epoch: 1 Global Step: 16580 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:30:45,612-Speed 5970.61 samples/sec Loss 13.6924 LearningRate 0.3200 Epoch: 1 Global Step: 16590 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:30:52,494-Speed 5955.72 samples/sec Loss 13.5503 LearningRate 0.3202 Epoch: 1 Global Step: 16600 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:30:59,434-Speed 5903.44 samples/sec Loss 13.5652 LearningRate 0.3204 Epoch: 1 Global Step: 16610 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:31:06,395-Speed 5885.85 samples/sec Loss 13.5341 LearningRate 0.3206 Epoch: 1 Global Step: 16620 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:31:13,256-Speed 5971.29 samples/sec Loss 13.5280 LearningRate 0.3208 Epoch: 1 Global Step: 16630 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:31:20,250-Speed 5858.19 samples/sec Loss 13.5779 LearningRate 0.3210 Epoch: 1 Global Step: 16640 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:31:27,219-Speed 
5878.22 samples/sec Loss 13.6139 LearningRate 0.3211 Epoch: 1 Global Step: 16650 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:31:34,094-Speed 5959.13 samples/sec Loss 13.6436 LearningRate 0.3213 Epoch: 1 Global Step: 16660 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:31:40,962-Speed 5965.85 samples/sec Loss 13.6670 LearningRate 0.3215 Epoch: 1 Global Step: 16670 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:31:47,824-Speed 5970.04 samples/sec Loss 13.6402 LearningRate 0.3217 Epoch: 1 Global Step: 16680 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:31:54,670-Speed 5984.20 samples/sec Loss 13.6953 LearningRate 0.3219 Epoch: 1 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:01,520-Speed 5980.34 samples/sec Loss 13.6261 LearningRate 0.3221 Epoch: 1 Global Step: 16700 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:08,394-Speed 5959.23 samples/sec Loss 13.7632 LearningRate 0.3223 Epoch: 1 Global Step: 16710 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:15,249-Speed 5976.26 samples/sec Loss 13.7188 LearningRate 0.3225 Epoch: 1 Global Step: 16720 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:22,098-Speed 5980.89 samples/sec Loss 13.6734 LearningRate 0.3227 Epoch: 1 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:28,959-Speed 5971.58 samples/sec Loss 13.7128 LearningRate 0.3229 Epoch: 1 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:35,824-Speed 5968.05 samples/sec Loss 13.6551 LearningRate 0.3231 Epoch: 1 Global Step: 16750 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:42,669-Speed 5984.83 samples/sec Loss 13.7071 LearningRate 0.3233 Epoch: 1 Global Step: 16760 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:49,532-Speed 5969.32 samples/sec Loss 
13.6234 LearningRate 0.3235 Epoch: 1 Global Step: 16770 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:32:56,398-Speed 5967.86 samples/sec Loss 13.6636 LearningRate 0.3237 Epoch: 1 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:33:03,248-Speed 5980.37 samples/sec Loss 13.6892 LearningRate 0.3238 Epoch: 1 Global Step: 16790 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:33:10,110-Speed 5969.85 samples/sec Loss 13.6051 LearningRate 0.3240 Epoch: 1 Global Step: 16800 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:33:16,975-Speed 5967.64 samples/sec Loss 13.6413 LearningRate 0.3242 Epoch: 1 Global Step: 16810 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:33:23,826-Speed 5982.16 samples/sec Loss 13.6612 LearningRate 0.3244 Epoch: 1 Global Step: 16820 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:33:30,679-Speed 5979.02 samples/sec Loss 13.6704 LearningRate 0.3246 Epoch: 1 Global Step: 16830 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:33:37,553-Speed 5959.50 samples/sec Loss 13.6010 LearningRate 0.3248 Epoch: 1 Global Step: 16840 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:33:44,409-Speed 5975.89 samples/sec Loss 13.7359 LearningRate 0.3250 Epoch: 1 Global Step: 16850 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:33:51,275-Speed 5965.87 samples/sec Loss 13.6511 LearningRate 0.3252 Epoch: 1 Global Step: 16860 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:33:58,152-Speed 5959.36 samples/sec Loss 13.6613 LearningRate 0.3254 Epoch: 1 Global Step: 16870 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:34:05,036-Speed 5951.93 samples/sec Loss 13.7053 LearningRate 0.3256 Epoch: 1 Global Step: 16880 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:34:11,918-Speed 5953.00 samples/sec Loss 13.6777 LearningRate 0.3258 
Epoch: 1 Global Step: 16890 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-07 23:34:18,780-Speed 5970.41 samples/sec Loss 13.6536 LearningRate 0.3260 Epoch: 1 Global Step: 16900 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:34:25,655-Speed 5958.91 samples/sec Loss 13.6583 LearningRate 0.3262 Epoch: 1 Global Step: 16910 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:34:32,529-Speed 5959.50 samples/sec Loss 13.6388 LearningRate 0.3264 Epoch: 1 Global Step: 16920 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:34:39,376-Speed 5983.12 samples/sec Loss 13.6330 LearningRate 0.3266 Epoch: 1 Global Step: 16930 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:34:46,228-Speed 5979.42 samples/sec Loss 13.7709 LearningRate 0.3267 Epoch: 1 Global Step: 16940 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:34:53,078-Speed 5980.01 samples/sec Loss 13.7244 LearningRate 0.3269 Epoch: 1 Global Step: 16950 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:34:59,931-Speed 5978.79 samples/sec Loss 13.7119 LearningRate 0.3271 Epoch: 1 Global Step: 16960 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:35:06,799-Speed 5964.78 samples/sec Loss 13.6501 LearningRate 0.3273 Epoch: 1 Global Step: 16970 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:35:13,670-Speed 5962.30 samples/sec Loss 13.6230 LearningRate 0.3275 Epoch: 1 Global Step: 16980 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:35:20,522-Speed 5978.94 samples/sec Loss 13.6883 LearningRate 0.3277 Epoch: 1 Global Step: 16990 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:35:27,364-Speed 5987.40 samples/sec Loss 13.6854 LearningRate 0.3279 Epoch: 1 Global Step: 17000 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:35:34,253-Speed 5947.16 samples/sec Loss 13.5612 LearningRate 0.3281 Epoch: 1 Global Step: 17010 
Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:35:41,148-Speed 5941.54 samples/sec Loss 13.7173 LearningRate 0.3283 Epoch: 1 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:35:48,011-Speed 5969.21 samples/sec Loss 13.7088 LearningRate 0.3285 Epoch: 1 Global Step: 17030 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:35:54,863-Speed 5978.54 samples/sec Loss 13.7332 LearningRate 0.3287 Epoch: 1 Global Step: 17040 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:36:01,733-Speed 5964.14 samples/sec Loss 13.7426 LearningRate 0.3289 Epoch: 1 Global Step: 17050 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:36:08,600-Speed 5965.76 samples/sec Loss 13.7095 LearningRate 0.3291 Epoch: 1 Global Step: 17060 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:36:15,483-Speed 5951.29 samples/sec Loss 13.7001 LearningRate 0.3293 Epoch: 1 Global Step: 17070 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:36:22,334-Speed 5980.60 samples/sec Loss 13.6581 LearningRate 0.3294 Epoch: 1 Global Step: 17080 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:36:29,225-Speed 5945.37 samples/sec Loss 13.6808 LearningRate 0.3296 Epoch: 1 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:36:36,079-Speed 5979.21 samples/sec Loss 13.6763 LearningRate 0.3298 Epoch: 1 Global Step: 17100 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:36:42,938-Speed 5973.11 samples/sec Loss 13.7769 LearningRate 0.3300 Epoch: 1 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:36:49,808-Speed 5963.48 samples/sec Loss 13.8058 LearningRate 0.3302 Epoch: 1 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:36:56,664-Speed 5975.38 samples/sec Loss 13.7655 LearningRate 0.3304 Epoch: 1 Global Step: 17130 Fp16 Grad Scale: 131072 
Required: 37 hours Training: 2022-01-07 23:37:03,565-Speed 5937.03 samples/sec Loss 13.6901 LearningRate 0.3306 Epoch: 1 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:37:10,417-Speed 5978.89 samples/sec Loss 13.7478 LearningRate 0.3308 Epoch: 1 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:37:17,289-Speed 5960.83 samples/sec Loss 13.6935 LearningRate 0.3310 Epoch: 1 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:37:24,155-Speed 5967.05 samples/sec Loss 13.6315 LearningRate 0.3312 Epoch: 1 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:37:31,018-Speed 5969.04 samples/sec Loss 13.7971 LearningRate 0.3314 Epoch: 1 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:37:37,878-Speed 5972.36 samples/sec Loss 13.7591 LearningRate 0.3316 Epoch: 1 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:37:49,509-Speed 3522.15 samples/sec Loss 13.7422 LearningRate 0.3318 Epoch: 1 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:37:56,333-Speed 6002.60 samples/sec Loss 13.7860 LearningRate 0.3320 Epoch: 1 Global Step: 17210 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:38:03,195-Speed 5970.62 samples/sec Loss 13.7102 LearningRate 0.3321 Epoch: 1 Global Step: 17220 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:38:10,071-Speed 5957.99 samples/sec Loss 13.7592 LearningRate 0.3323 Epoch: 1 Global Step: 17230 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:38:16,912-Speed 5987.95 samples/sec Loss 13.8820 LearningRate 0.3325 Epoch: 1 Global Step: 17240 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:38:23,767-Speed 5976.19 samples/sec Loss 13.7958 LearningRate 0.3327 Epoch: 1 Global Step: 17250 Fp16 Grad Scale: 65536 Required: 37 hours Training: 
2022-01-07 23:38:30,635-Speed 5965.13 samples/sec Loss 13.6738 LearningRate 0.3329 Epoch: 1 Global Step: 17260 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:38:37,486-Speed 5979.40 samples/sec Loss 13.7432 LearningRate 0.3331 Epoch: 1 Global Step: 17270 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:38:44,335-Speed 5982.12 samples/sec Loss 13.6708 LearningRate 0.3333 Epoch: 1 Global Step: 17280 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:38:51,193-Speed 5973.48 samples/sec Loss 13.6714 LearningRate 0.3335 Epoch: 1 Global Step: 17290 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:38:58,054-Speed 5970.47 samples/sec Loss 13.6809 LearningRate 0.3337 Epoch: 1 Global Step: 17300 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-07 23:39:04,908-Speed 5977.58 samples/sec Loss 13.8118 LearningRate 0.3339 Epoch: 1 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:39:11,770-Speed 5970.27 samples/sec Loss 13.6433 LearningRate 0.3341 Epoch: 1 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:39:18,690-Speed 5919.77 samples/sec Loss 13.7042 LearningRate 0.3343 Epoch: 1 Global Step: 17330 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:39:25,552-Speed 5970.60 samples/sec Loss 13.6362 LearningRate 0.3345 Epoch: 1 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:39:32,422-Speed 5963.34 samples/sec Loss 13.7085 LearningRate 0.3347 Epoch: 1 Global Step: 17350 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:39:39,273-Speed 5979.46 samples/sec Loss 13.7308 LearningRate 0.3348 Epoch: 1 Global Step: 17360 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:39:46,120-Speed 5983.34 samples/sec Loss 13.6726 LearningRate 0.3350 Epoch: 1 Global Step: 17370 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:39:52,966-Speed 
5984.43 samples/sec Loss 13.7581 LearningRate 0.3352 Epoch: 1 Global Step: 17380 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:39:59,815-Speed 5980.97 samples/sec Loss 13.7323 LearningRate 0.3354 Epoch: 1 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:40:06,660-Speed 5984.81 samples/sec Loss 13.8280 LearningRate 0.3356 Epoch: 1 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:40:13,521-Speed 5971.98 samples/sec Loss 13.8037 LearningRate 0.3358 Epoch: 1 Global Step: 17410 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:40:20,379-Speed 5973.87 samples/sec Loss 13.7655 LearningRate 0.3360 Epoch: 1 Global Step: 17420 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:40:27,240-Speed 5970.79 samples/sec Loss 13.7297 LearningRate 0.3362 Epoch: 1 Global Step: 17430 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:40:34,097-Speed 5974.34 samples/sec Loss 13.8081 LearningRate 0.3364 Epoch: 1 Global Step: 17440 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:40:40,972-Speed 5960.09 samples/sec Loss 13.7597 LearningRate 0.3366 Epoch: 1 Global Step: 17450 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:40:47,835-Speed 5969.00 samples/sec Loss 13.8178 LearningRate 0.3368 Epoch: 1 Global Step: 17460 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:40:54,697-Speed 5970.44 samples/sec Loss 13.7594 LearningRate 0.3370 Epoch: 1 Global Step: 17470 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:41:01,559-Speed 5969.94 samples/sec Loss 13.6939 LearningRate 0.3372 Epoch: 1 Global Step: 17480 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:41:08,420-Speed 5970.90 samples/sec Loss 13.7459 LearningRate 0.3374 Epoch: 1 Global Step: 17490 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:41:15,358-Speed 5905.18 samples/sec Loss 
13.7359 LearningRate 0.3375 Epoch: 1 Global Step: 17500 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:41:22,281-Speed 5917.25 samples/sec Loss 13.6005 LearningRate 0.3377 Epoch: 1 Global Step: 17510 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-07 23:41:29,152-Speed 5962.71 samples/sec Loss 13.7836 LearningRate 0.3379 Epoch: 1 Global Step: 17520 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:41:36,017-Speed 5967.90 samples/sec Loss 13.7130 LearningRate 0.3381 Epoch: 1 Global Step: 17530 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:41:42,890-Speed 5960.70 samples/sec Loss 13.7379 LearningRate 0.3383 Epoch: 1 Global Step: 17540 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:41:49,816-Speed 5915.08 samples/sec Loss 13.7777 LearningRate 0.3385 Epoch: 1 Global Step: 17550 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:41:56,690-Speed 5958.99 samples/sec Loss 13.7386 LearningRate 0.3387 Epoch: 1 Global Step: 17560 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:03,584-Speed 5943.07 samples/sec Loss 13.7138 LearningRate 0.3389 Epoch: 1 Global Step: 17570 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:10,469-Speed 5949.64 samples/sec Loss 13.8273 LearningRate 0.3391 Epoch: 1 Global Step: 17580 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:17,316-Speed 5982.81 samples/sec Loss 13.8150 LearningRate 0.3393 Epoch: 1 Global Step: 17590 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:24,319-Speed 5853.28 samples/sec Loss 13.8399 LearningRate 0.3395 Epoch: 1 Global Step: 17600 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:31,200-Speed 5953.86 samples/sec Loss 13.8364 LearningRate 0.3397 Epoch: 1 Global Step: 17610 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:38,067-Speed 5965.65 samples/sec Loss 13.6920 LearningRate 0.3399 
Epoch: 1 Global Step: 17620 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:44,919-Speed 5979.68 samples/sec Loss 13.8374 LearningRate 0.3401 Epoch: 1 Global Step: 17630 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:51,771-Speed 5978.72 samples/sec Loss 13.7834 LearningRate 0.3402 Epoch: 1 Global Step: 17640 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:42:58,667-Speed 5941.13 samples/sec Loss 13.7196 LearningRate 0.3404 Epoch: 1 Global Step: 17650 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:43:05,520-Speed 5978.04 samples/sec Loss 13.7237 LearningRate 0.3406 Epoch: 1 Global Step: 17660 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:43:12,401-Speed 5953.57 samples/sec Loss 13.8272 LearningRate 0.3408 Epoch: 1 Global Step: 17670 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:43:19,276-Speed 5958.75 samples/sec Loss 13.8081 LearningRate 0.3410 Epoch: 1 Global Step: 17680 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:43:26,133-Speed 5975.56 samples/sec Loss 13.7781 LearningRate 0.3412 Epoch: 1 Global Step: 17690 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:43:32,995-Speed 5970.07 samples/sec Loss 13.7597 LearningRate 0.3414 Epoch: 1 Global Step: 17700 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:43:39,859-Speed 5972.91 samples/sec Loss 13.8658 LearningRate 0.3416 Epoch: 1 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:43:46,712-Speed 5978.50 samples/sec Loss 13.8530 LearningRate 0.3418 Epoch: 1 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:43:53,588-Speed 5958.72 samples/sec Loss 13.7775 LearningRate 0.3420 Epoch: 1 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:44:00,443-Speed 5975.74 samples/sec Loss 13.7305 LearningRate 0.3422 Epoch: 1 Global Step: 17740 
Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:44:07,301-Speed 5973.74 samples/sec Loss 13.7868 LearningRate 0.3424 Epoch: 1 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:44:14,254-Speed 5892.26 samples/sec Loss 13.7215 LearningRate 0.3426 Epoch: 1 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:44:21,118-Speed 5967.81 samples/sec Loss 13.8323 LearningRate 0.3428 Epoch: 1 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:44:27,996-Speed 5956.54 samples/sec Loss 13.7486 LearningRate 0.3429 Epoch: 1 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:44:34,843-Speed 5983.02 samples/sec Loss 13.8135 LearningRate 0.3431 Epoch: 1 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:44:41,694-Speed 5979.53 samples/sec Loss 13.7908 LearningRate 0.3433 Epoch: 1 Global Step: 17800 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:44:48,538-Speed 5986.11 samples/sec Loss 13.9607 LearningRate 0.3435 Epoch: 1 Global Step: 17810 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:44:55,408-Speed 5963.36 samples/sec Loss 13.8850 LearningRate 0.3437 Epoch: 1 Global Step: 17820 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:45:02,262-Speed 5978.46 samples/sec Loss 13.8692 LearningRate 0.3439 Epoch: 1 Global Step: 17830 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:45:09,118-Speed 5976.29 samples/sec Loss 13.8387 LearningRate 0.3441 Epoch: 1 Global Step: 17840 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:45:15,993-Speed 5958.56 samples/sec Loss 13.8669 LearningRate 0.3443 Epoch: 1 Global Step: 17850 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:45:22,862-Speed 5964.89 samples/sec Loss 13.8780 LearningRate 0.3445 Epoch: 1 Global Step: 17860 Fp16 Grad Scale: 262144 
Required: 37 hours Training: 2022-01-07 23:45:29,716-Speed 5976.53 samples/sec Loss 13.9181 LearningRate 0.3447 Epoch: 1 Global Step: 17870 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:45:36,568-Speed 5979.55 samples/sec Loss 13.7479 LearningRate 0.3449 Epoch: 1 Global Step: 17880 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:45:43,439-Speed 5961.86 samples/sec Loss 13.8496 LearningRate 0.3451 Epoch: 1 Global Step: 17890 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:45:50,330-Speed 5945.11 samples/sec Loss 13.7674 LearningRate 0.3453 Epoch: 1 Global Step: 17900 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:45:57,196-Speed 5966.98 samples/sec Loss 13.8524 LearningRate 0.3455 Epoch: 1 Global Step: 17910 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:46:04,045-Speed 5981.74 samples/sec Loss 13.8419 LearningRate 0.3456 Epoch: 1 Global Step: 17920 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:46:10,912-Speed 5965.06 samples/sec Loss 13.8152 LearningRate 0.3458 Epoch: 1 Global Step: 17930 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:46:17,779-Speed 5966.25 samples/sec Loss 13.9321 LearningRate 0.3460 Epoch: 1 Global Step: 17940 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:46:24,671-Speed 5944.83 samples/sec Loss 13.7755 LearningRate 0.3462 Epoch: 1 Global Step: 17950 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:46:31,546-Speed 5959.08 samples/sec Loss 13.8450 LearningRate 0.3464 Epoch: 1 Global Step: 17960 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:46:38,404-Speed 5973.48 samples/sec Loss 13.8089 LearningRate 0.3466 Epoch: 1 Global Step: 17970 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:46:45,256-Speed 5978.57 samples/sec Loss 13.7460 LearningRate 0.3468 Epoch: 1 Global Step: 17980 Fp16 Grad Scale: 262144 Required: 37 hours Training: 
2022-01-07 23:46:52,120-Speed 5968.76 samples/sec Loss 13.8310 LearningRate 0.3470 Epoch: 1 Global Step: 17990 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:46:58,995-Speed 5959.08 samples/sec Loss 13.9294 LearningRate 0.3472 Epoch: 1 Global Step: 18000 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:47:05,856-Speed 5970.88 samples/sec Loss 13.8204 LearningRate 0.3474 Epoch: 1 Global Step: 18010 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-07 23:47:12,716-Speed 5972.35 samples/sec Loss 13.9474 LearningRate 0.3476 Epoch: 1 Global Step: 18020 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:47:19,579-Speed 5968.83 samples/sec Loss 13.8836 LearningRate 0.3478 Epoch: 1 Global Step: 18030 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:47:26,468-Speed 5946.46 samples/sec Loss 13.7972 LearningRate 0.3480 Epoch: 1 Global Step: 18040 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:47:33,340-Speed 5972.26 samples/sec Loss 13.8102 LearningRate 0.3482 Epoch: 1 Global Step: 18050 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:47:40,206-Speed 5966.52 samples/sec Loss 13.7879 LearningRate 0.3483 Epoch: 1 Global Step: 18060 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:47:47,067-Speed 5971.21 samples/sec Loss 13.8131 LearningRate 0.3485 Epoch: 1 Global Step: 18070 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:47:53,916-Speed 5981.66 samples/sec Loss 13.7733 LearningRate 0.3487 Epoch: 1 Global Step: 18080 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:48:00,771-Speed 5976.25 samples/sec Loss 13.8660 LearningRate 0.3489 Epoch: 1 Global Step: 18090 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:48:07,617-Speed 5983.99 samples/sec Loss 13.8389 LearningRate 0.3491 Epoch: 1 Global Step: 18100 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:48:14,485-Speed 
5965.40 samples/sec Loss 13.9846 LearningRate 0.3493 Epoch: 1 Global Step: 18110 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:48:21,334-Speed 5981.02 samples/sec Loss 13.8929 LearningRate 0.3495 Epoch: 1 Global Step: 18120 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-07 23:48:28,190-Speed 5975.56 samples/sec Loss 13.8300 LearningRate 0.3497 Epoch: 1 Global Step: 18130 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:48:35,068-Speed 5956.46 samples/sec Loss 13.8567 LearningRate 0.3499 Epoch: 1 Global Step: 18140 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:48:41,941-Speed 5960.45 samples/sec Loss 13.8945 LearningRate 0.3501 Epoch: 1 Global Step: 18150 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:48:48,789-Speed 5982.47 samples/sec Loss 13.8212 LearningRate 0.3503 Epoch: 1 Global Step: 18160 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:48:55,639-Speed 5980.18 samples/sec Loss 13.9100 LearningRate 0.3505 Epoch: 1 Global Step: 18170 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:49:02,494-Speed 5976.43 samples/sec Loss 13.8306 LearningRate 0.3507 Epoch: 1 Global Step: 18180 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:49:09,346-Speed 5979.66 samples/sec Loss 13.8985 LearningRate 0.3509 Epoch: 1 Global Step: 18190 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:49:16,214-Speed 5964.98 samples/sec Loss 13.9400 LearningRate 0.3510 Epoch: 1 Global Step: 18200 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:49:23,058-Speed 5985.98 samples/sec Loss 13.9134 LearningRate 0.3512 Epoch: 1 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:49:29,918-Speed 5972.19 samples/sec Loss 13.8623 LearningRate 0.3514 Epoch: 1 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:49:36,769-Speed 5979.30 samples/sec Loss 
13.9981 LearningRate 0.3516 Epoch: 1 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:49:43,642-Speed 5960.22 samples/sec Loss 13.9922 LearningRate 0.3518 Epoch: 1 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:49:50,486-Speed 5986.27 samples/sec Loss 13.8512 LearningRate 0.3520 Epoch: 1 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:49:57,345-Speed 5972.55 samples/sec Loss 13.9770 LearningRate 0.3522 Epoch: 1 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:50:04,192-Speed 5983.85 samples/sec Loss 13.8668 LearningRate 0.3524 Epoch: 1 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:50:11,034-Speed 5987.25 samples/sec Loss 13.9187 LearningRate 0.3526 Epoch: 1 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:50:17,897-Speed 5970.08 samples/sec Loss 13.9291 LearningRate 0.3528 Epoch: 1 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:50:24,797-Speed 5937.71 samples/sec Loss 13.9126 LearningRate 0.3530 Epoch: 1 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:50:31,653-Speed 5974.83 samples/sec Loss 13.8904 LearningRate 0.3532 Epoch: 1 Global Step: 18310 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:50:38,491-Speed 5991.02 samples/sec Loss 13.8233 LearningRate 0.3534 Epoch: 1 Global Step: 18320 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:50:47,422-Speed 4587.24 samples/sec Loss 13.9690 LearningRate 0.3536 Epoch: 1 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:50:54,298-Speed 5957.88 samples/sec Loss 13.8792 LearningRate 0.3537 Epoch: 1 Global Step: 18340 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:51:01,135-Speed 5991.77 samples/sec Loss 13.7549 LearningRate 0.3539 
Epoch: 1 Global Step: 18350 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:51:07,980-Speed 5985.00 samples/sec Loss 13.8333 LearningRate 0.3541 Epoch: 1 Global Step: 18360 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:51:14,837-Speed 5974.57 samples/sec Loss 13.9286 LearningRate 0.3543 Epoch: 1 Global Step: 18370 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:51:21,684-Speed 5983.41 samples/sec Loss 13.9130 LearningRate 0.3545 Epoch: 1 Global Step: 18380 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:51:28,545-Speed 5970.91 samples/sec Loss 13.9471 LearningRate 0.3547 Epoch: 1 Global Step: 18390 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:51:35,402-Speed 5975.24 samples/sec Loss 13.8493 LearningRate 0.3549 Epoch: 1 Global Step: 18400 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:51:42,267-Speed 5967.40 samples/sec Loss 13.8915 LearningRate 0.3551 Epoch: 1 Global Step: 18410 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:51:49,139-Speed 5961.38 samples/sec Loss 13.9241 LearningRate 0.3553 Epoch: 1 Global Step: 18420 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:51:55,994-Speed 5976.24 samples/sec Loss 13.9414 LearningRate 0.3555 Epoch: 1 Global Step: 18430 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:52:02,837-Speed 5986.78 samples/sec Loss 14.0998 LearningRate 0.3557 Epoch: 1 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:52:09,700-Speed 5969.79 samples/sec Loss 13.8638 LearningRate 0.3559 Epoch: 1 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:52:16,565-Speed 5967.84 samples/sec Loss 13.9586 LearningRate 0.3561 Epoch: 1 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:52:23,433-Speed 5965.13 samples/sec Loss 13.9028 LearningRate 0.3563 Epoch: 1 Global Step: 18470 
Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:52:30,307-Speed 5959.97 samples/sec Loss 13.9832 LearningRate 0.3564 Epoch: 1 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:52:37,180-Speed 5960.96 samples/sec Loss 13.9715 LearningRate 0.3566 Epoch: 1 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:52:44,076-Speed 5940.48 samples/sec Loss 14.0058 LearningRate 0.3568 Epoch: 1 Global Step: 18500 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:52:50,973-Speed 5939.40 samples/sec Loss 13.8935 LearningRate 0.3570 Epoch: 1 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:52:57,840-Speed 5966.70 samples/sec Loss 13.9564 LearningRate 0.3572 Epoch: 1 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:53:04,696-Speed 5977.66 samples/sec Loss 13.9436 LearningRate 0.3574 Epoch: 1 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:53:11,537-Speed 5987.77 samples/sec Loss 13.9862 LearningRate 0.3576 Epoch: 1 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:53:18,382-Speed 5985.37 samples/sec Loss 14.0745 LearningRate 0.3578 Epoch: 1 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:53:25,232-Speed 5981.09 samples/sec Loss 13.9317 LearningRate 0.3580 Epoch: 1 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:53:32,078-Speed 5983.78 samples/sec Loss 14.1095 LearningRate 0.3582 Epoch: 1 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:53:38,931-Speed 5978.28 samples/sec Loss 14.0087 LearningRate 0.3584 Epoch: 1 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:53:45,793-Speed 5970.17 samples/sec Loss 13.9104 LearningRate 0.3586 Epoch: 1 Global Step: 18590 Fp16 Grad Scale: 131072 
Required: 37 hours Training: 2022-01-07 23:53:52,650-Speed 5974.73 samples/sec Loss 13.8459 LearningRate 0.3588 Epoch: 1 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:53:59,502-Speed 5979.49 samples/sec Loss 13.8579 LearningRate 0.3590 Epoch: 1 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:54:06,356-Speed 5976.64 samples/sec Loss 13.9240 LearningRate 0.3591 Epoch: 1 Global Step: 18620 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:54:13,212-Speed 5974.82 samples/sec Loss 13.9447 LearningRate 0.3593 Epoch: 1 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:54:20,086-Speed 5960.51 samples/sec Loss 13.9407 LearningRate 0.3595 Epoch: 1 Global Step: 18640 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:54:26,942-Speed 5975.60 samples/sec Loss 13.9972 LearningRate 0.3597 Epoch: 1 Global Step: 18650 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:54:33,797-Speed 5976.23 samples/sec Loss 13.9797 LearningRate 0.3599 Epoch: 1 Global Step: 18660 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:54:40,661-Speed 5970.44 samples/sec Loss 14.0575 LearningRate 0.3601 Epoch: 1 Global Step: 18670 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:54:47,527-Speed 5966.52 samples/sec Loss 14.0184 LearningRate 0.3603 Epoch: 1 Global Step: 18680 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:54:54,422-Speed 5941.54 samples/sec Loss 14.0497 LearningRate 0.3605 Epoch: 1 Global Step: 18690 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:55:01,290-Speed 5965.57 samples/sec Loss 13.9497 LearningRate 0.3607 Epoch: 1 Global Step: 18700 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:55:08,150-Speed 5972.58 samples/sec Loss 13.9016 LearningRate 0.3609 Epoch: 1 Global Step: 18710 Fp16 Grad Scale: 262144 Required: 37 hours Training: 
2022-01-07 23:55:15,022-Speed 5964.93 samples/sec Loss 13.9410 LearningRate 0.3611 Epoch: 1 Global Step: 18720 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:55:21,883-Speed 5970.98 samples/sec Loss 13.9908 LearningRate 0.3613 Epoch: 1 Global Step: 18730 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:55:28,747-Speed 5968.33 samples/sec Loss 13.9626 LearningRate 0.3615 Epoch: 1 Global Step: 18740 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-07 23:55:35,603-Speed 5975.66 samples/sec Loss 13.9755 LearningRate 0.3617 Epoch: 1 Global Step: 18750 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:55:42,472-Speed 5969.56 samples/sec Loss 13.9868 LearningRate 0.3618 Epoch: 1 Global Step: 18760 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:55:49,325-Speed 5977.91 samples/sec Loss 14.0110 LearningRate 0.3620 Epoch: 1 Global Step: 18770 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:55:56,186-Speed 5970.91 samples/sec Loss 14.0194 LearningRate 0.3622 Epoch: 1 Global Step: 18780 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:03,074-Speed 5947.93 samples/sec Loss 14.0090 LearningRate 0.3624 Epoch: 1 Global Step: 18790 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:09,933-Speed 5973.54 samples/sec Loss 14.0189 LearningRate 0.3626 Epoch: 1 Global Step: 18800 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:16,804-Speed 5963.02 samples/sec Loss 13.9372 LearningRate 0.3628 Epoch: 1 Global Step: 18810 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:23,666-Speed 5969.15 samples/sec Loss 13.9534 LearningRate 0.3630 Epoch: 1 Global Step: 18820 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:30,530-Speed 5969.20 samples/sec Loss 13.9706 LearningRate 0.3632 Epoch: 1 Global Step: 18830 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:37,379-Speed 
5980.67 samples/sec Loss 13.9627 LearningRate 0.3634 Epoch: 1 Global Step: 18840 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:44,247-Speed 5965.86 samples/sec Loss 13.9976 LearningRate 0.3636 Epoch: 1 Global Step: 18850 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:51,128-Speed 5953.71 samples/sec Loss 13.9944 LearningRate 0.3638 Epoch: 1 Global Step: 18860 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:56:58,019-Speed 5944.46 samples/sec Loss 13.9540 LearningRate 0.3640 Epoch: 1 Global Step: 18870 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:57:04,892-Speed 5961.31 samples/sec Loss 14.0197 LearningRate 0.3642 Epoch: 1 Global Step: 18880 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:57:11,775-Speed 5959.49 samples/sec Loss 13.9573 LearningRate 0.3644 Epoch: 1 Global Step: 18890 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:57:18,610-Speed 5993.64 samples/sec Loss 14.2067 LearningRate 0.3645 Epoch: 1 Global Step: 18900 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:57:25,472-Speed 5970.48 samples/sec Loss 14.0772 LearningRate 0.3647 Epoch: 1 Global Step: 18910 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:57:32,321-Speed 5981.54 samples/sec Loss 14.0042 LearningRate 0.3649 Epoch: 1 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:57:39,198-Speed 5956.67 samples/sec Loss 14.1247 LearningRate 0.3651 Epoch: 1 Global Step: 18930 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:57:46,051-Speed 5978.57 samples/sec Loss 14.0736 LearningRate 0.3653 Epoch: 1 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:57:52,915-Speed 5968.60 samples/sec Loss 14.1132 LearningRate 0.3655 Epoch: 1 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:57:59,767-Speed 5978.34 samples/sec Loss 
14.0008 LearningRate 0.3657 Epoch: 1 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:58:06,615-Speed 5982.68 samples/sec Loss 14.0409 LearningRate 0.3659 Epoch: 1 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:58:13,474-Speed 5972.92 samples/sec Loss 14.0374 LearningRate 0.3661 Epoch: 1 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:58:20,435-Speed 5884.83 samples/sec Loss 13.9483 LearningRate 0.3663 Epoch: 1 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:58:27,277-Speed 5987.54 samples/sec Loss 13.9557 LearningRate 0.3665 Epoch: 1 Global Step: 19000 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:58:34,166-Speed 5947.89 samples/sec Loss 13.9590 LearningRate 0.3667 Epoch: 1 Global Step: 19010 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:58:41,030-Speed 5969.18 samples/sec Loss 13.9651 LearningRate 0.3669 Epoch: 1 Global Step: 19020 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:58:47,890-Speed 5971.96 samples/sec Loss 14.0354 LearningRate 0.3671 Epoch: 1 Global Step: 19030 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:58:54,741-Speed 5978.96 samples/sec Loss 14.0360 LearningRate 0.3672 Epoch: 1 Global Step: 19040 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:59:01,598-Speed 5975.57 samples/sec Loss 14.0053 LearningRate 0.3674 Epoch: 1 Global Step: 19050 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:59:08,458-Speed 5971.97 samples/sec Loss 13.9949 LearningRate 0.3676 Epoch: 1 Global Step: 19060 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-07 23:59:15,321-Speed 5969.59 samples/sec Loss 13.9995 LearningRate 0.3678 Epoch: 1 Global Step: 19070 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:59:22,166-Speed 5984.75 samples/sec Loss 14.0297 LearningRate 0.3680 
Epoch: 1 Global Step: 19080 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:59:29,043-Speed 5957.49 samples/sec Loss 14.0583 LearningRate 0.3682 Epoch: 1 Global Step: 19090 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:59:35,896-Speed 5978.24 samples/sec Loss 14.0644 LearningRate 0.3684 Epoch: 1 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:59:42,760-Speed 5968.24 samples/sec Loss 14.0370 LearningRate 0.3686 Epoch: 1 Global Step: 19110 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:59:49,613-Speed 5978.80 samples/sec Loss 13.9884 LearningRate 0.3688 Epoch: 1 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-07 23:59:56,471-Speed 5973.34 samples/sec Loss 14.0149 LearningRate 0.3690 Epoch: 1 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:00:03,324-Speed 5977.84 samples/sec Loss 14.0034 LearningRate 0.3692 Epoch: 1 Global Step: 19140 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:00:10,192-Speed 5965.29 samples/sec Loss 14.0988 LearningRate 0.3694 Epoch: 1 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:00:17,042-Speed 5980.68 samples/sec Loss 14.0103 LearningRate 0.3696 Epoch: 1 Global Step: 19160 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:00:23,888-Speed 5983.81 samples/sec Loss 14.0522 LearningRate 0.3698 Epoch: 1 Global Step: 19170 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:00:30,745-Speed 5974.62 samples/sec Loss 14.0892 LearningRate 0.3699 Epoch: 1 Global Step: 19180 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:00:37,612-Speed 5965.80 samples/sec Loss 13.9976 LearningRate 0.3701 Epoch: 1 Global Step: 19190 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:00:44,477-Speed 5967.79 samples/sec Loss 14.0093 LearningRate 0.3703 Epoch: 1 Global Step: 19200 
Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:00:51,377-Speed 5937.21 samples/sec Loss 13.9825 LearningRate 0.3705 Epoch: 1 Global Step: 19210 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:00:58,252-Speed 5958.77 samples/sec Loss 14.0158 LearningRate 0.3707 Epoch: 1 Global Step: 19220 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:01:05,105-Speed 5978.00 samples/sec Loss 14.0896 LearningRate 0.3709 Epoch: 1 Global Step: 19230 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:01:11,979-Speed 5960.43 samples/sec Loss 13.9963 LearningRate 0.3711 Epoch: 1 Global Step: 19240 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:01:18,834-Speed 5976.00 samples/sec Loss 14.1589 LearningRate 0.3713 Epoch: 1 Global Step: 19250 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:01:25,692-Speed 5974.09 samples/sec Loss 14.0625 LearningRate 0.3715 Epoch: 1 Global Step: 19260 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:01:32,555-Speed 5969.35 samples/sec Loss 14.1026 LearningRate 0.3717 Epoch: 1 Global Step: 19270 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-08 00:01:39,420-Speed 5968.12 samples/sec Loss 14.0799 LearningRate 0.3719 Epoch: 1 Global Step: 19280 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:01:46,295-Speed 5958.56 samples/sec Loss 14.0030 LearningRate 0.3721 Epoch: 1 Global Step: 19290 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:01:53,185-Speed 5945.64 samples/sec Loss 14.1314 LearningRate 0.3723 Epoch: 1 Global Step: 19300 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:02:00,053-Speed 5966.15 samples/sec Loss 13.9960 LearningRate 0.3725 Epoch: 1 Global Step: 19310 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:02:06,908-Speed 5975.72 samples/sec Loss 14.1186 LearningRate 0.3726 Epoch: 1 Global Step: 19320 Fp16 Grad Scale: 262144 
Required: 37 hours Training: 2022-01-08 00:02:13,811-Speed 5935.46 samples/sec Loss 14.0985 LearningRate 0.3728 Epoch: 1 Global Step: 19330 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:02:20,670-Speed 5973.36 samples/sec Loss 14.1169 LearningRate 0.3730 Epoch: 1 Global Step: 19340 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:02:27,554-Speed 5950.65 samples/sec Loss 14.0917 LearningRate 0.3732 Epoch: 1 Global Step: 19350 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:02:34,428-Speed 5962.65 samples/sec Loss 14.0252 LearningRate 0.3734 Epoch: 1 Global Step: 19360 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:02:41,323-Speed 5942.14 samples/sec Loss 14.1333 LearningRate 0.3736 Epoch: 1 Global Step: 19370 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:02:48,176-Speed 5978.13 samples/sec Loss 14.0050 LearningRate 0.3738 Epoch: 1 Global Step: 19380 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-08 00:02:55,039-Speed 5969.13 samples/sec Loss 14.0591 LearningRate 0.3740 Epoch: 1 Global Step: 19390 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:03:01,910-Speed 5962.22 samples/sec Loss 14.1228 LearningRate 0.3742 Epoch: 1 Global Step: 19400 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:03:08,778-Speed 5964.55 samples/sec Loss 14.1607 LearningRate 0.3744 Epoch: 1 Global Step: 19410 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:03:15,654-Speed 5958.01 samples/sec Loss 14.1138 LearningRate 0.3746 Epoch: 1 Global Step: 19420 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:03:22,505-Speed 5979.83 samples/sec Loss 14.1975 LearningRate 0.3748 Epoch: 1 Global Step: 19430 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:03:29,367-Speed 5970.75 samples/sec Loss 14.1082 LearningRate 0.3750 Epoch: 1 Global Step: 19440 Fp16 Grad Scale: 262144 Required: 37 hours Training: 
2022-01-08 00:03:36,234-Speed 5965.57 samples/sec Loss 14.1248 LearningRate 0.3752 Epoch: 1 Global Step: 19450 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:03:43,107-Speed 5961.03 samples/sec Loss 14.0105 LearningRate 0.3753 Epoch: 1 Global Step: 19460 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:03:50,001-Speed 5943.11 samples/sec Loss 14.1139 LearningRate 0.3755 Epoch: 1 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:03:56,870-Speed 5963.87 samples/sec Loss 14.0842 LearningRate 0.3757 Epoch: 1 Global Step: 19480 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:04:03,729-Speed 5975.08 samples/sec Loss 14.1358 LearningRate 0.3759 Epoch: 1 Global Step: 19490 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:04:10,591-Speed 5970.62 samples/sec Loss 14.0407 LearningRate 0.3761 Epoch: 1 Global Step: 19500 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:04:17,454-Speed 5968.90 samples/sec Loss 14.0594 LearningRate 0.3763 Epoch: 1 Global Step: 19510 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:04:24,365-Speed 5927.78 samples/sec Loss 14.1957 LearningRate 0.3765 Epoch: 1 Global Step: 19520 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:04:31,308-Speed 5901.12 samples/sec Loss 14.1407 LearningRate 0.3767 Epoch: 1 Global Step: 19530 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:04:38,237-Speed 5912.09 samples/sec Loss 14.1224 LearningRate 0.3769 Epoch: 1 Global Step: 19540 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:04:45,177-Speed 5904.00 samples/sec Loss 14.2013 LearningRate 0.3771 Epoch: 1 Global Step: 19550 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:04:52,115-Speed 5905.74 samples/sec Loss 14.0192 LearningRate 0.3773 Epoch: 1 Global Step: 19560 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:04:59,052-Speed 
5905.49 samples/sec Loss 14.1646 LearningRate 0.3775 Epoch: 1 Global Step: 19570 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:05:05,990-Speed 5904.55 samples/sec Loss 14.1854 LearningRate 0.3777 Epoch: 1 Global Step: 19580 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:05:12,922-Speed 5910.49 samples/sec Loss 14.1399 LearningRate 0.3779 Epoch: 1 Global Step: 19590 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:05:19,865-Speed 5900.82 samples/sec Loss 14.0450 LearningRate 0.3780 Epoch: 1 Global Step: 19600 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:05:26,801-Speed 5906.41 samples/sec Loss 14.0800 LearningRate 0.3782 Epoch: 1 Global Step: 19610 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:05:33,668-Speed 5965.07 samples/sec Loss 14.1184 LearningRate 0.3784 Epoch: 1 Global Step: 19620 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:05:40,539-Speed 5963.12 samples/sec Loss 14.1055 LearningRate 0.3786 Epoch: 1 Global Step: 19630 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:05:47,473-Speed 5908.67 samples/sec Loss 14.1396 LearningRate 0.3788 Epoch: 1 Global Step: 19640 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:05:54,332-Speed 5972.65 samples/sec Loss 14.1499 LearningRate 0.3790 Epoch: 1 Global Step: 19650 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:06:01,186-Speed 5978.00 samples/sec Loss 14.2281 LearningRate 0.3792 Epoch: 1 Global Step: 19660 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:06:08,035-Speed 5981.44 samples/sec Loss 14.1477 LearningRate 0.3794 Epoch: 1 Global Step: 19670 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:06:14,883-Speed 5984.26 samples/sec Loss 14.1274 LearningRate 0.3796 Epoch: 1 Global Step: 19680 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:06:21,751-Speed 5964.96 samples/sec Loss 14.1036 
LearningRate 0.3798 Epoch: 1 Global Step: 19690 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:06:28,612-Speed 5970.91 samples/sec Loss 14.2154 LearningRate 0.3800 Epoch: 1 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:06:35,554-Speed 5901.30 samples/sec Loss 14.2355 LearningRate 0.3802 Epoch: 1 Global Step: 19710 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 00:06:42,398-Speed 5985.80 samples/sec Loss 14.1710 LearningRate 0.3804 Epoch: 1 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:06:49,247-Speed 5981.46 samples/sec Loss 14.1057 LearningRate 0.3806 Epoch: 1 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:06:56,108-Speed 5970.48 samples/sec Loss 14.1512 LearningRate 0.3808 Epoch: 1 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:07:02,986-Speed 5956.59 samples/sec Loss 14.0652 LearningRate 0.3809 Epoch: 1 Global Step: 19750 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:07:09,837-Speed 5979.65 samples/sec Loss 14.1996 LearningRate 0.3811 Epoch: 1 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:07:16,700-Speed 5969.55 samples/sec Loss 14.1248 LearningRate 0.3813 Epoch: 1 Global Step: 19770 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:07:23,563-Speed 5969.06 samples/sec Loss 14.2057 LearningRate 0.3815 Epoch: 1 Global Step: 19780 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:07:30,425-Speed 5970.34 samples/sec Loss 14.0777 LearningRate 0.3817 Epoch: 1 Global Step: 19790 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:07:37,285-Speed 5971.72 samples/sec Loss 14.1842 LearningRate 0.3819 Epoch: 1 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:07:44,143-Speed 5973.66 samples/sec Loss 14.2392 LearningRate 0.3821 Epoch: 1 
Global Step: 19810 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:07:51,027-Speed 5951.76 samples/sec Loss 14.1756 LearningRate 0.3823 Epoch: 1 Global Step: 19820 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:07:57,981-Speed 5890.61 samples/sec Loss 14.1716 LearningRate 0.3825 Epoch: 1 Global Step: 19830 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:04,843-Speed 5970.23 samples/sec Loss 14.1575 LearningRate 0.3827 Epoch: 1 Global Step: 19840 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:11,718-Speed 5959.20 samples/sec Loss 14.1660 LearningRate 0.3829 Epoch: 1 Global Step: 19850 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:18,576-Speed 5973.61 samples/sec Loss 14.2101 LearningRate 0.3831 Epoch: 1 Global Step: 19860 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:25,424-Speed 5983.13 samples/sec Loss 14.2317 LearningRate 0.3833 Epoch: 1 Global Step: 19870 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:32,279-Speed 5976.21 samples/sec Loss 14.1799 LearningRate 0.3835 Epoch: 1 Global Step: 19880 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:39,142-Speed 5968.74 samples/sec Loss 14.1250 LearningRate 0.3836 Epoch: 1 Global Step: 19890 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:46,010-Speed 5964.68 samples/sec Loss 14.1447 LearningRate 0.3838 Epoch: 1 Global Step: 19900 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:52,868-Speed 5973.96 samples/sec Loss 14.1496 LearningRate 0.3840 Epoch: 1 Global Step: 19910 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:08:59,715-Speed 5982.94 samples/sec Loss 14.1836 LearningRate 0.3842 Epoch: 1 Global Step: 19920 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-08 00:09:06,567-Speed 5978.62 samples/sec Loss 14.1244 LearningRate 0.3844 Epoch: 1 Global Step: 19930 Fp16 Grad 
Scale: 262144 Required: 37 hours
Training: 2022-01-08 00:09:13,495-Speed 5913.72 samples/sec Loss 14.2306 LearningRate 0.3846 Epoch: 1 Global Step: 19940 Fp16 Grad Scale: 262144 Required: 37 hours
Training: 2022-01-08 00:09:20,396-Speed 5936.58 samples/sec Loss 14.1884 LearningRate 0.3848 Epoch: 1 Global Step: 19950 Fp16 Grad Scale: 262144 Required: 37 hours
Training: 2022-01-08 00:09:27,252-Speed 5975.62 samples/sec Loss 14.2145 LearningRate 0.3850 Epoch: 1 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-01-08 00:09:34,108-Speed 5975.69 samples/sec Loss 14.2657 LearningRate 0.3852 Epoch: 1 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-01-08 00:09:40,959-Speed 5979.21 samples/sec Loss 14.1910 LearningRate 0.3854 Epoch: 1 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-01-08 00:09:47,838-Speed 5957.88 samples/sec Loss 14.2744 LearningRate 0.3856 Epoch: 1 Global Step: 19990 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-01-08 00:09:54,699-Speed 5971.21 samples/sec Loss 14.2818 LearningRate 0.3858 Epoch: 1 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-01-08 00:10:21,473-[lfw][20000]XNorm: 23.188454
Training: 2022-01-08 00:10:21,474-[lfw][20000]Accuracy-Flip: 0.99617+-0.00308
Training: 2022-01-08 00:10:21,475-[lfw][20000]Accuracy-Highest: 0.99617
Training: 2022-01-08 00:10:52,567-[cfp_fp][20000]XNorm: 20.132425
Training: 2022-01-08 00:10:52,568-[cfp_fp][20000]Accuracy-Flip: 0.95814+-0.00899
Training: 2022-01-08 00:10:52,569-[cfp_fp][20000]Accuracy-Highest: 0.96643
Training: 2022-01-08 00:11:19,198-[agedb_30][20000]XNorm: 22.495338
Training: 2022-01-08 00:11:19,199-[agedb_30][20000]Accuracy-Flip: 0.94933+-0.01106
Training: 2022-01-08 00:11:19,200-[agedb_30][20000]Accuracy-Highest: 0.94933
Training: 2022-01-08 00:11:26,054-Speed 448.37 samples/sec Loss 14.1682 LearningRate 0.3860 Epoch: 1 Global Step: 20010 Fp16 Grad Scale: 131072 Required:
37 hours Training: 2022-01-08 00:11:32,891-Speed 5992.84 samples/sec Loss 14.3106 LearningRate 0.3862 Epoch: 1 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:11:39,759-Speed 5964.20 samples/sec Loss 14.2503 LearningRate 0.3863 Epoch: 1 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:11:46,617-Speed 5974.12 samples/sec Loss 14.1657 LearningRate 0.3865 Epoch: 1 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:11:53,473-Speed 5978.13 samples/sec Loss 14.1472 LearningRate 0.3867 Epoch: 1 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:12:00,345-Speed 5961.51 samples/sec Loss 14.1893 LearningRate 0.3869 Epoch: 1 Global Step: 20060 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:12:07,208-Speed 5969.39 samples/sec Loss 14.2230 LearningRate 0.3871 Epoch: 1 Global Step: 20070 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:12:14,077-Speed 5963.67 samples/sec Loss 14.2626 LearningRate 0.3873 Epoch: 1 Global Step: 20080 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:12:20,950-Speed 5960.19 samples/sec Loss 14.2551 LearningRate 0.3875 Epoch: 1 Global Step: 20090 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:12:27,845-Speed 5941.60 samples/sec Loss 14.2190 LearningRate 0.3877 Epoch: 1 Global Step: 20100 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:12:34,736-Speed 5946.01 samples/sec Loss 14.1969 LearningRate 0.3879 Epoch: 1 Global Step: 20110 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:12:41,602-Speed 5965.94 samples/sec Loss 14.1787 LearningRate 0.3881 Epoch: 1 Global Step: 20120 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:12:48,495-Speed 5943.37 samples/sec Loss 14.2623 LearningRate 0.3883 Epoch: 1 Global Step: 20130 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 
00:12:55,366-Speed 5962.69 samples/sec Loss 14.2177 LearningRate 0.3885 Epoch: 1 Global Step: 20140 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:13:02,256-Speed 5946.28 samples/sec Loss 14.2581 LearningRate 0.3887 Epoch: 1 Global Step: 20150 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:13:09,161-Speed 5932.48 samples/sec Loss 14.1890 LearningRate 0.3889 Epoch: 1 Global Step: 20160 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-08 00:13:16,017-Speed 5976.26 samples/sec Loss 14.1762 LearningRate 0.3890 Epoch: 1 Global Step: 20170 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:13:22,873-Speed 5974.61 samples/sec Loss 14.0890 LearningRate 0.3892 Epoch: 1 Global Step: 20180 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:13:29,750-Speed 5957.47 samples/sec Loss 14.1919 LearningRate 0.3894 Epoch: 1 Global Step: 20190 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:13:36,747-Speed 5855.61 samples/sec Loss 14.2904 LearningRate 0.3896 Epoch: 1 Global Step: 20200 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:13:43,725-Speed 5871.17 samples/sec Loss 14.2980 LearningRate 0.3898 Epoch: 1 Global Step: 20210 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:13:50,692-Speed 5880.38 samples/sec Loss 14.2522 LearningRate 0.3900 Epoch: 1 Global Step: 20220 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:13:57,539-Speed 5982.75 samples/sec Loss 14.2189 LearningRate 0.3902 Epoch: 1 Global Step: 20230 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:14:04,397-Speed 5974.00 samples/sec Loss 14.3442 LearningRate 0.3904 Epoch: 1 Global Step: 20240 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:14:11,279-Speed 5955.03 samples/sec Loss 14.2672 LearningRate 0.3906 Epoch: 1 Global Step: 20250 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:14:18,145-Speed 5966.22 
samples/sec Loss 14.3534 LearningRate 0.3908 Epoch: 1 Global Step: 20260 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:14:25,005-Speed 5972.35 samples/sec Loss 14.1867 LearningRate 0.3910 Epoch: 1 Global Step: 20270 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-08 00:14:31,853-Speed 5982.07 samples/sec Loss 14.2767 LearningRate 0.3912 Epoch: 1 Global Step: 20280 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:14:38,725-Speed 5962.58 samples/sec Loss 14.2535 LearningRate 0.3914 Epoch: 1 Global Step: 20290 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:14:45,577-Speed 5978.72 samples/sec Loss 14.1849 LearningRate 0.3916 Epoch: 1 Global Step: 20300 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:14:52,435-Speed 5974.43 samples/sec Loss 14.2256 LearningRate 0.3917 Epoch: 1 Global Step: 20310 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:14:59,283-Speed 5981.65 samples/sec Loss 14.3316 LearningRate 0.3919 Epoch: 1 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:15:06,134-Speed 5980.23 samples/sec Loss 14.2770 LearningRate 0.3921 Epoch: 1 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:15:13,062-Speed 5913.08 samples/sec Loss 14.3326 LearningRate 0.3923 Epoch: 1 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:15:19,943-Speed 5953.68 samples/sec Loss 14.2301 LearningRate 0.3925 Epoch: 1 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:15:26,805-Speed 5970.26 samples/sec Loss 14.2283 LearningRate 0.3927 Epoch: 1 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:15:33,668-Speed 5969.55 samples/sec Loss 14.3166 LearningRate 0.3929 Epoch: 1 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:15:40,528-Speed 5972.44 samples/sec Loss 14.1838 
LearningRate 0.3931 Epoch: 1 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:15:47,382-Speed 5976.51 samples/sec Loss 14.2109 LearningRate 0.3933 Epoch: 1 Global Step: 20390 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:15:54,259-Speed 5957.40 samples/sec Loss 14.2276 LearningRate 0.3935 Epoch: 1 Global Step: 20400 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:16:01,112-Speed 5977.70 samples/sec Loss 14.3041 LearningRate 0.3937 Epoch: 1 Global Step: 20410 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:16:07,963-Speed 5980.51 samples/sec Loss 14.2910 LearningRate 0.3939 Epoch: 1 Global Step: 20420 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:16:14,825-Speed 5969.23 samples/sec Loss 14.2784 LearningRate 0.3941 Epoch: 1 Global Step: 20430 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:16:21,686-Speed 5971.34 samples/sec Loss 14.2588 LearningRate 0.3943 Epoch: 1 Global Step: 20440 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:16:28,544-Speed 5972.99 samples/sec Loss 14.3891 LearningRate 0.3944 Epoch: 1 Global Step: 20450 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:16:35,408-Speed 5968.79 samples/sec Loss 14.2704 LearningRate 0.3946 Epoch: 1 Global Step: 20460 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:16:42,254-Speed 5983.98 samples/sec Loss 14.2026 LearningRate 0.3948 Epoch: 1 Global Step: 20470 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:16:49,116-Speed 5970.21 samples/sec Loss 14.3087 LearningRate 0.3950 Epoch: 1 Global Step: 20480 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:16:55,978-Speed 5970.01 samples/sec Loss 14.2646 LearningRate 0.3952 Epoch: 1 Global Step: 20490 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:17:02,830-Speed 5978.81 samples/sec Loss 14.2050 LearningRate 0.3954 Epoch: 1 
Global Step: 20500 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:17:09,697-Speed 5966.27 samples/sec Loss 14.1739 LearningRate 0.3956 Epoch: 1 Global Step: 20510 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:17:16,566-Speed 5963.80 samples/sec Loss 14.2827 LearningRate 0.3958 Epoch: 1 Global Step: 20520 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-08 00:17:23,412-Speed 5984.52 samples/sec Loss 14.3574 LearningRate 0.3960 Epoch: 1 Global Step: 20530 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:17:30,284-Speed 5961.48 samples/sec Loss 14.2339 LearningRate 0.3962 Epoch: 1 Global Step: 20540 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:17:37,150-Speed 5966.63 samples/sec Loss 14.3780 LearningRate 0.3964 Epoch: 1 Global Step: 20550 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:17:44,007-Speed 5974.78 samples/sec Loss 14.3412 LearningRate 0.3966 Epoch: 1 Global Step: 20560 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:17:50,859-Speed 5979.34 samples/sec Loss 14.3192 LearningRate 0.3968 Epoch: 1 Global Step: 20570 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:17:57,713-Speed 5976.12 samples/sec Loss 14.3630 LearningRate 0.3970 Epoch: 1 Global Step: 20580 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:18:04,580-Speed 5966.49 samples/sec Loss 14.3197 LearningRate 0.3971 Epoch: 1 Global Step: 20590 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:18:11,444-Speed 5968.89 samples/sec Loss 14.2878 LearningRate 0.3973 Epoch: 1 Global Step: 20600 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:18:18,321-Speed 5956.96 samples/sec Loss 14.4368 LearningRate 0.3975 Epoch: 1 Global Step: 20610 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:18:25,185-Speed 5968.12 samples/sec Loss 14.3663 LearningRate 0.3977 Epoch: 1 Global Step: 20620 Fp16 Grad 
Scale: 262144 Required: 37 hours Training: 2022-01-08 00:18:32,043-Speed 5974.63 samples/sec Loss 14.3877 LearningRate 0.3979 Epoch: 1 Global Step: 20630 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-08 00:18:38,892-Speed 5981.42 samples/sec Loss 14.2651 LearningRate 0.3981 Epoch: 1 Global Step: 20640 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:18:45,739-Speed 5982.56 samples/sec Loss 14.4182 LearningRate 0.3983 Epoch: 1 Global Step: 20650 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:18:52,582-Speed 5987.36 samples/sec Loss 14.3083 LearningRate 0.3985 Epoch: 1 Global Step: 20660 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:18:59,432-Speed 5980.58 samples/sec Loss 14.3564 LearningRate 0.3987 Epoch: 1 Global Step: 20670 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:19:06,291-Speed 5972.25 samples/sec Loss 14.3171 LearningRate 0.3989 Epoch: 1 Global Step: 20680 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:19:13,139-Speed 5982.88 samples/sec Loss 14.3761 LearningRate 0.3991 Epoch: 1 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:19:19,986-Speed 5982.54 samples/sec Loss 14.3618 LearningRate 0.3993 Epoch: 1 Global Step: 20700 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:19:26,846-Speed 5971.59 samples/sec Loss 14.3783 LearningRate 0.3995 Epoch: 1 Global Step: 20710 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:19:33,693-Speed 5983.49 samples/sec Loss 14.3348 LearningRate 0.3997 Epoch: 1 Global Step: 20720 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:19:40,545-Speed 5978.61 samples/sec Loss 14.2900 LearningRate 0.3998 Epoch: 1 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:19:47,414-Speed 5965.87 samples/sec Loss 14.2773 LearningRate 0.4000 Epoch: 1 Global Step: 20740 Fp16 Grad Scale: 131072 Required: 37 
hours Training: 2022-01-08 00:20:10,652-Speed 1762.68 samples/sec Loss 14.3140 LearningRate 0.3999 Epoch: 2 Global Step: 20750 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:20:17,496-Speed 5987.93 samples/sec Loss 14.2361 LearningRate 0.3999 Epoch: 2 Global Step: 20760 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:20:26,146-Speed 4736.05 samples/sec Loss 14.3834 LearningRate 0.3999 Epoch: 2 Global Step: 20770 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:20:33,011-Speed 5967.52 samples/sec Loss 14.4183 LearningRate 0.3998 Epoch: 2 Global Step: 20780 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:20:39,856-Speed 5985.66 samples/sec Loss 14.3247 LearningRate 0.3998 Epoch: 2 Global Step: 20790 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:20:46,706-Speed 5991.00 samples/sec Loss 14.3725 LearningRate 0.3997 Epoch: 2 Global Step: 20800 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:20:53,551-Speed 5984.46 samples/sec Loss 14.3150 LearningRate 0.3997 Epoch: 2 Global Step: 20810 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:21:00,393-Speed 5988.46 samples/sec Loss 14.3427 LearningRate 0.3996 Epoch: 2 Global Step: 20820 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:21:07,245-Speed 5981.40 samples/sec Loss 14.2727 LearningRate 0.3996 Epoch: 2 Global Step: 20830 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:21:14,100-Speed 5976.79 samples/sec Loss 14.2951 LearningRate 0.3996 Epoch: 2 Global Step: 20840 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:21:20,947-Speed 5982.92 samples/sec Loss 14.1956 LearningRate 0.3995 Epoch: 2 Global Step: 20850 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:21:27,793-Speed 5984.77 samples/sec Loss 14.2624 LearningRate 0.3995 Epoch: 2 Global Step: 20860 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 
00:21:34,664-Speed 5962.61 samples/sec Loss 14.4431 LearningRate 0.3994 Epoch: 2 Global Step: 20870 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:21:41,514-Speed 5981.33 samples/sec Loss 14.3564 LearningRate 0.3994 Epoch: 2 Global Step: 20880 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:21:48,411-Speed 5939.53 samples/sec Loss 14.3146 LearningRate 0.3993 Epoch: 2 Global Step: 20890 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:21:55,280-Speed 5963.96 samples/sec Loss 14.4168 LearningRate 0.3993 Epoch: 2 Global Step: 20900 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:22:02,164-Speed 5950.74 samples/sec Loss 14.2678 LearningRate 0.3993 Epoch: 2 Global Step: 20910 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:22:09,023-Speed 5973.73 samples/sec Loss 14.2952 LearningRate 0.3992 Epoch: 2 Global Step: 20920 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:22:15,927-Speed 5934.25 samples/sec Loss 14.4262 LearningRate 0.3992 Epoch: 2 Global Step: 20930 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:22:22,832-Speed 5933.34 samples/sec Loss 14.2494 LearningRate 0.3991 Epoch: 2 Global Step: 20940 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:22:29,767-Speed 5908.15 samples/sec Loss 14.3205 LearningRate 0.3991 Epoch: 2 Global Step: 20950 Fp16 Grad Scale: 524288 Required: 37 hours Training: 2022-01-08 00:22:36,647-Speed 5953.98 samples/sec Loss 14.2857 LearningRate 0.3990 Epoch: 2 Global Step: 20960 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:22:43,514-Speed 5966.82 samples/sec Loss 14.4554 LearningRate 0.3990 Epoch: 2 Global Step: 20970 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:22:50,378-Speed 5968.14 samples/sec Loss 14.2867 LearningRate 0.3990 Epoch: 2 Global Step: 20980 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:22:57,255-Speed 5957.35 
samples/sec Loss 14.3021 LearningRate 0.3989 Epoch: 2 Global Step: 20990 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:23:04,116-Speed 5971.51 samples/sec Loss 14.3730 LearningRate 0.3989 Epoch: 2 Global Step: 21000 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:23:10,988-Speed 5961.26 samples/sec Loss 14.3949 LearningRate 0.3988 Epoch: 2 Global Step: 21010 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:23:17,836-Speed 5983.06 samples/sec Loss 14.2998 LearningRate 0.3988 Epoch: 2 Global Step: 21020 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:23:24,688-Speed 5978.75 samples/sec Loss 14.4026 LearningRate 0.3987 Epoch: 2 Global Step: 21030 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:23:31,536-Speed 5982.33 samples/sec Loss 14.3942 LearningRate 0.3987 Epoch: 2 Global Step: 21040 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:23:38,389-Speed 5978.24 samples/sec Loss 14.2842 LearningRate 0.3987 Epoch: 2 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 00:23:45,256-Speed 5971.72 samples/sec Loss 14.3547 LearningRate 0.3986 Epoch: 2 Global Step: 21060 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:23:52,130-Speed 5959.57 samples/sec Loss 14.3218 LearningRate 0.3986 Epoch: 2 Global Step: 21070 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:23:58,981-Speed 5980.33 samples/sec Loss 14.3023 LearningRate 0.3985 Epoch: 2 Global Step: 21080 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:24:05,850-Speed 5968.97 samples/sec Loss 14.3818 LearningRate 0.3985 Epoch: 2 Global Step: 21090 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:24:12,730-Speed 5956.19 samples/sec Loss 14.3352 LearningRate 0.3984 Epoch: 2 Global Step: 21100 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:24:19,592-Speed 5969.66 samples/sec Loss 14.4112 
LearningRate 0.3984 Epoch: 2 Global Step: 21110 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:24:26,487-Speed 5941.95 samples/sec Loss 14.2535 LearningRate 0.3984 Epoch: 2 Global Step: 21120 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 00:24:33,353-Speed 5966.57 samples/sec Loss 14.2664 LearningRate 0.3983 Epoch: 2 Global Step: 21130 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:24:40,252-Speed 5939.29 samples/sec Loss 14.2748 LearningRate 0.3983 Epoch: 2 Global Step: 21140 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:24:47,139-Speed 5950.32 samples/sec Loss 14.3054 LearningRate 0.3982 Epoch: 2 Global Step: 21150 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:24:54,009-Speed 5963.61 samples/sec Loss 14.3873 LearningRate 0.3982 Epoch: 2 Global Step: 21160 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:25:00,863-Speed 5976.78 samples/sec Loss 14.3431 LearningRate 0.3982 Epoch: 2 Global Step: 21170 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:25:07,729-Speed 5967.24 samples/sec Loss 14.3505 LearningRate 0.3981 Epoch: 2 Global Step: 21180 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:25:14,580-Speed 5979.58 samples/sec Loss 14.3024 LearningRate 0.3981 Epoch: 2 Global Step: 21190 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:25:21,444-Speed 5968.73 samples/sec Loss 14.4004 LearningRate 0.3980 Epoch: 2 Global Step: 21200 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:25:28,304-Speed 5971.68 samples/sec Loss 14.3849 LearningRate 0.3980 Epoch: 2 Global Step: 21210 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:25:35,171-Speed 5965.49 samples/sec Loss 14.3819 LearningRate 0.3979 Epoch: 2 Global Step: 21220 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:25:42,032-Speed 5972.78 samples/sec Loss 14.3496 LearningRate 0.3979 Epoch: 2 
Global Step: 21230 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:25:48,865-Speed 5995.87 samples/sec Loss 14.3283 LearningRate 0.3979 Epoch: 2 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:25:55,716-Speed 5979.50 samples/sec Loss 14.2704 LearningRate 0.3978 Epoch: 2 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:02,560-Speed 5985.05 samples/sec Loss 14.4301 LearningRate 0.3978 Epoch: 2 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:09,413-Speed 5978.33 samples/sec Loss 14.3630 LearningRate 0.3977 Epoch: 2 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:16,277-Speed 5968.61 samples/sec Loss 14.3071 LearningRate 0.3977 Epoch: 2 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:23,132-Speed 5975.76 samples/sec Loss 14.3212 LearningRate 0.3976 Epoch: 2 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:30,021-Speed 5946.70 samples/sec Loss 14.3807 LearningRate 0.3976 Epoch: 2 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:36,875-Speed 5976.84 samples/sec Loss 14.3744 LearningRate 0.3976 Epoch: 2 Global Step: 21310 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:43,744-Speed 5963.98 samples/sec Loss 14.3029 LearningRate 0.3975 Epoch: 2 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:50,605-Speed 5971.33 samples/sec Loss 14.3643 LearningRate 0.3975 Epoch: 2 Global Step: 21330 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:26:57,460-Speed 5976.37 samples/sec Loss 14.2226 LearningRate 0.3974 Epoch: 2 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:27:04,326-Speed 5966.81 samples/sec Loss 14.2794 LearningRate 0.3974 Epoch: 2 Global Step: 21350 Fp16 Grad Scale: 
131072 Required: 36 hours Training: 2022-01-08 00:27:11,246-Speed 5920.27 samples/sec Loss 14.3303 LearningRate 0.3973 Epoch: 2 Global Step: 21360 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:27:18,098-Speed 5978.98 samples/sec Loss 14.4116 LearningRate 0.3973 Epoch: 2 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:27:24,943-Speed 5984.05 samples/sec Loss 14.3438 LearningRate 0.3973 Epoch: 2 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:27:31,811-Speed 5965.02 samples/sec Loss 14.2876 LearningRate 0.3972 Epoch: 2 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:27:38,672-Speed 5971.18 samples/sec Loss 14.2442 LearningRate 0.3972 Epoch: 2 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:27:45,536-Speed 5968.69 samples/sec Loss 14.2198 LearningRate 0.3971 Epoch: 2 Global Step: 21410 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:27:52,400-Speed 5968.30 samples/sec Loss 14.2628 LearningRate 0.3971 Epoch: 2 Global Step: 21420 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:27:59,246-Speed 5984.08 samples/sec Loss 14.3163 LearningRate 0.3970 Epoch: 2 Global Step: 21430 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:28:06,093-Speed 5983.45 samples/sec Loss 14.3459 LearningRate 0.3970 Epoch: 2 Global Step: 21440 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:28:12,959-Speed 5966.29 samples/sec Loss 14.2767 LearningRate 0.3970 Epoch: 2 Global Step: 21450 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:28:19,823-Speed 5968.62 samples/sec Loss 14.3242 LearningRate 0.3969 Epoch: 2 Global Step: 21460 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:28:26,690-Speed 5965.50 samples/sec Loss 14.1972 LearningRate 0.3969 Epoch: 2 Global Step: 21470 Fp16 Grad Scale: 262144 Required: 36 hours 
Training: 2022-01-08 00:28:33,547-Speed 5974.81 samples/sec Loss 14.2503 LearningRate 0.3968 Epoch: 2 Global Step: 21480 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:28:40,423-Speed 5957.81 samples/sec Loss 14.2704 LearningRate 0.3968 Epoch: 2 Global Step: 21490 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:28:47,289-Speed 5966.51 samples/sec Loss 14.3235 LearningRate 0.3967 Epoch: 2 Global Step: 21500 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:28:54,162-Speed 5968.89 samples/sec Loss 14.3293 LearningRate 0.3967 Epoch: 2 Global Step: 21510 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:29:01,016-Speed 5977.20 samples/sec Loss 14.2658 LearningRate 0.3967 Epoch: 2 Global Step: 21520 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:29:07,881-Speed 5967.92 samples/sec Loss 14.2109 LearningRate 0.3966 Epoch: 2 Global Step: 21530 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:29:14,727-Speed 5983.01 samples/sec Loss 14.2416 LearningRate 0.3966 Epoch: 2 Global Step: 21540 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:29:21,588-Speed 5971.41 samples/sec Loss 14.2938 LearningRate 0.3965 Epoch: 2 Global Step: 21550 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:29:28,426-Speed 5991.26 samples/sec Loss 14.2347 LearningRate 0.3965 Epoch: 2 Global Step: 21560 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:29:35,285-Speed 5972.51 samples/sec Loss 14.2654 LearningRate 0.3964 Epoch: 2 Global Step: 21570 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:29:42,150-Speed 5967.95 samples/sec Loss 14.3196 LearningRate 0.3964 Epoch: 2 Global Step: 21580 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:29:49,010-Speed 5971.22 samples/sec Loss 14.3266 LearningRate 0.3964 Epoch: 2 Global Step: 21590 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 
00:29:55,862-Speed 5978.82 samples/sec Loss 14.3442 LearningRate 0.3963 Epoch: 2 Global Step: 21600 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:30:02,718-Speed 5975.64 samples/sec Loss 14.2674 LearningRate 0.3963 Epoch: 2 Global Step: 21610 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:30:09,567-Speed 5982.04 samples/sec Loss 14.2480 LearningRate 0.3962 Epoch: 2 Global Step: 21620 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:30:16,428-Speed 5970.69 samples/sec Loss 14.3095 LearningRate 0.3962 Epoch: 2 Global Step: 21630 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:30:23,285-Speed 5974.31 samples/sec Loss 14.3140 LearningRate 0.3961 Epoch: 2 Global Step: 21640 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:30:30,145-Speed 5971.52 samples/sec Loss 14.1711 LearningRate 0.3961 Epoch: 2 Global Step: 21650 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:30:37,052-Speed 5931.43 samples/sec Loss 14.3286 LearningRate 0.3961 Epoch: 2 Global Step: 21660 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:30:43,918-Speed 5966.98 samples/sec Loss 14.2176 LearningRate 0.3960 Epoch: 2 Global Step: 21670 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:30:50,766-Speed 5981.82 samples/sec Loss 14.2466 LearningRate 0.3960 Epoch: 2 Global Step: 21680 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:30:57,626-Speed 5971.99 samples/sec Loss 14.2014 LearningRate 0.3959 Epoch: 2 Global Step: 21690 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:04,505-Speed 5955.76 samples/sec Loss 14.3234 LearningRate 0.3959 Epoch: 2 Global Step: 21700 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:11,353-Speed 5982.04 samples/sec Loss 14.2770 LearningRate 0.3958 Epoch: 2 Global Step: 21710 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:18,214-Speed 5972.87 
samples/sec Loss 14.2389 LearningRate 0.3958 Epoch: 2 Global Step: 21720 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:25,063-Speed 5981.47 samples/sec Loss 14.2748 LearningRate 0.3958 Epoch: 2 Global Step: 21730 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:31,908-Speed 5985.56 samples/sec Loss 14.2566 LearningRate 0.3957 Epoch: 2 Global Step: 21740 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:38,763-Speed 5975.52 samples/sec Loss 14.3305 LearningRate 0.3957 Epoch: 2 Global Step: 21750 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:45,626-Speed 5969.30 samples/sec Loss 14.2937 LearningRate 0.3956 Epoch: 2 Global Step: 21760 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:52,478-Speed 5978.76 samples/sec Loss 14.1934 LearningRate 0.3956 Epoch: 2 Global Step: 21770 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:31:59,364-Speed 5952.25 samples/sec Loss 14.1999 LearningRate 0.3955 Epoch: 2 Global Step: 21780 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:32:06,247-Speed 5953.59 samples/sec Loss 14.2191 LearningRate 0.3955 Epoch: 2 Global Step: 21790 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:32:13,103-Speed 5975.67 samples/sec Loss 14.2084 LearningRate 0.3955 Epoch: 2 Global Step: 21800 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:32:19,966-Speed 5969.11 samples/sec Loss 14.2641 LearningRate 0.3954 Epoch: 2 Global Step: 21810 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:32:26,824-Speed 5975.78 samples/sec Loss 14.2374 LearningRate 0.3954 Epoch: 2 Global Step: 21820 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:32:33,685-Speed 5970.52 samples/sec Loss 14.1974 LearningRate 0.3953 Epoch: 2 Global Step: 21830 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:32:40,556-Speed 5962.30 samples/sec Loss 14.1758 
LearningRate 0.3953 Epoch: 2 Global Step: 21840 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:32:47,427-Speed 5962.76 samples/sec Loss 14.1688 LearningRate 0.3952 Epoch: 2 Global Step: 21850 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:32:54,279-Speed 5977.91 samples/sec Loss 14.3149 LearningRate 0.3952 Epoch: 2 Global Step: 21860 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:01,150-Speed 5965.24 samples/sec Loss 14.2255 LearningRate 0.3952 Epoch: 2 Global Step: 21870 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:08,020-Speed 5964.13 samples/sec Loss 14.2528 LearningRate 0.3951 Epoch: 2 Global Step: 21880 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:14,876-Speed 5975.31 samples/sec Loss 14.2058 LearningRate 0.3951 Epoch: 2 Global Step: 21890 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:21,729-Speed 5977.61 samples/sec Loss 14.1990 LearningRate 0.3950 Epoch: 2 Global Step: 21900 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:28,599-Speed 5963.43 samples/sec Loss 14.2609 LearningRate 0.3950 Epoch: 2 Global Step: 21910 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:35,449-Speed 5980.88 samples/sec Loss 14.2688 LearningRate 0.3949 Epoch: 2 Global Step: 21920 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:42,303-Speed 5977.01 samples/sec Loss 14.2504 LearningRate 0.3949 Epoch: 2 Global Step: 21930 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:49,164-Speed 5971.81 samples/sec Loss 14.2212 LearningRate 0.3949 Epoch: 2 Global Step: 21940 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:33:56,019-Speed 5976.33 samples/sec Loss 14.2014 LearningRate 0.3948 Epoch: 2 Global Step: 21950 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:34:05,057-Speed 4534.06 samples/sec Loss 14.2327 LearningRate 0.3948 Epoch: 2 
Global Step: 21960 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:34:11,906-Speed 5982.65 samples/sec Loss 14.2860 LearningRate 0.3947 Epoch: 2 Global Step: 21970 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:34:18,775-Speed 5963.55 samples/sec Loss 14.2784 LearningRate 0.3947 Epoch: 2 Global Step: 21980 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:34:25,639-Speed 5968.74 samples/sec Loss 14.2768 LearningRate 0.3947 Epoch: 2 Global Step: 21990 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:34:32,495-Speed 5975.60 samples/sec Loss 14.1829 LearningRate 0.3946 Epoch: 2 Global Step: 22000 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:34:39,454-Speed 5887.81 samples/sec Loss 14.2072 LearningRate 0.3946 Epoch: 2 Global Step: 22010 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:34:46,319-Speed 5967.57 samples/sec Loss 14.2233 LearningRate 0.3945 Epoch: 2 Global Step: 22020 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:34:53,186-Speed 5965.87 samples/sec Loss 14.2054 LearningRate 0.3945 Epoch: 2 Global Step: 22030 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:35:00,047-Speed 5970.93 samples/sec Loss 14.2182 LearningRate 0.3944 Epoch: 2 Global Step: 22040 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:35:06,918-Speed 5961.85 samples/sec Loss 14.2360 LearningRate 0.3944 Epoch: 2 Global Step: 22050 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:35:13,797-Speed 5956.03 samples/sec Loss 14.1742 LearningRate 0.3944 Epoch: 2 Global Step: 22060 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:35:20,686-Speed 5946.81 samples/sec Loss 14.2176 LearningRate 0.3943 Epoch: 2 Global Step: 22070 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:35:27,565-Speed 5955.43 samples/sec Loss 14.2145 LearningRate 0.3943 Epoch: 2 Global Step: 22080 Fp16 Grad 
Scale: 524288 Required: 36 hours Training: 2022-01-08 00:35:34,435-Speed 5963.13 samples/sec Loss 14.1871 LearningRate 0.3942 Epoch: 2 Global Step: 22090 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:35:41,291-Speed 5976.18 samples/sec Loss 14.3394 LearningRate 0.3942 Epoch: 2 Global Step: 22100 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:35:48,157-Speed 5966.42 samples/sec Loss 14.1843 LearningRate 0.3941 Epoch: 2 Global Step: 22110 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:35:55,023-Speed 5966.77 samples/sec Loss 14.3061 LearningRate 0.3941 Epoch: 2 Global Step: 22120 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:36:01,913-Speed 5946.19 samples/sec Loss 14.2714 LearningRate 0.3941 Epoch: 2 Global Step: 22130 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:36:08,791-Speed 5957.01 samples/sec Loss 14.1357 LearningRate 0.3940 Epoch: 2 Global Step: 22140 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:36:15,645-Speed 5977.86 samples/sec Loss 14.2081 LearningRate 0.3940 Epoch: 2 Global Step: 22150 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:36:22,510-Speed 5967.34 samples/sec Loss 14.2459 LearningRate 0.3939 Epoch: 2 Global Step: 22160 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:36:29,381-Speed 5962.71 samples/sec Loss 14.1652 LearningRate 0.3939 Epoch: 2 Global Step: 22170 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:36:36,236-Speed 5976.52 samples/sec Loss 14.2050 LearningRate 0.3938 Epoch: 2 Global Step: 22180 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:36:43,095-Speed 5973.04 samples/sec Loss 14.1626 LearningRate 0.3938 Epoch: 2 Global Step: 22190 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:36:49,943-Speed 5982.28 samples/sec Loss 14.1306 LearningRate 0.3938 Epoch: 2 Global Step: 22200 Fp16 Grad Scale: 262144 Required: 36 
hours Training: 2022-01-08 00:36:56,833-Speed 5948.17 samples/sec Loss 14.1156 LearningRate 0.3937 Epoch: 2 Global Step: 22210 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:03,702-Speed 5963.87 samples/sec Loss 14.1536 LearningRate 0.3937 Epoch: 2 Global Step: 22220 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:10,562-Speed 5972.30 samples/sec Loss 14.1750 LearningRate 0.3936 Epoch: 2 Global Step: 22230 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:17,416-Speed 5977.44 samples/sec Loss 14.2190 LearningRate 0.3936 Epoch: 2 Global Step: 22240 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:24,282-Speed 5967.03 samples/sec Loss 14.1557 LearningRate 0.3935 Epoch: 2 Global Step: 22250 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:31,140-Speed 5973.38 samples/sec Loss 14.1897 LearningRate 0.3935 Epoch: 2 Global Step: 22260 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:38,006-Speed 5967.43 samples/sec Loss 14.1485 LearningRate 0.3935 Epoch: 2 Global Step: 22270 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:44,894-Speed 5947.88 samples/sec Loss 14.1886 LearningRate 0.3934 Epoch: 2 Global Step: 22280 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:51,756-Speed 5970.44 samples/sec Loss 14.2661 LearningRate 0.3934 Epoch: 2 Global Step: 22290 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:37:58,605-Speed 5980.99 samples/sec Loss 14.1290 LearningRate 0.3933 Epoch: 2 Global Step: 22300 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:38:05,461-Speed 5975.31 samples/sec Loss 14.2003 LearningRate 0.3933 Epoch: 2 Global Step: 22310 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:38:12,331-Speed 5963.97 samples/sec Loss 14.1672 LearningRate 0.3932 Epoch: 2 Global Step: 22320 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 
00:38:19,228-Speed 5939.31 samples/sec Loss 14.1905 LearningRate 0.3932 Epoch: 2 Global Step: 22330 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:38:26,085-Speed 5975.24 samples/sec Loss 14.1574 LearningRate 0.3932 Epoch: 2 Global Step: 22340 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:38:32,938-Speed 5977.71 samples/sec Loss 14.1776 LearningRate 0.3931 Epoch: 2 Global Step: 22350 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:38:39,783-Speed 5984.75 samples/sec Loss 14.1735 LearningRate 0.3931 Epoch: 2 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:38:46,647-Speed 5969.27 samples/sec Loss 14.1370 LearningRate 0.3930 Epoch: 2 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:38:53,487-Speed 5989.01 samples/sec Loss 14.1593 LearningRate 0.3930 Epoch: 2 Global Step: 22380 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:00,336-Speed 5981.53 samples/sec Loss 14.1947 LearningRate 0.3930 Epoch: 2 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:07,193-Speed 5977.65 samples/sec Loss 14.1829 LearningRate 0.3929 Epoch: 2 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:14,044-Speed 5982.30 samples/sec Loss 14.1760 LearningRate 0.3929 Epoch: 2 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:20,927-Speed 5952.53 samples/sec Loss 14.1884 LearningRate 0.3928 Epoch: 2 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:27,783-Speed 5975.07 samples/sec Loss 14.1778 LearningRate 0.3928 Epoch: 2 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:34,652-Speed 5965.03 samples/sec Loss 14.1320 LearningRate 0.3927 Epoch: 2 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:41,522-Speed 5962.65 samples/sec 
Loss 14.1271 LearningRate 0.3927 Epoch: 2 Global Step: 22450 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:48,397-Speed 5959.45 samples/sec Loss 14.1441 LearningRate 0.3927 Epoch: 2 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:39:55,262-Speed 5968.19 samples/sec Loss 14.0453 LearningRate 0.3926 Epoch: 2 Global Step: 22470 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:02,126-Speed 5968.41 samples/sec Loss 14.0934 LearningRate 0.3926 Epoch: 2 Global Step: 22480 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:09,003-Speed 5957.15 samples/sec Loss 14.1610 LearningRate 0.3925 Epoch: 2 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:15,873-Speed 5964.11 samples/sec Loss 14.1890 LearningRate 0.3925 Epoch: 2 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:22,747-Speed 5959.60 samples/sec Loss 14.1203 LearningRate 0.3924 Epoch: 2 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:29,609-Speed 5969.52 samples/sec Loss 14.1778 LearningRate 0.3924 Epoch: 2 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:36,467-Speed 5973.85 samples/sec Loss 14.1334 LearningRate 0.3924 Epoch: 2 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:43,331-Speed 5968.31 samples/sec Loss 14.1505 LearningRate 0.3923 Epoch: 2 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:50,173-Speed 5987.25 samples/sec Loss 14.1639 LearningRate 0.3923 Epoch: 2 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:40:57,053-Speed 5954.83 samples/sec Loss 14.1391 LearningRate 0.3922 Epoch: 2 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:41:03,912-Speed 5973.32 samples/sec Loss 14.0848 LearningRate 0.3922 
Epoch: 2 Global Step: 22570 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:41:10,785-Speed 5960.57 samples/sec Loss 14.1164 LearningRate 0.3921 Epoch: 2 Global Step: 22580 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:41:17,632-Speed 5983.26 samples/sec Loss 14.0412 LearningRate 0.3921 Epoch: 2 Global Step: 22590 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:41:24,483-Speed 5980.46 samples/sec Loss 14.1738 LearningRate 0.3921 Epoch: 2 Global Step: 22600 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:41:31,356-Speed 5959.93 samples/sec Loss 14.0443 LearningRate 0.3920 Epoch: 2 Global Step: 22610 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:41:38,207-Speed 5980.28 samples/sec Loss 14.0466 LearningRate 0.3920 Epoch: 2 Global Step: 22620 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:41:45,071-Speed 5968.40 samples/sec Loss 14.0778 LearningRate 0.3919 Epoch: 2 Global Step: 22630 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:41:51,974-Speed 5933.83 samples/sec Loss 14.0713 LearningRate 0.3919 Epoch: 2 Global Step: 22640 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:41:58,839-Speed 5967.68 samples/sec Loss 14.1006 LearningRate 0.3918 Epoch: 2 Global Step: 22650 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:42:05,722-Speed 5952.72 samples/sec Loss 14.0768 LearningRate 0.3918 Epoch: 2 Global Step: 22660 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:42:12,573-Speed 5979.28 samples/sec Loss 14.1881 LearningRate 0.3918 Epoch: 2 Global Step: 22670 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:42:19,434-Speed 5972.90 samples/sec Loss 14.1648 LearningRate 0.3917 Epoch: 2 Global Step: 22680 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:42:26,302-Speed 5967.16 samples/sec Loss 14.0715 LearningRate 0.3917 Epoch: 2 Global Step: 22690 
Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:42:33,170-Speed 5964.91 samples/sec Loss 14.1922 LearningRate 0.3916 Epoch: 2 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:42:40,040-Speed 5963.46 samples/sec Loss 14.1711 LearningRate 0.3916 Epoch: 2 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:42:46,894-Speed 5977.34 samples/sec Loss 14.1372 LearningRate 0.3915 Epoch: 2 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:42:53,738-Speed 5985.10 samples/sec Loss 14.1419 LearningRate 0.3915 Epoch: 2 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:43:00,604-Speed 5968.40 samples/sec Loss 14.1381 LearningRate 0.3915 Epoch: 2 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:43:07,459-Speed 5977.34 samples/sec Loss 14.2863 LearningRate 0.3914 Epoch: 2 Global Step: 22750 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:43:14,319-Speed 5971.68 samples/sec Loss 14.1946 LearningRate 0.3914 Epoch: 2 Global Step: 22760 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:43:21,171-Speed 5978.51 samples/sec Loss 14.0913 LearningRate 0.3913 Epoch: 2 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:43:28,020-Speed 5981.96 samples/sec Loss 14.0679 LearningRate 0.3913 Epoch: 2 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:43:34,864-Speed 5985.94 samples/sec Loss 14.2658 LearningRate 0.3913 Epoch: 2 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:43:41,708-Speed 5985.52 samples/sec Loss 14.0832 LearningRate 0.3912 Epoch: 2 Global Step: 22800 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:43:48,575-Speed 5968.19 samples/sec Loss 14.0574 LearningRate 0.3912 Epoch: 2 Global Step: 22810 Fp16 Grad Scale: 262144 
Required: 36 hours Training: 2022-01-08 00:43:55,437-Speed 5970.58 samples/sec Loss 14.1184 LearningRate 0.3911 Epoch: 2 Global Step: 22820 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:44:02,303-Speed 5966.75 samples/sec Loss 14.0254 LearningRate 0.3911 Epoch: 2 Global Step: 22830 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:44:09,163-Speed 5972.81 samples/sec Loss 14.1197 LearningRate 0.3910 Epoch: 2 Global Step: 22840 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:44:16,031-Speed 5964.24 samples/sec Loss 14.0599 LearningRate 0.3910 Epoch: 2 Global Step: 22850 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:44:22,903-Speed 5962.40 samples/sec Loss 14.0116 LearningRate 0.3910 Epoch: 2 Global Step: 22860 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:44:29,770-Speed 5965.75 samples/sec Loss 14.0778 LearningRate 0.3909 Epoch: 2 Global Step: 22870 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:44:36,634-Speed 5970.74 samples/sec Loss 14.0903 LearningRate 0.3909 Epoch: 2 Global Step: 22880 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:44:43,486-Speed 5979.08 samples/sec Loss 14.0374 LearningRate 0.3908 Epoch: 2 Global Step: 22890 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:44:50,372-Speed 5949.84 samples/sec Loss 14.1685 LearningRate 0.3908 Epoch: 2 Global Step: 22900 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:44:57,226-Speed 5976.59 samples/sec Loss 14.0940 LearningRate 0.3907 Epoch: 2 Global Step: 22910 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:45:04,083-Speed 5975.40 samples/sec Loss 14.1220 LearningRate 0.3907 Epoch: 2 Global Step: 22920 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:45:10,947-Speed 5968.79 samples/sec Loss 14.0751 LearningRate 0.3907 Epoch: 2 Global Step: 22930 Fp16 Grad Scale: 262144 Required: 36 hours Training: 
2022-01-08 00:45:17,795-Speed 5982.94 samples/sec Loss 14.0276 LearningRate 0.3906 Epoch: 2 Global Step: 22940 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:45:24,661-Speed 5967.35 samples/sec Loss 14.0342 LearningRate 0.3906 Epoch: 2 Global Step: 22950 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:45:31,533-Speed 5960.90 samples/sec Loss 14.1023 LearningRate 0.3905 Epoch: 2 Global Step: 22960 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:45:38,379-Speed 5984.63 samples/sec Loss 14.0893 LearningRate 0.3905 Epoch: 2 Global Step: 22970 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:45:45,227-Speed 5981.96 samples/sec Loss 14.0414 LearningRate 0.3904 Epoch: 2 Global Step: 22980 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:45:52,078-Speed 5979.96 samples/sec Loss 14.1573 LearningRate 0.3904 Epoch: 2 Global Step: 22990 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:45:58,923-Speed 5984.20 samples/sec Loss 14.0091 LearningRate 0.3904 Epoch: 2 Global Step: 23000 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:46:05,781-Speed 5974.42 samples/sec Loss 13.9939 LearningRate 0.3903 Epoch: 2 Global Step: 23010 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:46:12,630-Speed 5980.52 samples/sec Loss 14.0397 LearningRate 0.3903 Epoch: 2 Global Step: 23020 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:46:19,494-Speed 5968.83 samples/sec Loss 14.0265 LearningRate 0.3902 Epoch: 2 Global Step: 23030 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:46:26,356-Speed 5969.77 samples/sec Loss 14.1000 LearningRate 0.3902 Epoch: 2 Global Step: 23040 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:46:33,236-Speed 5955.40 samples/sec Loss 14.0987 LearningRate 0.3902 Epoch: 2 Global Step: 23050 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:46:40,092-Speed 
5975.96 samples/sec Loss 14.0380 LearningRate 0.3901 Epoch: 2 Global Step: 23060 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:46:46,948-Speed 5975.25 samples/sec Loss 14.0538 LearningRate 0.3901 Epoch: 2 Global Step: 23070 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:46:53,803-Speed 5977.77 samples/sec Loss 13.9945 LearningRate 0.3900 Epoch: 2 Global Step: 23080 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:47:00,663-Speed 5971.34 samples/sec Loss 14.1052 LearningRate 0.3900 Epoch: 2 Global Step: 23090 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:47:07,532-Speed 5964.64 samples/sec Loss 14.1976 LearningRate 0.3899 Epoch: 2 Global Step: 23100 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:47:14,398-Speed 5967.12 samples/sec Loss 14.0025 LearningRate 0.3899 Epoch: 2 Global Step: 23110 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:47:21,245-Speed 5983.49 samples/sec Loss 14.0998 LearningRate 0.3899 Epoch: 2 Global Step: 23120 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:47:28,093-Speed 5982.65 samples/sec Loss 14.0715 LearningRate 0.3898 Epoch: 2 Global Step: 23130 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:47:34,999-Speed 5932.08 samples/sec Loss 14.1028 LearningRate 0.3898 Epoch: 2 Global Step: 23140 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:47:41,873-Speed 5960.01 samples/sec Loss 14.0421 LearningRate 0.3897 Epoch: 2 Global Step: 23150 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:47:48,795-Speed 5920.37 samples/sec Loss 14.0352 LearningRate 0.3897 Epoch: 2 Global Step: 23160 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:47:55,660-Speed 5967.35 samples/sec Loss 14.0011 LearningRate 0.3896 Epoch: 2 Global Step: 23170 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:48:02,562-Speed 5936.29 samples/sec Loss 
14.0740 LearningRate 0.3896 Epoch: 2 Global Step: 23180 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:48:09,406-Speed 5986.04 samples/sec Loss 14.0124 LearningRate 0.3896 Epoch: 2 Global Step: 23190 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:48:16,249-Speed 5986.75 samples/sec Loss 14.0447 LearningRate 0.3895 Epoch: 2 Global Step: 23200 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:48:23,104-Speed 5975.86 samples/sec Loss 14.1016 LearningRate 0.3895 Epoch: 2 Global Step: 23210 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:48:29,947-Speed 5986.96 samples/sec Loss 14.0516 LearningRate 0.3894 Epoch: 2 Global Step: 23220 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:48:36,807-Speed 5974.85 samples/sec Loss 14.0936 LearningRate 0.3894 Epoch: 2 Global Step: 23230 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:48:43,680-Speed 5960.21 samples/sec Loss 14.0500 LearningRate 0.3893 Epoch: 2 Global Step: 23240 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:48:50,549-Speed 5967.56 samples/sec Loss 14.1053 LearningRate 0.3893 Epoch: 2 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:48:57,409-Speed 5971.86 samples/sec Loss 14.0067 LearningRate 0.3893 Epoch: 2 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:49:04,267-Speed 5973.87 samples/sec Loss 14.0054 LearningRate 0.3892 Epoch: 2 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:49:11,144-Speed 5958.66 samples/sec Loss 14.1754 LearningRate 0.3892 Epoch: 2 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:49:18,004-Speed 5972.35 samples/sec Loss 14.0135 LearningRate 0.3891 Epoch: 2 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:49:24,841-Speed 5992.02 samples/sec Loss 14.1041 LearningRate 0.3891 
Epoch: 2 Global Step: 23300 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:49:31,689-Speed 5982.59 samples/sec Loss 14.0937 LearningRate 0.3891 Epoch: 2 Global Step: 23310 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:49:38,582-Speed 5944.07 samples/sec Loss 14.1093 LearningRate 0.3890 Epoch: 2 Global Step: 23320 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:49:45,542-Speed 5886.17 samples/sec Loss 14.0455 LearningRate 0.3890 Epoch: 2 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:49:52,491-Speed 5895.40 samples/sec Loss 13.9352 LearningRate 0.3889 Epoch: 2 Global Step: 23340 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:49:59,352-Speed 5971.51 samples/sec Loss 13.9859 LearningRate 0.3889 Epoch: 2 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:50:06,242-Speed 5945.85 samples/sec Loss 13.9780 LearningRate 0.3888 Epoch: 2 Global Step: 23360 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:50:13,097-Speed 5975.79 samples/sec Loss 14.0392 LearningRate 0.3888 Epoch: 2 Global Step: 23370 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:50:19,972-Speed 5959.10 samples/sec Loss 13.9398 LearningRate 0.3888 Epoch: 2 Global Step: 23380 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:50:26,846-Speed 5959.98 samples/sec Loss 14.0097 LearningRate 0.3887 Epoch: 2 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 00:50:33,701-Speed 5976.78 samples/sec Loss 13.9574 LearningRate 0.3887 Epoch: 2 Global Step: 23400 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:50:40,566-Speed 5967.42 samples/sec Loss 14.0373 LearningRate 0.3886 Epoch: 2 Global Step: 23410 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:50:47,411-Speed 5984.44 samples/sec Loss 14.0196 LearningRate 0.3886 Epoch: 2 Global Step: 23420 Fp16 Grad 
Scale: 131072 Required: 36 hours Training: 2022-01-08 00:50:54,257-Speed 5986.51 samples/sec Loss 14.0445 LearningRate 0.3885 Epoch: 2 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:51:01,152-Speed 5941.75 samples/sec Loss 14.0080 LearningRate 0.3885 Epoch: 2 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:51:08,043-Speed 5945.47 samples/sec Loss 14.0169 LearningRate 0.3885 Epoch: 2 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:51:14,890-Speed 5983.46 samples/sec Loss 14.0015 LearningRate 0.3884 Epoch: 2 Global Step: 23460 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:51:21,744-Speed 5977.18 samples/sec Loss 13.9358 LearningRate 0.3884 Epoch: 2 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:51:28,621-Speed 5956.98 samples/sec Loss 14.0190 LearningRate 0.3883 Epoch: 2 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:51:35,474-Speed 5978.39 samples/sec Loss 14.0386 LearningRate 0.3883 Epoch: 2 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:51:42,323-Speed 5981.70 samples/sec Loss 14.0802 LearningRate 0.3882 Epoch: 2 Global Step: 23500 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:51:49,173-Speed 5980.81 samples/sec Loss 14.1407 LearningRate 0.3882 Epoch: 2 Global Step: 23510 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:51:56,019-Speed 5983.97 samples/sec Loss 14.0265 LearningRate 0.3882 Epoch: 2 Global Step: 23520 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:52:02,860-Speed 5989.68 samples/sec Loss 14.0864 LearningRate 0.3881 Epoch: 2 Global Step: 23530 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:52:09,728-Speed 5965.25 samples/sec Loss 14.1064 LearningRate 0.3881 Epoch: 2 Global Step: 23540 Fp16 Grad Scale: 131072 Required: 36 
hours Training: 2022-01-08 00:52:16,585-Speed 5974.98 samples/sec Loss 14.0331 LearningRate 0.3880 Epoch: 2 Global Step: 23550 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:52:23,435-Speed 5981.19 samples/sec Loss 13.9939 LearningRate 0.3880 Epoch: 2 Global Step: 23560 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:52:30,290-Speed 5976.56 samples/sec Loss 14.0511 LearningRate 0.3880 Epoch: 2 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:52:37,178-Speed 5947.05 samples/sec Loss 13.9675 LearningRate 0.3879 Epoch: 2 Global Step: 23580 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:52:44,039-Speed 5970.94 samples/sec Loss 13.9894 LearningRate 0.3879 Epoch: 2 Global Step: 23590 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:52:50,886-Speed 5983.41 samples/sec Loss 13.9550 LearningRate 0.3878 Epoch: 2 Global Step: 23600 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:52:57,735-Speed 5981.26 samples/sec Loss 14.0503 LearningRate 0.3878 Epoch: 2 Global Step: 23610 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:53:04,592-Speed 5975.27 samples/sec Loss 13.8863 LearningRate 0.3877 Epoch: 2 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:53:11,461-Speed 5964.15 samples/sec Loss 13.9802 LearningRate 0.3877 Epoch: 2 Global Step: 23630 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:53:18,332-Speed 5961.87 samples/sec Loss 14.0260 LearningRate 0.3877 Epoch: 2 Global Step: 23640 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:53:25,187-Speed 5976.10 samples/sec Loss 14.0060 LearningRate 0.3876 Epoch: 2 Global Step: 23650 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:53:32,042-Speed 5976.11 samples/sec Loss 13.9623 LearningRate 0.3876 Epoch: 2 Global Step: 23660 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 
00:53:38,892-Speed 5981.02 samples/sec Loss 14.0539 LearningRate 0.3875 Epoch: 2 Global Step: 23670 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:53:45,740-Speed 5983.11 samples/sec Loss 14.0299 LearningRate 0.3875 Epoch: 2 Global Step: 23680 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:53:52,589-Speed 5981.34 samples/sec Loss 14.0155 LearningRate 0.3874 Epoch: 2 Global Step: 23690 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:53:59,434-Speed 5984.31 samples/sec Loss 14.0276 LearningRate 0.3874 Epoch: 2 Global Step: 23700 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:54:06,285-Speed 5979.81 samples/sec Loss 13.9930 LearningRate 0.3874 Epoch: 2 Global Step: 23710 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:54:13,153-Speed 5965.14 samples/sec Loss 13.9748 LearningRate 0.3873 Epoch: 2 Global Step: 23720 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:54:20,011-Speed 5973.56 samples/sec Loss 14.0608 LearningRate 0.3873 Epoch: 2 Global Step: 23730 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:54:26,875-Speed 5968.72 samples/sec Loss 13.9937 LearningRate 0.3872 Epoch: 2 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:54:33,734-Speed 5972.50 samples/sec Loss 14.0031 LearningRate 0.3872 Epoch: 2 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:54:40,591-Speed 5974.55 samples/sec Loss 13.9673 LearningRate 0.3872 Epoch: 2 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:54:47,475-Speed 5950.61 samples/sec Loss 13.9923 LearningRate 0.3871 Epoch: 2 Global Step: 23770 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:54:54,319-Speed 5985.89 samples/sec Loss 13.9806 LearningRate 0.3871 Epoch: 2 Global Step: 23780 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:55:01,165-Speed 5984.85 
samples/sec Loss 14.0062 LearningRate 0.3870 Epoch: 2 Global Step: 23790 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:55:08,018-Speed 5977.92 samples/sec Loss 13.9517 LearningRate 0.3870 Epoch: 2 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:55:14,924-Speed 5932.48 samples/sec Loss 14.0137 LearningRate 0.3869 Epoch: 2 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:55:21,790-Speed 5966.88 samples/sec Loss 13.9781 LearningRate 0.3869 Epoch: 2 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:55:28,651-Speed 5971.06 samples/sec Loss 14.0616 LearningRate 0.3869 Epoch: 2 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:55:35,509-Speed 5973.98 samples/sec Loss 13.9684 LearningRate 0.3868 Epoch: 2 Global Step: 23840 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:55:42,369-Speed 5972.28 samples/sec Loss 13.9625 LearningRate 0.3868 Epoch: 2 Global Step: 23850 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:55:49,225-Speed 5975.17 samples/sec Loss 13.9224 LearningRate 0.3867 Epoch: 2 Global Step: 23860 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:55:56,079-Speed 5979.45 samples/sec Loss 14.0215 LearningRate 0.3867 Epoch: 2 Global Step: 23870 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:56:02,943-Speed 5968.85 samples/sec Loss 14.0371 LearningRate 0.3866 Epoch: 2 Global Step: 23880 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:56:09,821-Speed 5956.71 samples/sec Loss 13.9542 LearningRate 0.3866 Epoch: 2 Global Step: 23890 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:56:16,692-Speed 5962.35 samples/sec Loss 13.9151 LearningRate 0.3866 Epoch: 2 Global Step: 23900 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:56:23,543-Speed 5980.28 samples/sec Loss 14.0051 
LearningRate 0.3865 Epoch: 2 Global Step: 23910 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:56:30,393-Speed 5980.49 samples/sec Loss 13.8735 LearningRate 0.3865 Epoch: 2 Global Step: 23920 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:56:37,279-Speed 5949.36 samples/sec Loss 13.9737 LearningRate 0.3864 Epoch: 2 Global Step: 23930 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:56:44,134-Speed 5980.99 samples/sec Loss 13.9383 LearningRate 0.3864 Epoch: 2 Global Step: 23940 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 00:56:50,983-Speed 5981.75 samples/sec Loss 13.8777 LearningRate 0.3864 Epoch: 2 Global Step: 23950 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:56:57,852-Speed 5963.66 samples/sec Loss 13.9657 LearningRate 0.3863 Epoch: 2 Global Step: 23960 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:04,716-Speed 5971.23 samples/sec Loss 13.8737 LearningRate 0.3863 Epoch: 2 Global Step: 23970 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:11,567-Speed 5979.74 samples/sec Loss 13.9538 LearningRate 0.3862 Epoch: 2 Global Step: 23980 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:18,442-Speed 5965.75 samples/sec Loss 14.0127 LearningRate 0.3862 Epoch: 2 Global Step: 23990 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:25,297-Speed 5976.43 samples/sec Loss 13.9917 LearningRate 0.3861 Epoch: 2 Global Step: 24000 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:32,148-Speed 5980.37 samples/sec Loss 13.8718 LearningRate 0.3861 Epoch: 2 Global Step: 24010 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:38,998-Speed 5980.38 samples/sec Loss 13.9299 LearningRate 0.3861 Epoch: 2 Global Step: 24020 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:45,883-Speed 5950.24 samples/sec Loss 13.9139 LearningRate 0.3860 Epoch: 2 
Global Step: 24030 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:52,758-Speed 5959.20 samples/sec Loss 13.9320 LearningRate 0.3860 Epoch: 2 Global Step: 24040 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:57:59,624-Speed 5966.47 samples/sec Loss 13.9672 LearningRate 0.3859 Epoch: 2 Global Step: 24050 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:58:06,476-Speed 5979.49 samples/sec Loss 14.0092 LearningRate 0.3859 Epoch: 2 Global Step: 24060 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:58:13,357-Speed 5952.70 samples/sec Loss 13.9779 LearningRate 0.3858 Epoch: 2 Global Step: 24070 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:58:20,245-Speed 5948.94 samples/sec Loss 13.9416 LearningRate 0.3858 Epoch: 2 Global Step: 24080 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:58:27,095-Speed 5981.23 samples/sec Loss 14.0203 LearningRate 0.3858 Epoch: 2 Global Step: 24090 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:58:33,962-Speed 5965.72 samples/sec Loss 13.9513 LearningRate 0.3857 Epoch: 2 Global Step: 24100 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:58:40,824-Speed 5970.45 samples/sec Loss 13.9184 LearningRate 0.3857 Epoch: 2 Global Step: 24110 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:58:47,690-Speed 5966.56 samples/sec Loss 13.9361 LearningRate 0.3856 Epoch: 2 Global Step: 24120 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:58:54,558-Speed 5965.25 samples/sec Loss 14.0023 LearningRate 0.3856 Epoch: 2 Global Step: 24130 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:59:01,407-Speed 5981.08 samples/sec Loss 13.9258 LearningRate 0.3856 Epoch: 2 Global Step: 24140 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:59:08,249-Speed 5990.47 samples/sec Loss 13.9046 LearningRate 0.3855 Epoch: 2 Global Step: 24150 Fp16 Grad 
Scale: 262144 Required: 36 hours Training: 2022-01-08 00:59:15,136-Speed 5955.84 samples/sec Loss 13.8546 LearningRate 0.3855 Epoch: 2 Global Step: 24160 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:59:22,022-Speed 5984.48 samples/sec Loss 13.9088 LearningRate 0.3854 Epoch: 2 Global Step: 24170 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:59:28,871-Speed 5980.90 samples/sec Loss 13.8380 LearningRate 0.3854 Epoch: 2 Global Step: 24180 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:59:35,719-Speed 5982.54 samples/sec Loss 13.8560 LearningRate 0.3853 Epoch: 2 Global Step: 24190 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 00:59:42,563-Speed 5985.03 samples/sec Loss 14.0121 LearningRate 0.3853 Epoch: 2 Global Step: 24200 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:59:49,427-Speed 5968.31 samples/sec Loss 14.0285 LearningRate 0.3853 Epoch: 2 Global Step: 24210 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 00:59:56,282-Speed 5977.04 samples/sec Loss 14.0161 LearningRate 0.3852 Epoch: 2 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:00:03,141-Speed 5973.09 samples/sec Loss 13.9133 LearningRate 0.3852 Epoch: 2 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:00:09,984-Speed 5986.40 samples/sec Loss 13.9593 LearningRate 0.3851 Epoch: 2 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:00:16,839-Speed 5976.65 samples/sec Loss 13.8857 LearningRate 0.3851 Epoch: 2 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:00:23,682-Speed 5986.44 samples/sec Loss 13.9293 LearningRate 0.3850 Epoch: 2 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:00:30,557-Speed 5958.84 samples/sec Loss 13.9162 LearningRate 0.3850 Epoch: 2 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 36 
hours Training: 2022-01-08 01:00:37,412-Speed 5976.90 samples/sec Loss 13.9691 LearningRate 0.3850 Epoch: 2 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:00:44,259-Speed 5982.60 samples/sec Loss 13.9539 LearningRate 0.3849 Epoch: 2 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:00:51,112-Speed 5978.27 samples/sec Loss 13.9704 LearningRate 0.3849 Epoch: 2 Global Step: 24300 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:00:57,969-Speed 5974.29 samples/sec Loss 13.9181 LearningRate 0.3848 Epoch: 2 Global Step: 24310 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:01:04,858-Speed 5946.87 samples/sec Loss 13.9254 LearningRate 0.3848 Epoch: 2 Global Step: 24320 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:01:11,711-Speed 5977.26 samples/sec Loss 13.9765 LearningRate 0.3848 Epoch: 2 Global Step: 24330 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:01:18,569-Speed 5973.75 samples/sec Loss 13.9073 LearningRate 0.3847 Epoch: 2 Global Step: 24340 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:01:25,425-Speed 5975.37 samples/sec Loss 13.8822 LearningRate 0.3847 Epoch: 2 Global Step: 24350 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:01:32,276-Speed 5979.72 samples/sec Loss 13.9290 LearningRate 0.3846 Epoch: 2 Global Step: 24360 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:01:39,222-Speed 5900.35 samples/sec Loss 13.9210 LearningRate 0.3846 Epoch: 2 Global Step: 24370 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:01:46,078-Speed 5975.75 samples/sec Loss 13.9001 LearningRate 0.3845 Epoch: 2 Global Step: 24380 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:01:52,931-Speed 5977.34 samples/sec Loss 13.9704 LearningRate 0.3845 Epoch: 2 Global Step: 24390 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 
01:01:59,772-Speed 5991.38 samples/sec Loss 13.9966 LearningRate 0.3845 Epoch: 2 Global Step: 24400 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:02:06,633-Speed 5970.79 samples/sec Loss 13.9597 LearningRate 0.3844 Epoch: 2 Global Step: 24410 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:02:13,506-Speed 5961.09 samples/sec Loss 13.8545 LearningRate 0.3844 Epoch: 2 Global Step: 24420 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:02:20,377-Speed 5962.87 samples/sec Loss 13.9128 LearningRate 0.3843 Epoch: 2 Global Step: 24430 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:02:27,232-Speed 5976.19 samples/sec Loss 13.8978 LearningRate 0.3843 Epoch: 2 Global Step: 24440 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:02:34,066-Speed 5995.48 samples/sec Loss 13.8966 LearningRate 0.3842 Epoch: 2 Global Step: 24450 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:02:40,915-Speed 5981.18 samples/sec Loss 13.9321 LearningRate 0.3842 Epoch: 2 Global Step: 24460 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:02:47,771-Speed 5976.19 samples/sec Loss 13.9218 LearningRate 0.3842 Epoch: 2 Global Step: 24470 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:02:54,630-Speed 5972.20 samples/sec Loss 13.9966 LearningRate 0.3841 Epoch: 2 Global Step: 24480 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:03:01,496-Speed 5967.14 samples/sec Loss 13.9001 LearningRate 0.3841 Epoch: 2 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:03:08,346-Speed 5980.66 samples/sec Loss 13.8621 LearningRate 0.3840 Epoch: 2 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:03:15,195-Speed 5980.96 samples/sec Loss 13.8579 LearningRate 0.3840 Epoch: 2 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:03:22,046-Speed 5980.42 samples/sec 
Loss 13.9412 LearningRate 0.3840 Epoch: 2 Global Step: 24520 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:03:28,901-Speed 5975.52 samples/sec Loss 13.8765 LearningRate 0.3839 Epoch: 2 Global Step: 24530 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:03:35,751-Speed 5981.52 samples/sec Loss 13.9083 LearningRate 0.3839 Epoch: 2 Global Step: 24540 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 01:03:42,605-Speed 5977.71 samples/sec Loss 13.8435 LearningRate 0.3838 Epoch: 2 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:03:49,459-Speed 5976.28 samples/sec Loss 13.8170 LearningRate 0.3838 Epoch: 2 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:03:56,314-Speed 5976.59 samples/sec Loss 13.8885 LearningRate 0.3837 Epoch: 2 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:04:03,163-Speed 5982.08 samples/sec Loss 13.8344 LearningRate 0.3837 Epoch: 2 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:04:10,015-Speed 5979.18 samples/sec Loss 13.8733 LearningRate 0.3837 Epoch: 2 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:04:16,896-Speed 5953.80 samples/sec Loss 13.8753 LearningRate 0.3836 Epoch: 2 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:04:23,759-Speed 5969.16 samples/sec Loss 13.9488 LearningRate 0.3836 Epoch: 2 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:04:30,629-Speed 5963.47 samples/sec Loss 13.8709 LearningRate 0.3835 Epoch: 2 Global Step: 24620 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:04:37,482-Speed 5978.37 samples/sec Loss 13.7750 LearningRate 0.3835 Epoch: 2 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:04:44,332-Speed 5980.86 samples/sec Loss 13.8968 LearningRate 0.3834 
Epoch: 2 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:04:51,178-Speed 5985.16 samples/sec Loss 13.8664 LearningRate 0.3834 Epoch: 2 Global Step: 24650 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:04:58,045-Speed 5965.69 samples/sec Loss 13.8703 LearningRate 0.3834 Epoch: 2 Global Step: 24660 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:04,922-Speed 5957.75 samples/sec Loss 13.8455 LearningRate 0.3833 Epoch: 2 Global Step: 24670 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:11,780-Speed 5974.20 samples/sec Loss 13.8968 LearningRate 0.3833 Epoch: 2 Global Step: 24680 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:18,636-Speed 5975.25 samples/sec Loss 13.8740 LearningRate 0.3832 Epoch: 2 Global Step: 24690 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:25,502-Speed 5967.41 samples/sec Loss 13.9227 LearningRate 0.3832 Epoch: 2 Global Step: 24700 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:32,369-Speed 5966.22 samples/sec Loss 13.8361 LearningRate 0.3832 Epoch: 2 Global Step: 24710 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:39,246-Speed 5957.20 samples/sec Loss 13.8741 LearningRate 0.3831 Epoch: 2 Global Step: 24720 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:46,107-Speed 5970.49 samples/sec Loss 13.7809 LearningRate 0.3831 Epoch: 2 Global Step: 24730 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:52,976-Speed 5964.33 samples/sec Loss 13.9455 LearningRate 0.3830 Epoch: 2 Global Step: 24740 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:05:59,841-Speed 5968.08 samples/sec Loss 13.8800 LearningRate 0.3830 Epoch: 2 Global Step: 24750 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 01:06:06,708-Speed 5966.21 samples/sec Loss 13.9415 LearningRate 0.3829 Epoch: 2 Global Step: 24760 
Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:06:13,564-Speed 5975.64 samples/sec Loss 13.8512 LearningRate 0.3829 Epoch: 2 Global Step: 24770 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:06:20,422-Speed 5973.17 samples/sec Loss 13.8832 LearningRate 0.3829 Epoch: 2 Global Step: 24780 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:06:27,334-Speed 5927.67 samples/sec Loss 13.9719 LearningRate 0.3828 Epoch: 2 Global Step: 24790 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:06:34,189-Speed 5975.90 samples/sec Loss 13.8622 LearningRate 0.3828 Epoch: 2 Global Step: 24800 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:06:41,056-Speed 5966.02 samples/sec Loss 13.7881 LearningRate 0.3827 Epoch: 2 Global Step: 24810 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:06:47,911-Speed 5976.49 samples/sec Loss 13.8267 LearningRate 0.3827 Epoch: 2 Global Step: 24820 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:06:54,753-Speed 5987.47 samples/sec Loss 13.7402 LearningRate 0.3827 Epoch: 2 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:07:01,709-Speed 5891.89 samples/sec Loss 13.9169 LearningRate 0.3826 Epoch: 2 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:07:08,577-Speed 5965.63 samples/sec Loss 13.8370 LearningRate 0.3826 Epoch: 2 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:07:15,714-Speed 5739.55 samples/sec Loss 13.8606 LearningRate 0.3825 Epoch: 2 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:07:22,557-Speed 5986.87 samples/sec Loss 13.7941 LearningRate 0.3825 Epoch: 2 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:07:29,410-Speed 5978.69 samples/sec Loss 13.9272 LearningRate 0.3824 Epoch: 2 Global Step: 24880 Fp16 Grad Scale: 131072 
Required: 36 hours Training: 2022-01-08 01:07:36,277-Speed 5965.91 samples/sec Loss 13.8284 LearningRate 0.3824 Epoch: 2 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:07:43,130-Speed 5977.91 samples/sec Loss 13.8517 LearningRate 0.3824 Epoch: 2 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:07:49,999-Speed 5963.63 samples/sec Loss 13.8566 LearningRate 0.3823 Epoch: 2 Global Step: 24910 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:07:56,850-Speed 5980.25 samples/sec Loss 13.7817 LearningRate 0.3823 Epoch: 2 Global Step: 24920 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:08:03,730-Speed 5953.60 samples/sec Loss 13.7324 LearningRate 0.3822 Epoch: 2 Global Step: 24930 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:08:10,590-Speed 5972.35 samples/sec Loss 13.8816 LearningRate 0.3822 Epoch: 2 Global Step: 24940 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:08:17,448-Speed 5973.37 samples/sec Loss 13.8050 LearningRate 0.3821 Epoch: 2 Global Step: 24950 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:08:24,323-Speed 5958.55 samples/sec Loss 13.8838 LearningRate 0.3821 Epoch: 2 Global Step: 24960 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:08:31,193-Speed 5963.42 samples/sec Loss 13.8795 LearningRate 0.3821 Epoch: 2 Global Step: 24970 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:08:38,058-Speed 5967.32 samples/sec Loss 13.7811 LearningRate 0.3820 Epoch: 2 Global Step: 24980 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:08:44,923-Speed 5970.40 samples/sec Loss 13.8022 LearningRate 0.3820 Epoch: 2 Global Step: 24990 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:08:51,809-Speed 5949.24 samples/sec Loss 13.7938 LearningRate 0.3819 Epoch: 2 Global Step: 25000 Fp16 Grad Scale: 262144 Required: 36 hours Training: 
2022-01-08 01:09:19,057-[lfw][25000]XNorm: 22.584027 Training: 2022-01-08 01:09:19,058-[lfw][25000]Accuracy-Flip: 0.99650+-0.00252 Training: 2022-01-08 01:09:19,058-[lfw][25000]Accuracy-Highest: 0.99650 Training: 2022-01-08 01:09:50,553-[cfp_fp][25000]XNorm: 19.887459 Training: 2022-01-08 01:09:50,554-[cfp_fp][25000]Accuracy-Flip: 0.96957+-0.00823 Training: 2022-01-08 01:09:50,555-[cfp_fp][25000]Accuracy-Highest: 0.96957 Training: 2022-01-08 01:10:17,800-[agedb_30][25000]XNorm: 21.802798 Training: 2022-01-08 01:10:17,801-[agedb_30][25000]Accuracy-Flip: 0.95400+-0.01070 Training: 2022-01-08 01:10:17,802-[agedb_30][25000]Accuracy-Highest: 0.95400 Training: 2022-01-08 01:10:24,676-Speed 441.07 samples/sec Loss 13.7654 LearningRate 0.3819 Epoch: 2 Global Step: 25010 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:10:31,513-Speed 5992.25 samples/sec Loss 13.7905 LearningRate 0.3819 Epoch: 2 Global Step: 25020 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:10:38,348-Speed 5994.22 samples/sec Loss 13.8612 LearningRate 0.3818 Epoch: 2 Global Step: 25030 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:10:45,224-Speed 5958.01 samples/sec Loss 13.9032 LearningRate 0.3818 Epoch: 2 Global Step: 25040 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:10:52,082-Speed 5976.48 samples/sec Loss 13.8590 LearningRate 0.3817 Epoch: 2 Global Step: 25050 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:10:58,932-Speed 5980.32 samples/sec Loss 13.8154 LearningRate 0.3817 Epoch: 2 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:11:05,799-Speed 5965.59 samples/sec Loss 13.9317 LearningRate 0.3816 Epoch: 2 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:11:12,679-Speed 5955.00 samples/sec Loss 13.8109 LearningRate 0.3816 Epoch: 2 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 
01:11:26,647-Speed 2932.76 samples/sec Loss 13.8697 LearningRate 0.3816 Epoch: 2 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:11:33,480-Speed 5995.28 samples/sec Loss 13.8030 LearningRate 0.3815 Epoch: 2 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:11:40,325-Speed 5985.17 samples/sec Loss 13.7888 LearningRate 0.3815 Epoch: 2 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:11:47,174-Speed 5981.50 samples/sec Loss 13.7858 LearningRate 0.3814 Epoch: 2 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:11:54,021-Speed 5983.62 samples/sec Loss 13.8480 LearningRate 0.3814 Epoch: 2 Global Step: 25130 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:12:00,892-Speed 5962.76 samples/sec Loss 13.8866 LearningRate 0.3814 Epoch: 2 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:12:07,754-Speed 5969.50 samples/sec Loss 13.7054 LearningRate 0.3813 Epoch: 2 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:12:14,603-Speed 5983.54 samples/sec Loss 13.8872 LearningRate 0.3813 Epoch: 2 Global Step: 25160 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:12:21,459-Speed 5976.18 samples/sec Loss 13.7366 LearningRate 0.3812 Epoch: 2 Global Step: 25170 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:12:28,315-Speed 5975.28 samples/sec Loss 13.7538 LearningRate 0.3812 Epoch: 2 Global Step: 25180 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:12:35,170-Speed 5976.50 samples/sec Loss 13.7342 LearningRate 0.3811 Epoch: 2 Global Step: 25190 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:12:42,021-Speed 5979.92 samples/sec Loss 13.8342 LearningRate 0.3811 Epoch: 2 Global Step: 25200 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:12:48,879-Speed 5975.05 
samples/sec Loss 13.7796 LearningRate 0.3811 Epoch: 2 Global Step: 25210 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:12:55,731-Speed 5978.64 samples/sec Loss 13.8360 LearningRate 0.3810 Epoch: 2 Global Step: 25220 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:02,579-Speed 5982.15 samples/sec Loss 13.8828 LearningRate 0.3810 Epoch: 2 Global Step: 25230 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:09,423-Speed 5986.42 samples/sec Loss 13.7699 LearningRate 0.3809 Epoch: 2 Global Step: 25240 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:16,274-Speed 5980.09 samples/sec Loss 13.7515 LearningRate 0.3809 Epoch: 2 Global Step: 25250 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:23,150-Speed 5958.22 samples/sec Loss 13.8051 LearningRate 0.3809 Epoch: 2 Global Step: 25260 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:30,001-Speed 5980.62 samples/sec Loss 13.8497 LearningRate 0.3808 Epoch: 2 Global Step: 25270 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:36,844-Speed 5986.95 samples/sec Loss 13.8222 LearningRate 0.3808 Epoch: 2 Global Step: 25280 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:43,689-Speed 5984.25 samples/sec Loss 13.8502 LearningRate 0.3807 Epoch: 2 Global Step: 25290 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:50,561-Speed 5961.99 samples/sec Loss 13.8057 LearningRate 0.3807 Epoch: 2 Global Step: 25300 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:13:57,412-Speed 5980.11 samples/sec Loss 13.7627 LearningRate 0.3806 Epoch: 2 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:04,266-Speed 5976.66 samples/sec Loss 13.8250 LearningRate 0.3806 Epoch: 2 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:11,117-Speed 5980.29 samples/sec Loss 13.7984 
LearningRate 0.3806 Epoch: 2 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:17,992-Speed 5958.33 samples/sec Loss 13.6955 LearningRate 0.3805 Epoch: 2 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:24,835-Speed 5988.68 samples/sec Loss 13.7030 LearningRate 0.3805 Epoch: 2 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:31,699-Speed 5968.53 samples/sec Loss 13.8835 LearningRate 0.3804 Epoch: 2 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:38,538-Speed 5990.11 samples/sec Loss 13.8349 LearningRate 0.3804 Epoch: 2 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:45,379-Speed 5988.08 samples/sec Loss 13.8715 LearningRate 0.3804 Epoch: 2 Global Step: 25380 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:52,235-Speed 5975.56 samples/sec Loss 13.7877 LearningRate 0.3803 Epoch: 2 Global Step: 25390 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:14:59,080-Speed 5985.32 samples/sec Loss 13.7411 LearningRate 0.3803 Epoch: 2 Global Step: 25400 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:15:05,923-Speed 5986.23 samples/sec Loss 13.8143 LearningRate 0.3802 Epoch: 2 Global Step: 25410 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:15:12,774-Speed 5980.07 samples/sec Loss 13.7165 LearningRate 0.3802 Epoch: 2 Global Step: 25420 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:15:19,627-Speed 5978.21 samples/sec Loss 13.8177 LearningRate 0.3801 Epoch: 2 Global Step: 25430 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:15:26,479-Speed 5978.62 samples/sec Loss 13.7081 LearningRate 0.3801 Epoch: 2 Global Step: 25440 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:15:33,327-Speed 5982.53 samples/sec Loss 13.8284 LearningRate 0.3801 Epoch: 2 
Global Step: 25450 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:15:40,186-Speed 5973.38 samples/sec Loss 13.7661 LearningRate 0.3800 Epoch: 2 Global Step: 25460 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:15:47,045-Speed 5972.81 samples/sec Loss 13.8034 LearningRate 0.3800 Epoch: 2 Global Step: 25470 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:15:53,896-Speed 5979.93 samples/sec Loss 13.7365 LearningRate 0.3799 Epoch: 2 Global Step: 25480 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:00,751-Speed 5976.13 samples/sec Loss 13.7591 LearningRate 0.3799 Epoch: 2 Global Step: 25490 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:07,607-Speed 5975.53 samples/sec Loss 13.7670 LearningRate 0.3798 Epoch: 2 Global Step: 25500 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:14,443-Speed 5992.56 samples/sec Loss 13.7296 LearningRate 0.3798 Epoch: 2 Global Step: 25510 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:21,297-Speed 5976.81 samples/sec Loss 13.7742 LearningRate 0.3798 Epoch: 2 Global Step: 25520 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:28,145-Speed 5982.03 samples/sec Loss 13.8177 LearningRate 0.3797 Epoch: 2 Global Step: 25530 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:34,995-Speed 5980.95 samples/sec Loss 13.7881 LearningRate 0.3797 Epoch: 2 Global Step: 25540 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:41,847-Speed 5978.88 samples/sec Loss 13.7583 LearningRate 0.3796 Epoch: 2 Global Step: 25550 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:48,725-Speed 5956.70 samples/sec Loss 13.6832 LearningRate 0.3796 Epoch: 2 Global Step: 25560 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:16:55,571-Speed 5984.01 samples/sec Loss 13.7041 LearningRate 0.3796 Epoch: 2 Global Step: 25570 Fp16 Grad 
Scale: 262144 Required: 36 hours Training: 2022-01-08 01:17:02,412-Speed 5988.06 samples/sec Loss 13.8197 LearningRate 0.3795 Epoch: 2 Global Step: 25580 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:17:09,258-Speed 5985.06 samples/sec Loss 13.7730 LearningRate 0.3795 Epoch: 2 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:17:16,116-Speed 5972.95 samples/sec Loss 13.8609 LearningRate 0.3794 Epoch: 2 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:17:22,965-Speed 5980.94 samples/sec Loss 13.7668 LearningRate 0.3794 Epoch: 2 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:17:29,812-Speed 5983.69 samples/sec Loss 13.8410 LearningRate 0.3793 Epoch: 2 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:17:36,656-Speed 5985.83 samples/sec Loss 13.7376 LearningRate 0.3793 Epoch: 2 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:17:43,556-Speed 5938.06 samples/sec Loss 13.7392 LearningRate 0.3793 Epoch: 2 Global Step: 25640 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:17:50,415-Speed 5975.32 samples/sec Loss 13.8135 LearningRate 0.3792 Epoch: 2 Global Step: 25650 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:17:57,285-Speed 5964.65 samples/sec Loss 13.8126 LearningRate 0.3792 Epoch: 2 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:18:04,158-Speed 5960.07 samples/sec Loss 13.7715 LearningRate 0.3791 Epoch: 2 Global Step: 25670 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:18:11,045-Speed 5948.65 samples/sec Loss 13.7694 LearningRate 0.3791 Epoch: 2 Global Step: 25680 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:18:17,903-Speed 5974.29 samples/sec Loss 13.7822 LearningRate 0.3791 Epoch: 2 Global Step: 25690 Fp16 Grad Scale: 262144 Required: 36 
hours Training: 2022-01-08 01:18:24,753-Speed 5980.95 samples/sec Loss 13.7723 LearningRate 0.3790 Epoch: 2 Global Step: 25700 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:18:31,608-Speed 5976.12 samples/sec Loss 13.7219 LearningRate 0.3790 Epoch: 2 Global Step: 25710 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:18:38,471-Speed 5970.13 samples/sec Loss 13.7251 LearningRate 0.3789 Epoch: 2 Global Step: 25720 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:18:45,325-Speed 5977.11 samples/sec Loss 13.7227 LearningRate 0.3789 Epoch: 2 Global Step: 25730 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:18:52,175-Speed 5980.07 samples/sec Loss 13.7346 LearningRate 0.3788 Epoch: 2 Global Step: 25740 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:18:59,065-Speed 5946.25 samples/sec Loss 13.7497 LearningRate 0.3788 Epoch: 2 Global Step: 25750 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:19:05,924-Speed 5973.64 samples/sec Loss 13.7231 LearningRate 0.3788 Epoch: 2 Global Step: 25760 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:19:12,799-Speed 5959.14 samples/sec Loss 13.8171 LearningRate 0.3787 Epoch: 2 Global Step: 25770 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:19:19,666-Speed 5965.53 samples/sec Loss 13.8529 LearningRate 0.3787 Epoch: 2 Global Step: 25780 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 01:19:26,562-Speed 5940.57 samples/sec Loss 13.6829 LearningRate 0.3786 Epoch: 2 Global Step: 25790 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 01:19:33,404-Speed 5988.48 samples/sec Loss 13.7335 LearningRate 0.3786 Epoch: 2 Global Step: 25800 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:19:40,290-Speed 5948.90 samples/sec Loss 13.6739 LearningRate 0.3786 Epoch: 2 Global Step: 25810 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 
01:19:47,149-Speed 5975.46 samples/sec Loss 13.6900 LearningRate 0.3785 Epoch: 2 Global Step: 25820 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:19:54,012-Speed 5968.85 samples/sec Loss 13.6428 LearningRate 0.3785 Epoch: 2 Global Step: 25830 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:20:00,880-Speed 5964.67 samples/sec Loss 13.7507 LearningRate 0.3784 Epoch: 2 Global Step: 25840 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:20:07,772-Speed 5944.87 samples/sec Loss 13.8341 LearningRate 0.3784 Epoch: 2 Global Step: 25850 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:20:14,638-Speed 5966.70 samples/sec Loss 13.7054 LearningRate 0.3783 Epoch: 2 Global Step: 25860 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:20:22,094-Speed 5494.77 samples/sec Loss 13.7126 LearningRate 0.3783 Epoch: 2 Global Step: 25870 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:20:28,950-Speed 5975.82 samples/sec Loss 13.7429 LearningRate 0.3783 Epoch: 2 Global Step: 25880 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:20:35,823-Speed 5960.46 samples/sec Loss 13.7527 LearningRate 0.3782 Epoch: 2 Global Step: 25890 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:20:42,716-Speed 5943.83 samples/sec Loss 13.6022 LearningRate 0.3782 Epoch: 2 Global Step: 25900 Fp16 Grad Scale: 524288 Required: 36 hours Training: 2022-01-08 01:20:49,571-Speed 5975.95 samples/sec Loss 13.7720 LearningRate 0.3781 Epoch: 2 Global Step: 25910 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:20:56,424-Speed 5978.63 samples/sec Loss 13.7792 LearningRate 0.3781 Epoch: 2 Global Step: 25920 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:21:03,278-Speed 5976.28 samples/sec Loss 13.6985 LearningRate 0.3781 Epoch: 2 Global Step: 25930 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:21:10,154-Speed 5957.51 
samples/sec Loss 13.7622 LearningRate 0.3780 Epoch: 2 Global Step: 25940 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:21:17,011-Speed 5975.01 samples/sec Loss 13.6394 LearningRate 0.3780 Epoch: 2 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:21:23,866-Speed 5976.44 samples/sec Loss 13.7463 LearningRate 0.3779 Epoch: 2 Global Step: 25960 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:21:30,737-Speed 5962.09 samples/sec Loss 13.6804 LearningRate 0.3779 Epoch: 2 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:21:37,620-Speed 5952.50 samples/sec Loss 13.6947 LearningRate 0.3778 Epoch: 2 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:21:44,480-Speed 5971.66 samples/sec Loss 13.7531 LearningRate 0.3778 Epoch: 2 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:21:51,331-Speed 5980.53 samples/sec Loss 13.6580 LearningRate 0.3778 Epoch: 2 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:21:58,188-Speed 5974.12 samples/sec Loss 13.7681 LearningRate 0.3777 Epoch: 2 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:22:05,037-Speed 5981.08 samples/sec Loss 13.5717 LearningRate 0.3777 Epoch: 2 Global Step: 26020 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:22:11,889-Speed 5978.92 samples/sec Loss 13.7537 LearningRate 0.3776 Epoch: 2 Global Step: 26030 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:22:18,750-Speed 5970.90 samples/sec Loss 13.6724 LearningRate 0.3776 Epoch: 2 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 01:22:25,651-Speed 5937.17 samples/sec Loss 13.7839 LearningRate 0.3776 Epoch: 2 Global Step: 26050 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:22:32,511-Speed 5972.09 samples/sec Loss 13.7636 
LearningRate 0.3775 Epoch: 2 Global Step: 26060 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:22:39,381-Speed 5965.15 samples/sec Loss 13.7235 LearningRate 0.3775 Epoch: 2 Global Step: 26070 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:22:46,237-Speed 5975.06 samples/sec Loss 13.6640 LearningRate 0.3774 Epoch: 2 Global Step: 26080 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:22:53,091-Speed 5978.16 samples/sec Loss 13.7696 LearningRate 0.3774 Epoch: 2 Global Step: 26090 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:22:59,973-Speed 5952.19 samples/sec Loss 13.7369 LearningRate 0.3773 Epoch: 2 Global Step: 26100 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:23:06,866-Speed 5943.79 samples/sec Loss 13.7038 LearningRate 0.3773 Epoch: 2 Global Step: 26110 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:23:13,720-Speed 5977.39 samples/sec Loss 13.7467 LearningRate 0.3773 Epoch: 2 Global Step: 26120 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:23:20,576-Speed 5974.89 samples/sec Loss 13.6879 LearningRate 0.3772 Epoch: 2 Global Step: 26130 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:23:27,472-Speed 5944.33 samples/sec Loss 13.6172 LearningRate 0.3772 Epoch: 2 Global Step: 26140 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:23:34,312-Speed 5989.65 samples/sec Loss 13.6925 LearningRate 0.3771 Epoch: 2 Global Step: 26150 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:23:41,185-Speed 5960.88 samples/sec Loss 13.7113 LearningRate 0.3771 Epoch: 2 Global Step: 26160 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:23:48,041-Speed 5975.93 samples/sec Loss 13.7046 LearningRate 0.3771 Epoch: 2 Global Step: 26170 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:23:54,896-Speed 5976.87 samples/sec Loss 13.7839 LearningRate 0.3770 Epoch: 2 
Global Step: 26180 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:24:01,768-Speed 5961.31 samples/sec Loss 13.6862 LearningRate 0.3770 Epoch: 2 Global Step: 26190 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:24:08,632-Speed 5968.79 samples/sec Loss 13.6857 LearningRate 0.3769 Epoch: 2 Global Step: 26200 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:24:15,553-Speed 5919.05 samples/sec Loss 13.6483 LearningRate 0.3769 Epoch: 2 Global Step: 26210 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 01:24:22,463-Speed 5930.09 samples/sec Loss 13.7143 LearningRate 0.3768 Epoch: 2 Global Step: 26220 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:24:29,380-Speed 5923.16 samples/sec Loss 13.7059 LearningRate 0.3768 Epoch: 2 Global Step: 26230 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:24:36,283-Speed 5934.18 samples/sec Loss 13.6342 LearningRate 0.3768 Epoch: 2 Global Step: 26240 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:24:43,168-Speed 5950.38 samples/sec Loss 13.7483 LearningRate 0.3767 Epoch: 2 Global Step: 26250 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:24:50,014-Speed 5984.41 samples/sec Loss 13.7296 LearningRate 0.3767 Epoch: 2 Global Step: 26260 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:24:56,862-Speed 5982.28 samples/sec Loss 13.7665 LearningRate 0.3766 Epoch: 2 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:25:03,707-Speed 5984.81 samples/sec Loss 13.7506 LearningRate 0.3766 Epoch: 2 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:25:10,571-Speed 5968.08 samples/sec Loss 13.7810 LearningRate 0.3766 Epoch: 2 Global Step: 26290 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:25:17,447-Speed 5958.12 samples/sec Loss 13.6429 LearningRate 0.3765 Epoch: 2 Global Step: 26300 Fp16 Grad 
Scale: 131072 Required: 35 hours Training: 2022-01-08 01:25:24,300-Speed 5979.18 samples/sec Loss 13.6361 LearningRate 0.3765 Epoch: 2 Global Step: 26310 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:25:33,187-Speed 5978.81 samples/sec Loss 13.6743 LearningRate 0.3764 Epoch: 2 Global Step: 26320 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:25:40,040-Speed 5978.05 samples/sec Loss 13.7593 LearningRate 0.3764 Epoch: 2 Global Step: 26330 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:25:46,908-Speed 5964.23 samples/sec Loss 13.7227 LearningRate 0.3763 Epoch: 2 Global Step: 26340 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:25:53,774-Speed 5967.58 samples/sec Loss 13.7956 LearningRate 0.3763 Epoch: 2 Global Step: 26350 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:26:00,647-Speed 5960.48 samples/sec Loss 13.6381 LearningRate 0.3763 Epoch: 2 Global Step: 26360 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:26:07,515-Speed 5965.31 samples/sec Loss 13.6688 LearningRate 0.3762 Epoch: 2 Global Step: 26370 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:26:14,365-Speed 5980.57 samples/sec Loss 13.7191 LearningRate 0.3762 Epoch: 2 Global Step: 26380 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:26:21,218-Speed 5977.88 samples/sec Loss 13.7467 LearningRate 0.3761 Epoch: 2 Global Step: 26390 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:26:28,104-Speed 5949.69 samples/sec Loss 13.8379 LearningRate 0.3761 Epoch: 2 Global Step: 26400 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:26:34,953-Speed 5981.45 samples/sec Loss 13.6658 LearningRate 0.3761 Epoch: 2 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:26:41,825-Speed 5961.48 samples/sec Loss 13.6465 LearningRate 0.3760 Epoch: 2 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 35 
hours Training: 2022-01-08 01:26:48,693-Speed 5965.68 samples/sec Loss 13.6978 LearningRate 0.3760 Epoch: 2 Global Step: 26430 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:26:55,542-Speed 5980.67 samples/sec Loss 13.7322 LearningRate 0.3759 Epoch: 2 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:27:02,394-Speed 5978.87 samples/sec Loss 13.6750 LearningRate 0.3759 Epoch: 2 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:27:09,255-Speed 5973.63 samples/sec Loss 13.7441 LearningRate 0.3758 Epoch: 2 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:27:16,117-Speed 5970.80 samples/sec Loss 13.7223 LearningRate 0.3758 Epoch: 2 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:27:22,988-Speed 5961.85 samples/sec Loss 13.6644 LearningRate 0.3758 Epoch: 2 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:27:29,844-Speed 5975.65 samples/sec Loss 13.5712 LearningRate 0.3757 Epoch: 2 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:27:36,701-Speed 5975.17 samples/sec Loss 13.6912 LearningRate 0.3757 Epoch: 2 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:27:43,563-Speed 5970.91 samples/sec Loss 13.6566 LearningRate 0.3756 Epoch: 2 Global Step: 26510 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:27:50,417-Speed 5976.70 samples/sec Loss 13.6912 LearningRate 0.3756 Epoch: 2 Global Step: 26520 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:27:57,301-Speed 5951.15 samples/sec Loss 13.6387 LearningRate 0.3756 Epoch: 2 Global Step: 26530 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:28:04,173-Speed 5961.36 samples/sec Loss 13.5748 LearningRate 0.3755 Epoch: 2 Global Step: 26540 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 
01:28:11,040-Speed 5966.28 samples/sec Loss 13.6015 LearningRate 0.3755 Epoch: 2 Global Step: 26550 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:28:17,890-Speed 5980.76 samples/sec Loss 13.5567 LearningRate 0.3754 Epoch: 2 Global Step: 26560 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:28:24,750-Speed 5971.89 samples/sec Loss 13.6832 LearningRate 0.3754 Epoch: 2 Global Step: 26570 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:28:31,612-Speed 5970.01 samples/sec Loss 13.6644 LearningRate 0.3754 Epoch: 2 Global Step: 26580 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:28:38,471-Speed 5973.62 samples/sec Loss 13.5560 LearningRate 0.3753 Epoch: 2 Global Step: 26590 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:28:45,338-Speed 5966.22 samples/sec Loss 13.7581 LearningRate 0.3753 Epoch: 2 Global Step: 26600 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:28:52,201-Speed 5968.49 samples/sec Loss 13.7183 LearningRate 0.3752 Epoch: 2 Global Step: 26610 Fp16 Grad Scale: 524288 Required: 35 hours Training: 2022-01-08 01:28:59,079-Speed 5958.26 samples/sec Loss 13.5831 LearningRate 0.3752 Epoch: 2 Global Step: 26620 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:29:05,942-Speed 5969.69 samples/sec Loss 13.7186 LearningRate 0.3751 Epoch: 2 Global Step: 26630 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:29:12,800-Speed 5973.56 samples/sec Loss 13.6775 LearningRate 0.3751 Epoch: 2 Global Step: 26640 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:29:19,654-Speed 5977.25 samples/sec Loss 13.6092 LearningRate 0.3751 Epoch: 2 Global Step: 26650 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:29:26,516-Speed 5970.31 samples/sec Loss 13.6159 LearningRate 0.3750 Epoch: 2 Global Step: 26660 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:29:33,375-Speed 5973.45 
samples/sec Loss 13.7130 LearningRate 0.3750 Epoch: 2 Global Step: 26670 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:29:40,253-Speed 5960.20 samples/sec Loss 13.6725 LearningRate 0.3749 Epoch: 2 Global Step: 26680 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:29:47,108-Speed 5976.07 samples/sec Loss 13.6422 LearningRate 0.3749 Epoch: 2 Global Step: 26690 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:29:53,964-Speed 5975.15 samples/sec Loss 13.5937 LearningRate 0.3749 Epoch: 2 Global Step: 26700 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:00,825-Speed 5971.18 samples/sec Loss 13.6038 LearningRate 0.3748 Epoch: 2 Global Step: 26710 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:07,705-Speed 5954.59 samples/sec Loss 13.6693 LearningRate 0.3748 Epoch: 2 Global Step: 26720 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:14,584-Speed 5956.51 samples/sec Loss 13.6837 LearningRate 0.3747 Epoch: 2 Global Step: 26730 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:21,552-Speed 5879.76 samples/sec Loss 13.5883 LearningRate 0.3747 Epoch: 2 Global Step: 26740 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:28,442-Speed 5946.26 samples/sec Loss 13.6603 LearningRate 0.3746 Epoch: 2 Global Step: 26750 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:35,310-Speed 5964.64 samples/sec Loss 13.6101 LearningRate 0.3746 Epoch: 2 Global Step: 26760 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:42,162-Speed 5979.19 samples/sec Loss 13.6753 LearningRate 0.3746 Epoch: 2 Global Step: 26770 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:49,033-Speed 5962.79 samples/sec Loss 13.6268 LearningRate 0.3745 Epoch: 2 Global Step: 26780 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:30:55,920-Speed 5948.76 samples/sec Loss 13.6313 
LearningRate 0.3745 Epoch: 2 Global Step: 26790 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:02,778-Speed 5973.43 samples/sec Loss 13.6017 LearningRate 0.3744 Epoch: 2 Global Step: 26800 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:09,632-Speed 5977.23 samples/sec Loss 13.6329 LearningRate 0.3744 Epoch: 2 Global Step: 26810 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:16,486-Speed 5977.81 samples/sec Loss 13.6434 LearningRate 0.3744 Epoch: 2 Global Step: 26820 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:23,338-Speed 5979.68 samples/sec Loss 13.5879 LearningRate 0.3743 Epoch: 2 Global Step: 26830 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:30,199-Speed 5970.64 samples/sec Loss 13.6624 LearningRate 0.3743 Epoch: 2 Global Step: 26840 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:37,065-Speed 5968.58 samples/sec Loss 13.6062 LearningRate 0.3742 Epoch: 2 Global Step: 26850 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:43,919-Speed 5977.09 samples/sec Loss 13.6581 LearningRate 0.3742 Epoch: 2 Global Step: 26860 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:50,784-Speed 5968.21 samples/sec Loss 13.6614 LearningRate 0.3741 Epoch: 2 Global Step: 26870 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:31:57,659-Speed 5958.51 samples/sec Loss 13.6304 LearningRate 0.3741 Epoch: 2 Global Step: 26880 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:32:04,542-Speed 5951.06 samples/sec Loss 13.6447 LearningRate 0.3741 Epoch: 2 Global Step: 26890 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:32:11,401-Speed 5972.51 samples/sec Loss 13.6507 LearningRate 0.3740 Epoch: 2 Global Step: 26900 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:32:18,265-Speed 5970.33 samples/sec Loss 13.5275 LearningRate 0.3740 Epoch: 2 
Global Step: 26910 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:32:25,124-Speed 5972.68 samples/sec Loss 13.5761 LearningRate 0.3739 Epoch: 2 Global Step: 26920 Fp16 Grad Scale: 524288 Required: 35 hours Training: 2022-01-08 01:32:31,992-Speed 5964.72 samples/sec Loss 13.5697 LearningRate 0.3739 Epoch: 2 Global Step: 26930 Fp16 Grad Scale: 524288 Required: 35 hours Training: 2022-01-08 01:32:38,844-Speed 5979.18 samples/sec Loss 13.6483 LearningRate 0.3739 Epoch: 2 Global Step: 26940 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:32:45,695-Speed 5979.02 samples/sec Loss 13.4729 LearningRate 0.3738 Epoch: 2 Global Step: 26950 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:32:52,546-Speed 5980.06 samples/sec Loss 13.5383 LearningRate 0.3738 Epoch: 2 Global Step: 26960 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:32:59,400-Speed 5976.75 samples/sec Loss 13.6330 LearningRate 0.3737 Epoch: 2 Global Step: 26970 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:33:06,255-Speed 5977.19 samples/sec Loss 13.6260 LearningRate 0.3737 Epoch: 2 Global Step: 26980 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:33:13,084-Speed 5998.28 samples/sec Loss 13.5772 LearningRate 0.3737 Epoch: 2 Global Step: 26990 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:33:19,935-Speed 5980.41 samples/sec Loss 13.6540 LearningRate 0.3736 Epoch: 2 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:33:26,782-Speed 5982.67 samples/sec Loss 13.6177 LearningRate 0.3736 Epoch: 2 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:33:33,656-Speed 5960.48 samples/sec Loss 13.6007 LearningRate 0.3735 Epoch: 2 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:33:40,511-Speed 5975.91 samples/sec Loss 13.5825 LearningRate 0.3735 Epoch: 2 Global Step: 27030 Fp16 Grad Scale: 
65536 Required: 35 hours Training: 2022-01-08 01:33:47,358-Speed 5983.35 samples/sec Loss 13.5947 LearningRate 0.3734 Epoch: 2 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:33:54,214-Speed 5975.79 samples/sec Loss 13.6225 LearningRate 0.3734 Epoch: 2 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:34:01,072-Speed 5973.66 samples/sec Loss 13.5882 LearningRate 0.3734 Epoch: 2 Global Step: 27060 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:34:07,921-Speed 5982.41 samples/sec Loss 13.7022 LearningRate 0.3733 Epoch: 2 Global Step: 27070 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:34:14,779-Speed 5974.15 samples/sec Loss 13.6660 LearningRate 0.3733 Epoch: 2 Global Step: 27080 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:34:21,626-Speed 5982.79 samples/sec Loss 13.5882 LearningRate 0.3732 Epoch: 2 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:34:28,493-Speed 5966.53 samples/sec Loss 13.5939 LearningRate 0.3732 Epoch: 2 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:34:35,353-Speed 5971.99 samples/sec Loss 13.6947 LearningRate 0.3732 Epoch: 2 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:34:42,204-Speed 5979.63 samples/sec Loss 13.5980 LearningRate 0.3731 Epoch: 2 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:34:49,063-Speed 5972.76 samples/sec Loss 13.5703 LearningRate 0.3731 Epoch: 2 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:34:55,921-Speed 5973.80 samples/sec Loss 13.5803 LearningRate 0.3730 Epoch: 2 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:35:02,785-Speed 5968.57 samples/sec Loss 13.6238 LearningRate 0.3730 Epoch: 2 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 35 hours Training: 
2022-01-08 01:35:09,639-Speed 5977.68 samples/sec Loss 13.6382 LearningRate 0.3729 Epoch: 2 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:35:16,484-Speed 5983.93 samples/sec Loss 13.6828 LearningRate 0.3729 Epoch: 2 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:35:23,362-Speed 5957.03 samples/sec Loss 13.5840 LearningRate 0.3729 Epoch: 2 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:35:30,218-Speed 5975.53 samples/sec Loss 13.5437 LearningRate 0.3728 Epoch: 2 Global Step: 27190 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:35:37,089-Speed 5961.42 samples/sec Loss 13.6370 LearningRate 0.3728 Epoch: 2 Global Step: 27200 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:35:43,955-Speed 5967.05 samples/sec Loss 13.6725 LearningRate 0.3727 Epoch: 2 Global Step: 27210 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:35:50,790-Speed 5993.95 samples/sec Loss 13.6890 LearningRate 0.3727 Epoch: 2 Global Step: 27220 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:35:57,657-Speed 5965.49 samples/sec Loss 13.5851 LearningRate 0.3727 Epoch: 2 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:04,513-Speed 5976.12 samples/sec Loss 13.5831 LearningRate 0.3726 Epoch: 2 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:11,678-Speed 5717.63 samples/sec Loss 13.6335 LearningRate 0.3726 Epoch: 2 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:18,568-Speed 5945.63 samples/sec Loss 13.5923 LearningRate 0.3725 Epoch: 2 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:25,439-Speed 5962.97 samples/sec Loss 13.6661 LearningRate 0.3725 Epoch: 2 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:32,304-Speed 5968.07 
samples/sec Loss 13.5430 LearningRate 0.3725 Epoch: 2 Global Step: 27280 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:39,183-Speed 5954.90 samples/sec Loss 13.5820 LearningRate 0.3724 Epoch: 2 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:46,037-Speed 5977.97 samples/sec Loss 13.5875 LearningRate 0.3724 Epoch: 2 Global Step: 27300 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:52,920-Speed 5966.02 samples/sec Loss 13.5783 LearningRate 0.3723 Epoch: 2 Global Step: 27310 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:36:59,793-Speed 5961.45 samples/sec Loss 13.5867 LearningRate 0.3723 Epoch: 2 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:37:06,652-Speed 5972.71 samples/sec Loss 13.5410 LearningRate 0.3722 Epoch: 2 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:37:13,527-Speed 5959.99 samples/sec Loss 13.6728 LearningRate 0.3722 Epoch: 2 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:37:20,399-Speed 5961.71 samples/sec Loss 13.6069 LearningRate 0.3722 Epoch: 2 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:37:27,288-Speed 5947.47 samples/sec Loss 13.5980 LearningRate 0.3721 Epoch: 2 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:37:34,140-Speed 5978.86 samples/sec Loss 13.5655 LearningRate 0.3721 Epoch: 2 Global Step: 27370 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:37:41,006-Speed 5966.86 samples/sec Loss 13.5608 LearningRate 0.3720 Epoch: 2 Global Step: 27380 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:37:47,864-Speed 5973.83 samples/sec Loss 13.5247 LearningRate 0.3720 Epoch: 2 Global Step: 27390 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:37:54,760-Speed 5944.63 samples/sec Loss 13.5848 
LearningRate 0.3720 Epoch: 2 Global Step: 27400 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:38:01,627-Speed 5965.74 samples/sec Loss 13.5632 LearningRate 0.3719 Epoch: 2 Global Step: 27410 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:38:08,488-Speed 5970.46 samples/sec Loss 13.5995 LearningRate 0.3719 Epoch: 2 Global Step: 27420 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:38:15,387-Speed 5938.92 samples/sec Loss 13.5097 LearningRate 0.3718 Epoch: 2 Global Step: 27430 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:38:22,235-Speed 5982.16 samples/sec Loss 13.5352 LearningRate 0.3718 Epoch: 2 Global Step: 27440 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:38:29,098-Speed 5971.46 samples/sec Loss 13.4810 LearningRate 0.3717 Epoch: 2 Global Step: 27450 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:38:35,959-Speed 5974.27 samples/sec Loss 13.5736 LearningRate 0.3717 Epoch: 2 Global Step: 27460 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:38:42,829-Speed 5963.17 samples/sec Loss 13.5299 LearningRate 0.3717 Epoch: 2 Global Step: 27470 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:38:49,699-Speed 5962.83 samples/sec Loss 13.5963 LearningRate 0.3716 Epoch: 2 Global Step: 27480 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:38:56,571-Speed 5961.42 samples/sec Loss 13.5331 LearningRate 0.3716 Epoch: 2 Global Step: 27490 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:39:03,425-Speed 5977.94 samples/sec Loss 13.5437 LearningRate 0.3715 Epoch: 2 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:39:10,399-Speed 5874.40 samples/sec Loss 13.6184 LearningRate 0.3715 Epoch: 2 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:39:17,278-Speed 5958.76 samples/sec Loss 13.6004 LearningRate 0.3715 Epoch: 2 
Global Step: 27520 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:39:24,129-Speed 5981.35 samples/sec Loss 13.5138 LearningRate 0.3714 Epoch: 2 Global Step: 27530 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:39:30,974-Speed 5986.81 samples/sec Loss 13.5524 LearningRate 0.3714 Epoch: 2 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:39:37,845-Speed 5962.41 samples/sec Loss 13.6089 LearningRate 0.3713 Epoch: 2 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:39:44,703-Speed 5973.31 samples/sec Loss 13.5777 LearningRate 0.3713 Epoch: 2 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:39:51,559-Speed 5976.06 samples/sec Loss 13.5756 LearningRate 0.3713 Epoch: 2 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:39:58,417-Speed 5973.01 samples/sec Loss 13.5295 LearningRate 0.3712 Epoch: 2 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:40:05,299-Speed 5953.57 samples/sec Loss 13.5208 LearningRate 0.3712 Epoch: 2 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:40:12,149-Speed 5980.63 samples/sec Loss 13.5552 LearningRate 0.3711 Epoch: 2 Global Step: 27600 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:40:19,004-Speed 5976.21 samples/sec Loss 13.5900 LearningRate 0.3711 Epoch: 2 Global Step: 27610 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:40:25,857-Speed 5978.28 samples/sec Loss 13.5339 LearningRate 0.3710 Epoch: 2 Global Step: 27620 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:40:32,707-Speed 5980.40 samples/sec Loss 13.5503 LearningRate 0.3710 Epoch: 2 Global Step: 27630 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:40:39,560-Speed 5977.67 samples/sec Loss 13.5785 LearningRate 0.3710 Epoch: 2 Global Step: 27640 Fp16 Grad 
Scale: 262144 Required: 35 hours Training: 2022-01-08 01:40:46,396-Speed 5993.44 samples/sec Loss 13.5139 LearningRate 0.3709 Epoch: 2 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:40:53,234-Speed 5990.79 samples/sec Loss 13.5656 LearningRate 0.3709 Epoch: 2 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:00,075-Speed 5989.99 samples/sec Loss 13.5723 LearningRate 0.3708 Epoch: 2 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:06,942-Speed 5966.00 samples/sec Loss 13.4718 LearningRate 0.3708 Epoch: 2 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:13,794-Speed 5979.06 samples/sec Loss 13.5156 LearningRate 0.3708 Epoch: 2 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:20,655-Speed 5971.54 samples/sec Loss 13.6010 LearningRate 0.3707 Epoch: 2 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:27,512-Speed 5976.60 samples/sec Loss 13.6054 LearningRate 0.3707 Epoch: 2 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:34,360-Speed 5982.77 samples/sec Loss 13.6347 LearningRate 0.3706 Epoch: 2 Global Step: 27720 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:41,234-Speed 5961.84 samples/sec Loss 13.6214 LearningRate 0.3706 Epoch: 2 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:48,087-Speed 5977.99 samples/sec Loss 13.5388 LearningRate 0.3706 Epoch: 2 Global Step: 27740 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:41:54,950-Speed 5969.09 samples/sec Loss 13.5262 LearningRate 0.3705 Epoch: 2 Global Step: 27750 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:42:01,822-Speed 5961.24 samples/sec Loss 13.5633 LearningRate 0.3705 Epoch: 2 Global Step: 27760 Fp16 Grad Scale: 262144 Required: 35 
hours Training: 2022-01-08 01:42:08,675-Speed 5978.28 samples/sec Loss 13.5856 LearningRate 0.3704 Epoch: 2 Global Step: 27770 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:42:15,521-Speed 5984.48 samples/sec Loss 13.5831 LearningRate 0.3704 Epoch: 2 Global Step: 27780 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:42:22,393-Speed 5961.29 samples/sec Loss 13.5050 LearningRate 0.3703 Epoch: 2 Global Step: 27790 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:42:29,257-Speed 5968.79 samples/sec Loss 13.4935 LearningRate 0.3703 Epoch: 2 Global Step: 27800 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:42:36,110-Speed 5978.33 samples/sec Loss 13.6546 LearningRate 0.3703 Epoch: 2 Global Step: 27810 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:42:42,959-Speed 5981.96 samples/sec Loss 13.5166 LearningRate 0.3702 Epoch: 2 Global Step: 27820 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:42:49,798-Speed 5989.35 samples/sec Loss 13.5519 LearningRate 0.3702 Epoch: 2 Global Step: 27830 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:42:56,644-Speed 5984.77 samples/sec Loss 13.4997 LearningRate 0.3701 Epoch: 2 Global Step: 27840 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:43:03,521-Speed 5956.30 samples/sec Loss 13.4868 LearningRate 0.3701 Epoch: 2 Global Step: 27850 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:43:10,371-Speed 5980.80 samples/sec Loss 13.4742 LearningRate 0.3701 Epoch: 2 Global Step: 27860 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:43:17,218-Speed 5983.41 samples/sec Loss 13.5114 LearningRate 0.3700 Epoch: 2 Global Step: 27870 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:43:24,080-Speed 5970.18 samples/sec Loss 13.4435 LearningRate 0.3700 Epoch: 2 Global Step: 27880 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 
01:43:30,949-Speed 5964.55 samples/sec Loss 13.5445 LearningRate 0.3699 Epoch: 2 Global Step: 27890 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:43:37,808-Speed 5972.50 samples/sec Loss 13.5522 LearningRate 0.3699 Epoch: 2 Global Step: 27900 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:43:44,649-Speed 5988.49 samples/sec Loss 13.4796 LearningRate 0.3698 Epoch: 2 Global Step: 27910 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:43:51,512-Speed 5970.00 samples/sec Loss 13.4651 LearningRate 0.3698 Epoch: 2 Global Step: 27920 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:43:58,362-Speed 5979.93 samples/sec Loss 13.4314 LearningRate 0.3698 Epoch: 2 Global Step: 27930 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:44:05,216-Speed 5977.08 samples/sec Loss 13.5116 LearningRate 0.3697 Epoch: 2 Global Step: 27940 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:44:12,070-Speed 5977.39 samples/sec Loss 13.5071 LearningRate 0.3697 Epoch: 2 Global Step: 27950 Fp16 Grad Scale: 524288 Required: 35 hours Training: 2022-01-08 01:44:18,918-Speed 5982.26 samples/sec Loss 13.4833 LearningRate 0.3696 Epoch: 2 Global Step: 27960 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:44:25,765-Speed 5982.83 samples/sec Loss 13.6038 LearningRate 0.3696 Epoch: 2 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:44:32,620-Speed 5976.27 samples/sec Loss 13.5886 LearningRate 0.3696 Epoch: 2 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:44:39,507-Speed 5949.49 samples/sec Loss 13.5425 LearningRate 0.3695 Epoch: 2 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:44:46,361-Speed 5976.60 samples/sec Loss 13.4736 LearningRate 0.3695 Epoch: 2 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:44:53,289-Speed 5913.29 
samples/sec Loss 13.4979 LearningRate 0.3694 Epoch: 2 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:45:00,154-Speed 5967.38 samples/sec Loss 13.4849 LearningRate 0.3694 Epoch: 2 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:45:07,023-Speed 5964.70 samples/sec Loss 13.4291 LearningRate 0.3694 Epoch: 2 Global Step: 28030 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:45:13,877-Speed 5976.49 samples/sec Loss 13.4423 LearningRate 0.3693 Epoch: 2 Global Step: 28040 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:45:20,733-Speed 5975.60 samples/sec Loss 13.5695 LearningRate 0.3693 Epoch: 2 Global Step: 28050 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:45:27,587-Speed 5977.67 samples/sec Loss 13.4650 LearningRate 0.3692 Epoch: 2 Global Step: 28060 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:45:34,429-Speed 5986.78 samples/sec Loss 13.4988 LearningRate 0.3692 Epoch: 2 Global Step: 28070 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:45:41,282-Speed 5978.53 samples/sec Loss 13.5946 LearningRate 0.3691 Epoch: 2 Global Step: 28080 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:45:48,150-Speed 5965.50 samples/sec Loss 13.4639 LearningRate 0.3691 Epoch: 2 Global Step: 28090 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:45:55,010-Speed 5971.61 samples/sec Loss 13.5149 LearningRate 0.3691 Epoch: 2 Global Step: 28100 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:46:01,864-Speed 5977.11 samples/sec Loss 13.7073 LearningRate 0.3690 Epoch: 2 Global Step: 28110 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:46:08,741-Speed 5956.99 samples/sec Loss 13.5509 LearningRate 0.3690 Epoch: 2 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:46:15,604-Speed 5969.81 samples/sec Loss 13.4748 
LearningRate 0.3689 Epoch: 2 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:46:22,468-Speed 5969.00 samples/sec Loss 13.4289 LearningRate 0.3689 Epoch: 2 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:46:29,323-Speed 5976.13 samples/sec Loss 13.5339 LearningRate 0.3689 Epoch: 2 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:46:36,276-Speed 5892.93 samples/sec Loss 13.4037 LearningRate 0.3688 Epoch: 2 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:46:43,211-Speed 5908.04 samples/sec Loss 13.4124 LearningRate 0.3688 Epoch: 2 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:46:50,078-Speed 5969.27 samples/sec Loss 13.4899 LearningRate 0.3687 Epoch: 2 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:46:56,944-Speed 5966.82 samples/sec Loss 13.5522 LearningRate 0.3687 Epoch: 2 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:47:03,813-Speed 5964.38 samples/sec Loss 13.5106 LearningRate 0.3687 Epoch: 2 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:47:10,690-Speed 5956.88 samples/sec Loss 13.5581 LearningRate 0.3686 Epoch: 2 Global Step: 28210 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:47:17,550-Speed 5972.29 samples/sec Loss 13.4211 LearningRate 0.3686 Epoch: 2 Global Step: 28220 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:47:24,410-Speed 5972.13 samples/sec Loss 13.4514 LearningRate 0.3685 Epoch: 2 Global Step: 28230 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:47:31,273-Speed 5969.49 samples/sec Loss 13.4742 LearningRate 0.3685 Epoch: 2 Global Step: 28240 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:47:38,150-Speed 5959.84 samples/sec Loss 13.5331 LearningRate 0.3684 Epoch: 2 
Global Step: 28250 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:47:45,010-Speed 5971.41 samples/sec Loss 13.5776 LearningRate 0.3684 Epoch: 2 Global Step: 28260 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:47:51,872-Speed 5972.05 samples/sec Loss 13.4974 LearningRate 0.3684 Epoch: 2 Global Step: 28270 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:47:58,730-Speed 5973.92 samples/sec Loss 13.5702 LearningRate 0.3683 Epoch: 2 Global Step: 28280 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:48:05,594-Speed 5967.98 samples/sec Loss 13.4359 LearningRate 0.3683 Epoch: 2 Global Step: 28290 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:48:12,454-Speed 5972.29 samples/sec Loss 13.4110 LearningRate 0.3682 Epoch: 2 Global Step: 28300 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:48:19,317-Speed 5969.29 samples/sec Loss 13.4307 LearningRate 0.3682 Epoch: 2 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:48:26,170-Speed 5978.04 samples/sec Loss 13.4532 LearningRate 0.3682 Epoch: 2 Global Step: 28320 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:48:33,022-Speed 5979.32 samples/sec Loss 13.4922 LearningRate 0.3681 Epoch: 2 Global Step: 28330 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:48:39,870-Speed 5982.36 samples/sec Loss 13.5094 LearningRate 0.3681 Epoch: 2 Global Step: 28340 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:48:46,731-Speed 5970.93 samples/sec Loss 13.5004 LearningRate 0.3680 Epoch: 2 Global Step: 28350 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:48:53,586-Speed 5977.37 samples/sec Loss 13.4173 LearningRate 0.3680 Epoch: 2 Global Step: 28360 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:49:00,438-Speed 5978.84 samples/sec Loss 13.5056 LearningRate 0.3680 Epoch: 2 Global Step: 28370 Fp16 Grad 
Scale: 131072 Required: 35 hours Training: 2022-01-08 01:49:07,297-Speed 5973.27 samples/sec Loss 13.4855 LearningRate 0.3679 Epoch: 2 Global Step: 28380 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:49:14,281-Speed 5866.21 samples/sec Loss 13.4152 LearningRate 0.3679 Epoch: 2 Global Step: 28390 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:49:21,274-Speed 5858.54 samples/sec Loss 13.4446 LearningRate 0.3678 Epoch: 2 Global Step: 28400 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:49:28,124-Speed 5981.52 samples/sec Loss 13.5148 LearningRate 0.3678 Epoch: 2 Global Step: 28410 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:49:34,983-Speed 5972.35 samples/sec Loss 13.3985 LearningRate 0.3678 Epoch: 2 Global Step: 28420 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:49:41,840-Speed 5975.20 samples/sec Loss 13.4651 LearningRate 0.3677 Epoch: 2 Global Step: 28430 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:49:48,712-Speed 5960.98 samples/sec Loss 13.5179 LearningRate 0.3677 Epoch: 2 Global Step: 28440 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:49:55,579-Speed 5965.85 samples/sec Loss 13.4557 LearningRate 0.3676 Epoch: 2 Global Step: 28450 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:50:02,424-Speed 5985.17 samples/sec Loss 13.3985 LearningRate 0.3676 Epoch: 2 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:50:09,281-Speed 5974.47 samples/sec Loss 13.5578 LearningRate 0.3675 Epoch: 2 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:50:16,154-Speed 5960.80 samples/sec Loss 13.5788 LearningRate 0.3675 Epoch: 2 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:50:23,003-Speed 5981.86 samples/sec Loss 13.3677 LearningRate 0.3675 Epoch: 2 Global Step: 28490 Fp16 Grad Scale: 65536 Required: 35 hours 
Training: 2022-01-08 01:50:29,852-Speed 5981.57 samples/sec Loss 13.4260 LearningRate 0.3674 Epoch: 2 Global Step: 28500 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:50:36,813-Speed 5885.26 samples/sec Loss 13.4868 LearningRate 0.3674 Epoch: 2 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:50:43,734-Speed 5919.34 samples/sec Loss 13.4074 LearningRate 0.3673 Epoch: 2 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:50:50,642-Speed 5930.35 samples/sec Loss 13.5420 LearningRate 0.3673 Epoch: 2 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:50:57,492-Speed 5981.01 samples/sec Loss 13.5446 LearningRate 0.3673 Epoch: 2 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:51:04,350-Speed 5973.22 samples/sec Loss 13.5001 LearningRate 0.3672 Epoch: 2 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 01:51:11,206-Speed 5975.35 samples/sec Loss 13.4289 LearningRate 0.3672 Epoch: 2 Global Step: 28560 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:51:18,050-Speed 5986.73 samples/sec Loss 13.5487 LearningRate 0.3671 Epoch: 2 Global Step: 28570 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:51:24,930-Speed 5954.26 samples/sec Loss 13.4304 LearningRate 0.3671 Epoch: 2 Global Step: 28580 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:51:31,800-Speed 5963.71 samples/sec Loss 13.3935 LearningRate 0.3671 Epoch: 2 Global Step: 28590 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:51:38,642-Speed 5987.33 samples/sec Loss 13.3776 LearningRate 0.3670 Epoch: 2 Global Step: 28600 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:51:45,506-Speed 5970.77 samples/sec Loss 13.4224 LearningRate 0.3670 Epoch: 2 Global Step: 28610 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 
01:51:52,353-Speed 5984.15 samples/sec Loss 13.5348 LearningRate 0.3669 Epoch: 2 Global Step: 28620 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:51:59,206-Speed 5977.82 samples/sec Loss 13.4169 LearningRate 0.3669 Epoch: 2 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:52:06,046-Speed 5989.68 samples/sec Loss 13.4382 LearningRate 0.3668 Epoch: 2 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:52:12,898-Speed 5978.88 samples/sec Loss 13.3681 LearningRate 0.3668 Epoch: 2 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:52:19,762-Speed 5968.96 samples/sec Loss 13.4814 LearningRate 0.3668 Epoch: 2 Global Step: 28660 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:52:26,620-Speed 5973.19 samples/sec Loss 13.5864 LearningRate 0.3667 Epoch: 2 Global Step: 28670 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:52:33,506-Speed 5949.93 samples/sec Loss 13.4118 LearningRate 0.3667 Epoch: 2 Global Step: 28680 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:52:40,349-Speed 5986.31 samples/sec Loss 13.3822 LearningRate 0.3666 Epoch: 2 Global Step: 28690 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:52:47,194-Speed 5985.55 samples/sec Loss 13.4331 LearningRate 0.3666 Epoch: 2 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:52:54,057-Speed 5969.12 samples/sec Loss 13.3578 LearningRate 0.3666 Epoch: 2 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:00,907-Speed 5980.45 samples/sec Loss 13.4771 LearningRate 0.3665 Epoch: 2 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:07,760-Speed 5978.43 samples/sec Loss 13.4670 LearningRate 0.3665 Epoch: 2 Global Step: 28730 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:14,610-Speed 5980.57 
samples/sec Loss 13.3973 LearningRate 0.3664 Epoch: 2 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:21,461-Speed 5981.84 samples/sec Loss 13.4385 LearningRate 0.3664 Epoch: 2 Global Step: 28750 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:28,312-Speed 5980.28 samples/sec Loss 13.3755 LearningRate 0.3664 Epoch: 2 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:35,162-Speed 5980.10 samples/sec Loss 13.4565 LearningRate 0.3663 Epoch: 2 Global Step: 28770 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:42,007-Speed 5984.82 samples/sec Loss 13.4293 LearningRate 0.3663 Epoch: 2 Global Step: 28780 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:48,868-Speed 5970.89 samples/sec Loss 13.3710 LearningRate 0.3662 Epoch: 2 Global Step: 28790 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:53:55,742-Speed 5960.15 samples/sec Loss 13.5205 LearningRate 0.3662 Epoch: 2 Global Step: 28800 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:02,602-Speed 5972.06 samples/sec Loss 13.4772 LearningRate 0.3661 Epoch: 2 Global Step: 28810 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:09,458-Speed 5975.18 samples/sec Loss 13.4950 LearningRate 0.3661 Epoch: 2 Global Step: 28820 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:16,303-Speed 5984.92 samples/sec Loss 13.4323 LearningRate 0.3661 Epoch: 2 Global Step: 28830 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:23,167-Speed 5967.92 samples/sec Loss 13.3541 LearningRate 0.3660 Epoch: 2 Global Step: 28840 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:30,023-Speed 5975.94 samples/sec Loss 13.4473 LearningRate 0.3660 Epoch: 2 Global Step: 28850 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:36,879-Speed 5975.52 samples/sec Loss 13.5789 
LearningRate 0.3659 Epoch: 2 Global Step: 28860 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:43,749-Speed 5962.72 samples/sec Loss 13.3996 LearningRate 0.3659 Epoch: 2 Global Step: 28870 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:50,616-Speed 5965.72 samples/sec Loss 13.3779 LearningRate 0.3659 Epoch: 2 Global Step: 28880 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:54:57,481-Speed 5967.36 samples/sec Loss 13.4121 LearningRate 0.3658 Epoch: 2 Global Step: 28890 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:55:04,342-Speed 5971.63 samples/sec Loss 13.4367 LearningRate 0.3658 Epoch: 2 Global Step: 28900 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:55:11,190-Speed 5982.14 samples/sec Loss 13.4233 LearningRate 0.3657 Epoch: 2 Global Step: 28910 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:55:18,058-Speed 5965.54 samples/sec Loss 13.4327 LearningRate 0.3657 Epoch: 2 Global Step: 28920 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:55:24,915-Speed 5974.66 samples/sec Loss 13.4354 LearningRate 0.3657 Epoch: 2 Global Step: 28930 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:55:31,885-Speed 5877.60 samples/sec Loss 13.3666 LearningRate 0.3656 Epoch: 2 Global Step: 28940 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:55:38,835-Speed 5897.04 samples/sec Loss 13.3794 LearningRate 0.3656 Epoch: 2 Global Step: 28950 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:55:45,779-Speed 5900.24 samples/sec Loss 13.4921 LearningRate 0.3655 Epoch: 2 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:55:52,726-Speed 5896.17 samples/sec Loss 13.4108 LearningRate 0.3655 Epoch: 2 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:55:59,671-Speed 5899.63 samples/sec Loss 13.4267 LearningRate 0.3655 Epoch: 2 
Global Step: 28980 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:56:06,567-Speed 5940.49 samples/sec Loss 13.3403 LearningRate 0.3654 Epoch: 2 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:56:13,428-Speed 5971.25 samples/sec Loss 13.3999 LearningRate 0.3654 Epoch: 2 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:56:20,310-Speed 5952.13 samples/sec Loss 13.4245 LearningRate 0.3653 Epoch: 2 Global Step: 29010 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:56:27,166-Speed 5975.17 samples/sec Loss 13.4292 LearningRate 0.3653 Epoch: 2 Global Step: 29020 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:56:34,040-Speed 5960.58 samples/sec Loss 13.4773 LearningRate 0.3652 Epoch: 2 Global Step: 29030 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:56:40,909-Speed 5963.55 samples/sec Loss 13.3921 LearningRate 0.3652 Epoch: 2 Global Step: 29040 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:56:47,765-Speed 5977.81 samples/sec Loss 13.3645 LearningRate 0.3652 Epoch: 2 Global Step: 29050 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:56:54,617-Speed 5982.07 samples/sec Loss 13.4147 LearningRate 0.3651 Epoch: 2 Global Step: 29060 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:01,503-Speed 5949.62 samples/sec Loss 13.4559 LearningRate 0.3651 Epoch: 2 Global Step: 29070 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:08,384-Speed 5953.97 samples/sec Loss 13.4147 LearningRate 0.3650 Epoch: 2 Global Step: 29080 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:15,310-Speed 5914.90 samples/sec Loss 13.3781 LearningRate 0.3650 Epoch: 2 Global Step: 29090 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:22,212-Speed 5936.00 samples/sec Loss 13.3696 LearningRate 0.3650 Epoch: 2 Global Step: 29100 Fp16 Grad 
Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:29,195-Speed 5867.05 samples/sec Loss 13.4039 LearningRate 0.3649 Epoch: 2 Global Step: 29110 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:36,048-Speed 5978.20 samples/sec Loss 13.3618 LearningRate 0.3649 Epoch: 2 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:42,895-Speed 5983.65 samples/sec Loss 13.4578 LearningRate 0.3648 Epoch: 2 Global Step: 29130 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:49,743-Speed 5983.09 samples/sec Loss 13.4374 LearningRate 0.3648 Epoch: 2 Global Step: 29140 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:57:56,610-Speed 5965.15 samples/sec Loss 13.4423 LearningRate 0.3648 Epoch: 2 Global Step: 29150 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:58:03,477-Speed 5967.96 samples/sec Loss 13.4426 LearningRate 0.3647 Epoch: 2 Global Step: 29160 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:58:10,340-Speed 5969.29 samples/sec Loss 13.3604 LearningRate 0.3647 Epoch: 2 Global Step: 29170 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:58:17,195-Speed 5976.97 samples/sec Loss 13.3898 LearningRate 0.3646 Epoch: 2 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:58:24,050-Speed 5976.40 samples/sec Loss 13.4535 LearningRate 0.3646 Epoch: 2 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:58:30,899-Speed 5980.72 samples/sec Loss 13.4689 LearningRate 0.3646 Epoch: 2 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:58:37,777-Speed 5956.61 samples/sec Loss 13.2880 LearningRate 0.3645 Epoch: 2 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:58:44,644-Speed 5966.38 samples/sec Loss 13.3238 LearningRate 0.3645 Epoch: 2 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 35 
hours Training: 2022-01-08 01:58:51,507-Speed 5969.77 samples/sec Loss 13.4772 LearningRate 0.3644 Epoch: 2 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:58:58,374-Speed 5966.03 samples/sec Loss 13.3057 LearningRate 0.3644 Epoch: 2 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:59:05,251-Speed 5957.61 samples/sec Loss 13.3511 LearningRate 0.3643 Epoch: 2 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:59:12,127-Speed 5958.08 samples/sec Loss 13.3074 LearningRate 0.3643 Epoch: 2 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:59:18,997-Speed 5963.13 samples/sec Loss 13.4360 LearningRate 0.3643 Epoch: 2 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 01:59:25,858-Speed 5972.60 samples/sec Loss 13.3611 LearningRate 0.3642 Epoch: 2 Global Step: 29280 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:59:32,718-Speed 5973.05 samples/sec Loss 13.3667 LearningRate 0.3642 Epoch: 2 Global Step: 29290 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:59:39,574-Speed 5975.94 samples/sec Loss 13.3967 LearningRate 0.3641 Epoch: 2 Global Step: 29300 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:59:46,435-Speed 5970.86 samples/sec Loss 13.4047 LearningRate 0.3641 Epoch: 2 Global Step: 29310 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 01:59:53,299-Speed 5968.47 samples/sec Loss 13.3572 LearningRate 0.3641 Epoch: 2 Global Step: 29320 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:00:00,166-Speed 5966.17 samples/sec Loss 13.4233 LearningRate 0.3640 Epoch: 2 Global Step: 29330 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:00:07,024-Speed 5973.45 samples/sec Loss 13.3465 LearningRate 0.3640 Epoch: 2 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 
02:00:13,869-Speed 5985.44 samples/sec Loss 13.3618 LearningRate 0.3639 Epoch: 2 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:00:20,745-Speed 5958.29 samples/sec Loss 13.4446 LearningRate 0.3639 Epoch: 2 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:00:27,629-Speed 5951.11 samples/sec Loss 13.2636 LearningRate 0.3639 Epoch: 2 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:00:34,494-Speed 5967.33 samples/sec Loss 13.3571 LearningRate 0.3638 Epoch: 2 Global Step: 29380 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:00:41,345-Speed 5980.34 samples/sec Loss 13.3836 LearningRate 0.3638 Epoch: 2 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:00:48,205-Speed 5971.65 samples/sec Loss 13.3246 LearningRate 0.3637 Epoch: 2 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:00:55,269-Speed 5803.76 samples/sec Loss 13.3951 LearningRate 0.3637 Epoch: 2 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:01:02,160-Speed 5947.60 samples/sec Loss 13.3017 LearningRate 0.3637 Epoch: 2 Global Step: 29420 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:01:09,064-Speed 5933.99 samples/sec Loss 13.3491 LearningRate 0.3636 Epoch: 2 Global Step: 29430 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:01:15,919-Speed 5976.35 samples/sec Loss 13.3212 LearningRate 0.3636 Epoch: 2 Global Step: 29440 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:01:22,770-Speed 5979.88 samples/sec Loss 13.3149 LearningRate 0.3635 Epoch: 2 Global Step: 29450 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:01:29,634-Speed 5969.05 samples/sec Loss 13.3395 LearningRate 0.3635 Epoch: 2 Global Step: 29460 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:01:36,492-Speed 5974.02 
samples/sec Loss 13.3895 LearningRate 0.3634 Epoch: 2 Global Step: 29470 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:01:43,352-Speed 5971.61 samples/sec Loss 13.2721 LearningRate 0.3634 Epoch: 2 Global Step: 29480 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:01:50,196-Speed 5986.04 samples/sec Loss 13.4008 LearningRate 0.3634 Epoch: 2 Global Step: 29490 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:01:57,071-Speed 5962.05 samples/sec Loss 13.4222 LearningRate 0.3633 Epoch: 2 Global Step: 29500 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:03,929-Speed 5973.81 samples/sec Loss 13.3382 LearningRate 0.3633 Epoch: 2 Global Step: 29510 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:10,781-Speed 5978.74 samples/sec Loss 13.3453 LearningRate 0.3632 Epoch: 2 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:17,658-Speed 5957.44 samples/sec Loss 13.3539 LearningRate 0.3632 Epoch: 2 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:24,500-Speed 5987.59 samples/sec Loss 13.3311 LearningRate 0.3632 Epoch: 2 Global Step: 29540 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:31,362-Speed 5972.66 samples/sec Loss 13.3991 LearningRate 0.3631 Epoch: 2 Global Step: 29550 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:38,215-Speed 5977.83 samples/sec Loss 13.3184 LearningRate 0.3631 Epoch: 2 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:45,063-Speed 5982.63 samples/sec Loss 13.3197 LearningRate 0.3630 Epoch: 2 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:51,944-Speed 5953.19 samples/sec Loss 13.3317 LearningRate 0.3630 Epoch: 2 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:02:58,814-Speed 5963.54 samples/sec Loss 13.3183 
LearningRate 0.3630 Epoch: 2 Global Step: 29590 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:03:05,684-Speed 5963.48 samples/sec Loss 13.2906 LearningRate 0.3629 Epoch: 2 Global Step: 29600 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:03:12,547-Speed 5969.04 samples/sec Loss 13.3050 LearningRate 0.3629 Epoch: 2 Global Step: 29610 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:03:19,417-Speed 5964.32 samples/sec Loss 13.3860 LearningRate 0.3628 Epoch: 2 Global Step: 29620 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:03:26,302-Speed 5952.11 samples/sec Loss 13.3373 LearningRate 0.3628 Epoch: 2 Global Step: 29630 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:03:33,163-Speed 5973.44 samples/sec Loss 13.4345 LearningRate 0.3628 Epoch: 2 Global Step: 29640 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:03:40,031-Speed 5964.90 samples/sec Loss 13.3262 LearningRate 0.3627 Epoch: 2 Global Step: 29650 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:03:46,899-Speed 5965.35 samples/sec Loss 13.3563 LearningRate 0.3627 Epoch: 2 Global Step: 29660 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:03:53,744-Speed 5984.72 samples/sec Loss 13.3796 LearningRate 0.3626 Epoch: 2 Global Step: 29670 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:04:00,615-Speed 5963.07 samples/sec Loss 13.3162 LearningRate 0.3626 Epoch: 2 Global Step: 29680 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:04:07,497-Speed 5953.31 samples/sec Loss 13.2945 LearningRate 0.3625 Epoch: 2 Global Step: 29690 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:04:14,333-Speed 5992.77 samples/sec Loss 13.3812 LearningRate 0.3625 Epoch: 2 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:04:21,174-Speed 5988.01 samples/sec Loss 13.3277 LearningRate 0.3625 Epoch: 2 
Global Step: 29710 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:04:28,042-Speed 5965.20 samples/sec Loss 13.3712 LearningRate 0.3624 Epoch: 2 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:04:34,889-Speed 5982.79 samples/sec Loss 13.2976 LearningRate 0.3624 Epoch: 2 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:04:41,744-Speed 5976.21 samples/sec Loss 13.3113 LearningRate 0.3623 Epoch: 2 Global Step: 29740 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:04:48,626-Speed 5952.80 samples/sec Loss 13.4310 LearningRate 0.3623 Epoch: 2 Global Step: 29750 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:04:55,470-Speed 5986.29 samples/sec Loss 13.2656 LearningRate 0.3623 Epoch: 2 Global Step: 29760 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:05:02,309-Speed 5990.54 samples/sec Loss 13.4156 LearningRate 0.3622 Epoch: 2 Global Step: 29770 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:05:09,181-Speed 5961.51 samples/sec Loss 13.3028 LearningRate 0.3622 Epoch: 2 Global Step: 29780 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:05:16,019-Speed 5990.80 samples/sec Loss 13.3865 LearningRate 0.3621 Epoch: 2 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:05:22,892-Speed 5961.25 samples/sec Loss 13.3084 LearningRate 0.3621 Epoch: 2 Global Step: 29800 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:05:29,758-Speed 5966.92 samples/sec Loss 13.2157 LearningRate 0.3621 Epoch: 2 Global Step: 29810 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:05:36,621-Speed 5969.86 samples/sec Loss 13.2445 LearningRate 0.3620 Epoch: 2 Global Step: 29820 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:05:43,504-Speed 5951.96 samples/sec Loss 13.2813 LearningRate 0.3620 Epoch: 2 Global Step: 29830 Fp16 Grad 
Scale: 262144 Required: 35 hours Training: 2022-01-08 02:05:50,397-Speed 5942.72 samples/sec Loss 13.2227 LearningRate 0.3619 Epoch: 2 Global Step: 29840 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:05:57,262-Speed 5967.88 samples/sec Loss 13.3302 LearningRate 0.3619 Epoch: 2 Global Step: 29850 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:04,134-Speed 5960.91 samples/sec Loss 13.3055 LearningRate 0.3619 Epoch: 2 Global Step: 29860 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:10,995-Speed 5975.83 samples/sec Loss 13.3024 LearningRate 0.3618 Epoch: 2 Global Step: 29870 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:17,842-Speed 5982.72 samples/sec Loss 13.2827 LearningRate 0.3618 Epoch: 2 Global Step: 29880 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:24,738-Speed 5941.03 samples/sec Loss 13.2801 LearningRate 0.3617 Epoch: 2 Global Step: 29890 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:31,578-Speed 5989.34 samples/sec Loss 13.2288 LearningRate 0.3617 Epoch: 2 Global Step: 29900 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:38,452-Speed 5960.38 samples/sec Loss 13.4332 LearningRate 0.3617 Epoch: 2 Global Step: 29910 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:45,299-Speed 5982.33 samples/sec Loss 13.3131 LearningRate 0.3616 Epoch: 2 Global Step: 29920 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:52,200-Speed 5937.21 samples/sec Loss 13.3312 LearningRate 0.3616 Epoch: 2 Global Step: 29930 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:06:59,073-Speed 5960.02 samples/sec Loss 13.3242 LearningRate 0.3615 Epoch: 2 Global Step: 29940 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:07:05,934-Speed 5973.60 samples/sec Loss 13.3416 LearningRate 0.3615 Epoch: 2 Global Step: 29950 Fp16 Grad Scale: 262144 Required: 35 
hours Training: 2022-01-08 02:07:12,793-Speed 5974.99 samples/sec Loss 13.3385 LearningRate 0.3614 Epoch: 2 Global Step: 29960 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:07:19,657-Speed 5968.22 samples/sec Loss 13.2514 LearningRate 0.3614 Epoch: 2 Global Step: 29970 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:07:26,506-Speed 5980.21 samples/sec Loss 13.2349 LearningRate 0.3614 Epoch: 2 Global Step: 29980 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:07:33,350-Speed 5986.22 samples/sec Loss 13.3578 LearningRate 0.3613 Epoch: 2 Global Step: 29990 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:07:40,194-Speed 5986.21 samples/sec Loss 13.3206 LearningRate 0.3613 Epoch: 2 Global Step: 30000 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:08:07,036-[lfw][30000]XNorm: 22.330070 Training: 2022-01-08 02:08:07,037-[lfw][30000]Accuracy-Flip: 0.99583+-0.00281 Training: 2022-01-08 02:08:07,038-[lfw][30000]Accuracy-Highest: 0.99650 Training: 2022-01-08 02:08:38,320-[cfp_fp][30000]XNorm: 19.638639 Training: 2022-01-08 02:08:38,320-[cfp_fp][30000]Accuracy-Flip: 0.96086+-0.00649 Training: 2022-01-08 02:08:38,320-[cfp_fp][30000]Accuracy-Highest: 0.96957 Training: 2022-01-08 02:09:04,946-[agedb_30][30000]XNorm: 21.622061 Training: 2022-01-08 02:09:04,947-[agedb_30][30000]Accuracy-Flip: 0.95983+-0.00841 Training: 2022-01-08 02:09:04,948-[agedb_30][30000]Accuracy-Highest: 0.95983 Training: 2022-01-08 02:09:11,786-Speed 447.20 samples/sec Loss 13.2658 LearningRate 0.3612 Epoch: 2 Global Step: 30010 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:09:18,626-Speed 5989.72 samples/sec Loss 13.2728 LearningRate 0.3612 Epoch: 2 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:09:25,488-Speed 5971.34 samples/sec Loss 13.2585 LearningRate 0.3612 Epoch: 2 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 35 hours Training: 
2022-01-08 02:09:32,348-Speed 5971.70 samples/sec Loss 13.3334 LearningRate 0.3611 Epoch: 2 Global Step: 30040 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:09:39,215-Speed 5966.22 samples/sec Loss 13.2744 LearningRate 0.3611 Epoch: 2 Global Step: 30050 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:09:46,088-Speed 5960.17 samples/sec Loss 13.3032 LearningRate 0.3610 Epoch: 2 Global Step: 30060 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:09:52,988-Speed 5959.77 samples/sec Loss 13.3318 LearningRate 0.3610 Epoch: 2 Global Step: 30070 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:09:59,859-Speed 5963.13 samples/sec Loss 13.3951 LearningRate 0.3610 Epoch: 2 Global Step: 30080 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:10:06,740-Speed 5954.02 samples/sec Loss 13.3032 LearningRate 0.3609 Epoch: 2 Global Step: 30090 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:10:13,615-Speed 5959.87 samples/sec Loss 13.2385 LearningRate 0.3609 Epoch: 2 Global Step: 30100 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:10:20,477-Speed 5970.77 samples/sec Loss 13.3664 LearningRate 0.3608 Epoch: 2 Global Step: 30110 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:10:27,352-Speed 5959.07 samples/sec Loss 13.3306 LearningRate 0.3608 Epoch: 2 Global Step: 30120 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:10:34,199-Speed 5982.97 samples/sec Loss 13.2423 LearningRate 0.3608 Epoch: 2 Global Step: 30130 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:10:41,063-Speed 5970.58 samples/sec Loss 13.2100 LearningRate 0.3607 Epoch: 2 Global Step: 30140 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:10:47,925-Speed 5970.84 samples/sec Loss 13.2730 LearningRate 0.3607 Epoch: 2 Global Step: 30150 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:10:54,777-Speed 5979.75 
samples/sec Loss 13.2130 LearningRate 0.3606 Epoch: 2 Global Step: 30160 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:11:01,630-Speed 5978.12 samples/sec Loss 13.3048 LearningRate 0.3606 Epoch: 2 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:11:08,473-Speed 5987.13 samples/sec Loss 13.3317 LearningRate 0.3606 Epoch: 2 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:11:15,335-Speed 5969.94 samples/sec Loss 13.3031 LearningRate 0.3605 Epoch: 2 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:11:22,171-Speed 5993.39 samples/sec Loss 13.3082 LearningRate 0.3605 Epoch: 2 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:11:29,023-Speed 5979.01 samples/sec Loss 13.3794 LearningRate 0.3604 Epoch: 2 Global Step: 30210 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:11:35,865-Speed 5987.16 samples/sec Loss 13.3049 LearningRate 0.3604 Epoch: 2 Global Step: 30220 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:11:42,713-Speed 5982.69 samples/sec Loss 13.3306 LearningRate 0.3603 Epoch: 2 Global Step: 30230 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:11:49,573-Speed 5972.23 samples/sec Loss 13.2375 LearningRate 0.3603 Epoch: 2 Global Step: 30240 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:11:56,426-Speed 5977.66 samples/sec Loss 13.2275 LearningRate 0.3603 Epoch: 2 Global Step: 30250 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:03,279-Speed 5978.40 samples/sec Loss 13.2110 LearningRate 0.3602 Epoch: 2 Global Step: 30260 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:10,132-Speed 5978.24 samples/sec Loss 13.2323 LearningRate 0.3602 Epoch: 2 Global Step: 30270 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:16,983-Speed 5979.47 samples/sec Loss 13.2713 
LearningRate 0.3601 Epoch: 2 Global Step: 30280 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:23,838-Speed 5976.82 samples/sec Loss 13.1733 LearningRate 0.3601 Epoch: 2 Global Step: 30290 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:30,688-Speed 5980.19 samples/sec Loss 13.3073 LearningRate 0.3601 Epoch: 2 Global Step: 30300 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:37,552-Speed 5969.15 samples/sec Loss 13.1847 LearningRate 0.3600 Epoch: 2 Global Step: 30310 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:44,405-Speed 5977.92 samples/sec Loss 13.2441 LearningRate 0.3600 Epoch: 2 Global Step: 30320 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:51,270-Speed 5969.22 samples/sec Loss 13.2913 LearningRate 0.3599 Epoch: 2 Global Step: 30330 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:12:58,131-Speed 5970.25 samples/sec Loss 13.2750 LearningRate 0.3599 Epoch: 2 Global Step: 30340 Fp16 Grad Scale: 524288 Required: 35 hours Training: 2022-01-08 02:13:04,979-Speed 5982.45 samples/sec Loss 13.2419 LearningRate 0.3599 Epoch: 2 Global Step: 30350 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:13:11,818-Speed 5990.66 samples/sec Loss 13.2907 LearningRate 0.3598 Epoch: 2 Global Step: 30360 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:13:18,676-Speed 5973.89 samples/sec Loss 13.2013 LearningRate 0.3598 Epoch: 2 Global Step: 30370 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:13:25,555-Speed 5955.81 samples/sec Loss 13.2151 LearningRate 0.3597 Epoch: 2 Global Step: 30380 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:13:32,398-Speed 5987.37 samples/sec Loss 13.1755 LearningRate 0.3597 Epoch: 2 Global Step: 30390 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:13:39,249-Speed 5982.12 samples/sec Loss 13.2208 LearningRate 0.3597 Epoch: 2 
Global Step: 30400 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:13:46,141-Speed 5944.63 samples/sec Loss 13.3622 LearningRate 0.3596 Epoch: 2 Global Step: 30410 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:13:52,997-Speed 5975.55 samples/sec Loss 13.3133 LearningRate 0.3596 Epoch: 2 Global Step: 30420 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:13:59,845-Speed 5982.34 samples/sec Loss 13.2794 LearningRate 0.3595 Epoch: 2 Global Step: 30430 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:14:06,691-Speed 5984.70 samples/sec Loss 13.3071 LearningRate 0.3595 Epoch: 2 Global Step: 30440 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:14:13,630-Speed 5904.33 samples/sec Loss 13.2490 LearningRate 0.3595 Epoch: 2 Global Step: 30450 Fp16 Grad Scale: 524288 Required: 35 hours Training: 2022-01-08 02:14:20,479-Speed 5980.99 samples/sec Loss 13.2651 LearningRate 0.3594 Epoch: 2 Global Step: 30460 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:14:27,335-Speed 5975.81 samples/sec Loss 13.2399 LearningRate 0.3594 Epoch: 2 Global Step: 30470 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:14:34,186-Speed 5979.45 samples/sec Loss 13.3617 LearningRate 0.3593 Epoch: 2 Global Step: 30480 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:14:41,042-Speed 5977.55 samples/sec Loss 13.2360 LearningRate 0.3593 Epoch: 2 Global Step: 30490 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:14:47,885-Speed 5986.20 samples/sec Loss 13.3443 LearningRate 0.3593 Epoch: 2 Global Step: 30500 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:14:54,756-Speed 5962.15 samples/sec Loss 13.2258 LearningRate 0.3592 Epoch: 2 Global Step: 30510 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:01,632-Speed 5958.38 samples/sec Loss 13.1823 LearningRate 0.3592 Epoch: 2 Global Step: 30520 Fp16 Grad 
Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:08,477-Speed 5984.70 samples/sec Loss 13.1504 LearningRate 0.3591 Epoch: 2 Global Step: 30530 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:15,333-Speed 5977.80 samples/sec Loss 13.1801 LearningRate 0.3591 Epoch: 2 Global Step: 30540 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:22,178-Speed 5984.50 samples/sec Loss 13.2376 LearningRate 0.3590 Epoch: 2 Global Step: 30550 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:29,023-Speed 5985.65 samples/sec Loss 13.2365 LearningRate 0.3590 Epoch: 2 Global Step: 30560 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:35,867-Speed 5985.47 samples/sec Loss 13.2725 LearningRate 0.3590 Epoch: 2 Global Step: 30570 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:42,732-Speed 5970.15 samples/sec Loss 13.3035 LearningRate 0.3589 Epoch: 2 Global Step: 30580 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:49,579-Speed 5983.50 samples/sec Loss 13.2350 LearningRate 0.3589 Epoch: 2 Global Step: 30590 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:15:56,445-Speed 5966.43 samples/sec Loss 13.3613 LearningRate 0.3588 Epoch: 2 Global Step: 30600 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:16:03,315-Speed 5963.34 samples/sec Loss 13.2468 LearningRate 0.3588 Epoch: 2 Global Step: 30610 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:16:10,160-Speed 5985.03 samples/sec Loss 13.2642 LearningRate 0.3588 Epoch: 2 Global Step: 30620 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:16:17,042-Speed 5953.53 samples/sec Loss 13.2157 LearningRate 0.3587 Epoch: 2 Global Step: 30630 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:16:23,912-Speed 5962.99 samples/sec Loss 13.1742 LearningRate 0.3587 Epoch: 2 Global Step: 30640 Fp16 Grad Scale: 262144 Required: 35 
hours Training: 2022-01-08 02:16:30,761-Speed 5982.15 samples/sec Loss 13.2937 LearningRate 0.3586 Epoch: 2 Global Step: 30650 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:16:37,622-Speed 5974.18 samples/sec Loss 13.2627 LearningRate 0.3586 Epoch: 2 Global Step: 30660 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:16:44,472-Speed 5980.70 samples/sec Loss 13.2243 LearningRate 0.3586 Epoch: 2 Global Step: 30670 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:16:51,320-Speed 5982.20 samples/sec Loss 13.2668 LearningRate 0.3585 Epoch: 2 Global Step: 30680 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:16:58,182-Speed 5970.87 samples/sec Loss 13.1957 LearningRate 0.3585 Epoch: 2 Global Step: 30690 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:17:05,067-Speed 5952.44 samples/sec Loss 13.1023 LearningRate 0.3584 Epoch: 2 Global Step: 30700 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:17:11,945-Speed 5956.86 samples/sec Loss 13.2731 LearningRate 0.3584 Epoch: 2 Global Step: 30710 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:17:18,815-Speed 5962.74 samples/sec Loss 13.2253 LearningRate 0.3584 Epoch: 2 Global Step: 30720 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:17:25,694-Speed 5955.99 samples/sec Loss 13.2824 LearningRate 0.3583 Epoch: 2 Global Step: 30730 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:17:32,912-Speed 5675.98 samples/sec Loss 13.1842 LearningRate 0.3583 Epoch: 2 Global Step: 30740 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:17:39,785-Speed 5961.23 samples/sec Loss 13.2910 LearningRate 0.3582 Epoch: 2 Global Step: 30750 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:17:46,646-Speed 5972.02 samples/sec Loss 13.1742 LearningRate 0.3582 Epoch: 2 Global Step: 30760 Fp16 Grad Scale: 524288 Required: 35 hours Training: 2022-01-08 
02:17:53,486-Speed 5989.29 samples/sec Loss 13.1909 LearningRate 0.3582 Epoch: 2 Global Step: 30770 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:00,339-Speed 5978.24 samples/sec Loss 13.2577 LearningRate 0.3581 Epoch: 2 Global Step: 30780 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:07,210-Speed 5962.14 samples/sec Loss 13.2270 LearningRate 0.3581 Epoch: 2 Global Step: 30790 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:14,086-Speed 5958.04 samples/sec Loss 13.2123 LearningRate 0.3580 Epoch: 2 Global Step: 30800 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:20,942-Speed 5975.91 samples/sec Loss 13.1570 LearningRate 0.3580 Epoch: 2 Global Step: 30810 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:27,794-Speed 5978.80 samples/sec Loss 13.2143 LearningRate 0.3580 Epoch: 2 Global Step: 30820 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:34,645-Speed 5979.84 samples/sec Loss 13.1995 LearningRate 0.3579 Epoch: 2 Global Step: 30830 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:41,502-Speed 5974.71 samples/sec Loss 13.2158 LearningRate 0.3579 Epoch: 2 Global Step: 30840 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:48,364-Speed 5969.16 samples/sec Loss 13.2442 LearningRate 0.3578 Epoch: 2 Global Step: 30850 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:18:55,219-Speed 5977.35 samples/sec Loss 13.1834 LearningRate 0.3578 Epoch: 2 Global Step: 30860 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:19:02,093-Speed 5959.31 samples/sec Loss 13.2330 LearningRate 0.3578 Epoch: 2 Global Step: 30870 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:19:08,942-Speed 5980.57 samples/sec Loss 13.2312 LearningRate 0.3577 Epoch: 2 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:19:15,817-Speed 5959.54 
samples/sec Loss 13.2415 LearningRate 0.3577 Epoch: 2 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:19:22,667-Speed 5980.58 samples/sec Loss 13.3492 LearningRate 0.3576 Epoch: 2 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:19:29,514-Speed 5982.98 samples/sec Loss 13.2467 LearningRate 0.3576 Epoch: 2 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:19:36,362-Speed 5982.54 samples/sec Loss 13.1219 LearningRate 0.3575 Epoch: 2 Global Step: 30920 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:19:43,207-Speed 5984.80 samples/sec Loss 13.2883 LearningRate 0.3575 Epoch: 2 Global Step: 30930 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:19:50,057-Speed 5980.56 samples/sec Loss 13.2211 LearningRate 0.3575 Epoch: 2 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:19:56,925-Speed 5964.59 samples/sec Loss 13.2182 LearningRate 0.3574 Epoch: 2 Global Step: 30950 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:20:03,772-Speed 5983.68 samples/sec Loss 13.2059 LearningRate 0.3574 Epoch: 2 Global Step: 30960 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:20:10,625-Speed 5978.46 samples/sec Loss 13.1648 LearningRate 0.3573 Epoch: 2 Global Step: 30970 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:20:17,498-Speed 5960.70 samples/sec Loss 13.2665 LearningRate 0.3573 Epoch: 2 Global Step: 30980 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:20:24,369-Speed 5962.56 samples/sec Loss 13.1784 LearningRate 0.3573 Epoch: 2 Global Step: 30990 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:20:31,244-Speed 5959.06 samples/sec Loss 13.1884 LearningRate 0.3572 Epoch: 2 Global Step: 31000 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:20:38,129-Speed 5950.15 samples/sec Loss 13.1692 
LearningRate 0.3572 Epoch: 2 Global Step: 31010 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:20:44,984-Speed 5976.21 samples/sec Loss 13.1929 LearningRate 0.3571 Epoch: 2 Global Step: 31020 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:20:51,836-Speed 5978.76 samples/sec Loss 13.2295 LearningRate 0.3571 Epoch: 2 Global Step: 31030 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:20:58,700-Speed 5968.86 samples/sec Loss 13.2546 LearningRate 0.3571 Epoch: 2 Global Step: 31040 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:21:05,558-Speed 5973.24 samples/sec Loss 13.2547 LearningRate 0.3570 Epoch: 2 Global Step: 31050 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:21:12,426-Speed 5965.76 samples/sec Loss 13.2273 LearningRate 0.3570 Epoch: 2 Global Step: 31060 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:21:19,305-Speed 5955.87 samples/sec Loss 13.2423 LearningRate 0.3569 Epoch: 2 Global Step: 31070 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:21:26,139-Speed 5995.14 samples/sec Loss 13.2507 LearningRate 0.3569 Epoch: 2 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:21:33,020-Speed 5953.14 samples/sec Loss 13.2396 LearningRate 0.3569 Epoch: 2 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:21:39,886-Speed 5966.85 samples/sec Loss 13.2020 LearningRate 0.3568 Epoch: 2 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:21:46,782-Speed 5941.31 samples/sec Loss 13.1816 LearningRate 0.3568 Epoch: 2 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:22:10,847-Speed 1702.17 samples/sec Loss 13.1906 LearningRate 0.3567 Epoch: 3 Global Step: 31120 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:22:17,689-Speed 5987.54 samples/sec Loss 13.1722 LearningRate 0.3567 Epoch: 3 Global 
Step: 31130 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:22:24,572-Speed 5952.97 samples/sec Loss 13.2291 LearningRate 0.3567 Epoch: 3 Global Step: 31140 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:22:31,419-Speed 5982.99 samples/sec Loss 13.1715 LearningRate 0.3566 Epoch: 3 Global Step: 31150 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:22:38,334-Speed 5924.72 samples/sec Loss 13.1557 LearningRate 0.3566 Epoch: 3 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:22:45,196-Speed 5970.03 samples/sec Loss 13.1924 LearningRate 0.3565 Epoch: 3 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 02:22:52,086-Speed 5947.96 samples/sec Loss 13.1306 LearningRate 0.3565 Epoch: 3 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:22:58,976-Speed 5946.42 samples/sec Loss 13.1457 LearningRate 0.3565 Epoch: 3 Global Step: 31190 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:23:05,859-Speed 5951.20 samples/sec Loss 13.1622 LearningRate 0.3564 Epoch: 3 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:23:12,786-Speed 5914.44 samples/sec Loss 13.2364 LearningRate 0.3564 Epoch: 3 Global Step: 31210 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:23:19,669-Speed 5952.31 samples/sec Loss 13.1812 LearningRate 0.3563 Epoch: 3 Global Step: 31220 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:23:26,548-Speed 5955.33 samples/sec Loss 13.1692 LearningRate 0.3563 Epoch: 3 Global Step: 31230 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:23:33,418-Speed 5963.89 samples/sec Loss 13.1883 LearningRate 0.3563 Epoch: 3 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:23:40,287-Speed 5963.43 samples/sec Loss 13.2539 LearningRate 0.3562 Epoch: 3 Global Step: 31250 Fp16 Grad Scale: 131072 
Required: 35 hours Training: 2022-01-08 02:23:47,168-Speed 5955.83 samples/sec Loss 13.1067 LearningRate 0.3562 Epoch: 3 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:23:54,026-Speed 5973.25 samples/sec Loss 13.1268 LearningRate 0.3561 Epoch: 3 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 02:24:00,882-Speed 5974.86 samples/sec Loss 13.0878 LearningRate 0.3561 Epoch: 3 Global Step: 31280 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:24:07,742-Speed 5971.80 samples/sec Loss 13.1320 LearningRate 0.3560 Epoch: 3 Global Step: 31290 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:24:14,601-Speed 5972.43 samples/sec Loss 13.2254 LearningRate 0.3560 Epoch: 3 Global Step: 31300 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:24:21,473-Speed 5962.58 samples/sec Loss 13.1778 LearningRate 0.3560 Epoch: 3 Global Step: 31310 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:24:28,373-Speed 5937.48 samples/sec Loss 13.2429 LearningRate 0.3559 Epoch: 3 Global Step: 31320 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:24:35,231-Speed 5973.11 samples/sec Loss 13.2023 LearningRate 0.3559 Epoch: 3 Global Step: 31330 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:24:42,107-Speed 5957.98 samples/sec Loss 13.2089 LearningRate 0.3558 Epoch: 3 Global Step: 31340 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:24:48,961-Speed 5977.62 samples/sec Loss 13.1801 LearningRate 0.3558 Epoch: 3 Global Step: 31350 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:24:55,835-Speed 5959.58 samples/sec Loss 13.0692 LearningRate 0.3558 Epoch: 3 Global Step: 31360 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 02:25:02,687-Speed 5978.48 samples/sec Loss 13.1391 LearningRate 0.3557 Epoch: 3 Global Step: 31370 Fp16 Grad Scale: 262144 Required: 35 hours Training: 
2022-01-08 02:25:09,538-Speed 5981.85 samples/sec Loss 13.1648 LearningRate 0.3557 Epoch: 3 Global Step: 31380 Fp16 Grad Scale: 524288 Required: 34 hours Training: 2022-01-08 02:25:16,393-Speed 5976.34 samples/sec Loss 13.1655 LearningRate 0.3556 Epoch: 3 Global Step: 31390 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:25:23,271-Speed 5956.92 samples/sec Loss 13.1405 LearningRate 0.3556 Epoch: 3 Global Step: 31400 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:25:30,127-Speed 5975.91 samples/sec Loss 13.0991 LearningRate 0.3556 Epoch: 3 Global Step: 31410 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:25:37,025-Speed 5938.87 samples/sec Loss 13.1228 LearningRate 0.3555 Epoch: 3 Global Step: 31420 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:25:43,902-Speed 5956.93 samples/sec Loss 13.1744 LearningRate 0.3555 Epoch: 3 Global Step: 31430 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:25:50,780-Speed 5956.50 samples/sec Loss 13.1852 LearningRate 0.3554 Epoch: 3 Global Step: 31440 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:25:57,643-Speed 5970.03 samples/sec Loss 13.0951 LearningRate 0.3554 Epoch: 3 Global Step: 31450 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:26:06,610-Speed 4568.18 samples/sec Loss 13.1435 LearningRate 0.3554 Epoch: 3 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:26:13,482-Speed 5961.88 samples/sec Loss 13.2187 LearningRate 0.3553 Epoch: 3 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:26:20,336-Speed 5977.62 samples/sec Loss 13.1472 LearningRate 0.3553 Epoch: 3 Global Step: 31480 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:26:27,201-Speed 5967.34 samples/sec Loss 13.2424 LearningRate 0.3552 Epoch: 3 Global Step: 31490 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:26:34,087-Speed 
5949.13 samples/sec Loss 13.1701 LearningRate 0.3552 Epoch: 3 Global Step: 31500 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:26:41,004-Speed 5922.83 samples/sec Loss 13.1614 LearningRate 0.3552 Epoch: 3 Global Step: 31510 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:26:47,892-Speed 5947.80 samples/sec Loss 13.1492 LearningRate 0.3551 Epoch: 3 Global Step: 31520 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:26:54,828-Speed 5907.47 samples/sec Loss 13.1364 LearningRate 0.3551 Epoch: 3 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:27:01,714-Speed 5949.82 samples/sec Loss 13.1822 LearningRate 0.3550 Epoch: 3 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:27:08,599-Speed 5952.27 samples/sec Loss 13.1372 LearningRate 0.3550 Epoch: 3 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:27:15,493-Speed 5942.49 samples/sec Loss 13.1532 LearningRate 0.3550 Epoch: 3 Global Step: 31560 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:27:22,365-Speed 5961.45 samples/sec Loss 13.2012 LearningRate 0.3549 Epoch: 3 Global Step: 31570 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:27:29,237-Speed 5962.43 samples/sec Loss 13.1910 LearningRate 0.3549 Epoch: 3 Global Step: 31580 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:27:36,101-Speed 5968.47 samples/sec Loss 13.1007 LearningRate 0.3548 Epoch: 3 Global Step: 31590 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:27:42,962-Speed 5971.06 samples/sec Loss 13.1567 LearningRate 0.3548 Epoch: 3 Global Step: 31600 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:27:49,817-Speed 5976.18 samples/sec Loss 13.1272 LearningRate 0.3548 Epoch: 3 Global Step: 31610 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:27:56,675-Speed 5973.92 samples/sec Loss 
13.1684 LearningRate 0.3547 Epoch: 3 Global Step: 31620 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:28:03,533-Speed 5973.54 samples/sec Loss 13.1587 LearningRate 0.3547 Epoch: 3 Global Step: 31630 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:28:10,395-Speed 5971.38 samples/sec Loss 13.1547 LearningRate 0.3546 Epoch: 3 Global Step: 31640 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:28:17,289-Speed 5944.02 samples/sec Loss 13.1450 LearningRate 0.3546 Epoch: 3 Global Step: 31650 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:28:24,156-Speed 5966.16 samples/sec Loss 13.0625 LearningRate 0.3546 Epoch: 3 Global Step: 31660 Fp16 Grad Scale: 524288 Required: 34 hours Training: 2022-01-08 02:28:31,013-Speed 5974.53 samples/sec Loss 13.0686 LearningRate 0.3545 Epoch: 3 Global Step: 31670 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:28:37,876-Speed 5969.19 samples/sec Loss 13.1644 LearningRate 0.3545 Epoch: 3 Global Step: 31680 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:28:44,739-Speed 5969.49 samples/sec Loss 13.0883 LearningRate 0.3544 Epoch: 3 Global Step: 31690 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:28:51,623-Speed 5950.85 samples/sec Loss 13.0983 LearningRate 0.3544 Epoch: 3 Global Step: 31700 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:28:58,485-Speed 5970.46 samples/sec Loss 13.0795 LearningRate 0.3544 Epoch: 3 Global Step: 31710 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:29:05,358-Speed 5961.09 samples/sec Loss 13.1935 LearningRate 0.3543 Epoch: 3 Global Step: 31720 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:29:12,217-Speed 5972.52 samples/sec Loss 13.1114 LearningRate 0.3543 Epoch: 3 Global Step: 31730 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:29:19,083-Speed 5967.21 samples/sec Loss 13.0445 LearningRate 0.3542 
Epoch: 3 Global Step: 31740 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:29:25,967-Speed 5952.18 samples/sec Loss 13.1918 LearningRate 0.3542 Epoch: 3 Global Step: 31750 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:29:32,826-Speed 5973.08 samples/sec Loss 13.1060 LearningRate 0.3542 Epoch: 3 Global Step: 31760 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:29:39,666-Speed 5988.54 samples/sec Loss 13.1283 LearningRate 0.3541 Epoch: 3 Global Step: 31770 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:29:46,617-Speed 5894.30 samples/sec Loss 13.0785 LearningRate 0.3541 Epoch: 3 Global Step: 31780 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:29:53,476-Speed 5972.54 samples/sec Loss 13.1312 LearningRate 0.3540 Epoch: 3 Global Step: 31790 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:30:00,323-Speed 5983.17 samples/sec Loss 13.0771 LearningRate 0.3540 Epoch: 3 Global Step: 31800 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:30:07,174-Speed 5980.22 samples/sec Loss 13.1729 LearningRate 0.3539 Epoch: 3 Global Step: 31810 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:30:14,023-Speed 5980.59 samples/sec Loss 13.0318 LearningRate 0.3539 Epoch: 3 Global Step: 31820 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:30:20,891-Speed 5965.13 samples/sec Loss 13.1083 LearningRate 0.3539 Epoch: 3 Global Step: 31830 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:30:27,808-Speed 5922.78 samples/sec Loss 13.2295 LearningRate 0.3538 Epoch: 3 Global Step: 31840 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:30:34,667-Speed 5972.93 samples/sec Loss 13.1540 LearningRate 0.3538 Epoch: 3 Global Step: 31850 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:30:41,525-Speed 5973.71 samples/sec Loss 13.0825 LearningRate 0.3537 Epoch: 3 Global Step: 31860 
Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:30:48,373-Speed 5981.82 samples/sec Loss 13.1229 LearningRate 0.3537 Epoch: 3 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:30:55,242-Speed 5966.02 samples/sec Loss 13.1220 LearningRate 0.3537 Epoch: 3 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:31:02,104-Speed 5971.54 samples/sec Loss 13.0070 LearningRate 0.3536 Epoch: 3 Global Step: 31890 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:31:08,982-Speed 5957.00 samples/sec Loss 13.1347 LearningRate 0.3536 Epoch: 3 Global Step: 31900 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:31:15,890-Speed 5930.83 samples/sec Loss 13.1472 LearningRate 0.3535 Epoch: 3 Global Step: 31910 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:31:23,034-Speed 5735.81 samples/sec Loss 13.1535 LearningRate 0.3535 Epoch: 3 Global Step: 31920 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:31:29,911-Speed 5957.35 samples/sec Loss 13.1362 LearningRate 0.3535 Epoch: 3 Global Step: 31930 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:31:36,787-Speed 5958.60 samples/sec Loss 13.1487 LearningRate 0.3534 Epoch: 3 Global Step: 31940 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:31:44,166-Speed 5551.69 samples/sec Loss 13.1131 LearningRate 0.3534 Epoch: 3 Global Step: 31950 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:31:51,043-Speed 5957.54 samples/sec Loss 13.1657 LearningRate 0.3533 Epoch: 3 Global Step: 31960 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:31:57,901-Speed 5973.87 samples/sec Loss 13.1368 LearningRate 0.3533 Epoch: 3 Global Step: 31970 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:32:04,762-Speed 5972.19 samples/sec Loss 13.1563 LearningRate 0.3533 Epoch: 3 Global Step: 31980 Fp16 Grad Scale: 131072 
Required: 34 hours Training: 2022-01-08 02:32:11,641-Speed 5955.80 samples/sec Loss 13.0722 LearningRate 0.3532 Epoch: 3 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:32:18,500-Speed 5973.09 samples/sec Loss 13.0240 LearningRate 0.3532 Epoch: 3 Global Step: 32000 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:32:25,366-Speed 5966.39 samples/sec Loss 13.0601 LearningRate 0.3531 Epoch: 3 Global Step: 32010 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:32:32,236-Speed 5963.75 samples/sec Loss 13.1149 LearningRate 0.3531 Epoch: 3 Global Step: 32020 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:32:39,102-Speed 5966.90 samples/sec Loss 13.0328 LearningRate 0.3531 Epoch: 3 Global Step: 32030 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:32:46,001-Speed 5938.02 samples/sec Loss 13.1347 LearningRate 0.3530 Epoch: 3 Global Step: 32040 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:32:52,867-Speed 5967.09 samples/sec Loss 13.1187 LearningRate 0.3530 Epoch: 3 Global Step: 32050 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:32:59,741-Speed 5960.55 samples/sec Loss 13.0928 LearningRate 0.3529 Epoch: 3 Global Step: 32060 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:33:06,613-Speed 5961.41 samples/sec Loss 13.1734 LearningRate 0.3529 Epoch: 3 Global Step: 32070 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:33:13,468-Speed 5976.49 samples/sec Loss 13.0913 LearningRate 0.3529 Epoch: 3 Global Step: 32080 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:33:20,311-Speed 5986.55 samples/sec Loss 13.1969 LearningRate 0.3528 Epoch: 3 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:33:27,166-Speed 5976.35 samples/sec Loss 13.1042 LearningRate 0.3528 Epoch: 3 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 34 hours Training: 
2022-01-08 02:33:34,037-Speed 5961.47 samples/sec Loss 13.1284 LearningRate 0.3527 Epoch: 3 Global Step: 32110 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:33:40,897-Speed 5972.37 samples/sec Loss 13.1048 LearningRate 0.3527 Epoch: 3 Global Step: 32120 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:33:47,753-Speed 5975.36 samples/sec Loss 13.1091 LearningRate 0.3527 Epoch: 3 Global Step: 32130 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:33:54,626-Speed 5960.71 samples/sec Loss 13.0894 LearningRate 0.3526 Epoch: 3 Global Step: 32140 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:34:01,510-Speed 5950.89 samples/sec Loss 13.1014 LearningRate 0.3526 Epoch: 3 Global Step: 32150 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:34:08,388-Speed 5956.45 samples/sec Loss 13.1436 LearningRate 0.3525 Epoch: 3 Global Step: 32160 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:34:15,268-Speed 5955.81 samples/sec Loss 13.2205 LearningRate 0.3525 Epoch: 3 Global Step: 32170 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:34:22,138-Speed 5965.86 samples/sec Loss 13.0562 LearningRate 0.3525 Epoch: 3 Global Step: 32180 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 02:34:28,988-Speed 5980.55 samples/sec Loss 13.0958 LearningRate 0.3524 Epoch: 3 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:34:35,887-Speed 5938.64 samples/sec Loss 13.0562 LearningRate 0.3524 Epoch: 3 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:34:42,747-Speed 5973.30 samples/sec Loss 13.0482 LearningRate 0.3523 Epoch: 3 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:34:49,635-Speed 5947.92 samples/sec Loss 12.9981 LearningRate 0.3523 Epoch: 3 Global Step: 32220 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:34:56,513-Speed 5955.88 
samples/sec Loss 12.9976 LearningRate 0.3523 Epoch: 3 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:35:03,395-Speed 5953.32 samples/sec Loss 13.0517 LearningRate 0.3522 Epoch: 3 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:35:10,244-Speed 5981.25 samples/sec Loss 13.0055 LearningRate 0.3522 Epoch: 3 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:35:17,102-Speed 5974.22 samples/sec Loss 13.1200 LearningRate 0.3521 Epoch: 3 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:35:23,957-Speed 5976.69 samples/sec Loss 13.0624 LearningRate 0.3521 Epoch: 3 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:35:30,810-Speed 5977.49 samples/sec Loss 13.1798 LearningRate 0.3521 Epoch: 3 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:35:37,667-Speed 5974.63 samples/sec Loss 13.0115 LearningRate 0.3520 Epoch: 3 Global Step: 32290 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:35:44,526-Speed 5972.55 samples/sec Loss 13.0931 LearningRate 0.3520 Epoch: 3 Global Step: 32300 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:35:51,378-Speed 5979.45 samples/sec Loss 13.0160 LearningRate 0.3519 Epoch: 3 Global Step: 32310 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:35:58,242-Speed 5967.99 samples/sec Loss 13.1804 LearningRate 0.3519 Epoch: 3 Global Step: 32320 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:36:05,117-Speed 5958.80 samples/sec Loss 13.0165 LearningRate 0.3519 Epoch: 3 Global Step: 32330 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:36:11,994-Speed 5957.53 samples/sec Loss 13.1345 LearningRate 0.3518 Epoch: 3 Global Step: 32340 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:36:18,862-Speed 5968.30 samples/sec Loss 13.0609 
LearningRate 0.3518 Epoch: 3 Global Step: 32350 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:36:25,722-Speed 5972.05 samples/sec Loss 13.1251 LearningRate 0.3517 Epoch: 3 Global Step: 32360 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:36:32,573-Speed 5979.51 samples/sec Loss 13.0517 LearningRate 0.3517 Epoch: 3 Global Step: 32370 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:36:39,439-Speed 5966.65 samples/sec Loss 13.1045 LearningRate 0.3517 Epoch: 3 Global Step: 32380 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:36:46,302-Speed 5969.66 samples/sec Loss 13.1043 LearningRate 0.3516 Epoch: 3 Global Step: 32390 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:36:53,178-Speed 5958.65 samples/sec Loss 12.9836 LearningRate 0.3516 Epoch: 3 Global Step: 32400 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:37:00,046-Speed 5964.52 samples/sec Loss 13.1311 LearningRate 0.3515 Epoch: 3 Global Step: 32410 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:37:06,912-Speed 5967.24 samples/sec Loss 13.1076 LearningRate 0.3515 Epoch: 3 Global Step: 32420 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:37:13,796-Speed 5951.50 samples/sec Loss 13.0169 LearningRate 0.3515 Epoch: 3 Global Step: 32430 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:37:20,650-Speed 5976.49 samples/sec Loss 13.0365 LearningRate 0.3514 Epoch: 3 Global Step: 32440 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:37:27,527-Speed 5957.55 samples/sec Loss 12.9755 LearningRate 0.3514 Epoch: 3 Global Step: 32450 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:37:34,367-Speed 5989.24 samples/sec Loss 13.1631 LearningRate 0.3513 Epoch: 3 Global Step: 32460 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:37:41,240-Speed 5961.14 samples/sec Loss 13.0192 LearningRate 0.3513 Epoch: 3 
Global Step: 32470 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:37:48,118-Speed 5957.54 samples/sec Loss 12.9854 LearningRate 0.3513 Epoch: 3 Global Step: 32480 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:37:54,991-Speed 5960.47 samples/sec Loss 13.0677 LearningRate 0.3512 Epoch: 3 Global Step: 32490 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:38:01,872-Speed 5953.88 samples/sec Loss 13.0943 LearningRate 0.3512 Epoch: 3 Global Step: 32500 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:38:08,765-Speed 5944.17 samples/sec Loss 13.1247 LearningRate 0.3511 Epoch: 3 Global Step: 32510 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:38:15,641-Speed 5957.56 samples/sec Loss 13.1465 LearningRate 0.3511 Epoch: 3 Global Step: 32520 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:38:22,516-Speed 5959.56 samples/sec Loss 13.0679 LearningRate 0.3511 Epoch: 3 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:38:29,372-Speed 5975.55 samples/sec Loss 13.1082 LearningRate 0.3510 Epoch: 3 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:38:36,226-Speed 5976.74 samples/sec Loss 13.1528 LearningRate 0.3510 Epoch: 3 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:38:43,120-Speed 5941.90 samples/sec Loss 12.9783 LearningRate 0.3509 Epoch: 3 Global Step: 32560 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:38:50,063-Speed 5901.28 samples/sec Loss 13.0680 LearningRate 0.3509 Epoch: 3 Global Step: 32570 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:38:56,917-Speed 5977.00 samples/sec Loss 13.0893 LearningRate 0.3509 Epoch: 3 Global Step: 32580 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:03,771-Speed 5976.96 samples/sec Loss 13.0698 LearningRate 0.3508 Epoch: 3 Global Step: 32590 Fp16 Grad 
Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:10,647-Speed 5957.47 samples/sec Loss 13.1024 LearningRate 0.3508 Epoch: 3 Global Step: 32600 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:17,521-Speed 5960.62 samples/sec Loss 13.0799 LearningRate 0.3507 Epoch: 3 Global Step: 32610 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:24,395-Speed 5959.86 samples/sec Loss 13.0139 LearningRate 0.3507 Epoch: 3 Global Step: 32620 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:31,258-Speed 5969.98 samples/sec Loss 13.0407 LearningRate 0.3507 Epoch: 3 Global Step: 32630 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:38,115-Speed 5973.75 samples/sec Loss 13.0512 LearningRate 0.3506 Epoch: 3 Global Step: 32640 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:44,968-Speed 5978.23 samples/sec Loss 13.0665 LearningRate 0.3506 Epoch: 3 Global Step: 32650 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:51,805-Speed 5991.96 samples/sec Loss 13.0726 LearningRate 0.3505 Epoch: 3 Global Step: 32660 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:39:58,671-Speed 5966.47 samples/sec Loss 13.0974 LearningRate 0.3505 Epoch: 3 Global Step: 32670 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:40:05,563-Speed 5944.49 samples/sec Loss 13.0805 LearningRate 0.3505 Epoch: 3 Global Step: 32680 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:40:12,425-Speed 5972.10 samples/sec Loss 12.9869 LearningRate 0.3504 Epoch: 3 Global Step: 32690 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:40:19,299-Speed 5959.40 samples/sec Loss 13.0295 LearningRate 0.3504 Epoch: 3 Global Step: 32700 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:40:26,193-Speed 5942.78 samples/sec Loss 13.0521 LearningRate 0.3503 Epoch: 3 Global Step: 32710 Fp16 Grad Scale: 262144 Required: 34 
hours Training: 2022-01-08 02:40:33,059-Speed 5967.14 samples/sec Loss 13.0089 LearningRate 0.3503 Epoch: 3 Global Step: 32720 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:40:39,928-Speed 5963.73 samples/sec Loss 12.9362 LearningRate 0.3503 Epoch: 3 Global Step: 32730 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:40:46,790-Speed 5970.15 samples/sec Loss 13.0172 LearningRate 0.3502 Epoch: 3 Global Step: 32740 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:40:53,642-Speed 5981.28 samples/sec Loss 12.9950 LearningRate 0.3502 Epoch: 3 Global Step: 32750 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:41:00,513-Speed 5962.80 samples/sec Loss 13.0027 LearningRate 0.3501 Epoch: 3 Global Step: 32760 Fp16 Grad Scale: 524288 Required: 34 hours Training: 2022-01-08 02:41:07,374-Speed 5971.32 samples/sec Loss 12.9652 LearningRate 0.3501 Epoch: 3 Global Step: 32770 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:41:14,268-Speed 5942.44 samples/sec Loss 12.9824 LearningRate 0.3500 Epoch: 3 Global Step: 32780 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:41:21,124-Speed 5975.30 samples/sec Loss 12.9819 LearningRate 0.3500 Epoch: 3 Global Step: 32790 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:41:27,985-Speed 5970.82 samples/sec Loss 12.9509 LearningRate 0.3500 Epoch: 3 Global Step: 32800 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:41:34,874-Speed 5950.06 samples/sec Loss 13.0460 LearningRate 0.3499 Epoch: 3 Global Step: 32810 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:41:41,755-Speed 5954.12 samples/sec Loss 12.9924 LearningRate 0.3499 Epoch: 3 Global Step: 32820 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:41:48,610-Speed 5975.90 samples/sec Loss 12.9362 LearningRate 0.3498 Epoch: 3 Global Step: 32830 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 
02:41:55,477-Speed 5965.37 samples/sec Loss 13.0206 LearningRate 0.3498 Epoch: 3 Global Step: 32840 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:02,355-Speed 5956.74 samples/sec Loss 13.0096 LearningRate 0.3498 Epoch: 3 Global Step: 32850 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:09,221-Speed 5966.92 samples/sec Loss 12.9891 LearningRate 0.3497 Epoch: 3 Global Step: 32860 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:16,084-Speed 5969.61 samples/sec Loss 13.0319 LearningRate 0.3497 Epoch: 3 Global Step: 32870 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:22,947-Speed 5969.85 samples/sec Loss 13.0817 LearningRate 0.3496 Epoch: 3 Global Step: 32880 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:29,820-Speed 5960.37 samples/sec Loss 13.0109 LearningRate 0.3496 Epoch: 3 Global Step: 32890 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:36,684-Speed 5969.51 samples/sec Loss 12.9629 LearningRate 0.3496 Epoch: 3 Global Step: 32900 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:43,559-Speed 5958.76 samples/sec Loss 13.0479 LearningRate 0.3495 Epoch: 3 Global Step: 32910 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:50,477-Speed 5922.37 samples/sec Loss 13.0116 LearningRate 0.3495 Epoch: 3 Global Step: 32920 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:42:57,365-Speed 5947.71 samples/sec Loss 12.9860 LearningRate 0.3494 Epoch: 3 Global Step: 32930 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:04,325-Speed 5886.60 samples/sec Loss 13.0188 LearningRate 0.3494 Epoch: 3 Global Step: 32940 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:11,309-Speed 5869.03 samples/sec Loss 13.1129 LearningRate 0.3494 Epoch: 3 Global Step: 32950 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:18,252-Speed 5900.29 
samples/sec Loss 13.0538 LearningRate 0.3493 Epoch: 3 Global Step: 32960 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:25,106-Speed 5977.93 samples/sec Loss 13.0555 LearningRate 0.3493 Epoch: 3 Global Step: 32970 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:31,962-Speed 5975.34 samples/sec Loss 12.9507 LearningRate 0.3492 Epoch: 3 Global Step: 32980 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:38,937-Speed 5874.18 samples/sec Loss 13.0298 LearningRate 0.3492 Epoch: 3 Global Step: 32990 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:45,911-Speed 5874.46 samples/sec Loss 13.0150 LearningRate 0.3492 Epoch: 3 Global Step: 33000 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:52,810-Speed 5938.11 samples/sec Loss 13.0228 LearningRate 0.3491 Epoch: 3 Global Step: 33010 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:43:59,685-Speed 5958.36 samples/sec Loss 12.9767 LearningRate 0.3491 Epoch: 3 Global Step: 33020 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:44:06,560-Speed 5962.24 samples/sec Loss 13.0147 LearningRate 0.3490 Epoch: 3 Global Step: 33030 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:44:13,409-Speed 5981.88 samples/sec Loss 13.0104 LearningRate 0.3490 Epoch: 3 Global Step: 33040 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:44:20,274-Speed 5967.43 samples/sec Loss 13.0239 LearningRate 0.3490 Epoch: 3 Global Step: 33050 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:44:27,142-Speed 5965.55 samples/sec Loss 12.9516 LearningRate 0.3489 Epoch: 3 Global Step: 33060 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:44:33,992-Speed 5981.50 samples/sec Loss 13.0494 LearningRate 0.3489 Epoch: 3 Global Step: 33070 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:44:40,865-Speed 5960.76 samples/sec Loss 13.0617 
LearningRate 0.3488 Epoch: 3 Global Step: 33080 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:44:47,726-Speed 5970.90 samples/sec Loss 13.0140 LearningRate 0.3488 Epoch: 3 Global Step: 33090 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:44:54,581-Speed 5976.60 samples/sec Loss 13.0356 LearningRate 0.3488 Epoch: 3 Global Step: 33100 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:45:01,447-Speed 5966.33 samples/sec Loss 13.0459 LearningRate 0.3487 Epoch: 3 Global Step: 33110 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:45:08,324-Speed 5956.59 samples/sec Loss 12.9940 LearningRate 0.3487 Epoch: 3 Global Step: 33120 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:45:15,184-Speed 5972.58 samples/sec Loss 13.0542 LearningRate 0.3486 Epoch: 3 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:45:22,066-Speed 5952.46 samples/sec Loss 13.0009 LearningRate 0.3486 Epoch: 3 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:45:28,914-Speed 5983.67 samples/sec Loss 13.0359 LearningRate 0.3486 Epoch: 3 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:45:35,766-Speed 5978.96 samples/sec Loss 13.0516 LearningRate 0.3485 Epoch: 3 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:45:42,636-Speed 5964.00 samples/sec Loss 13.0571 LearningRate 0.3485 Epoch: 3 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:45:49,494-Speed 5972.85 samples/sec Loss 12.9615 LearningRate 0.3484 Epoch: 3 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:45:56,349-Speed 5976.82 samples/sec Loss 13.0164 LearningRate 0.3484 Epoch: 3 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:46:03,226-Speed 5956.94 samples/sec Loss 13.0071 LearningRate 0.3484 Epoch: 3 
Global Step: 33200 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:46:10,104-Speed 5956.86 samples/sec Loss 13.0085 LearningRate 0.3483 Epoch: 3 Global Step: 33210 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:46:16,971-Speed 5965.62 samples/sec Loss 12.9677 LearningRate 0.3483 Epoch: 3 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:46:23,827-Speed 5975.64 samples/sec Loss 12.9082 LearningRate 0.3482 Epoch: 3 Global Step: 33230 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:46:30,686-Speed 5973.17 samples/sec Loss 12.9607 LearningRate 0.3482 Epoch: 3 Global Step: 33240 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:46:37,561-Speed 5958.82 samples/sec Loss 13.0553 LearningRate 0.3482 Epoch: 3 Global Step: 33250 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:46:44,428-Speed 5965.52 samples/sec Loss 12.9846 LearningRate 0.3481 Epoch: 3 Global Step: 33260 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:46:51,299-Speed 5962.33 samples/sec Loss 12.9508 LearningRate 0.3481 Epoch: 3 Global Step: 33270 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:46:58,159-Speed 5971.62 samples/sec Loss 12.9889 LearningRate 0.3480 Epoch: 3 Global Step: 33280 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:47:05,006-Speed 5983.24 samples/sec Loss 12.9454 LearningRate 0.3480 Epoch: 3 Global Step: 33290 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:47:11,872-Speed 5967.11 samples/sec Loss 12.9624 LearningRate 0.3480 Epoch: 3 Global Step: 33300 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:47:18,734-Speed 5972.04 samples/sec Loss 13.0104 LearningRate 0.3479 Epoch: 3 Global Step: 33310 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:47:25,580-Speed 5984.33 samples/sec Loss 12.8991 LearningRate 0.3479 Epoch: 3 Global Step: 33320 Fp16 Grad 
Scale: 262144 Required: 34 hours Training: 2022-01-08 02:47:32,428-Speed 5982.26 samples/sec Loss 13.0301 LearningRate 0.3478 Epoch: 3 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:47:39,280-Speed 5978.86 samples/sec Loss 12.9676 LearningRate 0.3478 Epoch: 3 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:47:46,150-Speed 5964.24 samples/sec Loss 12.8425 LearningRate 0.3478 Epoch: 3 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:47:53,009-Speed 5972.25 samples/sec Loss 13.0152 LearningRate 0.3477 Epoch: 3 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:47:59,871-Speed 5970.66 samples/sec Loss 13.0069 LearningRate 0.3477 Epoch: 3 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:48:06,843-Speed 5876.02 samples/sec Loss 12.8953 LearningRate 0.3476 Epoch: 3 Global Step: 33380 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:48:13,707-Speed 5970.05 samples/sec Loss 12.9681 LearningRate 0.3476 Epoch: 3 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:48:20,581-Speed 5962.29 samples/sec Loss 12.9848 LearningRate 0.3476 Epoch: 3 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:48:27,452-Speed 5962.29 samples/sec Loss 12.9588 LearningRate 0.3475 Epoch: 3 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:48:34,315-Speed 5969.15 samples/sec Loss 12.9844 LearningRate 0.3475 Epoch: 3 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:48:41,166-Speed 5980.25 samples/sec Loss 13.0504 LearningRate 0.3474 Epoch: 3 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:48:48,031-Speed 5967.13 samples/sec Loss 12.9709 LearningRate 0.3474 Epoch: 3 Global Step: 33440 Fp16 Grad Scale: 131072 Required: 34 
hours Training: 2022-01-08 02:48:54,917-Speed 5950.37 samples/sec Loss 12.9475 LearningRate 0.3474 Epoch: 3 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:49:01,787-Speed 5962.97 samples/sec Loss 12.9374 LearningRate 0.3473 Epoch: 3 Global Step: 33460 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:49:08,653-Speed 5966.67 samples/sec Loss 12.9578 LearningRate 0.3473 Epoch: 3 Global Step: 33470 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:49:15,526-Speed 5960.51 samples/sec Loss 12.8909 LearningRate 0.3472 Epoch: 3 Global Step: 33480 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:49:22,396-Speed 5963.69 samples/sec Loss 13.1079 LearningRate 0.3472 Epoch: 3 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:49:29,266-Speed 5963.33 samples/sec Loss 12.9771 LearningRate 0.3472 Epoch: 3 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:49:36,150-Speed 5950.40 samples/sec Loss 13.0024 LearningRate 0.3471 Epoch: 3 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:49:43,018-Speed 5965.44 samples/sec Loss 12.9676 LearningRate 0.3471 Epoch: 3 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:49:49,887-Speed 5964.04 samples/sec Loss 12.9617 LearningRate 0.3470 Epoch: 3 Global Step: 33530 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:49:56,781-Speed 5944.61 samples/sec Loss 12.9747 LearningRate 0.3470 Epoch: 3 Global Step: 33540 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:50:03,641-Speed 5971.86 samples/sec Loss 12.9897 LearningRate 0.3470 Epoch: 3 Global Step: 33550 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:50:10,513-Speed 5961.66 samples/sec Loss 13.0635 LearningRate 0.3469 Epoch: 3 Global Step: 33560 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 
02:50:17,426-Speed 5926.43 samples/sec Loss 13.0727 LearningRate 0.3469 Epoch: 3 Global Step: 33570 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:50:24,286-Speed 5972.51 samples/sec Loss 12.9991 LearningRate 0.3468 Epoch: 3 Global Step: 33580 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:50:31,175-Speed 5946.30 samples/sec Loss 12.9635 LearningRate 0.3468 Epoch: 3 Global Step: 33590 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:50:38,049-Speed 5959.80 samples/sec Loss 12.9372 LearningRate 0.3468 Epoch: 3 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:50:44,904-Speed 5976.45 samples/sec Loss 13.0339 LearningRate 0.3467 Epoch: 3 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:50:51,765-Speed 5970.71 samples/sec Loss 12.9286 LearningRate 0.3467 Epoch: 3 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:50:58,653-Speed 5948.05 samples/sec Loss 12.8788 LearningRate 0.3466 Epoch: 3 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:51:05,508-Speed 5976.31 samples/sec Loss 12.9646 LearningRate 0.3466 Epoch: 3 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:51:12,382-Speed 5960.16 samples/sec Loss 12.9956 LearningRate 0.3466 Epoch: 3 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:51:19,254-Speed 5961.17 samples/sec Loss 12.9818 LearningRate 0.3465 Epoch: 3 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:51:26,113-Speed 5973.01 samples/sec Loss 12.9588 LearningRate 0.3465 Epoch: 3 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:51:32,999-Speed 5951.36 samples/sec Loss 12.8610 LearningRate 0.3465 Epoch: 3 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:51:39,875-Speed 5957.67 
samples/sec Loss 13.0184 LearningRate 0.3464 Epoch: 3 Global Step: 33690 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:51:46,749-Speed 5960.07 samples/sec Loss 12.9366 LearningRate 0.3464 Epoch: 3 Global Step: 33700 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:51:53,599-Speed 5980.38 samples/sec Loss 13.0326 LearningRate 0.3463 Epoch: 3 Global Step: 33710 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:00,455-Speed 5975.42 samples/sec Loss 12.9514 LearningRate 0.3463 Epoch: 3 Global Step: 33720 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:07,344-Speed 5947.11 samples/sec Loss 12.9239 LearningRate 0.3463 Epoch: 3 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:14,202-Speed 5973.63 samples/sec Loss 12.9256 LearningRate 0.3462 Epoch: 3 Global Step: 33740 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:21,066-Speed 5968.45 samples/sec Loss 12.8862 LearningRate 0.3462 Epoch: 3 Global Step: 33750 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:27,932-Speed 5967.44 samples/sec Loss 12.9521 LearningRate 0.3461 Epoch: 3 Global Step: 33760 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:34,806-Speed 5959.57 samples/sec Loss 12.9266 LearningRate 0.3461 Epoch: 3 Global Step: 33770 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:41,693-Speed 5948.91 samples/sec Loss 12.9735 LearningRate 0.3461 Epoch: 3 Global Step: 33780 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:48,685-Speed 5859.48 samples/sec Loss 12.9519 LearningRate 0.3460 Epoch: 3 Global Step: 33790 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:52:55,655-Speed 5877.58 samples/sec Loss 12.9090 LearningRate 0.3460 Epoch: 3 Global Step: 33800 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:53:02,592-Speed 5906.19 samples/sec Loss 13.0005 
LearningRate 0.3459 Epoch: 3 Global Step: 33810 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:53:09,452-Speed 5971.28 samples/sec Loss 12.8626 LearningRate 0.3459 Epoch: 3 Global Step: 33820 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:53:16,310-Speed 5974.34 samples/sec Loss 12.9246 LearningRate 0.3459 Epoch: 3 Global Step: 33830 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:53:23,166-Speed 5976.57 samples/sec Loss 12.9989 LearningRate 0.3458 Epoch: 3 Global Step: 33840 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:53:30,029-Speed 5969.64 samples/sec Loss 12.9496 LearningRate 0.3458 Epoch: 3 Global Step: 33850 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:53:36,896-Speed 5965.88 samples/sec Loss 12.9742 LearningRate 0.3457 Epoch: 3 Global Step: 33860 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:53:43,780-Speed 5951.39 samples/sec Loss 13.0075 LearningRate 0.3457 Epoch: 3 Global Step: 33870 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:53:50,642-Speed 5969.51 samples/sec Loss 12.9606 LearningRate 0.3457 Epoch: 3 Global Step: 33880 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:53:57,521-Speed 5955.99 samples/sec Loss 12.9344 LearningRate 0.3456 Epoch: 3 Global Step: 33890 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:04,387-Speed 5966.23 samples/sec Loss 12.9661 LearningRate 0.3456 Epoch: 3 Global Step: 33900 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:11,243-Speed 5975.21 samples/sec Loss 12.9285 LearningRate 0.3455 Epoch: 3 Global Step: 33910 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:18,107-Speed 5968.37 samples/sec Loss 12.8981 LearningRate 0.3455 Epoch: 3 Global Step: 33920 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:24,961-Speed 5976.90 samples/sec Loss 12.9564 LearningRate 0.3455 Epoch: 3 
Global Step: 33930 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:31,844-Speed 5952.40 samples/sec Loss 13.0498 LearningRate 0.3454 Epoch: 3 Global Step: 33940 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:38,694-Speed 5980.71 samples/sec Loss 12.9252 LearningRate 0.3454 Epoch: 3 Global Step: 33950 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:45,576-Speed 5955.31 samples/sec Loss 12.8676 LearningRate 0.3453 Epoch: 3 Global Step: 33960 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:52,472-Speed 5941.24 samples/sec Loss 12.8670 LearningRate 0.3453 Epoch: 3 Global Step: 33970 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:54:59,332-Speed 5971.96 samples/sec Loss 12.9310 LearningRate 0.3453 Epoch: 3 Global Step: 33980 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:55:06,187-Speed 5975.77 samples/sec Loss 12.8811 LearningRate 0.3452 Epoch: 3 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:55:13,145-Speed 5889.03 samples/sec Loss 12.9152 LearningRate 0.3452 Epoch: 3 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:55:20,099-Speed 5890.93 samples/sec Loss 12.9747 LearningRate 0.3451 Epoch: 3 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:55:26,996-Speed 5939.88 samples/sec Loss 12.9659 LearningRate 0.3451 Epoch: 3 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:55:33,878-Speed 5953.20 samples/sec Loss 12.9396 LearningRate 0.3451 Epoch: 3 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:55:40,730-Speed 5978.86 samples/sec Loss 12.8983 LearningRate 0.3450 Epoch: 3 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:55:47,598-Speed 5965.39 samples/sec Loss 12.9311 LearningRate 0.3450 Epoch: 3 Global Step: 34050 Fp16 Grad 
Scale: 131072 Required: 34 hours Training: 2022-01-08 02:55:54,470-Speed 5961.56 samples/sec Loss 12.9677 LearningRate 0.3449 Epoch: 3 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:56:01,340-Speed 5963.37 samples/sec Loss 12.9206 LearningRate 0.3449 Epoch: 3 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:56:08,203-Speed 5968.69 samples/sec Loss 12.9075 LearningRate 0.3449 Epoch: 3 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 02:56:15,076-Speed 5961.26 samples/sec Loss 12.9878 LearningRate 0.3448 Epoch: 3 Global Step: 34090 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:56:21,948-Speed 5961.09 samples/sec Loss 12.9206 LearningRate 0.3448 Epoch: 3 Global Step: 34100 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:56:28,823-Speed 5958.97 samples/sec Loss 12.8239 LearningRate 0.3447 Epoch: 3 Global Step: 34110 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:56:35,690-Speed 5965.89 samples/sec Loss 12.9741 LearningRate 0.3447 Epoch: 3 Global Step: 34120 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:56:42,551-Speed 5971.24 samples/sec Loss 12.8839 LearningRate 0.3447 Epoch: 3 Global Step: 34130 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:56:49,412-Speed 5970.56 samples/sec Loss 12.9548 LearningRate 0.3446 Epoch: 3 Global Step: 34140 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:56:56,281-Speed 5964.94 samples/sec Loss 12.8104 LearningRate 0.3446 Epoch: 3 Global Step: 34150 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:57:03,150-Speed 5964.04 samples/sec Loss 12.9538 LearningRate 0.3445 Epoch: 3 Global Step: 34160 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:57:10,068-Speed 5921.89 samples/sec Loss 12.9426 LearningRate 0.3445 Epoch: 3 Global Step: 34170 Fp16 Grad Scale: 262144 Required: 34 
hours Training: 2022-01-08 02:57:17,057-Speed 5861.92 samples/sec Loss 12.9151 LearningRate 0.3445 Epoch: 3 Global Step: 34180 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:57:24,010-Speed 5891.94 samples/sec Loss 12.8918 LearningRate 0.3444 Epoch: 3 Global Step: 34190 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:57:30,971-Speed 5885.54 samples/sec Loss 12.9776 LearningRate 0.3444 Epoch: 3 Global Step: 34200 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:57:37,850-Speed 5956.06 samples/sec Loss 12.8922 LearningRate 0.3443 Epoch: 3 Global Step: 34210 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:57:44,723-Speed 5960.76 samples/sec Loss 12.9198 LearningRate 0.3443 Epoch: 3 Global Step: 34220 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:57:51,594-Speed 5962.70 samples/sec Loss 12.8769 LearningRate 0.3443 Epoch: 3 Global Step: 34230 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:57:58,467-Speed 5960.14 samples/sec Loss 12.9206 LearningRate 0.3442 Epoch: 3 Global Step: 34240 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:58:05,344-Speed 5957.28 samples/sec Loss 12.8988 LearningRate 0.3442 Epoch: 3 Global Step: 34250 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:58:12,208-Speed 5968.24 samples/sec Loss 12.8704 LearningRate 0.3441 Epoch: 3 Global Step: 34260 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:58:19,088-Speed 5954.45 samples/sec Loss 12.8952 LearningRate 0.3441 Epoch: 3 Global Step: 34270 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:58:25,958-Speed 5962.85 samples/sec Loss 12.8123 LearningRate 0.3441 Epoch: 3 Global Step: 34280 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:58:32,831-Speed 5961.07 samples/sec Loss 12.8565 LearningRate 0.3440 Epoch: 3 Global Step: 34290 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 
02:58:39,704-Speed 5961.40 samples/sec Loss 12.9327 LearningRate 0.3440 Epoch: 3 Global Step: 34300 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:58:46,570-Speed 5966.74 samples/sec Loss 12.8449 LearningRate 0.3439 Epoch: 3 Global Step: 34310 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:58:53,430-Speed 5971.74 samples/sec Loss 12.8190 LearningRate 0.3439 Epoch: 3 Global Step: 34320 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:00,301-Speed 5962.42 samples/sec Loss 12.9359 LearningRate 0.3439 Epoch: 3 Global Step: 34330 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:07,173-Speed 5961.65 samples/sec Loss 12.8642 LearningRate 0.3438 Epoch: 3 Global Step: 34340 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:14,028-Speed 5976.15 samples/sec Loss 12.8692 LearningRate 0.3438 Epoch: 3 Global Step: 34350 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:20,914-Speed 5949.89 samples/sec Loss 12.9574 LearningRate 0.3437 Epoch: 3 Global Step: 34360 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:27,778-Speed 5968.56 samples/sec Loss 12.9374 LearningRate 0.3437 Epoch: 3 Global Step: 34370 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:34,708-Speed 5914.25 samples/sec Loss 12.9699 LearningRate 0.3437 Epoch: 3 Global Step: 34380 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:41,561-Speed 5978.40 samples/sec Loss 12.9106 LearningRate 0.3436 Epoch: 3 Global Step: 34390 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:48,439-Speed 5956.95 samples/sec Loss 12.8611 LearningRate 0.3436 Epoch: 3 Global Step: 34400 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 02:59:55,308-Speed 5964.75 samples/sec Loss 12.9241 LearningRate 0.3435 Epoch: 3 Global Step: 34410 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:00:02,170-Speed 5970.00 
samples/sec Loss 12.9254 LearningRate 0.3435 Epoch: 3 Global Step: 34420 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:00:09,038-Speed 5965.39 samples/sec Loss 12.8321 LearningRate 0.3435 Epoch: 3 Global Step: 34430 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:00:15,895-Speed 5976.79 samples/sec Loss 12.7926 LearningRate 0.3434 Epoch: 3 Global Step: 34440 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:00:22,746-Speed 5980.07 samples/sec Loss 12.8846 LearningRate 0.3434 Epoch: 3 Global Step: 34450 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:00:29,612-Speed 5966.65 samples/sec Loss 12.8582 LearningRate 0.3433 Epoch: 3 Global Step: 34460 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:00:36,474-Speed 5970.01 samples/sec Loss 12.9062 LearningRate 0.3433 Epoch: 3 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:00:43,337-Speed 5969.14 samples/sec Loss 12.8937 LearningRate 0.3433 Epoch: 3 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:00:50,196-Speed 5972.09 samples/sec Loss 12.9692 LearningRate 0.3432 Epoch: 3 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:00:57,065-Speed 5964.42 samples/sec Loss 12.8395 LearningRate 0.3432 Epoch: 3 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:01:03,949-Speed 5951.59 samples/sec Loss 12.9037 LearningRate 0.3431 Epoch: 3 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:01:10,840-Speed 5945.07 samples/sec Loss 12.8309 LearningRate 0.3431 Epoch: 3 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:01:17,703-Speed 5969.36 samples/sec Loss 12.8004 LearningRate 0.3431 Epoch: 3 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:01:24,565-Speed 5973.28 samples/sec Loss 12.8615 
LearningRate 0.3430 Epoch: 3 Global Step: 34540 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:01:31,469-Speed 5933.71 samples/sec Loss 12.8086 LearningRate 0.3430 Epoch: 3 Global Step: 34550 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:01:38,330-Speed 5970.93 samples/sec Loss 12.8029 LearningRate 0.3429 Epoch: 3 Global Step: 34560 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:01:45,196-Speed 5966.62 samples/sec Loss 12.7801 LearningRate 0.3429 Epoch: 3 Global Step: 34570 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:01:52,086-Speed 5946.98 samples/sec Loss 12.9255 LearningRate 0.3429 Epoch: 3 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:01:58,950-Speed 5968.65 samples/sec Loss 12.9080 LearningRate 0.3428 Epoch: 3 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:02:05,824-Speed 5959.48 samples/sec Loss 12.8843 LearningRate 0.3428 Epoch: 3 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:02:12,700-Speed 5958.16 samples/sec Loss 12.9027 LearningRate 0.3428 Epoch: 3 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:02:19,561-Speed 5975.34 samples/sec Loss 12.8984 LearningRate 0.3427 Epoch: 3 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:02:26,413-Speed 5979.20 samples/sec Loss 12.9026 LearningRate 0.3427 Epoch: 3 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:02:33,279-Speed 5966.15 samples/sec Loss 12.8617 LearningRate 0.3426 Epoch: 3 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:02:40,158-Speed 5955.65 samples/sec Loss 12.9171 LearningRate 0.3426 Epoch: 3 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:02:47,024-Speed 5969.41 samples/sec Loss 12.9317 LearningRate 0.3426 Epoch: 3 
Global Step: 34660 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:02:53,913-Speed 5946.85 samples/sec Loss 12.8840 LearningRate 0.3425 Epoch: 3 Global Step: 34670 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:03:00,771-Speed 5973.61 samples/sec Loss 12.8959 LearningRate 0.3425 Epoch: 3 Global Step: 34680 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:03:07,649-Speed 5956.64 samples/sec Loss 12.8747 LearningRate 0.3424 Epoch: 3 Global Step: 34690 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:03:14,507-Speed 5974.06 samples/sec Loss 12.8555 LearningRate 0.3424 Epoch: 3 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:03:21,400-Speed 5943.30 samples/sec Loss 12.8518 LearningRate 0.3424 Epoch: 3 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:03:28,303-Speed 5934.15 samples/sec Loss 12.8006 LearningRate 0.3423 Epoch: 3 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:03:35,170-Speed 5966.54 samples/sec Loss 12.8625 LearningRate 0.3423 Epoch: 3 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:03:42,039-Speed 5963.40 samples/sec Loss 12.8481 LearningRate 0.3422 Epoch: 3 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:03:48,890-Speed 5980.00 samples/sec Loss 12.8778 LearningRate 0.3422 Epoch: 3 Global Step: 34750 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:03:55,771-Speed 5953.82 samples/sec Loss 12.9244 LearningRate 0.3422 Epoch: 3 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:04:02,653-Speed 5953.13 samples/sec Loss 12.8778 LearningRate 0.3421 Epoch: 3 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:04:09,535-Speed 5952.88 samples/sec Loss 12.7923 LearningRate 0.3421 Epoch: 3 Global Step: 34780 Fp16 Grad 
Scale: 262144 Required: 34 hours Training: 2022-01-08 03:04:16,412-Speed 5959.25 samples/sec Loss 12.7474 LearningRate 0.3420 Epoch: 3 Global Step: 34790 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:04:23,271-Speed 5972.51 samples/sec Loss 12.8250 LearningRate 0.3420 Epoch: 3 Global Step: 34800 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:04:30,151-Speed 5954.73 samples/sec Loss 12.9870 LearningRate 0.3420 Epoch: 3 Global Step: 34810 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:04:36,997-Speed 5984.22 samples/sec Loss 12.8540 LearningRate 0.3419 Epoch: 3 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:04:43,877-Speed 5954.24 samples/sec Loss 12.8846 LearningRate 0.3419 Epoch: 3 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:04:50,736-Speed 5973.37 samples/sec Loss 12.7806 LearningRate 0.3418 Epoch: 3 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:04:57,635-Speed 5938.04 samples/sec Loss 12.8654 LearningRate 0.3418 Epoch: 3 Global Step: 34850 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:05:04,544-Speed 5930.39 samples/sec Loss 12.8341 LearningRate 0.3418 Epoch: 3 Global Step: 34860 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:05:11,398-Speed 5976.79 samples/sec Loss 12.7746 LearningRate 0.3417 Epoch: 3 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:05:18,281-Speed 5951.94 samples/sec Loss 12.8412 LearningRate 0.3417 Epoch: 3 Global Step: 34880 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:05:25,139-Speed 5978.76 samples/sec Loss 12.8030 LearningRate 0.3416 Epoch: 3 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:05:32,011-Speed 5961.46 samples/sec Loss 12.8893 LearningRate 0.3416 Epoch: 3 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 34 
hours Training: 2022-01-08 03:05:38,905-Speed 5942.61 samples/sec Loss 12.8925 LearningRate 0.3416 Epoch: 3 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:05:45,772-Speed 5966.23 samples/sec Loss 12.8383 LearningRate 0.3415 Epoch: 3 Global Step: 34920 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:05:52,638-Speed 5966.21 samples/sec Loss 12.8303 LearningRate 0.3415 Epoch: 3 Global Step: 34930 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:05:59,500-Speed 5970.47 samples/sec Loss 12.8605 LearningRate 0.3414 Epoch: 3 Global Step: 34940 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:06:06,379-Speed 5956.00 samples/sec Loss 12.7994 LearningRate 0.3414 Epoch: 3 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:06:13,250-Speed 5962.21 samples/sec Loss 12.8447 LearningRate 0.3414 Epoch: 3 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:06:20,137-Speed 5951.80 samples/sec Loss 12.8177 LearningRate 0.3413 Epoch: 3 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:06:27,082-Speed 5898.95 samples/sec Loss 12.8266 LearningRate 0.3413 Epoch: 3 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:06:33,963-Speed 5954.17 samples/sec Loss 12.8863 LearningRate 0.3412 Epoch: 3 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:06:40,841-Speed 5956.14 samples/sec Loss 12.8217 LearningRate 0.3412 Epoch: 3 Global Step: 35000 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:07:07,647-[lfw][35000]XNorm: 22.868388 Training: 2022-01-08 03:07:07,647-[lfw][35000]Accuracy-Flip: 0.99617+-0.00248 Training: 2022-01-08 03:07:07,648-[lfw][35000]Accuracy-Highest: 0.99650 Training: 2022-01-08 03:07:38,565-[cfp_fp][35000]XNorm: 20.275993 Training: 2022-01-08 03:07:38,566-[cfp_fp][35000]Accuracy-Flip: 
0.97057+-0.00661 Training: 2022-01-08 03:07:38,567-[cfp_fp][35000]Accuracy-Highest: 0.97057 Training: 2022-01-08 03:08:05,315-[agedb_30][35000]XNorm: 22.245622 Training: 2022-01-08 03:08:05,316-[agedb_30][35000]Accuracy-Flip: 0.96200+-0.00823 Training: 2022-01-08 03:08:05,316-[agedb_30][35000]Accuracy-Highest: 0.96200 Training: 2022-01-08 03:08:12,167-Speed 448.51 samples/sec Loss 12.8203 LearningRate 0.3412 Epoch: 3 Global Step: 35010 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:08:19,021-Speed 5977.63 samples/sec Loss 12.8510 LearningRate 0.3411 Epoch: 3 Global Step: 35020 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:08:25,884-Speed 5969.47 samples/sec Loss 12.7904 LearningRate 0.3411 Epoch: 3 Global Step: 35030 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:08:32,763-Speed 5955.90 samples/sec Loss 12.7809 LearningRate 0.3410 Epoch: 3 Global Step: 35040 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:08:39,640-Speed 5956.81 samples/sec Loss 12.8707 LearningRate 0.3410 Epoch: 3 Global Step: 35050 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:08:46,532-Speed 5947.09 samples/sec Loss 12.8722 LearningRate 0.3410 Epoch: 3 Global Step: 35060 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:08:53,412-Speed 5953.93 samples/sec Loss 12.7846 LearningRate 0.3409 Epoch: 3 Global Step: 35070 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:00,296-Speed 5951.36 samples/sec Loss 12.8447 LearningRate 0.3409 Epoch: 3 Global Step: 35080 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:07,177-Speed 5953.29 samples/sec Loss 12.8343 LearningRate 0.3408 Epoch: 3 Global Step: 35090 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:14,074-Speed 5939.26 samples/sec Loss 12.8410 LearningRate 0.3408 Epoch: 3 Global Step: 35100 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:20,957-Speed 
5952.89 samples/sec Loss 12.9310 LearningRate 0.3408 Epoch: 3 Global Step: 35110 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:27,832-Speed 5958.70 samples/sec Loss 12.8291 LearningRate 0.3407 Epoch: 3 Global Step: 35120 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:34,734-Speed 5935.20 samples/sec Loss 12.8046 LearningRate 0.3407 Epoch: 3 Global Step: 35130 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:41,771-Speed 5822.02 samples/sec Loss 12.7869 LearningRate 0.3407 Epoch: 3 Global Step: 35140 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:48,713-Speed 5902.26 samples/sec Loss 12.7894 LearningRate 0.3406 Epoch: 3 Global Step: 35150 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:09:55,568-Speed 5976.64 samples/sec Loss 12.8847 LearningRate 0.3406 Epoch: 3 Global Step: 35160 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:02,424-Speed 5974.60 samples/sec Loss 12.8157 LearningRate 0.3405 Epoch: 3 Global Step: 35170 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:09,273-Speed 5982.48 samples/sec Loss 12.8728 LearningRate 0.3405 Epoch: 3 Global Step: 35180 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:16,143-Speed 5963.69 samples/sec Loss 12.8404 LearningRate 0.3405 Epoch: 3 Global Step: 35190 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:23,010-Speed 5966.16 samples/sec Loss 12.7803 LearningRate 0.3404 Epoch: 3 Global Step: 35200 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:29,906-Speed 5940.83 samples/sec Loss 12.8216 LearningRate 0.3404 Epoch: 3 Global Step: 35210 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:36,793-Speed 5949.79 samples/sec Loss 12.8713 LearningRate 0.3403 Epoch: 3 Global Step: 35220 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:43,742-Speed 5895.05 samples/sec Loss 
12.7762 LearningRate 0.3403 Epoch: 3 Global Step: 35230 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:50,594-Speed 5979.02 samples/sec Loss 12.8535 LearningRate 0.3403 Epoch: 3 Global Step: 35240 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:10:57,459-Speed 5968.36 samples/sec Loss 12.8611 LearningRate 0.3402 Epoch: 3 Global Step: 35250 Fp16 Grad Scale: 524288 Required: 34 hours Training: 2022-01-08 03:11:04,307-Speed 5982.29 samples/sec Loss 12.8067 LearningRate 0.3402 Epoch: 3 Global Step: 35260 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:11:11,199-Speed 5944.31 samples/sec Loss 12.7844 LearningRate 0.3401 Epoch: 3 Global Step: 35270 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:11:18,069-Speed 5963.87 samples/sec Loss 12.7851 LearningRate 0.3401 Epoch: 3 Global Step: 35280 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:11:24,921-Speed 5979.19 samples/sec Loss 12.8212 LearningRate 0.3401 Epoch: 3 Global Step: 35290 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:11:31,794-Speed 5960.81 samples/sec Loss 12.7987 LearningRate 0.3400 Epoch: 3 Global Step: 35300 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:11:38,778-Speed 5869.44 samples/sec Loss 12.8035 LearningRate 0.3400 Epoch: 3 Global Step: 35310 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:11:45,686-Speed 5931.33 samples/sec Loss 12.7588 LearningRate 0.3399 Epoch: 3 Global Step: 35320 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:11:52,546-Speed 5972.12 samples/sec Loss 12.8414 LearningRate 0.3399 Epoch: 3 Global Step: 35330 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:11:59,392-Speed 5982.96 samples/sec Loss 12.7769 LearningRate 0.3399 Epoch: 3 Global Step: 35340 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:12:06,267-Speed 5959.21 samples/sec Loss 12.7827 LearningRate 0.3398 
Epoch: 3 Global Step: 35350 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:12:13,114-Speed 5983.36 samples/sec Loss 12.7988 LearningRate 0.3398 Epoch: 3 Global Step: 35360 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:12:19,983-Speed 5964.11 samples/sec Loss 12.8141 LearningRate 0.3397 Epoch: 3 Global Step: 35370 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:12:26,831-Speed 5982.86 samples/sec Loss 12.9291 LearningRate 0.3397 Epoch: 3 Global Step: 35380 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:12:33,704-Speed 5960.47 samples/sec Loss 12.7336 LearningRate 0.3397 Epoch: 3 Global Step: 35390 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:12:40,548-Speed 5985.29 samples/sec Loss 12.7857 LearningRate 0.3396 Epoch: 3 Global Step: 35400 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:12:47,404-Speed 5977.37 samples/sec Loss 12.7702 LearningRate 0.3396 Epoch: 3 Global Step: 35410 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:12:54,265-Speed 5971.03 samples/sec Loss 12.8544 LearningRate 0.3395 Epoch: 3 Global Step: 35420 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:13:01,122-Speed 5974.05 samples/sec Loss 12.8004 LearningRate 0.3395 Epoch: 3 Global Step: 35430 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:13:07,981-Speed 5972.61 samples/sec Loss 12.6801 LearningRate 0.3395 Epoch: 3 Global Step: 35440 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:13:14,836-Speed 5976.48 samples/sec Loss 12.7863 LearningRate 0.3394 Epoch: 3 Global Step: 35450 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:13:21,705-Speed 5963.98 samples/sec Loss 12.7665 LearningRate 0.3394 Epoch: 3 Global Step: 35460 Fp16 Grad Scale: 524288 Required: 34 hours Training: 2022-01-08 03:13:28,561-Speed 5976.24 samples/sec Loss 12.7936 LearningRate 0.3393 Epoch: 3 Global Step: 35470 
Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:13:35,427-Speed 5966.67 samples/sec Loss 12.8745 LearningRate 0.3393 Epoch: 3 Global Step: 35480 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:13:42,311-Speed 5952.83 samples/sec Loss 12.8626 LearningRate 0.3393 Epoch: 3 Global Step: 35490 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:13:49,167-Speed 5975.55 samples/sec Loss 12.7848 LearningRate 0.3392 Epoch: 3 Global Step: 35500 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:13:56,031-Speed 5968.29 samples/sec Loss 12.7828 LearningRate 0.3392 Epoch: 3 Global Step: 35510 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:14:02,898-Speed 5966.49 samples/sec Loss 12.6658 LearningRate 0.3391 Epoch: 3 Global Step: 35520 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:14:09,781-Speed 5951.97 samples/sec Loss 12.7361 LearningRate 0.3391 Epoch: 3 Global Step: 35530 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:14:16,699-Speed 5922.30 samples/sec Loss 12.7698 LearningRate 0.3391 Epoch: 3 Global Step: 35540 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:14:23,559-Speed 5973.44 samples/sec Loss 12.7793 LearningRate 0.3390 Epoch: 3 Global Step: 35550 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:14:30,405-Speed 5984.42 samples/sec Loss 12.8783 LearningRate 0.3390 Epoch: 3 Global Step: 35560 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:14:37,274-Speed 5964.55 samples/sec Loss 12.7579 LearningRate 0.3390 Epoch: 3 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:14:44,134-Speed 5971.82 samples/sec Loss 12.8019 LearningRate 0.3389 Epoch: 3 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:14:50,990-Speed 5976.06 samples/sec Loss 12.8036 LearningRate 0.3389 Epoch: 3 Global Step: 35590 Fp16 Grad Scale: 131072 
Required: 34 hours Training: 2022-01-08 03:14:57,869-Speed 5957.62 samples/sec Loss 12.7034 LearningRate 0.3388 Epoch: 3 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:15:04,720-Speed 5979.46 samples/sec Loss 12.8082 LearningRate 0.3388 Epoch: 3 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:15:11,596-Speed 5957.40 samples/sec Loss 12.7496 LearningRate 0.3388 Epoch: 3 Global Step: 35620 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:15:18,468-Speed 5961.31 samples/sec Loss 12.7596 LearningRate 0.3387 Epoch: 3 Global Step: 35630 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:15:25,344-Speed 5958.45 samples/sec Loss 12.7673 LearningRate 0.3387 Epoch: 3 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:15:32,231-Speed 5949.14 samples/sec Loss 12.8837 LearningRate 0.3386 Epoch: 3 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:15:39,142-Speed 5927.86 samples/sec Loss 12.7112 LearningRate 0.3386 Epoch: 3 Global Step: 35660 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:15:46,014-Speed 5961.65 samples/sec Loss 12.7600 LearningRate 0.3386 Epoch: 3 Global Step: 35670 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:15:52,876-Speed 5970.46 samples/sec Loss 12.8109 LearningRate 0.3385 Epoch: 3 Global Step: 35680 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:15:59,733-Speed 5975.39 samples/sec Loss 12.8131 LearningRate 0.3385 Epoch: 3 Global Step: 35690 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:16:06,594-Speed 5971.08 samples/sec Loss 12.6762 LearningRate 0.3384 Epoch: 3 Global Step: 35700 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:16:13,451-Speed 5974.49 samples/sec Loss 12.8999 LearningRate 0.3384 Epoch: 3 Global Step: 35710 Fp16 Grad Scale: 262144 Required: 34 hours Training: 
2022-01-08 03:16:20,321-Speed 5963.20 samples/sec Loss 12.7516 LearningRate 0.3384 Epoch: 3 Global Step: 35720 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:16:27,186-Speed 5968.15 samples/sec Loss 12.7926 LearningRate 0.3383 Epoch: 3 Global Step: 35730 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:16:34,072-Speed 5949.04 samples/sec Loss 12.7247 LearningRate 0.3383 Epoch: 3 Global Step: 35740 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:16:40,949-Speed 5958.04 samples/sec Loss 12.7451 LearningRate 0.3382 Epoch: 3 Global Step: 35750 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:16:47,815-Speed 5966.10 samples/sec Loss 12.7257 LearningRate 0.3382 Epoch: 3 Global Step: 35760 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:16:54,669-Speed 5977.20 samples/sec Loss 12.8309 LearningRate 0.3382 Epoch: 3 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:01,540-Speed 5963.35 samples/sec Loss 12.8084 LearningRate 0.3381 Epoch: 3 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:08,408-Speed 5965.15 samples/sec Loss 12.7074 LearningRate 0.3381 Epoch: 3 Global Step: 35790 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:15,284-Speed 5957.67 samples/sec Loss 12.8252 LearningRate 0.3380 Epoch: 3 Global Step: 35800 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:22,171-Speed 5951.88 samples/sec Loss 12.6984 LearningRate 0.3380 Epoch: 3 Global Step: 35810 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:29,040-Speed 5964.44 samples/sec Loss 12.8010 LearningRate 0.3380 Epoch: 3 Global Step: 35820 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:35,934-Speed 5942.90 samples/sec Loss 12.7681 LearningRate 0.3379 Epoch: 3 Global Step: 35830 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:42,795-Speed 
5973.33 samples/sec Loss 12.7770 LearningRate 0.3379 Epoch: 3 Global Step: 35840 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:49,682-Speed 5949.68 samples/sec Loss 12.7095 LearningRate 0.3378 Epoch: 3 Global Step: 35850 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:17:56,559-Speed 5957.22 samples/sec Loss 12.7904 LearningRate 0.3378 Epoch: 3 Global Step: 35860 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:18:03,415-Speed 5976.02 samples/sec Loss 12.7532 LearningRate 0.3378 Epoch: 3 Global Step: 35870 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:18:10,384-Speed 5878.68 samples/sec Loss 12.7651 LearningRate 0.3377 Epoch: 3 Global Step: 35880 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:18:17,241-Speed 5975.04 samples/sec Loss 12.7378 LearningRate 0.3377 Epoch: 3 Global Step: 35890 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:18:24,111-Speed 5962.86 samples/sec Loss 12.7082 LearningRate 0.3377 Epoch: 3 Global Step: 35900 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:18:30,981-Speed 5963.71 samples/sec Loss 12.8126 LearningRate 0.3376 Epoch: 3 Global Step: 35910 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:18:37,858-Speed 5957.54 samples/sec Loss 12.8129 LearningRate 0.3376 Epoch: 3 Global Step: 35920 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:18:44,718-Speed 5971.07 samples/sec Loss 12.6949 LearningRate 0.3375 Epoch: 3 Global Step: 35930 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:18:51,593-Speed 5961.05 samples/sec Loss 12.7852 LearningRate 0.3375 Epoch: 3 Global Step: 35940 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:18:58,458-Speed 5968.24 samples/sec Loss 12.7763 LearningRate 0.3375 Epoch: 3 Global Step: 35950 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:19:05,322-Speed 5969.64 samples/sec Loss 
12.7034 LearningRate 0.3374 Epoch: 3 Global Step: 35960 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:19:12,171-Speed 5980.79 samples/sec Loss 12.7383 LearningRate 0.3374 Epoch: 3 Global Step: 35970 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:19:19,078-Speed 5931.86 samples/sec Loss 12.7678 LearningRate 0.3373 Epoch: 3 Global Step: 35980 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:19:25,937-Speed 5972.61 samples/sec Loss 12.7070 LearningRate 0.3373 Epoch: 3 Global Step: 35990 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:19:32,843-Speed 5931.95 samples/sec Loss 12.7858 LearningRate 0.3373 Epoch: 3 Global Step: 36000 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:19:39,708-Speed 5967.76 samples/sec Loss 12.7878 LearningRate 0.3372 Epoch: 3 Global Step: 36010 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:19:46,563-Speed 5976.28 samples/sec Loss 12.7912 LearningRate 0.3372 Epoch: 3 Global Step: 36020 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:19:53,424-Speed 5971.08 samples/sec Loss 12.7139 LearningRate 0.3371 Epoch: 3 Global Step: 36030 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:20:00,285-Speed 5971.09 samples/sec Loss 12.6620 LearningRate 0.3371 Epoch: 3 Global Step: 36040 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:20:07,134-Speed 5981.19 samples/sec Loss 12.6628 LearningRate 0.3371 Epoch: 3 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:20:13,985-Speed 5979.86 samples/sec Loss 12.8105 LearningRate 0.3370 Epoch: 3 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:20:20,899-Speed 5927.22 samples/sec Loss 12.7956 LearningRate 0.3370 Epoch: 3 Global Step: 36070 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:20:27,778-Speed 5954.95 samples/sec Loss 12.7076 LearningRate 0.3369 
Epoch: 3 Global Step: 36080 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:20:34,633-Speed 5976.67 samples/sec Loss 12.7001 LearningRate 0.3369 Epoch: 3 Global Step: 36090 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:20:41,488-Speed 5976.33 samples/sec Loss 12.7498 LearningRate 0.3369 Epoch: 3 Global Step: 36100 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:20:48,350-Speed 5970.11 samples/sec Loss 12.6509 LearningRate 0.3368 Epoch: 3 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:20:55,198-Speed 5982.04 samples/sec Loss 12.7733 LearningRate 0.3368 Epoch: 3 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:02,044-Speed 5983.77 samples/sec Loss 12.7583 LearningRate 0.3367 Epoch: 3 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:08,888-Speed 5986.20 samples/sec Loss 12.7351 LearningRate 0.3367 Epoch: 3 Global Step: 36140 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:15,741-Speed 5978.55 samples/sec Loss 12.7883 LearningRate 0.3367 Epoch: 3 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:22,590-Speed 5981.43 samples/sec Loss 12.7253 LearningRate 0.3366 Epoch: 3 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:29,441-Speed 5979.07 samples/sec Loss 12.7207 LearningRate 0.3366 Epoch: 3 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:36,321-Speed 5971.95 samples/sec Loss 12.7699 LearningRate 0.3365 Epoch: 3 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:43,220-Speed 5938.35 samples/sec Loss 12.6634 LearningRate 0.3365 Epoch: 3 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:50,130-Speed 5927.87 samples/sec Loss 12.7498 LearningRate 0.3365 Epoch: 3 Global Step: 36200 
Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:21:57,069-Speed 5904.44 samples/sec Loss 12.7494 LearningRate 0.3364 Epoch: 3 Global Step: 36210 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:22:03,977-Speed 5930.63 samples/sec Loss 12.7281 LearningRate 0.3364 Epoch: 3 Global Step: 36220 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:22:10,884-Speed 5931.27 samples/sec Loss 12.6809 LearningRate 0.3364 Epoch: 3 Global Step: 36230 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:22:17,793-Speed 5929.96 samples/sec Loss 12.7909 LearningRate 0.3363 Epoch: 3 Global Step: 36240 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:22:24,689-Speed 5940.35 samples/sec Loss 12.6952 LearningRate 0.3363 Epoch: 3 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:22:31,596-Speed 5931.85 samples/sec Loss 12.6594 LearningRate 0.3362 Epoch: 3 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:22:38,452-Speed 5975.24 samples/sec Loss 12.6951 LearningRate 0.3362 Epoch: 3 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:22:45,309-Speed 5974.81 samples/sec Loss 12.7464 LearningRate 0.3362 Epoch: 3 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:22:52,178-Speed 5964.13 samples/sec Loss 12.7730 LearningRate 0.3361 Epoch: 3 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:22:59,038-Speed 5972.15 samples/sec Loss 12.7160 LearningRate 0.3361 Epoch: 3 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:23:05,902-Speed 5968.26 samples/sec Loss 12.7355 LearningRate 0.3360 Epoch: 3 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:23:12,748-Speed 5984.56 samples/sec Loss 12.7999 LearningRate 0.3360 Epoch: 3 Global Step: 36320 Fp16 Grad Scale: 131072 
Required: 34 hours Training: 2022-01-08 03:23:19,595-Speed 5983.61 samples/sec Loss 12.6677 LearningRate 0.3360 Epoch: 3 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:23:26,467-Speed 5960.77 samples/sec Loss 12.7652 LearningRate 0.3359 Epoch: 3 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 03:23:33,336-Speed 5965.26 samples/sec Loss 12.6245 LearningRate 0.3359 Epoch: 3 Global Step: 36350 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:23:40,190-Speed 5977.37 samples/sec Loss 12.7708 LearningRate 0.3358 Epoch: 3 Global Step: 36360 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:23:47,040-Speed 5981.18 samples/sec Loss 12.7157 LearningRate 0.3358 Epoch: 3 Global Step: 36370 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:23:53,896-Speed 5975.13 samples/sec Loss 12.7464 LearningRate 0.3358 Epoch: 3 Global Step: 36380 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:24:00,762-Speed 5968.98 samples/sec Loss 12.7494 LearningRate 0.3357 Epoch: 3 Global Step: 36390 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:24:07,641-Speed 5955.30 samples/sec Loss 12.6455 LearningRate 0.3357 Epoch: 3 Global Step: 36400 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:24:14,513-Speed 5961.91 samples/sec Loss 12.6999 LearningRate 0.3356 Epoch: 3 Global Step: 36410 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:24:21,399-Speed 5951.81 samples/sec Loss 12.6190 LearningRate 0.3356 Epoch: 3 Global Step: 36420 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:24:28,256-Speed 5973.82 samples/sec Loss 12.7502 LearningRate 0.3356 Epoch: 3 Global Step: 36430 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 03:24:35,131-Speed 5959.65 samples/sec Loss 12.7067 LearningRate 0.3355 Epoch: 3 Global Step: 36440 Fp16 Grad Scale: 262144 Required: 33 hours Training: 
2022-01-08 03:24:41,991-Speed 5972.18 samples/sec Loss 12.7294 LearningRate 0.3355 Epoch: 3 Global Step: 36450 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 03:24:48,846-Speed 5976.41 samples/sec Loss 12.6933 LearningRate 0.3354 Epoch: 3 Global Step: 36460 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:24:55,712-Speed 5966.70 samples/sec Loss 12.6976 LearningRate 0.3354 Epoch: 3 Global Step: 36470 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:02,575-Speed 5969.21 samples/sec Loss 12.7342 LearningRate 0.3354 Epoch: 3 Global Step: 36480 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:09,433-Speed 5973.71 samples/sec Loss 12.7039 LearningRate 0.3353 Epoch: 3 Global Step: 36490 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:16,335-Speed 5935.49 samples/sec Loss 12.6476 LearningRate 0.3353 Epoch: 3 Global Step: 36500 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:23,204-Speed 5964.19 samples/sec Loss 12.6428 LearningRate 0.3353 Epoch: 3 Global Step: 36510 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:30,065-Speed 5970.99 samples/sec Loss 12.6771 LearningRate 0.3352 Epoch: 3 Global Step: 36520 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:36,926-Speed 5970.49 samples/sec Loss 12.7464 LearningRate 0.3352 Epoch: 3 Global Step: 36530 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:43,795-Speed 5964.58 samples/sec Loss 12.7487 LearningRate 0.3351 Epoch: 3 Global Step: 36540 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:50,658-Speed 5969.28 samples/sec Loss 12.7200 LearningRate 0.3351 Epoch: 3 Global Step: 36550 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:25:57,523-Speed 5968.10 samples/sec Loss 12.7274 LearningRate 0.3351 Epoch: 3 Global Step: 36560 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:26:04,393-Speed 
5963.24 samples/sec Loss 12.7040 LearningRate 0.3350 Epoch: 3 Global Step: 36570 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:26:11,241-Speed 5982.05 samples/sec Loss 12.7686 LearningRate 0.3350 Epoch: 3 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:26:18,119-Speed 5956.85 samples/sec Loss 12.6817 LearningRate 0.3349 Epoch: 3 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:26:25,013-Speed 5942.66 samples/sec Loss 12.6538 LearningRate 0.3349 Epoch: 3 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:26:31,872-Speed 5973.56 samples/sec Loss 12.6835 LearningRate 0.3349 Epoch: 3 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:26:38,739-Speed 5965.40 samples/sec Loss 12.7363 LearningRate 0.3348 Epoch: 3 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:26:45,614-Speed 5958.85 samples/sec Loss 12.7086 LearningRate 0.3348 Epoch: 3 Global Step: 36630 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:26:52,522-Speed 5932.44 samples/sec Loss 12.7005 LearningRate 0.3347 Epoch: 3 Global Step: 36640 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:26:59,394-Speed 5961.86 samples/sec Loss 12.8042 LearningRate 0.3347 Epoch: 3 Global Step: 36650 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:27:06,267-Speed 5960.03 samples/sec Loss 12.7443 LearningRate 0.3347 Epoch: 3 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:27:13,120-Speed 5979.60 samples/sec Loss 12.7888 LearningRate 0.3346 Epoch: 3 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:27:19,986-Speed 5966.82 samples/sec Loss 12.7056 LearningRate 0.3346 Epoch: 3 Global Step: 36680 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:27:26,869-Speed 5951.91 samples/sec Loss 
12.7464 LearningRate 0.3345 Epoch: 3 Global Step: 36690 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:27:33,738-Speed 5963.99 samples/sec Loss 12.6800 LearningRate 0.3345 Epoch: 3 Global Step: 36700 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:27:40,595-Speed 5974.15 samples/sec Loss 12.7388 LearningRate 0.3345 Epoch: 3 Global Step: 36710 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:27:47,479-Speed 5951.85 samples/sec Loss 12.6554 LearningRate 0.3344 Epoch: 3 Global Step: 36720 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:27:54,373-Speed 5942.56 samples/sec Loss 12.5863 LearningRate 0.3344 Epoch: 3 Global Step: 36730 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:01,236-Speed 5969.83 samples/sec Loss 12.6945 LearningRate 0.3344 Epoch: 3 Global Step: 36740 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:08,084-Speed 5982.55 samples/sec Loss 12.6624 LearningRate 0.3343 Epoch: 3 Global Step: 36750 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:14,940-Speed 5976.33 samples/sec Loss 12.6923 LearningRate 0.3343 Epoch: 3 Global Step: 36760 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:21,826-Speed 5949.68 samples/sec Loss 12.6267 LearningRate 0.3342 Epoch: 3 Global Step: 36770 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:28,672-Speed 5984.63 samples/sec Loss 12.6975 LearningRate 0.3342 Epoch: 3 Global Step: 36780 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:35,543-Speed 5962.07 samples/sec Loss 12.7067 LearningRate 0.3342 Epoch: 3 Global Step: 36790 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:42,412-Speed 5963.86 samples/sec Loss 12.5945 LearningRate 0.3341 Epoch: 3 Global Step: 36800 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:49,274-Speed 5970.37 samples/sec Loss 12.7477 LearningRate 0.3341 
Epoch: 3 Global Step: 36810 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:28:56,143-Speed 5964.04 samples/sec Loss 12.6694 LearningRate 0.3340 Epoch: 3 Global Step: 36820 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:29:03,009-Speed 5966.77 samples/sec Loss 12.7394 LearningRate 0.3340 Epoch: 3 Global Step: 36830 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:29:09,860-Speed 5980.30 samples/sec Loss 12.7103 LearningRate 0.3340 Epoch: 3 Global Step: 36840 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:29:16,724-Speed 5968.43 samples/sec Loss 12.6837 LearningRate 0.3339 Epoch: 3 Global Step: 36850 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:29:23,628-Speed 5934.52 samples/sec Loss 12.7270 LearningRate 0.3339 Epoch: 3 Global Step: 36860 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:29:30,483-Speed 5975.91 samples/sec Loss 12.7602 LearningRate 0.3338 Epoch: 3 Global Step: 36870 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:29:37,373-Speed 5945.34 samples/sec Loss 12.6789 LearningRate 0.3338 Epoch: 3 Global Step: 36880 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 03:29:44,313-Speed 5905.65 samples/sec Loss 12.6439 LearningRate 0.3338 Epoch: 3 Global Step: 36890 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:29:51,267-Speed 5891.63 samples/sec Loss 12.6255 LearningRate 0.3337 Epoch: 3 Global Step: 36900 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:29:58,183-Speed 5923.98 samples/sec Loss 12.6802 LearningRate 0.3337 Epoch: 3 Global Step: 36910 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:30:05,035-Speed 5978.98 samples/sec Loss 12.6254 LearningRate 0.3336 Epoch: 3 Global Step: 36920 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:30:11,888-Speed 5977.61 samples/sec Loss 12.6151 LearningRate 0.3336 Epoch: 3 Global Step: 36930 
Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:30:18,738-Speed 5980.07 samples/sec Loss 12.7890 LearningRate 0.3336 Epoch: 3 Global Step: 36940 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:30:25,599-Speed 5971.58 samples/sec Loss 12.6873 LearningRate 0.3335 Epoch: 3 Global Step: 36950 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:30:32,453-Speed 5977.31 samples/sec Loss 12.5946 LearningRate 0.3335 Epoch: 3 Global Step: 36960 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:30:39,300-Speed 5982.48 samples/sec Loss 12.7157 LearningRate 0.3335 Epoch: 3 Global Step: 36970 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:30:46,173-Speed 5961.06 samples/sec Loss 12.6670 LearningRate 0.3334 Epoch: 3 Global Step: 36980 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:30:53,041-Speed 5965.96 samples/sec Loss 12.7137 LearningRate 0.3334 Epoch: 3 Global Step: 36990 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 03:30:59,879-Speed 5991.05 samples/sec Loss 12.6532 LearningRate 0.3333 Epoch: 3 Global Step: 37000 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:31:06,726-Speed 5983.72 samples/sec Loss 12.6859 LearningRate 0.3333 Epoch: 3 Global Step: 37010 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:31:13,573-Speed 5985.72 samples/sec Loss 12.6274 LearningRate 0.3333 Epoch: 3 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:31:20,425-Speed 5977.66 samples/sec Loss 12.5645 LearningRate 0.3332 Epoch: 3 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:31:27,339-Speed 5926.07 samples/sec Loss 12.5940 LearningRate 0.3332 Epoch: 3 Global Step: 37040 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:31:34,288-Speed 5895.91 samples/sec Loss 12.6335 LearningRate 0.3331 Epoch: 3 Global Step: 37050 Fp16 Grad Scale: 131072 
Required: 33 hours Training: 2022-01-08 03:31:41,221-Speed 5908.51 samples/sec Loss 12.6465 LearningRate 0.3331 Epoch: 3 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:31:48,151-Speed 5911.82 samples/sec Loss 12.6853 LearningRate 0.3331 Epoch: 3 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:31:55,020-Speed 5963.75 samples/sec Loss 12.7395 LearningRate 0.3330 Epoch: 3 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:32:01,876-Speed 5975.90 samples/sec Loss 12.6807 LearningRate 0.3330 Epoch: 3 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:32:08,747-Speed 5962.49 samples/sec Loss 12.6110 LearningRate 0.3329 Epoch: 3 Global Step: 37100 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:32:15,598-Speed 5979.99 samples/sec Loss 12.6406 LearningRate 0.3329 Epoch: 3 Global Step: 37110 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:32:22,455-Speed 5974.85 samples/sec Loss 12.6175 LearningRate 0.3329 Epoch: 3 Global Step: 37120 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:32:29,359-Speed 5934.17 samples/sec Loss 12.6407 LearningRate 0.3328 Epoch: 3 Global Step: 37130 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:32:36,228-Speed 5964.54 samples/sec Loss 12.6230 LearningRate 0.3328 Epoch: 3 Global Step: 37140 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:32:43,083-Speed 5976.69 samples/sec Loss 12.6270 LearningRate 0.3327 Epoch: 3 Global Step: 37150 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:32:49,936-Speed 5977.39 samples/sec Loss 12.7598 LearningRate 0.3327 Epoch: 3 Global Step: 37160 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:32:56,801-Speed 5967.87 samples/sec Loss 12.7111 LearningRate 0.3327 Epoch: 3 Global Step: 37170 Fp16 Grad Scale: 262144 Required: 33 hours Training: 
2022-01-08 03:33:03,657-Speed 5974.70 samples/sec Loss 12.6190 LearningRate 0.3326 Epoch: 3 Global Step: 37180 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:33:10,523-Speed 5966.91 samples/sec Loss 12.6459 LearningRate 0.3326 Epoch: 3 Global Step: 37190 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:33:17,397-Speed 5960.26 samples/sec Loss 12.6075 LearningRate 0.3326 Epoch: 3 Global Step: 37200 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:33:24,253-Speed 5975.26 samples/sec Loss 12.6625 LearningRate 0.3325 Epoch: 3 Global Step: 37210 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:33:31,105-Speed 5978.34 samples/sec Loss 12.6557 LearningRate 0.3325 Epoch: 3 Global Step: 37220 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:33:37,971-Speed 5967.12 samples/sec Loss 12.6076 LearningRate 0.3324 Epoch: 3 Global Step: 37230 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:33:44,828-Speed 5974.84 samples/sec Loss 12.6893 LearningRate 0.3324 Epoch: 3 Global Step: 37240 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:33:51,700-Speed 5960.29 samples/sec Loss 12.6359 LearningRate 0.3324 Epoch: 3 Global Step: 37250 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:33:58,549-Speed 5981.70 samples/sec Loss 12.5731 LearningRate 0.3323 Epoch: 3 Global Step: 37260 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:34:05,402-Speed 5978.31 samples/sec Loss 12.6456 LearningRate 0.3323 Epoch: 3 Global Step: 37270 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:34:12,265-Speed 5969.97 samples/sec Loss 12.5871 LearningRate 0.3322 Epoch: 3 Global Step: 37280 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:34:19,123-Speed 5972.98 samples/sec Loss 12.6819 LearningRate 0.3322 Epoch: 3 Global Step: 37290 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:34:25,970-Speed 
5983.82 samples/sec Loss 12.5792 LearningRate 0.3322 Epoch: 3 Global Step: 37300 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 03:34:32,806-Speed 5992.37 samples/sec Loss 12.5957 LearningRate 0.3321 Epoch: 3 Global Step: 37310 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:34:39,653-Speed 5982.91 samples/sec Loss 12.5799 LearningRate 0.3321 Epoch: 3 Global Step: 37320 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:34:46,502-Speed 5982.19 samples/sec Loss 12.6768 LearningRate 0.3320 Epoch: 3 Global Step: 37330 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:34:53,359-Speed 5973.78 samples/sec Loss 12.6477 LearningRate 0.3320 Epoch: 3 Global Step: 37340 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:35:00,220-Speed 5970.98 samples/sec Loss 12.5960 LearningRate 0.3320 Epoch: 3 Global Step: 37350 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:35:07,097-Speed 5957.50 samples/sec Loss 12.5747 LearningRate 0.3319 Epoch: 3 Global Step: 37360 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:35:13,967-Speed 5965.79 samples/sec Loss 12.6657 LearningRate 0.3319 Epoch: 3 Global Step: 37370 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:35:20,832-Speed 5967.44 samples/sec Loss 12.6502 LearningRate 0.3318 Epoch: 3 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:35:27,680-Speed 5981.94 samples/sec Loss 12.5721 LearningRate 0.3318 Epoch: 3 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:35:34,562-Speed 5954.38 samples/sec Loss 12.6792 LearningRate 0.3318 Epoch: 3 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:35:41,451-Speed 5946.61 samples/sec Loss 12.6572 LearningRate 0.3317 Epoch: 3 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:35:48,334-Speed 5952.51 samples/sec Loss 
12.6424 LearningRate 0.3317 Epoch: 3 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:35:55,646-Speed 5603.19 samples/sec Loss 12.7355 LearningRate 0.3317 Epoch: 3 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:36:02,512-Speed 5966.32 samples/sec Loss 12.6558 LearningRate 0.3316 Epoch: 3 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:36:09,371-Speed 5973.43 samples/sec Loss 12.6723 LearningRate 0.3316 Epoch: 3 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:36:16,229-Speed 5977.94 samples/sec Loss 12.6590 LearningRate 0.3315 Epoch: 3 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:36:23,100-Speed 5962.55 samples/sec Loss 12.6872 LearningRate 0.3315 Epoch: 3 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:36:29,955-Speed 5976.37 samples/sec Loss 12.6424 LearningRate 0.3315 Epoch: 3 Global Step: 37480 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:36:36,805-Speed 5979.84 samples/sec Loss 12.6345 LearningRate 0.3314 Epoch: 3 Global Step: 37490 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:36:43,665-Speed 5972.05 samples/sec Loss 12.5064 LearningRate 0.3314 Epoch: 3 Global Step: 37500 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:36:50,519-Speed 5977.13 samples/sec Loss 12.6438 LearningRate 0.3313 Epoch: 3 Global Step: 37510 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:36:57,363-Speed 5985.89 samples/sec Loss 12.5684 LearningRate 0.3313 Epoch: 3 Global Step: 37520 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:37:04,212-Speed 5980.82 samples/sec Loss 12.6136 LearningRate 0.3313 Epoch: 3 Global Step: 37530 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:37:11,062-Speed 5981.12 samples/sec Loss 12.6442 LearningRate 0.3312 
Epoch: 3 Global Step: 37540 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:37:17,938-Speed 5957.90 samples/sec Loss 12.6409 LearningRate 0.3312 Epoch: 3 Global Step: 37550 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:37:24,796-Speed 5974.19 samples/sec Loss 12.6409 LearningRate 0.3311 Epoch: 3 Global Step: 37560 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:37:31,672-Speed 5959.39 samples/sec Loss 12.6750 LearningRate 0.3311 Epoch: 3 Global Step: 37570 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:37:38,540-Speed 5965.17 samples/sec Loss 12.6516 LearningRate 0.3311 Epoch: 3 Global Step: 37580 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 03:37:45,381-Speed 5989.79 samples/sec Loss 12.6622 LearningRate 0.3310 Epoch: 3 Global Step: 37590 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:37:52,265-Speed 5951.57 samples/sec Loss 12.6350 LearningRate 0.3310 Epoch: 3 Global Step: 37600 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:37:59,119-Speed 5976.28 samples/sec Loss 12.6009 LearningRate 0.3310 Epoch: 3 Global Step: 37610 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:38:06,012-Speed 5943.16 samples/sec Loss 12.5878 LearningRate 0.3309 Epoch: 3 Global Step: 37620 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:38:12,907-Speed 5942.36 samples/sec Loss 12.5414 LearningRate 0.3309 Epoch: 3 Global Step: 37630 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:38:19,791-Speed 5951.71 samples/sec Loss 12.6552 LearningRate 0.3308 Epoch: 3 Global Step: 37640 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:38:26,664-Speed 5960.16 samples/sec Loss 12.5761 LearningRate 0.3308 Epoch: 3 Global Step: 37650 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:38:33,599-Speed 5908.11 samples/sec Loss 12.6725 LearningRate 0.3308 Epoch: 3 Global Step: 37660 
Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:38:40,585-Speed 5864.66 samples/sec Loss 12.5626 LearningRate 0.3307 Epoch: 3 Global Step: 37670 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:38:47,502-Speed 5922.83 samples/sec Loss 12.6381 LearningRate 0.3307 Epoch: 3 Global Step: 37680 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:38:54,354-Speed 5978.97 samples/sec Loss 12.6035 LearningRate 0.3306 Epoch: 3 Global Step: 37690 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:39:01,203-Speed 5982.03 samples/sec Loss 12.6377 LearningRate 0.3306 Epoch: 3 Global Step: 37700 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:39:08,064-Speed 5970.59 samples/sec Loss 12.5364 LearningRate 0.3306 Epoch: 3 Global Step: 37710 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:39:14,935-Speed 5962.38 samples/sec Loss 12.6032 LearningRate 0.3305 Epoch: 3 Global Step: 37720 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:39:21,788-Speed 5980.62 samples/sec Loss 12.6803 LearningRate 0.3305 Epoch: 3 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:39:28,638-Speed 5980.89 samples/sec Loss 12.5847 LearningRate 0.3304 Epoch: 3 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:39:35,520-Speed 5954.72 samples/sec Loss 12.6186 LearningRate 0.3304 Epoch: 3 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:39:42,392-Speed 5961.97 samples/sec Loss 12.6310 LearningRate 0.3304 Epoch: 3 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:39:49,258-Speed 5966.53 samples/sec Loss 12.5693 LearningRate 0.3303 Epoch: 3 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:39:56,133-Speed 5958.91 samples/sec Loss 12.6398 LearningRate 0.3303 Epoch: 3 Global Step: 37780 Fp16 Grad Scale: 131072 
Required: 33 hours Training: 2022-01-08 03:40:03,007-Speed 5962.40 samples/sec Loss 12.6090 LearningRate 0.3302 Epoch: 3 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:40:09,862-Speed 5976.46 samples/sec Loss 12.6038 LearningRate 0.3302 Epoch: 3 Global Step: 37800 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:40:16,745-Speed 5953.96 samples/sec Loss 12.6388 LearningRate 0.3302 Epoch: 3 Global Step: 37810 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:40:23,611-Speed 5966.48 samples/sec Loss 12.5973 LearningRate 0.3301 Epoch: 3 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:40:30,490-Speed 5955.70 samples/sec Loss 12.6026 LearningRate 0.3301 Epoch: 3 Global Step: 37830 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:40:37,350-Speed 5970.99 samples/sec Loss 12.5977 LearningRate 0.3301 Epoch: 3 Global Step: 37840 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:40:44,208-Speed 5974.01 samples/sec Loss 12.5787 LearningRate 0.3300 Epoch: 3 Global Step: 37850 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:40:51,087-Speed 5955.91 samples/sec Loss 12.5363 LearningRate 0.3300 Epoch: 3 Global Step: 37860 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:40:57,937-Speed 5979.98 samples/sec Loss 12.6492 LearningRate 0.3299 Epoch: 3 Global Step: 37870 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:41:04,817-Speed 5954.40 samples/sec Loss 12.5687 LearningRate 0.3299 Epoch: 3 Global Step: 37880 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:41:11,706-Speed 5946.98 samples/sec Loss 12.5984 LearningRate 0.3299 Epoch: 3 Global Step: 37890 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:41:18,568-Speed 5970.39 samples/sec Loss 12.6733 LearningRate 0.3298 Epoch: 3 Global Step: 37900 Fp16 Grad Scale: 262144 Required: 33 hours Training: 
2022-01-08 03:41:25,425-Speed 5974.08 samples/sec Loss 12.5902 LearningRate 0.3298 Epoch: 3 Global Step: 37910 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:41:32,288-Speed 5969.52 samples/sec Loss 12.4894 LearningRate 0.3297 Epoch: 3 Global Step: 37920 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:41:39,128-Speed 5989.32 samples/sec Loss 12.6839 LearningRate 0.3297 Epoch: 3 Global Step: 37930 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:41:45,989-Speed 5970.87 samples/sec Loss 12.6462 LearningRate 0.3297 Epoch: 3 Global Step: 37940 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:41:52,863-Speed 5962.96 samples/sec Loss 12.6668 LearningRate 0.3296 Epoch: 3 Global Step: 37950 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:41:59,761-Speed 5939.89 samples/sec Loss 12.6203 LearningRate 0.3296 Epoch: 3 Global Step: 37960 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:42:06,641-Speed 5954.32 samples/sec Loss 12.6075 LearningRate 0.3295 Epoch: 3 Global Step: 37970 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:42:13,540-Speed 5938.45 samples/sec Loss 12.6129 LearningRate 0.3295 Epoch: 3 Global Step: 37980 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:42:20,392-Speed 5978.38 samples/sec Loss 12.5694 LearningRate 0.3295 Epoch: 3 Global Step: 37990 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:42:27,264-Speed 5962.17 samples/sec Loss 12.6853 LearningRate 0.3294 Epoch: 3 Global Step: 38000 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:42:34,117-Speed 5978.27 samples/sec Loss 12.5961 LearningRate 0.3294 Epoch: 3 Global Step: 38010 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:42:40,974-Speed 5974.31 samples/sec Loss 12.5567 LearningRate 0.3294 Epoch: 3 Global Step: 38020 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:42:47,840-Speed 
5966.97 samples/sec Loss 12.6078 LearningRate 0.3293 Epoch: 3 Global Step: 38030 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:42:54,692-Speed 5979.66 samples/sec Loss 12.6037 LearningRate 0.3293 Epoch: 3 Global Step: 38040 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:43:01,549-Speed 5974.28 samples/sec Loss 12.6170 LearningRate 0.3292 Epoch: 3 Global Step: 38050 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:43:08,436-Speed 5949.73 samples/sec Loss 12.5800 LearningRate 0.3292 Epoch: 3 Global Step: 38060 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:43:15,318-Speed 5952.48 samples/sec Loss 12.6216 LearningRate 0.3292 Epoch: 3 Global Step: 38070 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:43:22,185-Speed 5965.93 samples/sec Loss 12.4427 LearningRate 0.3291 Epoch: 3 Global Step: 38080 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:43:29,044-Speed 5973.13 samples/sec Loss 12.6336 LearningRate 0.3291 Epoch: 3 Global Step: 38090 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:43:35,890-Speed 5984.01 samples/sec Loss 12.5907 LearningRate 0.3290 Epoch: 3 Global Step: 38100 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:43:42,766-Speed 5958.53 samples/sec Loss 12.6163 LearningRate 0.3290 Epoch: 3 Global Step: 38110 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:43:49,617-Speed 5979.97 samples/sec Loss 12.5781 LearningRate 0.3290 Epoch: 3 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:43:56,464-Speed 5983.42 samples/sec Loss 12.5057 LearningRate 0.3289 Epoch: 3 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:03,337-Speed 5963.79 samples/sec Loss 12.5571 LearningRate 0.3289 Epoch: 3 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:10,193-Speed 5975.23 samples/sec Loss 
12.6063 LearningRate 0.3288 Epoch: 3 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:17,042-Speed 5981.25 samples/sec Loss 12.5887 LearningRate 0.3288 Epoch: 3 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:23,902-Speed 5971.97 samples/sec Loss 12.6268 LearningRate 0.3288 Epoch: 3 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:30,766-Speed 5970.82 samples/sec Loss 12.4536 LearningRate 0.3287 Epoch: 3 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:37,635-Speed 5963.95 samples/sec Loss 12.5921 LearningRate 0.3287 Epoch: 3 Global Step: 38190 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:44,489-Speed 5977.00 samples/sec Loss 12.5266 LearningRate 0.3287 Epoch: 3 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:51,370-Speed 5954.05 samples/sec Loss 12.6290 LearningRate 0.3286 Epoch: 3 Global Step: 38210 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:44:58,229-Speed 5977.19 samples/sec Loss 12.6298 LearningRate 0.3286 Epoch: 3 Global Step: 38220 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:45:05,112-Speed 5953.05 samples/sec Loss 12.5481 LearningRate 0.3285 Epoch: 3 Global Step: 38230 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:45:11,967-Speed 5976.40 samples/sec Loss 12.5227 LearningRate 0.3285 Epoch: 3 Global Step: 38240 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:45:18,828-Speed 5971.31 samples/sec Loss 12.5287 LearningRate 0.3285 Epoch: 3 Global Step: 38250 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:45:25,677-Speed 5980.59 samples/sec Loss 12.5417 LearningRate 0.3284 Epoch: 3 Global Step: 38260 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:45:32,528-Speed 5980.08 samples/sec Loss 12.4943 LearningRate 0.3284 
Epoch: 3 Global Step: 38270 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:45:39,384-Speed 5975.68 samples/sec Loss 12.5232 LearningRate 0.3283 Epoch: 3 Global Step: 38280 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:45:46,257-Speed 5960.90 samples/sec Loss 12.5806 LearningRate 0.3283 Epoch: 3 Global Step: 38290 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:45:53,114-Speed 5974.17 samples/sec Loss 12.5383 LearningRate 0.3283 Epoch: 3 Global Step: 38300 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:45:59,985-Speed 5962.81 samples/sec Loss 12.5038 LearningRate 0.3282 Epoch: 3 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:46:06,856-Speed 5962.75 samples/sec Loss 12.4937 LearningRate 0.3282 Epoch: 3 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:46:13,711-Speed 5976.43 samples/sec Loss 12.6306 LearningRate 0.3281 Epoch: 3 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:46:20,583-Speed 5961.84 samples/sec Loss 12.6478 LearningRate 0.3281 Epoch: 3 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:46:27,457-Speed 5959.96 samples/sec Loss 12.5746 LearningRate 0.3281 Epoch: 3 Global Step: 38350 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:46:34,315-Speed 5973.54 samples/sec Loss 12.5405 LearningRate 0.3280 Epoch: 3 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:46:41,161-Speed 5984.57 samples/sec Loss 12.5488 LearningRate 0.3280 Epoch: 3 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:46:48,009-Speed 5983.00 samples/sec Loss 12.5242 LearningRate 0.3280 Epoch: 3 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:46:54,888-Speed 5954.81 samples/sec Loss 12.5973 LearningRate 0.3279 Epoch: 3 Global Step: 38390 
Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:01,746-Speed 5973.49 samples/sec Loss 12.6002 LearningRate 0.3279 Epoch: 3 Global Step: 38400 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:08,620-Speed 5959.80 samples/sec Loss 12.5369 LearningRate 0.3278 Epoch: 3 Global Step: 38410 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:15,479-Speed 5973.16 samples/sec Loss 12.5320 LearningRate 0.3278 Epoch: 3 Global Step: 38420 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:22,345-Speed 5967.09 samples/sec Loss 12.5620 LearningRate 0.3278 Epoch: 3 Global Step: 38430 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:29,204-Speed 5973.18 samples/sec Loss 12.5285 LearningRate 0.3277 Epoch: 3 Global Step: 38440 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:36,177-Speed 5876.64 samples/sec Loss 12.5314 LearningRate 0.3277 Epoch: 3 Global Step: 38450 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:43,138-Speed 5885.80 samples/sec Loss 12.5798 LearningRate 0.3276 Epoch: 3 Global Step: 38460 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:50,072-Speed 5908.36 samples/sec Loss 12.6282 LearningRate 0.3276 Epoch: 3 Global Step: 38470 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:47:56,928-Speed 5975.20 samples/sec Loss 12.5460 LearningRate 0.3276 Epoch: 3 Global Step: 38480 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:48:03,767-Speed 5990.26 samples/sec Loss 12.5094 LearningRate 0.3275 Epoch: 3 Global Step: 38490 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:48:10,710-Speed 5900.39 samples/sec Loss 12.4741 LearningRate 0.3275 Epoch: 3 Global Step: 38500 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:48:17,586-Speed 5957.91 samples/sec Loss 12.5675 LearningRate 0.3275 Epoch: 3 Global Step: 38510 Fp16 Grad Scale: 262144 
Required: 33 hours Training: 2022-01-08 03:48:24,449-Speed 5969.73 samples/sec Loss 12.5532 LearningRate 0.3274 Epoch: 3 Global Step: 38520 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:48:31,291-Speed 5987.61 samples/sec Loss 12.6054 LearningRate 0.3274 Epoch: 3 Global Step: 38530 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:48:38,143-Speed 5978.87 samples/sec Loss 12.5412 LearningRate 0.3273 Epoch: 3 Global Step: 38540 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:48:45,013-Speed 5963.33 samples/sec Loss 12.5211 LearningRate 0.3273 Epoch: 3 Global Step: 38550 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:48:51,870-Speed 5974.57 samples/sec Loss 12.5255 LearningRate 0.3273 Epoch: 3 Global Step: 38560 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:48:58,773-Speed 5934.44 samples/sec Loss 12.5938 LearningRate 0.3272 Epoch: 3 Global Step: 38570 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:49:05,673-Speed 5937.38 samples/sec Loss 12.5220 LearningRate 0.3272 Epoch: 3 Global Step: 38580 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:49:12,524-Speed 5982.36 samples/sec Loss 12.5647 LearningRate 0.3271 Epoch: 3 Global Step: 38590 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:49:19,367-Speed 5986.66 samples/sec Loss 12.5847 LearningRate 0.3271 Epoch: 3 Global Step: 38600 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:49:26,208-Speed 5988.48 samples/sec Loss 12.4932 LearningRate 0.3271 Epoch: 3 Global Step: 38610 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:49:33,063-Speed 5976.02 samples/sec Loss 12.5259 LearningRate 0.3270 Epoch: 3 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:49:39,921-Speed 5973.75 samples/sec Loss 12.4760 LearningRate 0.3270 Epoch: 3 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 33 hours Training: 
2022-01-08 03:49:46,789-Speed 5965.34 samples/sec Loss 12.5461 LearningRate 0.3269 Epoch: 3 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:49:53,662-Speed 5960.29 samples/sec Loss 12.4876 LearningRate 0.3269 Epoch: 3 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:50:00,509-Speed 5983.70 samples/sec Loss 12.5286 LearningRate 0.3269 Epoch: 3 Global Step: 38660 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:50:07,373-Speed 5968.88 samples/sec Loss 12.5160 LearningRate 0.3268 Epoch: 3 Global Step: 38670 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:50:14,227-Speed 5977.80 samples/sec Loss 12.4860 LearningRate 0.3268 Epoch: 3 Global Step: 38680 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:50:21,204-Speed 5872.35 samples/sec Loss 12.5231 LearningRate 0.3268 Epoch: 3 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:50:28,148-Speed 5899.95 samples/sec Loss 12.5507 LearningRate 0.3267 Epoch: 3 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:50:35,035-Speed 5949.25 samples/sec Loss 12.5859 LearningRate 0.3267 Epoch: 3 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:50:41,889-Speed 5976.80 samples/sec Loss 12.6093 LearningRate 0.3266 Epoch: 3 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:50:48,737-Speed 5983.09 samples/sec Loss 12.5779 LearningRate 0.3266 Epoch: 3 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:50:55,595-Speed 5975.98 samples/sec Loss 12.6251 LearningRate 0.3266 Epoch: 3 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:51:02,443-Speed 5981.88 samples/sec Loss 12.6098 LearningRate 0.3265 Epoch: 3 Global Step: 38750 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:51:09,286-Speed 
5987.14 samples/sec Loss 12.4917 LearningRate 0.3265 Epoch: 3 Global Step: 38760 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:51:16,150-Speed 5971.27 samples/sec Loss 12.5139 LearningRate 0.3264 Epoch: 3 Global Step: 38770 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:51:23,114-Speed 5884.08 samples/sec Loss 12.5633 LearningRate 0.3264 Epoch: 3 Global Step: 38780 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:51:30,060-Speed 5898.33 samples/sec Loss 12.5438 LearningRate 0.3264 Epoch: 3 Global Step: 38790 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:51:36,929-Speed 5964.81 samples/sec Loss 12.5595 LearningRate 0.3263 Epoch: 3 Global Step: 38800 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:51:43,791-Speed 5970.62 samples/sec Loss 12.4988 LearningRate 0.3263 Epoch: 3 Global Step: 38810 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:51:50,645-Speed 5977.10 samples/sec Loss 12.4955 LearningRate 0.3262 Epoch: 3 Global Step: 38820 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:51:57,504-Speed 5972.95 samples/sec Loss 12.6119 LearningRate 0.3262 Epoch: 3 Global Step: 38830 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:52:04,375-Speed 5963.15 samples/sec Loss 12.5728 LearningRate 0.3262 Epoch: 3 Global Step: 38840 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:52:11,237-Speed 5969.43 samples/sec Loss 12.5621 LearningRate 0.3261 Epoch: 3 Global Step: 38850 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:52:18,097-Speed 5972.27 samples/sec Loss 12.5424 LearningRate 0.3261 Epoch: 3 Global Step: 38860 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:52:24,951-Speed 5976.99 samples/sec Loss 12.5450 LearningRate 0.3261 Epoch: 3 Global Step: 38870 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:52:31,802-Speed 5979.79 samples/sec Loss 
12.5559 LearningRate 0.3260 Epoch: 3 Global Step: 38880 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:52:38,650-Speed 5982.32 samples/sec Loss 12.4631 LearningRate 0.3260 Epoch: 3 Global Step: 38890 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 03:52:45,502-Speed 5981.10 samples/sec Loss 12.5027 LearningRate 0.3259 Epoch: 3 Global Step: 38900 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:52:52,352-Speed 5981.02 samples/sec Loss 12.4684 LearningRate 0.3259 Epoch: 3 Global Step: 38910 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:52:59,200-Speed 5982.16 samples/sec Loss 12.5136 LearningRate 0.3259 Epoch: 3 Global Step: 38920 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:53:06,071-Speed 5962.74 samples/sec Loss 12.5269 LearningRate 0.3258 Epoch: 3 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:53:12,933-Speed 5969.84 samples/sec Loss 12.5325 LearningRate 0.3258 Epoch: 3 Global Step: 38940 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:53:19,776-Speed 5987.32 samples/sec Loss 12.4682 LearningRate 0.3257 Epoch: 3 Global Step: 38950 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:53:26,627-Speed 5979.33 samples/sec Loss 12.4844 LearningRate 0.3257 Epoch: 3 Global Step: 38960 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:53:33,573-Speed 5897.82 samples/sec Loss 12.4941 LearningRate 0.3257 Epoch: 3 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:53:40,417-Speed 5985.46 samples/sec Loss 12.5639 LearningRate 0.3256 Epoch: 3 Global Step: 38980 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:53:47,269-Speed 5978.48 samples/sec Loss 12.4777 LearningRate 0.3256 Epoch: 3 Global Step: 38990 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:53:54,113-Speed 5986.08 samples/sec Loss 12.6288 LearningRate 0.3256 
Epoch: 3 Global Step: 39000 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:54:00,956-Speed 5986.72 samples/sec Loss 12.4854 LearningRate 0.3255 Epoch: 3 Global Step: 39010 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:54:07,801-Speed 5985.13 samples/sec Loss 12.5601 LearningRate 0.3255 Epoch: 3 Global Step: 39020 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:54:14,659-Speed 5974.20 samples/sec Loss 12.4375 LearningRate 0.3254 Epoch: 3 Global Step: 39030 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:54:21,536-Speed 5957.27 samples/sec Loss 12.5606 LearningRate 0.3254 Epoch: 3 Global Step: 39040 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:54:28,387-Speed 5979.66 samples/sec Loss 12.5041 LearningRate 0.3254 Epoch: 3 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:54:35,228-Speed 5988.75 samples/sec Loss 12.5378 LearningRate 0.3253 Epoch: 3 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:54:42,073-Speed 5984.31 samples/sec Loss 12.5141 LearningRate 0.3253 Epoch: 3 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:54:48,919-Speed 5984.84 samples/sec Loss 12.4989 LearningRate 0.3252 Epoch: 3 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:54:55,769-Speed 5979.99 samples/sec Loss 12.5728 LearningRate 0.3252 Epoch: 3 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:55:02,640-Speed 5962.29 samples/sec Loss 12.4821 LearningRate 0.3252 Epoch: 3 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:55:09,485-Speed 5985.34 samples/sec Loss 12.5292 LearningRate 0.3251 Epoch: 3 Global Step: 39110 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:55:16,331-Speed 5984.50 samples/sec Loss 12.4884 LearningRate 0.3251 Epoch: 3 Global Step: 39120 
Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:55:23,180-Speed 5982.13 samples/sec Loss 12.6597 LearningRate 0.3251 Epoch: 3 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:55:30,034-Speed 5976.83 samples/sec Loss 12.5087 LearningRate 0.3250 Epoch: 3 Global Step: 39140 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:55:36,895-Speed 5971.76 samples/sec Loss 12.4141 LearningRate 0.3250 Epoch: 3 Global Step: 39150 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:55:43,787-Speed 5944.18 samples/sec Loss 12.5464 LearningRate 0.3249 Epoch: 3 Global Step: 39160 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:55:50,636-Speed 5984.98 samples/sec Loss 12.4593 LearningRate 0.3249 Epoch: 3 Global Step: 39170 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:55:57,489-Speed 5977.35 samples/sec Loss 12.5347 LearningRate 0.3249 Epoch: 3 Global Step: 39180 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:56:04,328-Speed 5990.73 samples/sec Loss 12.6511 LearningRate 0.3248 Epoch: 3 Global Step: 39190 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:56:11,175-Speed 5982.72 samples/sec Loss 12.5709 LearningRate 0.3248 Epoch: 3 Global Step: 39200 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:56:18,025-Speed 5980.78 samples/sec Loss 12.4510 LearningRate 0.3247 Epoch: 3 Global Step: 39210 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:56:24,880-Speed 5976.15 samples/sec Loss 12.4816 LearningRate 0.3247 Epoch: 3 Global Step: 39220 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:56:31,757-Speed 5957.81 samples/sec Loss 12.4445 LearningRate 0.3247 Epoch: 3 Global Step: 39230 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:56:38,609-Speed 5978.29 samples/sec Loss 12.5044 LearningRate 0.3246 Epoch: 3 Global Step: 39240 Fp16 Grad Scale: 262144 
Required: 33 hours Training: 2022-01-08 03:56:45,478-Speed 5964.34 samples/sec Loss 12.4646 LearningRate 0.3246 Epoch: 3 Global Step: 39250 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:56:52,352-Speed 5959.38 samples/sec Loss 12.5062 LearningRate 0.3245 Epoch: 3 Global Step: 39260 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:56:59,213-Speed 5972.10 samples/sec Loss 12.5395 LearningRate 0.3245 Epoch: 3 Global Step: 39270 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:57:06,064-Speed 5980.07 samples/sec Loss 12.6131 LearningRate 0.3245 Epoch: 3 Global Step: 39280 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:57:12,898-Speed 5995.52 samples/sec Loss 12.5600 LearningRate 0.3244 Epoch: 3 Global Step: 39290 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:57:19,765-Speed 5965.11 samples/sec Loss 12.5007 LearningRate 0.3244 Epoch: 3 Global Step: 39300 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:57:26,690-Speed 5918.81 samples/sec Loss 12.4904 LearningRate 0.3244 Epoch: 3 Global Step: 39310 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:57:33,538-Speed 5984.89 samples/sec Loss 12.4152 LearningRate 0.3243 Epoch: 3 Global Step: 39320 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:57:40,393-Speed 5975.94 samples/sec Loss 12.4821 LearningRate 0.3243 Epoch: 3 Global Step: 39330 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:57:47,247-Speed 5981.84 samples/sec Loss 12.4861 LearningRate 0.3242 Epoch: 3 Global Step: 39340 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:57:54,088-Speed 5988.33 samples/sec Loss 12.4583 LearningRate 0.3242 Epoch: 3 Global Step: 39350 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:58:00,968-Speed 5954.57 samples/sec Loss 12.4637 LearningRate 0.3242 Epoch: 3 Global Step: 39360 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 
03:58:07,837-Speed 5964.49 samples/sec Loss 12.5031 LearningRate 0.3241 Epoch: 3 Global Step: 39370 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:58:14,678-Speed 5988.10 samples/sec Loss 12.4801 LearningRate 0.3241 Epoch: 3 Global Step: 39380 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 03:58:21,525-Speed 5984.12 samples/sec Loss 12.5322 LearningRate 0.3240 Epoch: 3 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:58:28,371-Speed 5983.87 samples/sec Loss 12.4382 LearningRate 0.3240 Epoch: 3 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:58:35,242-Speed 5962.73 samples/sec Loss 12.4096 LearningRate 0.3240 Epoch: 3 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:58:42,105-Speed 5969.74 samples/sec Loss 12.5102 LearningRate 0.3239 Epoch: 3 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:58:48,961-Speed 5974.76 samples/sec Loss 12.4587 LearningRate 0.3239 Epoch: 3 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:58:55,811-Speed 5981.27 samples/sec Loss 12.5289 LearningRate 0.3239 Epoch: 3 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:59:02,667-Speed 5983.11 samples/sec Loss 12.5244 LearningRate 0.3238 Epoch: 3 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:59:09,530-Speed 5969.01 samples/sec Loss 12.5088 LearningRate 0.3238 Epoch: 3 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:59:16,377-Speed 5983.20 samples/sec Loss 12.4876 LearningRate 0.3237 Epoch: 3 Global Step: 39470 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:59:23,224-Speed 5982.92 samples/sec Loss 12.5398 LearningRate 0.3237 Epoch: 3 Global Step: 39480 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 03:59:30,072-Speed 5984.42 
samples/sec Loss 12.4371 LearningRate 0.3237 Epoch: 3 Global Step: 39490 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:59:36,934-Speed 5969.90 samples/sec Loss 12.4951 LearningRate 0.3236 Epoch: 3 Global Step: 39500 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:59:43,795-Speed 5971.49 samples/sec Loss 12.5456 LearningRate 0.3236 Epoch: 3 Global Step: 39510 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:59:50,678-Speed 5951.94 samples/sec Loss 12.4411 LearningRate 0.3235 Epoch: 3 Global Step: 39520 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 03:59:57,562-Speed 5952.66 samples/sec Loss 12.4096 LearningRate 0.3235 Epoch: 3 Global Step: 39530 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:04,414-Speed 5979.35 samples/sec Loss 12.4285 LearningRate 0.3235 Epoch: 3 Global Step: 39540 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:11,291-Speed 5956.13 samples/sec Loss 12.5028 LearningRate 0.3234 Epoch: 3 Global Step: 39550 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:18,155-Speed 5969.20 samples/sec Loss 12.5324 LearningRate 0.3234 Epoch: 3 Global Step: 39560 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:25,035-Speed 5954.12 samples/sec Loss 12.4265 LearningRate 0.3234 Epoch: 3 Global Step: 39570 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:31,899-Speed 5969.05 samples/sec Loss 12.4760 LearningRate 0.3233 Epoch: 3 Global Step: 39580 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:38,754-Speed 5976.48 samples/sec Loss 12.4021 LearningRate 0.3233 Epoch: 3 Global Step: 39590 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:45,606-Speed 5978.40 samples/sec Loss 12.4768 LearningRate 0.3232 Epoch: 3 Global Step: 39600 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:52,485-Speed 5955.76 samples/sec Loss 12.4418 
LearningRate 0.3232 Epoch: 3 Global Step: 39610 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:00:59,356-Speed 5962.69 samples/sec Loss 12.5525 LearningRate 0.3232 Epoch: 3 Global Step: 39620 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:01:06,209-Speed 5978.29 samples/sec Loss 12.3968 LearningRate 0.3231 Epoch: 3 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:01:13,062-Speed 5978.37 samples/sec Loss 12.4151 LearningRate 0.3231 Epoch: 3 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:01:19,983-Speed 5919.96 samples/sec Loss 12.4872 LearningRate 0.3230 Epoch: 3 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:01:26,825-Speed 5987.47 samples/sec Loss 12.4391 LearningRate 0.3230 Epoch: 3 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:01:33,675-Speed 5980.80 samples/sec Loss 12.5339 LearningRate 0.3230 Epoch: 3 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:01:40,522-Speed 5983.08 samples/sec Loss 12.4369 LearningRate 0.3229 Epoch: 3 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:01:47,376-Speed 5976.60 samples/sec Loss 12.5525 LearningRate 0.3229 Epoch: 3 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:01:54,225-Speed 5982.10 samples/sec Loss 12.4377 LearningRate 0.3229 Epoch: 3 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:02:01,079-Speed 5976.61 samples/sec Loss 12.4451 LearningRate 0.3228 Epoch: 3 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:02:07,957-Speed 5957.27 samples/sec Loss 12.4982 LearningRate 0.3228 Epoch: 3 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:02:14,815-Speed 5973.74 samples/sec Loss 12.4570 LearningRate 0.3227 Epoch: 3 
Global Step: 39730 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:02:21,696-Speed 5953.52 samples/sec Loss 12.5241 LearningRate 0.3227 Epoch: 3 Global Step: 39740 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:02:28,570-Speed 5959.79 samples/sec Loss 12.4518 LearningRate 0.3227 Epoch: 3 Global Step: 39750 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:02:35,429-Speed 5972.98 samples/sec Loss 12.4099 LearningRate 0.3226 Epoch: 3 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:02:42,288-Speed 5973.07 samples/sec Loss 12.4194 LearningRate 0.3226 Epoch: 3 Global Step: 39770 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:02:49,194-Speed 5932.84 samples/sec Loss 12.5078 LearningRate 0.3225 Epoch: 3 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:02:56,092-Speed 5940.51 samples/sec Loss 12.4708 LearningRate 0.3225 Epoch: 3 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:03:02,963-Speed 5961.94 samples/sec Loss 12.5106 LearningRate 0.3225 Epoch: 3 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:03:09,816-Speed 5979.18 samples/sec Loss 12.4027 LearningRate 0.3224 Epoch: 3 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:03:16,681-Speed 5966.90 samples/sec Loss 12.4111 LearningRate 0.3224 Epoch: 3 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:03:23,526-Speed 5986.02 samples/sec Loss 12.3879 LearningRate 0.3224 Epoch: 3 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:03:30,373-Speed 5982.72 samples/sec Loss 12.4791 LearningRate 0.3223 Epoch: 3 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:03:37,226-Speed 5977.92 samples/sec Loss 12.5393 LearningRate 0.3223 Epoch: 3 Global Step: 39850 Fp16 Grad 
Scale: 262144 Required: 33 hours Training: 2022-01-08 04:03:44,072-Speed 5984.84 samples/sec Loss 12.4822 LearningRate 0.3222 Epoch: 3 Global Step: 39860 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:03:50,941-Speed 5964.27 samples/sec Loss 12.3749 LearningRate 0.3222 Epoch: 3 Global Step: 39870 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:03:57,779-Speed 5991.92 samples/sec Loss 12.3757 LearningRate 0.3222 Epoch: 3 Global Step: 39880 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:04,622-Speed 5986.00 samples/sec Loss 12.4762 LearningRate 0.3221 Epoch: 3 Global Step: 39890 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:11,483-Speed 5973.22 samples/sec Loss 12.5047 LearningRate 0.3221 Epoch: 3 Global Step: 39900 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:18,336-Speed 5977.86 samples/sec Loss 12.3577 LearningRate 0.3220 Epoch: 3 Global Step: 39910 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:25,205-Speed 5963.73 samples/sec Loss 12.3603 LearningRate 0.3220 Epoch: 3 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:32,048-Speed 5987.37 samples/sec Loss 12.3436 LearningRate 0.3220 Epoch: 3 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:38,892-Speed 5985.15 samples/sec Loss 12.4700 LearningRate 0.3219 Epoch: 3 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:45,747-Speed 5976.16 samples/sec Loss 12.3845 LearningRate 0.3219 Epoch: 3 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:52,697-Speed 5895.80 samples/sec Loss 12.4294 LearningRate 0.3219 Epoch: 3 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:04:59,563-Speed 5967.31 samples/sec Loss 12.4987 LearningRate 0.3218 Epoch: 3 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 33 
hours Training: 2022-01-08 04:05:06,482-Speed 5923.47 samples/sec Loss 12.4867 LearningRate 0.3218 Epoch: 3 Global Step: 39980 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:05:13,334-Speed 5979.61 samples/sec Loss 12.4053 LearningRate 0.3217 Epoch: 3 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:05:20,182-Speed 5982.56 samples/sec Loss 12.4488 LearningRate 0.3217 Epoch: 3 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:05:47,026-[lfw][40000]XNorm: 23.529627 Training: 2022-01-08 04:05:47,027-[lfw][40000]Accuracy-Flip: 0.99583+-0.00300 Training: 2022-01-08 04:05:47,027-[lfw][40000]Accuracy-Highest: 0.99650 Training: 2022-01-08 04:06:17,815-[cfp_fp][40000]XNorm: 21.023732 Training: 2022-01-08 04:06:17,816-[cfp_fp][40000]Accuracy-Flip: 0.96929+-0.00881 Training: 2022-01-08 04:06:17,817-[cfp_fp][40000]Accuracy-Highest: 0.97057 Training: 2022-01-08 04:06:44,493-[agedb_30][40000]XNorm: 22.993922 Training: 2022-01-08 04:06:44,494-[agedb_30][40000]Accuracy-Flip: 0.95900+-0.00764 Training: 2022-01-08 04:06:44,494-[agedb_30][40000]Accuracy-Highest: 0.96200 Training: 2022-01-08 04:06:51,319-Speed 449.44 samples/sec Loss 12.3644 LearningRate 0.3217 Epoch: 3 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:06:58,153-Speed 5998.31 samples/sec Loss 12.3871 LearningRate 0.3216 Epoch: 3 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:07:04,996-Speed 5987.69 samples/sec Loss 12.5671 LearningRate 0.3216 Epoch: 3 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:07:11,856-Speed 5972.22 samples/sec Loss 12.3908 LearningRate 0.3215 Epoch: 3 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:07:18,737-Speed 5953.05 samples/sec Loss 12.4383 LearningRate 0.3215 Epoch: 3 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 
04:07:25,583-Speed 5985.47 samples/sec Loss 12.4455 LearningRate 0.3215 Epoch: 3 Global Step: 40060 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:07:32,443-Speed 5971.74 samples/sec Loss 12.5821 LearningRate 0.3214 Epoch: 3 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 04:07:39,320-Speed 5957.65 samples/sec Loss 12.4478 LearningRate 0.3214 Epoch: 3 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:07:46,184-Speed 5968.58 samples/sec Loss 12.4571 LearningRate 0.3214 Epoch: 3 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:07:53,028-Speed 5985.26 samples/sec Loss 12.3787 LearningRate 0.3213 Epoch: 3 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:07:59,870-Speed 5988.27 samples/sec Loss 12.4116 LearningRate 0.3213 Epoch: 3 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:08:06,718-Speed 5982.66 samples/sec Loss 12.3759 LearningRate 0.3212 Epoch: 3 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:08:13,576-Speed 5973.34 samples/sec Loss 12.4409 LearningRate 0.3212 Epoch: 3 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:08:20,409-Speed 5995.52 samples/sec Loss 12.4336 LearningRate 0.3212 Epoch: 3 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:08:27,252-Speed 5988.11 samples/sec Loss 12.4288 LearningRate 0.3211 Epoch: 3 Global Step: 40150 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:08:34,088-Speed 5992.34 samples/sec Loss 12.4750 LearningRate 0.3211 Epoch: 3 Global Step: 40160 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:08:40,933-Speed 5985.17 samples/sec Loss 12.4629 LearningRate 0.3210 Epoch: 3 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:08:47,856-Speed 5917.70 
samples/sec Loss 12.4461 LearningRate 0.3210 Epoch: 3 Global Step: 40180 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:08:54,795-Speed 5904.81 samples/sec Loss 12.4516 LearningRate 0.3210 Epoch: 3 Global Step: 40190 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:01,762-Speed 5879.72 samples/sec Loss 12.4486 LearningRate 0.3209 Epoch: 3 Global Step: 40200 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:08,611-Speed 5981.99 samples/sec Loss 12.4087 LearningRate 0.3209 Epoch: 3 Global Step: 40210 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:15,577-Speed 5881.90 samples/sec Loss 12.4293 LearningRate 0.3209 Epoch: 3 Global Step: 40220 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:22,424-Speed 5983.24 samples/sec Loss 12.3407 LearningRate 0.3208 Epoch: 3 Global Step: 40230 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:29,265-Speed 5988.00 samples/sec Loss 12.4213 LearningRate 0.3208 Epoch: 3 Global Step: 40240 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:36,133-Speed 5965.27 samples/sec Loss 12.3502 LearningRate 0.3207 Epoch: 3 Global Step: 40250 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:42,993-Speed 5972.37 samples/sec Loss 12.4218 LearningRate 0.3207 Epoch: 3 Global Step: 40260 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:49,879-Speed 5949.22 samples/sec Loss 12.4817 LearningRate 0.3207 Epoch: 3 Global Step: 40270 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:09:56,735-Speed 5974.98 samples/sec Loss 12.5177 LearningRate 0.3206 Epoch: 3 Global Step: 40280 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 04:10:03,568-Speed 5996.09 samples/sec Loss 12.4101 LearningRate 0.3206 Epoch: 3 Global Step: 40290 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:10:10,414-Speed 5984.21 samples/sec Loss 12.4619 
LearningRate 0.3205 Epoch: 3 Global Step: 40300 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:10:17,274-Speed 5971.70 samples/sec Loss 12.5110 LearningRate 0.3205 Epoch: 3 Global Step: 40310 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:10:24,151-Speed 5956.71 samples/sec Loss 12.4289 LearningRate 0.3205 Epoch: 3 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:10:31,001-Speed 5981.20 samples/sec Loss 12.4474 LearningRate 0.3204 Epoch: 3 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:10:37,845-Speed 5985.62 samples/sec Loss 12.4549 LearningRate 0.3204 Epoch: 3 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:10:44,696-Speed 5979.72 samples/sec Loss 12.3718 LearningRate 0.3204 Epoch: 3 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:10:51,554-Speed 5973.91 samples/sec Loss 12.4172 LearningRate 0.3203 Epoch: 3 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:10:58,397-Speed 5986.65 samples/sec Loss 12.3810 LearningRate 0.3203 Epoch: 3 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:11:05,267-Speed 5965.46 samples/sec Loss 12.4643 LearningRate 0.3202 Epoch: 3 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:11:12,114-Speed 5983.70 samples/sec Loss 12.4554 LearningRate 0.3202 Epoch: 3 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:11:18,966-Speed 5979.37 samples/sec Loss 12.3225 LearningRate 0.3202 Epoch: 3 Global Step: 40400 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:11:25,843-Speed 5956.87 samples/sec Loss 12.3173 LearningRate 0.3201 Epoch: 3 Global Step: 40410 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:11:32,691-Speed 5982.09 samples/sec Loss 12.3897 LearningRate 0.3201 Epoch: 3 
Global Step: 40420 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:11:39,544-Speed 5978.10 samples/sec Loss 12.3913 LearningRate 0.3200 Epoch: 3 Global Step: 40430 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:11:46,391-Speed 5983.45 samples/sec Loss 12.3659 LearningRate 0.3200 Epoch: 3 Global Step: 40440 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:11:53,237-Speed 5984.47 samples/sec Loss 12.4603 LearningRate 0.3200 Epoch: 3 Global Step: 40450 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:00,089-Speed 5978.31 samples/sec Loss 12.3493 LearningRate 0.3199 Epoch: 3 Global Step: 40460 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:06,945-Speed 5975.63 samples/sec Loss 12.4186 LearningRate 0.3199 Epoch: 3 Global Step: 40470 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:13,794-Speed 5982.02 samples/sec Loss 12.4262 LearningRate 0.3199 Epoch: 3 Global Step: 40480 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:20,644-Speed 5981.03 samples/sec Loss 12.3881 LearningRate 0.3198 Epoch: 3 Global Step: 40490 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:27,602-Speed 5888.14 samples/sec Loss 12.4378 LearningRate 0.3198 Epoch: 3 Global Step: 40500 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:34,556-Speed 5891.31 samples/sec Loss 12.4114 LearningRate 0.3197 Epoch: 3 Global Step: 40510 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:41,403-Speed 5983.37 samples/sec Loss 12.4213 LearningRate 0.3197 Epoch: 3 Global Step: 40520 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:48,254-Speed 5981.48 samples/sec Loss 12.3774 LearningRate 0.3197 Epoch: 3 Global Step: 40530 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:12:55,101-Speed 5986.59 samples/sec Loss 12.4056 LearningRate 0.3196 Epoch: 3 Global Step: 40540 Fp16 Grad 
Scale: 262144 Required: 33 hours Training: 2022-01-08 04:13:01,940-Speed 5989.79 samples/sec Loss 12.3303 LearningRate 0.3196 Epoch: 3 Global Step: 40550 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:13:08,785-Speed 5985.45 samples/sec Loss 12.3894 LearningRate 0.3195 Epoch: 3 Global Step: 40560 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:13:15,641-Speed 5974.99 samples/sec Loss 12.3624 LearningRate 0.3195 Epoch: 3 Global Step: 40570 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:13:22,500-Speed 5973.55 samples/sec Loss 12.3347 LearningRate 0.3195 Epoch: 3 Global Step: 40580 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:13:29,367-Speed 5966.32 samples/sec Loss 12.4675 LearningRate 0.3194 Epoch: 3 Global Step: 40590 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:13:36,207-Speed 5988.74 samples/sec Loss 12.3966 LearningRate 0.3194 Epoch: 3 Global Step: 40600 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:13:43,050-Speed 5987.46 samples/sec Loss 12.4034 LearningRate 0.3194 Epoch: 3 Global Step: 40610 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:13:49,942-Speed 5943.89 samples/sec Loss 12.3913 LearningRate 0.3193 Epoch: 3 Global Step: 40620 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 04:13:56,782-Speed 5991.18 samples/sec Loss 12.3835 LearningRate 0.3193 Epoch: 3 Global Step: 40630 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:14:03,630-Speed 5982.20 samples/sec Loss 12.5009 LearningRate 0.3192 Epoch: 3 Global Step: 40640 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:14:10,477-Speed 5983.84 samples/sec Loss 12.3626 LearningRate 0.3192 Epoch: 3 Global Step: 40650 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:14:17,323-Speed 5983.86 samples/sec Loss 12.3596 LearningRate 0.3192 Epoch: 3 Global Step: 40660 Fp16 Grad Scale: 262144 Required: 33 
hours Training: 2022-01-08 04:14:24,181-Speed 5974.49 samples/sec Loss 12.3262 LearningRate 0.3191 Epoch: 3 Global Step: 40670 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:14:31,144-Speed 5883.28 samples/sec Loss 12.3703 LearningRate 0.3191 Epoch: 3 Global Step: 40680 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:14:38,122-Speed 5871.28 samples/sec Loss 12.4618 LearningRate 0.3191 Epoch: 3 Global Step: 40690 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:14:44,966-Speed 5986.10 samples/sec Loss 12.5028 LearningRate 0.3190 Epoch: 3 Global Step: 40700 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:14:51,835-Speed 5964.64 samples/sec Loss 12.4609 LearningRate 0.3190 Epoch: 3 Global Step: 40710 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:14:58,682-Speed 5982.91 samples/sec Loss 12.3838 LearningRate 0.3189 Epoch: 3 Global Step: 40720 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:15:05,549-Speed 5966.22 samples/sec Loss 12.4011 LearningRate 0.3189 Epoch: 3 Global Step: 40730 Fp16 Grad Scale: 524288 Required: 33 hours Training: 2022-01-08 04:15:12,408-Speed 5973.07 samples/sec Loss 12.3133 LearningRate 0.3189 Epoch: 3 Global Step: 40740 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:15:19,274-Speed 5967.13 samples/sec Loss 12.4409 LearningRate 0.3188 Epoch: 3 Global Step: 40750 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:15:26,130-Speed 5974.87 samples/sec Loss 12.3614 LearningRate 0.3188 Epoch: 3 Global Step: 40760 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:15:32,966-Speed 5993.16 samples/sec Loss 12.3763 LearningRate 0.3187 Epoch: 3 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:15:39,851-Speed 5952.06 samples/sec Loss 12.2901 LearningRate 0.3187 Epoch: 3 Global Step: 40780 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 
04:15:46,717-Speed 5966.93 samples/sec Loss 12.3059 LearningRate 0.3187 Epoch: 3 Global Step: 40790 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:15:53,567-Speed 5980.37 samples/sec Loss 12.3710 LearningRate 0.3186 Epoch: 3 Global Step: 40800 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:16:00,436-Speed 5965.40 samples/sec Loss 12.4447 LearningRate 0.3186 Epoch: 3 Global Step: 40810 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:16:07,302-Speed 5966.24 samples/sec Loss 12.3554 LearningRate 0.3186 Epoch: 3 Global Step: 40820 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:16:14,158-Speed 5978.76 samples/sec Loss 12.3522 LearningRate 0.3185 Epoch: 3 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:16:21,012-Speed 5977.35 samples/sec Loss 12.4330 LearningRate 0.3185 Epoch: 3 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:16:27,887-Speed 5960.32 samples/sec Loss 12.3366 LearningRate 0.3184 Epoch: 3 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:16:34,751-Speed 5968.38 samples/sec Loss 12.3996 LearningRate 0.3184 Epoch: 3 Global Step: 40860 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:16:41,610-Speed 5975.10 samples/sec Loss 12.3049 LearningRate 0.3184 Epoch: 3 Global Step: 40870 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:16:48,457-Speed 5983.95 samples/sec Loss 12.4050 LearningRate 0.3183 Epoch: 3 Global Step: 40880 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:16:55,306-Speed 5981.20 samples/sec Loss 12.3337 LearningRate 0.3183 Epoch: 3 Global Step: 40890 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:17:02,157-Speed 5979.63 samples/sec Loss 12.3417 LearningRate 0.3182 Epoch: 3 Global Step: 40900 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:17:08,999-Speed 5987.79 
samples/sec Loss 12.3386 LearningRate 0.3182 Epoch: 3 Global Step: 40910 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:17:15,847-Speed 5982.18 samples/sec Loss 12.4258 LearningRate 0.3182 Epoch: 3 Global Step: 40920 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:17:22,740-Speed 5943.53 samples/sec Loss 12.3609 LearningRate 0.3181 Epoch: 3 Global Step: 40930 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:17:29,579-Speed 5989.92 samples/sec Loss 12.4308 LearningRate 0.3181 Epoch: 3 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:17:36,457-Speed 5956.35 samples/sec Loss 12.4741 LearningRate 0.3181 Epoch: 3 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:17:43,318-Speed 5972.69 samples/sec Loss 12.3985 LearningRate 0.3180 Epoch: 3 Global Step: 40960 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:17:50,272-Speed 5891.21 samples/sec Loss 12.3591 LearningRate 0.3180 Epoch: 3 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:17:57,147-Speed 5959.58 samples/sec Loss 12.4255 LearningRate 0.3179 Epoch: 3 Global Step: 40980 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:18:03,996-Speed 5980.86 samples/sec Loss 12.4043 LearningRate 0.3179 Epoch: 3 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:18:10,882-Speed 5950.35 samples/sec Loss 12.4578 LearningRate 0.3179 Epoch: 3 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:18:17,736-Speed 5976.90 samples/sec Loss 12.4539 LearningRate 0.3178 Epoch: 3 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:18:24,585-Speed 5981.61 samples/sec Loss 12.3998 LearningRate 0.3178 Epoch: 3 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:18:31,433-Speed 5982.41 samples/sec Loss 12.3784 
LearningRate 0.3178 Epoch: 3 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:18:38,304-Speed 5962.43 samples/sec Loss 12.3736 LearningRate 0.3177 Epoch: 3 Global Step: 41040 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:18:45,166-Speed 5970.30 samples/sec Loss 12.3685 LearningRate 0.3177 Epoch: 3 Global Step: 41050 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:18:52,026-Speed 5971.23 samples/sec Loss 12.3407 LearningRate 0.3176 Epoch: 3 Global Step: 41060 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:18:58,906-Speed 5954.62 samples/sec Loss 12.3541 LearningRate 0.3176 Epoch: 3 Global Step: 41070 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:19:05,769-Speed 5969.19 samples/sec Loss 12.4037 LearningRate 0.3176 Epoch: 3 Global Step: 41080 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:19:12,624-Speed 5976.56 samples/sec Loss 12.3901 LearningRate 0.3175 Epoch: 3 Global Step: 41090 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:19:19,483-Speed 5972.45 samples/sec Loss 12.3223 LearningRate 0.3175 Epoch: 3 Global Step: 41100 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:19:26,359-Speed 5958.58 samples/sec Loss 12.3169 LearningRate 0.3174 Epoch: 3 Global Step: 41110 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:19:33,221-Speed 5970.39 samples/sec Loss 12.3092 LearningRate 0.3174 Epoch: 3 Global Step: 41120 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:19:40,092-Speed 5961.97 samples/sec Loss 12.3509 LearningRate 0.3174 Epoch: 3 Global Step: 41130 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:19:46,982-Speed 5946.50 samples/sec Loss 12.3943 LearningRate 0.3173 Epoch: 3 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:19:53,843-Speed 5971.02 samples/sec Loss 12.3886 LearningRate 0.3173 Epoch: 3 
Global Step: 41150 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:20:00,713-Speed 5963.69 samples/sec Loss 12.2934 LearningRate 0.3173 Epoch: 3 Global Step: 41160 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:20:07,571-Speed 5974.35 samples/sec Loss 12.2695 LearningRate 0.3172 Epoch: 3 Global Step: 41170 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:20:14,432-Speed 5970.92 samples/sec Loss 12.4190 LearningRate 0.3172 Epoch: 3 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:20:21,282-Speed 5980.16 samples/sec Loss 12.3641 LearningRate 0.3171 Epoch: 3 Global Step: 41190 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:20:28,173-Speed 5945.74 samples/sec Loss 12.2419 LearningRate 0.3171 Epoch: 3 Global Step: 41200 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:20:35,025-Speed 5979.41 samples/sec Loss 12.2793 LearningRate 0.3171 Epoch: 3 Global Step: 41210 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:20:41,875-Speed 5979.99 samples/sec Loss 12.3443 LearningRate 0.3170 Epoch: 3 Global Step: 41220 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:20:48,756-Speed 5953.63 samples/sec Loss 12.2889 LearningRate 0.3170 Epoch: 3 Global Step: 41230 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:20:55,615-Speed 5972.82 samples/sec Loss 12.2802 LearningRate 0.3169 Epoch: 3 Global Step: 41240 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:21:02,491-Speed 5958.18 samples/sec Loss 12.3231 LearningRate 0.3169 Epoch: 3 Global Step: 41250 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:21:09,365-Speed 5960.67 samples/sec Loss 12.3039 LearningRate 0.3169 Epoch: 3 Global Step: 41260 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:21:16,224-Speed 5972.29 samples/sec Loss 12.4219 LearningRate 0.3168 Epoch: 3 Global Step: 41270 Fp16 Grad 
Scale: 262144 Required: 33 hours Training: 2022-01-08 04:21:23,079-Speed 5976.73 samples/sec Loss 12.4391 LearningRate 0.3168 Epoch: 3 Global Step: 41280 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:21:29,935-Speed 5974.86 samples/sec Loss 12.3445 LearningRate 0.3168 Epoch: 3 Global Step: 41290 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:21:36,792-Speed 5974.95 samples/sec Loss 12.2655 LearningRate 0.3167 Epoch: 3 Global Step: 41300 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:21:43,668-Speed 5957.60 samples/sec Loss 12.4385 LearningRate 0.3167 Epoch: 3 Global Step: 41310 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:21:50,536-Speed 5966.25 samples/sec Loss 12.3664 LearningRate 0.3166 Epoch: 3 Global Step: 41320 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:21:57,408-Speed 5960.86 samples/sec Loss 12.3308 LearningRate 0.3166 Epoch: 3 Global Step: 41330 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:22:04,281-Speed 5961.05 samples/sec Loss 12.3905 LearningRate 0.3166 Epoch: 3 Global Step: 41340 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:22:11,147-Speed 5967.46 samples/sec Loss 12.3933 LearningRate 0.3165 Epoch: 3 Global Step: 41350 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:22:18,041-Speed 5942.59 samples/sec Loss 12.3957 LearningRate 0.3165 Epoch: 3 Global Step: 41360 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:22:24,896-Speed 5975.95 samples/sec Loss 12.3263 LearningRate 0.3165 Epoch: 3 Global Step: 41370 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:22:31,762-Speed 5967.96 samples/sec Loss 12.3227 LearningRate 0.3164 Epoch: 3 Global Step: 41380 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:22:38,630-Speed 5965.96 samples/sec Loss 12.3047 LearningRate 0.3164 Epoch: 3 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 33 
hours Training: 2022-01-08 04:22:45,528-Speed 5938.79 samples/sec Loss 12.3027 LearningRate 0.3163 Epoch: 3 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:22:52,387-Speed 5973.24 samples/sec Loss 12.3586 LearningRate 0.3163 Epoch: 3 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:22:59,383-Speed 5855.91 samples/sec Loss 12.3700 LearningRate 0.3163 Epoch: 3 Global Step: 41420 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:23:06,274-Speed 5945.34 samples/sec Loss 12.4116 LearningRate 0.3162 Epoch: 3 Global Step: 41430 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:23:13,173-Speed 5938.42 samples/sec Loss 12.3417 LearningRate 0.3162 Epoch: 3 Global Step: 41440 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:23:20,029-Speed 5976.63 samples/sec Loss 12.4061 LearningRate 0.3161 Epoch: 3 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:23:26,894-Speed 5967.53 samples/sec Loss 12.3552 LearningRate 0.3161 Epoch: 3 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:23:33,753-Speed 5973.02 samples/sec Loss 12.3734 LearningRate 0.3161 Epoch: 3 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:23:40,628-Speed 5959.05 samples/sec Loss 12.3554 LearningRate 0.3160 Epoch: 3 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:24:05,461-Speed 1649.47 samples/sec Loss 12.3452 LearningRate 0.3160 Epoch: 4 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:24:12,295-Speed 5995.64 samples/sec Loss 12.3252 LearningRate 0.3160 Epoch: 4 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:24:19,136-Speed 5988.48 samples/sec Loss 12.2741 LearningRate 0.3159 Epoch: 4 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 
04:24:25,984-Speed 5985.62 samples/sec Loss 12.3232 LearningRate 0.3159 Epoch: 4 Global Step: 41520 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:24:32,825-Speed 5988.42 samples/sec Loss 12.3896 LearningRate 0.3158 Epoch: 4 Global Step: 41530 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:24:39,668-Speed 5987.30 samples/sec Loss 12.3262 LearningRate 0.3158 Epoch: 4 Global Step: 41540 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 04:24:46,505-Speed 5991.45 samples/sec Loss 12.3333 LearningRate 0.3158 Epoch: 4 Global Step: 41550 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:24:53,359-Speed 5979.61 samples/sec Loss 12.3392 LearningRate 0.3157 Epoch: 4 Global Step: 41560 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:25:00,204-Speed 5986.48 samples/sec Loss 12.3238 LearningRate 0.3157 Epoch: 4 Global Step: 41570 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 04:25:07,051-Speed 5983.49 samples/sec Loss 12.3442 LearningRate 0.3157 Epoch: 4 Global Step: 41580 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:25:13,900-Speed 5980.92 samples/sec Loss 12.3251 LearningRate 0.3156 Epoch: 4 Global Step: 41590 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:25:20,757-Speed 5975.21 samples/sec Loss 12.3409 LearningRate 0.3156 Epoch: 4 Global Step: 41600 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:25:27,623-Speed 5966.08 samples/sec Loss 12.3308 LearningRate 0.3155 Epoch: 4 Global Step: 41610 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:25:34,492-Speed 5964.49 samples/sec Loss 12.2824 LearningRate 0.3155 Epoch: 4 Global Step: 41620 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:25:41,354-Speed 5969.51 samples/sec Loss 12.2670 LearningRate 0.3155 Epoch: 4 Global Step: 41630 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:25:48,222-Speed 5964.93 
samples/sec Loss 12.2845 LearningRate 0.3154 Epoch: 4 Global Step: 41640 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:25:55,066-Speed 5985.97 samples/sec Loss 12.3321 LearningRate 0.3154 Epoch: 4 Global Step: 41650 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:26:01,938-Speed 5961.89 samples/sec Loss 12.3106 LearningRate 0.3153 Epoch: 4 Global Step: 41660 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:26:08,794-Speed 5975.07 samples/sec Loss 12.3262 LearningRate 0.3153 Epoch: 4 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:26:15,665-Speed 5962.55 samples/sec Loss 12.3313 LearningRate 0.3153 Epoch: 4 Global Step: 41680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:26:22,521-Speed 5975.76 samples/sec Loss 12.3210 LearningRate 0.3152 Epoch: 4 Global Step: 41690 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:26:29,477-Speed 5889.04 samples/sec Loss 12.3218 LearningRate 0.3152 Epoch: 4 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:26:36,343-Speed 5966.67 samples/sec Loss 12.2837 LearningRate 0.3152 Epoch: 4 Global Step: 41710 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:26:43,291-Speed 5896.20 samples/sec Loss 12.3690 LearningRate 0.3151 Epoch: 4 Global Step: 41720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:26:50,146-Speed 5975.95 samples/sec Loss 12.3170 LearningRate 0.3151 Epoch: 4 Global Step: 41730 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:26:57,038-Speed 5944.56 samples/sec Loss 12.2192 LearningRate 0.3150 Epoch: 4 Global Step: 41740 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:27:03,930-Speed 5943.70 samples/sec Loss 12.2869 LearningRate 0.3150 Epoch: 4 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:27:10,809-Speed 5955.99 samples/sec Loss 12.3663 
LearningRate 0.3150 Epoch: 4 Global Step: 41760 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:27:17,676-Speed 5965.57 samples/sec Loss 12.3158 LearningRate 0.3149 Epoch: 4 Global Step: 41770 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:27:24,543-Speed 5966.36 samples/sec Loss 12.3297 LearningRate 0.3149 Epoch: 4 Global Step: 41780 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:27:31,407-Speed 5967.94 samples/sec Loss 12.2634 LearningRate 0.3149 Epoch: 4 Global Step: 41790 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:27:38,253-Speed 5984.40 samples/sec Loss 12.3641 LearningRate 0.3148 Epoch: 4 Global Step: 41800 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:27:45,114-Speed 5970.85 samples/sec Loss 12.2816 LearningRate 0.3148 Epoch: 4 Global Step: 41810 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:27:52,008-Speed 5942.41 samples/sec Loss 12.2855 LearningRate 0.3147 Epoch: 4 Global Step: 41820 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:27:58,873-Speed 5968.34 samples/sec Loss 12.2519 LearningRate 0.3147 Epoch: 4 Global Step: 41830 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:28:05,736-Speed 5969.18 samples/sec Loss 12.2621 LearningRate 0.3147 Epoch: 4 Global Step: 41840 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:28:12,607-Speed 5962.05 samples/sec Loss 12.3465 LearningRate 0.3146 Epoch: 4 Global Step: 41850 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:28:19,472-Speed 5967.36 samples/sec Loss 12.3384 LearningRate 0.3146 Epoch: 4 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:28:26,344-Speed 5961.31 samples/sec Loss 12.3581 LearningRate 0.3146 Epoch: 4 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:28:33,208-Speed 5968.18 samples/sec Loss 12.3510 LearningRate 0.3145 Epoch: 4 
Global Step: 41880 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:28:40,064-Speed 5976.13 samples/sec Loss 12.2684 LearningRate 0.3145 Epoch: 4 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:28:46,930-Speed 5966.92 samples/sec Loss 12.3072 LearningRate 0.3144 Epoch: 4 Global Step: 41900 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:28:53,796-Speed 5966.33 samples/sec Loss 12.3215 LearningRate 0.3144 Epoch: 4 Global Step: 41910 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:29:00,656-Speed 5971.79 samples/sec Loss 12.3124 LearningRate 0.3144 Epoch: 4 Global Step: 41920 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:29:08,548-Speed 5190.44 samples/sec Loss 12.3229 LearningRate 0.3143 Epoch: 4 Global Step: 41930 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:29:15,395-Speed 5983.89 samples/sec Loss 12.3410 LearningRate 0.3143 Epoch: 4 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:29:22,269-Speed 5959.83 samples/sec Loss 12.2776 LearningRate 0.3142 Epoch: 4 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:29:29,121-Speed 5978.45 samples/sec Loss 12.3865 LearningRate 0.3142 Epoch: 4 Global Step: 41960 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:29:35,972-Speed 5979.22 samples/sec Loss 12.3127 LearningRate 0.3142 Epoch: 4 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:29:42,829-Speed 5975.20 samples/sec Loss 12.2860 LearningRate 0.3141 Epoch: 4 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:29:49,683-Speed 5976.83 samples/sec Loss 12.3531 LearningRate 0.3141 Epoch: 4 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:29:56,561-Speed 5956.24 samples/sec Loss 12.2843 LearningRate 0.3141 Epoch: 4 Global Step: 42000 Fp16 Grad 
Scale: 131072 Required: 32 hours Training: 2022-01-08 04:30:03,424-Speed 5969.28 samples/sec Loss 12.3143 LearningRate 0.3140 Epoch: 4 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:30:10,285-Speed 5971.45 samples/sec Loss 12.2536 LearningRate 0.3140 Epoch: 4 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:30:17,145-Speed 5971.96 samples/sec Loss 12.2750 LearningRate 0.3139 Epoch: 4 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:30:23,995-Speed 5980.09 samples/sec Loss 12.2829 LearningRate 0.3139 Epoch: 4 Global Step: 42040 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:30:30,863-Speed 5964.86 samples/sec Loss 12.2603 LearningRate 0.3139 Epoch: 4 Global Step: 42050 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:30:37,720-Speed 5974.67 samples/sec Loss 12.3200 LearningRate 0.3138 Epoch: 4 Global Step: 42060 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:30:44,580-Speed 5974.44 samples/sec Loss 12.2004 LearningRate 0.3138 Epoch: 4 Global Step: 42070 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:30:51,443-Speed 5972.17 samples/sec Loss 12.2838 LearningRate 0.3138 Epoch: 4 Global Step: 42080 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:30:58,313-Speed 5963.19 samples/sec Loss 12.3671 LearningRate 0.3137 Epoch: 4 Global Step: 42090 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:31:05,189-Speed 5957.77 samples/sec Loss 12.3061 LearningRate 0.3137 Epoch: 4 Global Step: 42100 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:31:12,126-Speed 5905.77 samples/sec Loss 12.3305 LearningRate 0.3136 Epoch: 4 Global Step: 42110 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:31:18,983-Speed 5976.69 samples/sec Loss 12.2935 LearningRate 0.3136 Epoch: 4 Global Step: 42120 Fp16 Grad Scale: 262144 Required: 32 
hours Training: 2022-01-08 04:31:25,838-Speed 5975.60 samples/sec Loss 12.3050 LearningRate 0.3136 Epoch: 4 Global Step: 42130 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:31:32,684-Speed 5984.75 samples/sec Loss 12.2720 LearningRate 0.3135 Epoch: 4 Global Step: 42140 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:31:39,557-Speed 5960.73 samples/sec Loss 12.2934 LearningRate 0.3135 Epoch: 4 Global Step: 42150 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:31:46,416-Speed 5973.47 samples/sec Loss 12.2224 LearningRate 0.3134 Epoch: 4 Global Step: 42160 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:31:53,266-Speed 5979.89 samples/sec Loss 12.2938 LearningRate 0.3134 Epoch: 4 Global Step: 42170 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:32:00,211-Speed 5898.99 samples/sec Loss 12.3552 LearningRate 0.3134 Epoch: 4 Global Step: 42180 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:32:07,094-Speed 5952.00 samples/sec Loss 12.2664 LearningRate 0.3133 Epoch: 4 Global Step: 42190 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:32:13,982-Speed 5947.81 samples/sec Loss 12.2254 LearningRate 0.3133 Epoch: 4 Global Step: 42200 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:32:20,850-Speed 5965.69 samples/sec Loss 12.2791 LearningRate 0.3133 Epoch: 4 Global Step: 42210 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:32:27,698-Speed 5982.11 samples/sec Loss 12.2612 LearningRate 0.3132 Epoch: 4 Global Step: 42220 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:32:34,547-Speed 5981.74 samples/sec Loss 12.2589 LearningRate 0.3132 Epoch: 4 Global Step: 42230 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:32:41,390-Speed 5985.82 samples/sec Loss 12.3221 LearningRate 0.3131 Epoch: 4 Global Step: 42240 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 
04:32:48,276-Speed 5949.17 samples/sec Loss 12.3270 LearningRate 0.3131 Epoch: 4 Global Step: 42250 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:32:55,127-Speed 5980.47 samples/sec Loss 12.2934 LearningRate 0.3131 Epoch: 4 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:01,983-Speed 5975.04 samples/sec Loss 12.3324 LearningRate 0.3130 Epoch: 4 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:08,886-Speed 5934.43 samples/sec Loss 12.2447 LearningRate 0.3130 Epoch: 4 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:15,847-Speed 5886.17 samples/sec Loss 12.3387 LearningRate 0.3130 Epoch: 4 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:22,860-Speed 5861.12 samples/sec Loss 12.2401 LearningRate 0.3129 Epoch: 4 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:29,866-Speed 5972.36 samples/sec Loss 12.2937 LearningRate 0.3129 Epoch: 4 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:36,717-Speed 5979.91 samples/sec Loss 12.2831 LearningRate 0.3128 Epoch: 4 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:43,564-Speed 5983.09 samples/sec Loss 12.3447 LearningRate 0.3128 Epoch: 4 Global Step: 42330 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:50,441-Speed 5957.55 samples/sec Loss 12.1937 LearningRate 0.3128 Epoch: 4 Global Step: 42340 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:33:57,318-Speed 5958.38 samples/sec Loss 12.2299 LearningRate 0.3127 Epoch: 4 Global Step: 42350 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:34:04,155-Speed 5992.18 samples/sec Loss 12.2730 LearningRate 0.3127 Epoch: 4 Global Step: 42360 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:34:11,003-Speed 5982.48 
samples/sec Loss 12.2795 LearningRate 0.3127 Epoch: 4 Global Step: 42370 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:34:17,856-Speed 5978.49 samples/sec Loss 12.2786 LearningRate 0.3126 Epoch: 4 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:34:24,739-Speed 5952.15 samples/sec Loss 12.2285 LearningRate 0.3126 Epoch: 4 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:34:31,643-Speed 5933.64 samples/sec Loss 12.2954 LearningRate 0.3125 Epoch: 4 Global Step: 42400 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:34:38,502-Speed 5973.03 samples/sec Loss 12.2785 LearningRate 0.3125 Epoch: 4 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:34:45,387-Speed 5950.56 samples/sec Loss 12.2556 LearningRate 0.3125 Epoch: 4 Global Step: 42420 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:34:52,258-Speed 5962.13 samples/sec Loss 12.2340 LearningRate 0.3124 Epoch: 4 Global Step: 42430 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:34:59,118-Speed 5972.08 samples/sec Loss 12.2550 LearningRate 0.3124 Epoch: 4 Global Step: 42440 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:35:05,973-Speed 5978.29 samples/sec Loss 12.2504 LearningRate 0.3123 Epoch: 4 Global Step: 42450 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:35:12,842-Speed 5963.62 samples/sec Loss 12.3497 LearningRate 0.3123 Epoch: 4 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:35:19,716-Speed 5960.78 samples/sec Loss 12.3179 LearningRate 0.3123 Epoch: 4 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:35:26,615-Speed 5939.76 samples/sec Loss 12.2261 LearningRate 0.3122 Epoch: 4 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:35:33,485-Speed 5962.99 samples/sec Loss 12.2666 LearningRate 
0.3122 Epoch: 4 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:35:40,356-Speed 5962.63 samples/sec Loss 12.2607 LearningRate 0.3122 Epoch: 4 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:35:47,229-Speed 5961.04 samples/sec Loss 12.3064 LearningRate 0.3121 Epoch: 4 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:35:54,101-Speed 5960.97 samples/sec Loss 12.2570 LearningRate 0.3121 Epoch: 4 Global Step: 42520 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:36:00,990-Speed 5946.39 samples/sec Loss 12.1705 LearningRate 0.3120 Epoch: 4 Global Step: 42530 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:36:07,851-Speed 5972.18 samples/sec Loss 12.2525 LearningRate 0.3120 Epoch: 4 Global Step: 42540 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:36:14,700-Speed 5981.22 samples/sec Loss 12.2786 LearningRate 0.3120 Epoch: 4 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:36:21,575-Speed 5958.65 samples/sec Loss 12.2113 LearningRate 0.3119 Epoch: 4 Global Step: 42560 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:36:28,464-Speed 5947.13 samples/sec Loss 12.2689 LearningRate 0.3119 Epoch: 4 Global Step: 42570 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:36:35,332-Speed 5964.95 samples/sec Loss 12.2543 LearningRate 0.3119 Epoch: 4 Global Step: 42580 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:36:42,193-Speed 5971.28 samples/sec Loss 12.2510 LearningRate 0.3118 Epoch: 4 Global Step: 42590 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:36:49,052-Speed 5973.15 samples/sec Loss 12.2249 LearningRate 0.3118 Epoch: 4 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:36:55,917-Speed 5967.36 samples/sec Loss 12.1716 LearningRate 0.3117 Epoch: 4 Global Step: 
42610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:02,765-Speed 5982.70 samples/sec Loss 12.3283 LearningRate 0.3117 Epoch: 4 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:09,624-Speed 5972.97 samples/sec Loss 12.2414 LearningRate 0.3117 Epoch: 4 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:16,483-Speed 5974.66 samples/sec Loss 12.2059 LearningRate 0.3116 Epoch: 4 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:23,330-Speed 5983.18 samples/sec Loss 12.2874 LearningRate 0.3116 Epoch: 4 Global Step: 42650 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:30,224-Speed 5942.98 samples/sec Loss 12.2791 LearningRate 0.3116 Epoch: 4 Global Step: 42660 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:37,080-Speed 5975.26 samples/sec Loss 12.2503 LearningRate 0.3115 Epoch: 4 Global Step: 42670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:43,936-Speed 5975.11 samples/sec Loss 12.3297 LearningRate 0.3115 Epoch: 4 Global Step: 42680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:50,808-Speed 5962.83 samples/sec Loss 12.2216 LearningRate 0.3114 Epoch: 4 Global Step: 42690 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:37:57,677-Speed 5964.13 samples/sec Loss 12.2489 LearningRate 0.3114 Epoch: 4 Global Step: 42700 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:38:04,540-Speed 5969.20 samples/sec Loss 12.3458 LearningRate 0.3114 Epoch: 4 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:38:11,404-Speed 5968.05 samples/sec Loss 12.2509 LearningRate 0.3113 Epoch: 4 Global Step: 42720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:38:18,262-Speed 5974.67 samples/sec Loss 12.2372 LearningRate 0.3113 Epoch: 4 Global Step: 42730 Fp16 Grad Scale: 131072 
Required: 32 hours Training: 2022-01-08 04:38:25,125-Speed 5969.22 samples/sec Loss 12.1879 LearningRate 0.3113 Epoch: 4 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:38:31,991-Speed 5966.60 samples/sec Loss 12.2875 LearningRate 0.3112 Epoch: 4 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:38:38,858-Speed 5966.09 samples/sec Loss 12.1849 LearningRate 0.3112 Epoch: 4 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:38:45,714-Speed 5975.47 samples/sec Loss 12.2079 LearningRate 0.3111 Epoch: 4 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:38:52,589-Speed 5960.63 samples/sec Loss 12.1590 LearningRate 0.3111 Epoch: 4 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:38:59,492-Speed 5934.62 samples/sec Loss 12.1217 LearningRate 0.3111 Epoch: 4 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:39:06,355-Speed 5968.83 samples/sec Loss 12.2374 LearningRate 0.3110 Epoch: 4 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:39:13,217-Speed 5971.12 samples/sec Loss 12.2571 LearningRate 0.3110 Epoch: 4 Global Step: 42810 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:39:20,084-Speed 5968.97 samples/sec Loss 12.2931 LearningRate 0.3109 Epoch: 4 Global Step: 42820 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:39:26,949-Speed 5966.76 samples/sec Loss 12.2628 LearningRate 0.3109 Epoch: 4 Global Step: 42830 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:39:33,812-Speed 5969.07 samples/sec Loss 12.2178 LearningRate 0.3109 Epoch: 4 Global Step: 42840 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:39:40,691-Speed 5955.67 samples/sec Loss 12.2049 LearningRate 0.3108 Epoch: 4 Global Step: 42850 Fp16 Grad Scale: 262144 Required: 32 hours Training: 
2022-01-08 04:39:47,551-Speed 5973.88 samples/sec Loss 12.2370 LearningRate 0.3108 Epoch: 4 Global Step: 42860 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:39:54,413-Speed 5970.41 samples/sec Loss 12.2385 LearningRate 0.3108 Epoch: 4 Global Step: 42870 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:01,279-Speed 5969.25 samples/sec Loss 12.2428 LearningRate 0.3107 Epoch: 4 Global Step: 42880 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:08,138-Speed 5972.29 samples/sec Loss 12.2250 LearningRate 0.3107 Epoch: 4 Global Step: 42890 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:15,000-Speed 5970.85 samples/sec Loss 12.2959 LearningRate 0.3106 Epoch: 4 Global Step: 42900 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:21,851-Speed 5980.41 samples/sec Loss 12.1980 LearningRate 0.3106 Epoch: 4 Global Step: 42910 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:28,745-Speed 5942.46 samples/sec Loss 12.2654 LearningRate 0.3106 Epoch: 4 Global Step: 42920 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:35,598-Speed 5978.17 samples/sec Loss 12.2313 LearningRate 0.3105 Epoch: 4 Global Step: 42930 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:42,466-Speed 5964.80 samples/sec Loss 12.2207 LearningRate 0.3105 Epoch: 4 Global Step: 42940 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:49,334-Speed 5965.26 samples/sec Loss 12.2958 LearningRate 0.3105 Epoch: 4 Global Step: 42950 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:40:56,210-Speed 5957.90 samples/sec Loss 12.2269 LearningRate 0.3104 Epoch: 4 Global Step: 42960 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:03,075-Speed 5967.96 samples/sec Loss 12.2758 LearningRate 0.3104 Epoch: 4 Global Step: 42970 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:10,042-Speed 
5880.26 samples/sec Loss 12.1963 LearningRate 0.3103 Epoch: 4 Global Step: 42980 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:16,918-Speed 5958.57 samples/sec Loss 12.3013 LearningRate 0.3103 Epoch: 4 Global Step: 42990 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:23,771-Speed 5978.16 samples/sec Loss 12.2319 LearningRate 0.3103 Epoch: 4 Global Step: 43000 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:30,617-Speed 5984.04 samples/sec Loss 12.2641 LearningRate 0.3102 Epoch: 4 Global Step: 43010 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:37,462-Speed 5984.72 samples/sec Loss 12.2127 LearningRate 0.3102 Epoch: 4 Global Step: 43020 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:44,333-Speed 5961.85 samples/sec Loss 12.2773 LearningRate 0.3102 Epoch: 4 Global Step: 43030 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:51,192-Speed 5973.26 samples/sec Loss 12.3037 LearningRate 0.3101 Epoch: 4 Global Step: 43040 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:41:58,078-Speed 5949.51 samples/sec Loss 12.2201 LearningRate 0.3101 Epoch: 4 Global Step: 43050 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:42:04,913-Speed 5993.71 samples/sec Loss 12.2070 LearningRate 0.3100 Epoch: 4 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:42:11,767-Speed 5979.29 samples/sec Loss 12.1550 LearningRate 0.3100 Epoch: 4 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:42:18,626-Speed 5973.09 samples/sec Loss 12.2192 LearningRate 0.3100 Epoch: 4 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:42:25,481-Speed 5976.07 samples/sec Loss 12.1859 LearningRate 0.3099 Epoch: 4 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:42:32,361-Speed 5956.98 samples/sec Loss 
12.1811 LearningRate 0.3099 Epoch: 4 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:42:39,213-Speed 5978.49 samples/sec Loss 12.2055 LearningRate 0.3099 Epoch: 4 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:42:46,075-Speed 5970.19 samples/sec Loss 12.1858 LearningRate 0.3098 Epoch: 4 Global Step: 43120 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:42:52,921-Speed 5984.29 samples/sec Loss 12.1804 LearningRate 0.3098 Epoch: 4 Global Step: 43130 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:42:59,787-Speed 5967.51 samples/sec Loss 12.2169 LearningRate 0.3097 Epoch: 4 Global Step: 43140 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:43:06,667-Speed 5954.77 samples/sec Loss 12.2450 LearningRate 0.3097 Epoch: 4 Global Step: 43150 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:43:13,528-Speed 5971.04 samples/sec Loss 12.2020 LearningRate 0.3097 Epoch: 4 Global Step: 43160 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:43:20,404-Speed 5957.77 samples/sec Loss 12.2074 LearningRate 0.3096 Epoch: 4 Global Step: 43170 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:43:27,282-Speed 5956.39 samples/sec Loss 12.2336 LearningRate 0.3096 Epoch: 4 Global Step: 43180 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:43:34,160-Speed 5956.35 samples/sec Loss 12.1448 LearningRate 0.3096 Epoch: 4 Global Step: 43190 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:43:41,015-Speed 5976.56 samples/sec Loss 12.2047 LearningRate 0.3095 Epoch: 4 Global Step: 43200 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:43:47,901-Speed 5949.66 samples/sec Loss 12.1946 LearningRate 0.3095 Epoch: 4 Global Step: 43210 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:43:54,750-Speed 5983.05 samples/sec Loss 12.1906 LearningRate 0.3094 
Epoch: 4 Global Step: 43220 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:44:01,616-Speed 5966.72 samples/sec Loss 12.2063 LearningRate 0.3094 Epoch: 4 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:44:08,464-Speed 5982.19 samples/sec Loss 12.2265 LearningRate 0.3094 Epoch: 4 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:44:15,331-Speed 5966.50 samples/sec Loss 12.2573 LearningRate 0.3093 Epoch: 4 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:44:22,181-Speed 5981.20 samples/sec Loss 12.1885 LearningRate 0.3093 Epoch: 4 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:44:29,051-Speed 5962.85 samples/sec Loss 12.1911 LearningRate 0.3093 Epoch: 4 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:44:35,910-Speed 5973.16 samples/sec Loss 12.1864 LearningRate 0.3092 Epoch: 4 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:44:42,771-Speed 5971.34 samples/sec Loss 12.2014 LearningRate 0.3092 Epoch: 4 Global Step: 43290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:44:49,647-Speed 5957.11 samples/sec Loss 12.1139 LearningRate 0.3091 Epoch: 4 Global Step: 43300 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:45:02,358-Speed 3222.78 samples/sec Loss 12.1929 LearningRate 0.3091 Epoch: 4 Global Step: 43310 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:45:09,226-Speed 5964.86 samples/sec Loss 12.1833 LearningRate 0.3091 Epoch: 4 Global Step: 43320 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:45:16,076-Speed 5980.74 samples/sec Loss 12.2575 LearningRate 0.3090 Epoch: 4 Global Step: 43330 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:45:22,952-Speed 5957.92 samples/sec Loss 12.1551 LearningRate 0.3090 Epoch: 4 Global Step: 43340 
Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:45:29,812-Speed 5972.81 samples/sec Loss 12.1622 LearningRate 0.3089 Epoch: 4 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:45:36,668-Speed 5974.91 samples/sec Loss 12.2281 LearningRate 0.3089 Epoch: 4 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:45:43,536-Speed 5964.98 samples/sec Loss 12.1932 LearningRate 0.3089 Epoch: 4 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:45:50,406-Speed 5963.20 samples/sec Loss 12.1923 LearningRate 0.3088 Epoch: 4 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:45:57,265-Speed 5973.12 samples/sec Loss 12.1413 LearningRate 0.3088 Epoch: 4 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:46:04,138-Speed 5960.79 samples/sec Loss 12.2150 LearningRate 0.3088 Epoch: 4 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:46:10,995-Speed 5974.24 samples/sec Loss 12.2418 LearningRate 0.3087 Epoch: 4 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:46:17,861-Speed 5967.21 samples/sec Loss 12.1994 LearningRate 0.3087 Epoch: 4 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:46:24,713-Speed 5978.99 samples/sec Loss 12.1776 LearningRate 0.3086 Epoch: 4 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:46:31,569-Speed 5974.87 samples/sec Loss 12.1951 LearningRate 0.3086 Epoch: 4 Global Step: 43440 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:46:38,437-Speed 5965.38 samples/sec Loss 12.1873 LearningRate 0.3086 Epoch: 4 Global Step: 43450 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:46:45,296-Speed 5972.61 samples/sec Loss 12.1203 LearningRate 0.3085 Epoch: 4 Global Step: 43460 Fp16 Grad Scale: 262144 
Required: 32 hours Training: 2022-01-08 04:46:52,188-Speed 5944.47 samples/sec Loss 12.2076 LearningRate 0.3085 Epoch: 4 Global Step: 43470 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:46:59,043-Speed 5976.51 samples/sec Loss 12.1772 LearningRate 0.3085 Epoch: 4 Global Step: 43480 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:47:05,873-Speed 5997.53 samples/sec Loss 12.2039 LearningRate 0.3084 Epoch: 4 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:47:12,719-Speed 5984.31 samples/sec Loss 12.1745 LearningRate 0.3084 Epoch: 4 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:47:19,562-Speed 5986.35 samples/sec Loss 12.1746 LearningRate 0.3083 Epoch: 4 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:47:26,415-Speed 5978.26 samples/sec Loss 12.2296 LearningRate 0.3083 Epoch: 4 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:47:33,263-Speed 5981.83 samples/sec Loss 12.1122 LearningRate 0.3083 Epoch: 4 Global Step: 43530 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:47:40,123-Speed 5971.51 samples/sec Loss 12.1680 LearningRate 0.3082 Epoch: 4 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:47:46,996-Speed 5963.85 samples/sec Loss 12.1942 LearningRate 0.3082 Epoch: 4 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:47:53,892-Speed 5940.66 samples/sec Loss 12.1087 LearningRate 0.3082 Epoch: 4 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:48:00,770-Speed 5955.74 samples/sec Loss 12.1028 LearningRate 0.3081 Epoch: 4 Global Step: 43570 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:48:07,618-Speed 5983.24 samples/sec Loss 12.1808 LearningRate 0.3081 Epoch: 4 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 
04:48:14,483-Speed 5967.39 samples/sec Loss 12.1671 LearningRate 0.3080 Epoch: 4 Global Step: 43590 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:48:21,339-Speed 5975.69 samples/sec Loss 12.1214 LearningRate 0.3080 Epoch: 4 Global Step: 43600 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:48:29,277-Speed 5160.86 samples/sec Loss 12.1920 LearningRate 0.3080 Epoch: 4 Global Step: 43610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:48:36,179-Speed 5937.54 samples/sec Loss 12.1446 LearningRate 0.3079 Epoch: 4 Global Step: 43620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:48:43,025-Speed 5984.12 samples/sec Loss 12.2184 LearningRate 0.3079 Epoch: 4 Global Step: 43630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:48:49,883-Speed 5973.45 samples/sec Loss 12.2308 LearningRate 0.3079 Epoch: 4 Global Step: 43640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:48:56,757-Speed 5960.18 samples/sec Loss 12.2223 LearningRate 0.3078 Epoch: 4 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:49:03,610-Speed 5978.47 samples/sec Loss 12.1085 LearningRate 0.3078 Epoch: 4 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:49:10,592-Speed 5867.20 samples/sec Loss 12.2206 LearningRate 0.3077 Epoch: 4 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:49:17,537-Speed 5899.60 samples/sec Loss 12.1812 LearningRate 0.3077 Epoch: 4 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:49:24,484-Speed 5896.94 samples/sec Loss 12.2238 LearningRate 0.3077 Epoch: 4 Global Step: 43690 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:49:31,429-Speed 5899.57 samples/sec Loss 12.1191 LearningRate 0.3076 Epoch: 4 Global Step: 43700 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:49:38,273-Speed 5985.65 
samples/sec Loss 12.1709 LearningRate 0.3076 Epoch: 4 Global Step: 43710 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:49:45,111-Speed 5990.54 samples/sec Loss 12.1871 LearningRate 0.3076 Epoch: 4 Global Step: 43720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:49:51,978-Speed 5966.49 samples/sec Loss 12.2428 LearningRate 0.3075 Epoch: 4 Global Step: 43730 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:49:58,831-Speed 5977.54 samples/sec Loss 12.2341 LearningRate 0.3075 Epoch: 4 Global Step: 43740 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:50:05,678-Speed 5983.39 samples/sec Loss 12.0970 LearningRate 0.3074 Epoch: 4 Global Step: 43750 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:50:12,536-Speed 5973.56 samples/sec Loss 12.1587 LearningRate 0.3074 Epoch: 4 Global Step: 43760 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:50:19,390-Speed 5977.02 samples/sec Loss 12.1732 LearningRate 0.3074 Epoch: 4 Global Step: 43770 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:50:26,267-Speed 5957.41 samples/sec Loss 12.1594 LearningRate 0.3073 Epoch: 4 Global Step: 43780 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:50:33,124-Speed 5974.63 samples/sec Loss 12.1858 LearningRate 0.3073 Epoch: 4 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:50:39,980-Speed 5975.95 samples/sec Loss 12.2229 LearningRate 0.3073 Epoch: 4 Global Step: 43800 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:50:46,836-Speed 5976.55 samples/sec Loss 12.2114 LearningRate 0.3072 Epoch: 4 Global Step: 43810 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:50:53,684-Speed 5982.19 samples/sec Loss 12.1807 LearningRate 0.3072 Epoch: 4 Global Step: 43820 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:51:00,519-Speed 5993.19 samples/sec Loss 12.1491 
LearningRate 0.3071 Epoch: 4 Global Step: 43830 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:51:07,365-Speed 5984.26 samples/sec Loss 12.1366 LearningRate 0.3071 Epoch: 4 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:51:14,228-Speed 5972.28 samples/sec Loss 12.1732 LearningRate 0.3071 Epoch: 4 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:51:21,078-Speed 5980.23 samples/sec Loss 12.2070 LearningRate 0.3070 Epoch: 4 Global Step: 43860 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:51:27,924-Speed 5984.15 samples/sec Loss 12.2030 LearningRate 0.3070 Epoch: 4 Global Step: 43870 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:51:34,779-Speed 5976.45 samples/sec Loss 12.2068 LearningRate 0.3070 Epoch: 4 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:51:41,623-Speed 5985.53 samples/sec Loss 12.2202 LearningRate 0.3069 Epoch: 4 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:51:48,473-Speed 5980.37 samples/sec Loss 12.0997 LearningRate 0.3069 Epoch: 4 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:51:55,345-Speed 5962.68 samples/sec Loss 12.2495 LearningRate 0.3068 Epoch: 4 Global Step: 43910 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:52:02,207-Speed 5970.10 samples/sec Loss 12.2534 LearningRate 0.3068 Epoch: 4 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:52:09,074-Speed 5965.26 samples/sec Loss 12.1438 LearningRate 0.3068 Epoch: 4 Global Step: 43930 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:52:15,940-Speed 5967.13 samples/sec Loss 12.2088 LearningRate 0.3067 Epoch: 4 Global Step: 43940 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:52:22,799-Speed 5973.17 samples/sec Loss 12.1747 LearningRate 0.3067 Epoch: 4 
Global Step: 43950 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:52:29,639-Speed 5989.24 samples/sec Loss 12.2419 LearningRate 0.3067 Epoch: 4 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:52:36,511-Speed 5961.64 samples/sec Loss 12.1382 LearningRate 0.3066 Epoch: 4 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:52:43,364-Speed 5978.55 samples/sec Loss 12.2317 LearningRate 0.3066 Epoch: 4 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:52:50,224-Speed 5971.42 samples/sec Loss 12.0493 LearningRate 0.3065 Epoch: 4 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:52:57,072-Speed 5982.70 samples/sec Loss 12.2457 LearningRate 0.3065 Epoch: 4 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:53:03,914-Speed 5987.44 samples/sec Loss 12.1192 LearningRate 0.3065 Epoch: 4 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:53:10,776-Speed 5970.56 samples/sec Loss 12.1081 LearningRate 0.3064 Epoch: 4 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:53:17,624-Speed 5982.51 samples/sec Loss 12.1248 LearningRate 0.3064 Epoch: 4 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:53:24,477-Speed 5977.97 samples/sec Loss 12.1380 LearningRate 0.3064 Epoch: 4 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:53:31,336-Speed 5972.90 samples/sec Loss 12.1640 LearningRate 0.3063 Epoch: 4 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:53:38,195-Speed 5972.55 samples/sec Loss 12.1988 LearningRate 0.3063 Epoch: 4 Global Step: 44060 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:53:45,043-Speed 5982.95 samples/sec Loss 12.2378 LearningRate 0.3062 Epoch: 4 Global Step: 44070 Fp16 Grad 
Scale: 262144 Required: 32 hours Training: 2022-01-08 04:53:51,898-Speed 5978.07 samples/sec Loss 12.1729 LearningRate 0.3062 Epoch: 4 Global Step: 44080 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:53:58,736-Speed 5991.06 samples/sec Loss 12.1582 LearningRate 0.3062 Epoch: 4 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:54:05,602-Speed 5967.28 samples/sec Loss 12.1565 LearningRate 0.3061 Epoch: 4 Global Step: 44100 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:54:12,449-Speed 5982.78 samples/sec Loss 12.0446 LearningRate 0.3061 Epoch: 4 Global Step: 44110 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:54:19,307-Speed 5973.35 samples/sec Loss 12.2222 LearningRate 0.3061 Epoch: 4 Global Step: 44120 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:54:26,151-Speed 5986.69 samples/sec Loss 12.1137 LearningRate 0.3060 Epoch: 4 Global Step: 44130 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:54:33,015-Speed 5967.60 samples/sec Loss 12.1430 LearningRate 0.3060 Epoch: 4 Global Step: 44140 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:54:39,873-Speed 5974.86 samples/sec Loss 12.2716 LearningRate 0.3059 Epoch: 4 Global Step: 44150 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:54:46,727-Speed 5976.94 samples/sec Loss 12.1307 LearningRate 0.3059 Epoch: 4 Global Step: 44160 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:54:53,675-Speed 5896.75 samples/sec Loss 12.2150 LearningRate 0.3059 Epoch: 4 Global Step: 44170 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:55:00,537-Speed 5969.69 samples/sec Loss 12.1348 LearningRate 0.3058 Epoch: 4 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:55:07,391-Speed 5977.44 samples/sec Loss 12.2064 LearningRate 0.3058 Epoch: 4 Global Step: 44190 Fp16 Grad Scale: 262144 Required: 32 
hours Training: 2022-01-08 04:55:14,246-Speed 5976.12 samples/sec Loss 12.1077 LearningRate 0.3058 Epoch: 4 Global Step: 44200 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:55:21,107-Speed 5971.41 samples/sec Loss 12.1410 LearningRate 0.3057 Epoch: 4 Global Step: 44210 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:55:27,971-Speed 5968.98 samples/sec Loss 12.1263 LearningRate 0.3057 Epoch: 4 Global Step: 44220 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:55:34,834-Speed 5969.00 samples/sec Loss 12.1565 LearningRate 0.3056 Epoch: 4 Global Step: 44230 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:55:41,701-Speed 5965.98 samples/sec Loss 12.1472 LearningRate 0.3056 Epoch: 4 Global Step: 44240 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:55:48,585-Speed 5950.77 samples/sec Loss 12.0869 LearningRate 0.3056 Epoch: 4 Global Step: 44250 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:55:55,444-Speed 5973.70 samples/sec Loss 12.1449 LearningRate 0.3055 Epoch: 4 Global Step: 44260 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:56:02,312-Speed 5965.05 samples/sec Loss 12.1568 LearningRate 0.3055 Epoch: 4 Global Step: 44270 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:56:09,144-Speed 5995.57 samples/sec Loss 12.0719 LearningRate 0.3055 Epoch: 4 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:56:15,988-Speed 5986.18 samples/sec Loss 12.2084 LearningRate 0.3054 Epoch: 4 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:56:22,843-Speed 5976.73 samples/sec Loss 12.1155 LearningRate 0.3054 Epoch: 4 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:56:29,718-Speed 5959.02 samples/sec Loss 12.1567 LearningRate 0.3053 Epoch: 4 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 
04:56:36,602-Speed 5951.20 samples/sec Loss 12.1285 LearningRate 0.3053 Epoch: 4 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:56:43,452-Speed 5980.87 samples/sec Loss 12.1381 LearningRate 0.3053 Epoch: 4 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:56:50,352-Speed 5937.34 samples/sec Loss 11.9966 LearningRate 0.3052 Epoch: 4 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:56:57,209-Speed 5975.00 samples/sec Loss 12.1323 LearningRate 0.3052 Epoch: 4 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:57:04,082-Speed 5959.73 samples/sec Loss 12.1221 LearningRate 0.3052 Epoch: 4 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:57:11,012-Speed 5911.60 samples/sec Loss 12.1237 LearningRate 0.3051 Epoch: 4 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:57:17,883-Speed 5962.72 samples/sec Loss 12.1292 LearningRate 0.3051 Epoch: 4 Global Step: 44380 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:57:24,754-Speed 5962.40 samples/sec Loss 12.0903 LearningRate 0.3050 Epoch: 4 Global Step: 44390 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 04:57:31,593-Speed 5990.13 samples/sec Loss 12.0929 LearningRate 0.3050 Epoch: 4 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:57:38,454-Speed 5971.38 samples/sec Loss 12.1770 LearningRate 0.3050 Epoch: 4 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:57:45,308-Speed 5977.53 samples/sec Loss 12.2501 LearningRate 0.3049 Epoch: 4 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:57:52,185-Speed 5957.62 samples/sec Loss 12.1672 LearningRate 0.3049 Epoch: 4 Global Step: 44430 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:57:59,051-Speed 5966.88 
samples/sec Loss 12.1057 LearningRate 0.3049 Epoch: 4 Global Step: 44440 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:58:05,895-Speed 5985.56 samples/sec Loss 12.0729 LearningRate 0.3048 Epoch: 4 Global Step: 44450 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:58:12,743-Speed 5982.97 samples/sec Loss 12.1908 LearningRate 0.3048 Epoch: 4 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:58:19,590-Speed 5982.61 samples/sec Loss 12.1686 LearningRate 0.3047 Epoch: 4 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:58:26,437-Speed 5986.40 samples/sec Loss 12.0700 LearningRate 0.3047 Epoch: 4 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:58:33,315-Speed 5955.78 samples/sec Loss 12.1766 LearningRate 0.3047 Epoch: 4 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:58:40,159-Speed 5986.61 samples/sec Loss 12.2113 LearningRate 0.3046 Epoch: 4 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:58:47,022-Speed 5969.04 samples/sec Loss 12.1432 LearningRate 0.3046 Epoch: 4 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:58:53,901-Speed 5956.40 samples/sec Loss 12.1080 LearningRate 0.3046 Epoch: 4 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:59:00,746-Speed 5985.06 samples/sec Loss 12.1292 LearningRate 0.3045 Epoch: 4 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:59:07,620-Speed 5959.29 samples/sec Loss 12.1320 LearningRate 0.3045 Epoch: 4 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:59:14,467-Speed 5983.38 samples/sec Loss 12.1756 LearningRate 0.3044 Epoch: 4 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 04:59:21,334-Speed 5965.29 samples/sec Loss 12.1187 LearningRate 
0.3044 Epoch: 4 Global Step: 44560 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:59:28,191-Speed 5974.99 samples/sec Loss 12.0831 LearningRate 0.3044 Epoch: 4 Global Step: 44570 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:59:35,043-Speed 5979.24 samples/sec Loss 12.0574 LearningRate 0.3043 Epoch: 4 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:59:41,943-Speed 5936.67 samples/sec Loss 12.1378 LearningRate 0.3043 Epoch: 4 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:59:48,863-Speed 5920.20 samples/sec Loss 12.1392 LearningRate 0.3043 Epoch: 4 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 04:59:55,791-Speed 5912.78 samples/sec Loss 12.1854 LearningRate 0.3042 Epoch: 4 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:00:02,687-Speed 5941.37 samples/sec Loss 12.0778 LearningRate 0.3042 Epoch: 4 Global Step: 44620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:00:09,563-Speed 5958.55 samples/sec Loss 12.0837 LearningRate 0.3041 Epoch: 4 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:00:16,447-Speed 5950.44 samples/sec Loss 12.2109 LearningRate 0.3041 Epoch: 4 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:00:23,295-Speed 5982.66 samples/sec Loss 12.2524 LearningRate 0.3041 Epoch: 4 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:00:30,161-Speed 5966.85 samples/sec Loss 12.0454 LearningRate 0.3040 Epoch: 4 Global Step: 44660 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:00:37,010-Speed 5982.07 samples/sec Loss 12.0847 LearningRate 0.3040 Epoch: 4 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:00:43,866-Speed 5975.56 samples/sec Loss 12.0877 LearningRate 0.3040 Epoch: 4 Global Step: 
44680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:00:50,724-Speed 5973.65 samples/sec Loss 12.0698 LearningRate 0.3039 Epoch: 4 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:00:57,576-Speed 5980.96 samples/sec Loss 12.0686 LearningRate 0.3039 Epoch: 4 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:04,430-Speed 5977.52 samples/sec Loss 12.1537 LearningRate 0.3038 Epoch: 4 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:11,288-Speed 5972.43 samples/sec Loss 12.1676 LearningRate 0.3038 Epoch: 4 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:18,276-Speed 5862.93 samples/sec Loss 12.1503 LearningRate 0.3038 Epoch: 4 Global Step: 44730 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:25,208-Speed 5910.01 samples/sec Loss 12.0866 LearningRate 0.3037 Epoch: 4 Global Step: 44740 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:32,101-Speed 5944.19 samples/sec Loss 12.0694 LearningRate 0.3037 Epoch: 4 Global Step: 44750 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:38,952-Speed 5980.03 samples/sec Loss 12.1022 LearningRate 0.3037 Epoch: 4 Global Step: 44760 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:45,800-Speed 5982.61 samples/sec Loss 12.1619 LearningRate 0.3036 Epoch: 4 Global Step: 44770 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:52,666-Speed 5967.87 samples/sec Loss 12.1424 LearningRate 0.3036 Epoch: 4 Global Step: 44780 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:01:59,519-Speed 5978.74 samples/sec Loss 12.0799 LearningRate 0.3035 Epoch: 4 Global Step: 44790 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:02:06,393-Speed 5959.73 samples/sec Loss 12.0972 LearningRate 0.3035 Epoch: 4 Global Step: 44800 Fp16 Grad Scale: 131072 
Required: 32 hours Training: 2022-01-08 05:02:13,259-Speed 5966.77 samples/sec Loss 12.1115 LearningRate 0.3035 Epoch: 4 Global Step: 44810 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:02:20,115-Speed 5975.65 samples/sec Loss 12.0619 LearningRate 0.3034 Epoch: 4 Global Step: 44820 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:02:26,993-Speed 5956.10 samples/sec Loss 12.0588 LearningRate 0.3034 Epoch: 4 Global Step: 44830 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:02:33,856-Speed 5969.72 samples/sec Loss 12.0532 LearningRate 0.3034 Epoch: 4 Global Step: 44840 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:02:40,703-Speed 5982.76 samples/sec Loss 12.0458 LearningRate 0.3033 Epoch: 4 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:02:47,570-Speed 5965.52 samples/sec Loss 12.0171 LearningRate 0.3033 Epoch: 4 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:02:54,420-Speed 5980.01 samples/sec Loss 12.0469 LearningRate 0.3033 Epoch: 4 Global Step: 44870 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:03:01,290-Speed 5963.49 samples/sec Loss 12.0426 LearningRate 0.3032 Epoch: 4 Global Step: 44880 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:03:08,139-Speed 5981.34 samples/sec Loss 12.0940 LearningRate 0.3032 Epoch: 4 Global Step: 44890 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:03:15,000-Speed 5972.01 samples/sec Loss 12.1427 LearningRate 0.3031 Epoch: 4 Global Step: 44900 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:03:21,858-Speed 5972.80 samples/sec Loss 12.0669 LearningRate 0.3031 Epoch: 4 Global Step: 44910 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:03:28,843-Speed 5865.86 samples/sec Loss 12.1135 LearningRate 0.3031 Epoch: 4 Global Step: 44920 Fp16 Grad Scale: 262144 Required: 32 hours Training: 
2022-01-08 05:03:35,712-Speed 5964.86 samples/sec Loss 12.0717 LearningRate 0.3030 Epoch: 4 Global Step: 44930 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:03:42,566-Speed 5977.24 samples/sec Loss 12.0778 LearningRate 0.3030 Epoch: 4 Global Step: 44940 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:03:49,539-Speed 5875.60 samples/sec Loss 12.0979 LearningRate 0.3030 Epoch: 4 Global Step: 44950 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:03:56,387-Speed 5982.22 samples/sec Loss 12.0961 LearningRate 0.3029 Epoch: 4 Global Step: 44960 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:04:03,257-Speed 5963.15 samples/sec Loss 12.0631 LearningRate 0.3029 Epoch: 4 Global Step: 44970 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:04:10,119-Speed 5973.76 samples/sec Loss 12.0895 LearningRate 0.3028 Epoch: 4 Global Step: 44980 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:04:17,004-Speed 5950.51 samples/sec Loss 12.1194 LearningRate 0.3028 Epoch: 4 Global Step: 44990 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:04:23,891-Speed 5948.90 samples/sec Loss 12.0505 LearningRate 0.3028 Epoch: 4 Global Step: 45000 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:04:50,929-[lfw][45000]XNorm: 22.296307 Training: 2022-01-08 05:04:50,930-[lfw][45000]Accuracy-Flip: 0.99700+-0.00245 Training: 2022-01-08 05:04:50,930-[lfw][45000]Accuracy-Highest: 0.99700 Training: 2022-01-08 05:05:22,252-[cfp_fp][45000]XNorm: 19.527793 Training: 2022-01-08 05:05:22,253-[cfp_fp][45000]Accuracy-Flip: 0.97043+-0.00897 Training: 2022-01-08 05:05:22,254-[cfp_fp][45000]Accuracy-Highest: 0.97057 Training: 2022-01-08 05:05:49,095-[agedb_30][45000]XNorm: 21.845312 Training: 2022-01-08 05:05:49,096-[agedb_30][45000]Accuracy-Flip: 0.96100+-0.00892 Training: 2022-01-08 05:05:49,097-[agedb_30][45000]Accuracy-Highest: 0.96200 Training: 2022-01-08 
05:05:55,946-Speed 444.96 samples/sec Loss 12.0866 LearningRate 0.3027 Epoch: 4 Global Step: 45010 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:06:02,792-Speed 5984.97 samples/sec Loss 12.1171 LearningRate 0.3027 Epoch: 4 Global Step: 45020 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:06:09,625-Speed 5995.57 samples/sec Loss 12.1182 LearningRate 0.3027 Epoch: 4 Global Step: 45030 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:06:16,493-Speed 5965.27 samples/sec Loss 12.1106 LearningRate 0.3026 Epoch: 4 Global Step: 45040 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:06:23,434-Speed 5901.98 samples/sec Loss 12.1221 LearningRate 0.3026 Epoch: 4 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:06:30,295-Speed 5972.55 samples/sec Loss 12.0302 LearningRate 0.3025 Epoch: 4 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:06:37,185-Speed 5945.73 samples/sec Loss 12.1428 LearningRate 0.3025 Epoch: 4 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:06:44,044-Speed 5973.54 samples/sec Loss 12.1223 LearningRate 0.3025 Epoch: 4 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:06:50,905-Speed 5971.20 samples/sec Loss 12.0435 LearningRate 0.3024 Epoch: 4 Global Step: 45090 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:06:57,763-Speed 5972.97 samples/sec Loss 11.9932 LearningRate 0.3024 Epoch: 4 Global Step: 45100 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:07:04,613-Speed 5980.96 samples/sec Loss 12.1019 LearningRate 0.3024 Epoch: 4 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:07:11,483-Speed 5965.68 samples/sec Loss 12.1385 LearningRate 0.3023 Epoch: 4 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:07:18,349-Speed 5966.20 
samples/sec Loss 12.1165 LearningRate 0.3023 Epoch: 4 Global Step: 45130 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:07:25,203-Speed 5977.90 samples/sec Loss 12.0534 LearningRate 0.3022 Epoch: 4 Global Step: 45140 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:07:32,057-Speed 5977.35 samples/sec Loss 12.0771 LearningRate 0.3022 Epoch: 4 Global Step: 45150 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:07:38,910-Speed 5978.14 samples/sec Loss 12.1388 LearningRate 0.3022 Epoch: 4 Global Step: 45160 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:07:45,769-Speed 5972.64 samples/sec Loss 12.0402 LearningRate 0.3021 Epoch: 4 Global Step: 45170 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:07:52,619-Speed 5981.07 samples/sec Loss 11.9956 LearningRate 0.3021 Epoch: 4 Global Step: 45180 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:07:59,499-Speed 5954.57 samples/sec Loss 12.1222 LearningRate 0.3021 Epoch: 4 Global Step: 45190 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:08:06,360-Speed 5970.96 samples/sec Loss 11.9941 LearningRate 0.3020 Epoch: 4 Global Step: 45200 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:08:13,198-Speed 5990.73 samples/sec Loss 12.0690 LearningRate 0.3020 Epoch: 4 Global Step: 45210 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:08:20,036-Speed 5991.28 samples/sec Loss 12.0149 LearningRate 0.3019 Epoch: 4 Global Step: 45220 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:08:26,888-Speed 5978.31 samples/sec Loss 12.0716 LearningRate 0.3019 Epoch: 4 Global Step: 45230 Fp16 Grad Scale: 524288 Required: 32 hours Training: 2022-01-08 05:08:33,715-Speed 6001.27 samples/sec Loss 12.0650 LearningRate 0.3019 Epoch: 4 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:08:40,556-Speed 5988.42 samples/sec Loss 11.9405 
LearningRate 0.3018 Epoch: 4 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:08:47,418-Speed 5970.06 samples/sec Loss 12.0555 LearningRate 0.3018 Epoch: 4 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:08:54,269-Speed 5980.06 samples/sec Loss 12.0748 LearningRate 0.3018 Epoch: 4 Global Step: 45270 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:09:01,121-Speed 5979.06 samples/sec Loss 12.1156 LearningRate 0.3017 Epoch: 4 Global Step: 45280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:09:07,991-Speed 5963.92 samples/sec Loss 12.0931 LearningRate 0.3017 Epoch: 4 Global Step: 45290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:09:14,841-Speed 5980.07 samples/sec Loss 11.9772 LearningRate 0.3016 Epoch: 4 Global Step: 45300 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:09:21,687-Speed 5984.27 samples/sec Loss 12.0955 LearningRate 0.3016 Epoch: 4 Global Step: 45310 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:09:28,556-Speed 5964.37 samples/sec Loss 12.0381 LearningRate 0.3016 Epoch: 4 Global Step: 45320 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:09:35,405-Speed 5983.06 samples/sec Loss 12.0070 LearningRate 0.3015 Epoch: 4 Global Step: 45330 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:09:42,260-Speed 5975.76 samples/sec Loss 12.0732 LearningRate 0.3015 Epoch: 4 Global Step: 45340 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:09:49,106-Speed 5984.47 samples/sec Loss 12.0269 LearningRate 0.3015 Epoch: 4 Global Step: 45350 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:09:55,954-Speed 5982.64 samples/sec Loss 12.0868 LearningRate 0.3014 Epoch: 4 Global Step: 45360 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:02,799-Speed 5984.73 samples/sec Loss 11.9722 LearningRate 0.3014 Epoch: 4 
Global Step: 45370 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:09,652-Speed 5980.58 samples/sec Loss 12.1008 LearningRate 0.3014 Epoch: 4 Global Step: 45380 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:16,508-Speed 5975.52 samples/sec Loss 12.0024 LearningRate 0.3013 Epoch: 4 Global Step: 45390 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:23,362-Speed 5976.79 samples/sec Loss 12.0717 LearningRate 0.3013 Epoch: 4 Global Step: 45400 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:30,206-Speed 5986.36 samples/sec Loss 12.0679 LearningRate 0.3012 Epoch: 4 Global Step: 45410 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:37,071-Speed 5969.40 samples/sec Loss 12.0565 LearningRate 0.3012 Epoch: 4 Global Step: 45420 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:43,917-Speed 5984.07 samples/sec Loss 12.0181 LearningRate 0.3012 Epoch: 4 Global Step: 45430 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:50,765-Speed 5983.20 samples/sec Loss 12.0195 LearningRate 0.3011 Epoch: 4 Global Step: 45440 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:10:57,602-Speed 5991.77 samples/sec Loss 12.0555 LearningRate 0.3011 Epoch: 4 Global Step: 45450 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:04,464-Speed 5969.95 samples/sec Loss 12.0896 LearningRate 0.3011 Epoch: 4 Global Step: 45460 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:11,332-Speed 5965.03 samples/sec Loss 12.0051 LearningRate 0.3010 Epoch: 4 Global Step: 45470 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:18,197-Speed 5967.70 samples/sec Loss 12.1215 LearningRate 0.3010 Epoch: 4 Global Step: 45480 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:25,054-Speed 5974.15 samples/sec Loss 12.1314 LearningRate 0.3009 Epoch: 4 Global Step: 45490 Fp16 Grad 
Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:31,920-Speed 5966.38 samples/sec Loss 12.0320 LearningRate 0.3009 Epoch: 4 Global Step: 45500 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:38,782-Speed 5972.29 samples/sec Loss 12.0021 LearningRate 0.3009 Epoch: 4 Global Step: 45510 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:45,641-Speed 5973.03 samples/sec Loss 12.0751 LearningRate 0.3008 Epoch: 4 Global Step: 45520 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:52,491-Speed 5980.64 samples/sec Loss 12.1489 LearningRate 0.3008 Epoch: 4 Global Step: 45530 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:11:59,333-Speed 5987.92 samples/sec Loss 11.9336 LearningRate 0.3008 Epoch: 4 Global Step: 45540 Fp16 Grad Scale: 524288 Required: 32 hours Training: 2022-01-08 05:12:06,240-Speed 5931.41 samples/sec Loss 12.0507 LearningRate 0.3007 Epoch: 4 Global Step: 45550 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:12:13,089-Speed 5981.97 samples/sec Loss 12.1502 LearningRate 0.3007 Epoch: 4 Global Step: 45560 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:12:19,931-Speed 5987.79 samples/sec Loss 12.0394 LearningRate 0.3006 Epoch: 4 Global Step: 45570 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:12:26,796-Speed 5967.39 samples/sec Loss 12.1092 LearningRate 0.3006 Epoch: 4 Global Step: 45580 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:12:33,683-Speed 5948.52 samples/sec Loss 12.0568 LearningRate 0.3006 Epoch: 4 Global Step: 45590 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:12:40,527-Speed 5985.88 samples/sec Loss 11.9863 LearningRate 0.3005 Epoch: 4 Global Step: 45600 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:12:47,377-Speed 5980.75 samples/sec Loss 12.0718 LearningRate 0.3005 Epoch: 4 Global Step: 45610 Fp16 Grad Scale: 262144 Required: 32 
hours Training: 2022-01-08 05:12:54,219-Speed 5987.61 samples/sec Loss 12.0343 LearningRate 0.3005 Epoch: 4 Global Step: 45620 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:13:01,068-Speed 5981.46 samples/sec Loss 12.1668 LearningRate 0.3004 Epoch: 4 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:13:07,953-Speed 5950.38 samples/sec Loss 12.0488 LearningRate 0.3004 Epoch: 4 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:13:14,809-Speed 5976.09 samples/sec Loss 11.9714 LearningRate 0.3003 Epoch: 4 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:13:21,661-Speed 5978.43 samples/sec Loss 12.0063 LearningRate 0.3003 Epoch: 4 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:13:28,521-Speed 5971.82 samples/sec Loss 11.9407 LearningRate 0.3003 Epoch: 4 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:13:35,387-Speed 5967.18 samples/sec Loss 12.0118 LearningRate 0.3002 Epoch: 4 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:13:42,243-Speed 5975.44 samples/sec Loss 11.9279 LearningRate 0.3002 Epoch: 4 Global Step: 45690 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:13:49,090-Speed 5982.97 samples/sec Loss 12.0269 LearningRate 0.3002 Epoch: 4 Global Step: 45700 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:13:55,957-Speed 5968.45 samples/sec Loss 12.0304 LearningRate 0.3001 Epoch: 4 Global Step: 45710 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:14:02,817-Speed 5972.66 samples/sec Loss 11.9342 LearningRate 0.3001 Epoch: 4 Global Step: 45720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:14:09,678-Speed 5970.63 samples/sec Loss 12.1035 LearningRate 0.3000 Epoch: 4 Global Step: 45730 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 
05:14:16,540-Speed 5970.56 samples/sec Loss 12.0061 LearningRate 0.3000 Epoch: 4 Global Step: 45740 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:14:23,389-Speed 5981.08 samples/sec Loss 12.0162 LearningRate 0.3000 Epoch: 4 Global Step: 45750 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:14:30,250-Speed 5974.59 samples/sec Loss 11.9622 LearningRate 0.2999 Epoch: 4 Global Step: 45760 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:14:37,111-Speed 5970.39 samples/sec Loss 12.0696 LearningRate 0.2999 Epoch: 4 Global Step: 45770 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:14:43,988-Speed 5957.53 samples/sec Loss 12.0618 LearningRate 0.2999 Epoch: 4 Global Step: 45780 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:14:50,842-Speed 5977.73 samples/sec Loss 12.0100 LearningRate 0.2998 Epoch: 4 Global Step: 45790 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:14:57,720-Speed 5956.81 samples/sec Loss 12.0511 LearningRate 0.2998 Epoch: 4 Global Step: 45800 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:15:04,596-Speed 5957.42 samples/sec Loss 12.0665 LearningRate 0.2998 Epoch: 4 Global Step: 45810 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:15:11,470-Speed 5960.16 samples/sec Loss 11.9410 LearningRate 0.2997 Epoch: 4 Global Step: 45820 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:15:18,334-Speed 5968.72 samples/sec Loss 12.0108 LearningRate 0.2997 Epoch: 4 Global Step: 45830 Fp16 Grad Scale: 524288 Required: 32 hours Training: 2022-01-08 05:15:25,199-Speed 5967.15 samples/sec Loss 11.9657 LearningRate 0.2996 Epoch: 4 Global Step: 45840 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:15:32,076-Speed 5957.88 samples/sec Loss 11.9340 LearningRate 0.2996 Epoch: 4 Global Step: 45850 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:15:38,952-Speed 5958.34 
samples/sec Loss 11.9951 LearningRate 0.2996 Epoch: 4 Global Step: 45860 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:15:45,849-Speed 5942.66 samples/sec Loss 11.9995 LearningRate 0.2995 Epoch: 4 Global Step: 45870 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:15:52,737-Speed 5948.95 samples/sec Loss 11.9779 LearningRate 0.2995 Epoch: 4 Global Step: 45880 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:15:59,700-Speed 5883.72 samples/sec Loss 11.9359 LearningRate 0.2995 Epoch: 4 Global Step: 45890 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:16:06,632-Speed 5910.54 samples/sec Loss 12.0799 LearningRate 0.2994 Epoch: 4 Global Step: 45900 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:16:13,499-Speed 5967.22 samples/sec Loss 12.0188 LearningRate 0.2994 Epoch: 4 Global Step: 45910 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:16:20,382-Speed 5953.00 samples/sec Loss 12.0129 LearningRate 0.2993 Epoch: 4 Global Step: 45920 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:16:27,266-Speed 5950.67 samples/sec Loss 11.9647 LearningRate 0.2993 Epoch: 4 Global Step: 45930 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:16:34,105-Speed 5992.93 samples/sec Loss 12.0560 LearningRate 0.2993 Epoch: 4 Global Step: 45940 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:16:40,969-Speed 5967.71 samples/sec Loss 11.9973 LearningRate 0.2992 Epoch: 4 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:16:47,823-Speed 5977.23 samples/sec Loss 11.9549 LearningRate 0.2992 Epoch: 4 Global Step: 45960 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:16:54,693-Speed 5963.33 samples/sec Loss 11.9974 LearningRate 0.2992 Epoch: 4 Global Step: 45970 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:17:01,550-Speed 5973.85 samples/sec Loss 12.0019 
LearningRate 0.2991 Epoch: 4 Global Step: 45980 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:17:08,416-Speed 5966.37 samples/sec Loss 11.9536 LearningRate 0.2991 Epoch: 4 Global Step: 45990 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:17:15,264-Speed 5982.98 samples/sec Loss 11.8825 LearningRate 0.2990 Epoch: 4 Global Step: 46000 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:17:22,113-Speed 5981.86 samples/sec Loss 12.0649 LearningRate 0.2990 Epoch: 4 Global Step: 46010 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:17:28,961-Speed 5981.67 samples/sec Loss 12.1270 LearningRate 0.2990 Epoch: 4 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:17:35,821-Speed 5971.55 samples/sec Loss 12.0349 LearningRate 0.2989 Epoch: 4 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:17:42,678-Speed 5974.82 samples/sec Loss 12.0451 LearningRate 0.2989 Epoch: 4 Global Step: 46040 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:17:49,533-Speed 5976.58 samples/sec Loss 11.9268 LearningRate 0.2989 Epoch: 4 Global Step: 46050 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:17:56,385-Speed 5979.04 samples/sec Loss 11.9679 LearningRate 0.2988 Epoch: 4 Global Step: 46060 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:03,244-Speed 5972.76 samples/sec Loss 11.9854 LearningRate 0.2988 Epoch: 4 Global Step: 46070 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:10,109-Speed 5966.87 samples/sec Loss 11.9992 LearningRate 0.2988 Epoch: 4 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:16,954-Speed 5985.07 samples/sec Loss 11.9995 LearningRate 0.2987 Epoch: 4 Global Step: 46090 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:23,818-Speed 5967.90 samples/sec Loss 11.9737 LearningRate 0.2987 Epoch: 4 
Global Step: 46100 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:30,689-Speed 5963.01 samples/sec Loss 11.9810 LearningRate 0.2986 Epoch: 4 Global Step: 46110 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:37,544-Speed 5975.71 samples/sec Loss 12.0030 LearningRate 0.2986 Epoch: 4 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:44,442-Speed 5939.22 samples/sec Loss 11.9872 LearningRate 0.2986 Epoch: 4 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:51,334-Speed 5945.16 samples/sec Loss 12.0754 LearningRate 0.2985 Epoch: 4 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:18:58,188-Speed 5977.00 samples/sec Loss 11.9799 LearningRate 0.2985 Epoch: 4 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:19:05,043-Speed 5976.74 samples/sec Loss 12.0722 LearningRate 0.2985 Epoch: 4 Global Step: 46160 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:19:11,939-Speed 5946.64 samples/sec Loss 12.0808 LearningRate 0.2984 Epoch: 4 Global Step: 46170 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:19:18,810-Speed 5962.71 samples/sec Loss 11.9924 LearningRate 0.2984 Epoch: 4 Global Step: 46180 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:19:25,684-Speed 5959.55 samples/sec Loss 12.0017 LearningRate 0.2983 Epoch: 4 Global Step: 46190 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:19:32,540-Speed 5975.00 samples/sec Loss 12.0851 LearningRate 0.2983 Epoch: 4 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:19:39,420-Speed 5954.69 samples/sec Loss 11.9812 LearningRate 0.2983 Epoch: 4 Global Step: 46210 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:19:46,274-Speed 5976.89 samples/sec Loss 12.0410 LearningRate 0.2982 Epoch: 4 Global Step: 46220 Fp16 Grad 
Scale: 131072 Required: 32 hours Training: 2022-01-08 05:19:53,146-Speed 5962.09 samples/sec Loss 12.0323 LearningRate 0.2982 Epoch: 4 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:20:00,054-Speed 5930.79 samples/sec Loss 11.9513 LearningRate 0.2982 Epoch: 4 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:20:06,919-Speed 5967.25 samples/sec Loss 11.9814 LearningRate 0.2981 Epoch: 4 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:20:13,781-Speed 5970.15 samples/sec Loss 12.0927 LearningRate 0.2981 Epoch: 4 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:20:20,642-Speed 5971.10 samples/sec Loss 11.9882 LearningRate 0.2980 Epoch: 4 Global Step: 46270 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:20:27,524-Speed 5953.08 samples/sec Loss 11.9390 LearningRate 0.2980 Epoch: 4 Global Step: 46280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:20:34,377-Speed 5978.09 samples/sec Loss 11.9715 LearningRate 0.2980 Epoch: 4 Global Step: 46290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:20:41,251-Speed 5960.24 samples/sec Loss 11.9281 LearningRate 0.2979 Epoch: 4 Global Step: 46300 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:20:48,109-Speed 5973.30 samples/sec Loss 11.9799 LearningRate 0.2979 Epoch: 4 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:20:55,190-Speed 5785.98 samples/sec Loss 11.9659 LearningRate 0.2979 Epoch: 4 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:21:02,105-Speed 5924.22 samples/sec Loss 12.0047 LearningRate 0.2978 Epoch: 4 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:21:08,956-Speed 5980.01 samples/sec Loss 11.9718 LearningRate 0.2978 Epoch: 4 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 32 hours 
Training: 2022-01-08 05:21:15,831-Speed 5958.70 samples/sec Loss 11.9137 LearningRate 0.2978 Epoch: 4 Global Step: 46350 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:21:22,713-Speed 5954.31 samples/sec Loss 11.9921 LearningRate 0.2977 Epoch: 4 Global Step: 46360 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:21:29,570-Speed 5974.07 samples/sec Loss 12.0369 LearningRate 0.2977 Epoch: 4 Global Step: 46370 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:21:36,433-Speed 5969.13 samples/sec Loss 12.1142 LearningRate 0.2976 Epoch: 4 Global Step: 46380 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:21:43,281-Speed 5982.37 samples/sec Loss 11.9264 LearningRate 0.2976 Epoch: 4 Global Step: 46390 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:21:50,132-Speed 5980.03 samples/sec Loss 12.0131 LearningRate 0.2976 Epoch: 4 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 05:21:56,981-Speed 5981.67 samples/sec Loss 11.9747 LearningRate 0.2975 Epoch: 4 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:22:03,835-Speed 5977.24 samples/sec Loss 11.8980 LearningRate 0.2975 Epoch: 4 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:22:10,708-Speed 5962.55 samples/sec Loss 11.9685 LearningRate 0.2975 Epoch: 4 Global Step: 46430 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:22:17,565-Speed 5976.66 samples/sec Loss 11.9601 LearningRate 0.2974 Epoch: 4 Global Step: 46440 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:22:24,414-Speed 5981.61 samples/sec Loss 11.8974 LearningRate 0.2974 Epoch: 4 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:22:31,276-Speed 5970.63 samples/sec Loss 11.9089 LearningRate 0.2973 Epoch: 4 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 
05:22:38,143-Speed 5965.94 samples/sec Loss 11.9713 LearningRate 0.2973 Epoch: 4 Global Step: 46470 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:22:45,114-Speed 5877.36 samples/sec Loss 11.9370 LearningRate 0.2973 Epoch: 4 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:22:51,958-Speed 5985.74 samples/sec Loss 12.0115 LearningRate 0.2972 Epoch: 4 Global Step: 46490 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:22:58,842-Speed 5952.10 samples/sec Loss 11.9533 LearningRate 0.2972 Epoch: 4 Global Step: 46500 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:23:05,703-Speed 5971.74 samples/sec Loss 11.9516 LearningRate 0.2972 Epoch: 4 Global Step: 46510 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:23:12,552-Speed 5982.15 samples/sec Loss 11.9117 LearningRate 0.2971 Epoch: 4 Global Step: 46520 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:23:19,418-Speed 5966.41 samples/sec Loss 11.9951 LearningRate 0.2971 Epoch: 4 Global Step: 46530 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:23:26,304-Speed 5949.71 samples/sec Loss 11.9672 LearningRate 0.2970 Epoch: 4 Global Step: 46540 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:23:33,185-Speed 5953.56 samples/sec Loss 11.9318 LearningRate 0.2970 Epoch: 4 Global Step: 46550 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:23:40,079-Speed 5942.52 samples/sec Loss 11.9982 LearningRate 0.2970 Epoch: 4 Global Step: 46560 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:23:46,949-Speed 5963.42 samples/sec Loss 11.9530 LearningRate 0.2969 Epoch: 4 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:23:53,826-Speed 5957.28 samples/sec Loss 11.9830 LearningRate 0.2969 Epoch: 4 Global Step: 46580 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:24:00,691-Speed 5968.35 
samples/sec Loss 11.9012 LearningRate 0.2969 Epoch: 4 Global Step: 46590 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:24:07,560-Speed 5965.58 samples/sec Loss 11.9745 LearningRate 0.2968 Epoch: 4 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:24:14,418-Speed 5974.05 samples/sec Loss 12.0257 LearningRate 0.2968 Epoch: 4 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:24:21,267-Speed 5981.24 samples/sec Loss 11.9837 LearningRate 0.2968 Epoch: 4 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:24:28,115-Speed 5982.68 samples/sec Loss 12.0012 LearningRate 0.2967 Epoch: 4 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:24:34,968-Speed 5977.66 samples/sec Loss 12.0141 LearningRate 0.2967 Epoch: 4 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 05:24:41,824-Speed 5974.70 samples/sec Loss 11.9226 LearningRate 0.2966 Epoch: 4 Global Step: 46650 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:24:48,688-Speed 5969.35 samples/sec Loss 12.0312 LearningRate 0.2966 Epoch: 4 Global Step: 46660 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 05:24:55,561-Speed 5960.55 samples/sec Loss 11.9086 LearningRate 0.2966 Epoch: 4 Global Step: 46670 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:25:02,420-Speed 5973.63 samples/sec Loss 11.9614 LearningRate 0.2965 Epoch: 4 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:25:09,284-Speed 5967.80 samples/sec Loss 11.9398 LearningRate 0.2965 Epoch: 4 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:25:16,139-Speed 5976.34 samples/sec Loss 11.9151 LearningRate 0.2965 Epoch: 4 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:25:22,998-Speed 5972.52 samples/sec Loss 11.9776 
LearningRate 0.2964 Epoch: 4 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:25:29,863-Speed 5968.38 samples/sec Loss 11.9435 LearningRate 0.2964 Epoch: 4 Global Step: 46720 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:25:36,722-Speed 5973.28 samples/sec Loss 11.9173 LearningRate 0.2963 Epoch: 4 Global Step: 46730 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:25:43,640-Speed 5921.06 samples/sec Loss 11.9429 LearningRate 0.2963 Epoch: 4 Global Step: 46740 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:25:50,517-Speed 5957.54 samples/sec Loss 11.8924 LearningRate 0.2963 Epoch: 4 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:25:57,384-Speed 5966.31 samples/sec Loss 11.9745 LearningRate 0.2962 Epoch: 4 Global Step: 46760 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:26:04,228-Speed 5985.75 samples/sec Loss 11.9283 LearningRate 0.2962 Epoch: 4 Global Step: 46770 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:26:11,143-Speed 5923.64 samples/sec Loss 11.9558 LearningRate 0.2962 Epoch: 4 Global Step: 46780 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:26:18,067-Speed 5917.21 samples/sec Loss 11.9633 LearningRate 0.2961 Epoch: 4 Global Step: 46790 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:26:24,916-Speed 5981.30 samples/sec Loss 11.9223 LearningRate 0.2961 Epoch: 4 Global Step: 46800 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:26:31,767-Speed 5980.62 samples/sec Loss 11.9406 LearningRate 0.2961 Epoch: 4 Global Step: 46810 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:26:38,651-Speed 5950.84 samples/sec Loss 11.8801 LearningRate 0.2960 Epoch: 4 Global Step: 46820 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:26:45,505-Speed 5977.06 samples/sec Loss 11.9106 LearningRate 0.2960 Epoch: 4 
Global Step: 46830 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:26:52,378-Speed 5960.67 samples/sec Loss 11.9697 LearningRate 0.2959 Epoch: 4 Global Step: 46840 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:26:59,254-Speed 5958.36 samples/sec Loss 11.9182 LearningRate 0.2959 Epoch: 4 Global Step: 46850 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:27:06,121-Speed 5968.43 samples/sec Loss 11.9224 LearningRate 0.2959 Epoch: 4 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:27:12,972-Speed 5979.66 samples/sec Loss 11.9170 LearningRate 0.2958 Epoch: 4 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:27:19,817-Speed 5984.88 samples/sec Loss 11.9815 LearningRate 0.2958 Epoch: 4 Global Step: 46880 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:27:26,685-Speed 5966.57 samples/sec Loss 11.9322 LearningRate 0.2958 Epoch: 4 Global Step: 46890 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:27:33,540-Speed 5977.08 samples/sec Loss 11.9091 LearningRate 0.2957 Epoch: 4 Global Step: 46900 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:27:40,426-Speed 5949.01 samples/sec Loss 11.9728 LearningRate 0.2957 Epoch: 4 Global Step: 46910 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:27:47,294-Speed 5965.43 samples/sec Loss 11.9686 LearningRate 0.2956 Epoch: 4 Global Step: 46920 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:27:54,141-Speed 5982.39 samples/sec Loss 11.9698 LearningRate 0.2956 Epoch: 4 Global Step: 46930 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:28:00,992-Speed 5980.70 samples/sec Loss 11.9346 LearningRate 0.2956 Epoch: 4 Global Step: 46940 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:28:07,868-Speed 5957.93 samples/sec Loss 11.9415 LearningRate 0.2955 Epoch: 4 Global Step: 46950 Fp16 Grad 
Scale: 131072 Required: 31 hours Training: 2022-01-08 05:28:14,739-Speed 5962.96 samples/sec Loss 11.8808 LearningRate 0.2955 Epoch: 4 Global Step: 46960 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:28:21,609-Speed 5963.36 samples/sec Loss 11.9148 LearningRate 0.2955 Epoch: 4 Global Step: 46970 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:28:28,471-Speed 5970.28 samples/sec Loss 11.9626 LearningRate 0.2954 Epoch: 4 Global Step: 46980 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:28:35,332-Speed 5970.84 samples/sec Loss 12.0095 LearningRate 0.2954 Epoch: 4 Global Step: 46990 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:28:42,204-Speed 5962.12 samples/sec Loss 12.0001 LearningRate 0.2954 Epoch: 4 Global Step: 47000 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:28:49,078-Speed 5961.32 samples/sec Loss 11.9776 LearningRate 0.2953 Epoch: 4 Global Step: 47010 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:28:55,928-Speed 5981.04 samples/sec Loss 11.9913 LearningRate 0.2953 Epoch: 4 Global Step: 47020 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:29:02,795-Speed 5965.96 samples/sec Loss 11.9932 LearningRate 0.2952 Epoch: 4 Global Step: 47030 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:29:09,653-Speed 5973.64 samples/sec Loss 11.8534 LearningRate 0.2952 Epoch: 4 Global Step: 47040 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:29:16,641-Speed 5863.11 samples/sec Loss 11.9442 LearningRate 0.2952 Epoch: 4 Global Step: 47050 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:29:23,492-Speed 5979.00 samples/sec Loss 11.9744 LearningRate 0.2951 Epoch: 4 Global Step: 47060 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:29:30,322-Speed 5998.38 samples/sec Loss 12.0028 LearningRate 0.2951 Epoch: 4 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 31 hours 
Training: 2022-01-08 05:29:37,172-Speed 5980.71 samples/sec Loss 11.9676 LearningRate 0.2951 Epoch: 4 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:29:44,033-Speed 5971.29 samples/sec Loss 11.8781 LearningRate 0.2950 Epoch: 4 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:29:50,884-Speed 5979.91 samples/sec Loss 11.9775 LearningRate 0.2950 Epoch: 4 Global Step: 47100 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:29:57,767-Speed 5952.23 samples/sec Loss 11.9545 LearningRate 0.2949 Epoch: 4 Global Step: 47110 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:30:04,650-Speed 5952.36 samples/sec Loss 11.9224 LearningRate 0.2949 Epoch: 4 Global Step: 47120 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:30:11,491-Speed 5988.11 samples/sec Loss 11.9480 LearningRate 0.2949 Epoch: 4 Global Step: 47130 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:30:18,347-Speed 5975.80 samples/sec Loss 11.9867 LearningRate 0.2948 Epoch: 4 Global Step: 47140 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:30:25,205-Speed 5973.21 samples/sec Loss 11.9892 LearningRate 0.2948 Epoch: 4 Global Step: 47150 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:30:32,067-Speed 5971.58 samples/sec Loss 11.9234 LearningRate 0.2948 Epoch: 4 Global Step: 47160 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:30:38,925-Speed 5973.78 samples/sec Loss 11.9733 LearningRate 0.2947 Epoch: 4 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:30:45,784-Speed 5973.22 samples/sec Loss 11.9874 LearningRate 0.2947 Epoch: 4 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:30:52,647-Speed 5969.62 samples/sec Loss 11.8801 LearningRate 0.2947 Epoch: 4 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:30:59,516-Speed 
5964.46 samples/sec Loss 11.9486 LearningRate 0.2946 Epoch: 4 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:31:06,366-Speed 5980.65 samples/sec Loss 11.9473 LearningRate 0.2946 Epoch: 4 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:31:13,208-Speed 5987.72 samples/sec Loss 11.8790 LearningRate 0.2945 Epoch: 4 Global Step: 47220 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:31:20,059-Speed 5980.04 samples/sec Loss 11.9180 LearningRate 0.2945 Epoch: 4 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:31:26,908-Speed 5981.87 samples/sec Loss 11.8767 LearningRate 0.2945 Epoch: 4 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:31:33,768-Speed 5971.98 samples/sec Loss 11.8571 LearningRate 0.2944 Epoch: 4 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:31:40,618-Speed 5980.42 samples/sec Loss 11.9489 LearningRate 0.2944 Epoch: 4 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:31:47,468-Speed 5980.49 samples/sec Loss 11.8926 LearningRate 0.2944 Epoch: 4 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:31:54,318-Speed 5981.13 samples/sec Loss 11.9810 LearningRate 0.2943 Epoch: 4 Global Step: 47280 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:32:01,178-Speed 5971.62 samples/sec Loss 11.9262 LearningRate 0.2943 Epoch: 4 Global Step: 47290 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:32:08,045-Speed 5966.66 samples/sec Loss 11.8172 LearningRate 0.2942 Epoch: 4 Global Step: 47300 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:32:14,896-Speed 5980.78 samples/sec Loss 11.8549 LearningRate 0.2942 Epoch: 4 Global Step: 47310 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:32:21,745-Speed 5982.19 samples/sec Loss 11.8458 
LearningRate 0.2942 Epoch: 4 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:32:28,606-Speed 5970.98 samples/sec Loss 11.8190 LearningRate 0.2941 Epoch: 4 Global Step: 47330 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:32:35,487-Speed 5953.81 samples/sec Loss 12.0037 LearningRate 0.2941 Epoch: 4 Global Step: 47340 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:32:42,335-Speed 5982.64 samples/sec Loss 11.8924 LearningRate 0.2941 Epoch: 4 Global Step: 47350 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:32:49,198-Speed 5969.04 samples/sec Loss 11.9269 LearningRate 0.2940 Epoch: 4 Global Step: 47360 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:32:56,048-Speed 5982.45 samples/sec Loss 11.8469 LearningRate 0.2940 Epoch: 4 Global Step: 47370 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:33:02,895-Speed 5983.21 samples/sec Loss 11.9346 LearningRate 0.2940 Epoch: 4 Global Step: 47380 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:33:09,754-Speed 5973.56 samples/sec Loss 11.8703 LearningRate 0.2939 Epoch: 4 Global Step: 47390 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:33:16,628-Speed 5959.88 samples/sec Loss 11.9506 LearningRate 0.2939 Epoch: 4 Global Step: 47400 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:33:23,485-Speed 5976.38 samples/sec Loss 11.9232 LearningRate 0.2938 Epoch: 4 Global Step: 47410 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:33:30,363-Speed 5956.61 samples/sec Loss 11.9633 LearningRate 0.2938 Epoch: 4 Global Step: 47420 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:33:37,225-Speed 5970.35 samples/sec Loss 11.9239 LearningRate 0.2938 Epoch: 4 Global Step: 47430 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:33:44,084-Speed 5972.92 samples/sec Loss 11.9150 LearningRate 0.2937 Epoch: 4 
Global Step: 47440 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:33:50,919-Speed 5993.29 samples/sec Loss 11.8275 LearningRate 0.2937 Epoch: 4 Global Step: 47450 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:33:57,770-Speed 5981.62 samples/sec Loss 11.8667 LearningRate 0.2937 Epoch: 4 Global Step: 47460 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:04,629-Speed 5972.87 samples/sec Loss 11.8952 LearningRate 0.2936 Epoch: 4 Global Step: 47470 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:11,479-Speed 5981.41 samples/sec Loss 11.8958 LearningRate 0.2936 Epoch: 4 Global Step: 47480 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:18,326-Speed 5983.34 samples/sec Loss 11.8655 LearningRate 0.2936 Epoch: 4 Global Step: 47490 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:25,203-Speed 5957.08 samples/sec Loss 11.8312 LearningRate 0.2935 Epoch: 4 Global Step: 47500 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:32,068-Speed 5968.04 samples/sec Loss 11.7991 LearningRate 0.2935 Epoch: 4 Global Step: 47510 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:38,918-Speed 5982.02 samples/sec Loss 11.9937 LearningRate 0.2934 Epoch: 4 Global Step: 47520 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:45,784-Speed 5966.63 samples/sec Loss 11.8806 LearningRate 0.2934 Epoch: 4 Global Step: 47530 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:52,645-Speed 5971.22 samples/sec Loss 11.8926 LearningRate 0.2934 Epoch: 4 Global Step: 47540 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:34:59,513-Speed 5965.51 samples/sec Loss 11.8894 LearningRate 0.2933 Epoch: 4 Global Step: 47550 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:35:06,383-Speed 5963.43 samples/sec Loss 11.9087 LearningRate 0.2933 Epoch: 4 Global Step: 47560 Fp16 Grad 
Scale: 262144 Required: 31 hours Training: 2022-01-08 05:35:13,239-Speed 5977.33 samples/sec Loss 11.9149 LearningRate 0.2933 Epoch: 4 Global Step: 47570 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:35:20,119-Speed 5953.86 samples/sec Loss 11.8901 LearningRate 0.2932 Epoch: 4 Global Step: 47580 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:35:26,978-Speed 5973.47 samples/sec Loss 11.8846 LearningRate 0.2932 Epoch: 4 Global Step: 47590 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:35:33,901-Speed 5917.56 samples/sec Loss 11.8413 LearningRate 0.2931 Epoch: 4 Global Step: 47600 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:35:40,751-Speed 5980.52 samples/sec Loss 11.8666 LearningRate 0.2931 Epoch: 4 Global Step: 47610 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:35:47,617-Speed 5966.53 samples/sec Loss 11.9508 LearningRate 0.2931 Epoch: 4 Global Step: 47620 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:35:54,497-Speed 5955.07 samples/sec Loss 11.8117 LearningRate 0.2930 Epoch: 4 Global Step: 47630 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:36:01,358-Speed 5970.79 samples/sec Loss 11.9102 LearningRate 0.2930 Epoch: 4 Global Step: 47640 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:36:08,789-Speed 5513.53 samples/sec Loss 11.9181 LearningRate 0.2930 Epoch: 4 Global Step: 47650 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:36:15,622-Speed 5998.59 samples/sec Loss 11.9303 LearningRate 0.2929 Epoch: 4 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:36:22,479-Speed 5974.72 samples/sec Loss 11.9364 LearningRate 0.2929 Epoch: 4 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:36:29,408-Speed 5912.53 samples/sec Loss 11.8348 LearningRate 0.2929 Epoch: 4 Global Step: 47680 Fp16 Grad Scale: 131072 Required: 31 
hours Training: 2022-01-08 05:36:36,274-Speed 5967.24 samples/sec Loss 11.7914 LearningRate 0.2928 Epoch: 4 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:36:43,108-Speed 5994.74 samples/sec Loss 11.8182 LearningRate 0.2928 Epoch: 4 Global Step: 47700 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:36:49,946-Speed 5989.99 samples/sec Loss 11.8064 LearningRate 0.2927 Epoch: 4 Global Step: 47710 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:36:56,827-Speed 5954.65 samples/sec Loss 11.8188 LearningRate 0.2927 Epoch: 4 Global Step: 47720 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:37:03,711-Speed 5954.22 samples/sec Loss 11.8527 LearningRate 0.2927 Epoch: 4 Global Step: 47730 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:37:10,565-Speed 5977.19 samples/sec Loss 12.0054 LearningRate 0.2926 Epoch: 4 Global Step: 47740 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:37:17,414-Speed 5980.70 samples/sec Loss 11.8342 LearningRate 0.2926 Epoch: 4 Global Step: 47750 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:37:24,266-Speed 5979.73 samples/sec Loss 11.9463 LearningRate 0.2926 Epoch: 4 Global Step: 47760 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:37:31,129-Speed 5968.60 samples/sec Loss 11.8320 LearningRate 0.2925 Epoch: 4 Global Step: 47770 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:37:37,984-Speed 5976.29 samples/sec Loss 11.9463 LearningRate 0.2925 Epoch: 4 Global Step: 47780 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:37:44,831-Speed 5983.36 samples/sec Loss 11.8634 LearningRate 0.2925 Epoch: 4 Global Step: 47790 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 05:37:51,678-Speed 5983.35 samples/sec Loss 11.8768 LearningRate 0.2924 Epoch: 4 Global Step: 47800 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 
05:37:58,524-Speed 5983.59 samples/sec Loss 11.9282 LearningRate 0.2924 Epoch: 4 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:38:05,383-Speed 5972.48 samples/sec Loss 11.9281 LearningRate 0.2923 Epoch: 4 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:38:12,246-Speed 5969.56 samples/sec Loss 11.8994 LearningRate 0.2923 Epoch: 4 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:38:19,092-Speed 5983.98 samples/sec Loss 11.8064 LearningRate 0.2923 Epoch: 4 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:38:25,957-Speed 5967.77 samples/sec Loss 11.8614 LearningRate 0.2922 Epoch: 4 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:38:32,819-Speed 5970.17 samples/sec Loss 12.0116 LearningRate 0.2922 Epoch: 4 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:38:39,665-Speed 5984.65 samples/sec Loss 11.9275 LearningRate 0.2922 Epoch: 4 Global Step: 47870 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:38:46,528-Speed 5969.33 samples/sec Loss 11.8257 LearningRate 0.2921 Epoch: 4 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:38:53,417-Speed 5947.29 samples/sec Loss 11.8580 LearningRate 0.2921 Epoch: 4 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:39:00,267-Speed 5980.53 samples/sec Loss 11.8479 LearningRate 0.2920 Epoch: 4 Global Step: 47900 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:39:07,122-Speed 5979.15 samples/sec Loss 11.8221 LearningRate 0.2920 Epoch: 4 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:39:13,973-Speed 5979.67 samples/sec Loss 11.8558 LearningRate 0.2920 Epoch: 4 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:39:20,842-Speed 5965.55 samples/sec 
Loss 11.9559 LearningRate 0.2919 Epoch: 4 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:39:27,692-Speed 5983.46 samples/sec Loss 11.8649 LearningRate 0.2919 Epoch: 4 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:39:34,538-Speed 5983.60 samples/sec Loss 11.9079 LearningRate 0.2919 Epoch: 4 Global Step: 47950 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:39:41,382-Speed 5986.30 samples/sec Loss 11.8812 LearningRate 0.2918 Epoch: 4 Global Step: 47960 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:39:48,244-Speed 5969.50 samples/sec Loss 11.8869 LearningRate 0.2918 Epoch: 4 Global Step: 47970 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:39:55,093-Speed 5986.95 samples/sec Loss 12.0213 LearningRate 0.2918 Epoch: 4 Global Step: 47980 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:40:01,957-Speed 5968.78 samples/sec Loss 11.8447 LearningRate 0.2917 Epoch: 4 Global Step: 47990 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:40:08,810-Speed 5977.51 samples/sec Loss 11.7711 LearningRate 0.2917 Epoch: 4 Global Step: 48000 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:40:15,667-Speed 5975.00 samples/sec Loss 11.8609 LearningRate 0.2916 Epoch: 4 Global Step: 48010 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:40:22,517-Speed 5980.35 samples/sec Loss 11.7989 LearningRate 0.2916 Epoch: 4 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:40:29,366-Speed 5981.38 samples/sec Loss 11.9153 LearningRate 0.2916 Epoch: 4 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:40:36,216-Speed 5980.66 samples/sec Loss 11.8481 LearningRate 0.2915 Epoch: 4 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:40:43,068-Speed 5978.98 samples/sec Loss 11.8884 LearningRate 0.2915 
Epoch: 4 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:40:49,909-Speed 5990.08 samples/sec Loss 11.9009 LearningRate 0.2915 Epoch: 4 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:40:56,756-Speed 5983.80 samples/sec Loss 11.9378 LearningRate 0.2914 Epoch: 4 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:41:03,619-Speed 5968.41 samples/sec Loss 11.8791 LearningRate 0.2914 Epoch: 4 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:41:10,470-Speed 5981.43 samples/sec Loss 11.7580 LearningRate 0.2914 Epoch: 4 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:41:17,329-Speed 5971.99 samples/sec Loss 11.8153 LearningRate 0.2913 Epoch: 4 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:41:24,175-Speed 5984.51 samples/sec Loss 11.9159 LearningRate 0.2913 Epoch: 4 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:41:31,051-Speed 5958.70 samples/sec Loss 11.8870 LearningRate 0.2912 Epoch: 4 Global Step: 48120 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:41:37,918-Speed 5965.89 samples/sec Loss 11.8030 LearningRate 0.2912 Epoch: 4 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:41:44,773-Speed 5976.00 samples/sec Loss 11.8611 LearningRate 0.2912 Epoch: 4 Global Step: 48140 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:41:51,623-Speed 5982.94 samples/sec Loss 11.7795 LearningRate 0.2911 Epoch: 4 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:41:58,480-Speed 5974.20 samples/sec Loss 11.7381 LearningRate 0.2911 Epoch: 4 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:42:05,322-Speed 5987.79 samples/sec Loss 11.9339 LearningRate 0.2911 Epoch: 4 Global Step: 48170 Fp16 Grad 
Scale: 131072 Required: 31 hours Training: 2022-01-08 05:42:12,210-Speed 5948.49 samples/sec Loss 11.8585 LearningRate 0.2910 Epoch: 4 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:42:19,202-Speed 5858.85 samples/sec Loss 11.8434 LearningRate 0.2910 Epoch: 4 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:42:26,043-Speed 5988.77 samples/sec Loss 11.8119 LearningRate 0.2909 Epoch: 4 Global Step: 48200 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:42:32,974-Speed 5911.11 samples/sec Loss 11.7890 LearningRate 0.2909 Epoch: 4 Global Step: 48210 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:42:39,885-Speed 5927.53 samples/sec Loss 11.8918 LearningRate 0.2909 Epoch: 4 Global Step: 48220 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:42:46,742-Speed 5975.42 samples/sec Loss 11.8064 LearningRate 0.2908 Epoch: 4 Global Step: 48230 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:42:53,597-Speed 5976.24 samples/sec Loss 11.9707 LearningRate 0.2908 Epoch: 4 Global Step: 48240 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:43:00,463-Speed 5966.55 samples/sec Loss 11.8331 LearningRate 0.2908 Epoch: 4 Global Step: 48250 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:43:07,319-Speed 5975.21 samples/sec Loss 11.8599 LearningRate 0.2907 Epoch: 4 Global Step: 48260 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:43:14,169-Speed 5981.15 samples/sec Loss 11.8648 LearningRate 0.2907 Epoch: 4 Global Step: 48270 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:43:21,053-Speed 5951.02 samples/sec Loss 11.8534 LearningRate 0.2907 Epoch: 4 Global Step: 48280 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:43:27,908-Speed 5979.52 samples/sec Loss 11.9176 LearningRate 0.2906 Epoch: 4 Global Step: 48290 Fp16 Grad Scale: 262144 Required: 31 
hours Training: 2022-01-08 05:43:34,773-Speed 5967.23 samples/sec Loss 11.8725 LearningRate 0.2906 Epoch: 4 Global Step: 48300 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:43:41,665-Speed 5945.22 samples/sec Loss 11.7985 LearningRate 0.2905 Epoch: 4 Global Step: 48310 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:43:48,515-Speed 5980.94 samples/sec Loss 11.9267 LearningRate 0.2905 Epoch: 4 Global Step: 48320 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:43:55,365-Speed 5983.66 samples/sec Loss 11.8333 LearningRate 0.2905 Epoch: 4 Global Step: 48330 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:44:02,227-Speed 5971.07 samples/sec Loss 11.7696 LearningRate 0.2904 Epoch: 4 Global Step: 48340 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:44:09,064-Speed 5991.51 samples/sec Loss 11.8106 LearningRate 0.2904 Epoch: 4 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:44:15,910-Speed 5983.62 samples/sec Loss 11.7725 LearningRate 0.2904 Epoch: 4 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:44:22,768-Speed 5975.91 samples/sec Loss 11.8207 LearningRate 0.2903 Epoch: 4 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:44:29,623-Speed 5975.94 samples/sec Loss 11.7944 LearningRate 0.2903 Epoch: 4 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:44:36,475-Speed 5978.77 samples/sec Loss 11.8014 LearningRate 0.2903 Epoch: 4 Global Step: 48390 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:44:43,314-Speed 5990.71 samples/sec Loss 11.7879 LearningRate 0.2902 Epoch: 4 Global Step: 48400 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:44:50,164-Speed 5983.13 samples/sec Loss 11.8264 LearningRate 0.2902 Epoch: 4 Global Step: 48410 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 
05:44:57,016-Speed 5979.16 samples/sec Loss 11.7772 LearningRate 0.2901 Epoch: 4 Global Step: 48420 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:45:03,856-Speed 5988.64 samples/sec Loss 11.8209 LearningRate 0.2901 Epoch: 4 Global Step: 48430 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:45:10,725-Speed 5964.06 samples/sec Loss 11.8087 LearningRate 0.2901 Epoch: 4 Global Step: 48440 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:45:17,571-Speed 5984.07 samples/sec Loss 11.8160 LearningRate 0.2900 Epoch: 4 Global Step: 48450 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:45:24,417-Speed 5983.15 samples/sec Loss 11.8893 LearningRate 0.2900 Epoch: 4 Global Step: 48460 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:45:31,267-Speed 5981.89 samples/sec Loss 11.8577 LearningRate 0.2900 Epoch: 4 Global Step: 48470 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:45:38,137-Speed 5963.21 samples/sec Loss 11.9045 LearningRate 0.2899 Epoch: 4 Global Step: 48480 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:45:44,984-Speed 5982.53 samples/sec Loss 11.8395 LearningRate 0.2899 Epoch: 4 Global Step: 48490 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:45:51,832-Speed 5982.95 samples/sec Loss 11.7861 LearningRate 0.2899 Epoch: 4 Global Step: 48500 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:45:58,688-Speed 5975.89 samples/sec Loss 11.8561 LearningRate 0.2898 Epoch: 4 Global Step: 48510 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:46:05,558-Speed 5963.86 samples/sec Loss 11.8878 LearningRate 0.2898 Epoch: 4 Global Step: 48520 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:46:12,401-Speed 5986.38 samples/sec Loss 11.8523 LearningRate 0.2897 Epoch: 4 Global Step: 48530 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:46:19,256-Speed 5976.70 
samples/sec Loss 11.8747 LearningRate 0.2897 Epoch: 4 Global Step: 48540 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:46:26,119-Speed 5968.89 samples/sec Loss 11.7862 LearningRate 0.2897 Epoch: 4 Global Step: 48550 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:46:32,976-Speed 5974.83 samples/sec Loss 11.8494 LearningRate 0.2896 Epoch: 4 Global Step: 48560 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:46:39,821-Speed 5985.78 samples/sec Loss 11.8170 LearningRate 0.2896 Epoch: 4 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:46:46,705-Speed 5950.73 samples/sec Loss 11.8292 LearningRate 0.2896 Epoch: 4 Global Step: 48580 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:46:53,569-Speed 5971.79 samples/sec Loss 11.8007 LearningRate 0.2895 Epoch: 4 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:47:00,412-Speed 5986.30 samples/sec Loss 11.7601 LearningRate 0.2895 Epoch: 4 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:47:07,271-Speed 5973.10 samples/sec Loss 11.8216 LearningRate 0.2895 Epoch: 4 Global Step: 48610 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:47:14,109-Speed 5990.74 samples/sec Loss 11.8325 LearningRate 0.2894 Epoch: 4 Global Step: 48620 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:47:20,946-Speed 5992.32 samples/sec Loss 11.7756 LearningRate 0.2894 Epoch: 4 Global Step: 48630 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:47:27,800-Speed 5977.17 samples/sec Loss 11.7695 LearningRate 0.2893 Epoch: 4 Global Step: 48640 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:47:34,648-Speed 5981.90 samples/sec Loss 11.7289 LearningRate 0.2893 Epoch: 4 Global Step: 48650 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:47:41,487-Speed 5990.83 samples/sec Loss 11.7853 
LearningRate 0.2893 Epoch: 4 Global Step: 48660 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:47:48,333-Speed 5983.18 samples/sec Loss 11.7336 LearningRate 0.2892 Epoch: 4 Global Step: 48670 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:47:55,189-Speed 5976.41 samples/sec Loss 11.8089 LearningRate 0.2892 Epoch: 4 Global Step: 48680 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:48:02,035-Speed 5983.96 samples/sec Loss 11.8176 LearningRate 0.2892 Epoch: 4 Global Step: 48690 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:48:08,884-Speed 5981.42 samples/sec Loss 11.7924 LearningRate 0.2891 Epoch: 4 Global Step: 48700 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:48:15,740-Speed 5975.91 samples/sec Loss 11.8726 LearningRate 0.2891 Epoch: 4 Global Step: 48710 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:48:22,601-Speed 5970.95 samples/sec Loss 11.7158 LearningRate 0.2891 Epoch: 4 Global Step: 48720 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:48:29,459-Speed 5972.67 samples/sec Loss 11.7515 LearningRate 0.2890 Epoch: 4 Global Step: 48730 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:48:36,309-Speed 5980.93 samples/sec Loss 11.8089 LearningRate 0.2890 Epoch: 4 Global Step: 48740 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:48:43,162-Speed 5978.54 samples/sec Loss 11.8580 LearningRate 0.2889 Epoch: 4 Global Step: 48750 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:48:50,028-Speed 5966.01 samples/sec Loss 11.7545 LearningRate 0.2889 Epoch: 4 Global Step: 48760 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:48:56,920-Speed 5943.82 samples/sec Loss 11.7794 LearningRate 0.2889 Epoch: 4 Global Step: 48770 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:49:03,787-Speed 5966.19 samples/sec Loss 11.7654 LearningRate 0.2888 Epoch: 4 
Global Step: 48780 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:49:10,651-Speed 5969.24 samples/sec Loss 11.7434 LearningRate 0.2888 Epoch: 4 Global Step: 48790 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:49:17,521-Speed 5962.82 samples/sec Loss 11.7571 LearningRate 0.2888 Epoch: 4 Global Step: 48800 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:49:24,383-Speed 5970.35 samples/sec Loss 11.7964 LearningRate 0.2887 Epoch: 4 Global Step: 48810 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:49:31,227-Speed 5985.55 samples/sec Loss 11.8455 LearningRate 0.2887 Epoch: 4 Global Step: 48820 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:49:38,086-Speed 5972.86 samples/sec Loss 11.8047 LearningRate 0.2887 Epoch: 4 Global Step: 48830 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:49:44,941-Speed 5976.45 samples/sec Loss 11.9025 LearningRate 0.2886 Epoch: 4 Global Step: 48840 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:49:51,794-Speed 5977.71 samples/sec Loss 11.7506 LearningRate 0.2886 Epoch: 4 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:49:58,643-Speed 5981.16 samples/sec Loss 11.8150 LearningRate 0.2885 Epoch: 4 Global Step: 48860 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:50:05,513-Speed 5964.02 samples/sec Loss 11.8171 LearningRate 0.2885 Epoch: 4 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:50:12,380-Speed 5966.50 samples/sec Loss 11.7150 LearningRate 0.2885 Epoch: 4 Global Step: 48880 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:50:19,227-Speed 5983.11 samples/sec Loss 11.7040 LearningRate 0.2884 Epoch: 4 Global Step: 48890 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:50:26,093-Speed 5967.20 samples/sec Loss 11.7772 LearningRate 0.2884 Epoch: 4 Global Step: 48900 Fp16 Grad 
Scale: 131072 Required: 31 hours Training: 2022-01-08 05:50:32,943-Speed 5979.74 samples/sec Loss 11.7527 LearningRate 0.2884 Epoch: 4 Global Step: 48910 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:50:39,863-Speed 5920.14 samples/sec Loss 11.7573 LearningRate 0.2883 Epoch: 4 Global Step: 48920 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:50:46,747-Speed 5951.63 samples/sec Loss 11.8163 LearningRate 0.2883 Epoch: 4 Global Step: 48930 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:50:53,604-Speed 5974.48 samples/sec Loss 11.8066 LearningRate 0.2883 Epoch: 4 Global Step: 48940 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:51:00,451-Speed 5983.49 samples/sec Loss 11.8065 LearningRate 0.2882 Epoch: 4 Global Step: 48950 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:51:07,287-Speed 5992.23 samples/sec Loss 11.7952 LearningRate 0.2882 Epoch: 4 Global Step: 48960 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:51:14,155-Speed 5965.39 samples/sec Loss 11.7857 LearningRate 0.2881 Epoch: 4 Global Step: 48970 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:51:21,018-Speed 5969.80 samples/sec Loss 11.8897 LearningRate 0.2881 Epoch: 4 Global Step: 48980 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:51:27,884-Speed 5965.83 samples/sec Loss 11.8739 LearningRate 0.2881 Epoch: 4 Global Step: 48990 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:51:34,746-Speed 5970.59 samples/sec Loss 11.7533 LearningRate 0.2880 Epoch: 4 Global Step: 49000 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:51:41,615-Speed 5967.33 samples/sec Loss 11.8089 LearningRate 0.2880 Epoch: 4 Global Step: 49010 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:51:48,484-Speed 5966.72 samples/sec Loss 11.7473 LearningRate 0.2880 Epoch: 4 Global Step: 49020 Fp16 Grad Scale: 131072 Required: 31 
hours Training: 2022-01-08 05:51:55,341-Speed 5974.95 samples/sec Loss 11.7726 LearningRate 0.2879 Epoch: 4 Global Step: 49030 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:52:02,200-Speed 5972.42 samples/sec Loss 11.8099 LearningRate 0.2879 Epoch: 4 Global Step: 49040 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:52:09,051-Speed 5979.92 samples/sec Loss 11.7458 LearningRate 0.2879 Epoch: 4 Global Step: 49050 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:52:15,909-Speed 5973.50 samples/sec Loss 11.7888 LearningRate 0.2878 Epoch: 4 Global Step: 49060 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:52:22,767-Speed 5973.43 samples/sec Loss 11.7689 LearningRate 0.2878 Epoch: 4 Global Step: 49070 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:52:29,614-Speed 5985.94 samples/sec Loss 11.7281 LearningRate 0.2877 Epoch: 4 Global Step: 49080 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:52:36,493-Speed 5954.95 samples/sec Loss 11.7695 LearningRate 0.2877 Epoch: 4 Global Step: 49090 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:52:43,341-Speed 5982.16 samples/sec Loss 11.7900 LearningRate 0.2877 Epoch: 4 Global Step: 49100 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:52:50,185-Speed 5986.74 samples/sec Loss 11.7166 LearningRate 0.2876 Epoch: 4 Global Step: 49110 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:52:57,027-Speed 5986.54 samples/sec Loss 11.8064 LearningRate 0.2876 Epoch: 4 Global Step: 49120 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:53:03,872-Speed 5985.54 samples/sec Loss 11.7855 LearningRate 0.2876 Epoch: 4 Global Step: 49130 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:53:10,723-Speed 5979.49 samples/sec Loss 11.7845 LearningRate 0.2875 Epoch: 4 Global Step: 49140 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 
05:53:17,571-Speed 5982.63 samples/sec Loss 11.8657 LearningRate 0.2875 Epoch: 4 Global Step: 49150 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:53:24,408-Speed 5991.38 samples/sec Loss 11.7470 LearningRate 0.2875 Epoch: 4 Global Step: 49160 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:53:31,280-Speed 5960.88 samples/sec Loss 11.7441 LearningRate 0.2874 Epoch: 4 Global Step: 49170 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:53:38,137-Speed 5974.38 samples/sec Loss 11.7915 LearningRate 0.2874 Epoch: 4 Global Step: 49180 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:53:44,996-Speed 5972.71 samples/sec Loss 11.7496 LearningRate 0.2873 Epoch: 4 Global Step: 49190 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:53:51,853-Speed 5974.67 samples/sec Loss 11.7563 LearningRate 0.2873 Epoch: 4 Global Step: 49200 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:53:58,696-Speed 5986.78 samples/sec Loss 11.7721 LearningRate 0.2873 Epoch: 4 Global Step: 49210 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:54:05,556-Speed 5971.25 samples/sec Loss 11.7168 LearningRate 0.2872 Epoch: 4 Global Step: 49220 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:54:12,403-Speed 5983.27 samples/sec Loss 11.7712 LearningRate 0.2872 Epoch: 4 Global Step: 49230 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:54:19,267-Speed 5968.86 samples/sec Loss 11.7840 LearningRate 0.2872 Epoch: 4 Global Step: 49240 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:54:26,159-Speed 5944.19 samples/sec Loss 11.7996 LearningRate 0.2871 Epoch: 4 Global Step: 49250 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:54:33,053-Speed 5943.34 samples/sec Loss 11.7058 LearningRate 0.2871 Epoch: 4 Global Step: 49260 Fp16 Grad Scale: 524288 Required: 31 hours Training: 2022-01-08 05:54:39,896-Speed 5986.31 
samples/sec Loss 11.7397 LearningRate 0.2871 Epoch: 4 Global Step: 49270 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:54:46,787-Speed 5945.15 samples/sec Loss 11.7396 LearningRate 0.2870 Epoch: 4 Global Step: 49280 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:54:53,662-Speed 5959.19 samples/sec Loss 11.6403 LearningRate 0.2870 Epoch: 4 Global Step: 49290 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:55:00,561-Speed 5938.43 samples/sec Loss 11.7431 LearningRate 0.2869 Epoch: 4 Global Step: 49300 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:55:07,417-Speed 5975.81 samples/sec Loss 11.7867 LearningRate 0.2869 Epoch: 4 Global Step: 49310 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:55:14,274-Speed 5973.56 samples/sec Loss 11.7818 LearningRate 0.2869 Epoch: 4 Global Step: 49320 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:55:21,114-Speed 5990.23 samples/sec Loss 11.7575 LearningRate 0.2868 Epoch: 4 Global Step: 49330 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:55:27,963-Speed 5983.33 samples/sec Loss 11.8020 LearningRate 0.2868 Epoch: 4 Global Step: 49340 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:55:34,806-Speed 5986.14 samples/sec Loss 11.7121 LearningRate 0.2868 Epoch: 4 Global Step: 49350 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:55:41,674-Speed 5965.28 samples/sec Loss 11.7497 LearningRate 0.2867 Epoch: 4 Global Step: 49360 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:55:48,521-Speed 5983.54 samples/sec Loss 11.7363 LearningRate 0.2867 Epoch: 4 Global Step: 49370 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:55:55,387-Speed 5968.57 samples/sec Loss 11.7267 LearningRate 0.2867 Epoch: 4 Global Step: 49380 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:56:02,248-Speed 5971.09 samples/sec Loss 11.6784 
LearningRate 0.2866 Epoch: 4 Global Step: 49390 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:56:09,127-Speed 5955.71 samples/sec Loss 11.8149 LearningRate 0.2866 Epoch: 4 Global Step: 49400 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:56:15,980-Speed 5978.63 samples/sec Loss 11.7763 LearningRate 0.2865 Epoch: 4 Global Step: 49410 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:56:22,822-Speed 5987.28 samples/sec Loss 11.7223 LearningRate 0.2865 Epoch: 4 Global Step: 49420 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:56:29,689-Speed 5966.26 samples/sec Loss 11.7677 LearningRate 0.2865 Epoch: 4 Global Step: 49430 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:56:36,552-Speed 5969.31 samples/sec Loss 11.7614 LearningRate 0.2864 Epoch: 4 Global Step: 49440 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:56:43,421-Speed 5964.58 samples/sec Loss 11.7794 LearningRate 0.2864 Epoch: 4 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:56:50,265-Speed 5985.40 samples/sec Loss 11.7558 LearningRate 0.2864 Epoch: 4 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:56:57,144-Speed 5957.07 samples/sec Loss 11.7432 LearningRate 0.2863 Epoch: 4 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:57:03,994-Speed 5979.91 samples/sec Loss 11.7208 LearningRate 0.2863 Epoch: 4 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 05:57:10,872-Speed 5957.29 samples/sec Loss 11.7311 LearningRate 0.2863 Epoch: 4 Global Step: 49490 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:57:17,781-Speed 5931.10 samples/sec Loss 11.7243 LearningRate 0.2862 Epoch: 4 Global Step: 49500 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:57:24,647-Speed 5968.23 samples/sec Loss 11.7675 LearningRate 0.2862 Epoch: 4 Global Step: 
49510 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:57:31,511-Speed 5968.82 samples/sec Loss 11.7548 LearningRate 0.2861 Epoch: 4 Global Step: 49520 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:57:38,378-Speed 5968.24 samples/sec Loss 11.7132 LearningRate 0.2861 Epoch: 4 Global Step: 49530 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:57:45,256-Speed 5956.54 samples/sec Loss 11.7472 LearningRate 0.2861 Epoch: 4 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:57:52,122-Speed 5967.30 samples/sec Loss 11.7423 LearningRate 0.2860 Epoch: 4 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:57:58,996-Speed 5959.79 samples/sec Loss 11.8101 LearningRate 0.2860 Epoch: 4 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:58:05,904-Speed 5930.36 samples/sec Loss 11.7850 LearningRate 0.2860 Epoch: 4 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:58:12,801-Speed 5940.14 samples/sec Loss 11.8351 LearningRate 0.2859 Epoch: 4 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:58:19,659-Speed 5973.88 samples/sec Loss 11.7413 LearningRate 0.2859 Epoch: 4 Global Step: 49590 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:58:26,495-Speed 5992.52 samples/sec Loss 11.7289 LearningRate 0.2859 Epoch: 4 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:58:33,343-Speed 5982.72 samples/sec Loss 11.6758 LearningRate 0.2858 Epoch: 4 Global Step: 49610 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:58:40,188-Speed 5984.96 samples/sec Loss 11.7349 LearningRate 0.2858 Epoch: 4 Global Step: 49620 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:58:47,039-Speed 5979.18 samples/sec Loss 11.7297 LearningRate 0.2857 Epoch: 4 Global Step: 49630 Fp16 Grad Scale: 131072 
Required: 31 hours Training: 2022-01-08 05:58:53,892-Speed 5977.82 samples/sec Loss 11.7119 LearningRate 0.2857 Epoch: 4 Global Step: 49640 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:59:00,741-Speed 5981.90 samples/sec Loss 11.6782 LearningRate 0.2857 Epoch: 4 Global Step: 49650 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:59:07,592-Speed 5979.78 samples/sec Loss 11.7326 LearningRate 0.2856 Epoch: 4 Global Step: 49660 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:59:14,462-Speed 5963.19 samples/sec Loss 11.7006 LearningRate 0.2856 Epoch: 4 Global Step: 49670 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:59:21,317-Speed 5976.39 samples/sec Loss 11.8179 LearningRate 0.2856 Epoch: 4 Global Step: 49680 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:59:28,184-Speed 5965.91 samples/sec Loss 11.7388 LearningRate 0.2855 Epoch: 4 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 05:59:35,042-Speed 5973.35 samples/sec Loss 11.8249 LearningRate 0.2855 Epoch: 4 Global Step: 49700 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:59:41,902-Speed 5971.96 samples/sec Loss 11.6844 LearningRate 0.2855 Epoch: 4 Global Step: 49710 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:59:48,778-Speed 5957.54 samples/sec Loss 11.7831 LearningRate 0.2854 Epoch: 4 Global Step: 49720 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 05:59:55,623-Speed 5985.84 samples/sec Loss 11.7152 LearningRate 0.2854 Epoch: 4 Global Step: 49730 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:00:02,474-Speed 5980.05 samples/sec Loss 11.7290 LearningRate 0.2853 Epoch: 4 Global Step: 49740 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:00:09,326-Speed 5978.20 samples/sec Loss 11.7683 LearningRate 0.2853 Epoch: 4 Global Step: 49750 Fp16 Grad Scale: 262144 Required: 31 hours Training: 
2022-01-08 06:00:16,186-Speed 5972.30 samples/sec Loss 11.8469 LearningRate 0.2853 Epoch: 4 Global Step: 49760 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:00:23,030-Speed 5986.03 samples/sec Loss 11.6356 LearningRate 0.2852 Epoch: 4 Global Step: 49770 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:00:29,911-Speed 5954.05 samples/sec Loss 11.7880 LearningRate 0.2852 Epoch: 4 Global Step: 49780 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:00:36,762-Speed 5979.80 samples/sec Loss 11.6625 LearningRate 0.2852 Epoch: 4 Global Step: 49790 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:00:43,607-Speed 5985.30 samples/sec Loss 11.7747 LearningRate 0.2851 Epoch: 4 Global Step: 49800 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:00:50,490-Speed 5951.88 samples/sec Loss 11.7287 LearningRate 0.2851 Epoch: 4 Global Step: 49810 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:00:57,348-Speed 5973.59 samples/sec Loss 11.7368 LearningRate 0.2851 Epoch: 4 Global Step: 49820 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:01:04,189-Speed 5988.95 samples/sec Loss 11.6608 LearningRate 0.2850 Epoch: 4 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:01:11,034-Speed 5984.72 samples/sec Loss 11.7312 LearningRate 0.2850 Epoch: 4 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:01:17,882-Speed 5984.06 samples/sec Loss 11.6512 LearningRate 0.2849 Epoch: 4 Global Step: 49850 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:01:24,727-Speed 5984.57 samples/sec Loss 11.7671 LearningRate 0.2849 Epoch: 4 Global Step: 49860 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:01:31,571-Speed 5986.15 samples/sec Loss 11.7298 LearningRate 0.2849 Epoch: 4 Global Step: 49870 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:01:38,449-Speed 
5956.53 samples/sec Loss 11.7842 LearningRate 0.2848 Epoch: 4 Global Step: 49880 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:01:45,304-Speed 5976.14 samples/sec Loss 11.7281 LearningRate 0.2848 Epoch: 4 Global Step: 49890 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:01:52,153-Speed 5980.74 samples/sec Loss 11.7280 LearningRate 0.2848 Epoch: 4 Global Step: 49900 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:01:59,013-Speed 5974.95 samples/sec Loss 11.6962 LearningRate 0.2847 Epoch: 4 Global Step: 49910 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:02:05,860-Speed 5982.52 samples/sec Loss 11.6701 LearningRate 0.2847 Epoch: 4 Global Step: 49920 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:02:12,713-Speed 5977.82 samples/sec Loss 11.7491 LearningRate 0.2847 Epoch: 4 Global Step: 49930 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:02:19,564-Speed 5983.65 samples/sec Loss 11.6977 LearningRate 0.2846 Epoch: 4 Global Step: 49940 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:02:26,432-Speed 5965.79 samples/sec Loss 11.7694 LearningRate 0.2846 Epoch: 4 Global Step: 49950 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:02:33,302-Speed 5963.63 samples/sec Loss 11.7414 LearningRate 0.2846 Epoch: 4 Global Step: 49960 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:02:40,190-Speed 5947.58 samples/sec Loss 11.5907 LearningRate 0.2845 Epoch: 4 Global Step: 49970 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:02:47,049-Speed 5972.41 samples/sec Loss 11.6604 LearningRate 0.2845 Epoch: 4 Global Step: 49980 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:02:53,908-Speed 5972.99 samples/sec Loss 11.6917 LearningRate 0.2844 Epoch: 4 Global Step: 49990 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:03:00,766-Speed 5974.05 samples/sec Loss 
11.6831 LearningRate 0.2844 Epoch: 4 Global Step: 50000 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:03:27,631-[lfw][50000]XNorm: 22.400308 Training: 2022-01-08 06:03:27,632-[lfw][50000]Accuracy-Flip: 0.99550+-0.00299 Training: 2022-01-08 06:03:27,633-[lfw][50000]Accuracy-Highest: 0.99700 Training: 2022-01-08 06:03:58,769-[cfp_fp][50000]XNorm: 19.297992 Training: 2022-01-08 06:03:58,770-[cfp_fp][50000]Accuracy-Flip: 0.96743+-0.00956 Training: 2022-01-08 06:03:58,771-[cfp_fp][50000]Accuracy-Highest: 0.97057 Training: 2022-01-08 06:04:25,654-[agedb_30][50000]XNorm: 21.656549 Training: 2022-01-08 06:04:25,655-[agedb_30][50000]Accuracy-Flip: 0.96283+-0.00975 Training: 2022-01-08 06:04:25,656-[agedb_30][50000]Accuracy-Highest: 0.96283 Training: 2022-01-08 06:04:32,494-Speed 446.55 samples/sec Loss 11.6215 LearningRate 0.2844 Epoch: 4 Global Step: 50010 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:04:39,322-Speed 6000.00 samples/sec Loss 11.6512 LearningRate 0.2843 Epoch: 4 Global Step: 50020 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:04:46,155-Speed 5994.37 samples/sec Loss 11.7144 LearningRate 0.2843 Epoch: 4 Global Step: 50030 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:04:53,018-Speed 5969.71 samples/sec Loss 11.6973 LearningRate 0.2843 Epoch: 4 Global Step: 50040 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:04:59,880-Speed 5970.67 samples/sec Loss 11.7399 LearningRate 0.2842 Epoch: 4 Global Step: 50050 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:05:06,736-Speed 5975.73 samples/sec Loss 11.6811 LearningRate 0.2842 Epoch: 4 Global Step: 50060 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:05:13,592-Speed 5978.24 samples/sec Loss 11.7304 LearningRate 0.2842 Epoch: 4 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:05:20,447-Speed 5976.25 samples/sec Loss 11.6734 LearningRate 
0.2841 Epoch: 4 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:05:27,304-Speed 5974.75 samples/sec Loss 11.7731 LearningRate 0.2841 Epoch: 4 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:05:34,343-Speed 5819.77 samples/sec Loss 11.7663 LearningRate 0.2840 Epoch: 4 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:05:41,204-Speed 5971.80 samples/sec Loss 11.6941 LearningRate 0.2840 Epoch: 4 Global Step: 50110 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:05:48,092-Speed 5948.14 samples/sec Loss 11.6544 LearningRate 0.2840 Epoch: 4 Global Step: 50120 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:05:54,952-Speed 5971.60 samples/sec Loss 11.6655 LearningRate 0.2839 Epoch: 4 Global Step: 50130 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:06:01,808-Speed 5975.56 samples/sec Loss 11.6294 LearningRate 0.2839 Epoch: 4 Global Step: 50140 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:06:08,685-Speed 5957.34 samples/sec Loss 11.6576 LearningRate 0.2839 Epoch: 4 Global Step: 50150 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:06:15,582-Speed 5941.19 samples/sec Loss 11.6081 LearningRate 0.2838 Epoch: 4 Global Step: 50160 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:06:22,431-Speed 5981.50 samples/sec Loss 11.6890 LearningRate 0.2838 Epoch: 4 Global Step: 50170 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:06:29,291-Speed 5978.69 samples/sec Loss 11.6619 LearningRate 0.2838 Epoch: 4 Global Step: 50180 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:06:36,163-Speed 5962.08 samples/sec Loss 11.6201 LearningRate 0.2837 Epoch: 4 Global Step: 50190 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:06:43,042-Speed 5955.67 samples/sec Loss 11.8166 LearningRate 0.2837 Epoch: 4 Global Step: 
50200 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:06:49,869-Speed 6000.87 samples/sec Loss 11.6811 LearningRate 0.2836 Epoch: 4 Global Step: 50210 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:06:56,740-Speed 5962.15 samples/sec Loss 11.6050 LearningRate 0.2836 Epoch: 4 Global Step: 50220 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:03,603-Speed 5969.63 samples/sec Loss 11.6712 LearningRate 0.2836 Epoch: 4 Global Step: 50230 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:10,469-Speed 5967.22 samples/sec Loss 11.7312 LearningRate 0.2835 Epoch: 4 Global Step: 50240 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:17,349-Speed 5955.02 samples/sec Loss 11.6728 LearningRate 0.2835 Epoch: 4 Global Step: 50250 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:24,203-Speed 5976.98 samples/sec Loss 11.6731 LearningRate 0.2835 Epoch: 4 Global Step: 50260 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:31,068-Speed 5968.43 samples/sec Loss 11.6261 LearningRate 0.2834 Epoch: 4 Global Step: 50270 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:37,940-Speed 5961.38 samples/sec Loss 11.6697 LearningRate 0.2834 Epoch: 4 Global Step: 50280 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:44,797-Speed 5978.23 samples/sec Loss 11.6440 LearningRate 0.2834 Epoch: 4 Global Step: 50290 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:51,648-Speed 5979.76 samples/sec Loss 11.7059 LearningRate 0.2833 Epoch: 4 Global Step: 50300 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:07:58,537-Speed 5946.64 samples/sec Loss 11.6222 LearningRate 0.2833 Epoch: 4 Global Step: 50310 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:08:05,415-Speed 5956.45 samples/sec Loss 11.6912 LearningRate 0.2833 Epoch: 4 Global Step: 50320 Fp16 Grad Scale: 262144 
Required: 31 hours Training: 2022-01-08 06:08:12,286-Speed 5962.39 samples/sec Loss 11.6433 LearningRate 0.2832 Epoch: 4 Global Step: 50330 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:08:19,162-Speed 5957.93 samples/sec Loss 11.6370 LearningRate 0.2832 Epoch: 4 Global Step: 50340 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:08:26,042-Speed 5954.23 samples/sec Loss 11.6520 LearningRate 0.2831 Epoch: 4 Global Step: 50350 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:08:32,881-Speed 5989.66 samples/sec Loss 11.7255 LearningRate 0.2831 Epoch: 4 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:08:39,758-Speed 5957.26 samples/sec Loss 11.6629 LearningRate 0.2831 Epoch: 4 Global Step: 50370 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:08:46,731-Speed 5875.45 samples/sec Loss 11.7207 LearningRate 0.2830 Epoch: 4 Global Step: 50380 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:08:53,684-Speed 5892.75 samples/sec Loss 11.6789 LearningRate 0.2830 Epoch: 4 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:09:00,555-Speed 5962.83 samples/sec Loss 11.6608 LearningRate 0.2830 Epoch: 4 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:09:07,504-Speed 5898.53 samples/sec Loss 11.6870 LearningRate 0.2829 Epoch: 4 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:09:14,454-Speed 5894.80 samples/sec Loss 11.6671 LearningRate 0.2829 Epoch: 4 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:09:21,309-Speed 5976.96 samples/sec Loss 11.6752 LearningRate 0.2829 Epoch: 4 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:09:28,176-Speed 5965.41 samples/sec Loss 11.5497 LearningRate 0.2828 Epoch: 4 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 31 hours Training: 
2022-01-08 06:09:35,047-Speed 5962.47 samples/sec Loss 11.6815 LearningRate 0.2828 Epoch: 4 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:09:41,903-Speed 5975.45 samples/sec Loss 11.6085 LearningRate 0.2827 Epoch: 4 Global Step: 50460 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:09:48,775-Speed 5961.50 samples/sec Loss 11.6727 LearningRate 0.2827 Epoch: 4 Global Step: 50470 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:09:55,630-Speed 5977.01 samples/sec Loss 11.7422 LearningRate 0.2827 Epoch: 4 Global Step: 50480 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:10:02,476-Speed 5984.62 samples/sec Loss 11.6345 LearningRate 0.2826 Epoch: 4 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:10:09,336-Speed 5972.50 samples/sec Loss 11.6466 LearningRate 0.2826 Epoch: 4 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:10:16,185-Speed 5981.76 samples/sec Loss 11.6778 LearningRate 0.2826 Epoch: 4 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:10:23,034-Speed 5981.46 samples/sec Loss 11.6576 LearningRate 0.2825 Epoch: 4 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:10:29,887-Speed 5977.24 samples/sec Loss 11.6173 LearningRate 0.2825 Epoch: 4 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:10:36,749-Speed 5970.34 samples/sec Loss 11.6587 LearningRate 0.2825 Epoch: 4 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:10:43,607-Speed 5974.04 samples/sec Loss 11.6375 LearningRate 0.2824 Epoch: 4 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:10:50,469-Speed 5970.83 samples/sec Loss 11.6951 LearningRate 0.2824 Epoch: 4 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:10:57,324-Speed 
5976.46 samples/sec Loss 11.6010 LearningRate 0.2824 Epoch: 4 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:11:04,194-Speed 5962.86 samples/sec Loss 11.6408 LearningRate 0.2823 Epoch: 4 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:11:11,051-Speed 5975.30 samples/sec Loss 11.6082 LearningRate 0.2823 Epoch: 4 Global Step: 50590 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:11:17,926-Speed 5958.75 samples/sec Loss 11.6580 LearningRate 0.2822 Epoch: 4 Global Step: 50600 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:11:24,792-Speed 5966.84 samples/sec Loss 11.6808 LearningRate 0.2822 Epoch: 4 Global Step: 50610 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:11:31,651-Speed 5973.24 samples/sec Loss 11.6836 LearningRate 0.2822 Epoch: 4 Global Step: 50620 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:11:38,542-Speed 5944.41 samples/sec Loss 11.6677 LearningRate 0.2821 Epoch: 4 Global Step: 50630 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:11:45,403-Speed 5971.34 samples/sec Loss 11.6864 LearningRate 0.2821 Epoch: 4 Global Step: 50640 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:11:52,278-Speed 5959.01 samples/sec Loss 11.6701 LearningRate 0.2821 Epoch: 4 Global Step: 50650 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:11:59,173-Speed 5942.41 samples/sec Loss 11.6610 LearningRate 0.2820 Epoch: 4 Global Step: 50660 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:12:06,057-Speed 5950.52 samples/sec Loss 11.6518 LearningRate 0.2820 Epoch: 4 Global Step: 50670 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:12:12,910-Speed 5980.07 samples/sec Loss 11.6421 LearningRate 0.2820 Epoch: 4 Global Step: 50680 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:12:19,760-Speed 5980.92 samples/sec Loss 
11.6501 LearningRate 0.2819 Epoch: 4 Global Step: 50690 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:12:26,623-Speed 5969.65 samples/sec Loss 11.6258 LearningRate 0.2819 Epoch: 4 Global Step: 50700 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:12:33,484-Speed 5970.39 samples/sec Loss 11.7094 LearningRate 0.2818 Epoch: 4 Global Step: 50710 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:12:40,336-Speed 5979.96 samples/sec Loss 11.6670 LearningRate 0.2818 Epoch: 4 Global Step: 50720 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:12:47,238-Speed 5935.64 samples/sec Loss 11.5532 LearningRate 0.2818 Epoch: 4 Global Step: 50730 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:12:54,103-Speed 5967.45 samples/sec Loss 11.6430 LearningRate 0.2817 Epoch: 4 Global Step: 50740 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:13:00,954-Speed 5980.26 samples/sec Loss 11.6118 LearningRate 0.2817 Epoch: 4 Global Step: 50750 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:13:07,809-Speed 5975.70 samples/sec Loss 11.5784 LearningRate 0.2817 Epoch: 4 Global Step: 50760 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:13:14,675-Speed 5966.76 samples/sec Loss 11.6434 LearningRate 0.2816 Epoch: 4 Global Step: 50770 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:13:21,529-Speed 5979.77 samples/sec Loss 11.6201 LearningRate 0.2816 Epoch: 4 Global Step: 50780 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:13:28,396-Speed 5965.99 samples/sec Loss 11.6243 LearningRate 0.2816 Epoch: 4 Global Step: 50790 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:13:35,263-Speed 5965.89 samples/sec Loss 11.7484 LearningRate 0.2815 Epoch: 4 Global Step: 50800 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:13:42,114-Speed 5980.22 samples/sec Loss 11.6290 LearningRate 0.2815 
Epoch: 4 Global Step: 50810 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:13:48,970-Speed 5975.31 samples/sec Loss 11.7262 LearningRate 0.2815 Epoch: 4 Global Step: 50820 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:13:55,835-Speed 5966.77 samples/sec Loss 11.7088 LearningRate 0.2814 Epoch: 4 Global Step: 50830 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:14:02,708-Speed 5960.91 samples/sec Loss 11.5175 LearningRate 0.2814 Epoch: 4 Global Step: 50840 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:14:09,556-Speed 5982.42 samples/sec Loss 11.6267 LearningRate 0.2813 Epoch: 4 Global Step: 50850 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:14:16,404-Speed 5982.32 samples/sec Loss 11.5705 LearningRate 0.2813 Epoch: 4 Global Step: 50860 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:14:23,262-Speed 5973.85 samples/sec Loss 11.7076 LearningRate 0.2813 Epoch: 4 Global Step: 50870 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:14:30,127-Speed 5970.85 samples/sec Loss 11.6465 LearningRate 0.2812 Epoch: 4 Global Step: 50880 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:14:36,988-Speed 5970.79 samples/sec Loss 11.5755 LearningRate 0.2812 Epoch: 4 Global Step: 50890 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:14:43,864-Speed 5957.49 samples/sec Loss 11.6207 LearningRate 0.2812 Epoch: 4 Global Step: 50900 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:14:50,718-Speed 5976.88 samples/sec Loss 11.6135 LearningRate 0.2811 Epoch: 4 Global Step: 50910 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:14:57,569-Speed 5980.25 samples/sec Loss 11.5793 LearningRate 0.2811 Epoch: 4 Global Step: 50920 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:04,449-Speed 5954.07 samples/sec Loss 11.5969 LearningRate 0.2811 Epoch: 4 Global Step: 50930 
Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:11,315-Speed 5967.56 samples/sec Loss 11.5839 LearningRate 0.2810 Epoch: 4 Global Step: 50940 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:18,178-Speed 5969.41 samples/sec Loss 11.5623 LearningRate 0.2810 Epoch: 4 Global Step: 50950 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:25,060-Speed 5952.51 samples/sec Loss 11.5315 LearningRate 0.2809 Epoch: 4 Global Step: 50960 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:31,918-Speed 5973.56 samples/sec Loss 11.6062 LearningRate 0.2809 Epoch: 4 Global Step: 50970 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:38,766-Speed 5982.50 samples/sec Loss 11.6748 LearningRate 0.2809 Epoch: 4 Global Step: 50980 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:45,670-Speed 5934.32 samples/sec Loss 11.6256 LearningRate 0.2808 Epoch: 4 Global Step: 50990 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:52,539-Speed 5964.24 samples/sec Loss 11.6233 LearningRate 0.2808 Epoch: 4 Global Step: 51000 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:15:59,393-Speed 5977.63 samples/sec Loss 11.6501 LearningRate 0.2808 Epoch: 4 Global Step: 51010 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:16:06,252-Speed 5972.90 samples/sec Loss 11.6361 LearningRate 0.2807 Epoch: 4 Global Step: 51020 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:16:13,130-Speed 5957.58 samples/sec Loss 11.6192 LearningRate 0.2807 Epoch: 4 Global Step: 51030 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:16:19,979-Speed 5981.27 samples/sec Loss 11.6501 LearningRate 0.2807 Epoch: 4 Global Step: 51040 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:16:26,859-Speed 5955.86 samples/sec Loss 11.6041 LearningRate 0.2806 Epoch: 4 Global Step: 51050 Fp16 Grad Scale: 131072 
Required: 31 hours Training: 2022-01-08 06:16:33,719-Speed 5972.64 samples/sec Loss 11.6651 LearningRate 0.2806 Epoch: 4 Global Step: 51060 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:16:40,575-Speed 5975.35 samples/sec Loss 11.6829 LearningRate 0.2806 Epoch: 4 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:16:47,424-Speed 5981.68 samples/sec Loss 11.6193 LearningRate 0.2805 Epoch: 4 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:16:54,397-Speed 5875.41 samples/sec Loss 11.5882 LearningRate 0.2805 Epoch: 4 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:17:01,352-Speed 5890.17 samples/sec Loss 11.6463 LearningRate 0.2804 Epoch: 4 Global Step: 51100 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:17:08,209-Speed 5975.39 samples/sec Loss 11.6126 LearningRate 0.2804 Epoch: 4 Global Step: 51110 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:17:15,757-Speed 5427.59 samples/sec Loss 11.5599 LearningRate 0.2804 Epoch: 4 Global Step: 51120 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:17:22,627-Speed 5963.36 samples/sec Loss 11.5309 LearningRate 0.2803 Epoch: 4 Global Step: 51130 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:17:29,552-Speed 5915.94 samples/sec Loss 11.6319 LearningRate 0.2803 Epoch: 4 Global Step: 51140 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:17:36,396-Speed 5985.90 samples/sec Loss 11.6606 LearningRate 0.2803 Epoch: 4 Global Step: 51150 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:17:43,243-Speed 5983.43 samples/sec Loss 11.6067 LearningRate 0.2802 Epoch: 4 Global Step: 51160 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:17:50,093-Speed 5980.30 samples/sec Loss 11.6177 LearningRate 0.2802 Epoch: 4 Global Step: 51170 Fp16 Grad Scale: 262144 Required: 31 hours Training: 
2022-01-08 06:17:56,947-Speed 5977.61 samples/sec Loss 11.6112 LearningRate 0.2802 Epoch: 4 Global Step: 51180 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:18:03,791-Speed 5985.40 samples/sec Loss 11.6756 LearningRate 0.2801 Epoch: 4 Global Step: 51190 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:18:10,635-Speed 5986.31 samples/sec Loss 11.6832 LearningRate 0.2801 Epoch: 4 Global Step: 51200 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:18:17,491-Speed 5975.40 samples/sec Loss 11.5238 LearningRate 0.2801 Epoch: 4 Global Step: 51210 Fp16 Grad Scale: 524288 Required: 31 hours Training: 2022-01-08 06:18:24,346-Speed 5976.21 samples/sec Loss 11.5281 LearningRate 0.2800 Epoch: 4 Global Step: 51220 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:18:31,197-Speed 5979.40 samples/sec Loss 11.6352 LearningRate 0.2800 Epoch: 4 Global Step: 51230 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:18:38,075-Speed 5957.20 samples/sec Loss 11.6084 LearningRate 0.2799 Epoch: 4 Global Step: 51240 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:18:44,930-Speed 5976.67 samples/sec Loss 11.6124 LearningRate 0.2799 Epoch: 4 Global Step: 51250 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:18:51,795-Speed 5967.13 samples/sec Loss 11.6158 LearningRate 0.2799 Epoch: 4 Global Step: 51260 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:18:58,656-Speed 5972.09 samples/sec Loss 11.6191 LearningRate 0.2798 Epoch: 4 Global Step: 51270 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:19:05,523-Speed 5966.63 samples/sec Loss 11.6552 LearningRate 0.2798 Epoch: 4 Global Step: 51280 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:19:12,410-Speed 5949.08 samples/sec Loss 11.6958 LearningRate 0.2798 Epoch: 4 Global Step: 51290 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:19:19,292-Speed 
5952.19 samples/sec Loss 11.5714 LearningRate 0.2797 Epoch: 4 Global Step: 51300 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:19:26,147-Speed 5976.35 samples/sec Loss 11.6487 LearningRate 0.2797 Epoch: 4 Global Step: 51310 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:19:33,006-Speed 5972.93 samples/sec Loss 11.5633 LearningRate 0.2797 Epoch: 4 Global Step: 51320 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:19:39,855-Speed 5981.11 samples/sec Loss 11.5483 LearningRate 0.2796 Epoch: 4 Global Step: 51330 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:19:46,706-Speed 5980.45 samples/sec Loss 11.5792 LearningRate 0.2796 Epoch: 4 Global Step: 51340 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:19:53,588-Speed 5952.99 samples/sec Loss 11.5935 LearningRate 0.2795 Epoch: 4 Global Step: 51350 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:20:00,432-Speed 5986.22 samples/sec Loss 11.5646 LearningRate 0.2795 Epoch: 4 Global Step: 51360 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:20:07,279-Speed 5983.12 samples/sec Loss 11.5770 LearningRate 0.2795 Epoch: 4 Global Step: 51370 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:20:14,129-Speed 5981.01 samples/sec Loss 11.5677 LearningRate 0.2794 Epoch: 4 Global Step: 51380 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:20:20,976-Speed 5983.70 samples/sec Loss 11.6051 LearningRate 0.2794 Epoch: 4 Global Step: 51390 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:20:27,841-Speed 5970.08 samples/sec Loss 11.5785 LearningRate 0.2794 Epoch: 4 Global Step: 51400 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:20:34,724-Speed 5952.00 samples/sec Loss 11.6214 LearningRate 0.2793 Epoch: 4 Global Step: 51410 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:20:41,586-Speed 5970.55 samples/sec Loss 
11.5417 LearningRate 0.2793 Epoch: 4 Global Step: 51420 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:20:48,440-Speed 5976.51 samples/sec Loss 11.5471 LearningRate 0.2793 Epoch: 4 Global Step: 51430 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:20:55,303-Speed 5968.88 samples/sec Loss 11.5746 LearningRate 0.2792 Epoch: 4 Global Step: 51440 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:21:02,169-Speed 5967.20 samples/sec Loss 11.6266 LearningRate 0.2792 Epoch: 4 Global Step: 51450 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:21:09,034-Speed 5967.51 samples/sec Loss 11.5673 LearningRate 0.2792 Epoch: 4 Global Step: 51460 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:21:15,875-Speed 5988.81 samples/sec Loss 11.6125 LearningRate 0.2791 Epoch: 4 Global Step: 51470 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:21:22,731-Speed 5975.15 samples/sec Loss 11.5958 LearningRate 0.2791 Epoch: 4 Global Step: 51480 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:21:29,576-Speed 5984.29 samples/sec Loss 11.5832 LearningRate 0.2790 Epoch: 4 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:21:36,454-Speed 5956.27 samples/sec Loss 11.5235 LearningRate 0.2790 Epoch: 4 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:21:43,313-Speed 5972.17 samples/sec Loss 11.5069 LearningRate 0.2790 Epoch: 4 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:21:51,151-Speed 5950.05 samples/sec Loss 11.5738 LearningRate 0.2789 Epoch: 4 Global Step: 51520 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:21:57,990-Speed 5990.24 samples/sec Loss 11.5394 LearningRate 0.2789 Epoch: 4 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:22:04,846-Speed 5975.14 samples/sec Loss 11.5878 LearningRate 0.2789 
Epoch: 4 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:22:11,705-Speed 5972.92 samples/sec Loss 11.5694 LearningRate 0.2788 Epoch: 4 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:22:18,559-Speed 5976.87 samples/sec Loss 11.5812 LearningRate 0.2788 Epoch: 4 Global Step: 51560 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:22:25,432-Speed 5960.98 samples/sec Loss 11.6510 LearningRate 0.2788 Epoch: 4 Global Step: 51570 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:22:32,281-Speed 5981.37 samples/sec Loss 11.6324 LearningRate 0.2787 Epoch: 4 Global Step: 51580 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:22:39,137-Speed 5975.34 samples/sec Loss 11.5357 LearningRate 0.2787 Epoch: 4 Global Step: 51590 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:22:46,005-Speed 5964.37 samples/sec Loss 11.5194 LearningRate 0.2787 Epoch: 4 Global Step: 51600 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:22:52,882-Speed 5956.99 samples/sec Loss 11.5589 LearningRate 0.2786 Epoch: 4 Global Step: 51610 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:22:59,739-Speed 5974.54 samples/sec Loss 11.5337 LearningRate 0.2786 Epoch: 4 Global Step: 51620 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 06:23:06,695-Speed 5890.06 samples/sec Loss 11.5402 LearningRate 0.2785 Epoch: 4 Global Step: 51630 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:23:13,662-Speed 5880.26 samples/sec Loss 11.5192 LearningRate 0.2785 Epoch: 4 Global Step: 51640 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:23:20,661-Speed 5853.36 samples/sec Loss 11.6227 LearningRate 0.2785 Epoch: 4 Global Step: 51650 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:23:27,548-Speed 5948.65 samples/sec Loss 11.5725 LearningRate 0.2784 Epoch: 4 Global Step: 51660 
Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:23:34,392-Speed 5985.58 samples/sec Loss 11.6697 LearningRate 0.2784 Epoch: 4 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:23:41,238-Speed 5983.60 samples/sec Loss 11.6424 LearningRate 0.2784 Epoch: 4 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:23:48,085-Speed 5983.74 samples/sec Loss 11.5381 LearningRate 0.2783 Epoch: 4 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:23:54,934-Speed 5980.81 samples/sec Loss 11.6315 LearningRate 0.2783 Epoch: 4 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:24:01,802-Speed 5967.59 samples/sec Loss 11.5682 LearningRate 0.2783 Epoch: 4 Global Step: 51710 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 06:24:08,650-Speed 5984.63 samples/sec Loss 11.5315 LearningRate 0.2782 Epoch: 4 Global Step: 51720 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:24:15,522-Speed 5961.23 samples/sec Loss 11.5493 LearningRate 0.2782 Epoch: 4 Global Step: 51730 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:24:22,368-Speed 5984.71 samples/sec Loss 11.4920 LearningRate 0.2782 Epoch: 4 Global Step: 51740 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:24:29,220-Speed 5979.14 samples/sec Loss 11.5446 LearningRate 0.2781 Epoch: 4 Global Step: 51750 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:24:36,068-Speed 5984.12 samples/sec Loss 11.6249 LearningRate 0.2781 Epoch: 4 Global Step: 51760 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:24:42,954-Speed 5949.13 samples/sec Loss 11.5598 LearningRate 0.2780 Epoch: 4 Global Step: 51770 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:24:49,863-Speed 5929.54 samples/sec Loss 11.5466 LearningRate 0.2780 Epoch: 4 Global Step: 51780 Fp16 Grad Scale: 262144 
Required: 30 hours Training: 2022-01-08 06:24:56,763-Speed 5937.10 samples/sec Loss 11.5861 LearningRate 0.2780 Epoch: 4 Global Step: 51790 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:25:03,672-Speed 5929.71 samples/sec Loss 11.5186 LearningRate 0.2779 Epoch: 4 Global Step: 51800 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:25:10,586-Speed 5925.74 samples/sec Loss 11.5940 LearningRate 0.2779 Epoch: 4 Global Step: 51810 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:25:17,498-Speed 5927.24 samples/sec Loss 11.6814 LearningRate 0.2779 Epoch: 4 Global Step: 51820 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:25:24,346-Speed 5982.63 samples/sec Loss 11.5181 LearningRate 0.2778 Epoch: 4 Global Step: 51830 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:25:31,189-Speed 5986.60 samples/sec Loss 11.5783 LearningRate 0.2778 Epoch: 4 Global Step: 51840 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:25:54,640-Speed 1746.87 samples/sec Loss 11.6261 LearningRate 0.2778 Epoch: 5 Global Step: 51850 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:26:01,466-Speed 6001.82 samples/sec Loss 11.5731 LearningRate 0.2777 Epoch: 5 Global Step: 51860 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:26:08,307-Speed 5989.31 samples/sec Loss 11.4783 LearningRate 0.2777 Epoch: 5 Global Step: 51870 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:26:15,174-Speed 5965.38 samples/sec Loss 11.5942 LearningRate 0.2777 Epoch: 5 Global Step: 51880 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:26:22,015-Speed 5988.34 samples/sec Loss 11.5850 LearningRate 0.2776 Epoch: 5 Global Step: 51890 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:26:28,855-Speed 5989.13 samples/sec Loss 11.5959 LearningRate 0.2776 Epoch: 5 Global Step: 51900 Fp16 Grad Scale: 262144 Required: 30 hours Training: 
2022-01-08 06:26:35,708-Speed 5977.91 samples/sec Loss 11.5034 LearningRate 0.2775 Epoch: 5 Global Step: 51910 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:26:42,576-Speed 5965.16 samples/sec Loss 11.4726 LearningRate 0.2775 Epoch: 5 Global Step: 51920 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:26:49,430-Speed 5977.71 samples/sec Loss 11.5159 LearningRate 0.2775 Epoch: 5 Global Step: 51930 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:26:56,325-Speed 5941.76 samples/sec Loss 11.4724 LearningRate 0.2774 Epoch: 5 Global Step: 51940 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:27:03,206-Speed 5953.85 samples/sec Loss 11.5536 LearningRate 0.2774 Epoch: 5 Global Step: 51950 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:27:10,086-Speed 5956.27 samples/sec Loss 11.5634 LearningRate 0.2774 Epoch: 5 Global Step: 51960 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:27:16,975-Speed 5946.46 samples/sec Loss 11.5570 LearningRate 0.2773 Epoch: 5 Global Step: 51970 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:27:23,848-Speed 5960.69 samples/sec Loss 11.6333 LearningRate 0.2773 Epoch: 5 Global Step: 51980 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:27:30,722-Speed 5960.47 samples/sec Loss 11.5067 LearningRate 0.2773 Epoch: 5 Global Step: 51990 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:27:37,592-Speed 5962.42 samples/sec Loss 11.6254 LearningRate 0.2772 Epoch: 5 Global Step: 52000 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:27:44,444-Speed 5979.46 samples/sec Loss 11.5024 LearningRate 0.2772 Epoch: 5 Global Step: 52010 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:27:51,308-Speed 5968.77 samples/sec Loss 11.5203 LearningRate 0.2772 Epoch: 5 Global Step: 52020 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:27:58,185-Speed 
5956.81 samples/sec Loss 11.4787 LearningRate 0.2771 Epoch: 5 Global Step: 52030 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:28:05,040-Speed 5977.44 samples/sec Loss 11.4611 LearningRate 0.2771 Epoch: 5 Global Step: 52040 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:28:11,889-Speed 5981.86 samples/sec Loss 11.5165 LearningRate 0.2770 Epoch: 5 Global Step: 52050 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:28:18,747-Speed 5973.14 samples/sec Loss 11.4929 LearningRate 0.2770 Epoch: 5 Global Step: 52060 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:28:25,608-Speed 5971.36 samples/sec Loss 11.5661 LearningRate 0.2770 Epoch: 5 Global Step: 52070 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:28:32,465-Speed 5974.72 samples/sec Loss 11.5610 LearningRate 0.2769 Epoch: 5 Global Step: 52080 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:28:39,311-Speed 5983.43 samples/sec Loss 11.5007 LearningRate 0.2769 Epoch: 5 Global Step: 52090 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:28:46,160-Speed 5981.24 samples/sec Loss 11.4992 LearningRate 0.2769 Epoch: 5 Global Step: 52100 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:28:53,030-Speed 5963.97 samples/sec Loss 11.4888 LearningRate 0.2768 Epoch: 5 Global Step: 52110 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:28:59,891-Speed 5970.59 samples/sec Loss 11.4524 LearningRate 0.2768 Epoch: 5 Global Step: 52120 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:29:06,766-Speed 5959.22 samples/sec Loss 11.4897 LearningRate 0.2768 Epoch: 5 Global Step: 52130 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:29:13,621-Speed 5976.00 samples/sec Loss 11.5300 LearningRate 0.2767 Epoch: 5 Global Step: 52140 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:29:20,497-Speed 5958.30 samples/sec Loss 
11.5684 LearningRate 0.2767 Epoch: 5 Global Step: 52150 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:29:27,357-Speed 5972.35 samples/sec Loss 11.4863 LearningRate 0.2767 Epoch: 5 Global Step: 52160 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:29:34,228-Speed 5962.58 samples/sec Loss 11.5483 LearningRate 0.2766 Epoch: 5 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:29:41,083-Speed 5975.89 samples/sec Loss 11.6141 LearningRate 0.2766 Epoch: 5 Global Step: 52180 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:29:47,932-Speed 5981.88 samples/sec Loss 11.5448 LearningRate 0.2765 Epoch: 5 Global Step: 52190 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:29:54,799-Speed 5965.62 samples/sec Loss 11.5414 LearningRate 0.2765 Epoch: 5 Global Step: 52200 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:30:01,661-Speed 5970.59 samples/sec Loss 11.3880 LearningRate 0.2765 Epoch: 5 Global Step: 52210 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:30:08,516-Speed 5975.60 samples/sec Loss 11.4484 LearningRate 0.2764 Epoch: 5 Global Step: 52220 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:30:15,372-Speed 5976.37 samples/sec Loss 11.4607 LearningRate 0.2764 Epoch: 5 Global Step: 52230 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:30:22,227-Speed 5975.82 samples/sec Loss 11.5810 LearningRate 0.2764 Epoch: 5 Global Step: 52240 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:30:29,089-Speed 5969.68 samples/sec Loss 11.5709 LearningRate 0.2763 Epoch: 5 Global Step: 52250 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:30:35,950-Speed 5971.30 samples/sec Loss 11.5026 LearningRate 0.2763 Epoch: 5 Global Step: 52260 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:30:42,820-Speed 5962.87 samples/sec Loss 11.5492 LearningRate 0.2763 
Epoch: 5 Global Step: 52270 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:30:49,701-Speed 5954.95 samples/sec Loss 11.5272 LearningRate 0.2762 Epoch: 5 Global Step: 52280 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:30:57,403-Speed 5319.20 samples/sec Loss 11.5450 LearningRate 0.2762 Epoch: 5 Global Step: 52290 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:31:04,295-Speed 5943.60 samples/sec Loss 11.4874 LearningRate 0.2762 Epoch: 5 Global Step: 52300 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:31:11,148-Speed 5978.16 samples/sec Loss 11.5034 LearningRate 0.2761 Epoch: 5 Global Step: 52310 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:31:18,024-Speed 5958.53 samples/sec Loss 11.5110 LearningRate 0.2761 Epoch: 5 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:31:24,886-Speed 5969.95 samples/sec Loss 11.6302 LearningRate 0.2760 Epoch: 5 Global Step: 52330 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:31:31,741-Speed 5976.44 samples/sec Loss 11.4809 LearningRate 0.2760 Epoch: 5 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:31:38,604-Speed 5971.53 samples/sec Loss 11.5359 LearningRate 0.2760 Epoch: 5 Global Step: 52350 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:31:45,464-Speed 5971.90 samples/sec Loss 11.5317 LearningRate 0.2759 Epoch: 5 Global Step: 52360 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:31:52,335-Speed 5962.51 samples/sec Loss 11.4804 LearningRate 0.2759 Epoch: 5 Global Step: 52370 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:31:59,198-Speed 5969.59 samples/sec Loss 11.4947 LearningRate 0.2759 Epoch: 5 Global Step: 52380 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:32:06,059-Speed 5971.56 samples/sec Loss 11.4918 LearningRate 0.2758 Epoch: 5 Global Step: 52390 
Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:32:12,917-Speed 5974.51 samples/sec Loss 11.5375 LearningRate 0.2758 Epoch: 5 Global Step: 52400 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:32:19,822-Speed 5932.96 samples/sec Loss 11.4206 LearningRate 0.2758 Epoch: 5 Global Step: 52410 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:32:26,671-Speed 5981.31 samples/sec Loss 11.4582 LearningRate 0.2757 Epoch: 5 Global Step: 52420 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:32:33,527-Speed 5976.26 samples/sec Loss 11.5115 LearningRate 0.2757 Epoch: 5 Global Step: 52430 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:32:40,383-Speed 5976.03 samples/sec Loss 11.5404 LearningRate 0.2757 Epoch: 5 Global Step: 52440 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:32:47,232-Speed 5981.00 samples/sec Loss 11.5326 LearningRate 0.2756 Epoch: 5 Global Step: 52450 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:32:54,100-Speed 5968.16 samples/sec Loss 11.5891 LearningRate 0.2756 Epoch: 5 Global Step: 52460 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:33:00,958-Speed 5973.32 samples/sec Loss 11.5254 LearningRate 0.2755 Epoch: 5 Global Step: 52470 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:33:07,811-Speed 5978.89 samples/sec Loss 11.5162 LearningRate 0.2755 Epoch: 5 Global Step: 52480 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:33:14,686-Speed 5959.37 samples/sec Loss 11.5526 LearningRate 0.2755 Epoch: 5 Global Step: 52490 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:33:21,540-Speed 5977.06 samples/sec Loss 11.4682 LearningRate 0.2754 Epoch: 5 Global Step: 52500 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:33:28,397-Speed 5974.37 samples/sec Loss 11.4961 LearningRate 0.2754 Epoch: 5 Global Step: 52510 Fp16 Grad Scale: 131072 
Required: 30 hours Training: 2022-01-08 06:33:35,255-Speed 5974.66 samples/sec Loss 11.3888 LearningRate 0.2754 Epoch: 5 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:33:42,159-Speed 5933.36 samples/sec Loss 11.4128 LearningRate 0.2753 Epoch: 5 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:33:49,005-Speed 5984.48 samples/sec Loss 11.4623 LearningRate 0.2753 Epoch: 5 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:33:55,862-Speed 5974.65 samples/sec Loss 11.5511 LearningRate 0.2753 Epoch: 5 Global Step: 52550 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:34:02,719-Speed 5975.46 samples/sec Loss 11.5156 LearningRate 0.2752 Epoch: 5 Global Step: 52560 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:34:09,570-Speed 5979.90 samples/sec Loss 11.4849 LearningRate 0.2752 Epoch: 5 Global Step: 52570 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:34:16,415-Speed 5984.91 samples/sec Loss 11.4991 LearningRate 0.2752 Epoch: 5 Global Step: 52580 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:34:23,343-Speed 5915.58 samples/sec Loss 11.4314 LearningRate 0.2751 Epoch: 5 Global Step: 52590 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:34:30,194-Speed 5979.45 samples/sec Loss 11.4795 LearningRate 0.2751 Epoch: 5 Global Step: 52600 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:34:37,065-Speed 5962.69 samples/sec Loss 11.4230 LearningRate 0.2751 Epoch: 5 Global Step: 52610 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:34:43,919-Speed 5977.64 samples/sec Loss 11.5089 LearningRate 0.2750 Epoch: 5 Global Step: 52620 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:34:50,765-Speed 5983.37 samples/sec Loss 11.5446 LearningRate 0.2750 Epoch: 5 Global Step: 52630 Fp16 Grad Scale: 262144 Required: 30 hours Training: 
2022-01-08 06:34:57,634-Speed 5967.24 samples/sec Loss 11.4660 LearningRate 0.2749 Epoch: 5 Global Step: 52640 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:04,494-Speed 5972.71 samples/sec Loss 11.4211 LearningRate 0.2749 Epoch: 5 Global Step: 52650 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:11,371-Speed 5956.81 samples/sec Loss 11.5726 LearningRate 0.2749 Epoch: 5 Global Step: 52660 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:18,223-Speed 5978.70 samples/sec Loss 11.5268 LearningRate 0.2748 Epoch: 5 Global Step: 52670 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:25,101-Speed 5958.48 samples/sec Loss 11.5139 LearningRate 0.2748 Epoch: 5 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:31,970-Speed 5964.14 samples/sec Loss 11.4999 LearningRate 0.2748 Epoch: 5 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:38,816-Speed 5984.36 samples/sec Loss 11.4837 LearningRate 0.2747 Epoch: 5 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:45,668-Speed 5978.79 samples/sec Loss 11.5423 LearningRate 0.2747 Epoch: 5 Global Step: 52710 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:52,527-Speed 5973.14 samples/sec Loss 11.4830 LearningRate 0.2747 Epoch: 5 Global Step: 52720 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:35:59,374-Speed 5984.33 samples/sec Loss 11.4948 LearningRate 0.2746 Epoch: 5 Global Step: 52730 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:36:06,248-Speed 5959.53 samples/sec Loss 11.3984 LearningRate 0.2746 Epoch: 5 Global Step: 52740 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:36:13,109-Speed 5971.15 samples/sec Loss 11.4677 LearningRate 0.2746 Epoch: 5 Global Step: 52750 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:36:19,957-Speed 
5982.25 samples/sec Loss 11.4515 LearningRate 0.2745 Epoch: 5 Global Step: 52760 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:36:26,820-Speed 5971.77 samples/sec Loss 11.4833 LearningRate 0.2745 Epoch: 5 Global Step: 52770 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:36:33,669-Speed 5980.94 samples/sec Loss 11.4541 LearningRate 0.2744 Epoch: 5 Global Step: 52780 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:36:40,514-Speed 5985.21 samples/sec Loss 11.3943 LearningRate 0.2744 Epoch: 5 Global Step: 52790 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:36:47,362-Speed 5984.49 samples/sec Loss 11.4266 LearningRate 0.2744 Epoch: 5 Global Step: 52800 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:36:54,212-Speed 5983.80 samples/sec Loss 11.4504 LearningRate 0.2743 Epoch: 5 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:37:01,062-Speed 5980.42 samples/sec Loss 11.4446 LearningRate 0.2743 Epoch: 5 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:37:07,926-Speed 5969.89 samples/sec Loss 11.4092 LearningRate 0.2743 Epoch: 5 Global Step: 52830 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:37:14,791-Speed 5967.41 samples/sec Loss 11.4371 LearningRate 0.2742 Epoch: 5 Global Step: 52840 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:37:21,678-Speed 5948.77 samples/sec Loss 11.4943 LearningRate 0.2742 Epoch: 5 Global Step: 52850 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:37:28,568-Speed 5946.43 samples/sec Loss 11.4860 LearningRate 0.2742 Epoch: 5 Global Step: 52860 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:37:35,431-Speed 5969.47 samples/sec Loss 11.5099 LearningRate 0.2741 Epoch: 5 Global Step: 52870 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:37:42,289-Speed 5973.80 samples/sec Loss 
11.5244 LearningRate 0.2741 Epoch: 5 Global Step: 52880 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:37:49,163-Speed 5959.41 samples/sec Loss 11.4577 LearningRate 0.2741 Epoch: 5 Global Step: 52890 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:37:56,024-Speed 5971.43 samples/sec Loss 11.4452 LearningRate 0.2740 Epoch: 5 Global Step: 52900 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:38:02,877-Speed 5978.37 samples/sec Loss 11.4821 LearningRate 0.2740 Epoch: 5 Global Step: 52910 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:38:09,735-Speed 5974.07 samples/sec Loss 11.5000 LearningRate 0.2740 Epoch: 5 Global Step: 52920 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:38:16,578-Speed 5986.46 samples/sec Loss 11.4490 LearningRate 0.2739 Epoch: 5 Global Step: 52930 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:38:23,468-Speed 5946.09 samples/sec Loss 11.4794 LearningRate 0.2739 Epoch: 5 Global Step: 52940 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:38:30,337-Speed 5964.55 samples/sec Loss 11.4349 LearningRate 0.2738 Epoch: 5 Global Step: 52950 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:38:37,193-Speed 5975.30 samples/sec Loss 11.4043 LearningRate 0.2738 Epoch: 5 Global Step: 52960 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:38:44,063-Speed 5963.62 samples/sec Loss 11.4629 LearningRate 0.2738 Epoch: 5 Global Step: 52970 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:38:50,933-Speed 5962.93 samples/sec Loss 11.4274 LearningRate 0.2737 Epoch: 5 Global Step: 52980 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:39:01,375-Speed 3922.98 samples/sec Loss 11.4236 LearningRate 0.2737 Epoch: 5 Global Step: 52990 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:39:08,249-Speed 5961.39 samples/sec Loss 11.5724 LearningRate 0.2737 
Epoch: 5 Global Step: 53000 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:39:15,087-Speed 5991.16 samples/sec Loss 11.4559 LearningRate 0.2736 Epoch: 5 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:39:21,948-Speed 5973.68 samples/sec Loss 11.4539 LearningRate 0.2736 Epoch: 5 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:39:28,805-Speed 5974.99 samples/sec Loss 11.4847 LearningRate 0.2736 Epoch: 5 Global Step: 53030 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:39:35,659-Speed 5976.83 samples/sec Loss 11.4393 LearningRate 0.2735 Epoch: 5 Global Step: 53040 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:39:42,527-Speed 5967.33 samples/sec Loss 11.4943 LearningRate 0.2735 Epoch: 5 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:39:49,389-Speed 5970.42 samples/sec Loss 11.5117 LearningRate 0.2735 Epoch: 5 Global Step: 53060 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:39:56,248-Speed 5972.71 samples/sec Loss 11.4543 LearningRate 0.2734 Epoch: 5 Global Step: 53070 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:40:03,116-Speed 5965.12 samples/sec Loss 11.4938 LearningRate 0.2734 Epoch: 5 Global Step: 53080 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:40:09,958-Speed 5987.26 samples/sec Loss 11.4405 LearningRate 0.2733 Epoch: 5 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:40:16,807-Speed 5981.37 samples/sec Loss 11.4074 LearningRate 0.2733 Epoch: 5 Global Step: 53100 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:40:23,656-Speed 5981.98 samples/sec Loss 11.4537 LearningRate 0.2733 Epoch: 5 Global Step: 53110 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:40:30,504-Speed 5982.78 samples/sec Loss 11.4024 LearningRate 0.2732 Epoch: 5 Global Step: 53120 
Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:40:37,449-Speed 5898.38 samples/sec Loss 11.4746 LearningRate 0.2732 Epoch: 5 Global Step: 53130 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:40:44,311-Speed 5971.76 samples/sec Loss 11.4222 LearningRate 0.2732 Epoch: 5 Global Step: 53140 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:40:51,269-Speed 5889.15 samples/sec Loss 11.3964 LearningRate 0.2731 Epoch: 5 Global Step: 53150 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:40:58,223-Speed 5891.04 samples/sec Loss 11.4217 LearningRate 0.2731 Epoch: 5 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:41:05,176-Speed 5892.43 samples/sec Loss 11.4027 LearningRate 0.2731 Epoch: 5 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:41:12,026-Speed 5980.60 samples/sec Loss 11.5166 LearningRate 0.2730 Epoch: 5 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:41:18,970-Speed 5899.12 samples/sec Loss 11.4285 LearningRate 0.2730 Epoch: 5 Global Step: 53190 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:41:25,848-Speed 5956.92 samples/sec Loss 11.4626 LearningRate 0.2730 Epoch: 5 Global Step: 53200 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:41:32,701-Speed 5977.95 samples/sec Loss 11.4985 LearningRate 0.2729 Epoch: 5 Global Step: 53210 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:41:39,556-Speed 5975.96 samples/sec Loss 11.4204 LearningRate 0.2729 Epoch: 5 Global Step: 53220 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:41:46,386-Speed 5999.34 samples/sec Loss 11.4263 LearningRate 0.2729 Epoch: 5 Global Step: 53230 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:41:53,258-Speed 5963.56 samples/sec Loss 11.4999 LearningRate 0.2728 Epoch: 5 Global Step: 53240 Fp16 Grad Scale: 32768 Required: 
30 hours Training: 2022-01-08 06:42:00,106-Speed 5981.39 samples/sec Loss 11.4257 LearningRate 0.2728 Epoch: 5 Global Step: 53250 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:42:06,946-Speed 5990.27 samples/sec Loss 11.4249 LearningRate 0.2727 Epoch: 5 Global Step: 53260 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:42:13,828-Speed 5953.03 samples/sec Loss 11.4548 LearningRate 0.2727 Epoch: 5 Global Step: 53270 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:42:20,688-Speed 5971.19 samples/sec Loss 11.4405 LearningRate 0.2727 Epoch: 5 Global Step: 53280 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:42:27,542-Speed 5977.55 samples/sec Loss 11.4692 LearningRate 0.2726 Epoch: 5 Global Step: 53290 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:42:34,981-Speed 5509.63 samples/sec Loss 11.4277 LearningRate 0.2726 Epoch: 5 Global Step: 53300 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:42:41,824-Speed 5986.40 samples/sec Loss 11.3748 LearningRate 0.2726 Epoch: 5 Global Step: 53310 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:42:48,681-Speed 5974.15 samples/sec Loss 11.4258 LearningRate 0.2725 Epoch: 5 Global Step: 53320 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 06:42:55,546-Speed 5967.79 samples/sec Loss 11.4444 LearningRate 0.2725 Epoch: 5 Global Step: 53330 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:43:02,406-Speed 5972.32 samples/sec Loss 11.5383 LearningRate 0.2725 Epoch: 5 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:43:09,274-Speed 5964.77 samples/sec Loss 11.4528 LearningRate 0.2724 Epoch: 5 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:43:16,118-Speed 5986.23 samples/sec Loss 11.4702 LearningRate 0.2724 Epoch: 5 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 
06:43:22,968-Speed 5980.12 samples/sec Loss 11.3987 LearningRate 0.2724 Epoch: 5 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:43:29,862-Speed 5942.71 samples/sec Loss 11.3670 LearningRate 0.2723 Epoch: 5 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:43:36,764-Speed 5935.59 samples/sec Loss 11.4128 LearningRate 0.2723 Epoch: 5 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:43:43,632-Speed 5964.82 samples/sec Loss 11.4444 LearningRate 0.2723 Epoch: 5 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:43:50,493-Speed 5972.74 samples/sec Loss 11.4268 LearningRate 0.2722 Epoch: 5 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:43:57,394-Speed 5937.26 samples/sec Loss 11.4234 LearningRate 0.2722 Epoch: 5 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:44:04,270-Speed 5957.89 samples/sec Loss 11.4684 LearningRate 0.2721 Epoch: 5 Global Step: 53430 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:44:11,119-Speed 5981.42 samples/sec Loss 11.5047 LearningRate 0.2721 Epoch: 5 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:44:18,068-Speed 5896.07 samples/sec Loss 11.4768 LearningRate 0.2721 Epoch: 5 Global Step: 53450 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:44:25,012-Speed 5899.47 samples/sec Loss 11.3417 LearningRate 0.2720 Epoch: 5 Global Step: 53460 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:44:31,858-Speed 5985.91 samples/sec Loss 11.4627 LearningRate 0.2720 Epoch: 5 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:44:38,717-Speed 5973.08 samples/sec Loss 11.4122 LearningRate 0.2720 Epoch: 5 Global Step: 53480 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:44:45,570-Speed 5977.76 
samples/sec Loss 11.4744 LearningRate 0.2719 Epoch: 5 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:44:52,434-Speed 5968.42 samples/sec Loss 11.3469 LearningRate 0.2719 Epoch: 5 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:44:59,286-Speed 5980.10 samples/sec Loss 11.4121 LearningRate 0.2719 Epoch: 5 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:45:06,285-Speed 5853.04 samples/sec Loss 11.3730 LearningRate 0.2718 Epoch: 5 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:45:13,147-Speed 5970.30 samples/sec Loss 11.4255 LearningRate 0.2718 Epoch: 5 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:45:20,010-Speed 5969.52 samples/sec Loss 11.3954 LearningRate 0.2718 Epoch: 5 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:45:26,870-Speed 5972.06 samples/sec Loss 11.3858 LearningRate 0.2717 Epoch: 5 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:45:33,727-Speed 5975.07 samples/sec Loss 11.4615 LearningRate 0.2717 Epoch: 5 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:45:40,580-Speed 5978.05 samples/sec Loss 11.4036 LearningRate 0.2717 Epoch: 5 Global Step: 53570 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:45:47,435-Speed 5975.90 samples/sec Loss 11.3989 LearningRate 0.2716 Epoch: 5 Global Step: 53580 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:45:54,285-Speed 5980.48 samples/sec Loss 11.3275 LearningRate 0.2716 Epoch: 5 Global Step: 53590 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:46:01,136-Speed 5980.12 samples/sec Loss 11.4364 LearningRate 0.2715 Epoch: 5 Global Step: 53600 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:46:07,994-Speed 5973.24 samples/sec Loss 11.3624 
LearningRate 0.2715 Epoch: 5 Global Step: 53610 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:46:14,836-Speed 5987.97 samples/sec Loss 11.3900 LearningRate 0.2715 Epoch: 5 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:46:21,696-Speed 5971.41 samples/sec Loss 11.3929 LearningRate 0.2714 Epoch: 5 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:46:28,540-Speed 5986.01 samples/sec Loss 11.4063 LearningRate 0.2714 Epoch: 5 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:46:35,405-Speed 5967.69 samples/sec Loss 11.4424 LearningRate 0.2714 Epoch: 5 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:46:42,290-Speed 5950.09 samples/sec Loss 11.4733 LearningRate 0.2713 Epoch: 5 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:46:49,150-Speed 5972.10 samples/sec Loss 11.4396 LearningRate 0.2713 Epoch: 5 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:46:56,007-Speed 5974.66 samples/sec Loss 11.4256 LearningRate 0.2713 Epoch: 5 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:47:02,874-Speed 5966.40 samples/sec Loss 11.3658 LearningRate 0.2712 Epoch: 5 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:47:09,721-Speed 5982.53 samples/sec Loss 11.3995 LearningRate 0.2712 Epoch: 5 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:47:16,581-Speed 5972.54 samples/sec Loss 11.4159 LearningRate 0.2712 Epoch: 5 Global Step: 53710 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:47:23,448-Speed 5966.16 samples/sec Loss 11.3732 LearningRate 0.2711 Epoch: 5 Global Step: 53720 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:47:30,306-Speed 5973.20 samples/sec Loss 11.4088 LearningRate 0.2711 Epoch: 5 Global Step: 
53730 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:47:37,204-Speed 5939.51 samples/sec Loss 11.3715 LearningRate 0.2711 Epoch: 5 Global Step: 53740 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:47:44,075-Speed 5962.05 samples/sec Loss 11.4142 LearningRate 0.2710 Epoch: 5 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:47:50,936-Speed 5971.36 samples/sec Loss 11.3491 LearningRate 0.2710 Epoch: 5 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:47:57,783-Speed 5983.27 samples/sec Loss 11.3701 LearningRate 0.2709 Epoch: 5 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:48:04,659-Speed 5958.67 samples/sec Loss 11.4149 LearningRate 0.2709 Epoch: 5 Global Step: 53780 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:48:11,510-Speed 5979.67 samples/sec Loss 11.4449 LearningRate 0.2709 Epoch: 5 Global Step: 53790 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:48:18,359-Speed 5981.25 samples/sec Loss 11.4184 LearningRate 0.2708 Epoch: 5 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:48:25,219-Speed 5972.37 samples/sec Loss 11.4111 LearningRate 0.2708 Epoch: 5 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:48:32,072-Speed 5977.64 samples/sec Loss 11.3983 LearningRate 0.2708 Epoch: 5 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:48:38,922-Speed 5980.52 samples/sec Loss 11.3664 LearningRate 0.2707 Epoch: 5 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:48:45,780-Speed 5973.83 samples/sec Loss 11.4068 LearningRate 0.2707 Epoch: 5 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:48:52,630-Speed 5980.14 samples/sec Loss 11.4449 LearningRate 0.2707 Epoch: 5 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 
30 hours Training: 2022-01-08 06:48:59,469-Speed 5990.50 samples/sec Loss 11.4383 LearningRate 0.2706 Epoch: 5 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:49:06,315-Speed 5984.27 samples/sec Loss 11.3599 LearningRate 0.2706 Epoch: 5 Global Step: 53870 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:49:13,181-Speed 5966.27 samples/sec Loss 11.3964 LearningRate 0.2706 Epoch: 5 Global Step: 53880 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:49:20,037-Speed 5975.77 samples/sec Loss 11.3974 LearningRate 0.2705 Epoch: 5 Global Step: 53890 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:49:26,895-Speed 5973.80 samples/sec Loss 11.3924 LearningRate 0.2705 Epoch: 5 Global Step: 53900 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:49:33,762-Speed 5965.88 samples/sec Loss 11.3776 LearningRate 0.2705 Epoch: 5 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:49:40,636-Speed 5960.02 samples/sec Loss 11.3024 LearningRate 0.2704 Epoch: 5 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:49:47,506-Speed 5963.27 samples/sec Loss 11.4017 LearningRate 0.2704 Epoch: 5 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:49:54,371-Speed 5967.58 samples/sec Loss 11.4781 LearningRate 0.2703 Epoch: 5 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:50:01,261-Speed 5946.49 samples/sec Loss 11.3166 LearningRate 0.2703 Epoch: 5 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:50:08,236-Speed 5873.88 samples/sec Loss 11.3461 LearningRate 0.2703 Epoch: 5 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:50:15,091-Speed 5976.29 samples/sec Loss 11.4040 LearningRate 0.2702 Epoch: 5 Global Step: 53970 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 
06:50:21,936-Speed 5984.84 samples/sec Loss 11.3080 LearningRate 0.2702 Epoch: 5 Global Step: 53980 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:50:28,830-Speed 5953.15 samples/sec Loss 11.3828 LearningRate 0.2702 Epoch: 5 Global Step: 53990 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:50:35,781-Speed 5894.04 samples/sec Loss 11.2722 LearningRate 0.2701 Epoch: 5 Global Step: 54000 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:50:42,650-Speed 5963.89 samples/sec Loss 11.4052 LearningRate 0.2701 Epoch: 5 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:50:49,543-Speed 5944.18 samples/sec Loss 11.4180 LearningRate 0.2701 Epoch: 5 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:50:56,506-Speed 5883.29 samples/sec Loss 11.3255 LearningRate 0.2700 Epoch: 5 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:03,353-Speed 5983.79 samples/sec Loss 11.3233 LearningRate 0.2700 Epoch: 5 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:10,227-Speed 5960.59 samples/sec Loss 11.3665 LearningRate 0.2700 Epoch: 5 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:17,096-Speed 5963.40 samples/sec Loss 11.5146 LearningRate 0.2699 Epoch: 5 Global Step: 54060 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:23,958-Speed 5970.76 samples/sec Loss 11.4004 LearningRate 0.2699 Epoch: 5 Global Step: 54070 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:30,823-Speed 5967.69 samples/sec Loss 11.2959 LearningRate 0.2699 Epoch: 5 Global Step: 54080 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:37,695-Speed 5961.67 samples/sec Loss 11.3539 LearningRate 0.2698 Epoch: 5 Global Step: 54090 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:44,560-Speed 5971.53 
samples/sec Loss 11.4595 LearningRate 0.2698 Epoch: 5 Global Step: 54100 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:51,407-Speed 5983.01 samples/sec Loss 11.3425 LearningRate 0.2697 Epoch: 5 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:51:58,288-Speed 5953.16 samples/sec Loss 11.4040 LearningRate 0.2697 Epoch: 5 Global Step: 54120 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:05,134-Speed 5985.29 samples/sec Loss 11.3596 LearningRate 0.2697 Epoch: 5 Global Step: 54130 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:11,986-Speed 5980.48 samples/sec Loss 11.3537 LearningRate 0.2696 Epoch: 5 Global Step: 54140 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:18,825-Speed 5989.80 samples/sec Loss 11.4251 LearningRate 0.2696 Epoch: 5 Global Step: 54150 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:25,675-Speed 5981.08 samples/sec Loss 11.3821 LearningRate 0.2696 Epoch: 5 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:32,561-Speed 5949.21 samples/sec Loss 11.3105 LearningRate 0.2695 Epoch: 5 Global Step: 54170 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:39,415-Speed 5976.58 samples/sec Loss 11.3226 LearningRate 0.2695 Epoch: 5 Global Step: 54180 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:46,266-Speed 5979.74 samples/sec Loss 11.3690 LearningRate 0.2695 Epoch: 5 Global Step: 54190 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:53,116-Speed 5981.14 samples/sec Loss 11.3985 LearningRate 0.2694 Epoch: 5 Global Step: 54200 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:52:59,983-Speed 5965.98 samples/sec Loss 11.2754 LearningRate 0.2694 Epoch: 5 Global Step: 54210 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:53:06,851-Speed 5964.57 samples/sec Loss 11.4193 
LearningRate 0.2694 Epoch: 5 Global Step: 54220 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:53:13,705-Speed 5977.81 samples/sec Loss 11.3658 LearningRate 0.2693 Epoch: 5 Global Step: 54230 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:53:20,617-Speed 5926.34 samples/sec Loss 11.3486 LearningRate 0.2693 Epoch: 5 Global Step: 54240 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:53:27,473-Speed 5975.99 samples/sec Loss 11.4357 LearningRate 0.2693 Epoch: 5 Global Step: 54250 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:53:34,324-Speed 5980.77 samples/sec Loss 11.3464 LearningRate 0.2692 Epoch: 5 Global Step: 54260 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:53:41,222-Speed 5939.14 samples/sec Loss 11.4246 LearningRate 0.2692 Epoch: 5 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:53:48,068-Speed 5984.64 samples/sec Loss 11.3295 LearningRate 0.2691 Epoch: 5 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:53:54,919-Speed 5979.83 samples/sec Loss 11.3563 LearningRate 0.2691 Epoch: 5 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:54:01,794-Speed 5959.03 samples/sec Loss 11.3402 LearningRate 0.2691 Epoch: 5 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:54:08,663-Speed 5963.95 samples/sec Loss 11.3827 LearningRate 0.2690 Epoch: 5 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:54:15,542-Speed 5956.16 samples/sec Loss 11.3982 LearningRate 0.2690 Epoch: 5 Global Step: 54320 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:54:22,384-Speed 5987.38 samples/sec Loss 11.3158 LearningRate 0.2690 Epoch: 5 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:54:29,244-Speed 5972.18 samples/sec Loss 11.2903 LearningRate 0.2689 Epoch: 5 
Global Step: 54340 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:54:36,115-Speed 5963.28 samples/sec Loss 11.4444 LearningRate 0.2689 Epoch: 5 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:54:42,970-Speed 5976.70 samples/sec Loss 11.3694 LearningRate 0.2689 Epoch: 5 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:54:49,855-Speed 5951.75 samples/sec Loss 11.3395 LearningRate 0.2688 Epoch: 5 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:54:56,728-Speed 5960.10 samples/sec Loss 11.2919 LearningRate 0.2688 Epoch: 5 Global Step: 54380 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:55:03,573-Speed 5985.00 samples/sec Loss 11.3932 LearningRate 0.2688 Epoch: 5 Global Step: 54390 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:55:10,427-Speed 5977.59 samples/sec Loss 11.2594 LearningRate 0.2687 Epoch: 5 Global Step: 54400 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:55:17,285-Speed 5973.86 samples/sec Loss 11.4542 LearningRate 0.2687 Epoch: 5 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:55:24,140-Speed 5975.82 samples/sec Loss 11.3647 LearningRate 0.2687 Epoch: 5 Global Step: 54420 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:55:31,070-Speed 5971.97 samples/sec Loss 11.3228 LearningRate 0.2686 Epoch: 5 Global Step: 54430 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:55:37,967-Speed 5940.85 samples/sec Loss 11.4798 LearningRate 0.2686 Epoch: 5 Global Step: 54440 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:55:44,823-Speed 5975.10 samples/sec Loss 11.3424 LearningRate 0.2686 Epoch: 5 Global Step: 54450 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:55:51,678-Speed 5976.14 samples/sec Loss 11.3954 LearningRate 0.2685 Epoch: 5 Global Step: 54460 Fp16 Grad 
Scale: 262144 Required: 30 hours Training: 2022-01-08 06:55:58,524-Speed 5985.86 samples/sec Loss 11.3171 LearningRate 0.2685 Epoch: 5 Global Step: 54470 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:56:05,403-Speed 5954.29 samples/sec Loss 11.3385 LearningRate 0.2684 Epoch: 5 Global Step: 54480 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:56:12,254-Speed 5980.02 samples/sec Loss 11.3265 LearningRate 0.2684 Epoch: 5 Global Step: 54490 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:56:19,097-Speed 5987.33 samples/sec Loss 11.2806 LearningRate 0.2684 Epoch: 5 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:56:25,959-Speed 5969.43 samples/sec Loss 11.2783 LearningRate 0.2683 Epoch: 5 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:56:32,818-Speed 5973.29 samples/sec Loss 11.3985 LearningRate 0.2683 Epoch: 5 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:56:39,698-Speed 5955.07 samples/sec Loss 11.3476 LearningRate 0.2683 Epoch: 5 Global Step: 54530 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:56:46,551-Speed 5978.08 samples/sec Loss 11.3186 LearningRate 0.2682 Epoch: 5 Global Step: 54540 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:56:53,415-Speed 5967.98 samples/sec Loss 11.3572 LearningRate 0.2682 Epoch: 5 Global Step: 54550 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:57:00,258-Speed 5987.03 samples/sec Loss 11.4009 LearningRate 0.2682 Epoch: 5 Global Step: 54560 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:57:07,123-Speed 5967.87 samples/sec Loss 11.2490 LearningRate 0.2681 Epoch: 5 Global Step: 54570 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:57:13,995-Speed 5961.65 samples/sec Loss 11.3333 LearningRate 0.2681 Epoch: 5 Global Step: 54580 Fp16 Grad Scale: 131072 Required: 30 
hours Training: 2022-01-08 06:57:20,878-Speed 5953.73 samples/sec Loss 11.2777 LearningRate 0.2681 Epoch: 5 Global Step: 54590 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:57:27,762-Speed 5951.07 samples/sec Loss 11.3559 LearningRate 0.2680 Epoch: 5 Global Step: 54600 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:57:34,614-Speed 5978.98 samples/sec Loss 11.4183 LearningRate 0.2680 Epoch: 5 Global Step: 54610 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:57:41,472-Speed 5973.73 samples/sec Loss 11.2914 LearningRate 0.2680 Epoch: 5 Global Step: 54620 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 06:57:48,318-Speed 5983.95 samples/sec Loss 11.3474 LearningRate 0.2679 Epoch: 5 Global Step: 54630 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:57:55,176-Speed 5973.57 samples/sec Loss 11.4114 LearningRate 0.2679 Epoch: 5 Global Step: 54640 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:58:02,040-Speed 5969.28 samples/sec Loss 11.3737 LearningRate 0.2678 Epoch: 5 Global Step: 54650 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:58:08,915-Speed 5958.87 samples/sec Loss 11.3172 LearningRate 0.2678 Epoch: 5 Global Step: 54660 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:58:15,772-Speed 5974.46 samples/sec Loss 11.3154 LearningRate 0.2678 Epoch: 5 Global Step: 54670 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:58:22,631-Speed 5973.27 samples/sec Loss 11.3862 LearningRate 0.2677 Epoch: 5 Global Step: 54680 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:58:29,482-Speed 5979.06 samples/sec Loss 11.3681 LearningRate 0.2677 Epoch: 5 Global Step: 54690 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:58:36,346-Speed 5968.63 samples/sec Loss 11.3612 LearningRate 0.2677 Epoch: 5 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 
06:58:43,227-Speed 5953.98 samples/sec Loss 11.3320 LearningRate 0.2676 Epoch: 5 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:58:50,071-Speed 5984.94 samples/sec Loss 11.2942 LearningRate 0.2676 Epoch: 5 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 06:58:56,920-Speed 5981.85 samples/sec Loss 11.2745 LearningRate 0.2676 Epoch: 5 Global Step: 54730 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:03,776-Speed 5975.87 samples/sec Loss 11.2779 LearningRate 0.2675 Epoch: 5 Global Step: 54740 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:10,639-Speed 5968.60 samples/sec Loss 11.3431 LearningRate 0.2675 Epoch: 5 Global Step: 54750 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:17,487-Speed 5982.60 samples/sec Loss 11.2955 LearningRate 0.2675 Epoch: 5 Global Step: 54760 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:24,337-Speed 5981.05 samples/sec Loss 11.2964 LearningRate 0.2674 Epoch: 5 Global Step: 54770 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:31,196-Speed 5973.23 samples/sec Loss 11.2732 LearningRate 0.2674 Epoch: 5 Global Step: 54780 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:38,070-Speed 5960.25 samples/sec Loss 11.3836 LearningRate 0.2674 Epoch: 5 Global Step: 54790 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:44,932-Speed 5969.86 samples/sec Loss 11.2928 LearningRate 0.2673 Epoch: 5 Global Step: 54800 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:51,794-Speed 5969.90 samples/sec Loss 11.3828 LearningRate 0.2673 Epoch: 5 Global Step: 54810 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 06:59:58,643-Speed 5982.13 samples/sec Loss 11.3380 LearningRate 0.2673 Epoch: 5 Global Step: 54820 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:00:05,511-Speed 5965.13 
samples/sec Loss 11.3133 LearningRate 0.2672 Epoch: 5 Global Step: 54830 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:00:12,369-Speed 5973.91 samples/sec Loss 11.3018 LearningRate 0.2672 Epoch: 5 Global Step: 54840 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:00:19,238-Speed 5964.10 samples/sec Loss 11.3175 LearningRate 0.2671 Epoch: 5 Global Step: 54850 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:00:26,094-Speed 5980.12 samples/sec Loss 11.3927 LearningRate 0.2671 Epoch: 5 Global Step: 54860 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:00:32,952-Speed 5973.40 samples/sec Loss 11.2832 LearningRate 0.2671 Epoch: 5 Global Step: 54870 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:00:39,814-Speed 5972.54 samples/sec Loss 11.3478 LearningRate 0.2670 Epoch: 5 Global Step: 54880 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:00:46,684-Speed 5963.28 samples/sec Loss 11.2281 LearningRate 0.2670 Epoch: 5 Global Step: 54890 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:00:53,540-Speed 5975.01 samples/sec Loss 11.3526 LearningRate 0.2670 Epoch: 5 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:01:00,387-Speed 5983.41 samples/sec Loss 11.3697 LearningRate 0.2669 Epoch: 5 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:01:07,247-Speed 5974.15 samples/sec Loss 11.3304 LearningRate 0.2669 Epoch: 5 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:01:14,118-Speed 5962.68 samples/sec Loss 11.2850 LearningRate 0.2669 Epoch: 5 Global Step: 54930 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:01:20,992-Speed 5959.72 samples/sec Loss 11.3023 LearningRate 0.2668 Epoch: 5 Global Step: 54940 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:01:27,840-Speed 5984.66 samples/sec Loss 11.3579 
LearningRate 0.2668 Epoch: 5 Global Step: 54950 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:01:34,725-Speed 5949.96 samples/sec Loss 11.4219 LearningRate 0.2668 Epoch: 5 Global Step: 54960 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:01:41,600-Speed 5959.85 samples/sec Loss 11.3894 LearningRate 0.2667 Epoch: 5 Global Step: 54970 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:01:48,489-Speed 5946.40 samples/sec Loss 11.2812 LearningRate 0.2667 Epoch: 5 Global Step: 54980 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:01:55,359-Speed 5963.68 samples/sec Loss 11.2453 LearningRate 0.2667 Epoch: 5 Global Step: 54990 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:02:02,212-Speed 5978.43 samples/sec Loss 11.2946 LearningRate 0.2666 Epoch: 5 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:02:32,176-[lfw][55000]XNorm: 23.234773 Training: 2022-01-08 07:02:32,177-[lfw][55000]Accuracy-Flip: 0.99667+-0.00289 Training: 2022-01-08 07:02:32,178-[lfw][55000]Accuracy-Highest: 0.99700 Training: 2022-01-08 07:03:02,970-[cfp_fp][55000]XNorm: 20.125612 Training: 2022-01-08 07:03:02,971-[cfp_fp][55000]Accuracy-Flip: 0.97686+-0.00676 Training: 2022-01-08 07:03:02,972-[cfp_fp][55000]Accuracy-Highest: 0.97686 Training: 2022-01-08 07:03:34,405-[agedb_30][55000]XNorm: 22.376593 Training: 2022-01-08 07:03:34,406-[agedb_30][55000]Accuracy-Flip: 0.96150+-0.01050 Training: 2022-01-08 07:03:34,406-[agedb_30][55000]Accuracy-Highest: 0.96283 Training: 2022-01-08 07:03:41,281-Speed 413.45 samples/sec Loss 11.3682 LearningRate 0.2666 Epoch: 5 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:03:48,130-Speed 5981.99 samples/sec Loss 11.3465 LearningRate 0.2666 Epoch: 5 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:03:54,961-Speed 5997.58 samples/sec Loss 11.3370 LearningRate 0.2665 Epoch: 5 
Global Step: 55030 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:04:01,812-Speed 5980.32 samples/sec Loss 11.2941 LearningRate 0.2665 Epoch: 5 Global Step: 55040 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:04:08,664-Speed 5979.25 samples/sec Loss 11.2405 LearningRate 0.2664 Epoch: 5 Global Step: 55050 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:04:15,531-Speed 5965.39 samples/sec Loss 11.3544 LearningRate 0.2664 Epoch: 5 Global Step: 55060 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:04:22,389-Speed 5974.19 samples/sec Loss 11.1859 LearningRate 0.2664 Epoch: 5 Global Step: 55070 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:04:29,255-Speed 5966.66 samples/sec Loss 11.3670 LearningRate 0.2663 Epoch: 5 Global Step: 55080 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:04:36,123-Speed 5965.50 samples/sec Loss 11.1874 LearningRate 0.2663 Epoch: 5 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:04:43,013-Speed 5946.14 samples/sec Loss 11.3795 LearningRate 0.2663 Epoch: 5 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:04:49,881-Speed 5965.34 samples/sec Loss 11.3481 LearningRate 0.2662 Epoch: 5 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:04:56,767-Speed 5948.92 samples/sec Loss 11.3349 LearningRate 0.2662 Epoch: 5 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:05:03,625-Speed 5976.88 samples/sec Loss 11.3137 LearningRate 0.2662 Epoch: 5 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:05:10,489-Speed 5968.54 samples/sec Loss 11.2717 LearningRate 0.2661 Epoch: 5 Global Step: 55140 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:05:17,361-Speed 5961.89 samples/sec Loss 11.4158 LearningRate 0.2661 Epoch: 5 Global Step: 55150 Fp16 Grad Scale: 
131072 Required: 30 hours Training: 2022-01-08 07:05:24,212-Speed 5980.41 samples/sec Loss 11.3715 LearningRate 0.2661 Epoch: 5 Global Step: 55160 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:05:31,072-Speed 5971.92 samples/sec Loss 11.2714 LearningRate 0.2660 Epoch: 5 Global Step: 55170 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:05:37,920-Speed 5982.46 samples/sec Loss 11.3868 LearningRate 0.2660 Epoch: 5 Global Step: 55180 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:05:44,825-Speed 5932.90 samples/sec Loss 11.2595 LearningRate 0.2660 Epoch: 5 Global Step: 55190 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:05:51,697-Speed 5961.19 samples/sec Loss 11.2131 LearningRate 0.2659 Epoch: 5 Global Step: 55200 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:05:58,562-Speed 5968.00 samples/sec Loss 11.2974 LearningRate 0.2659 Epoch: 5 Global Step: 55210 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:06:05,436-Speed 5960.31 samples/sec Loss 11.2015 LearningRate 0.2659 Epoch: 5 Global Step: 55220 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:06:12,295-Speed 5972.60 samples/sec Loss 11.3090 LearningRate 0.2658 Epoch: 5 Global Step: 55230 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:06:19,155-Speed 5972.44 samples/sec Loss 11.2947 LearningRate 0.2658 Epoch: 5 Global Step: 55240 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:06:26,042-Speed 5948.10 samples/sec Loss 11.2690 LearningRate 0.2657 Epoch: 5 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:06:32,900-Speed 5974.03 samples/sec Loss 11.3149 LearningRate 0.2657 Epoch: 5 Global Step: 55260 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:06:39,760-Speed 5972.34 samples/sec Loss 11.3140 LearningRate 0.2657 Epoch: 5 Global Step: 55270 Fp16 Grad Scale: 131072 Required: 30 hours 
Training: 2022-01-08 07:06:46,608-Speed 5981.96 samples/sec Loss 11.2784 LearningRate 0.2656 Epoch: 5 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:06:53,455-Speed 5982.84 samples/sec Loss 11.2905 LearningRate 0.2656 Epoch: 5 Global Step: 55290 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:07:00,315-Speed 5972.29 samples/sec Loss 11.3780 LearningRate 0.2656 Epoch: 5 Global Step: 55300 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:07:07,188-Speed 5961.21 samples/sec Loss 11.2025 LearningRate 0.2655 Epoch: 5 Global Step: 55310 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:07:14,042-Speed 5976.74 samples/sec Loss 11.2624 LearningRate 0.2655 Epoch: 5 Global Step: 55320 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:07:20,910-Speed 5967.40 samples/sec Loss 11.2660 LearningRate 0.2655 Epoch: 5 Global Step: 55330 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:07:27,785-Speed 5959.03 samples/sec Loss 11.2534 LearningRate 0.2654 Epoch: 5 Global Step: 55340 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:07:34,689-Speed 5933.62 samples/sec Loss 11.2732 LearningRate 0.2654 Epoch: 5 Global Step: 55350 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:07:41,536-Speed 5983.39 samples/sec Loss 11.3006 LearningRate 0.2654 Epoch: 5 Global Step: 55360 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:07:48,410-Speed 5962.15 samples/sec Loss 11.2798 LearningRate 0.2653 Epoch: 5 Global Step: 55370 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:07:55,265-Speed 5976.58 samples/sec Loss 11.2999 LearningRate 0.2653 Epoch: 5 Global Step: 55380 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:08:02,210-Speed 5901.31 samples/sec Loss 11.2755 LearningRate 0.2653 Epoch: 5 Global Step: 55390 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 
07:08:09,064-Speed 5977.41 samples/sec Loss 11.2481 LearningRate 0.2652 Epoch: 5 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:08:15,956-Speed 5944.10 samples/sec Loss 11.3533 LearningRate 0.2652 Epoch: 5 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:08:22,835-Speed 5956.06 samples/sec Loss 11.2378 LearningRate 0.2652 Epoch: 5 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:08:29,702-Speed 5965.44 samples/sec Loss 11.2548 LearningRate 0.2651 Epoch: 5 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:08:36,595-Speed 5943.61 samples/sec Loss 11.2941 LearningRate 0.2651 Epoch: 5 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:08:43,469-Speed 5959.94 samples/sec Loss 11.2498 LearningRate 0.2651 Epoch: 5 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:08:50,339-Speed 5963.80 samples/sec Loss 11.2776 LearningRate 0.2650 Epoch: 5 Global Step: 55460 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:08:57,206-Speed 5966.27 samples/sec Loss 11.3533 LearningRate 0.2650 Epoch: 5 Global Step: 55470 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:04,068-Speed 5971.91 samples/sec Loss 11.2544 LearningRate 0.2649 Epoch: 5 Global Step: 55480 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:10,957-Speed 5947.88 samples/sec Loss 11.2965 LearningRate 0.2649 Epoch: 5 Global Step: 55490 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:17,823-Speed 5966.67 samples/sec Loss 11.2688 LearningRate 0.2649 Epoch: 5 Global Step: 55500 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:24,689-Speed 5966.34 samples/sec Loss 11.2649 LearningRate 0.2648 Epoch: 5 Global Step: 55510 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:31,568-Speed 5955.59 
samples/sec Loss 11.2140 LearningRate 0.2648 Epoch: 5 Global Step: 55520 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:38,430-Speed 5970.19 samples/sec Loss 11.2686 LearningRate 0.2648 Epoch: 5 Global Step: 55530 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:45,323-Speed 5943.17 samples/sec Loss 11.3260 LearningRate 0.2647 Epoch: 5 Global Step: 55540 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:52,194-Speed 5962.38 samples/sec Loss 11.3566 LearningRate 0.2647 Epoch: 5 Global Step: 55550 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:09:59,065-Speed 5963.17 samples/sec Loss 11.3220 LearningRate 0.2647 Epoch: 5 Global Step: 55560 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:10:05,935-Speed 5962.97 samples/sec Loss 11.1931 LearningRate 0.2646 Epoch: 5 Global Step: 55570 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:10:12,805-Speed 5963.85 samples/sec Loss 11.2436 LearningRate 0.2646 Epoch: 5 Global Step: 55580 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:10:19,707-Speed 5935.54 samples/sec Loss 11.2994 LearningRate 0.2646 Epoch: 5 Global Step: 55590 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:10:26,718-Speed 5843.92 samples/sec Loss 11.2869 LearningRate 0.2645 Epoch: 5 Global Step: 55600 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:10:33,572-Speed 5978.53 samples/sec Loss 11.2377 LearningRate 0.2645 Epoch: 5 Global Step: 55610 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:10:40,461-Speed 5946.74 samples/sec Loss 11.3155 LearningRate 0.2645 Epoch: 5 Global Step: 55620 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:10:47,338-Speed 5957.32 samples/sec Loss 11.2371 LearningRate 0.2644 Epoch: 5 Global Step: 55630 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:10:54,194-Speed 5975.82 samples/sec Loss 11.2732 
LearningRate 0.2644 Epoch: 5 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:01,065-Speed 5962.57 samples/sec Loss 11.2463 LearningRate 0.2644 Epoch: 5 Global Step: 55650 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:07,935-Speed 5962.66 samples/sec Loss 11.2890 LearningRate 0.2643 Epoch: 5 Global Step: 55660 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:14,792-Speed 5975.27 samples/sec Loss 11.1755 LearningRate 0.2643 Epoch: 5 Global Step: 55670 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:21,649-Speed 5974.03 samples/sec Loss 11.2854 LearningRate 0.2642 Epoch: 5 Global Step: 55680 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:28,503-Speed 5978.06 samples/sec Loss 11.3145 LearningRate 0.2642 Epoch: 5 Global Step: 55690 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:35,375-Speed 5961.16 samples/sec Loss 11.2413 LearningRate 0.2642 Epoch: 5 Global Step: 55700 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:42,268-Speed 5943.34 samples/sec Loss 11.1962 LearningRate 0.2641 Epoch: 5 Global Step: 55710 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:49,144-Speed 5958.98 samples/sec Loss 11.2400 LearningRate 0.2641 Epoch: 5 Global Step: 55720 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:11:56,006-Speed 5970.33 samples/sec Loss 11.2156 LearningRate 0.2641 Epoch: 5 Global Step: 55730 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:12:02,976-Speed 5877.63 samples/sec Loss 11.3133 LearningRate 0.2640 Epoch: 5 Global Step: 55740 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:12:09,844-Speed 5964.93 samples/sec Loss 11.2011 LearningRate 0.2640 Epoch: 5 Global Step: 55750 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:12:16,695-Speed 5979.67 samples/sec Loss 11.2513 LearningRate 0.2640 Epoch: 5 
Global Step: 55760 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:12:23,574-Speed 5956.12 samples/sec Loss 11.1813 LearningRate 0.2639 Epoch: 5 Global Step: 55770 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:12:30,428-Speed 5977.08 samples/sec Loss 11.2444 LearningRate 0.2639 Epoch: 5 Global Step: 55780 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:12:37,282-Speed 5977.25 samples/sec Loss 11.2316 LearningRate 0.2639 Epoch: 5 Global Step: 55790 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:12:44,156-Speed 5959.49 samples/sec Loss 11.2493 LearningRate 0.2638 Epoch: 5 Global Step: 55800 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:12:51,012-Speed 5975.21 samples/sec Loss 11.1891 LearningRate 0.2638 Epoch: 5 Global Step: 55810 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:12:57,885-Speed 5961.72 samples/sec Loss 11.3172 LearningRate 0.2638 Epoch: 5 Global Step: 55820 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:13:04,746-Speed 5971.18 samples/sec Loss 11.2683 LearningRate 0.2637 Epoch: 5 Global Step: 55830 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:13:11,597-Speed 5980.17 samples/sec Loss 11.2734 LearningRate 0.2637 Epoch: 5 Global Step: 55840 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:13:18,454-Speed 5974.15 samples/sec Loss 11.3040 LearningRate 0.2637 Epoch: 5 Global Step: 55850 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:13:25,312-Speed 5973.98 samples/sec Loss 11.2346 LearningRate 0.2636 Epoch: 5 Global Step: 55860 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:13:32,173-Speed 5970.53 samples/sec Loss 11.2489 LearningRate 0.2636 Epoch: 5 Global Step: 55870 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:13:39,054-Speed 5954.17 samples/sec Loss 11.2300 LearningRate 0.2636 Epoch: 5 Global Step: 55880 Fp16 Grad 
Scale: 262144 Required: 30 hours Training: 2022-01-08 07:13:45,935-Speed 5953.85 samples/sec Loss 11.2527 LearningRate 0.2635 Epoch: 5 Global Step: 55890 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:13:52,788-Speed 5978.73 samples/sec Loss 11.3270 LearningRate 0.2635 Epoch: 5 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:13:59,641-Speed 5978.43 samples/sec Loss 11.2888 LearningRate 0.2634 Epoch: 5 Global Step: 55910 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:14:06,494-Speed 5978.24 samples/sec Loss 11.3083 LearningRate 0.2634 Epoch: 5 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:14:13,380-Speed 5952.74 samples/sec Loss 11.2782 LearningRate 0.2634 Epoch: 5 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:14:20,271-Speed 5947.16 samples/sec Loss 11.2032 LearningRate 0.2633 Epoch: 5 Global Step: 55940 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:14:27,140-Speed 5964.34 samples/sec Loss 11.2626 LearningRate 0.2633 Epoch: 5 Global Step: 55950 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:14:33,987-Speed 5983.13 samples/sec Loss 11.2855 LearningRate 0.2633 Epoch: 5 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:14:40,846-Speed 5972.81 samples/sec Loss 11.2210 LearningRate 0.2632 Epoch: 5 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:14:47,720-Speed 5960.15 samples/sec Loss 11.1144 LearningRate 0.2632 Epoch: 5 Global Step: 55980 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:14:54,584-Speed 5970.07 samples/sec Loss 11.2511 LearningRate 0.2632 Epoch: 5 Global Step: 55990 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:15:01,431-Speed 5983.56 samples/sec Loss 11.2357 LearningRate 0.2631 Epoch: 5 Global Step: 56000 Fp16 Grad Scale: 131072 Required: 30 
hours Training: 2022-01-08 07:15:08,314-Speed 5952.60 samples/sec Loss 11.2155 LearningRate 0.2631 Epoch: 5 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:15:15,170-Speed 5975.42 samples/sec Loss 11.2207 LearningRate 0.2631 Epoch: 5 Global Step: 56020 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:15:22,049-Speed 5955.96 samples/sec Loss 11.2095 LearningRate 0.2630 Epoch: 5 Global Step: 56030 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:15:28,900-Speed 5978.96 samples/sec Loss 11.1279 LearningRate 0.2630 Epoch: 5 Global Step: 56040 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:15:35,745-Speed 5985.21 samples/sec Loss 11.1886 LearningRate 0.2630 Epoch: 5 Global Step: 56050 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:15:42,592-Speed 5983.65 samples/sec Loss 11.2398 LearningRate 0.2629 Epoch: 5 Global Step: 56060 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:15:49,465-Speed 5959.95 samples/sec Loss 11.2409 LearningRate 0.2629 Epoch: 5 Global Step: 56070 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:15:56,399-Speed 5909.10 samples/sec Loss 11.2019 LearningRate 0.2629 Epoch: 5 Global Step: 56080 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:16:03,334-Speed 5906.86 samples/sec Loss 11.2150 LearningRate 0.2628 Epoch: 5 Global Step: 56090 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:16:10,285-Speed 5893.50 samples/sec Loss 11.1812 LearningRate 0.2628 Epoch: 5 Global Step: 56100 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:16:17,184-Speed 5938.28 samples/sec Loss 11.2175 LearningRate 0.2628 Epoch: 5 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:16:24,121-Speed 5905.79 samples/sec Loss 11.1985 LearningRate 0.2627 Epoch: 5 Global Step: 56120 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 
07:16:30,991-Speed 5963.35 samples/sec Loss 11.2451 LearningRate 0.2627 Epoch: 5 Global Step: 56130 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:16:37,839-Speed 5981.65 samples/sec Loss 11.2638 LearningRate 0.2626 Epoch: 5 Global Step: 56140 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:16:44,698-Speed 5975.43 samples/sec Loss 11.1873 LearningRate 0.2626 Epoch: 5 Global Step: 56150 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:16:51,546-Speed 5982.08 samples/sec Loss 11.2722 LearningRate 0.2626 Epoch: 5 Global Step: 56160 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:16:58,417-Speed 5961.83 samples/sec Loss 11.2465 LearningRate 0.2625 Epoch: 5 Global Step: 56170 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:17:05,271-Speed 5977.09 samples/sec Loss 11.1509 LearningRate 0.2625 Epoch: 5 Global Step: 56180 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:17:12,127-Speed 5975.80 samples/sec Loss 11.2253 LearningRate 0.2625 Epoch: 5 Global Step: 56190 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:17:19,000-Speed 5960.23 samples/sec Loss 11.1712 LearningRate 0.2624 Epoch: 5 Global Step: 56200 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:17:25,849-Speed 5982.22 samples/sec Loss 11.2300 LearningRate 0.2624 Epoch: 5 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:17:32,707-Speed 5973.40 samples/sec Loss 11.1865 LearningRate 0.2624 Epoch: 5 Global Step: 56220 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:17:39,564-Speed 5975.78 samples/sec Loss 11.2988 LearningRate 0.2623 Epoch: 5 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:17:46,414-Speed 5980.73 samples/sec Loss 11.2335 LearningRate 0.2623 Epoch: 5 Global Step: 56240 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:17:53,290-Speed 5957.87 samples/sec 
Loss 11.2314 LearningRate 0.2623 Epoch: 5 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:18:00,137-Speed 5983.42 samples/sec Loss 11.1939 LearningRate 0.2622 Epoch: 5 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:18:06,985-Speed 5982.16 samples/sec Loss 11.1826 LearningRate 0.2622 Epoch: 5 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:18:13,833-Speed 5982.02 samples/sec Loss 11.2296 LearningRate 0.2622 Epoch: 5 Global Step: 56280 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:18:20,728-Speed 5941.85 samples/sec Loss 11.1156 LearningRate 0.2621 Epoch: 5 Global Step: 56290 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:18:27,589-Speed 5971.21 samples/sec Loss 11.2010 LearningRate 0.2621 Epoch: 5 Global Step: 56300 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:18:34,452-Speed 5968.83 samples/sec Loss 11.1554 LearningRate 0.2621 Epoch: 5 Global Step: 56310 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:18:41,308-Speed 5975.89 samples/sec Loss 11.2020 LearningRate 0.2620 Epoch: 5 Global Step: 56320 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:18:48,176-Speed 5965.08 samples/sec Loss 11.2208 LearningRate 0.2620 Epoch: 5 Global Step: 56330 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:18:55,072-Speed 5939.95 samples/sec Loss 11.1764 LearningRate 0.2620 Epoch: 5 Global Step: 56340 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:19:01,940-Speed 5965.01 samples/sec Loss 11.1523 LearningRate 0.2619 Epoch: 5 Global Step: 56350 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:19:08,819-Speed 5955.73 samples/sec Loss 11.2187 LearningRate 0.2619 Epoch: 5 Global Step: 56360 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:19:15,672-Speed 5977.91 samples/sec Loss 11.2089 LearningRate 
0.2619 Epoch: 5 Global Step: 56370 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:19:22,552-Speed 5962.33 samples/sec Loss 11.1389 LearningRate 0.2618 Epoch: 5 Global Step: 56380 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:19:29,421-Speed 5964.39 samples/sec Loss 11.1998 LearningRate 0.2618 Epoch: 5 Global Step: 56390 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:19:36,320-Speed 5938.21 samples/sec Loss 11.1104 LearningRate 0.2617 Epoch: 5 Global Step: 56400 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:19:43,194-Speed 5959.71 samples/sec Loss 11.2275 LearningRate 0.2617 Epoch: 5 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:19:50,054-Speed 5972.11 samples/sec Loss 11.2906 LearningRate 0.2617 Epoch: 5 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:19:56,923-Speed 5964.43 samples/sec Loss 11.1708 LearningRate 0.2616 Epoch: 5 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:20:03,774-Speed 5981.04 samples/sec Loss 11.2191 LearningRate 0.2616 Epoch: 5 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:20:10,640-Speed 5966.60 samples/sec Loss 11.2279 LearningRate 0.2616 Epoch: 5 Global Step: 56450 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:20:17,516-Speed 5958.03 samples/sec Loss 11.1249 LearningRate 0.2615 Epoch: 5 Global Step: 56460 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:20:24,426-Speed 5928.93 samples/sec Loss 11.1703 LearningRate 0.2615 Epoch: 5 Global Step: 56470 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:20:31,332-Speed 5932.32 samples/sec Loss 11.2674 LearningRate 0.2615 Epoch: 5 Global Step: 56480 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:20:38,211-Speed 5955.40 samples/sec Loss 11.2668 LearningRate 0.2614 Epoch: 5 Global Step: 
56490 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:20:45,187-Speed 5875.15 samples/sec Loss 11.2227 LearningRate 0.2614 Epoch: 5 Global Step: 56500 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:20:52,049-Speed 5971.14 samples/sec Loss 11.2073 LearningRate 0.2614 Epoch: 5 Global Step: 56510 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:20:58,915-Speed 5965.88 samples/sec Loss 11.1899 LearningRate 0.2613 Epoch: 5 Global Step: 56520 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:21:05,784-Speed 5966.88 samples/sec Loss 11.2031 LearningRate 0.2613 Epoch: 5 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:21:12,637-Speed 5978.84 samples/sec Loss 11.2031 LearningRate 0.2613 Epoch: 5 Global Step: 56540 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:21:19,485-Speed 5981.79 samples/sec Loss 11.2058 LearningRate 0.2612 Epoch: 5 Global Step: 56550 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:21:26,327-Speed 5987.62 samples/sec Loss 11.1969 LearningRate 0.2612 Epoch: 5 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:21:33,176-Speed 5981.49 samples/sec Loss 11.2458 LearningRate 0.2612 Epoch: 5 Global Step: 56570 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:21:40,025-Speed 5981.20 samples/sec Loss 11.2224 LearningRate 0.2611 Epoch: 5 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:21:46,895-Speed 5963.28 samples/sec Loss 11.1606 LearningRate 0.2611 Epoch: 5 Global Step: 56590 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:21:53,746-Speed 5980.41 samples/sec Loss 11.1983 LearningRate 0.2611 Epoch: 5 Global Step: 56600 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:22:00,619-Speed 5960.59 samples/sec Loss 11.1825 LearningRate 0.2610 Epoch: 5 Global Step: 56610 Fp16 Grad Scale: 262144 
Required: 30 hours Training: 2022-01-08 07:22:07,479-Speed 5972.07 samples/sec Loss 11.1591 LearningRate 0.2610 Epoch: 5 Global Step: 56620 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:22:14,328-Speed 5981.36 samples/sec Loss 11.2154 LearningRate 0.2609 Epoch: 5 Global Step: 56630 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:22:21,189-Speed 5970.78 samples/sec Loss 11.2107 LearningRate 0.2609 Epoch: 5 Global Step: 56640 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:22:28,038-Speed 5981.50 samples/sec Loss 11.0809 LearningRate 0.2609 Epoch: 5 Global Step: 56650 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:22:34,895-Speed 5975.02 samples/sec Loss 11.1928 LearningRate 0.2608 Epoch: 5 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:22:41,739-Speed 5985.70 samples/sec Loss 11.2299 LearningRate 0.2608 Epoch: 5 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:22:48,591-Speed 5979.46 samples/sec Loss 11.1396 LearningRate 0.2608 Epoch: 5 Global Step: 56680 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:22:55,447-Speed 5977.86 samples/sec Loss 11.0586 LearningRate 0.2607 Epoch: 5 Global Step: 56690 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:23:02,310-Speed 5969.07 samples/sec Loss 11.2740 LearningRate 0.2607 Epoch: 5 Global Step: 56700 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:23:09,164-Speed 5977.03 samples/sec Loss 11.1954 LearningRate 0.2607 Epoch: 5 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:23:16,032-Speed 5969.37 samples/sec Loss 11.2045 LearningRate 0.2606 Epoch: 5 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:23:22,893-Speed 5970.52 samples/sec Loss 11.2336 LearningRate 0.2606 Epoch: 5 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 
07:23:29,733-Speed 5989.80 samples/sec Loss 11.0642 LearningRate 0.2606 Epoch: 5 Global Step: 56740 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 07:23:36,599-Speed 5967.33 samples/sec Loss 11.0882 LearningRate 0.2605 Epoch: 5 Global Step: 56750 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:23:43,445-Speed 5983.57 samples/sec Loss 11.1613 LearningRate 0.2605 Epoch: 5 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:23:50,299-Speed 5977.16 samples/sec Loss 11.0792 LearningRate 0.2605 Epoch: 5 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:23:57,174-Speed 5959.66 samples/sec Loss 11.2475 LearningRate 0.2604 Epoch: 5 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:24:04,014-Speed 5989.47 samples/sec Loss 11.2009 LearningRate 0.2604 Epoch: 5 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:24:10,899-Speed 5950.98 samples/sec Loss 11.1939 LearningRate 0.2604 Epoch: 5 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:24:17,788-Speed 5951.35 samples/sec Loss 11.2414 LearningRate 0.2603 Epoch: 5 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:24:24,639-Speed 5979.71 samples/sec Loss 11.1258 LearningRate 0.2603 Epoch: 5 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:24:31,483-Speed 5985.67 samples/sec Loss 11.2144 LearningRate 0.2603 Epoch: 5 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:24:38,358-Speed 5959.13 samples/sec Loss 11.0928 LearningRate 0.2602 Epoch: 5 Global Step: 56840 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 07:24:45,206-Speed 5982.78 samples/sec Loss 11.1562 LearningRate 0.2602 Epoch: 5 Global Step: 56850 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:24:52,052-Speed 5983.43 
samples/sec Loss 11.1492 LearningRate 0.2602 Epoch: 5 Global Step: 56860 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:24:58,929-Speed 5983.41 samples/sec Loss 11.1047 LearningRate 0.2601 Epoch: 5 Global Step: 56870 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:25:05,797-Speed 5964.21 samples/sec Loss 11.1577 LearningRate 0.2601 Epoch: 5 Global Step: 56880 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 07:25:12,669-Speed 5962.20 samples/sec Loss 11.2167 LearningRate 0.2600 Epoch: 5 Global Step: 56890 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:25:19,525-Speed 5975.20 samples/sec Loss 11.1963 LearningRate 0.2600 Epoch: 5 Global Step: 56900 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:25:26,389-Speed 5968.33 samples/sec Loss 11.2066 LearningRate 0.2600 Epoch: 5 Global Step: 56910 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:25:33,236-Speed 5983.67 samples/sec Loss 11.1415 LearningRate 0.2599 Epoch: 5 Global Step: 56920 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:25:40,080-Speed 5986.09 samples/sec Loss 11.2373 LearningRate 0.2599 Epoch: 5 Global Step: 56930 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:25:46,934-Speed 5976.31 samples/sec Loss 11.1345 LearningRate 0.2599 Epoch: 5 Global Step: 56940 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:25:53,780-Speed 5984.50 samples/sec Loss 11.1296 LearningRate 0.2598 Epoch: 5 Global Step: 56950 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:26:00,629-Speed 5981.02 samples/sec Loss 11.1936 LearningRate 0.2598 Epoch: 5 Global Step: 56960 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:26:07,473-Speed 5985.41 samples/sec Loss 11.1373 LearningRate 0.2598 Epoch: 5 Global Step: 56970 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:26:14,341-Speed 5965.40 samples/sec Loss 11.2421 
LearningRate 0.2597 Epoch: 5 Global Step: 56980 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:26:21,196-Speed 5976.15 samples/sec Loss 11.1271 LearningRate 0.2597 Epoch: 5 Global Step: 56990 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:26:28,060-Speed 5971.05 samples/sec Loss 11.1361 LearningRate 0.2597 Epoch: 5 Global Step: 57000 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:26:34,924-Speed 5968.22 samples/sec Loss 11.0740 LearningRate 0.2596 Epoch: 5 Global Step: 57010 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:26:41,789-Speed 5968.12 samples/sec Loss 11.1932 LearningRate 0.2596 Epoch: 5 Global Step: 57020 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:26:48,633-Speed 5985.37 samples/sec Loss 11.1386 LearningRate 0.2596 Epoch: 5 Global Step: 57030 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:26:55,550-Speed 5922.64 samples/sec Loss 11.1205 LearningRate 0.2595 Epoch: 5 Global Step: 57040 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:02,422-Speed 5961.91 samples/sec Loss 11.1001 LearningRate 0.2595 Epoch: 5 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:09,302-Speed 5953.74 samples/sec Loss 11.1075 LearningRate 0.2595 Epoch: 5 Global Step: 57060 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:16,165-Speed 5970.24 samples/sec Loss 11.1885 LearningRate 0.2594 Epoch: 5 Global Step: 57070 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:23,012-Speed 5983.08 samples/sec Loss 11.0905 LearningRate 0.2594 Epoch: 5 Global Step: 57080 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:29,882-Speed 5963.30 samples/sec Loss 11.1587 LearningRate 0.2594 Epoch: 5 Global Step: 57090 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:36,758-Speed 5958.12 samples/sec Loss 11.2329 LearningRate 0.2593 Epoch: 5 
Global Step: 57100 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:43,645-Speed 5948.76 samples/sec Loss 11.0608 LearningRate 0.2593 Epoch: 5 Global Step: 57110 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:50,492-Speed 5983.10 samples/sec Loss 11.1414 LearningRate 0.2593 Epoch: 5 Global Step: 57120 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:27:57,352-Speed 5972.07 samples/sec Loss 11.1557 LearningRate 0.2592 Epoch: 5 Global Step: 57130 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:04,205-Speed 5978.01 samples/sec Loss 11.0733 LearningRate 0.2592 Epoch: 5 Global Step: 57140 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:11,075-Speed 5963.80 samples/sec Loss 11.0784 LearningRate 0.2592 Epoch: 5 Global Step: 57150 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:17,949-Speed 5959.26 samples/sec Loss 11.1448 LearningRate 0.2591 Epoch: 5 Global Step: 57160 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:24,810-Speed 5971.50 samples/sec Loss 11.1127 LearningRate 0.2591 Epoch: 5 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:31,661-Speed 5980.31 samples/sec Loss 11.1240 LearningRate 0.2590 Epoch: 5 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:38,514-Speed 5977.55 samples/sec Loss 11.2201 LearningRate 0.2590 Epoch: 5 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:45,379-Speed 5967.96 samples/sec Loss 11.1560 LearningRate 0.2590 Epoch: 5 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:52,232-Speed 5977.79 samples/sec Loss 11.1579 LearningRate 0.2589 Epoch: 5 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:28:59,076-Speed 5985.55 samples/sec Loss 11.1047 LearningRate 0.2589 Epoch: 5 Global Step: 57220 Fp16 Grad 
Scale: 262144 Required: 29 hours Training: 2022-01-08 07:29:05,951-Speed 5959.64 samples/sec Loss 11.1447 LearningRate 0.2589 Epoch: 5 Global Step: 57230 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:29:12,801-Speed 5979.78 samples/sec Loss 11.0994 LearningRate 0.2588 Epoch: 5 Global Step: 57240 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:29:19,657-Speed 5975.81 samples/sec Loss 11.1860 LearningRate 0.2588 Epoch: 5 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:29:26,534-Speed 5957.26 samples/sec Loss 11.2132 LearningRate 0.2588 Epoch: 5 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:29:33,405-Speed 5961.82 samples/sec Loss 11.1179 LearningRate 0.2587 Epoch: 5 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:29:40,285-Speed 5956.86 samples/sec Loss 11.1156 LearningRate 0.2587 Epoch: 5 Global Step: 57280 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:29:47,134-Speed 5982.49 samples/sec Loss 11.1894 LearningRate 0.2587 Epoch: 5 Global Step: 57290 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:29:53,989-Speed 5975.10 samples/sec Loss 11.1284 LearningRate 0.2586 Epoch: 5 Global Step: 57300 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:30:00,841-Speed 5979.80 samples/sec Loss 11.1461 LearningRate 0.2586 Epoch: 5 Global Step: 57310 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:30:07,691-Speed 5980.36 samples/sec Loss 11.1647 LearningRate 0.2586 Epoch: 5 Global Step: 57320 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:30:14,544-Speed 5977.80 samples/sec Loss 11.1459 LearningRate 0.2585 Epoch: 5 Global Step: 57330 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:30:21,418-Speed 5959.83 samples/sec Loss 11.0691 LearningRate 0.2585 Epoch: 5 Global Step: 57340 Fp16 Grad Scale: 131072 Required: 29 
hours Training: 2022-01-08 07:30:28,296-Speed 5956.78 samples/sec Loss 11.0994 LearningRate 0.2585 Epoch: 5 Global Step: 57350 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:30:35,144-Speed 5982.08 samples/sec Loss 11.1196 LearningRate 0.2584 Epoch: 5 Global Step: 57360 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:30:41,985-Speed 5988.01 samples/sec Loss 11.1049 LearningRate 0.2584 Epoch: 5 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:30:48,843-Speed 5974.27 samples/sec Loss 11.0957 LearningRate 0.2584 Epoch: 5 Global Step: 57380 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:30:55,687-Speed 5985.35 samples/sec Loss 11.1965 LearningRate 0.2583 Epoch: 5 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:31:02,559-Speed 5961.62 samples/sec Loss 11.1426 LearningRate 0.2583 Epoch: 5 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:31:09,417-Speed 5973.99 samples/sec Loss 11.2073 LearningRate 0.2583 Epoch: 5 Global Step: 57410 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:31:16,262-Speed 5984.24 samples/sec Loss 11.1915 LearningRate 0.2582 Epoch: 5 Global Step: 57420 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:31:23,107-Speed 5985.61 samples/sec Loss 11.0933 LearningRate 0.2582 Epoch: 5 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:31:29,952-Speed 5985.05 samples/sec Loss 11.1224 LearningRate 0.2582 Epoch: 5 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:31:36,846-Speed 5941.89 samples/sec Loss 11.1737 LearningRate 0.2581 Epoch: 5 Global Step: 57450 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:31:43,694-Speed 5982.49 samples/sec Loss 11.1387 LearningRate 0.2581 Epoch: 5 Global Step: 57460 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 
07:31:50,539-Speed 5984.61 samples/sec Loss 11.1369 LearningRate 0.2580 Epoch: 5 Global Step: 57470 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:31:57,476-Speed 5905.47 samples/sec Loss 11.2035 LearningRate 0.2580 Epoch: 5 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:04,449-Speed 5874.88 samples/sec Loss 11.1196 LearningRate 0.2580 Epoch: 5 Global Step: 57490 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:11,303-Speed 5977.33 samples/sec Loss 11.1174 LearningRate 0.2579 Epoch: 5 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:18,250-Speed 5897.02 samples/sec Loss 11.1657 LearningRate 0.2579 Epoch: 5 Global Step: 57510 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:25,127-Speed 5957.43 samples/sec Loss 11.1426 LearningRate 0.2579 Epoch: 5 Global Step: 57520 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:31,982-Speed 5976.38 samples/sec Loss 11.2112 LearningRate 0.2578 Epoch: 5 Global Step: 57530 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:38,840-Speed 5974.20 samples/sec Loss 11.1323 LearningRate 0.2578 Epoch: 5 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:45,694-Speed 5976.85 samples/sec Loss 11.0882 LearningRate 0.2578 Epoch: 5 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:52,559-Speed 5967.43 samples/sec Loss 11.0962 LearningRate 0.2577 Epoch: 5 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:32:59,418-Speed 5973.16 samples/sec Loss 11.1366 LearningRate 0.2577 Epoch: 5 Global Step: 57570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:33:06,279-Speed 5971.36 samples/sec Loss 11.1911 LearningRate 0.2577 Epoch: 5 Global Step: 57580 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:33:13,156-Speed 5956.77 
samples/sec Loss 11.0539 LearningRate 0.2576 Epoch: 5 Global Step: 57590 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:33:20,030-Speed 5960.48 samples/sec Loss 11.1174 LearningRate 0.2576 Epoch: 5 Global Step: 57600 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:33:26,872-Speed 5986.98 samples/sec Loss 11.1796 LearningRate 0.2576 Epoch: 5 Global Step: 57610 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:33:33,746-Speed 5959.81 samples/sec Loss 11.0866 LearningRate 0.2575 Epoch: 5 Global Step: 57620 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:33:40,595-Speed 5981.28 samples/sec Loss 11.1026 LearningRate 0.2575 Epoch: 5 Global Step: 57630 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:33:47,442-Speed 5983.86 samples/sec Loss 11.0983 LearningRate 0.2575 Epoch: 5 Global Step: 57640 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:33:54,294-Speed 5979.20 samples/sec Loss 11.1819 LearningRate 0.2574 Epoch: 5 Global Step: 57650 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:34:01,145-Speed 5979.14 samples/sec Loss 11.1339 LearningRate 0.2574 Epoch: 5 Global Step: 57660 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:34:08,008-Speed 5969.61 samples/sec Loss 11.0390 LearningRate 0.2574 Epoch: 5 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:34:14,858-Speed 5980.97 samples/sec Loss 11.0895 LearningRate 0.2573 Epoch: 5 Global Step: 57680 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:34:21,726-Speed 5964.44 samples/sec Loss 11.1013 LearningRate 0.2573 Epoch: 5 Global Step: 57690 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:34:28,584-Speed 5974.32 samples/sec Loss 11.1065 LearningRate 0.2573 Epoch: 5 Global Step: 57700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:34:35,430-Speed 5984.33 samples/sec Loss 11.1088 
LearningRate 0.2572 Epoch: 5 Global Step: 57710 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:34:42,265-Speed 5993.13 samples/sec Loss 11.1229 LearningRate 0.2572 Epoch: 5 Global Step: 57720 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:34:49,109-Speed 5985.74 samples/sec Loss 11.0316 LearningRate 0.2572 Epoch: 5 Global Step: 57730 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:34:55,965-Speed 5975.47 samples/sec Loss 11.0862 LearningRate 0.2571 Epoch: 5 Global Step: 57740 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:35:02,818-Speed 5977.46 samples/sec Loss 11.1019 LearningRate 0.2571 Epoch: 5 Global Step: 57750 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:35:09,696-Speed 5956.97 samples/sec Loss 11.1071 LearningRate 0.2571 Epoch: 5 Global Step: 57760 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:35:16,545-Speed 5981.64 samples/sec Loss 11.1159 LearningRate 0.2570 Epoch: 5 Global Step: 57770 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:35:23,405-Speed 5975.03 samples/sec Loss 10.9872 LearningRate 0.2570 Epoch: 5 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:35:30,260-Speed 5977.40 samples/sec Loss 11.0568 LearningRate 0.2569 Epoch: 5 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:35:37,119-Speed 5972.55 samples/sec Loss 11.0807 LearningRate 0.2569 Epoch: 5 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:35:44,004-Speed 5950.23 samples/sec Loss 11.0724 LearningRate 0.2569 Epoch: 5 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:35:50,855-Speed 5980.13 samples/sec Loss 11.0874 LearningRate 0.2568 Epoch: 5 Global Step: 57820 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:35:57,731-Speed 5959.17 samples/sec Loss 11.1861 LearningRate 0.2568 Epoch: 5 
Global Step: 57830 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:36:04,586-Speed 5976.16 samples/sec Loss 11.0084 LearningRate 0.2568 Epoch: 5 Global Step: 57840 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:36:11,446-Speed 5981.34 samples/sec Loss 11.0982 LearningRate 0.2567 Epoch: 5 Global Step: 57850 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:36:18,297-Speed 5979.95 samples/sec Loss 11.0477 LearningRate 0.2567 Epoch: 5 Global Step: 57860 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:36:25,148-Speed 5979.23 samples/sec Loss 11.0856 LearningRate 0.2567 Epoch: 5 Global Step: 57870 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:36:31,988-Speed 5989.77 samples/sec Loss 11.1261 LearningRate 0.2566 Epoch: 5 Global Step: 57880 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:36:38,878-Speed 5945.70 samples/sec Loss 11.0950 LearningRate 0.2566 Epoch: 5 Global Step: 57890 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:36:45,738-Speed 5971.49 samples/sec Loss 11.1002 LearningRate 0.2566 Epoch: 5 Global Step: 57900 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:36:52,593-Speed 5976.85 samples/sec Loss 10.9918 LearningRate 0.2565 Epoch: 5 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:36:59,459-Speed 5966.21 samples/sec Loss 11.1239 LearningRate 0.2565 Epoch: 5 Global Step: 57920 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:37:06,318-Speed 5973.08 samples/sec Loss 11.1765 LearningRate 0.2565 Epoch: 5 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:37:13,166-Speed 5981.83 samples/sec Loss 11.1475 LearningRate 0.2564 Epoch: 5 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:37:20,043-Speed 5969.11 samples/sec Loss 10.9884 LearningRate 0.2564 Epoch: 5 Global Step: 57950 Fp16 Grad 
Scale: 131072 Required: 29 hours Training: 2022-01-08 07:37:26,901-Speed 5973.92 samples/sec Loss 11.1155 LearningRate 0.2564 Epoch: 5 Global Step: 57960 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:37:33,762-Speed 5971.25 samples/sec Loss 11.1073 LearningRate 0.2563 Epoch: 5 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:37:40,614-Speed 5980.86 samples/sec Loss 11.0834 LearningRate 0.2563 Epoch: 5 Global Step: 57980 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:37:47,466-Speed 5978.99 samples/sec Loss 11.0480 LearningRate 0.2563 Epoch: 5 Global Step: 57990 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:37:54,323-Speed 5974.28 samples/sec Loss 11.0886 LearningRate 0.2562 Epoch: 5 Global Step: 58000 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:38:01,181-Speed 5973.63 samples/sec Loss 11.1076 LearningRate 0.2562 Epoch: 5 Global Step: 58010 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:38:08,037-Speed 5975.24 samples/sec Loss 11.1031 LearningRate 0.2562 Epoch: 5 Global Step: 58020 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:38:14,888-Speed 5981.75 samples/sec Loss 11.0716 LearningRate 0.2561 Epoch: 5 Global Step: 58030 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:38:21,774-Speed 5949.18 samples/sec Loss 11.0694 LearningRate 0.2561 Epoch: 5 Global Step: 58040 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:38:28,634-Speed 5971.75 samples/sec Loss 11.1110 LearningRate 0.2561 Epoch: 5 Global Step: 58050 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:38:35,479-Speed 5985.53 samples/sec Loss 11.0947 LearningRate 0.2560 Epoch: 5 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:38:42,351-Speed 5962.56 samples/sec Loss 11.0262 LearningRate 0.2560 Epoch: 5 Global Step: 58070 Fp16 Grad Scale: 131072 Required: 29 
hours Training: 2022-01-08 07:38:49,222-Speed 5961.90 samples/sec Loss 11.0271 LearningRate 0.2560 Epoch: 5 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:38:56,092-Speed 5963.74 samples/sec Loss 11.1219 LearningRate 0.2559 Epoch: 5 Global Step: 58090 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:39:02,962-Speed 5963.05 samples/sec Loss 11.0562 LearningRate 0.2559 Epoch: 5 Global Step: 58100 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:39:09,847-Speed 5950.68 samples/sec Loss 11.0265 LearningRate 0.2559 Epoch: 5 Global Step: 58110 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:39:16,703-Speed 5975.63 samples/sec Loss 11.0775 LearningRate 0.2558 Epoch: 5 Global Step: 58120 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:39:23,551-Speed 5981.94 samples/sec Loss 11.0685 LearningRate 0.2558 Epoch: 5 Global Step: 58130 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:39:30,419-Speed 5965.24 samples/sec Loss 11.0872 LearningRate 0.2557 Epoch: 5 Global Step: 58140 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:39:37,268-Speed 5983.92 samples/sec Loss 11.0401 LearningRate 0.2557 Epoch: 5 Global Step: 58150 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:39:44,113-Speed 5984.11 samples/sec Loss 11.0559 LearningRate 0.2557 Epoch: 5 Global Step: 58160 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:39:50,971-Speed 5973.72 samples/sec Loss 11.0799 LearningRate 0.2556 Epoch: 5 Global Step: 58170 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:39:57,840-Speed 5964.40 samples/sec Loss 11.1330 LearningRate 0.2556 Epoch: 5 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:40:04,690-Speed 5980.78 samples/sec Loss 11.0602 LearningRate 0.2556 Epoch: 5 Global Step: 58190 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 
07:40:11,550-Speed 5971.83 samples/sec Loss 11.1420 LearningRate 0.2555 Epoch: 5 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:40:18,409-Speed 5973.19 samples/sec Loss 11.1628 LearningRate 0.2555 Epoch: 5 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:40:25,256-Speed 5982.68 samples/sec Loss 11.0155 LearningRate 0.2555 Epoch: 5 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:40:32,107-Speed 5980.08 samples/sec Loss 11.1588 LearningRate 0.2554 Epoch: 5 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:40:38,947-Speed 5989.27 samples/sec Loss 11.1036 LearningRate 0.2554 Epoch: 5 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:40:45,823-Speed 5957.96 samples/sec Loss 11.0565 LearningRate 0.2554 Epoch: 5 Global Step: 58250 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:40:52,667-Speed 5985.68 samples/sec Loss 11.0798 LearningRate 0.2553 Epoch: 5 Global Step: 58260 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:40:59,542-Speed 5959.45 samples/sec Loss 11.0949 LearningRate 0.2553 Epoch: 5 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:41:06,406-Speed 5971.67 samples/sec Loss 11.0450 LearningRate 0.2553 Epoch: 5 Global Step: 58280 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:41:13,266-Speed 5971.34 samples/sec Loss 11.1341 LearningRate 0.2552 Epoch: 5 Global Step: 58290 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:41:20,126-Speed 5971.95 samples/sec Loss 11.0812 LearningRate 0.2552 Epoch: 5 Global Step: 58300 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:41:26,978-Speed 5979.03 samples/sec Loss 11.0968 LearningRate 0.2552 Epoch: 5 Global Step: 58310 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:41:33,830-Speed 5978.88 
samples/sec Loss 11.0267 LearningRate 0.2551 Epoch: 5 Global Step: 58320 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:41:40,686-Speed 5975.36 samples/sec Loss 11.0452 LearningRate 0.2551 Epoch: 5 Global Step: 58330 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:41:47,553-Speed 5966.25 samples/sec Loss 11.0405 LearningRate 0.2551 Epoch: 5 Global Step: 58340 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:41:54,420-Speed 5965.09 samples/sec Loss 11.0739 LearningRate 0.2550 Epoch: 5 Global Step: 58350 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:42:01,268-Speed 5983.02 samples/sec Loss 11.0903 LearningRate 0.2550 Epoch: 5 Global Step: 58360 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:42:08,149-Speed 5953.23 samples/sec Loss 11.0045 LearningRate 0.2550 Epoch: 5 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:42:15,006-Speed 5975.03 samples/sec Loss 11.0410 LearningRate 0.2549 Epoch: 5 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:42:21,870-Speed 5968.16 samples/sec Loss 11.0567 LearningRate 0.2549 Epoch: 5 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:42:28,751-Speed 5953.92 samples/sec Loss 11.1302 LearningRate 0.2549 Epoch: 5 Global Step: 58400 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:42:35,612-Speed 5971.08 samples/sec Loss 11.0888 LearningRate 0.2548 Epoch: 5 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:42:42,500-Speed 5948.24 samples/sec Loss 11.0910 LearningRate 0.2548 Epoch: 5 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:42:49,442-Speed 5901.31 samples/sec Loss 11.0970 LearningRate 0.2548 Epoch: 5 Global Step: 58430 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:42:56,376-Speed 5908.00 samples/sec Loss 11.0870 
LearningRate 0.2547 Epoch: 5 Global Step: 58440 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:43:03,266-Speed 5945.66 samples/sec Loss 10.9941 LearningRate 0.2547 Epoch: 5 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:43:10,118-Speed 5978.90 samples/sec Loss 11.0833 LearningRate 0.2547 Epoch: 5 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:43:16,975-Speed 5974.51 samples/sec Loss 11.0130 LearningRate 0.2546 Epoch: 5 Global Step: 58470 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:43:23,841-Speed 5967.21 samples/sec Loss 11.0876 LearningRate 0.2546 Epoch: 5 Global Step: 58480 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:43:30,706-Speed 5969.10 samples/sec Loss 10.9609 LearningRate 0.2545 Epoch: 5 Global Step: 58490 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:43:37,542-Speed 5992.98 samples/sec Loss 11.0762 LearningRate 0.2545 Epoch: 5 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:43:44,413-Speed 5962.20 samples/sec Loss 11.0874 LearningRate 0.2545 Epoch: 5 Global Step: 58510 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:43:51,261-Speed 5982.51 samples/sec Loss 11.0222 LearningRate 0.2544 Epoch: 5 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:43:58,154-Speed 5944.04 samples/sec Loss 11.0884 LearningRate 0.2544 Epoch: 5 Global Step: 58530 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:44:05,005-Speed 5979.45 samples/sec Loss 11.0198 LearningRate 0.2544 Epoch: 5 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:44:11,923-Speed 5924.12 samples/sec Loss 11.0267 LearningRate 0.2543 Epoch: 5 Global Step: 58550 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:44:18,831-Speed 5930.24 samples/sec Loss 11.0143 LearningRate 0.2543 Epoch: 5 
Global Step: 58560 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:44:25,689-Speed 5974.00 samples/sec Loss 10.9875 LearningRate 0.2543 Epoch: 5 Global Step: 58570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:44:32,530-Speed 5988.21 samples/sec Loss 11.0393 LearningRate 0.2542 Epoch: 5 Global Step: 58580 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:44:39,376-Speed 5983.68 samples/sec Loss 10.9956 LearningRate 0.2542 Epoch: 5 Global Step: 58590 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:44:46,236-Speed 5973.34 samples/sec Loss 11.0871 LearningRate 0.2542 Epoch: 5 Global Step: 58600 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:44:53,092-Speed 5975.73 samples/sec Loss 11.0653 LearningRate 0.2541 Epoch: 5 Global Step: 58610 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:44:59,928-Speed 5993.30 samples/sec Loss 11.0956 LearningRate 0.2541 Epoch: 5 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:45:06,809-Speed 5953.11 samples/sec Loss 11.0465 LearningRate 0.2541 Epoch: 5 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:45:13,663-Speed 5978.08 samples/sec Loss 11.1326 LearningRate 0.2540 Epoch: 5 Global Step: 58640 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:45:20,519-Speed 5978.14 samples/sec Loss 10.9417 LearningRate 0.2540 Epoch: 5 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:45:27,394-Speed 5958.30 samples/sec Loss 11.0481 LearningRate 0.2540 Epoch: 5 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:45:34,265-Speed 5962.66 samples/sec Loss 11.0433 LearningRate 0.2539 Epoch: 5 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:45:41,106-Speed 5988.56 samples/sec Loss 11.0357 LearningRate 0.2539 Epoch: 5 Global Step: 58680 Fp16 Grad 
Scale: 131072 Required: 29 hours Training: 2022-01-08 07:45:47,961-Speed 5976.50 samples/sec Loss 11.0352 LearningRate 0.2539 Epoch: 5 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:45:54,807-Speed 5984.62 samples/sec Loss 11.1124 LearningRate 0.2538 Epoch: 5 Global Step: 58700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:46:01,647-Speed 5988.98 samples/sec Loss 11.0719 LearningRate 0.2538 Epoch: 5 Global Step: 58710 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:46:08,496-Speed 5981.75 samples/sec Loss 11.0618 LearningRate 0.2538 Epoch: 5 Global Step: 58720 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:46:15,351-Speed 5977.17 samples/sec Loss 11.0119 LearningRate 0.2537 Epoch: 5 Global Step: 58730 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:46:22,316-Speed 5881.81 samples/sec Loss 11.0061 LearningRate 0.2537 Epoch: 5 Global Step: 58740 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:46:29,274-Speed 5888.41 samples/sec Loss 11.0078 LearningRate 0.2537 Epoch: 5 Global Step: 58750 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:46:36,149-Speed 5959.37 samples/sec Loss 11.0765 LearningRate 0.2536 Epoch: 5 Global Step: 58760 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:46:43,010-Speed 5970.39 samples/sec Loss 11.0603 LearningRate 0.2536 Epoch: 5 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:46:49,876-Speed 5969.68 samples/sec Loss 11.0413 LearningRate 0.2536 Epoch: 5 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:46:56,833-Speed 5889.84 samples/sec Loss 11.0742 LearningRate 0.2535 Epoch: 5 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:47:03,675-Speed 5987.12 samples/sec Loss 11.0079 LearningRate 0.2535 Epoch: 5 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 29 
hours Training: 2022-01-08 07:47:10,511-Speed 5992.96 samples/sec Loss 11.0312 LearningRate 0.2535 Epoch: 5 Global Step: 58810 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:47:17,357-Speed 5983.99 samples/sec Loss 11.0495 LearningRate 0.2534 Epoch: 5 Global Step: 58820 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:47:24,209-Speed 5978.76 samples/sec Loss 11.0208 LearningRate 0.2534 Epoch: 5 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:47:31,062-Speed 5978.14 samples/sec Loss 11.0827 LearningRate 0.2534 Epoch: 5 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:47:37,909-Speed 5983.20 samples/sec Loss 11.0113 LearningRate 0.2533 Epoch: 5 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:47:44,756-Speed 5983.63 samples/sec Loss 10.9380 LearningRate 0.2533 Epoch: 5 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:47:51,609-Speed 5978.19 samples/sec Loss 11.0759 LearningRate 0.2533 Epoch: 5 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:47:58,462-Speed 5978.14 samples/sec Loss 11.0827 LearningRate 0.2532 Epoch: 5 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:48:05,314-Speed 5978.59 samples/sec Loss 11.0036 LearningRate 0.2532 Epoch: 5 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:48:12,164-Speed 5980.71 samples/sec Loss 11.0328 LearningRate 0.2531 Epoch: 5 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:48:19,015-Speed 5979.42 samples/sec Loss 11.0462 LearningRate 0.2531 Epoch: 5 Global Step: 58910 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:48:25,874-Speed 5972.61 samples/sec Loss 11.0549 LearningRate 0.2531 Epoch: 5 Global Step: 58920 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 
07:48:32,721-Speed 5983.42 samples/sec Loss 11.0295 LearningRate 0.2530 Epoch: 5 Global Step: 58930 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:48:39,575-Speed 5977.45 samples/sec Loss 11.0109 LearningRate 0.2530 Epoch: 5 Global Step: 58940 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:48:46,454-Speed 5955.94 samples/sec Loss 11.0670 LearningRate 0.2530 Epoch: 5 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:48:53,330-Speed 5958.17 samples/sec Loss 10.9993 LearningRate 0.2529 Epoch: 5 Global Step: 58960 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:49:00,204-Speed 5963.21 samples/sec Loss 11.0279 LearningRate 0.2529 Epoch: 5 Global Step: 58970 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:49:07,054-Speed 5980.42 samples/sec Loss 11.0889 LearningRate 0.2529 Epoch: 5 Global Step: 58980 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:49:13,909-Speed 5977.93 samples/sec Loss 10.9382 LearningRate 0.2528 Epoch: 5 Global Step: 58990 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:49:21,560-Speed 5354.41 samples/sec Loss 10.9662 LearningRate 0.2528 Epoch: 5 Global Step: 59000 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:49:28,470-Speed 5929.34 samples/sec Loss 10.9707 LearningRate 0.2528 Epoch: 5 Global Step: 59010 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:49:35,320-Speed 5980.68 samples/sec Loss 10.9518 LearningRate 0.2527 Epoch: 5 Global Step: 59020 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:49:42,187-Speed 5965.83 samples/sec Loss 10.8839 LearningRate 0.2527 Epoch: 5 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:49:49,054-Speed 5966.19 samples/sec Loss 11.0263 LearningRate 0.2527 Epoch: 5 Global Step: 59040 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:49:55,912-Speed 5973.87 
samples/sec Loss 11.0752 LearningRate 0.2526 Epoch: 5 Global Step: 59050 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:50:02,768-Speed 5976.18 samples/sec Loss 10.9715 LearningRate 0.2526 Epoch: 5 Global Step: 59060 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:50:09,627-Speed 5972.06 samples/sec Loss 11.1035 LearningRate 0.2526 Epoch: 5 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:50:16,493-Speed 5969.59 samples/sec Loss 10.9970 LearningRate 0.2525 Epoch: 5 Global Step: 59080 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:50:23,340-Speed 5983.24 samples/sec Loss 11.0890 LearningRate 0.2525 Epoch: 5 Global Step: 59090 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:50:30,188-Speed 5982.74 samples/sec Loss 11.0306 LearningRate 0.2525 Epoch: 5 Global Step: 59100 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:50:37,045-Speed 5973.86 samples/sec Loss 11.0322 LearningRate 0.2524 Epoch: 5 Global Step: 59110 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:50:43,894-Speed 5981.89 samples/sec Loss 11.0122 LearningRate 0.2524 Epoch: 5 Global Step: 59120 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:50:50,763-Speed 5964.61 samples/sec Loss 11.0965 LearningRate 0.2524 Epoch: 5 Global Step: 59130 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:50:57,629-Speed 5968.66 samples/sec Loss 11.0256 LearningRate 0.2523 Epoch: 5 Global Step: 59140 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:51:04,478-Speed 5981.38 samples/sec Loss 10.9240 LearningRate 0.2523 Epoch: 5 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:51:11,371-Speed 5942.81 samples/sec Loss 10.8674 LearningRate 0.2523 Epoch: 5 Global Step: 59160 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:51:18,276-Speed 5933.14 samples/sec Loss 10.9846 
LearningRate 0.2522 Epoch: 5 Global Step: 59170 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:51:25,171-Speed 5941.67 samples/sec Loss 10.9719 LearningRate 0.2522 Epoch: 5 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:51:32,052-Speed 5953.18 samples/sec Loss 10.9629 LearningRate 0.2522 Epoch: 5 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:51:38,969-Speed 5923.21 samples/sec Loss 10.9577 LearningRate 0.2521 Epoch: 5 Global Step: 59200 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:51:45,850-Speed 5953.67 samples/sec Loss 10.9293 LearningRate 0.2521 Epoch: 5 Global Step: 59210 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:51:52,711-Speed 5970.90 samples/sec Loss 11.0120 LearningRate 0.2521 Epoch: 5 Global Step: 59220 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:51:59,606-Speed 5942.90 samples/sec Loss 11.0442 LearningRate 0.2520 Epoch: 5 Global Step: 59230 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:52:06,491-Speed 5950.45 samples/sec Loss 11.0809 LearningRate 0.2520 Epoch: 5 Global Step: 59240 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:52:13,361-Speed 5963.52 samples/sec Loss 11.0392 LearningRate 0.2520 Epoch: 5 Global Step: 59250 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:52:20,215-Speed 5976.68 samples/sec Loss 11.0345 LearningRate 0.2519 Epoch: 5 Global Step: 59260 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:52:27,084-Speed 5968.27 samples/sec Loss 11.0307 LearningRate 0.2519 Epoch: 5 Global Step: 59270 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:52:33,971-Speed 5948.42 samples/sec Loss 10.9880 LearningRate 0.2519 Epoch: 5 Global Step: 59280 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:52:40,818-Speed 5983.04 samples/sec Loss 10.9725 LearningRate 0.2518 Epoch: 5 
Global Step: 59290 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:52:47,678-Speed 5971.85 samples/sec Loss 11.0437 LearningRate 0.2518 Epoch: 5 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:52:54,539-Speed 5971.51 samples/sec Loss 10.9414 LearningRate 0.2518 Epoch: 5 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:53:01,794-Speed 5646.63 samples/sec Loss 10.9843 LearningRate 0.2517 Epoch: 5 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:53:08,641-Speed 5983.37 samples/sec Loss 10.9755 LearningRate 0.2517 Epoch: 5 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:53:15,511-Speed 5962.97 samples/sec Loss 11.0004 LearningRate 0.2517 Epoch: 5 Global Step: 59340 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:53:22,396-Speed 5950.78 samples/sec Loss 10.9435 LearningRate 0.2516 Epoch: 5 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:53:29,237-Speed 5988.61 samples/sec Loss 10.8997 LearningRate 0.2516 Epoch: 5 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:53:36,086-Speed 5981.27 samples/sec Loss 11.0354 LearningRate 0.2515 Epoch: 5 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:53:42,931-Speed 5985.22 samples/sec Loss 10.9805 LearningRate 0.2515 Epoch: 5 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:53:49,786-Speed 5975.98 samples/sec Loss 11.0633 LearningRate 0.2515 Epoch: 5 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:53:56,636-Speed 5980.59 samples/sec Loss 10.9948 LearningRate 0.2514 Epoch: 5 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:54:03,503-Speed 5966.13 samples/sec Loss 10.9563 LearningRate 0.2514 Epoch: 5 Global Step: 59410 Fp16 Grad Scale: 
65536 Required: 29 hours Training: 2022-01-08 07:54:10,350-Speed 5986.38 samples/sec Loss 11.0525 LearningRate 0.2514 Epoch: 5 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:54:17,198-Speed 5981.79 samples/sec Loss 11.0245 LearningRate 0.2513 Epoch: 5 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:54:24,049-Speed 5981.25 samples/sec Loss 11.0032 LearningRate 0.2513 Epoch: 5 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:54:30,897-Speed 5982.49 samples/sec Loss 10.9790 LearningRate 0.2513 Epoch: 5 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 07:54:37,741-Speed 5985.45 samples/sec Loss 10.9759 LearningRate 0.2512 Epoch: 5 Global Step: 59460 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:54:44,594-Speed 5977.75 samples/sec Loss 10.9656 LearningRate 0.2512 Epoch: 5 Global Step: 59470 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:54:51,559-Speed 5883.22 samples/sec Loss 10.9650 LearningRate 0.2512 Epoch: 5 Global Step: 59480 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:54:58,525-Speed 5881.27 samples/sec Loss 11.0114 LearningRate 0.2511 Epoch: 5 Global Step: 59490 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:55:05,391-Speed 5966.69 samples/sec Loss 10.8976 LearningRate 0.2511 Epoch: 5 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:55:12,244-Speed 5978.55 samples/sec Loss 10.9894 LearningRate 0.2511 Epoch: 5 Global Step: 59510 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:55:19,092-Speed 5981.59 samples/sec Loss 10.8820 LearningRate 0.2510 Epoch: 5 Global Step: 59520 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:55:25,935-Speed 5987.53 samples/sec Loss 10.9460 LearningRate 0.2510 Epoch: 5 Global Step: 59530 Fp16 Grad Scale: 131072 Required: 29 hours Training: 
2022-01-08 07:55:32,805-Speed 5962.86 samples/sec Loss 11.0768 LearningRate 0.2510 Epoch: 5 Global Step: 59540 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:55:39,657-Speed 5978.91 samples/sec Loss 10.9802 LearningRate 0.2509 Epoch: 5 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:55:46,516-Speed 5973.78 samples/sec Loss 10.9879 LearningRate 0.2509 Epoch: 5 Global Step: 59560 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:55:53,374-Speed 5973.57 samples/sec Loss 10.9036 LearningRate 0.2509 Epoch: 5 Global Step: 59570 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:56:00,207-Speed 5995.31 samples/sec Loss 10.9563 LearningRate 0.2508 Epoch: 5 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:56:07,065-Speed 5974.15 samples/sec Loss 11.0150 LearningRate 0.2508 Epoch: 5 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:56:13,983-Speed 5922.77 samples/sec Loss 10.9823 LearningRate 0.2508 Epoch: 5 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:56:20,902-Speed 5920.96 samples/sec Loss 10.9775 LearningRate 0.2507 Epoch: 5 Global Step: 59610 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:56:27,825-Speed 5917.26 samples/sec Loss 10.9021 LearningRate 0.2507 Epoch: 5 Global Step: 59620 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:56:34,664-Speed 5990.12 samples/sec Loss 10.9294 LearningRate 0.2507 Epoch: 5 Global Step: 59630 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:56:41,536-Speed 5962.05 samples/sec Loss 10.9624 LearningRate 0.2506 Epoch: 5 Global Step: 59640 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:56:48,979-Speed 5506.61 samples/sec Loss 10.9593 LearningRate 0.2506 Epoch: 5 Global Step: 59650 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:56:55,885-Speed 
5933.29 samples/sec Loss 10.9514 LearningRate 0.2506 Epoch: 5 Global Step: 59660 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:57:02,795-Speed 5928.86 samples/sec Loss 11.0047 LearningRate 0.2505 Epoch: 5 Global Step: 59670 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:57:09,641-Speed 5983.96 samples/sec Loss 10.9492 LearningRate 0.2505 Epoch: 5 Global Step: 59680 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:57:16,490-Speed 5982.24 samples/sec Loss 10.9737 LearningRate 0.2505 Epoch: 5 Global Step: 59690 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:57:23,327-Speed 5991.77 samples/sec Loss 10.9394 LearningRate 0.2504 Epoch: 5 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:57:30,184-Speed 5974.49 samples/sec Loss 10.9986 LearningRate 0.2504 Epoch: 5 Global Step: 59710 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:57:37,036-Speed 5979.25 samples/sec Loss 10.9485 LearningRate 0.2504 Epoch: 5 Global Step: 59720 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:57:43,885-Speed 5981.83 samples/sec Loss 11.0077 LearningRate 0.2503 Epoch: 5 Global Step: 59730 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:57:50,744-Speed 5973.59 samples/sec Loss 10.9824 LearningRate 0.2503 Epoch: 5 Global Step: 59740 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:57:57,607-Speed 5969.21 samples/sec Loss 10.8995 LearningRate 0.2503 Epoch: 5 Global Step: 59750 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:58:04,468-Speed 5971.69 samples/sec Loss 10.9534 LearningRate 0.2502 Epoch: 5 Global Step: 59760 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:58:11,335-Speed 5965.76 samples/sec Loss 10.9706 LearningRate 0.2502 Epoch: 5 Global Step: 59770 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:58:18,212-Speed 5956.92 samples/sec Loss 
10.9758 LearningRate 0.2502 Epoch: 5 Global Step: 59780 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:58:25,059-Speed 5983.49 samples/sec Loss 10.9378 LearningRate 0.2501 Epoch: 5 Global Step: 59790 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:58:31,906-Speed 5983.45 samples/sec Loss 11.0278 LearningRate 0.2501 Epoch: 5 Global Step: 59800 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:58:38,788-Speed 5954.87 samples/sec Loss 11.0931 LearningRate 0.2501 Epoch: 5 Global Step: 59810 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:58:45,642-Speed 5977.11 samples/sec Loss 10.9661 LearningRate 0.2500 Epoch: 5 Global Step: 59820 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:58:52,517-Speed 5958.67 samples/sec Loss 10.9694 LearningRate 0.2500 Epoch: 5 Global Step: 59830 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 07:58:59,380-Speed 5969.59 samples/sec Loss 11.0279 LearningRate 0.2500 Epoch: 5 Global Step: 59840 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:59:06,255-Speed 5959.08 samples/sec Loss 10.9211 LearningRate 0.2499 Epoch: 5 Global Step: 59850 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:59:13,109-Speed 5976.76 samples/sec Loss 10.9068 LearningRate 0.2499 Epoch: 5 Global Step: 59860 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:59:19,965-Speed 5977.60 samples/sec Loss 10.9365 LearningRate 0.2499 Epoch: 5 Global Step: 59870 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:59:26,824-Speed 5972.74 samples/sec Loss 11.0105 LearningRate 0.2498 Epoch: 5 Global Step: 59880 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:59:33,690-Speed 5966.31 samples/sec Loss 10.9756 LearningRate 0.2498 Epoch: 5 Global Step: 59890 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:59:40,549-Speed 5973.50 samples/sec Loss 10.9696 LearningRate 0.2498 
Epoch: 5 Global Step: 59900 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:59:47,402-Speed 5977.55 samples/sec Loss 10.9700 LearningRate 0.2497 Epoch: 5 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 07:59:54,264-Speed 5969.54 samples/sec Loss 11.0115 LearningRate 0.2497 Epoch: 5 Global Step: 59920 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:00:01,121-Speed 5974.73 samples/sec Loss 10.9204 LearningRate 0.2496 Epoch: 5 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:00:07,995-Speed 5959.93 samples/sec Loss 10.9400 LearningRate 0.2496 Epoch: 5 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:00:14,879-Speed 5951.12 samples/sec Loss 10.9752 LearningRate 0.2496 Epoch: 5 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:00:21,740-Speed 5972.03 samples/sec Loss 11.0660 LearningRate 0.2495 Epoch: 5 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:00:28,611-Speed 5961.91 samples/sec Loss 11.0093 LearningRate 0.2495 Epoch: 5 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:00:35,460-Speed 5981.75 samples/sec Loss 11.0031 LearningRate 0.2495 Epoch: 5 Global Step: 59980 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:00:42,332-Speed 5961.95 samples/sec Loss 10.9845 LearningRate 0.2494 Epoch: 5 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:00:49,211-Speed 5955.11 samples/sec Loss 10.9592 LearningRate 0.2494 Epoch: 5 Global Step: 60000 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:01:16,394-[lfw][60000]XNorm: 23.897016 Training: 2022-01-08 08:01:16,395-[lfw][60000]Accuracy-Flip: 0.99683+-0.00293 Training: 2022-01-08 08:01:16,396-[lfw][60000]Accuracy-Highest: 0.99700 Training: 2022-01-08 08:01:47,711-[cfp_fp][60000]XNorm: 20.987406 Training: 
2022-01-08 08:01:47,712-[cfp_fp][60000]Accuracy-Flip: 0.97557+-0.00773 Training: 2022-01-08 08:01:47,713-[cfp_fp][60000]Accuracy-Highest: 0.97686 Training: 2022-01-08 08:02:14,789-[agedb_30][60000]XNorm: 23.517364 Training: 2022-01-08 08:02:14,790-[agedb_30][60000]Accuracy-Flip: 0.96400+-0.00834 Training: 2022-01-08 08:02:14,790-[agedb_30][60000]Accuracy-Highest: 0.96400 Training: 2022-01-08 08:02:21,622-Speed 443.25 samples/sec Loss 10.9680 LearningRate 0.2494 Epoch: 5 Global Step: 60010 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:02:28,448-Speed 6002.05 samples/sec Loss 10.8945 LearningRate 0.2493 Epoch: 5 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:02:35,292-Speed 5987.14 samples/sec Loss 11.0046 LearningRate 0.2493 Epoch: 5 Global Step: 60030 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:02:42,154-Speed 5970.26 samples/sec Loss 10.9984 LearningRate 0.2493 Epoch: 5 Global Step: 60040 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:02:49,004-Speed 5983.22 samples/sec Loss 10.9467 LearningRate 0.2492 Epoch: 5 Global Step: 60050 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:02:55,842-Speed 5991.05 samples/sec Loss 10.9290 LearningRate 0.2492 Epoch: 5 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:03:02,692-Speed 5980.37 samples/sec Loss 10.9721 LearningRate 0.2492 Epoch: 5 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:03:09,570-Speed 5956.57 samples/sec Loss 10.9121 LearningRate 0.2491 Epoch: 5 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:03:16,417-Speed 5983.19 samples/sec Loss 10.8770 LearningRate 0.2491 Epoch: 5 Global Step: 60090 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:03:23,284-Speed 5966.24 samples/sec Loss 11.0155 LearningRate 0.2491 Epoch: 5 Global Step: 60100 Fp16 Grad Scale: 65536 
Required: 29 hours Training: 2022-01-08 08:03:30,129-Speed 5984.67 samples/sec Loss 10.9281 LearningRate 0.2490 Epoch: 5 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:03:36,971-Speed 5988.00 samples/sec Loss 10.9760 LearningRate 0.2490 Epoch: 5 Global Step: 60120 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:03:44,834-Speed 5213.05 samples/sec Loss 10.9977 LearningRate 0.2490 Epoch: 5 Global Step: 60130 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:03:51,671-Speed 5990.95 samples/sec Loss 10.9258 LearningRate 0.2489 Epoch: 5 Global Step: 60140 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:03:58,513-Speed 5990.25 samples/sec Loss 10.9163 LearningRate 0.2489 Epoch: 5 Global Step: 60150 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:04:05,357-Speed 5986.81 samples/sec Loss 10.8595 LearningRate 0.2489 Epoch: 5 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:04:12,210-Speed 5978.02 samples/sec Loss 11.0014 LearningRate 0.2488 Epoch: 5 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:04:19,051-Speed 5990.76 samples/sec Loss 10.8937 LearningRate 0.2488 Epoch: 5 Global Step: 60180 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:04:25,904-Speed 5978.74 samples/sec Loss 10.9154 LearningRate 0.2488 Epoch: 5 Global Step: 60190 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:04:32,776-Speed 5961.52 samples/sec Loss 10.9069 LearningRate 0.2487 Epoch: 5 Global Step: 60200 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:04:39,619-Speed 5986.92 samples/sec Loss 10.9328 LearningRate 0.2487 Epoch: 5 Global Step: 60210 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:04:46,461-Speed 5988.00 samples/sec Loss 10.9427 LearningRate 0.2487 Epoch: 5 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 
08:04:53,315-Speed 5977.27 samples/sec Loss 11.0262 LearningRate 0.2486 Epoch: 5 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:05:00,186-Speed 5962.30 samples/sec Loss 10.9520 LearningRate 0.2486 Epoch: 5 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:05:07,059-Speed 5961.18 samples/sec Loss 11.0108 LearningRate 0.2486 Epoch: 5 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:05:13,930-Speed 5961.79 samples/sec Loss 10.8619 LearningRate 0.2485 Epoch: 5 Global Step: 60260 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:05:20,793-Speed 5969.25 samples/sec Loss 10.9756 LearningRate 0.2485 Epoch: 5 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:05:27,662-Speed 5964.58 samples/sec Loss 10.9069 LearningRate 0.2485 Epoch: 5 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:05:34,532-Speed 5963.37 samples/sec Loss 10.9767 LearningRate 0.2484 Epoch: 5 Global Step: 60290 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:05:41,415-Speed 5951.99 samples/sec Loss 10.8803 LearningRate 0.2484 Epoch: 5 Global Step: 60300 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:05:48,291-Speed 5966.85 samples/sec Loss 10.8752 LearningRate 0.2484 Epoch: 5 Global Step: 60310 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:05:55,144-Speed 5978.05 samples/sec Loss 10.9502 LearningRate 0.2483 Epoch: 5 Global Step: 60320 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:06:02,003-Speed 5972.06 samples/sec Loss 10.9574 LearningRate 0.2483 Epoch: 5 Global Step: 60330 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:06:08,861-Speed 5974.18 samples/sec Loss 10.8674 LearningRate 0.2483 Epoch: 5 Global Step: 60340 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:06:15,716-Speed 5976.65 
samples/sec Loss 10.9358 LearningRate 0.2482 Epoch: 5 Global Step: 60350 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:06:22,562-Speed 5984.46 samples/sec Loss 10.8773 LearningRate 0.2482 Epoch: 5 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:06:29,439-Speed 5958.95 samples/sec Loss 10.9015 LearningRate 0.2482 Epoch: 5 Global Step: 60370 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:06:36,299-Speed 5972.44 samples/sec Loss 11.0007 LearningRate 0.2481 Epoch: 5 Global Step: 60380 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:06:43,165-Speed 5966.75 samples/sec Loss 10.9490 LearningRate 0.2481 Epoch: 5 Global Step: 60390 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:06:50,096-Speed 5910.65 samples/sec Loss 10.9847 LearningRate 0.2481 Epoch: 5 Global Step: 60400 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:06:56,933-Speed 5991.79 samples/sec Loss 10.9042 LearningRate 0.2480 Epoch: 5 Global Step: 60410 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:07:03,776-Speed 5987.70 samples/sec Loss 10.9488 LearningRate 0.2480 Epoch: 5 Global Step: 60420 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:07:10,664-Speed 5948.41 samples/sec Loss 10.9353 LearningRate 0.2480 Epoch: 5 Global Step: 60430 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:07:17,511-Speed 5983.26 samples/sec Loss 10.9974 LearningRate 0.2479 Epoch: 5 Global Step: 60440 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:07:24,389-Speed 5956.36 samples/sec Loss 10.9412 LearningRate 0.2479 Epoch: 5 Global Step: 60450 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:07:31,230-Speed 5989.66 samples/sec Loss 10.9169 LearningRate 0.2479 Epoch: 5 Global Step: 60460 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:07:38,096-Speed 5966.30 samples/sec Loss 10.8702 
LearningRate 0.2478 Epoch: 5 Global Step: 60470 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:07:44,957-Speed 5971.31 samples/sec Loss 10.8963 LearningRate 0.2478 Epoch: 5 Global Step: 60480 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:07:51,824-Speed 5966.30 samples/sec Loss 10.9371 LearningRate 0.2478 Epoch: 5 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:07:58,680-Speed 5974.97 samples/sec Loss 10.9120 LearningRate 0.2477 Epoch: 5 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:08:05,531-Speed 5980.58 samples/sec Loss 10.9212 LearningRate 0.2477 Epoch: 5 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:08:12,398-Speed 5966.41 samples/sec Loss 10.8908 LearningRate 0.2477 Epoch: 5 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:08:19,265-Speed 5965.40 samples/sec Loss 10.8940 LearningRate 0.2476 Epoch: 5 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:08:26,123-Speed 5974.17 samples/sec Loss 10.9227 LearningRate 0.2476 Epoch: 5 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:08:32,986-Speed 5969.17 samples/sec Loss 10.9005 LearningRate 0.2476 Epoch: 5 Global Step: 60550 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:08:39,834-Speed 5982.24 samples/sec Loss 10.8940 LearningRate 0.2475 Epoch: 5 Global Step: 60560 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:08:46,738-Speed 5933.84 samples/sec Loss 10.8291 LearningRate 0.2475 Epoch: 5 Global Step: 60570 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:08:53,613-Speed 5959.02 samples/sec Loss 10.9181 LearningRate 0.2475 Epoch: 5 Global Step: 60580 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:09:00,473-Speed 5971.87 samples/sec Loss 10.9345 LearningRate 0.2474 Epoch: 5 
Global Step: 60590 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:09:07,336-Speed 5970.17 samples/sec Loss 10.8666 LearningRate 0.2474 Epoch: 5 Global Step: 60600 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:09:14,203-Speed 5967.08 samples/sec Loss 10.9920 LearningRate 0.2474 Epoch: 5 Global Step: 60610 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:09:21,038-Speed 5992.60 samples/sec Loss 11.0234 LearningRate 0.2473 Epoch: 5 Global Step: 60620 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:09:27,907-Speed 5964.78 samples/sec Loss 10.9318 LearningRate 0.2473 Epoch: 5 Global Step: 60630 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:09:34,785-Speed 5955.90 samples/sec Loss 10.8720 LearningRate 0.2473 Epoch: 5 Global Step: 60640 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:09:41,640-Speed 5976.89 samples/sec Loss 10.8694 LearningRate 0.2472 Epoch: 5 Global Step: 60650 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:09:48,487-Speed 5983.14 samples/sec Loss 10.8322 LearningRate 0.2472 Epoch: 5 Global Step: 60660 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:09:55,360-Speed 5961.35 samples/sec Loss 10.8621 LearningRate 0.2472 Epoch: 5 Global Step: 60670 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:02,250-Speed 5945.95 samples/sec Loss 10.9026 LearningRate 0.2471 Epoch: 5 Global Step: 60680 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:09,116-Speed 5966.46 samples/sec Loss 10.8950 LearningRate 0.2471 Epoch: 5 Global Step: 60690 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:15,977-Speed 5972.47 samples/sec Loss 10.8576 LearningRate 0.2470 Epoch: 5 Global Step: 60700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:22,934-Speed 5888.20 samples/sec Loss 10.9985 LearningRate 0.2470 Epoch: 5 Global Step: 60710 Fp16 Grad 
Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:29,894-Speed 5886.18 samples/sec Loss 10.9249 LearningRate 0.2470 Epoch: 5 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:36,759-Speed 5967.81 samples/sec Loss 10.9191 LearningRate 0.2469 Epoch: 5 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:43,632-Speed 5961.13 samples/sec Loss 10.8871 LearningRate 0.2469 Epoch: 5 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:50,495-Speed 5975.72 samples/sec Loss 10.9144 LearningRate 0.2469 Epoch: 5 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:10:57,349-Speed 5976.38 samples/sec Loss 10.9265 LearningRate 0.2468 Epoch: 5 Global Step: 60760 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:11:04,216-Speed 5965.73 samples/sec Loss 10.8936 LearningRate 0.2468 Epoch: 5 Global Step: 60770 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:11:11,099-Speed 5952.13 samples/sec Loss 10.8737 LearningRate 0.2468 Epoch: 5 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:11:17,997-Speed 5939.95 samples/sec Loss 10.9099 LearningRate 0.2467 Epoch: 5 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:11:24,851-Speed 5976.34 samples/sec Loss 10.9108 LearningRate 0.2467 Epoch: 5 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:11:31,700-Speed 5981.34 samples/sec Loss 10.8104 LearningRate 0.2467 Epoch: 5 Global Step: 60810 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:11:38,549-Speed 5981.86 samples/sec Loss 10.9811 LearningRate 0.2466 Epoch: 5 Global Step: 60820 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:11:45,402-Speed 5977.61 samples/sec Loss 10.9390 LearningRate 0.2466 Epoch: 5 Global Step: 60830 Fp16 Grad Scale: 262144 Required: 29 
hours Training: 2022-01-08 08:11:52,258-Speed 5975.32 samples/sec Loss 10.8913 LearningRate 0.2466 Epoch: 5 Global Step: 60840 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:11:59,111-Speed 5978.56 samples/sec Loss 10.9106 LearningRate 0.2465 Epoch: 5 Global Step: 60850 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:12:05,992-Speed 5953.24 samples/sec Loss 10.8682 LearningRate 0.2465 Epoch: 5 Global Step: 60860 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:12:12,869-Speed 5958.11 samples/sec Loss 10.9418 LearningRate 0.2465 Epoch: 5 Global Step: 60870 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:12:19,750-Speed 5953.72 samples/sec Loss 10.8841 LearningRate 0.2464 Epoch: 5 Global Step: 60880 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:12:26,623-Speed 5960.61 samples/sec Loss 10.8138 LearningRate 0.2464 Epoch: 5 Global Step: 60890 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:12:33,482-Speed 5973.18 samples/sec Loss 10.8256 LearningRate 0.2464 Epoch: 5 Global Step: 60900 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:12:40,350-Speed 5964.46 samples/sec Loss 10.8977 LearningRate 0.2463 Epoch: 5 Global Step: 60910 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:12:47,206-Speed 5975.97 samples/sec Loss 10.8798 LearningRate 0.2463 Epoch: 5 Global Step: 60920 Fp16 Grad Scale: 524288 Required: 29 hours Training: 2022-01-08 08:12:54,064-Speed 5973.55 samples/sec Loss 10.9124 LearningRate 0.2463 Epoch: 5 Global Step: 60930 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:13:00,930-Speed 5966.29 samples/sec Loss 10.9386 LearningRate 0.2462 Epoch: 5 Global Step: 60940 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:13:07,807-Speed 5957.73 samples/sec Loss 10.8742 LearningRate 0.2462 Epoch: 5 Global Step: 60950 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 
08:13:14,667-Speed 5971.38 samples/sec Loss 10.8130 LearningRate 0.2462 Epoch: 5 Global Step: 60960 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:13:21,517-Speed 5980.45 samples/sec Loss 10.8941 LearningRate 0.2461 Epoch: 5 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:13:28,399-Speed 5952.53 samples/sec Loss 10.7931 LearningRate 0.2461 Epoch: 5 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:13:35,281-Speed 5952.75 samples/sec Loss 10.9300 LearningRate 0.2461 Epoch: 5 Global Step: 60990 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:13:42,132-Speed 5980.67 samples/sec Loss 10.9069 LearningRate 0.2460 Epoch: 5 Global Step: 61000 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:13:48,981-Speed 5981.05 samples/sec Loss 10.9363 LearningRate 0.2460 Epoch: 5 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:13:55,840-Speed 5972.47 samples/sec Loss 10.8537 LearningRate 0.2460 Epoch: 5 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:14:02,693-Speed 5978.31 samples/sec Loss 10.9714 LearningRate 0.2459 Epoch: 5 Global Step: 61030 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:14:09,540-Speed 5983.36 samples/sec Loss 10.9354 LearningRate 0.2459 Epoch: 5 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:14:16,415-Speed 5958.71 samples/sec Loss 10.9316 LearningRate 0.2459 Epoch: 5 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:14:23,262-Speed 5983.62 samples/sec Loss 10.8777 LearningRate 0.2458 Epoch: 5 Global Step: 61060 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:14:30,115-Speed 5977.26 samples/sec Loss 10.9061 LearningRate 0.2458 Epoch: 5 Global Step: 61070 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:14:36,980-Speed 5967.94 
samples/sec Loss 10.8400 LearningRate 0.2458 Epoch: 5 Global Step: 61080 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:14:43,872-Speed 5944.01 samples/sec Loss 10.8433 LearningRate 0.2457 Epoch: 5 Global Step: 61090 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:14:50,740-Speed 5965.75 samples/sec Loss 10.9081 LearningRate 0.2457 Epoch: 5 Global Step: 61100 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:14:57,597-Speed 5974.24 samples/sec Loss 10.9189 LearningRate 0.2457 Epoch: 5 Global Step: 61110 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:04,448-Speed 5979.51 samples/sec Loss 10.8371 LearningRate 0.2456 Epoch: 5 Global Step: 61120 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:11,304-Speed 5976.16 samples/sec Loss 10.8579 LearningRate 0.2456 Epoch: 5 Global Step: 61130 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:18,164-Speed 5971.86 samples/sec Loss 10.8910 LearningRate 0.2456 Epoch: 5 Global Step: 61140 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:25,017-Speed 5979.71 samples/sec Loss 10.7978 LearningRate 0.2455 Epoch: 5 Global Step: 61150 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:31,864-Speed 5983.60 samples/sec Loss 10.8719 LearningRate 0.2455 Epoch: 5 Global Step: 61160 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:38,707-Speed 5986.34 samples/sec Loss 10.9315 LearningRate 0.2455 Epoch: 5 Global Step: 61170 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:45,583-Speed 5958.00 samples/sec Loss 10.8892 LearningRate 0.2454 Epoch: 5 Global Step: 61180 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:52,462-Speed 5955.18 samples/sec Loss 10.9736 LearningRate 0.2454 Epoch: 5 Global Step: 61190 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:15:59,347-Speed 5950.32 samples/sec Loss 10.8796 
LearningRate 0.2454 Epoch: 5 Global Step: 61200 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:16:06,224-Speed 5957.04 samples/sec Loss 10.8503 LearningRate 0.2453 Epoch: 5 Global Step: 61210 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:16:13,077-Speed 5978.22 samples/sec Loss 10.8996 LearningRate 0.2453 Epoch: 5 Global Step: 61220 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:16:19,983-Speed 5933.76 samples/sec Loss 10.8308 LearningRate 0.2453 Epoch: 5 Global Step: 61230 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:16:26,832-Speed 5982.23 samples/sec Loss 10.8788 LearningRate 0.2452 Epoch: 5 Global Step: 61240 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:16:33,700-Speed 5965.06 samples/sec Loss 10.8007 LearningRate 0.2452 Epoch: 5 Global Step: 61250 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:16:40,567-Speed 5966.45 samples/sec Loss 10.8542 LearningRate 0.2452 Epoch: 5 Global Step: 61260 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:16:47,516-Speed 5895.34 samples/sec Loss 10.8654 LearningRate 0.2451 Epoch: 5 Global Step: 61270 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:16:54,455-Speed 5904.16 samples/sec Loss 10.9050 LearningRate 0.2451 Epoch: 5 Global Step: 61280 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:17:01,387-Speed 5911.31 samples/sec Loss 10.9123 LearningRate 0.2451 Epoch: 5 Global Step: 61290 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:17:08,313-Speed 5915.48 samples/sec Loss 10.8972 LearningRate 0.2450 Epoch: 5 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:17:15,223-Speed 5928.73 samples/sec Loss 10.8613 LearningRate 0.2450 Epoch: 5 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:17:22,097-Speed 5960.78 samples/sec Loss 10.8203 LearningRate 0.2450 Epoch: 5 
Global Step: 61320 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:17:28,965-Speed 5965.42 samples/sec Loss 10.8662 LearningRate 0.2449 Epoch: 5 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:17:35,819-Speed 5979.12 samples/sec Loss 10.8793 LearningRate 0.2449 Epoch: 5 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:17:42,675-Speed 5974.93 samples/sec Loss 10.7946 LearningRate 0.2449 Epoch: 5 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:17:49,528-Speed 5979.16 samples/sec Loss 10.7732 LearningRate 0.2448 Epoch: 5 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:17:56,396-Speed 5964.18 samples/sec Loss 10.8268 LearningRate 0.2448 Epoch: 5 Global Step: 61370 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:18:03,263-Speed 5966.69 samples/sec Loss 10.8466 LearningRate 0.2448 Epoch: 5 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:18:10,140-Speed 5956.70 samples/sec Loss 10.8161 LearningRate 0.2447 Epoch: 5 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 08:18:16,994-Speed 5977.67 samples/sec Loss 10.8169 LearningRate 0.2447 Epoch: 5 Global Step: 61400 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:18:23,877-Speed 5951.58 samples/sec Loss 10.8461 LearningRate 0.2447 Epoch: 5 Global Step: 61410 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:18:30,739-Speed 5971.56 samples/sec Loss 10.9019 LearningRate 0.2446 Epoch: 5 Global Step: 61420 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:18:37,586-Speed 5982.72 samples/sec Loss 10.9134 LearningRate 0.2446 Epoch: 5 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:18:44,438-Speed 5979.29 samples/sec Loss 10.8788 LearningRate 0.2446 Epoch: 5 Global Step: 61440 Fp16 Grad Scale: 
131072 Required: 29 hours Training: 2022-01-08 08:18:51,303-Speed 5968.37 samples/sec Loss 10.9044 LearningRate 0.2445 Epoch: 5 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:18:58,158-Speed 5975.30 samples/sec Loss 10.8323 LearningRate 0.2445 Epoch: 5 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:19:05,028-Speed 5964.08 samples/sec Loss 10.8476 LearningRate 0.2445 Epoch: 5 Global Step: 61470 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:19:11,889-Speed 5970.92 samples/sec Loss 10.8736 LearningRate 0.2444 Epoch: 5 Global Step: 61480 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:19:18,745-Speed 5975.53 samples/sec Loss 10.7797 LearningRate 0.2444 Epoch: 5 Global Step: 61490 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:19:25,607-Speed 5973.36 samples/sec Loss 10.8510 LearningRate 0.2444 Epoch: 5 Global Step: 61500 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:19:32,459-Speed 5979.51 samples/sec Loss 10.8269 LearningRate 0.2443 Epoch: 5 Global Step: 61510 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:19:39,315-Speed 5975.36 samples/sec Loss 10.7864 LearningRate 0.2443 Epoch: 5 Global Step: 61520 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:19:46,168-Speed 5979.82 samples/sec Loss 10.8317 LearningRate 0.2443 Epoch: 5 Global Step: 61530 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:19:53,031-Speed 5969.57 samples/sec Loss 10.8514 LearningRate 0.2442 Epoch: 5 Global Step: 61540 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:19:59,885-Speed 5977.13 samples/sec Loss 10.8932 LearningRate 0.2442 Epoch: 5 Global Step: 61550 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:20:06,756-Speed 5961.93 samples/sec Loss 10.8208 LearningRate 0.2442 Epoch: 5 Global Step: 61560 Fp16 Grad Scale: 131072 Required: 29 hours 
Training: 2022-01-08 08:20:13,629-Speed 5960.68 samples/sec Loss 10.8890 LearningRate 0.2441 Epoch: 5 Global Step: 61570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:20:20,486-Speed 5974.91 samples/sec Loss 10.8440 LearningRate 0.2441 Epoch: 5 Global Step: 61580 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:20:27,346-Speed 5971.42 samples/sec Loss 10.8674 LearningRate 0.2441 Epoch: 5 Global Step: 61590 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:20:34,235-Speed 5949.68 samples/sec Loss 10.7700 LearningRate 0.2440 Epoch: 5 Global Step: 61600 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:20:41,085-Speed 5980.48 samples/sec Loss 10.8245 LearningRate 0.2440 Epoch: 5 Global Step: 61610 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:20:47,952-Speed 5966.52 samples/sec Loss 10.9218 LearningRate 0.2440 Epoch: 5 Global Step: 61620 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:20:54,808-Speed 5975.15 samples/sec Loss 10.9206 LearningRate 0.2439 Epoch: 5 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:21:01,679-Speed 5962.15 samples/sec Loss 10.8487 LearningRate 0.2439 Epoch: 5 Global Step: 61640 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:21:08,552-Speed 5961.15 samples/sec Loss 10.7762 LearningRate 0.2439 Epoch: 5 Global Step: 61650 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:21:15,486-Speed 5909.29 samples/sec Loss 10.8365 LearningRate 0.2438 Epoch: 5 Global Step: 61660 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:21:22,431-Speed 5898.44 samples/sec Loss 10.8344 LearningRate 0.2438 Epoch: 5 Global Step: 61670 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:21:29,291-Speed 5973.44 samples/sec Loss 10.8457 LearningRate 0.2438 Epoch: 5 Global Step: 61680 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 
08:21:36,159-Speed 5968.13 samples/sec Loss 10.7700 LearningRate 0.2437 Epoch: 5 Global Step: 61690 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:21:43,032-Speed 5961.06 samples/sec Loss 10.8865 LearningRate 0.2437 Epoch: 5 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:21:49,885-Speed 5978.03 samples/sec Loss 10.8536 LearningRate 0.2437 Epoch: 5 Global Step: 61710 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:21:56,756-Speed 5962.83 samples/sec Loss 10.8948 LearningRate 0.2436 Epoch: 5 Global Step: 61720 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:22:03,611-Speed 5976.02 samples/sec Loss 10.7191 LearningRate 0.2436 Epoch: 5 Global Step: 61730 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:22:10,474-Speed 5969.61 samples/sec Loss 10.8598 LearningRate 0.2436 Epoch: 5 Global Step: 61740 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:22:17,365-Speed 5945.73 samples/sec Loss 10.8810 LearningRate 0.2435 Epoch: 5 Global Step: 61750 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:22:24,236-Speed 5961.65 samples/sec Loss 10.8403 LearningRate 0.2435 Epoch: 5 Global Step: 61760 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:22:31,106-Speed 5975.05 samples/sec Loss 10.8881 LearningRate 0.2435 Epoch: 5 Global Step: 61770 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:22:37,967-Speed 5971.56 samples/sec Loss 10.7927 LearningRate 0.2434 Epoch: 5 Global Step: 61780 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:22:51,447-Speed 3038.78 samples/sec Loss 10.9354 LearningRate 0.2434 Epoch: 5 Global Step: 61790 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:22:58,289-Speed 5988.59 samples/sec Loss 10.8659 LearningRate 0.2434 Epoch: 5 Global Step: 61800 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:23:05,130-Speed 5988.31 
samples/sec Loss 10.7560 LearningRate 0.2433 Epoch: 5 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:23:12,001-Speed 5965.24 samples/sec Loss 10.8500 LearningRate 0.2433 Epoch: 5 Global Step: 61820 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:23:18,840-Speed 5990.28 samples/sec Loss 10.8100 LearningRate 0.2433 Epoch: 5 Global Step: 61830 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:23:25,675-Speed 5993.56 samples/sec Loss 10.7720 LearningRate 0.2432 Epoch: 5 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:23:32,527-Speed 5979.12 samples/sec Loss 10.8474 LearningRate 0.2432 Epoch: 5 Global Step: 61850 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:23:39,369-Speed 5987.84 samples/sec Loss 10.8557 LearningRate 0.2432 Epoch: 5 Global Step: 61860 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:23:46,214-Speed 5984.12 samples/sec Loss 10.7936 LearningRate 0.2431 Epoch: 5 Global Step: 61870 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:23:53,069-Speed 5976.06 samples/sec Loss 10.8466 LearningRate 0.2431 Epoch: 5 Global Step: 61880 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:23:59,919-Speed 5981.21 samples/sec Loss 10.8385 LearningRate 0.2431 Epoch: 5 Global Step: 61890 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:24:06,764-Speed 5984.71 samples/sec Loss 10.7986 LearningRate 0.2430 Epoch: 5 Global Step: 61900 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:24:13,618-Speed 5983.89 samples/sec Loss 10.8999 LearningRate 0.2430 Epoch: 5 Global Step: 61910 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:24:20,475-Speed 5976.80 samples/sec Loss 10.8393 LearningRate 0.2430 Epoch: 5 Global Step: 61920 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:24:27,343-Speed 5964.54 samples/sec Loss 10.8715 
LearningRate 0.2429 Epoch: 5 Global Step: 61930 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:24:34,224-Speed 5955.84 samples/sec Loss 10.8218 LearningRate 0.2429 Epoch: 5 Global Step: 61940 Fp16 Grad Scale: 262144 Required: 29 hours Training: 2022-01-08 08:24:41,084-Speed 5971.68 samples/sec Loss 10.8218 LearningRate 0.2429 Epoch: 5 Global Step: 61950 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:24:47,950-Speed 5966.79 samples/sec Loss 10.7873 LearningRate 0.2428 Epoch: 5 Global Step: 61960 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:24:54,823-Speed 5960.81 samples/sec Loss 10.7777 LearningRate 0.2428 Epoch: 5 Global Step: 61970 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 08:25:01,689-Speed 5966.83 samples/sec Loss 10.8395 LearningRate 0.2428 Epoch: 5 Global Step: 61980 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:25:08,565-Speed 5958.21 samples/sec Loss 10.7285 LearningRate 0.2427 Epoch: 5 Global Step: 61990 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:25:15,502-Speed 5906.61 samples/sec Loss 10.8225 LearningRate 0.2427 Epoch: 5 Global Step: 62000 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:25:22,500-Speed 5854.30 samples/sec Loss 10.7793 LearningRate 0.2427 Epoch: 5 Global Step: 62010 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:25:29,351-Speed 5979.96 samples/sec Loss 10.7516 LearningRate 0.2426 Epoch: 5 Global Step: 62020 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:25:36,229-Speed 5956.82 samples/sec Loss 10.8572 LearningRate 0.2426 Epoch: 5 Global Step: 62030 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:25:43,087-Speed 5974.42 samples/sec Loss 10.6895 LearningRate 0.2426 Epoch: 5 Global Step: 62040 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:25:49,963-Speed 5957.53 samples/sec Loss 10.7883 LearningRate 0.2425 Epoch: 5 
Global Step: 62050 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:25:56,844-Speed 5954.52 samples/sec Loss 10.8530 LearningRate 0.2425 Epoch: 5 Global Step: 62060 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:26:03,837-Speed 5858.00 samples/sec Loss 10.8370 LearningRate 0.2425 Epoch: 5 Global Step: 62070 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:26:10,781-Speed 5900.09 samples/sec Loss 10.9688 LearningRate 0.2424 Epoch: 5 Global Step: 62080 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:26:17,758-Speed 5873.08 samples/sec Loss 10.8259 LearningRate 0.2424 Epoch: 5 Global Step: 62090 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:26:24,610-Speed 5978.53 samples/sec Loss 10.9467 LearningRate 0.2424 Epoch: 5 Global Step: 62100 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:26:31,469-Speed 5973.12 samples/sec Loss 10.8156 LearningRate 0.2423 Epoch: 5 Global Step: 62110 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:26:38,321-Speed 5978.48 samples/sec Loss 10.8370 LearningRate 0.2423 Epoch: 5 Global Step: 62120 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:26:45,170-Speed 5982.38 samples/sec Loss 10.8448 LearningRate 0.2423 Epoch: 5 Global Step: 62130 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:26:52,019-Speed 5980.72 samples/sec Loss 10.7631 LearningRate 0.2422 Epoch: 5 Global Step: 62140 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:26:58,885-Speed 5967.14 samples/sec Loss 10.8314 LearningRate 0.2422 Epoch: 5 Global Step: 62150 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:27:05,736-Speed 5980.00 samples/sec Loss 10.8678 LearningRate 0.2422 Epoch: 5 Global Step: 62160 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:27:12,585-Speed 5981.06 samples/sec Loss 10.7911 LearningRate 0.2421 Epoch: 5 Global Step: 62170 Fp16 Grad 
Scale: 131072 Required: 28 hours Training: 2022-01-08 08:27:19,435-Speed 5981.78 samples/sec Loss 10.8309 LearningRate 0.2421 Epoch: 5 Global Step: 62180 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:27:26,303-Speed 5966.79 samples/sec Loss 10.7911 LearningRate 0.2421 Epoch: 5 Global Step: 62190 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:27:33,156-Speed 5978.03 samples/sec Loss 10.8484 LearningRate 0.2420 Epoch: 5 Global Step: 62200 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:27:40,003-Speed 5983.45 samples/sec Loss 10.8340 LearningRate 0.2420 Epoch: 5 Global Step: 62210 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:28:03,307-Speed 1757.75 samples/sec Loss 10.7980 LearningRate 0.2420 Epoch: 6 Global Step: 62220 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:28:10,365-Speed 5804.38 samples/sec Loss 10.8225 LearningRate 0.2419 Epoch: 6 Global Step: 62230 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:28:17,207-Speed 5987.77 samples/sec Loss 10.8946 LearningRate 0.2419 Epoch: 6 Global Step: 62240 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:28:24,039-Speed 5996.73 samples/sec Loss 10.8294 LearningRate 0.2419 Epoch: 6 Global Step: 62250 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:28:30,870-Speed 5997.88 samples/sec Loss 10.8055 LearningRate 0.2418 Epoch: 6 Global Step: 62260 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:28:37,706-Speed 5993.38 samples/sec Loss 10.8095 LearningRate 0.2418 Epoch: 6 Global Step: 62270 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:28:44,553-Speed 5983.06 samples/sec Loss 10.8185 LearningRate 0.2418 Epoch: 6 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:28:51,403-Speed 5980.07 samples/sec Loss 10.8079 LearningRate 0.2417 Epoch: 6 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 28 
hours Training: 2022-01-08 08:28:58,304-Speed 5938.01 samples/sec Loss 10.8336 LearningRate 0.2417 Epoch: 6 Global Step: 62300 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:29:05,279-Speed 5874.40 samples/sec Loss 10.7769 LearningRate 0.2417 Epoch: 6 Global Step: 62310 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:29:12,247-Speed 5879.76 samples/sec Loss 10.7213 LearningRate 0.2416 Epoch: 6 Global Step: 62320 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:29:19,229-Speed 5867.62 samples/sec Loss 10.7869 LearningRate 0.2416 Epoch: 6 Global Step: 62330 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:29:26,079-Speed 5980.85 samples/sec Loss 10.7224 LearningRate 0.2416 Epoch: 6 Global Step: 62340 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:29:32,970-Speed 5945.40 samples/sec Loss 10.7583 LearningRate 0.2415 Epoch: 6 Global Step: 62350 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:29:39,823-Speed 5977.73 samples/sec Loss 10.7406 LearningRate 0.2415 Epoch: 6 Global Step: 62360 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:29:46,719-Speed 5941.30 samples/sec Loss 10.8074 LearningRate 0.2415 Epoch: 6 Global Step: 62370 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:29:53,615-Speed 5940.82 samples/sec Loss 10.7467 LearningRate 0.2414 Epoch: 6 Global Step: 62380 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:30:00,478-Speed 5969.72 samples/sec Loss 10.7390 LearningRate 0.2414 Epoch: 6 Global Step: 62390 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:30:07,400-Speed 5918.17 samples/sec Loss 10.8627 LearningRate 0.2414 Epoch: 6 Global Step: 62400 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:30:14,341-Speed 5902.15 samples/sec Loss 10.8696 LearningRate 0.2413 Epoch: 6 Global Step: 62410 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 
08:30:21,276-Speed 5907.16 samples/sec Loss 10.8439 LearningRate 0.2413 Epoch: 6 Global Step: 62420 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:30:28,203-Speed 5914.01 samples/sec Loss 10.7685 LearningRate 0.2413 Epoch: 6 Global Step: 62430 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:30:35,086-Speed 5951.70 samples/sec Loss 10.7824 LearningRate 0.2412 Epoch: 6 Global Step: 62440 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:30:41,946-Speed 5971.95 samples/sec Loss 10.8586 LearningRate 0.2412 Epoch: 6 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:30:48,797-Speed 5980.02 samples/sec Loss 10.7543 LearningRate 0.2412 Epoch: 6 Global Step: 62460 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:30:55,687-Speed 5945.43 samples/sec Loss 10.8291 LearningRate 0.2411 Epoch: 6 Global Step: 62470 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:31:02,534-Speed 5985.38 samples/sec Loss 10.8026 LearningRate 0.2411 Epoch: 6 Global Step: 62480 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:31:09,402-Speed 5964.84 samples/sec Loss 10.7830 LearningRate 0.2411 Epoch: 6 Global Step: 62490 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:31:16,242-Speed 5988.62 samples/sec Loss 10.7935 LearningRate 0.2410 Epoch: 6 Global Step: 62500 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:31:23,107-Speed 5967.71 samples/sec Loss 10.7723 LearningRate 0.2410 Epoch: 6 Global Step: 62510 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:31:29,989-Speed 5952.18 samples/sec Loss 10.7928 LearningRate 0.2410 Epoch: 6 Global Step: 62520 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:31:36,842-Speed 5979.07 samples/sec Loss 10.7510 LearningRate 0.2409 Epoch: 6 Global Step: 62530 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:31:43,706-Speed 5968.25 
samples/sec Loss 10.7694 LearningRate 0.2409 Epoch: 6 Global Step: 62540 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:31:50,574-Speed 5964.73 samples/sec Loss 10.7738 LearningRate 0.2409 Epoch: 6 Global Step: 62550 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:31:57,428-Speed 5977.55 samples/sec Loss 10.7258 LearningRate 0.2408 Epoch: 6 Global Step: 62560 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:32:04,283-Speed 5978.15 samples/sec Loss 10.7469 LearningRate 0.2408 Epoch: 6 Global Step: 62570 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:32:11,136-Speed 5977.68 samples/sec Loss 10.7791 LearningRate 0.2408 Epoch: 6 Global Step: 62580 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:32:18,003-Speed 5966.52 samples/sec Loss 10.7806 LearningRate 0.2407 Epoch: 6 Global Step: 62590 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:32:24,877-Speed 5959.37 samples/sec Loss 10.7733 LearningRate 0.2407 Epoch: 6 Global Step: 62600 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:32:31,760-Speed 5952.22 samples/sec Loss 10.7833 LearningRate 0.2407 Epoch: 6 Global Step: 62610 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:32:38,620-Speed 5971.69 samples/sec Loss 10.7015 LearningRate 0.2406 Epoch: 6 Global Step: 62620 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:32:45,492-Speed 5962.02 samples/sec Loss 10.7830 LearningRate 0.2406 Epoch: 6 Global Step: 62630 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:32:52,352-Speed 5972.08 samples/sec Loss 10.7015 LearningRate 0.2406 Epoch: 6 Global Step: 62640 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:32:59,210-Speed 5973.82 samples/sec Loss 10.8031 LearningRate 0.2405 Epoch: 6 Global Step: 62650 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:33:06,068-Speed 5973.10 samples/sec Loss 10.8831 
LearningRate 0.2405 Epoch: 6 Global Step: 62660 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:33:12,923-Speed 5979.13 samples/sec Loss 10.7940 LearningRate 0.2405 Epoch: 6 Global Step: 62670 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:33:19,773-Speed 5980.74 samples/sec Loss 10.8324 LearningRate 0.2404 Epoch: 6 Global Step: 62680 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:33:26,627-Speed 5976.86 samples/sec Loss 10.8621 LearningRate 0.2404 Epoch: 6 Global Step: 62690 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:33:33,485-Speed 5975.32 samples/sec Loss 10.8347 LearningRate 0.2404 Epoch: 6 Global Step: 62700 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:33:40,335-Speed 5980.59 samples/sec Loss 10.8163 LearningRate 0.2403 Epoch: 6 Global Step: 62710 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:33:47,186-Speed 5979.69 samples/sec Loss 10.8250 LearningRate 0.2403 Epoch: 6 Global Step: 62720 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:33:54,063-Speed 5957.16 samples/sec Loss 10.7946 LearningRate 0.2403 Epoch: 6 Global Step: 62730 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:00,923-Speed 5972.01 samples/sec Loss 10.7868 LearningRate 0.2402 Epoch: 6 Global Step: 62740 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:07,794-Speed 5961.83 samples/sec Loss 10.7139 LearningRate 0.2402 Epoch: 6 Global Step: 62750 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:14,655-Speed 5973.12 samples/sec Loss 10.7174 LearningRate 0.2402 Epoch: 6 Global Step: 62760 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:21,535-Speed 5955.42 samples/sec Loss 10.6757 LearningRate 0.2401 Epoch: 6 Global Step: 62770 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:28,396-Speed 5970.42 samples/sec Loss 10.7562 LearningRate 0.2401 Epoch: 6 
Global Step: 62780 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:35,250-Speed 5978.15 samples/sec Loss 10.7158 LearningRate 0.2401 Epoch: 6 Global Step: 62790 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:42,137-Speed 5948.75 samples/sec Loss 10.6826 LearningRate 0.2400 Epoch: 6 Global Step: 62800 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:48,990-Speed 5977.37 samples/sec Loss 10.7506 LearningRate 0.2400 Epoch: 6 Global Step: 62810 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:34:55,823-Speed 5995.08 samples/sec Loss 10.7683 LearningRate 0.2400 Epoch: 6 Global Step: 62820 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:35:02,669-Speed 5985.00 samples/sec Loss 10.6950 LearningRate 0.2399 Epoch: 6 Global Step: 62830 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:35:09,517-Speed 5981.80 samples/sec Loss 10.7095 LearningRate 0.2399 Epoch: 6 Global Step: 62840 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:35:16,367-Speed 5980.72 samples/sec Loss 10.7871 LearningRate 0.2399 Epoch: 6 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:35:23,230-Speed 5972.16 samples/sec Loss 10.8212 LearningRate 0.2398 Epoch: 6 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:35:30,158-Speed 5913.09 samples/sec Loss 10.8128 LearningRate 0.2398 Epoch: 6 Global Step: 62870 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:35:37,015-Speed 5974.84 samples/sec Loss 10.7120 LearningRate 0.2398 Epoch: 6 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:35:43,889-Speed 5959.73 samples/sec Loss 10.7666 LearningRate 0.2397 Epoch: 6 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:35:50,780-Speed 5945.72 samples/sec Loss 10.7793 LearningRate 0.2397 Epoch: 6 Global Step: 62900 Fp16 Grad 
Scale: 65536 Required: 28 hours Training: 2022-01-08 08:35:57,665-Speed 5952.22 samples/sec Loss 10.8491 LearningRate 0.2397 Epoch: 6 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:36:04,553-Speed 5948.28 samples/sec Loss 10.7710 LearningRate 0.2396 Epoch: 6 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:36:11,406-Speed 5977.87 samples/sec Loss 10.7609 LearningRate 0.2396 Epoch: 6 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:36:18,265-Speed 5972.94 samples/sec Loss 10.7234 LearningRate 0.2396 Epoch: 6 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:36:25,131-Speed 5966.57 samples/sec Loss 10.8534 LearningRate 0.2395 Epoch: 6 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:36:32,011-Speed 5954.92 samples/sec Loss 10.7240 LearningRate 0.2395 Epoch: 6 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:36:38,882-Speed 5962.67 samples/sec Loss 10.7522 LearningRate 0.2395 Epoch: 6 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:36:45,751-Speed 5964.24 samples/sec Loss 10.6780 LearningRate 0.2394 Epoch: 6 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:36:52,632-Speed 5953.74 samples/sec Loss 10.7685 LearningRate 0.2394 Epoch: 6 Global Step: 62990 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:36:59,496-Speed 5968.19 samples/sec Loss 10.7442 LearningRate 0.2394 Epoch: 6 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:37:06,372-Speed 5959.23 samples/sec Loss 10.7661 LearningRate 0.2393 Epoch: 6 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:37:13,240-Speed 5964.86 samples/sec Loss 10.8228 LearningRate 0.2393 Epoch: 6 Global Step: 63020 Fp16 Grad Scale: 131072 Required: 28 hours 
Training: 2022-01-08 08:37:20,112-Speed 5961.31 samples/sec Loss 10.8114 LearningRate 0.2393 Epoch: 6 Global Step: 63030 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:37:26,964-Speed 5978.71 samples/sec Loss 10.6907 LearningRate 0.2392 Epoch: 6 Global Step: 63040 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:37:33,826-Speed 5970.66 samples/sec Loss 10.7215 LearningRate 0.2392 Epoch: 6 Global Step: 63050 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:37:40,681-Speed 5976.92 samples/sec Loss 10.7854 LearningRate 0.2392 Epoch: 6 Global Step: 63060 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:37:47,551-Speed 5962.43 samples/sec Loss 10.7410 LearningRate 0.2391 Epoch: 6 Global Step: 63070 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:37:54,381-Speed 5998.72 samples/sec Loss 10.7121 LearningRate 0.2391 Epoch: 6 Global Step: 63080 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:38:01,243-Speed 5970.58 samples/sec Loss 10.8641 LearningRate 0.2391 Epoch: 6 Global Step: 63090 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:38:08,100-Speed 5975.46 samples/sec Loss 10.8400 LearningRate 0.2390 Epoch: 6 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:38:14,962-Speed 5971.06 samples/sec Loss 10.7597 LearningRate 0.2390 Epoch: 6 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:38:21,831-Speed 5963.95 samples/sec Loss 10.7742 LearningRate 0.2390 Epoch: 6 Global Step: 63120 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:38:28,689-Speed 5973.88 samples/sec Loss 10.7319 LearningRate 0.2389 Epoch: 6 Global Step: 63130 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:38:35,545-Speed 5975.88 samples/sec Loss 10.7249 LearningRate 0.2389 Epoch: 6 Global Step: 63140 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 
08:38:42,409-Speed 5969.29 samples/sec Loss 10.7353 LearningRate 0.2389 Epoch: 6 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:38:49,251-Speed 5987.41 samples/sec Loss 10.7728 LearningRate 0.2388 Epoch: 6 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:38:56,094-Speed 5987.09 samples/sec Loss 10.7852 LearningRate 0.2388 Epoch: 6 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 08:39:02,944-Speed 5980.85 samples/sec Loss 10.6956 LearningRate 0.2388 Epoch: 6 Global Step: 63180 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:39:09,803-Speed 5972.99 samples/sec Loss 10.7600 LearningRate 0.2387 Epoch: 6 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:39:16,695-Speed 5944.46 samples/sec Loss 10.7271 LearningRate 0.2387 Epoch: 6 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:39:23,539-Speed 5985.92 samples/sec Loss 10.7093 LearningRate 0.2387 Epoch: 6 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:39:30,384-Speed 5984.59 samples/sec Loss 10.7438 LearningRate 0.2386 Epoch: 6 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:39:37,242-Speed 5973.86 samples/sec Loss 10.7072 LearningRate 0.2386 Epoch: 6 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:39:44,128-Speed 5949.86 samples/sec Loss 10.7168 LearningRate 0.2386 Epoch: 6 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:39:50,998-Speed 5963.19 samples/sec Loss 10.8060 LearningRate 0.2385 Epoch: 6 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:39:57,846-Speed 5982.54 samples/sec Loss 10.6934 LearningRate 0.2385 Epoch: 6 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:40:04,709-Speed 5970.00 samples/sec Loss 
10.7716 LearningRate 0.2385 Epoch: 6 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 08:40:11,562-Speed 5978.00 samples/sec Loss 10.7966 LearningRate 0.2384 Epoch: 6 Global Step: 63280 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:40:18,437-Speed 5959.00 samples/sec Loss 10.6640 LearningRate 0.2384 Epoch: 6 Global Step: 63290 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:40:25,285-Speed 5982.32 samples/sec Loss 10.7023 LearningRate 0.2384 Epoch: 6 Global Step: 63300 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:40:32,132-Speed 5983.28 samples/sec Loss 10.7073 LearningRate 0.2383 Epoch: 6 Global Step: 63310 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:40:38,975-Speed 5987.04 samples/sec Loss 10.6616 LearningRate 0.2383 Epoch: 6 Global Step: 63320 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:40:47,395-Speed 4865.03 samples/sec Loss 10.6889 LearningRate 0.2383 Epoch: 6 Global Step: 63330 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:40:54,257-Speed 5970.77 samples/sec Loss 10.6459 LearningRate 0.2382 Epoch: 6 Global Step: 63340 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:41:01,147-Speed 5945.28 samples/sec Loss 10.6668 LearningRate 0.2382 Epoch: 6 Global Step: 63350 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:41:07,994-Speed 5983.87 samples/sec Loss 10.7309 LearningRate 0.2382 Epoch: 6 Global Step: 63360 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:41:14,854-Speed 5971.86 samples/sec Loss 10.7968 LearningRate 0.2381 Epoch: 6 Global Step: 63370 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:41:21,702-Speed 5982.16 samples/sec Loss 10.6907 LearningRate 0.2381 Epoch: 6 Global Step: 63380 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:41:28,571-Speed 5964.77 samples/sec Loss 10.7732 LearningRate 0.2381 
Epoch: 6 Global Step: 63390 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:41:35,419-Speed 5982.29 samples/sec Loss 10.7416 LearningRate 0.2380 Epoch: 6 Global Step: 63400 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:41:42,270-Speed 5979.31 samples/sec Loss 10.7764 LearningRate 0.2380 Epoch: 6 Global Step: 63410 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:41:49,131-Speed 5971.83 samples/sec Loss 10.7337 LearningRate 0.2380 Epoch: 6 Global Step: 63420 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:41:55,978-Speed 5983.47 samples/sec Loss 10.7065 LearningRate 0.2379 Epoch: 6 Global Step: 63430 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:02,829-Speed 5979.26 samples/sec Loss 10.6975 LearningRate 0.2379 Epoch: 6 Global Step: 63440 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:09,706-Speed 5958.33 samples/sec Loss 10.7531 LearningRate 0.2379 Epoch: 6 Global Step: 63450 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:16,565-Speed 5972.52 samples/sec Loss 10.7030 LearningRate 0.2378 Epoch: 6 Global Step: 63460 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:23,424-Speed 5974.91 samples/sec Loss 10.6800 LearningRate 0.2378 Epoch: 6 Global Step: 63470 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:30,303-Speed 5955.15 samples/sec Loss 10.7206 LearningRate 0.2378 Epoch: 6 Global Step: 63480 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:37,165-Speed 5969.95 samples/sec Loss 10.7959 LearningRate 0.2377 Epoch: 6 Global Step: 63490 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:44,069-Speed 5933.76 samples/sec Loss 10.7818 LearningRate 0.2377 Epoch: 6 Global Step: 63500 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:50,914-Speed 5984.52 samples/sec Loss 10.7259 LearningRate 0.2377 Epoch: 6 Global Step: 63510 
Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:42:57,799-Speed 5951.83 samples/sec Loss 10.7197 LearningRate 0.2376 Epoch: 6 Global Step: 63520 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:43:04,643-Speed 5985.74 samples/sec Loss 10.6803 LearningRate 0.2376 Epoch: 6 Global Step: 63530 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:43:11,517-Speed 5959.89 samples/sec Loss 10.7533 LearningRate 0.2376 Epoch: 6 Global Step: 63540 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:43:18,432-Speed 5924.18 samples/sec Loss 10.6526 LearningRate 0.2375 Epoch: 6 Global Step: 63550 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:43:25,368-Speed 5907.20 samples/sec Loss 10.6908 LearningRate 0.2375 Epoch: 6 Global Step: 63560 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:43:32,295-Speed 5914.35 samples/sec Loss 10.6169 LearningRate 0.2375 Epoch: 6 Global Step: 63570 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:43:39,173-Speed 5956.42 samples/sec Loss 10.7501 LearningRate 0.2374 Epoch: 6 Global Step: 63580 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:43:46,051-Speed 5956.50 samples/sec Loss 10.7576 LearningRate 0.2374 Epoch: 6 Global Step: 63590 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:43:52,922-Speed 5964.20 samples/sec Loss 10.7668 LearningRate 0.2374 Epoch: 6 Global Step: 63600 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:43:59,769-Speed 5982.93 samples/sec Loss 10.7274 LearningRate 0.2373 Epoch: 6 Global Step: 63610 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:44:06,641-Speed 5961.63 samples/sec Loss 10.6334 LearningRate 0.2373 Epoch: 6 Global Step: 63620 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:44:13,486-Speed 5985.07 samples/sec Loss 10.7552 LearningRate 0.2373 Epoch: 6 Global Step: 63630 Fp16 Grad Scale: 131072 
Required: 28 hours Training: 2022-01-08 08:44:20,343-Speed 5974.71 samples/sec Loss 10.6877 LearningRate 0.2372 Epoch: 6 Global Step: 63640 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:44:27,287-Speed 5899.93 samples/sec Loss 10.6450 LearningRate 0.2372 Epoch: 6 Global Step: 63650 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:44:34,148-Speed 5971.62 samples/sec Loss 10.7024 LearningRate 0.2372 Epoch: 6 Global Step: 63660 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:44:41,022-Speed 5959.49 samples/sec Loss 10.6793 LearningRate 0.2371 Epoch: 6 Global Step: 63670 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:44:47,899-Speed 5957.34 samples/sec Loss 10.6575 LearningRate 0.2371 Epoch: 6 Global Step: 63680 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:44:54,775-Speed 5958.07 samples/sec Loss 10.6684 LearningRate 0.2371 Epoch: 6 Global Step: 63690 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:45:01,645-Speed 5964.09 samples/sec Loss 10.6893 LearningRate 0.2370 Epoch: 6 Global Step: 63700 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:45:08,555-Speed 5928.34 samples/sec Loss 10.7846 LearningRate 0.2370 Epoch: 6 Global Step: 63710 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:45:15,497-Speed 5902.29 samples/sec Loss 10.7648 LearningRate 0.2370 Epoch: 6 Global Step: 63720 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:45:22,382-Speed 5950.12 samples/sec Loss 10.6004 LearningRate 0.2369 Epoch: 6 Global Step: 63730 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:45:29,239-Speed 5974.91 samples/sec Loss 10.6830 LearningRate 0.2369 Epoch: 6 Global Step: 63740 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:45:36,107-Speed 5964.50 samples/sec Loss 10.8254 LearningRate 0.2369 Epoch: 6 Global Step: 63750 Fp16 Grad Scale: 262144 Required: 28 hours Training: 
2022-01-08 08:45:42,961-Speed 5977.35 samples/sec Loss 10.6894 LearningRate 0.2368 Epoch: 6 Global Step: 63760 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:45:49,827-Speed 5966.31 samples/sec Loss 10.7057 LearningRate 0.2368 Epoch: 6 Global Step: 63770 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:45:56,715-Speed 5947.39 samples/sec Loss 10.7196 LearningRate 0.2368 Epoch: 6 Global Step: 63780 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:46:03,642-Speed 5917.12 samples/sec Loss 10.7077 LearningRate 0.2367 Epoch: 6 Global Step: 63790 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:46:10,559-Speed 5923.99 samples/sec Loss 10.7355 LearningRate 0.2367 Epoch: 6 Global Step: 63800 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:46:17,416-Speed 5974.63 samples/sec Loss 10.6166 LearningRate 0.2367 Epoch: 6 Global Step: 63810 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:46:24,291-Speed 5958.72 samples/sec Loss 10.7343 LearningRate 0.2367 Epoch: 6 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:46:31,151-Speed 5972.53 samples/sec Loss 10.6724 LearningRate 0.2366 Epoch: 6 Global Step: 63830 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:46:38,021-Speed 5964.90 samples/sec Loss 10.6503 LearningRate 0.2366 Epoch: 6 Global Step: 63840 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:46:44,908-Speed 5948.47 samples/sec Loss 10.6201 LearningRate 0.2366 Epoch: 6 Global Step: 63850 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:46:51,765-Speed 5975.11 samples/sec Loss 10.6812 LearningRate 0.2365 Epoch: 6 Global Step: 63860 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:46:58,618-Speed 5977.65 samples/sec Loss 10.7335 LearningRate 0.2365 Epoch: 6 Global Step: 63870 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:47:05,492-Speed 
5982.26 samples/sec Loss 10.7006 LearningRate 0.2365 Epoch: 6 Global Step: 63880 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:47:12,474-Speed 5867.67 samples/sec Loss 10.5846 LearningRate 0.2364 Epoch: 6 Global Step: 63890 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:47:19,370-Speed 5940.78 samples/sec Loss 10.6456 LearningRate 0.2364 Epoch: 6 Global Step: 63900 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:47:26,220-Speed 5979.95 samples/sec Loss 10.7159 LearningRate 0.2364 Epoch: 6 Global Step: 63910 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:47:33,077-Speed 5974.39 samples/sec Loss 10.7325 LearningRate 0.2363 Epoch: 6 Global Step: 63920 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:47:39,944-Speed 5966.70 samples/sec Loss 10.6585 LearningRate 0.2363 Epoch: 6 Global Step: 63930 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:47:46,800-Speed 5974.69 samples/sec Loss 10.6988 LearningRate 0.2363 Epoch: 6 Global Step: 63940 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:47:53,678-Speed 5956.75 samples/sec Loss 10.6670 LearningRate 0.2362 Epoch: 6 Global Step: 63950 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:00,532-Speed 5977.57 samples/sec Loss 10.6802 LearningRate 0.2362 Epoch: 6 Global Step: 63960 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:07,422-Speed 5946.00 samples/sec Loss 10.7210 LearningRate 0.2362 Epoch: 6 Global Step: 63970 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:14,277-Speed 5976.64 samples/sec Loss 10.6548 LearningRate 0.2361 Epoch: 6 Global Step: 63980 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:21,138-Speed 5970.86 samples/sec Loss 10.6841 LearningRate 0.2361 Epoch: 6 Global Step: 63990 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:28,023-Speed 5950.88 samples/sec Loss 
10.6410 LearningRate 0.2361 Epoch: 6 Global Step: 64000 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:34,875-Speed 5978.89 samples/sec Loss 10.6694 LearningRate 0.2360 Epoch: 6 Global Step: 64010 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:41,761-Speed 5949.59 samples/sec Loss 10.6891 LearningRate 0.2360 Epoch: 6 Global Step: 64020 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:48,646-Speed 5950.27 samples/sec Loss 10.6893 LearningRate 0.2360 Epoch: 6 Global Step: 64030 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:48:55,523-Speed 5957.54 samples/sec Loss 10.7640 LearningRate 0.2359 Epoch: 6 Global Step: 64040 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:49:02,368-Speed 5984.54 samples/sec Loss 10.8019 LearningRate 0.2359 Epoch: 6 Global Step: 64050 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:49:09,236-Speed 5967.65 samples/sec Loss 10.6706 LearningRate 0.2359 Epoch: 6 Global Step: 64060 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:49:16,094-Speed 5973.81 samples/sec Loss 10.6684 LearningRate 0.2358 Epoch: 6 Global Step: 64070 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:49:22,951-Speed 5974.93 samples/sec Loss 10.6999 LearningRate 0.2358 Epoch: 6 Global Step: 64080 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:49:29,817-Speed 5966.88 samples/sec Loss 10.5975 LearningRate 0.2358 Epoch: 6 Global Step: 64090 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:49:36,679-Speed 5971.39 samples/sec Loss 10.6365 LearningRate 0.2357 Epoch: 6 Global Step: 64100 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:49:43,518-Speed 5989.68 samples/sec Loss 10.6194 LearningRate 0.2357 Epoch: 6 Global Step: 64110 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:49:50,360-Speed 5987.38 samples/sec Loss 10.7146 LearningRate 0.2357 
Epoch: 6 Global Step: 64120 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:49:57,225-Speed 5968.02 samples/sec Loss 10.5972 LearningRate 0.2356 Epoch: 6 Global Step: 64130 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:04,107-Speed 5955.36 samples/sec Loss 10.5935 LearningRate 0.2356 Epoch: 6 Global Step: 64140 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:10,971-Speed 5970.46 samples/sec Loss 10.6975 LearningRate 0.2356 Epoch: 6 Global Step: 64150 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:17,917-Speed 5897.91 samples/sec Loss 10.6647 LearningRate 0.2355 Epoch: 6 Global Step: 64160 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:24,780-Speed 5969.94 samples/sec Loss 10.6857 LearningRate 0.2355 Epoch: 6 Global Step: 64170 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:31,636-Speed 5975.30 samples/sec Loss 10.5878 LearningRate 0.2355 Epoch: 6 Global Step: 64180 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:38,484-Speed 5984.68 samples/sec Loss 10.7546 LearningRate 0.2354 Epoch: 6 Global Step: 64190 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:45,335-Speed 5980.00 samples/sec Loss 10.6631 LearningRate 0.2354 Epoch: 6 Global Step: 64200 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:52,203-Speed 5965.16 samples/sec Loss 10.6315 LearningRate 0.2354 Epoch: 6 Global Step: 64210 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:50:59,048-Speed 5985.19 samples/sec Loss 10.6280 LearningRate 0.2353 Epoch: 6 Global Step: 64220 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:51:05,891-Speed 5986.56 samples/sec Loss 10.6639 LearningRate 0.2353 Epoch: 6 Global Step: 64230 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:51:12,737-Speed 5984.75 samples/sec Loss 10.6805 LearningRate 0.2353 Epoch: 6 Global Step: 64240 
Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:51:19,582-Speed 5984.68 samples/sec Loss 10.6685 LearningRate 0.2352 Epoch: 6 Global Step: 64250 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:51:26,449-Speed 5965.43 samples/sec Loss 10.6428 LearningRate 0.2352 Epoch: 6 Global Step: 64260 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:51:33,304-Speed 5976.51 samples/sec Loss 10.7061 LearningRate 0.2352 Epoch: 6 Global Step: 64270 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:51:40,162-Speed 5973.65 samples/sec Loss 10.5891 LearningRate 0.2351 Epoch: 6 Global Step: 64280 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:51:47,035-Speed 5960.65 samples/sec Loss 10.7539 LearningRate 0.2351 Epoch: 6 Global Step: 64290 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:51:53,889-Speed 5977.16 samples/sec Loss 10.6776 LearningRate 0.2351 Epoch: 6 Global Step: 64300 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:52:00,736-Speed 5982.75 samples/sec Loss 10.6301 LearningRate 0.2350 Epoch: 6 Global Step: 64310 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:52:07,593-Speed 5974.72 samples/sec Loss 10.6235 LearningRate 0.2350 Epoch: 6 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:52:14,460-Speed 5966.14 samples/sec Loss 10.6286 LearningRate 0.2350 Epoch: 6 Global Step: 64330 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:52:21,310-Speed 5980.05 samples/sec Loss 10.7118 LearningRate 0.2349 Epoch: 6 Global Step: 64340 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:52:28,160-Speed 5980.39 samples/sec Loss 10.6215 LearningRate 0.2349 Epoch: 6 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:52:35,012-Speed 5979.02 samples/sec Loss 10.5619 LearningRate 0.2349 Epoch: 6 Global Step: 64360 Fp16 Grad Scale: 131072 
Required: 28 hours Training: 2022-01-08 08:52:41,878-Speed 5966.45 samples/sec Loss 10.6354 LearningRate 0.2348 Epoch: 6 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:52:48,728-Speed 5980.80 samples/sec Loss 10.7103 LearningRate 0.2348 Epoch: 6 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:52:55,579-Speed 5982.86 samples/sec Loss 10.7838 LearningRate 0.2348 Epoch: 6 Global Step: 64390 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:02,423-Speed 5986.26 samples/sec Loss 10.7049 LearningRate 0.2347 Epoch: 6 Global Step: 64400 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:09,261-Speed 5992.05 samples/sec Loss 10.6094 LearningRate 0.2347 Epoch: 6 Global Step: 64410 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:16,103-Speed 5987.01 samples/sec Loss 10.6303 LearningRate 0.2347 Epoch: 6 Global Step: 64420 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:22,966-Speed 5971.14 samples/sec Loss 10.6094 LearningRate 0.2346 Epoch: 6 Global Step: 64430 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:29,819-Speed 5977.73 samples/sec Loss 10.6978 LearningRate 0.2346 Epoch: 6 Global Step: 64440 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:36,679-Speed 5971.96 samples/sec Loss 10.6417 LearningRate 0.2346 Epoch: 6 Global Step: 64450 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:43,543-Speed 5968.20 samples/sec Loss 10.5266 LearningRate 0.2345 Epoch: 6 Global Step: 64460 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:50,394-Speed 5980.25 samples/sec Loss 10.6794 LearningRate 0.2345 Epoch: 6 Global Step: 64470 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:53:57,244-Speed 5980.95 samples/sec Loss 10.5758 LearningRate 0.2345 Epoch: 6 Global Step: 64480 Fp16 Grad Scale: 131072 Required: 28 hours Training: 
2022-01-08 08:54:04,090-Speed 5983.76 samples/sec Loss 10.6105 LearningRate 0.2344 Epoch: 6 Global Step: 64490 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:54:10,938-Speed 5985.42 samples/sec Loss 10.7009 LearningRate 0.2344 Epoch: 6 Global Step: 64500 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:54:17,782-Speed 5985.88 samples/sec Loss 10.6498 LearningRate 0.2344 Epoch: 6 Global Step: 64510 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:54:24,625-Speed 5987.30 samples/sec Loss 10.6126 LearningRate 0.2343 Epoch: 6 Global Step: 64520 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:54:31,511-Speed 5950.96 samples/sec Loss 10.6700 LearningRate 0.2343 Epoch: 6 Global Step: 64530 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:54:38,408-Speed 5939.22 samples/sec Loss 10.6585 LearningRate 0.2343 Epoch: 6 Global Step: 64540 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:54:45,281-Speed 5963.66 samples/sec Loss 10.6039 LearningRate 0.2343 Epoch: 6 Global Step: 64550 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:54:52,147-Speed 5967.41 samples/sec Loss 10.6602 LearningRate 0.2342 Epoch: 6 Global Step: 64560 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:54:59,028-Speed 5953.76 samples/sec Loss 10.6169 LearningRate 0.2342 Epoch: 6 Global Step: 64570 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:55:05,941-Speed 5926.09 samples/sec Loss 10.6216 LearningRate 0.2342 Epoch: 6 Global Step: 64580 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:55:12,833-Speed 5944.54 samples/sec Loss 10.6851 LearningRate 0.2341 Epoch: 6 Global Step: 64590 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:55:19,683-Speed 5982.55 samples/sec Loss 10.6353 LearningRate 0.2341 Epoch: 6 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:55:26,545-Speed 
5972.72 samples/sec Loss 10.7280 LearningRate 0.2341 Epoch: 6 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:55:33,388-Speed 5986.89 samples/sec Loss 10.6563 LearningRate 0.2340 Epoch: 6 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:55:40,251-Speed 5969.14 samples/sec Loss 10.6215 LearningRate 0.2340 Epoch: 6 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:55:47,129-Speed 5956.88 samples/sec Loss 10.6855 LearningRate 0.2340 Epoch: 6 Global Step: 64640 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:55:53,992-Speed 5969.64 samples/sec Loss 10.6452 LearningRate 0.2339 Epoch: 6 Global Step: 64650 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:00,857-Speed 5967.34 samples/sec Loss 10.5474 LearningRate 0.2339 Epoch: 6 Global Step: 64660 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:07,718-Speed 5971.05 samples/sec Loss 10.6726 LearningRate 0.2339 Epoch: 6 Global Step: 64670 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:14,593-Speed 5959.69 samples/sec Loss 10.6221 LearningRate 0.2338 Epoch: 6 Global Step: 64680 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:21,440-Speed 5983.68 samples/sec Loss 10.6739 LearningRate 0.2338 Epoch: 6 Global Step: 64690 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:28,326-Speed 5949.19 samples/sec Loss 10.6654 LearningRate 0.2338 Epoch: 6 Global Step: 64700 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:35,201-Speed 5959.31 samples/sec Loss 10.6559 LearningRate 0.2337 Epoch: 6 Global Step: 64710 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:42,056-Speed 5976.47 samples/sec Loss 10.6939 LearningRate 0.2337 Epoch: 6 Global Step: 64720 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:48,919-Speed 5968.96 samples/sec Loss 
10.6353 LearningRate 0.2337 Epoch: 6 Global Step: 64730 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:56:55,775-Speed 5975.70 samples/sec Loss 10.6115 LearningRate 0.2336 Epoch: 6 Global Step: 64740 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:57:02,641-Speed 5966.76 samples/sec Loss 10.6626 LearningRate 0.2336 Epoch: 6 Global Step: 64750 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:57:09,510-Speed 5964.74 samples/sec Loss 10.7294 LearningRate 0.2336 Epoch: 6 Global Step: 64760 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:57:16,369-Speed 5975.92 samples/sec Loss 10.6292 LearningRate 0.2335 Epoch: 6 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:57:23,224-Speed 5976.33 samples/sec Loss 10.6701 LearningRate 0.2335 Epoch: 6 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:57:30,078-Speed 5976.91 samples/sec Loss 10.6445 LearningRate 0.2335 Epoch: 6 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:57:36,951-Speed 5961.11 samples/sec Loss 10.6580 LearningRate 0.2334 Epoch: 6 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:57:43,800-Speed 5980.73 samples/sec Loss 10.7117 LearningRate 0.2334 Epoch: 6 Global Step: 64810 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:57:50,673-Speed 5960.56 samples/sec Loss 10.6103 LearningRate 0.2334 Epoch: 6 Global Step: 64820 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:57:57,619-Speed 5898.45 samples/sec Loss 10.5697 LearningRate 0.2333 Epoch: 6 Global Step: 64830 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:58:04,470-Speed 5979.19 samples/sec Loss 10.6268 LearningRate 0.2333 Epoch: 6 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:58:11,326-Speed 5975.58 samples/sec Loss 10.5985 LearningRate 0.2333 
Epoch: 6 Global Step: 64850 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:58:18,177-Speed 5980.19 samples/sec Loss 10.6333 LearningRate 0.2332 Epoch: 6 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:58:25,045-Speed 5965.45 samples/sec Loss 10.5853 LearningRate 0.2332 Epoch: 6 Global Step: 64870 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:58:31,973-Speed 5913.43 samples/sec Loss 10.6578 LearningRate 0.2332 Epoch: 6 Global Step: 64880 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:58:38,910-Speed 5905.47 samples/sec Loss 10.6018 LearningRate 0.2331 Epoch: 6 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:58:46,646-Speed 5295.69 samples/sec Loss 10.6330 LearningRate 0.2331 Epoch: 6 Global Step: 64900 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:58:53,493-Speed 5983.71 samples/sec Loss 10.5792 LearningRate 0.2331 Epoch: 6 Global Step: 64910 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:59:00,359-Speed 5966.48 samples/sec Loss 10.6072 LearningRate 0.2330 Epoch: 6 Global Step: 64920 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:59:07,205-Speed 5983.88 samples/sec Loss 10.5957 LearningRate 0.2330 Epoch: 6 Global Step: 64930 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:59:14,079-Speed 5959.91 samples/sec Loss 10.5834 LearningRate 0.2330 Epoch: 6 Global Step: 64940 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:59:20,921-Speed 5988.17 samples/sec Loss 10.5960 LearningRate 0.2329 Epoch: 6 Global Step: 64950 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:59:27,784-Speed 5969.84 samples/sec Loss 10.6321 LearningRate 0.2329 Epoch: 6 Global Step: 64960 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:59:34,760-Speed 5872.95 samples/sec Loss 10.7057 LearningRate 0.2329 Epoch: 6 Global Step: 64970 
Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:59:41,631-Speed 5962.31 samples/sec Loss 10.6494 LearningRate 0.2328 Epoch: 6 Global Step: 64980 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 08:59:48,492-Speed 5969.90 samples/sec Loss 10.5446 LearningRate 0.2328 Epoch: 6 Global Step: 64990 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 08:59:55,349-Speed 5975.39 samples/sec Loss 10.5977 LearningRate 0.2328 Epoch: 6 Global Step: 65000 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:00:21,977-[lfw][65000]XNorm: 22.888245 Training: 2022-01-08 09:00:21,977-[lfw][65000]Accuracy-Flip: 0.99617+-0.00279 Training: 2022-01-08 09:00:21,978-[lfw][65000]Accuracy-Highest: 0.99700 Training: 2022-01-08 09:00:52,799-[cfp_fp][65000]XNorm: 20.112521 Training: 2022-01-08 09:00:52,800-[cfp_fp][65000]Accuracy-Flip: 0.97129+-0.00893 Training: 2022-01-08 09:00:52,801-[cfp_fp][65000]Accuracy-Highest: 0.97686 Training: 2022-01-08 09:01:19,461-[agedb_30][65000]XNorm: 22.095434 Training: 2022-01-08 09:01:19,462-[agedb_30][65000]Accuracy-Flip: 0.96633+-0.00777 Training: 2022-01-08 09:01:19,463-[agedb_30][65000]Accuracy-Highest: 0.96633 Training: 2022-01-08 09:01:26,344-Speed 450.14 samples/sec Loss 10.6133 LearningRate 0.2327 Epoch: 6 Global Step: 65010 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:01:33,188-Speed 5986.94 samples/sec Loss 10.6660 LearningRate 0.2327 Epoch: 6 Global Step: 65020 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:01:40,062-Speed 5959.95 samples/sec Loss 10.6214 LearningRate 0.2327 Epoch: 6 Global Step: 65030 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:01:46,917-Speed 5976.37 samples/sec Loss 10.5562 LearningRate 0.2326 Epoch: 6 Global Step: 65040 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:01:53,783-Speed 5966.54 samples/sec Loss 10.5842 LearningRate 0.2326 Epoch: 6 Global Step: 65050 Fp16 Grad Scale: 131072 
Required: 28 hours Training: 2022-01-08 09:02:00,668-Speed 5950.57 samples/sec Loss 10.6301 LearningRate 0.2326 Epoch: 6 Global Step: 65060 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:02:07,538-Speed 5963.59 samples/sec Loss 10.6431 LearningRate 0.2325 Epoch: 6 Global Step: 65070 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:02:14,415-Speed 5956.80 samples/sec Loss 10.6042 LearningRate 0.2325 Epoch: 6 Global Step: 65080 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:02:21,325-Speed 5929.27 samples/sec Loss 10.6584 LearningRate 0.2325 Epoch: 6 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:02:28,268-Speed 5900.60 samples/sec Loss 10.6332 LearningRate 0.2324 Epoch: 6 Global Step: 65100 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:02:35,157-Speed 5947.08 samples/sec Loss 10.6278 LearningRate 0.2324 Epoch: 6 Global Step: 65110 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:02:42,037-Speed 5954.18 samples/sec Loss 10.6548 LearningRate 0.2324 Epoch: 6 Global Step: 65120 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:02:48,890-Speed 5977.36 samples/sec Loss 10.5392 LearningRate 0.2324 Epoch: 6 Global Step: 65130 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:02:55,752-Speed 5971.03 samples/sec Loss 10.6050 LearningRate 0.2323 Epoch: 6 Global Step: 65140 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:03:02,610-Speed 5975.90 samples/sec Loss 10.5267 LearningRate 0.2323 Epoch: 6 Global Step: 65150 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:03:09,459-Speed 5981.43 samples/sec Loss 10.6128 LearningRate 0.2323 Epoch: 6 Global Step: 65160 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:03:16,327-Speed 5965.24 samples/sec Loss 10.6263 LearningRate 0.2322 Epoch: 6 Global Step: 65170 Fp16 Grad Scale: 131072 Required: 28 hours Training: 
2022-01-08 09:03:23,204-Speed 5959.51 samples/sec Loss 10.6447 LearningRate 0.2322 Epoch: 6 Global Step: 65180 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:03:30,051-Speed 5983.62 samples/sec Loss 10.6914 LearningRate 0.2322 Epoch: 6 Global Step: 65190 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:03:36,922-Speed 5961.86 samples/sec Loss 10.6459 LearningRate 0.2321 Epoch: 6 Global Step: 65200 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:03:43,804-Speed 5955.80 samples/sec Loss 10.5656 LearningRate 0.2321 Epoch: 6 Global Step: 65210 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:03:50,666-Speed 5970.12 samples/sec Loss 10.6039 LearningRate 0.2321 Epoch: 6 Global Step: 65220 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:03:57,519-Speed 5977.93 samples/sec Loss 10.6253 LearningRate 0.2320 Epoch: 6 Global Step: 65230 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:04,374-Speed 5976.09 samples/sec Loss 10.6174 LearningRate 0.2320 Epoch: 6 Global Step: 65240 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:11,225-Speed 5979.89 samples/sec Loss 10.6464 LearningRate 0.2320 Epoch: 6 Global Step: 65250 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:18,081-Speed 5975.60 samples/sec Loss 10.6118 LearningRate 0.2319 Epoch: 6 Global Step: 65260 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:24,944-Speed 5970.23 samples/sec Loss 10.6076 LearningRate 0.2319 Epoch: 6 Global Step: 65270 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:31,796-Speed 5978.75 samples/sec Loss 10.5401 LearningRate 0.2319 Epoch: 6 Global Step: 65280 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:38,663-Speed 5966.55 samples/sec Loss 10.6075 LearningRate 0.2318 Epoch: 6 Global Step: 65290 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:45,539-Speed 
5958.01 samples/sec Loss 10.6178 LearningRate 0.2318 Epoch: 6 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:52,407-Speed 5964.44 samples/sec Loss 10.5755 LearningRate 0.2318 Epoch: 6 Global Step: 65310 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:04:59,271-Speed 5969.03 samples/sec Loss 10.5894 LearningRate 0.2317 Epoch: 6 Global Step: 65320 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:05:06,149-Speed 5956.47 samples/sec Loss 10.6551 LearningRate 0.2317 Epoch: 6 Global Step: 65330 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:05:13,030-Speed 5954.18 samples/sec Loss 10.7148 LearningRate 0.2317 Epoch: 6 Global Step: 65340 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:05:19,895-Speed 5967.64 samples/sec Loss 10.5804 LearningRate 0.2316 Epoch: 6 Global Step: 65350 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:05:26,749-Speed 5977.55 samples/sec Loss 10.5976 LearningRate 0.2316 Epoch: 6 Global Step: 65360 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:05:33,612-Speed 5969.25 samples/sec Loss 10.5840 LearningRate 0.2316 Epoch: 6 Global Step: 65370 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:05:40,464-Speed 5981.96 samples/sec Loss 10.5484 LearningRate 0.2315 Epoch: 6 Global Step: 65380 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:05:47,322-Speed 5972.87 samples/sec Loss 10.5813 LearningRate 0.2315 Epoch: 6 Global Step: 65390 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:05:54,273-Speed 5894.11 samples/sec Loss 10.5782 LearningRate 0.2315 Epoch: 6 Global Step: 65400 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:06:01,240-Speed 5880.98 samples/sec Loss 10.5766 LearningRate 0.2314 Epoch: 6 Global Step: 65410 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:06:08,187-Speed 5896.90 samples/sec Loss 
10.6271 LearningRate 0.2314 Epoch: 6 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:06:15,129-Speed 5901.98 samples/sec Loss 10.6035 LearningRate 0.2314 Epoch: 6 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:06:22,049-Speed 5920.17 samples/sec Loss 10.6003 LearningRate 0.2313 Epoch: 6 Global Step: 65440 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:06:28,968-Speed 5922.41 samples/sec Loss 10.6754 LearningRate 0.2313 Epoch: 6 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:06:35,821-Speed 5981.47 samples/sec Loss 10.5832 LearningRate 0.2313 Epoch: 6 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:06:42,671-Speed 5980.16 samples/sec Loss 10.5582 LearningRate 0.2312 Epoch: 6 Global Step: 65470 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:06:49,530-Speed 5972.64 samples/sec Loss 10.5422 LearningRate 0.2312 Epoch: 6 Global Step: 65480 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:06:56,394-Speed 5968.44 samples/sec Loss 10.6379 LearningRate 0.2312 Epoch: 6 Global Step: 65490 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:07:03,275-Speed 5953.61 samples/sec Loss 10.5908 LearningRate 0.2311 Epoch: 6 Global Step: 65500 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:07:10,151-Speed 5961.24 samples/sec Loss 10.5224 LearningRate 0.2311 Epoch: 6 Global Step: 65510 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:07:16,996-Speed 5984.43 samples/sec Loss 10.5992 LearningRate 0.2311 Epoch: 6 Global Step: 65520 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:07:23,852-Speed 5975.52 samples/sec Loss 10.6024 LearningRate 0.2310 Epoch: 6 Global Step: 65530 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:07:30,705-Speed 5978.18 samples/sec Loss 10.5765 LearningRate 0.2310 
Epoch: 6 Global Step: 65540 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:07:37,572-Speed 5966.74 samples/sec Loss 10.5534 LearningRate 0.2310 Epoch: 6 Global Step: 65550 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:07:44,442-Speed 5965.70 samples/sec Loss 10.5645 LearningRate 0.2309 Epoch: 6 Global Step: 65560 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:07:51,302-Speed 5971.63 samples/sec Loss 10.6355 LearningRate 0.2309 Epoch: 6 Global Step: 65570 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:07:58,170-Speed 5965.26 samples/sec Loss 10.4668 LearningRate 0.2309 Epoch: 6 Global Step: 65580 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:08:05,049-Speed 5955.41 samples/sec Loss 10.5975 LearningRate 0.2309 Epoch: 6 Global Step: 65590 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:08:11,917-Speed 5964.88 samples/sec Loss 10.5549 LearningRate 0.2308 Epoch: 6 Global Step: 65600 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:08:18,800-Speed 5952.46 samples/sec Loss 10.5455 LearningRate 0.2308 Epoch: 6 Global Step: 65610 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:08:25,652-Speed 5979.17 samples/sec Loss 10.5467 LearningRate 0.2308 Epoch: 6 Global Step: 65620 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:08:32,529-Speed 5956.24 samples/sec Loss 10.5363 LearningRate 0.2307 Epoch: 6 Global Step: 65630 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:08:39,394-Speed 5968.17 samples/sec Loss 10.5298 LearningRate 0.2307 Epoch: 6 Global Step: 65640 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:08:46,251-Speed 5975.06 samples/sec Loss 10.4721 LearningRate 0.2307 Epoch: 6 Global Step: 65650 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:08:53,115-Speed 5968.03 samples/sec Loss 10.6230 LearningRate 0.2306 Epoch: 6 Global Step: 65660 
Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:08:59,969-Speed 5976.81 samples/sec Loss 10.5874 LearningRate 0.2306 Epoch: 6 Global Step: 65670 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:09:06,825-Speed 5975.91 samples/sec Loss 10.5905 LearningRate 0.2306 Epoch: 6 Global Step: 65680 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:09:13,690-Speed 5967.24 samples/sec Loss 10.5595 LearningRate 0.2305 Epoch: 6 Global Step: 65690 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:09:20,542-Speed 5979.42 samples/sec Loss 10.5446 LearningRate 0.2305 Epoch: 6 Global Step: 65700 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:09:27,411-Speed 5964.12 samples/sec Loss 10.6220 LearningRate 0.2305 Epoch: 6 Global Step: 65710 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:09:34,266-Speed 5975.71 samples/sec Loss 10.5656 LearningRate 0.2304 Epoch: 6 Global Step: 65720 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:09:41,133-Speed 5965.86 samples/sec Loss 10.5175 LearningRate 0.2304 Epoch: 6 Global Step: 65730 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:09:48,017-Speed 5952.14 samples/sec Loss 10.5752 LearningRate 0.2304 Epoch: 6 Global Step: 65740 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:09:54,900-Speed 5951.24 samples/sec Loss 10.5985 LearningRate 0.2303 Epoch: 6 Global Step: 65750 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:10:01,757-Speed 5974.89 samples/sec Loss 10.6393 LearningRate 0.2303 Epoch: 6 Global Step: 65760 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:10:08,618-Speed 5971.02 samples/sec Loss 10.6276 LearningRate 0.2303 Epoch: 6 Global Step: 65770 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:10:15,472-Speed 5976.63 samples/sec Loss 10.5937 LearningRate 0.2302 Epoch: 6 Global Step: 65780 Fp16 Grad Scale: 262144 
Required: 28 hours Training: 2022-01-08 09:10:22,354-Speed 5954.93 samples/sec Loss 10.5380 LearningRate 0.2302 Epoch: 6 Global Step: 65790 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:10:29,239-Speed 5950.04 samples/sec Loss 10.5954 LearningRate 0.2302 Epoch: 6 Global Step: 65800 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:10:36,084-Speed 5984.97 samples/sec Loss 10.5497 LearningRate 0.2301 Epoch: 6 Global Step: 65810 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:10:42,939-Speed 5976.52 samples/sec Loss 10.4875 LearningRate 0.2301 Epoch: 6 Global Step: 65820 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:10:49,794-Speed 5979.55 samples/sec Loss 10.5323 LearningRate 0.2301 Epoch: 6 Global Step: 65830 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:10:56,669-Speed 5958.44 samples/sec Loss 10.6024 LearningRate 0.2300 Epoch: 6 Global Step: 65840 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:11:03,520-Speed 5979.70 samples/sec Loss 10.5345 LearningRate 0.2300 Epoch: 6 Global Step: 65850 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:11:10,384-Speed 5968.69 samples/sec Loss 10.5635 LearningRate 0.2300 Epoch: 6 Global Step: 65860 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:11:17,278-Speed 5942.76 samples/sec Loss 10.5068 LearningRate 0.2299 Epoch: 6 Global Step: 65870 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:11:24,135-Speed 5974.74 samples/sec Loss 10.5438 LearningRate 0.2299 Epoch: 6 Global Step: 65880 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:11:30,989-Speed 5976.98 samples/sec Loss 10.5270 LearningRate 0.2299 Epoch: 6 Global Step: 65890 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:11:37,861-Speed 5960.89 samples/sec Loss 10.5806 LearningRate 0.2298 Epoch: 6 Global Step: 65900 Fp16 Grad Scale: 131072 Required: 28 hours Training: 
2022-01-08 09:11:44,736-Speed 5959.44 samples/sec Loss 10.5364 LearningRate 0.2298 Epoch: 6 Global Step: 65910 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:11:51,604-Speed 5967.35 samples/sec Loss 10.5248 LearningRate 0.2298 Epoch: 6 Global Step: 65920 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:11:58,452-Speed 5982.07 samples/sec Loss 10.5560 LearningRate 0.2297 Epoch: 6 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:12:05,328-Speed 5958.83 samples/sec Loss 10.5737 LearningRate 0.2297 Epoch: 6 Global Step: 65940 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:12:12,201-Speed 5961.32 samples/sec Loss 10.6359 LearningRate 0.2297 Epoch: 6 Global Step: 65950 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:12:19,062-Speed 5971.48 samples/sec Loss 10.5676 LearningRate 0.2296 Epoch: 6 Global Step: 65960 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:12:25,942-Speed 5954.52 samples/sec Loss 10.5909 LearningRate 0.2296 Epoch: 6 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:12:32,806-Speed 5968.50 samples/sec Loss 10.4957 LearningRate 0.2296 Epoch: 6 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:12:39,677-Speed 5962.34 samples/sec Loss 10.5451 LearningRate 0.2296 Epoch: 6 Global Step: 65990 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:12:46,537-Speed 5972.44 samples/sec Loss 10.5075 LearningRate 0.2295 Epoch: 6 Global Step: 66000 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:12:53,429-Speed 5944.27 samples/sec Loss 10.5002 LearningRate 0.2295 Epoch: 6 Global Step: 66010 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:13:00,288-Speed 5972.12 samples/sec Loss 10.4738 LearningRate 0.2295 Epoch: 6 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:13:07,183-Speed 5941.53 
samples/sec Loss 10.5184 LearningRate 0.2294 Epoch: 6 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:13:14,043-Speed 5972.14 samples/sec Loss 10.5228 LearningRate 0.2294 Epoch: 6 Global Step: 66040 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:13:20,939-Speed 5941.05 samples/sec Loss 10.4903 LearningRate 0.2294 Epoch: 6 Global Step: 66050 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:13:27,820-Speed 5953.95 samples/sec Loss 10.5282 LearningRate 0.2293 Epoch: 6 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:13:34,668-Speed 5982.34 samples/sec Loss 10.4852 LearningRate 0.2293 Epoch: 6 Global Step: 66070 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:13:41,536-Speed 5965.26 samples/sec Loss 10.5411 LearningRate 0.2293 Epoch: 6 Global Step: 66080 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:13:48,413-Speed 5957.19 samples/sec Loss 10.6155 LearningRate 0.2292 Epoch: 6 Global Step: 66090 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:13:55,357-Speed 5899.58 samples/sec Loss 10.5497 LearningRate 0.2292 Epoch: 6 Global Step: 66100 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:14:02,216-Speed 5972.88 samples/sec Loss 10.4992 LearningRate 0.2292 Epoch: 6 Global Step: 66110 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:14:09,092-Speed 5958.13 samples/sec Loss 10.5346 LearningRate 0.2291 Epoch: 6 Global Step: 66120 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:14:15,960-Speed 5965.73 samples/sec Loss 10.5364 LearningRate 0.2291 Epoch: 6 Global Step: 66130 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:14:22,873-Speed 5926.19 samples/sec Loss 10.5425 LearningRate 0.2291 Epoch: 6 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:14:29,719-Speed 5983.50 samples/sec Loss 10.5482 
LearningRate 0.2290 Epoch: 6 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:14:36,607-Speed 5948.24 samples/sec Loss 10.5481 LearningRate 0.2290 Epoch: 6 Global Step: 66160 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:14:43,477-Speed 5962.81 samples/sec Loss 10.4913 LearningRate 0.2290 Epoch: 6 Global Step: 66170 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:14:50,350-Speed 5961.49 samples/sec Loss 10.4771 LearningRate 0.2289 Epoch: 6 Global Step: 66180 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:14:57,215-Speed 5966.96 samples/sec Loss 10.5271 LearningRate 0.2289 Epoch: 6 Global Step: 66190 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:15:04,069-Speed 5977.19 samples/sec Loss 10.5257 LearningRate 0.2289 Epoch: 6 Global Step: 66200 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:15:10,926-Speed 5985.90 samples/sec Loss 10.5441 LearningRate 0.2288 Epoch: 6 Global Step: 66210 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:15:17,773-Speed 5983.64 samples/sec Loss 10.4649 LearningRate 0.2288 Epoch: 6 Global Step: 66220 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:15:24,628-Speed 5975.59 samples/sec Loss 10.5454 LearningRate 0.2288 Epoch: 6 Global Step: 66230 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:15:31,499-Speed 5964.03 samples/sec Loss 10.5737 LearningRate 0.2287 Epoch: 6 Global Step: 66240 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:15:38,369-Speed 5963.45 samples/sec Loss 10.5359 LearningRate 0.2287 Epoch: 6 Global Step: 66250 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:15:45,225-Speed 5974.93 samples/sec Loss 10.4769 LearningRate 0.2287 Epoch: 6 Global Step: 66260 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:15:52,073-Speed 5982.41 samples/sec Loss 10.5121 LearningRate 0.2286 Epoch: 6 
Global Step: 66270 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:15:58,949-Speed 5958.20 samples/sec Loss 10.5588 LearningRate 0.2286 Epoch: 6 Global Step: 66280 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:16:05,803-Speed 5976.78 samples/sec Loss 10.5326 LearningRate 0.2286 Epoch: 6 Global Step: 66290 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:16:12,662-Speed 5972.39 samples/sec Loss 10.5411 LearningRate 0.2285 Epoch: 6 Global Step: 66300 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:16:19,534-Speed 5961.63 samples/sec Loss 10.5517 LearningRate 0.2285 Epoch: 6 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:16:26,415-Speed 5953.97 samples/sec Loss 10.4870 LearningRate 0.2285 Epoch: 6 Global Step: 66320 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:16:33,327-Speed 5926.60 samples/sec Loss 10.4995 LearningRate 0.2284 Epoch: 6 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:16:40,199-Speed 5961.69 samples/sec Loss 10.5185 LearningRate 0.2284 Epoch: 6 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:16:47,054-Speed 5980.43 samples/sec Loss 10.5102 LearningRate 0.2284 Epoch: 6 Global Step: 66350 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:16:53,893-Speed 5989.87 samples/sec Loss 10.4629 LearningRate 0.2284 Epoch: 6 Global Step: 66360 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:00,750-Speed 5976.21 samples/sec Loss 10.5549 LearningRate 0.2283 Epoch: 6 Global Step: 66370 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:07,627-Speed 5956.62 samples/sec Loss 10.4156 LearningRate 0.2283 Epoch: 6 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:14,490-Speed 5969.60 samples/sec Loss 10.5524 LearningRate 0.2283 Epoch: 6 Global Step: 66390 Fp16 Grad 
Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:21,353-Speed 5969.14 samples/sec Loss 10.4900 LearningRate 0.2282 Epoch: 6 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:28,215-Speed 5981.51 samples/sec Loss 10.4578 LearningRate 0.2282 Epoch: 6 Global Step: 66410 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:35,064-Speed 5981.22 samples/sec Loss 10.5398 LearningRate 0.2282 Epoch: 6 Global Step: 66420 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:41,946-Speed 5952.54 samples/sec Loss 10.5005 LearningRate 0.2281 Epoch: 6 Global Step: 66430 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:48,816-Speed 5963.23 samples/sec Loss 10.5139 LearningRate 0.2281 Epoch: 6 Global Step: 66440 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:17:55,666-Speed 5980.90 samples/sec Loss 10.5128 LearningRate 0.2281 Epoch: 6 Global Step: 66450 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:18:02,520-Speed 5976.76 samples/sec Loss 10.4889 LearningRate 0.2280 Epoch: 6 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:18:09,378-Speed 5974.29 samples/sec Loss 10.5245 LearningRate 0.2280 Epoch: 6 Global Step: 66470 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:18:16,226-Speed 5981.10 samples/sec Loss 10.4563 LearningRate 0.2280 Epoch: 6 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:18:23,080-Speed 5977.29 samples/sec Loss 10.5169 LearningRate 0.2279 Epoch: 6 Global Step: 66490 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:18:30,307-Speed 5668.89 samples/sec Loss 10.5460 LearningRate 0.2279 Epoch: 6 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:18:37,176-Speed 5964.60 samples/sec Loss 10.5861 LearningRate 0.2279 Epoch: 6 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 28 hours 
Training: 2022-01-08 09:18:44,037-Speed 5971.12 samples/sec Loss 10.4917 LearningRate 0.2278 Epoch: 6 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:18:50,918-Speed 5954.61 samples/sec Loss 10.3961 LearningRate 0.2278 Epoch: 6 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:18:57,797-Speed 5955.78 samples/sec Loss 10.4971 LearningRate 0.2278 Epoch: 6 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:19:04,651-Speed 5977.15 samples/sec Loss 10.5283 LearningRate 0.2277 Epoch: 6 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:19:11,502-Speed 5980.07 samples/sec Loss 10.6117 LearningRate 0.2277 Epoch: 6 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:19:18,366-Speed 5970.39 samples/sec Loss 10.5263 LearningRate 0.2277 Epoch: 6 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:19:25,226-Speed 5972.55 samples/sec Loss 10.4839 LearningRate 0.2276 Epoch: 6 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:19:32,077-Speed 5979.84 samples/sec Loss 10.3837 LearningRate 0.2276 Epoch: 6 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:19:38,955-Speed 5956.29 samples/sec Loss 10.4422 LearningRate 0.2276 Epoch: 6 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 09:19:45,940-Speed 5864.71 samples/sec Loss 10.5011 LearningRate 0.2275 Epoch: 6 Global Step: 66610 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:19:52,823-Speed 5954.26 samples/sec Loss 10.5155 LearningRate 0.2275 Epoch: 6 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:19:59,689-Speed 5966.67 samples/sec Loss 10.5735 LearningRate 0.2275 Epoch: 6 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:20:06,560-Speed 
5962.29 samples/sec Loss 10.5229 LearningRate 0.2274 Epoch: 6 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:20:13,436-Speed 5959.00 samples/sec Loss 10.4780 LearningRate 0.2274 Epoch: 6 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:20:20,329-Speed 5943.56 samples/sec Loss 10.4364 LearningRate 0.2274 Epoch: 6 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:20:27,229-Speed 5937.52 samples/sec Loss 10.4736 LearningRate 0.2273 Epoch: 6 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:20:34,105-Speed 5958.97 samples/sec Loss 10.5564 LearningRate 0.2273 Epoch: 6 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:20:41,031-Speed 5914.98 samples/sec Loss 10.5558 LearningRate 0.2273 Epoch: 6 Global Step: 66690 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:20:47,923-Speed 5943.87 samples/sec Loss 10.4771 LearningRate 0.2273 Epoch: 6 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:20:54,815-Speed 5944.63 samples/sec Loss 10.5581 LearningRate 0.2272 Epoch: 6 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:01,671-Speed 5975.40 samples/sec Loss 10.4548 LearningRate 0.2272 Epoch: 6 Global Step: 66720 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:08,527-Speed 5975.85 samples/sec Loss 10.5183 LearningRate 0.2272 Epoch: 6 Global Step: 66730 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:15,374-Speed 5983.25 samples/sec Loss 10.4610 LearningRate 0.2271 Epoch: 6 Global Step: 66740 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:22,235-Speed 5971.57 samples/sec Loss 10.5353 LearningRate 0.2271 Epoch: 6 Global Step: 66750 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:29,107-Speed 5964.30 samples/sec Loss 
10.5088 LearningRate 0.2271 Epoch: 6 Global Step: 66760 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:35,995-Speed 5948.51 samples/sec Loss 10.4583 LearningRate 0.2270 Epoch: 6 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:42,849-Speed 5976.72 samples/sec Loss 10.4176 LearningRate 0.2270 Epoch: 6 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:49,701-Speed 5979.39 samples/sec Loss 10.4433 LearningRate 0.2270 Epoch: 6 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:21:56,569-Speed 5965.20 samples/sec Loss 10.5771 LearningRate 0.2269 Epoch: 6 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:22:03,445-Speed 5957.37 samples/sec Loss 10.5141 LearningRate 0.2269 Epoch: 6 Global Step: 66810 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:22:10,286-Speed 5988.85 samples/sec Loss 10.4870 LearningRate 0.2269 Epoch: 6 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:22:17,151-Speed 5967.52 samples/sec Loss 10.4096 LearningRate 0.2268 Epoch: 6 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:22:24,131-Speed 5869.56 samples/sec Loss 10.4653 LearningRate 0.2268 Epoch: 6 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:22:31,013-Speed 5954.34 samples/sec Loss 10.4367 LearningRate 0.2268 Epoch: 6 Global Step: 66850 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:22:37,900-Speed 5949.16 samples/sec Loss 10.5024 LearningRate 0.2267 Epoch: 6 Global Step: 66860 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:22:44,770-Speed 5962.64 samples/sec Loss 10.4492 LearningRate 0.2267 Epoch: 6 Global Step: 66870 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:22:51,631-Speed 5972.07 samples/sec Loss 10.5038 LearningRate 0.2267 
Epoch: 6 Global Step: 66880 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:22:58,498-Speed 5965.31 samples/sec Loss 10.4683 LearningRate 0.2266 Epoch: 6 Global Step: 66890 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:23:05,351-Speed 5978.16 samples/sec Loss 10.3679 LearningRate 0.2266 Epoch: 6 Global Step: 66900 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:23:12,202-Speed 5981.34 samples/sec Loss 10.4878 LearningRate 0.2266 Epoch: 6 Global Step: 66910 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:23:19,067-Speed 5967.91 samples/sec Loss 10.4106 LearningRate 0.2265 Epoch: 6 Global Step: 66920 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:23:25,939-Speed 5961.46 samples/sec Loss 10.4848 LearningRate 0.2265 Epoch: 6 Global Step: 66930 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:23:32,791-Speed 5978.63 samples/sec Loss 10.5061 LearningRate 0.2265 Epoch: 6 Global Step: 66940 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:23:39,663-Speed 5962.03 samples/sec Loss 10.4832 LearningRate 0.2264 Epoch: 6 Global Step: 66950 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:23:46,517-Speed 5976.52 samples/sec Loss 10.5236 LearningRate 0.2264 Epoch: 6 Global Step: 66960 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:23:53,367-Speed 5980.27 samples/sec Loss 10.4724 LearningRate 0.2264 Epoch: 6 Global Step: 66970 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:24:00,235-Speed 5965.11 samples/sec Loss 10.4818 LearningRate 0.2263 Epoch: 6 Global Step: 66980 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:24:07,109-Speed 5960.13 samples/sec Loss 10.4337 LearningRate 0.2263 Epoch: 6 Global Step: 66990 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:24:13,986-Speed 5957.56 samples/sec Loss 10.4097 LearningRate 0.2263 Epoch: 6 Global Step: 67000 
Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:24:20,882-Speed 5942.76 samples/sec Loss 10.4558 LearningRate 0.2263 Epoch: 6 Global Step: 67010 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:24:27,771-Speed 5946.92 samples/sec Loss 10.4518 LearningRate 0.2262 Epoch: 6 Global Step: 67020 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:24:34,616-Speed 5985.04 samples/sec Loss 10.4987 LearningRate 0.2262 Epoch: 6 Global Step: 67030 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:24:41,464-Speed 5982.66 samples/sec Loss 10.4887 LearningRate 0.2262 Epoch: 6 Global Step: 67040 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:24:48,328-Speed 5968.52 samples/sec Loss 10.5133 LearningRate 0.2261 Epoch: 6 Global Step: 67050 Fp16 Grad Scale: 262144 Required: 28 hours Training: 2022-01-08 09:24:55,175-Speed 5983.65 samples/sec Loss 10.4385 LearningRate 0.2261 Epoch: 6 Global Step: 67060 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:25:02,167-Speed 5860.78 samples/sec Loss 10.4554 LearningRate 0.2261 Epoch: 6 Global Step: 67070 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:25:09,033-Speed 5966.51 samples/sec Loss 10.4702 LearningRate 0.2260 Epoch: 6 Global Step: 67080 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:25:15,889-Speed 5976.99 samples/sec Loss 10.4637 LearningRate 0.2260 Epoch: 6 Global Step: 67090 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:25:22,744-Speed 5976.81 samples/sec Loss 10.5532 LearningRate 0.2260 Epoch: 6 Global Step: 67100 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 09:25:29,591-Speed 5982.42 samples/sec Loss 10.4648 LearningRate 0.2259 Epoch: 6 Global Step: 67110 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:25:36,447-Speed 5975.44 samples/sec Loss 10.3794 LearningRate 0.2259 Epoch: 6 Global Step: 67120 Fp16 Grad Scale: 131072 
Required: 27 hours Training: 2022-01-08 09:25:43,335-Speed 5948.12 samples/sec Loss 10.4418 LearningRate 0.2259 Epoch: 6 Global Step: 67130 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:25:50,216-Speed 5953.47 samples/sec Loss 10.4780 LearningRate 0.2258 Epoch: 6 Global Step: 67140 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:25:57,109-Speed 5943.91 samples/sec Loss 10.3894 LearningRate 0.2258 Epoch: 6 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:26:03,979-Speed 5963.63 samples/sec Loss 10.5559 LearningRate 0.2258 Epoch: 6 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:26:10,852-Speed 5960.45 samples/sec Loss 10.5401 LearningRate 0.2257 Epoch: 6 Global Step: 67170 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:26:18,895-Speed 5093.56 samples/sec Loss 10.4038 LearningRate 0.2257 Epoch: 6 Global Step: 67180 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:26:25,737-Speed 5988.47 samples/sec Loss 10.4231 LearningRate 0.2257 Epoch: 6 Global Step: 67190 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:26:32,602-Speed 5967.16 samples/sec Loss 10.4587 LearningRate 0.2256 Epoch: 6 Global Step: 67200 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:26:39,447-Speed 5985.65 samples/sec Loss 10.4848 LearningRate 0.2256 Epoch: 6 Global Step: 67210 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:26:46,411-Speed 5882.61 samples/sec Loss 10.4856 LearningRate 0.2256 Epoch: 6 Global Step: 67220 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:26:53,256-Speed 5985.90 samples/sec Loss 10.4379 LearningRate 0.2255 Epoch: 6 Global Step: 67230 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:27:00,118-Speed 5969.93 samples/sec Loss 10.4176 LearningRate 0.2255 Epoch: 6 Global Step: 67240 Fp16 Grad Scale: 131072 Required: 27 hours Training: 
2022-01-08 09:27:06,967-Speed 5981.83 samples/sec Loss 10.3998 LearningRate 0.2255 Epoch: 6 Global Step: 67250 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:27:13,821-Speed 5976.55 samples/sec Loss 10.3824 LearningRate 0.2254 Epoch: 6 Global Step: 67260 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:27:20,701-Speed 5954.85 samples/sec Loss 10.3792 LearningRate 0.2254 Epoch: 6 Global Step: 67270 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:27:27,554-Speed 5978.74 samples/sec Loss 10.4500 LearningRate 0.2254 Epoch: 6 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:27:34,431-Speed 5956.40 samples/sec Loss 10.4407 LearningRate 0.2253 Epoch: 6 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:27:41,279-Speed 5982.76 samples/sec Loss 10.3543 LearningRate 0.2253 Epoch: 6 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:27:48,152-Speed 5963.53 samples/sec Loss 10.4864 LearningRate 0.2253 Epoch: 6 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:27:55,010-Speed 5974.72 samples/sec Loss 10.4660 LearningRate 0.2253 Epoch: 6 Global Step: 67320 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:28:01,870-Speed 5971.64 samples/sec Loss 10.4728 LearningRate 0.2252 Epoch: 6 Global Step: 67330 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:28:08,732-Speed 5970.54 samples/sec Loss 10.5210 LearningRate 0.2252 Epoch: 6 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:28:15,588-Speed 5976.21 samples/sec Loss 10.4603 LearningRate 0.2252 Epoch: 6 Global Step: 67350 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:28:22,463-Speed 5961.08 samples/sec Loss 10.4035 LearningRate 0.2251 Epoch: 6 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:28:29,324-Speed 
5970.35 samples/sec Loss 10.4360 LearningRate 0.2251 Epoch: 6 Global Step: 67370 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:28:36,187-Speed 5971.80 samples/sec Loss 10.4248 LearningRate 0.2251 Epoch: 6 Global Step: 67380 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:28:43,023-Speed 5992.15 samples/sec Loss 10.4955 LearningRate 0.2250 Epoch: 6 Global Step: 67390 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:28:49,875-Speed 5981.52 samples/sec Loss 10.3886 LearningRate 0.2250 Epoch: 6 Global Step: 67400 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:28:56,739-Speed 5969.42 samples/sec Loss 10.4562 LearningRate 0.2250 Epoch: 6 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:29:03,597-Speed 5974.00 samples/sec Loss 10.4149 LearningRate 0.2249 Epoch: 6 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:29:10,458-Speed 5970.77 samples/sec Loss 10.4479 LearningRate 0.2249 Epoch: 6 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:29:17,315-Speed 5974.88 samples/sec Loss 10.4429 LearningRate 0.2249 Epoch: 6 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:29:24,189-Speed 5960.12 samples/sec Loss 10.3805 LearningRate 0.2248 Epoch: 6 Global Step: 67450 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:29:31,033-Speed 5985.68 samples/sec Loss 10.4390 LearningRate 0.2248 Epoch: 6 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:29:37,882-Speed 5981.52 samples/sec Loss 10.4561 LearningRate 0.2248 Epoch: 6 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:29:44,733-Speed 5980.27 samples/sec Loss 10.4136 LearningRate 0.2247 Epoch: 6 Global Step: 67480 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:29:51,584-Speed 5979.56 samples/sec Loss 
10.4158 LearningRate 0.2247 Epoch: 6 Global Step: 67490 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:29:58,435-Speed 5979.53 samples/sec Loss 10.4564 LearningRate 0.2247 Epoch: 6 Global Step: 67500 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:30:05,300-Speed 5969.37 samples/sec Loss 10.4107 LearningRate 0.2246 Epoch: 6 Global Step: 67510 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:30:12,151-Speed 5978.91 samples/sec Loss 10.3823 LearningRate 0.2246 Epoch: 6 Global Step: 67520 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:30:19,002-Speed 5980.27 samples/sec Loss 10.3656 LearningRate 0.2246 Epoch: 6 Global Step: 67530 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:30:25,866-Speed 5968.63 samples/sec Loss 10.4010 LearningRate 0.2245 Epoch: 6 Global Step: 67540 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:30:32,729-Speed 5969.32 samples/sec Loss 10.4134 LearningRate 0.2245 Epoch: 6 Global Step: 67550 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:30:39,604-Speed 5961.53 samples/sec Loss 10.4265 LearningRate 0.2245 Epoch: 6 Global Step: 67560 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:30:46,460-Speed 5976.02 samples/sec Loss 10.3169 LearningRate 0.2244 Epoch: 6 Global Step: 67570 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:30:53,311-Speed 5979.20 samples/sec Loss 10.4299 LearningRate 0.2244 Epoch: 6 Global Step: 67580 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:31:00,163-Speed 5979.38 samples/sec Loss 10.3436 LearningRate 0.2244 Epoch: 6 Global Step: 67590 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:31:07,022-Speed 5973.03 samples/sec Loss 10.4440 LearningRate 0.2244 Epoch: 6 Global Step: 67600 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:31:13,899-Speed 5956.96 samples/sec Loss 10.4215 LearningRate 0.2243 
Epoch: 6 Global Step: 67610 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:31:20,773-Speed 5960.54 samples/sec Loss 10.3673 LearningRate 0.2243 Epoch: 6 Global Step: 67620 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:31:27,628-Speed 5976.37 samples/sec Loss 10.4807 LearningRate 0.2243 Epoch: 6 Global Step: 67630 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:31:34,479-Speed 5979.78 samples/sec Loss 10.5291 LearningRate 0.2242 Epoch: 6 Global Step: 67640 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:31:41,414-Speed 5907.06 samples/sec Loss 10.3749 LearningRate 0.2242 Epoch: 6 Global Step: 67650 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:31:48,387-Speed 5875.44 samples/sec Loss 10.4330 LearningRate 0.2242 Epoch: 6 Global Step: 67660 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:31:55,339-Speed 5893.35 samples/sec Loss 10.4525 LearningRate 0.2241 Epoch: 6 Global Step: 67670 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:32:02,205-Speed 5966.60 samples/sec Loss 10.4148 LearningRate 0.2241 Epoch: 6 Global Step: 67680 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:32:09,049-Speed 5985.45 samples/sec Loss 10.4297 LearningRate 0.2241 Epoch: 6 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:32:15,943-Speed 5943.14 samples/sec Loss 10.3528 LearningRate 0.2240 Epoch: 6 Global Step: 67700 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:32:22,854-Speed 5927.84 samples/sec Loss 10.4605 LearningRate 0.2240 Epoch: 6 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:32:29,726-Speed 5960.58 samples/sec Loss 10.4870 LearningRate 0.2240 Epoch: 6 Global Step: 67720 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:32:36,571-Speed 5985.89 samples/sec Loss 10.4427 LearningRate 0.2239 Epoch: 6 Global Step: 67730 Fp16 Grad 
Scale: 65536 Required: 27 hours Training: 2022-01-08 09:32:43,433-Speed 5969.98 samples/sec Loss 10.4311 LearningRate 0.2239 Epoch: 6 Global Step: 67740 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:32:50,287-Speed 5977.33 samples/sec Loss 10.3941 LearningRate 0.2239 Epoch: 6 Global Step: 67750 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:32:57,165-Speed 5956.83 samples/sec Loss 10.4476 LearningRate 0.2238 Epoch: 6 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:33:04,051-Speed 5948.90 samples/sec Loss 10.4841 LearningRate 0.2238 Epoch: 6 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:33:10,905-Speed 5977.30 samples/sec Loss 10.4639 LearningRate 0.2238 Epoch: 6 Global Step: 67780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:33:17,772-Speed 5966.26 samples/sec Loss 10.4526 LearningRate 0.2237 Epoch: 6 Global Step: 67790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:33:24,649-Speed 5957.39 samples/sec Loss 10.4965 LearningRate 0.2237 Epoch: 6 Global Step: 67800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:33:31,498-Speed 5981.19 samples/sec Loss 10.4829 LearningRate 0.2237 Epoch: 6 Global Step: 67810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:33:38,401-Speed 5937.24 samples/sec Loss 10.4068 LearningRate 0.2236 Epoch: 6 Global Step: 67820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:33:45,282-Speed 5953.89 samples/sec Loss 10.4150 LearningRate 0.2236 Epoch: 6 Global Step: 67830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:33:52,179-Speed 5939.64 samples/sec Loss 10.4238 LearningRate 0.2236 Epoch: 6 Global Step: 67840 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:33:59,048-Speed 5965.16 samples/sec Loss 10.4293 LearningRate 0.2236 Epoch: 6 Global Step: 67850 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-01-08 09:34:05,923-Speed 5958.64 samples/sec Loss 10.3823 LearningRate 0.2235 Epoch: 6 Global Step: 67860 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:34:12,769-Speed 5984.06 samples/sec Loss 10.4577 LearningRate 0.2235 Epoch: 6 Global Step: 67870 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-08 09:34:19,651-Speed 5953.15 samples/sec Loss 10.3904 LearningRate 0.2235 Epoch: 6 Global Step: 67880 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:34:26,527-Speed 5960.68 samples/sec Loss 10.3716 LearningRate 0.2234 Epoch: 6 Global Step: 67890 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:34:33,381-Speed 5977.44 samples/sec Loss 10.4249 LearningRate 0.2234 Epoch: 6 Global Step: 67900 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:34:40,238-Speed 5974.79 samples/sec Loss 10.4668 LearningRate 0.2234 Epoch: 6 Global Step: 67910 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:34:47,088-Speed 5981.10 samples/sec Loss 10.5049 LearningRate 0.2233 Epoch: 6 Global Step: 67920 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:34:53,944-Speed 5975.56 samples/sec Loss 10.4329 LearningRate 0.2233 Epoch: 6 Global Step: 67930 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:35:00,812-Speed 5965.01 samples/sec Loss 10.4130 LearningRate 0.2233 Epoch: 6 Global Step: 67940 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:35:07,692-Speed 5955.41 samples/sec Loss 10.3856 LearningRate 0.2232 Epoch: 6 Global Step: 67950 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:35:14,554-Speed 5970.71 samples/sec Loss 10.3629 LearningRate 0.2232 Epoch: 6 Global Step: 67960 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:35:21,413-Speed 5972.86 samples/sec Loss 10.3740 LearningRate 0.2232 Epoch: 6 Global Step: 67970 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 09:35:28,259-Speed 5984.62 
samples/sec Loss 10.3689 LearningRate 0.2231 Epoch: 6 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:35:35,110-Speed 5978.77 samples/sec Loss 10.3722 LearningRate 0.2231 Epoch: 6 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:35:41,962-Speed 5979.02 samples/sec Loss 10.4262 LearningRate 0.2231 Epoch: 6 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:35:48,835-Speed 5961.62 samples/sec Loss 10.3858 LearningRate 0.2230 Epoch: 6 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:35:55,689-Speed 5979.12 samples/sec Loss 10.4289 LearningRate 0.2230 Epoch: 6 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:36:02,564-Speed 5958.82 samples/sec Loss 10.3255 LearningRate 0.2230 Epoch: 6 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:36:09,434-Speed 5962.76 samples/sec Loss 10.3542 LearningRate 0.2229 Epoch: 6 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:36:16,280-Speed 5984.08 samples/sec Loss 10.3942 LearningRate 0.2229 Epoch: 6 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:36:23,124-Speed 5985.37 samples/sec Loss 10.4232 LearningRate 0.2229 Epoch: 6 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:36:29,979-Speed 5976.69 samples/sec Loss 10.3727 LearningRate 0.2228 Epoch: 6 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:36:36,877-Speed 5938.61 samples/sec Loss 10.4198 LearningRate 0.2228 Epoch: 6 Global Step: 68080 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:36:43,747-Speed 5965.04 samples/sec Loss 10.3766 LearningRate 0.2228 Epoch: 6 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:36:50,596-Speed 5980.97 samples/sec Loss 10.3959 LearningRate 
0.2228 Epoch: 6 Global Step: 68100 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:36:57,443-Speed 5983.99 samples/sec Loss 10.3914 LearningRate 0.2227 Epoch: 6 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:37:04,294-Speed 5980.06 samples/sec Loss 10.3779 LearningRate 0.2227 Epoch: 6 Global Step: 68120 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:37:11,142-Speed 5983.98 samples/sec Loss 10.4092 LearningRate 0.2227 Epoch: 6 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:37:17,988-Speed 5983.77 samples/sec Loss 10.3493 LearningRate 0.2226 Epoch: 6 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:37:24,849-Speed 5970.83 samples/sec Loss 10.3609 LearningRate 0.2226 Epoch: 6 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:37:31,714-Speed 5967.65 samples/sec Loss 10.3901 LearningRate 0.2226 Epoch: 6 Global Step: 68160 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:37:38,557-Speed 5987.63 samples/sec Loss 10.3463 LearningRate 0.2225 Epoch: 6 Global Step: 68170 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:37:45,408-Speed 5979.10 samples/sec Loss 10.2874 LearningRate 0.2225 Epoch: 6 Global Step: 68180 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:37:52,264-Speed 5975.51 samples/sec Loss 10.3035 LearningRate 0.2225 Epoch: 6 Global Step: 68190 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:37:59,119-Speed 5976.19 samples/sec Loss 10.3946 LearningRate 0.2224 Epoch: 6 Global Step: 68200 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:38:05,994-Speed 5959.51 samples/sec Loss 10.3614 LearningRate 0.2224 Epoch: 6 Global Step: 68210 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:38:12,854-Speed 5971.42 samples/sec Loss 10.4044 LearningRate 0.2224 Epoch: 6 Global Step: 
68220 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:38:19,726-Speed 5961.60 samples/sec Loss 10.4028 LearningRate 0.2223 Epoch: 6 Global Step: 68230 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:38:26,636-Speed 5927.95 samples/sec Loss 10.4187 LearningRate 0.2223 Epoch: 6 Global Step: 68240 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:38:33,485-Speed 5982.22 samples/sec Loss 10.3543 LearningRate 0.2223 Epoch: 6 Global Step: 68250 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:38:40,330-Speed 5984.96 samples/sec Loss 10.4243 LearningRate 0.2222 Epoch: 6 Global Step: 68260 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:38:47,180-Speed 5979.98 samples/sec Loss 10.2942 LearningRate 0.2222 Epoch: 6 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:38:54,034-Speed 5977.74 samples/sec Loss 10.4162 LearningRate 0.2222 Epoch: 6 Global Step: 68280 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:39:00,882-Speed 5982.63 samples/sec Loss 10.4635 LearningRate 0.2221 Epoch: 6 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:39:07,743-Speed 5970.55 samples/sec Loss 10.3575 LearningRate 0.2221 Epoch: 6 Global Step: 68300 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:39:14,584-Speed 5988.20 samples/sec Loss 10.3744 LearningRate 0.2221 Epoch: 6 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:39:21,431-Speed 5983.44 samples/sec Loss 10.3238 LearningRate 0.2220 Epoch: 6 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:39:28,287-Speed 5975.22 samples/sec Loss 10.3134 LearningRate 0.2220 Epoch: 6 Global Step: 68330 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:39:35,140-Speed 5978.17 samples/sec Loss 10.4295 LearningRate 0.2220 Epoch: 6 Global Step: 68340 Fp16 Grad Scale: 131072 
Required: 27 hours Training: 2022-01-08 09:39:41,993-Speed 5978.55 samples/sec Loss 10.3722 LearningRate 0.2220 Epoch: 6 Global Step: 68350 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:39:48,838-Speed 5984.38 samples/sec Loss 10.3758 LearningRate 0.2219 Epoch: 6 Global Step: 68360 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:39:55,692-Speed 5977.73 samples/sec Loss 10.3916 LearningRate 0.2219 Epoch: 6 Global Step: 68370 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:02,544-Speed 5978.74 samples/sec Loss 10.2791 LearningRate 0.2219 Epoch: 6 Global Step: 68380 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:09,418-Speed 5959.96 samples/sec Loss 10.3109 LearningRate 0.2218 Epoch: 6 Global Step: 68390 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:16,275-Speed 5973.66 samples/sec Loss 10.3629 LearningRate 0.2218 Epoch: 6 Global Step: 68400 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:23,152-Speed 5957.52 samples/sec Loss 10.3316 LearningRate 0.2218 Epoch: 6 Global Step: 68410 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:30,009-Speed 5975.07 samples/sec Loss 10.3222 LearningRate 0.2217 Epoch: 6 Global Step: 68420 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:36,879-Speed 5962.78 samples/sec Loss 10.3639 LearningRate 0.2217 Epoch: 6 Global Step: 68430 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:43,734-Speed 5976.54 samples/sec Loss 10.3418 LearningRate 0.2217 Epoch: 6 Global Step: 68440 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:50,577-Speed 5989.15 samples/sec Loss 10.3246 LearningRate 0.2216 Epoch: 6 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:40:57,425-Speed 5982.52 samples/sec Loss 10.4064 LearningRate 0.2216 Epoch: 6 Global Step: 68460 Fp16 Grad Scale: 131072 Required: 27 hours Training: 
2022-01-08 09:41:04,288-Speed 5969.39 samples/sec Loss 10.4117 LearningRate 0.2216 Epoch: 6 Global Step: 68470 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:41:11,154-Speed 5966.65 samples/sec Loss 10.3741 LearningRate 0.2215 Epoch: 6 Global Step: 68480 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:41:18,007-Speed 5977.85 samples/sec Loss 10.3109 LearningRate 0.2215 Epoch: 6 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:41:24,856-Speed 5981.50 samples/sec Loss 10.2858 LearningRate 0.2215 Epoch: 6 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:41:31,727-Speed 5961.98 samples/sec Loss 10.4232 LearningRate 0.2214 Epoch: 6 Global Step: 68510 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:41:38,564-Speed 5991.85 samples/sec Loss 10.3573 LearningRate 0.2214 Epoch: 6 Global Step: 68520 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:41:45,423-Speed 5973.27 samples/sec Loss 10.3935 LearningRate 0.2214 Epoch: 6 Global Step: 68530 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:41:52,293-Speed 5963.32 samples/sec Loss 10.3351 LearningRate 0.2213 Epoch: 6 Global Step: 68540 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:41:59,165-Speed 5962.04 samples/sec Loss 10.3363 LearningRate 0.2213 Epoch: 6 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:42:06,016-Speed 5979.30 samples/sec Loss 10.3474 LearningRate 0.2213 Epoch: 6 Global Step: 68560 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:42:12,867-Speed 5980.28 samples/sec Loss 10.3508 LearningRate 0.2212 Epoch: 6 Global Step: 68570 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:42:19,725-Speed 5973.80 samples/sec Loss 10.3776 LearningRate 0.2212 Epoch: 6 Global Step: 68580 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:42:26,597-Speed 
5961.30 samples/sec Loss 10.3030 LearningRate 0.2212 Epoch: 6 Global Step: 68590 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:42:33,465-Speed 5964.89 samples/sec Loss 10.4233 LearningRate 0.2212 Epoch: 6 Global Step: 68600 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:42:40,305-Speed 5989.35 samples/sec Loss 10.4253 LearningRate 0.2211 Epoch: 6 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:42:47,169-Speed 5968.68 samples/sec Loss 10.4208 LearningRate 0.2211 Epoch: 6 Global Step: 68620 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:42:54,048-Speed 5954.98 samples/sec Loss 10.3453 LearningRate 0.2211 Epoch: 6 Global Step: 68630 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:43:00,898-Speed 5980.45 samples/sec Loss 10.4126 LearningRate 0.2210 Epoch: 6 Global Step: 68640 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:43:07,746-Speed 5982.46 samples/sec Loss 10.3762 LearningRate 0.2210 Epoch: 6 Global Step: 68650 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:43:14,591-Speed 5985.17 samples/sec Loss 10.3364 LearningRate 0.2210 Epoch: 6 Global Step: 68660 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:43:21,442-Speed 5980.17 samples/sec Loss 10.3434 LearningRate 0.2209 Epoch: 6 Global Step: 68670 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:43:28,291-Speed 5981.18 samples/sec Loss 10.4259 LearningRate 0.2209 Epoch: 6 Global Step: 68680 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:43:35,136-Speed 5986.01 samples/sec Loss 10.4341 LearningRate 0.2209 Epoch: 6 Global Step: 68690 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:43:42,025-Speed 5947.93 samples/sec Loss 10.3109 LearningRate 0.2208 Epoch: 6 Global Step: 68700 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:43:48,877-Speed 5978.88 samples/sec Loss 
10.2741 LearningRate 0.2208 Epoch: 6 Global Step: 68710 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:43:55,764-Speed 5948.38 samples/sec Loss 10.3607 LearningRate 0.2208 Epoch: 6 Global Step: 68720 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:44:02,644-Speed 5954.85 samples/sec Loss 10.3834 LearningRate 0.2207 Epoch: 6 Global Step: 68730 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:44:09,533-Speed 5946.97 samples/sec Loss 10.3033 LearningRate 0.2207 Epoch: 6 Global Step: 68740 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:44:16,485-Speed 5893.35 samples/sec Loss 10.3649 LearningRate 0.2207 Epoch: 6 Global Step: 68750 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:44:23,340-Speed 5976.50 samples/sec Loss 10.2482 LearningRate 0.2206 Epoch: 6 Global Step: 68760 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:44:30,317-Speed 5871.97 samples/sec Loss 10.2899 LearningRate 0.2206 Epoch: 6 Global Step: 68770 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:44:37,272-Speed 5890.93 samples/sec Loss 10.3466 LearningRate 0.2206 Epoch: 6 Global Step: 68780 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:44:44,126-Speed 5977.68 samples/sec Loss 10.3421 LearningRate 0.2205 Epoch: 6 Global Step: 68790 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:44:50,975-Speed 5980.95 samples/sec Loss 10.3675 LearningRate 0.2205 Epoch: 6 Global Step: 68800 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:44:57,846-Speed 5963.42 samples/sec Loss 10.2500 LearningRate 0.2205 Epoch: 6 Global Step: 68810 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:45:04,736-Speed 5945.91 samples/sec Loss 10.3965 LearningRate 0.2205 Epoch: 6 Global Step: 68820 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:45:11,583-Speed 5983.24 samples/sec Loss 10.3902 LearningRate 0.2204 
Epoch: 6 Global Step: 68830 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:45:18,453-Speed 5963.24 samples/sec Loss 10.3374 LearningRate 0.2204 Epoch: 6 Global Step: 68840 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:45:25,290-Speed 5992.94 samples/sec Loss 10.3754 LearningRate 0.2204 Epoch: 6 Global Step: 68850 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:45:32,137-Speed 5982.64 samples/sec Loss 10.4253 LearningRate 0.2203 Epoch: 6 Global Step: 68860 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:45:39,015-Speed 5956.63 samples/sec Loss 10.3819 LearningRate 0.2203 Epoch: 6 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:45:45,885-Speed 5963.56 samples/sec Loss 10.2676 LearningRate 0.2203 Epoch: 6 Global Step: 68880 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:45:52,746-Speed 5970.99 samples/sec Loss 10.4188 LearningRate 0.2202 Epoch: 6 Global Step: 68890 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:45:59,611-Speed 5967.34 samples/sec Loss 10.2960 LearningRate 0.2202 Epoch: 6 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:46:06,468-Speed 5974.72 samples/sec Loss 10.3143 LearningRate 0.2202 Epoch: 6 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:46:13,325-Speed 5977.27 samples/sec Loss 10.3251 LearningRate 0.2201 Epoch: 6 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:46:20,179-Speed 5977.69 samples/sec Loss 10.3669 LearningRate 0.2201 Epoch: 6 Global Step: 68930 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:46:27,064-Speed 5950.81 samples/sec Loss 10.2600 LearningRate 0.2201 Epoch: 6 Global Step: 68940 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:46:33,942-Speed 5956.49 samples/sec Loss 10.3337 LearningRate 0.2200 Epoch: 6 Global Step: 68950 
Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:46:40,805-Speed 5971.55 samples/sec Loss 10.3391 LearningRate 0.2200 Epoch: 6 Global Step: 68960 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:46:47,651-Speed 5983.70 samples/sec Loss 10.3009 LearningRate 0.2200 Epoch: 6 Global Step: 68970 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:46:54,521-Speed 5963.35 samples/sec Loss 10.2749 LearningRate 0.2199 Epoch: 6 Global Step: 68980 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:47:01,387-Speed 5967.27 samples/sec Loss 10.3406 LearningRate 0.2199 Epoch: 6 Global Step: 68990 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:47:08,294-Speed 5931.59 samples/sec Loss 10.2748 LearningRate 0.2199 Epoch: 6 Global Step: 69000 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:47:15,191-Speed 5940.10 samples/sec Loss 10.3573 LearningRate 0.2198 Epoch: 6 Global Step: 69010 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:47:22,064-Speed 5959.84 samples/sec Loss 10.2763 LearningRate 0.2198 Epoch: 6 Global Step: 69020 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:47:28,983-Speed 5921.93 samples/sec Loss 10.2898 LearningRate 0.2198 Epoch: 6 Global Step: 69030 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:47:35,900-Speed 5922.58 samples/sec Loss 10.3749 LearningRate 0.2198 Epoch: 6 Global Step: 69040 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:47:42,745-Speed 5985.01 samples/sec Loss 10.3408 LearningRate 0.2197 Epoch: 6 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:47:49,602-Speed 5975.46 samples/sec Loss 10.3636 LearningRate 0.2197 Epoch: 6 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:47:56,453-Speed 5979.75 samples/sec Loss 10.2698 LearningRate 0.2197 Epoch: 6 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 
27 hours Training: 2022-01-08 09:48:03,340-Speed 5948.21 samples/sec Loss 10.2888 LearningRate 0.2196 Epoch: 6 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:48:10,202-Speed 5970.90 samples/sec Loss 10.3244 LearningRate 0.2196 Epoch: 6 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:48:17,059-Speed 5975.14 samples/sec Loss 10.2540 LearningRate 0.2196 Epoch: 6 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:48:23,921-Speed 5970.32 samples/sec Loss 10.3194 LearningRate 0.2195 Epoch: 6 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:48:30,784-Speed 5970.87 samples/sec Loss 10.3641 LearningRate 0.2195 Epoch: 6 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:48:37,622-Speed 5990.91 samples/sec Loss 10.3812 LearningRate 0.2195 Epoch: 6 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:48:44,508-Speed 5950.08 samples/sec Loss 10.2493 LearningRate 0.2194 Epoch: 6 Global Step: 69140 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:48:51,372-Speed 5969.05 samples/sec Loss 10.3126 LearningRate 0.2194 Epoch: 6 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:48:58,233-Speed 5970.92 samples/sec Loss 10.2960 LearningRate 0.2194 Epoch: 6 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:49:05,092-Speed 5973.16 samples/sec Loss 10.3229 LearningRate 0.2193 Epoch: 6 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:49:11,956-Speed 5969.70 samples/sec Loss 10.3736 LearningRate 0.2193 Epoch: 6 Global Step: 69180 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:49:18,833-Speed 5957.57 samples/sec Loss 10.3300 LearningRate 0.2193 Epoch: 6 Global Step: 69190 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 
09:49:25,706-Speed 5960.21 samples/sec Loss 10.2306 LearningRate 0.2192 Epoch: 6 Global Step: 69200 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:49:32,573-Speed 5970.00 samples/sec Loss 10.3529 LearningRate 0.2192 Epoch: 6 Global Step: 69210 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:49:39,428-Speed 5975.83 samples/sec Loss 10.2320 LearningRate 0.2192 Epoch: 6 Global Step: 69220 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:49:46,290-Speed 5970.33 samples/sec Loss 10.2703 LearningRate 0.2192 Epoch: 6 Global Step: 69230 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:49:53,152-Speed 5970.29 samples/sec Loss 10.3145 LearningRate 0.2191 Epoch: 6 Global Step: 69240 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:50:00,014-Speed 5970.36 samples/sec Loss 10.4137 LearningRate 0.2191 Epoch: 6 Global Step: 69250 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:50:06,884-Speed 5963.65 samples/sec Loss 10.3467 LearningRate 0.2191 Epoch: 6 Global Step: 69260 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:50:13,723-Speed 5990.06 samples/sec Loss 10.2616 LearningRate 0.2190 Epoch: 6 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:50:20,572-Speed 5982.41 samples/sec Loss 10.3381 LearningRate 0.2190 Epoch: 6 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:50:27,421-Speed 5981.07 samples/sec Loss 10.3071 LearningRate 0.2190 Epoch: 6 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:50:34,276-Speed 5976.82 samples/sec Loss 10.2755 LearningRate 0.2189 Epoch: 6 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:50:41,115-Speed 5989.24 samples/sec Loss 10.3760 LearningRate 0.2189 Epoch: 6 Global Step: 69310 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:50:48,000-Speed 5950.36 
samples/sec Loss 10.2415 LearningRate 0.2189 Epoch: 6 Global Step: 69320 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:50:54,858-Speed 5974.48 samples/sec Loss 10.3022 LearningRate 0.2188 Epoch: 6 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:51:01,710-Speed 5980.63 samples/sec Loss 10.3160 LearningRate 0.2188 Epoch: 6 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:51:08,566-Speed 5975.91 samples/sec Loss 10.2791 LearningRate 0.2188 Epoch: 6 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:51:15,414-Speed 5982.55 samples/sec Loss 10.2709 LearningRate 0.2187 Epoch: 6 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:51:22,285-Speed 5962.14 samples/sec Loss 10.2686 LearningRate 0.2187 Epoch: 6 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:51:29,140-Speed 5976.42 samples/sec Loss 10.2501 LearningRate 0.2187 Epoch: 6 Global Step: 69380 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:51:36,007-Speed 5966.14 samples/sec Loss 10.2612 LearningRate 0.2186 Epoch: 6 Global Step: 69390 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:51:42,887-Speed 5954.71 samples/sec Loss 10.2860 LearningRate 0.2186 Epoch: 6 Global Step: 69400 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:51:49,743-Speed 5978.20 samples/sec Loss 10.2289 LearningRate 0.2186 Epoch: 6 Global Step: 69410 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:51:56,593-Speed 5980.19 samples/sec Loss 10.3016 LearningRate 0.2185 Epoch: 6 Global Step: 69420 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:03,526-Speed 5909.95 samples/sec Loss 10.3014 LearningRate 0.2185 Epoch: 6 Global Step: 69430 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:10,376-Speed 5980.04 samples/sec Loss 10.3361 LearningRate 
0.2185 Epoch: 6 Global Step: 69440 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:17,241-Speed 5967.74 samples/sec Loss 10.2362 LearningRate 0.2185 Epoch: 6 Global Step: 69450 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:24,111-Speed 5963.23 samples/sec Loss 10.2193 LearningRate 0.2184 Epoch: 6 Global Step: 69460 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:30,997-Speed 5949.86 samples/sec Loss 10.1795 LearningRate 0.2184 Epoch: 6 Global Step: 69470 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:37,844-Speed 5983.69 samples/sec Loss 10.2635 LearningRate 0.2184 Epoch: 6 Global Step: 69480 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:44,693-Speed 5981.66 samples/sec Loss 10.2509 LearningRate 0.2183 Epoch: 6 Global Step: 69490 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:51,550-Speed 5975.17 samples/sec Loss 10.2351 LearningRate 0.2183 Epoch: 6 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:52:58,399-Speed 5981.49 samples/sec Loss 10.3131 LearningRate 0.2183 Epoch: 6 Global Step: 69510 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:53:05,254-Speed 5976.30 samples/sec Loss 10.1928 LearningRate 0.2182 Epoch: 6 Global Step: 69520 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:53:12,121-Speed 5966.36 samples/sec Loss 10.2435 LearningRate 0.2182 Epoch: 6 Global Step: 69530 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:53:18,969-Speed 5982.29 samples/sec Loss 10.3273 LearningRate 0.2182 Epoch: 6 Global Step: 69540 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:53:25,827-Speed 5973.68 samples/sec Loss 10.2046 LearningRate 0.2181 Epoch: 6 Global Step: 69550 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:53:32,677-Speed 5982.64 samples/sec Loss 10.2787 LearningRate 0.2181 Epoch: 6 Global Step: 
69560 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:53:39,525-Speed 5982.93 samples/sec Loss 10.3189 LearningRate 0.2181 Epoch: 6 Global Step: 69570 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:53:46,369-Speed 5985.42 samples/sec Loss 10.2769 LearningRate 0.2180 Epoch: 6 Global Step: 69580 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:53:53,246-Speed 5961.15 samples/sec Loss 10.2442 LearningRate 0.2180 Epoch: 6 Global Step: 69590 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:54:00,106-Speed 5971.69 samples/sec Loss 10.2373 LearningRate 0.2180 Epoch: 6 Global Step: 69600 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:54:06,980-Speed 5960.05 samples/sec Loss 10.3351 LearningRate 0.2179 Epoch: 6 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:54:13,845-Speed 5967.33 samples/sec Loss 10.2824 LearningRate 0.2179 Epoch: 6 Global Step: 69620 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:54:20,711-Speed 5967.32 samples/sec Loss 10.3935 LearningRate 0.2179 Epoch: 6 Global Step: 69630 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:54:27,573-Speed 5969.49 samples/sec Loss 10.3155 LearningRate 0.2179 Epoch: 6 Global Step: 69640 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:54:34,449-Speed 5958.08 samples/sec Loss 10.3524 LearningRate 0.2178 Epoch: 6 Global Step: 69650 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:54:41,296-Speed 5984.16 samples/sec Loss 10.2547 LearningRate 0.2178 Epoch: 6 Global Step: 69660 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:54:48,155-Speed 5972.55 samples/sec Loss 10.3088 LearningRate 0.2178 Epoch: 6 Global Step: 69670 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:54:55,009-Speed 5976.78 samples/sec Loss 10.3263 LearningRate 0.2177 Epoch: 6 Global Step: 69680 Fp16 Grad Scale: 131072 
Required: 27 hours Training: 2022-01-08 09:55:01,889-Speed 5955.79 samples/sec Loss 10.3056 LearningRate 0.2177 Epoch: 6 Global Step: 69690 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:55:08,754-Speed 5967.43 samples/sec Loss 10.2029 LearningRate 0.2177 Epoch: 6 Global Step: 69700 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:55:15,597-Speed 5986.16 samples/sec Loss 10.2519 LearningRate 0.2176 Epoch: 6 Global Step: 69710 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:55:22,442-Speed 5985.34 samples/sec Loss 10.2570 LearningRate 0.2176 Epoch: 6 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:55:29,298-Speed 5975.10 samples/sec Loss 10.3262 LearningRate 0.2176 Epoch: 6 Global Step: 69730 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:55:36,153-Speed 5977.04 samples/sec Loss 10.3264 LearningRate 0.2175 Epoch: 6 Global Step: 69740 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:55:43,033-Speed 5954.62 samples/sec Loss 10.2562 LearningRate 0.2175 Epoch: 6 Global Step: 69750 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:55:49,888-Speed 5976.07 samples/sec Loss 10.2452 LearningRate 0.2175 Epoch: 6 Global Step: 69760 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 09:55:56,738-Speed 5983.92 samples/sec Loss 10.3291 LearningRate 0.2174 Epoch: 6 Global Step: 69770 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:56:03,597-Speed 5972.93 samples/sec Loss 10.2102 LearningRate 0.2174 Epoch: 6 Global Step: 69780 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:56:10,458-Speed 5971.03 samples/sec Loss 10.2356 LearningRate 0.2174 Epoch: 6 Global Step: 69790 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:56:17,328-Speed 5963.61 samples/sec Loss 10.2619 LearningRate 0.2173 Epoch: 6 Global Step: 69800 Fp16 Grad Scale: 131072 Required: 27 hours Training: 
2022-01-08 09:56:24,190-Speed 5969.84 samples/sec Loss 10.3415 LearningRate 0.2173 Epoch: 6 Global Step: 69810 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:56:31,038-Speed 5981.99 samples/sec Loss 10.2730 LearningRate 0.2173 Epoch: 6 Global Step: 69820 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:56:37,896-Speed 5973.24 samples/sec Loss 10.2830 LearningRate 0.2173 Epoch: 6 Global Step: 69830 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:56:44,752-Speed 5975.95 samples/sec Loss 10.1927 LearningRate 0.2172 Epoch: 6 Global Step: 69840 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:56:51,624-Speed 5961.52 samples/sec Loss 10.2925 LearningRate 0.2172 Epoch: 6 Global Step: 69850 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:56:58,483-Speed 5972.59 samples/sec Loss 10.2550 LearningRate 0.2172 Epoch: 6 Global Step: 69860 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:57:05,328-Speed 5985.17 samples/sec Loss 10.3151 LearningRate 0.2171 Epoch: 6 Global Step: 69870 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:57:12,202-Speed 5960.18 samples/sec Loss 10.3343 LearningRate 0.2171 Epoch: 6 Global Step: 69880 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:57:19,066-Speed 5968.20 samples/sec Loss 10.1464 LearningRate 0.2171 Epoch: 6 Global Step: 69890 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:57:25,934-Speed 5965.59 samples/sec Loss 10.2330 LearningRate 0.2170 Epoch: 6 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:57:32,780-Speed 5984.40 samples/sec Loss 10.2246 LearningRate 0.2170 Epoch: 6 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:57:39,636-Speed 5974.80 samples/sec Loss 10.1519 LearningRate 0.2170 Epoch: 6 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:57:46,485-Speed 
5981.78 samples/sec Loss 10.2268 LearningRate 0.2169 Epoch: 6 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:57:53,331-Speed 5984.86 samples/sec Loss 10.2443 LearningRate 0.2169 Epoch: 6 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:58:00,208-Speed 5957.21 samples/sec Loss 10.1509 LearningRate 0.2169 Epoch: 6 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:58:07,058-Speed 5981.31 samples/sec Loss 10.2486 LearningRate 0.2168 Epoch: 6 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:58:13,908-Speed 5980.25 samples/sec Loss 10.2663 LearningRate 0.2168 Epoch: 6 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:58:20,768-Speed 5971.99 samples/sec Loss 10.2582 LearningRate 0.2168 Epoch: 6 Global Step: 69980 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:58:27,642-Speed 5962.53 samples/sec Loss 10.3201 LearningRate 0.2167 Epoch: 6 Global Step: 69990 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 09:58:34,500-Speed 5973.42 samples/sec Loss 10.2610 LearningRate 0.2167 Epoch: 6 Global Step: 70000 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 09:59:01,283-[lfw][70000]XNorm: 23.052480 Training: 2022-01-08 09:59:01,284-[lfw][70000]Accuracy-Flip: 0.99750+-0.00291 Training: 2022-01-08 09:59:01,285-[lfw][70000]Accuracy-Highest: 0.99750 Training: 2022-01-08 09:59:32,303-[cfp_fp][70000]XNorm: 20.241400 Training: 2022-01-08 09:59:32,304-[cfp_fp][70000]Accuracy-Flip: 0.97443+-0.00781 Training: 2022-01-08 09:59:32,305-[cfp_fp][70000]Accuracy-Highest: 0.97686 Training: 2022-01-08 09:59:59,096-[agedb_30][70000]XNorm: 22.574474 Training: 2022-01-08 09:59:59,097-[agedb_30][70000]Accuracy-Flip: 0.96050+-0.00940 Training: 2022-01-08 09:59:59,098-[agedb_30][70000]Accuracy-Highest: 0.96633 Training: 2022-01-08 10:00:05,971-Speed 447.80 samples/sec Loss 10.2294 
LearningRate 0.2167 Epoch: 6 Global Step: 70010 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:00:12,800-Speed 6001.42 samples/sec Loss 10.3211 LearningRate 0.2167 Epoch: 6 Global Step: 70020 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:00:19,637-Speed 5992.20 samples/sec Loss 10.2896 LearningRate 0.2166 Epoch: 6 Global Step: 70030 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:00:26,478-Speed 5988.29 samples/sec Loss 10.2625 LearningRate 0.2166 Epoch: 6 Global Step: 70040 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:00:33,332-Speed 5976.84 samples/sec Loss 10.2206 LearningRate 0.2166 Epoch: 6 Global Step: 70050 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:00:40,194-Speed 5970.48 samples/sec Loss 10.2120 LearningRate 0.2165 Epoch: 6 Global Step: 70060 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:00:47,050-Speed 5977.53 samples/sec Loss 10.2554 LearningRate 0.2165 Epoch: 6 Global Step: 70070 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:00:53,899-Speed 5980.50 samples/sec Loss 10.2569 LearningRate 0.2165 Epoch: 6 Global Step: 70080 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:01:00,762-Speed 5969.91 samples/sec Loss 10.2690 LearningRate 0.2164 Epoch: 6 Global Step: 70090 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:01:07,648-Speed 5949.93 samples/sec Loss 10.2747 LearningRate 0.2164 Epoch: 6 Global Step: 70100 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:01:14,500-Speed 5978.93 samples/sec Loss 10.2850 LearningRate 0.2164 Epoch: 6 Global Step: 70110 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:01:21,364-Speed 5968.49 samples/sec Loss 10.2120 LearningRate 0.2163 Epoch: 6 Global Step: 70120 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:01:28,213-Speed 5981.83 samples/sec Loss 10.2170 LearningRate 0.2163 Epoch: 6 
Global Step: 70130 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:01:35,055-Speed 5987.52 samples/sec Loss 10.2717 LearningRate 0.2163 Epoch: 6 Global Step: 70140 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:01:41,919-Speed 5968.54 samples/sec Loss 10.2397 LearningRate 0.2162 Epoch: 6 Global Step: 70150 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:01:48,830-Speed 5928.20 samples/sec Loss 10.1622 LearningRate 0.2162 Epoch: 6 Global Step: 70160 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:01:55,677-Speed 5983.02 samples/sec Loss 10.2206 LearningRate 0.2162 Epoch: 6 Global Step: 70170 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:02,531-Speed 5977.54 samples/sec Loss 10.3114 LearningRate 0.2161 Epoch: 6 Global Step: 70180 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:09,386-Speed 5978.04 samples/sec Loss 10.2031 LearningRate 0.2161 Epoch: 6 Global Step: 70190 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:16,227-Speed 5988.80 samples/sec Loss 10.1912 LearningRate 0.2161 Epoch: 6 Global Step: 70200 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:23,082-Speed 5975.63 samples/sec Loss 10.2933 LearningRate 0.2161 Epoch: 6 Global Step: 70210 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:29,934-Speed 5980.38 samples/sec Loss 10.1821 LearningRate 0.2160 Epoch: 6 Global Step: 70220 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:36,787-Speed 5977.42 samples/sec Loss 10.2340 LearningRate 0.2160 Epoch: 6 Global Step: 70230 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:43,637-Speed 5981.01 samples/sec Loss 10.2120 LearningRate 0.2160 Epoch: 6 Global Step: 70240 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:50,545-Speed 5930.23 samples/sec Loss 10.1845 LearningRate 0.2159 Epoch: 6 Global Step: 70250 Fp16 Grad 
Scale: 262144 Required: 27 hours Training: 2022-01-08 10:02:57,414-Speed 5964.45 samples/sec Loss 10.2397 LearningRate 0.2159 Epoch: 6 Global Step: 70260 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:03:04,274-Speed 5972.60 samples/sec Loss 10.2252 LearningRate 0.2159 Epoch: 6 Global Step: 70270 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:03:11,111-Speed 5991.74 samples/sec Loss 10.2175 LearningRate 0.2158 Epoch: 6 Global Step: 70280 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:03:17,970-Speed 5973.76 samples/sec Loss 10.2831 LearningRate 0.2158 Epoch: 6 Global Step: 70290 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:03:24,833-Speed 5969.33 samples/sec Loss 10.2561 LearningRate 0.2158 Epoch: 6 Global Step: 70300 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:03:31,721-Speed 5947.96 samples/sec Loss 10.2248 LearningRate 0.2157 Epoch: 6 Global Step: 70310 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:03:38,639-Speed 5921.51 samples/sec Loss 10.1749 LearningRate 0.2157 Epoch: 6 Global Step: 70320 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:03:45,567-Speed 5913.65 samples/sec Loss 10.2258 LearningRate 0.2157 Epoch: 6 Global Step: 70330 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:03:52,496-Speed 5912.66 samples/sec Loss 10.1840 LearningRate 0.2156 Epoch: 6 Global Step: 70340 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:03:59,430-Speed 5908.56 samples/sec Loss 10.2935 LearningRate 0.2156 Epoch: 6 Global Step: 70350 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:04:06,363-Speed 5909.23 samples/sec Loss 10.2934 LearningRate 0.2156 Epoch: 6 Global Step: 70360 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:04:13,234-Speed 5963.04 samples/sec Loss 10.2056 LearningRate 0.2155 Epoch: 6 Global Step: 70370 Fp16 Grad Scale: 131072 Required: 27 
hours Training: 2022-01-08 10:04:20,083-Speed 5981.83 samples/sec Loss 10.1766 LearningRate 0.2155 Epoch: 6 Global Step: 70380 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:04:26,934-Speed 5979.89 samples/sec Loss 10.2259 LearningRate 0.2155 Epoch: 6 Global Step: 70390 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:04:33,802-Speed 5965.72 samples/sec Loss 10.3350 LearningRate 0.2155 Epoch: 6 Global Step: 70400 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:04:40,654-Speed 5978.08 samples/sec Loss 10.1904 LearningRate 0.2154 Epoch: 6 Global Step: 70410 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:04:47,515-Speed 5970.93 samples/sec Loss 10.2228 LearningRate 0.2154 Epoch: 6 Global Step: 70420 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:04:54,369-Speed 5977.46 samples/sec Loss 10.1912 LearningRate 0.2154 Epoch: 6 Global Step: 70430 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:05:01,230-Speed 5970.97 samples/sec Loss 10.2236 LearningRate 0.2153 Epoch: 6 Global Step: 70440 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:05:08,090-Speed 5972.09 samples/sec Loss 10.2614 LearningRate 0.2153 Epoch: 6 Global Step: 70450 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:05:14,968-Speed 5956.56 samples/sec Loss 10.2640 LearningRate 0.2153 Epoch: 6 Global Step: 70460 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:05:21,831-Speed 5968.84 samples/sec Loss 10.2373 LearningRate 0.2152 Epoch: 6 Global Step: 70470 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:05:28,740-Speed 5929.87 samples/sec Loss 10.1946 LearningRate 0.2152 Epoch: 6 Global Step: 70480 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:05:35,603-Speed 5969.07 samples/sec Loss 10.2465 LearningRate 0.2152 Epoch: 6 Global Step: 70490 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 
10:05:42,451-Speed 5982.54 samples/sec Loss 10.2430 LearningRate 0.2151 Epoch: 6 Global Step: 70500 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:05:49,330-Speed 5956.48 samples/sec Loss 10.1875 LearningRate 0.2151 Epoch: 6 Global Step: 70510 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:05:56,185-Speed 5976.19 samples/sec Loss 10.2942 LearningRate 0.2151 Epoch: 6 Global Step: 70520 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:06:03,070-Speed 5950.34 samples/sec Loss 10.1184 LearningRate 0.2150 Epoch: 6 Global Step: 70530 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:06:09,912-Speed 5987.89 samples/sec Loss 10.2649 LearningRate 0.2150 Epoch: 6 Global Step: 70540 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:06:16,772-Speed 5975.06 samples/sec Loss 10.3100 LearningRate 0.2150 Epoch: 6 Global Step: 70550 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:06:23,625-Speed 5977.56 samples/sec Loss 10.2209 LearningRate 0.2150 Epoch: 6 Global Step: 70560 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:06:30,479-Speed 5977.47 samples/sec Loss 10.2070 LearningRate 0.2149 Epoch: 6 Global Step: 70570 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:06:37,341-Speed 5970.31 samples/sec Loss 10.1631 LearningRate 0.2149 Epoch: 6 Global Step: 70580 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:06:44,193-Speed 5978.95 samples/sec Loss 10.2651 LearningRate 0.2149 Epoch: 6 Global Step: 70590 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:06:51,039-Speed 5984.11 samples/sec Loss 10.1550 LearningRate 0.2148 Epoch: 6 Global Step: 70600 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:06:57,919-Speed 5954.26 samples/sec Loss 10.1715 LearningRate 0.2148 Epoch: 6 Global Step: 70610 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:07:04,780-Speed 5971.46 
samples/sec Loss 10.2134 LearningRate 0.2148 Epoch: 6 Global Step: 70620 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:07:11,671-Speed 5945.59 samples/sec Loss 10.1189 LearningRate 0.2147 Epoch: 6 Global Step: 70630 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:07:18,545-Speed 5960.29 samples/sec Loss 10.1366 LearningRate 0.2147 Epoch: 6 Global Step: 70640 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:07:25,426-Speed 5953.44 samples/sec Loss 10.1725 LearningRate 0.2147 Epoch: 6 Global Step: 70650 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:07:32,301-Speed 5959.27 samples/sec Loss 10.2947 LearningRate 0.2146 Epoch: 6 Global Step: 70660 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:07:39,172-Speed 5964.44 samples/sec Loss 10.1658 LearningRate 0.2146 Epoch: 6 Global Step: 70670 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:07:46,092-Speed 5919.46 samples/sec Loss 10.1715 LearningRate 0.2146 Epoch: 6 Global Step: 70680 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:07:52,966-Speed 5962.51 samples/sec Loss 10.2221 LearningRate 0.2145 Epoch: 6 Global Step: 70690 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:07:59,831-Speed 5967.72 samples/sec Loss 10.2016 LearningRate 0.2145 Epoch: 6 Global Step: 70700 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:08:06,688-Speed 5974.17 samples/sec Loss 10.1840 LearningRate 0.2145 Epoch: 6 Global Step: 70710 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:08:13,554-Speed 5967.63 samples/sec Loss 10.1983 LearningRate 0.2144 Epoch: 6 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:08:20,420-Speed 5966.40 samples/sec Loss 10.1855 LearningRate 0.2144 Epoch: 6 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:08:27,273-Speed 5980.50 samples/sec Loss 10.1543 
LearningRate 0.2144 Epoch: 6 Global Step: 70740 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:08:34,133-Speed 5971.29 samples/sec Loss 10.1543 LearningRate 0.2144 Epoch: 6 Global Step: 70750 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:08:41,001-Speed 5965.89 samples/sec Loss 10.1761 LearningRate 0.2143 Epoch: 6 Global Step: 70760 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:08:47,878-Speed 5956.85 samples/sec Loss 10.2465 LearningRate 0.2143 Epoch: 6 Global Step: 70770 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:08:54,732-Speed 5977.36 samples/sec Loss 10.1592 LearningRate 0.2143 Epoch: 6 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:09:01,589-Speed 5975.05 samples/sec Loss 10.2246 LearningRate 0.2142 Epoch: 6 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:09:08,465-Speed 5957.70 samples/sec Loss 10.1845 LearningRate 0.2142 Epoch: 6 Global Step: 70800 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:09:15,323-Speed 5973.49 samples/sec Loss 10.1643 LearningRate 0.2142 Epoch: 6 Global Step: 70810 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:09:22,207-Speed 5951.09 samples/sec Loss 10.2034 LearningRate 0.2141 Epoch: 6 Global Step: 70820 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:09:29,072-Speed 5968.01 samples/sec Loss 10.2351 LearningRate 0.2141 Epoch: 6 Global Step: 70830 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:09:35,931-Speed 5974.68 samples/sec Loss 10.2063 LearningRate 0.2141 Epoch: 6 Global Step: 70840 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:09:42,779-Speed 5982.70 samples/sec Loss 10.2182 LearningRate 0.2140 Epoch: 6 Global Step: 70850 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:09:49,646-Speed 5967.66 samples/sec Loss 10.1416 LearningRate 0.2140 Epoch: 6 Global 
Step: 70860 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:09:56,497-Speed 5979.51 samples/sec Loss 10.2204 LearningRate 0.2140 Epoch: 6 Global Step: 70870 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:10:03,345-Speed 5984.82 samples/sec Loss 10.1748 LearningRate 0.2139 Epoch: 6 Global Step: 70880 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:10:10,197-Speed 5979.38 samples/sec Loss 10.1341 LearningRate 0.2139 Epoch: 6 Global Step: 70890 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:10:17,082-Speed 5950.67 samples/sec Loss 10.1072 LearningRate 0.2139 Epoch: 6 Global Step: 70900 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:10:23,938-Speed 5975.92 samples/sec Loss 10.2076 LearningRate 0.2139 Epoch: 6 Global Step: 70910 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:10:30,794-Speed 5975.85 samples/sec Loss 10.0977 LearningRate 0.2138 Epoch: 6 Global Step: 70920 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:10:37,655-Speed 5971.06 samples/sec Loss 10.1442 LearningRate 0.2138 Epoch: 6 Global Step: 70930 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:10:44,537-Speed 5952.56 samples/sec Loss 10.2912 LearningRate 0.2138 Epoch: 6 Global Step: 70940 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:10:51,408-Speed 5962.23 samples/sec Loss 10.1708 LearningRate 0.2137 Epoch: 6 Global Step: 70950 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:10:58,272-Speed 5968.44 samples/sec Loss 10.1940 LearningRate 0.2137 Epoch: 6 Global Step: 70960 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:11:05,142-Speed 5963.52 samples/sec Loss 10.2339 LearningRate 0.2137 Epoch: 6 Global Step: 70970 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:11:11,999-Speed 5974.37 samples/sec Loss 10.1339 LearningRate 0.2136 Epoch: 6 Global Step: 70980 Fp16 Grad Scale: 
131072 Required: 27 hours Training: 2022-01-08 10:11:18,855-Speed 5975.45 samples/sec Loss 10.0804 LearningRate 0.2136 Epoch: 6 Global Step: 70990 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:11:25,704-Speed 5981.73 samples/sec Loss 10.1584 LearningRate 0.2136 Epoch: 6 Global Step: 71000 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:11:32,556-Speed 5978.83 samples/sec Loss 10.0881 LearningRate 0.2135 Epoch: 6 Global Step: 71010 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:11:39,425-Speed 5964.46 samples/sec Loss 10.1705 LearningRate 0.2135 Epoch: 6 Global Step: 71020 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:11:46,378-Speed 5894.82 samples/sec Loss 10.1008 LearningRate 0.2135 Epoch: 6 Global Step: 71030 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:11:53,255-Speed 5956.57 samples/sec Loss 10.2114 LearningRate 0.2134 Epoch: 6 Global Step: 71040 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:12:00,129-Speed 5962.36 samples/sec Loss 10.2278 LearningRate 0.2134 Epoch: 6 Global Step: 71050 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:12:06,975-Speed 5986.19 samples/sec Loss 10.1461 LearningRate 0.2134 Epoch: 6 Global Step: 71060 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:12:13,828-Speed 5977.61 samples/sec Loss 10.1703 LearningRate 0.2134 Epoch: 6 Global Step: 71070 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:12:20,701-Speed 5962.69 samples/sec Loss 10.0980 LearningRate 0.2133 Epoch: 6 Global Step: 71080 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:12:27,568-Speed 5965.46 samples/sec Loss 10.1662 LearningRate 0.2133 Epoch: 6 Global Step: 71090 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:12:34,447-Speed 5956.62 samples/sec Loss 10.1289 LearningRate 0.2133 Epoch: 6 Global Step: 71100 Fp16 Grad Scale: 131072 Required: 27 hours 
Training: 2022-01-08 10:12:41,305-Speed 5973.50 samples/sec Loss 10.1365 LearningRate 0.2132 Epoch: 6 Global Step: 71110 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:12:48,170-Speed 5967.90 samples/sec Loss 10.2330 LearningRate 0.2132 Epoch: 6 Global Step: 71120 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:12:55,045-Speed 5965.16 samples/sec Loss 10.1536 LearningRate 0.2132 Epoch: 6 Global Step: 71130 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:13:01,911-Speed 5966.84 samples/sec Loss 10.1164 LearningRate 0.2131 Epoch: 6 Global Step: 71140 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:13:08,768-Speed 5974.55 samples/sec Loss 10.1285 LearningRate 0.2131 Epoch: 6 Global Step: 71150 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:13:15,636-Speed 5965.07 samples/sec Loss 10.1888 LearningRate 0.2131 Epoch: 6 Global Step: 71160 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:13:22,535-Speed 5937.95 samples/sec Loss 10.2258 LearningRate 0.2130 Epoch: 6 Global Step: 71170 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:13:29,398-Speed 5972.13 samples/sec Loss 10.1761 LearningRate 0.2130 Epoch: 6 Global Step: 71180 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:13:36,255-Speed 5974.13 samples/sec Loss 10.1760 LearningRate 0.2130 Epoch: 6 Global Step: 71190 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:13:43,118-Speed 5969.79 samples/sec Loss 10.2310 LearningRate 0.2129 Epoch: 6 Global Step: 71200 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:13:49,973-Speed 5976.23 samples/sec Loss 10.1775 LearningRate 0.2129 Epoch: 6 Global Step: 71210 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:13:56,844-Speed 5962.71 samples/sec Loss 10.1302 LearningRate 0.2129 Epoch: 6 Global Step: 71220 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 
10:14:03,725-Speed 5954.28 samples/sec Loss 10.1230 LearningRate 0.2129 Epoch: 6 Global Step: 71230 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:14:10,610-Speed 5950.31 samples/sec Loss 10.0984 LearningRate 0.2128 Epoch: 6 Global Step: 71240 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:14:17,503-Speed 5942.84 samples/sec Loss 10.2127 LearningRate 0.2128 Epoch: 6 Global Step: 71250 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:14:24,354-Speed 5980.22 samples/sec Loss 10.1375 LearningRate 0.2128 Epoch: 6 Global Step: 71260 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:14:31,215-Speed 5972.02 samples/sec Loss 10.1732 LearningRate 0.2127 Epoch: 6 Global Step: 71270 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:14:38,069-Speed 5977.16 samples/sec Loss 10.2182 LearningRate 0.2127 Epoch: 6 Global Step: 71280 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:14:44,922-Speed 5977.23 samples/sec Loss 10.1286 LearningRate 0.2127 Epoch: 6 Global Step: 71290 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:14:51,778-Speed 5975.83 samples/sec Loss 10.1257 LearningRate 0.2126 Epoch: 6 Global Step: 71300 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:14:58,659-Speed 5953.56 samples/sec Loss 10.1545 LearningRate 0.2126 Epoch: 6 Global Step: 71310 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:15:05,524-Speed 5968.54 samples/sec Loss 10.1786 LearningRate 0.2126 Epoch: 6 Global Step: 71320 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:15:12,376-Speed 5978.53 samples/sec Loss 10.1317 LearningRate 0.2125 Epoch: 6 Global Step: 71330 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:15:19,252-Speed 5957.96 samples/sec Loss 10.1392 LearningRate 0.2125 Epoch: 6 Global Step: 71340 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:15:26,106-Speed 5977.43 
samples/sec Loss 10.0900 LearningRate 0.2125 Epoch: 6 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:15:32,984-Speed 5957.19 samples/sec Loss 10.1422 LearningRate 0.2124 Epoch: 6 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:15:39,858-Speed 5959.39 samples/sec Loss 10.1086 LearningRate 0.2124 Epoch: 6 Global Step: 71370 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:15:46,750-Speed 5944.04 samples/sec Loss 10.1064 LearningRate 0.2124 Epoch: 6 Global Step: 71380 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:15:53,582-Speed 5996.50 samples/sec Loss 10.1743 LearningRate 0.2124 Epoch: 6 Global Step: 71390 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:00,431-Speed 5980.89 samples/sec Loss 10.1550 LearningRate 0.2123 Epoch: 6 Global Step: 71400 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:07,294-Speed 5969.91 samples/sec Loss 10.1956 LearningRate 0.2123 Epoch: 6 Global Step: 71410 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:14,162-Speed 5965.22 samples/sec Loss 10.1865 LearningRate 0.2123 Epoch: 6 Global Step: 71420 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:21,008-Speed 5983.62 samples/sec Loss 10.1905 LearningRate 0.2122 Epoch: 6 Global Step: 71430 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:27,868-Speed 5972.98 samples/sec Loss 10.1793 LearningRate 0.2122 Epoch: 6 Global Step: 71440 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:34,718-Speed 5981.41 samples/sec Loss 10.1682 LearningRate 0.2122 Epoch: 6 Global Step: 71450 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:41,555-Speed 5991.86 samples/sec Loss 10.1152 LearningRate 0.2121 Epoch: 6 Global Step: 71460 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:48,404-Speed 5982.12 samples/sec Loss 10.1309 LearningRate 
0.2121 Epoch: 6 Global Step: 71470 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:16:55,266-Speed 5972.34 samples/sec Loss 10.1668 LearningRate 0.2121 Epoch: 6 Global Step: 71480 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 10:17:02,148-Speed 5953.48 samples/sec Loss 10.1922 LearningRate 0.2120 Epoch: 6 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:17:09,022-Speed 5959.56 samples/sec Loss 10.2432 LearningRate 0.2120 Epoch: 6 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:17:15,884-Speed 5970.00 samples/sec Loss 10.0898 LearningRate 0.2120 Epoch: 6 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:17:22,746-Speed 5970.67 samples/sec Loss 10.0655 LearningRate 0.2119 Epoch: 6 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:17:29,618-Speed 5962.00 samples/sec Loss 10.0876 LearningRate 0.2119 Epoch: 6 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:17:36,473-Speed 5978.25 samples/sec Loss 10.1201 LearningRate 0.2119 Epoch: 6 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:17:43,319-Speed 5984.25 samples/sec Loss 10.0674 LearningRate 0.2119 Epoch: 6 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:17:50,172-Speed 5980.85 samples/sec Loss 10.1131 LearningRate 0.2118 Epoch: 6 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:17:57,023-Speed 5980.05 samples/sec Loss 10.1400 LearningRate 0.2118 Epoch: 6 Global Step: 71570 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:18:03,898-Speed 5959.06 samples/sec Loss 10.1315 LearningRate 0.2118 Epoch: 6 Global Step: 71580 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 10:18:10,764-Speed 5966.78 samples/sec Loss 10.2174 LearningRate 0.2117 Epoch: 6 Global Step: 71590 Fp16 
Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:18:17,639-Speed 5959.47 samples/sec Loss 10.0425 LearningRate 0.2117 Epoch: 6 Global Step: 71600 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:18:24,493-Speed 5977.44 samples/sec Loss 10.0693 LearningRate 0.2117 Epoch: 6 Global Step: 71610 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:18:31,347-Speed 5977.29 samples/sec Loss 10.1531 LearningRate 0.2116 Epoch: 6 Global Step: 71620 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:18:39,155-Speed 5246.80 samples/sec Loss 10.0777 LearningRate 0.2116 Epoch: 6 Global Step: 71630 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:18:45,995-Speed 5989.20 samples/sec Loss 10.1602 LearningRate 0.2116 Epoch: 6 Global Step: 71640 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:18:52,836-Speed 5988.21 samples/sec Loss 10.1250 LearningRate 0.2115 Epoch: 6 Global Step: 71650 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:18:59,682-Speed 5983.98 samples/sec Loss 10.1474 LearningRate 0.2115 Epoch: 6 Global Step: 71660 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:19:06,541-Speed 5972.22 samples/sec Loss 10.1579 LearningRate 0.2115 Epoch: 6 Global Step: 71670 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:19:13,387-Speed 5984.41 samples/sec Loss 10.0566 LearningRate 0.2114 Epoch: 6 Global Step: 71680 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:19:20,232-Speed 5984.28 samples/sec Loss 10.1023 LearningRate 0.2114 Epoch: 6 Global Step: 71690 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:19:27,090-Speed 5974.02 samples/sec Loss 10.1163 LearningRate 0.2114 Epoch: 6 Global Step: 71700 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:19:33,992-Speed 5936.58 samples/sec Loss 10.1234 LearningRate 0.2114 Epoch: 6 Global Step: 71710 Fp16 Grad Scale: 131072 Required: 27 
hours Training: 2022-01-08 10:19:40,853-Speed 5970.50 samples/sec Loss 10.1708 LearningRate 0.2113 Epoch: 6 Global Step: 71720 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:19:47,707-Speed 5977.30 samples/sec Loss 10.1322 LearningRate 0.2113 Epoch: 6 Global Step: 71730 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:19:54,560-Speed 5978.29 samples/sec Loss 10.0940 LearningRate 0.2113 Epoch: 6 Global Step: 71740 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:20:01,439-Speed 5955.01 samples/sec Loss 10.1563 LearningRate 0.2112 Epoch: 6 Global Step: 71750 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:20:08,293-Speed 5977.11 samples/sec Loss 10.1021 LearningRate 0.2112 Epoch: 6 Global Step: 71760 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:20:15,158-Speed 5969.54 samples/sec Loss 10.0739 LearningRate 0.2112 Epoch: 6 Global Step: 71770 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:20:22,010-Speed 5978.45 samples/sec Loss 10.0550 LearningRate 0.2111 Epoch: 6 Global Step: 71780 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:20:28,861-Speed 5979.77 samples/sec Loss 10.1531 LearningRate 0.2111 Epoch: 6 Global Step: 71790 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:20:35,740-Speed 5955.98 samples/sec Loss 10.0547 LearningRate 0.2111 Epoch: 6 Global Step: 71800 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:20:42,599-Speed 5972.60 samples/sec Loss 10.0968 LearningRate 0.2110 Epoch: 6 Global Step: 71810 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:20:49,465-Speed 5967.00 samples/sec Loss 10.0564 LearningRate 0.2110 Epoch: 6 Global Step: 71820 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:20:56,338-Speed 5960.61 samples/sec Loss 10.1261 LearningRate 0.2110 Epoch: 6 Global Step: 71830 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 
10:21:03,220-Speed 5953.28 samples/sec Loss 10.0875 LearningRate 0.2109 Epoch: 6 Global Step: 71840 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:21:10,072-Speed 5978.33 samples/sec Loss 10.1829 LearningRate 0.2109 Epoch: 6 Global Step: 71850 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:21:16,926-Speed 5977.86 samples/sec Loss 10.1447 LearningRate 0.2109 Epoch: 6 Global Step: 71860 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:21:23,788-Speed 5969.85 samples/sec Loss 10.1551 LearningRate 0.2109 Epoch: 6 Global Step: 71870 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:21:30,643-Speed 5976.22 samples/sec Loss 10.1169 LearningRate 0.2108 Epoch: 6 Global Step: 71880 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:21:37,491-Speed 5982.34 samples/sec Loss 10.1301 LearningRate 0.2108 Epoch: 6 Global Step: 71890 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:21:44,339-Speed 5982.48 samples/sec Loss 10.1021 LearningRate 0.2108 Epoch: 6 Global Step: 71900 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:21:51,187-Speed 5981.51 samples/sec Loss 10.0844 LearningRate 0.2107 Epoch: 6 Global Step: 71910 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:21:58,059-Speed 5961.30 samples/sec Loss 10.1751 LearningRate 0.2107 Epoch: 6 Global Step: 71920 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:22:04,928-Speed 5973.52 samples/sec Loss 10.0245 LearningRate 0.2107 Epoch: 6 Global Step: 71930 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:22:11,813-Speed 5950.59 samples/sec Loss 10.0987 LearningRate 0.2106 Epoch: 6 Global Step: 71940 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:22:18,666-Speed 5977.56 samples/sec Loss 9.9746 LearningRate 0.2106 Epoch: 6 Global Step: 71950 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:22:25,533-Speed 5966.10 
samples/sec Loss 10.1477 LearningRate 0.2106 Epoch: 6 Global Step: 71960 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:22:32,393-Speed 5972.14 samples/sec Loss 10.0779 LearningRate 0.2105 Epoch: 6 Global Step: 71970 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:22:39,258-Speed 5967.55 samples/sec Loss 10.0878 LearningRate 0.2105 Epoch: 6 Global Step: 71980 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:22:46,137-Speed 5955.46 samples/sec Loss 10.1559 LearningRate 0.2105 Epoch: 6 Global Step: 71990 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:22:53,000-Speed 5971.51 samples/sec Loss 10.1251 LearningRate 0.2105 Epoch: 6 Global Step: 72000 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:22:59,884-Speed 5951.61 samples/sec Loss 10.0540 LearningRate 0.2104 Epoch: 6 Global Step: 72010 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:23:06,753-Speed 5964.44 samples/sec Loss 10.1330 LearningRate 0.2104 Epoch: 6 Global Step: 72020 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:23:13,624-Speed 5963.90 samples/sec Loss 10.1215 LearningRate 0.2104 Epoch: 6 Global Step: 72030 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:23:20,490-Speed 5967.41 samples/sec Loss 10.1348 LearningRate 0.2103 Epoch: 6 Global Step: 72040 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:23:27,353-Speed 5969.88 samples/sec Loss 10.1223 LearningRate 0.2103 Epoch: 6 Global Step: 72050 Fp16 Grad Scale: 262144 Required: 27 hours Training: 2022-01-08 10:23:34,216-Speed 5971.57 samples/sec Loss 10.1222 LearningRate 0.2103 Epoch: 6 Global Step: 72060 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:23:41,092-Speed 5958.19 samples/sec Loss 10.0221 LearningRate 0.2102 Epoch: 6 Global Step: 72070 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:23:47,969-Speed 5957.26 samples/sec Loss 10.1186 
LearningRate 0.2102 Epoch: 6 Global Step: 72080 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:23:54,829-Speed 5972.04 samples/sec Loss 10.0721 LearningRate 0.2102 Epoch: 6 Global Step: 72090 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:01,693-Speed 5968.07 samples/sec Loss 10.0785 LearningRate 0.2101 Epoch: 6 Global Step: 72100 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:08,556-Speed 5969.51 samples/sec Loss 10.0702 LearningRate 0.2101 Epoch: 6 Global Step: 72110 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:15,416-Speed 5972.15 samples/sec Loss 10.0648 LearningRate 0.2101 Epoch: 6 Global Step: 72120 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:22,287-Speed 5962.35 samples/sec Loss 10.1124 LearningRate 0.2100 Epoch: 6 Global Step: 72130 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:29,145-Speed 5973.15 samples/sec Loss 10.1571 LearningRate 0.2100 Epoch: 6 Global Step: 72140 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:36,035-Speed 5949.72 samples/sec Loss 10.1061 LearningRate 0.2100 Epoch: 6 Global Step: 72150 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:42,882-Speed 5983.69 samples/sec Loss 10.0405 LearningRate 0.2100 Epoch: 6 Global Step: 72160 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:49,743-Speed 5970.22 samples/sec Loss 10.0998 LearningRate 0.2099 Epoch: 6 Global Step: 72170 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 10:24:56,613-Speed 5964.03 samples/sec Loss 10.0698 LearningRate 0.2099 Epoch: 6 Global Step: 72180 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:03,464-Speed 5980.35 samples/sec Loss 10.1085 LearningRate 0.2099 Epoch: 6 Global Step: 72190 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:10,308-Speed 5985.07 samples/sec Loss 10.1437 LearningRate 0.2098 Epoch: 6 
Global Step: 72200 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:17,177-Speed 5964.31 samples/sec Loss 10.0677 LearningRate 0.2098 Epoch: 6 Global Step: 72210 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:24,058-Speed 5954.14 samples/sec Loss 10.0854 LearningRate 0.2098 Epoch: 6 Global Step: 72220 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:30,926-Speed 5964.22 samples/sec Loss 10.1225 LearningRate 0.2097 Epoch: 6 Global Step: 72230 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:37,787-Speed 5971.05 samples/sec Loss 10.0314 LearningRate 0.2097 Epoch: 6 Global Step: 72240 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:44,629-Speed 5987.52 samples/sec Loss 10.0630 LearningRate 0.2097 Epoch: 6 Global Step: 72250 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:51,474-Speed 5984.79 samples/sec Loss 10.0026 LearningRate 0.2096 Epoch: 6 Global Step: 72260 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:25:58,322-Speed 5981.90 samples/sec Loss 10.0442 LearningRate 0.2096 Epoch: 6 Global Step: 72270 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:26:05,167-Speed 5984.93 samples/sec Loss 9.9779 LearningRate 0.2096 Epoch: 6 Global Step: 72280 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:26:12,022-Speed 5976.78 samples/sec Loss 10.0792 LearningRate 0.2095 Epoch: 6 Global Step: 72290 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:26:18,877-Speed 5975.89 samples/sec Loss 10.0635 LearningRate 0.2095 Epoch: 6 Global Step: 72300 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:26:25,788-Speed 5928.35 samples/sec Loss 10.0846 LearningRate 0.2095 Epoch: 6 Global Step: 72310 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:26:32,686-Speed 5938.69 samples/sec Loss 10.1049 LearningRate 0.2095 Epoch: 6 Global Step: 72320 Fp16 Grad 
Scale: 131072 Required: 26 hours Training: 2022-01-08 10:26:39,606-Speed 5920.85 samples/sec Loss 9.9751 LearningRate 0.2094 Epoch: 6 Global Step: 72330 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:26:47,061-Speed 5494.83 samples/sec Loss 10.0761 LearningRate 0.2094 Epoch: 6 Global Step: 72340 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:26:53,956-Speed 5941.66 samples/sec Loss 10.0786 LearningRate 0.2094 Epoch: 6 Global Step: 72350 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:27:00,805-Speed 5981.09 samples/sec Loss 9.9856 LearningRate 0.2093 Epoch: 6 Global Step: 72360 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:27:07,655-Speed 5981.38 samples/sec Loss 10.0957 LearningRate 0.2093 Epoch: 6 Global Step: 72370 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:27:14,568-Speed 5925.72 samples/sec Loss 10.0949 LearningRate 0.2093 Epoch: 6 Global Step: 72380 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:27:21,426-Speed 5973.56 samples/sec Loss 10.0777 LearningRate 0.2092 Epoch: 6 Global Step: 72390 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:27:28,289-Speed 5969.98 samples/sec Loss 10.0585 LearningRate 0.2092 Epoch: 6 Global Step: 72400 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:27:35,156-Speed 5967.28 samples/sec Loss 10.0830 LearningRate 0.2092 Epoch: 6 Global Step: 72410 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:27:42,021-Speed 5977.95 samples/sec Loss 10.1077 LearningRate 0.2091 Epoch: 6 Global Step: 72420 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:27:48,872-Speed 5979.77 samples/sec Loss 10.0571 LearningRate 0.2091 Epoch: 6 Global Step: 72430 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:27:55,730-Speed 5974.31 samples/sec Loss 10.0415 LearningRate 0.2091 Epoch: 6 Global Step: 72440 Fp16 Grad Scale: 262144 Required: 26 hours 
Training: 2022-01-08 10:28:02,587-Speed 5975.94 samples/sec Loss 10.0163 LearningRate 0.2091 Epoch: 6 Global Step: 72450 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:28:09,446-Speed 5972.41 samples/sec Loss 10.1428 LearningRate 0.2090 Epoch: 6 Global Step: 72460 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:28:16,308-Speed 5970.14 samples/sec Loss 10.1581 LearningRate 0.2090 Epoch: 6 Global Step: 72470 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:28:23,187-Speed 5956.43 samples/sec Loss 10.0283 LearningRate 0.2090 Epoch: 6 Global Step: 72480 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:28:30,054-Speed 5965.92 samples/sec Loss 10.0630 LearningRate 0.2089 Epoch: 6 Global Step: 72490 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:28:36,938-Speed 5950.95 samples/sec Loss 10.0361 LearningRate 0.2089 Epoch: 6 Global Step: 72500 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:28:43,805-Speed 5968.46 samples/sec Loss 10.1141 LearningRate 0.2089 Epoch: 6 Global Step: 72510 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:28:50,641-Speed 5992.40 samples/sec Loss 10.1116 LearningRate 0.2088 Epoch: 6 Global Step: 72520 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:28:57,493-Speed 5979.40 samples/sec Loss 10.1246 LearningRate 0.2088 Epoch: 6 Global Step: 72530 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:29:04,380-Speed 5948.08 samples/sec Loss 10.0223 LearningRate 0.2088 Epoch: 6 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:29:11,235-Speed 5976.61 samples/sec Loss 10.0714 LearningRate 0.2087 Epoch: 6 Global Step: 72550 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:29:18,110-Speed 5958.65 samples/sec Loss 10.0620 LearningRate 0.2087 Epoch: 6 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 
10:29:25,015-Speed 5953.25 samples/sec Loss 10.0647 LearningRate 0.2087 Epoch: 6 Global Step: 72570 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:29:31,858-Speed 5986.64 samples/sec Loss 10.0320 LearningRate 0.2087 Epoch: 6 Global Step: 72580 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:29:55,540-Speed 1729.90 samples/sec Loss 10.1085 LearningRate 0.2086 Epoch: 7 Global Step: 72590 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:30:02,355-Speed 6011.76 samples/sec Loss 10.1052 LearningRate 0.2086 Epoch: 7 Global Step: 72600 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:30:09,187-Speed 5996.03 samples/sec Loss 10.1062 LearningRate 0.2086 Epoch: 7 Global Step: 72610 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:30:16,019-Speed 5996.39 samples/sec Loss 10.0749 LearningRate 0.2085 Epoch: 7 Global Step: 72620 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:30:22,855-Speed 5993.23 samples/sec Loss 10.0265 LearningRate 0.2085 Epoch: 7 Global Step: 72630 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:30:29,717-Speed 5969.76 samples/sec Loss 10.0312 LearningRate 0.2085 Epoch: 7 Global Step: 72640 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:30:36,573-Speed 5975.30 samples/sec Loss 10.0442 LearningRate 0.2084 Epoch: 7 Global Step: 72650 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:30:43,414-Speed 5989.25 samples/sec Loss 10.1393 LearningRate 0.2084 Epoch: 7 Global Step: 72660 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:30:50,293-Speed 5955.39 samples/sec Loss 10.0856 LearningRate 0.2084 Epoch: 7 Global Step: 72670 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:30:57,207-Speed 5924.41 samples/sec Loss 10.1029 LearningRate 0.2083 Epoch: 7 Global Step: 72680 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:31:04,093-Speed 5951.70 
samples/sec Loss 9.9594 LearningRate 0.2083 Epoch: 7 Global Step: 72690 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:31:10,961-Speed 5965.64 samples/sec Loss 9.9396 LearningRate 0.2083 Epoch: 7 Global Step: 72700 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:31:17,867-Speed 5934.90 samples/sec Loss 9.9967 LearningRate 0.2082 Epoch: 7 Global Step: 72710 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:31:24,746-Speed 5955.22 samples/sec Loss 9.9787 LearningRate 0.2082 Epoch: 7 Global Step: 72720 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:31:31,623-Speed 5958.36 samples/sec Loss 9.9772 LearningRate 0.2082 Epoch: 7 Global Step: 72730 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:31:38,487-Speed 5967.67 samples/sec Loss 10.0316 LearningRate 0.2082 Epoch: 7 Global Step: 72740 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:31:45,364-Speed 5957.49 samples/sec Loss 9.9787 LearningRate 0.2081 Epoch: 7 Global Step: 72750 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:31:52,224-Speed 5976.85 samples/sec Loss 9.9674 LearningRate 0.2081 Epoch: 7 Global Step: 72760 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:31:59,110-Speed 5949.21 samples/sec Loss 9.9511 LearningRate 0.2081 Epoch: 7 Global Step: 72770 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:32:05,983-Speed 5961.20 samples/sec Loss 10.0194 LearningRate 0.2080 Epoch: 7 Global Step: 72780 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:32:12,853-Speed 5964.88 samples/sec Loss 9.9745 LearningRate 0.2080 Epoch: 7 Global Step: 72790 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:32:19,728-Speed 5961.03 samples/sec Loss 10.0118 LearningRate 0.2080 Epoch: 7 Global Step: 72800 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:32:26,571-Speed 5986.48 samples/sec Loss 10.0459 LearningRate 
0.2079 Epoch: 7 Global Step: 72810 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:32:33,420-Speed 5981.91 samples/sec Loss 10.0214 LearningRate 0.2079 Epoch: 7 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:32:40,276-Speed 5975.83 samples/sec Loss 10.0582 LearningRate 0.2079 Epoch: 7 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:32:47,139-Speed 5970.20 samples/sec Loss 10.0050 LearningRate 0.2078 Epoch: 7 Global Step: 72840 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:32:53,990-Speed 5980.16 samples/sec Loss 9.9780 LearningRate 0.2078 Epoch: 7 Global Step: 72850 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:33:00,863-Speed 5960.83 samples/sec Loss 10.0755 LearningRate 0.2078 Epoch: 7 Global Step: 72860 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:33:07,721-Speed 5973.43 samples/sec Loss 10.0119 LearningRate 0.2078 Epoch: 7 Global Step: 72870 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:33:14,594-Speed 5960.75 samples/sec Loss 10.0419 LearningRate 0.2077 Epoch: 7 Global Step: 72880 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:33:21,466-Speed 5962.36 samples/sec Loss 10.0148 LearningRate 0.2077 Epoch: 7 Global Step: 72890 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:33:28,326-Speed 5971.34 samples/sec Loss 10.0384 LearningRate 0.2077 Epoch: 7 Global Step: 72900 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:33:35,183-Speed 5981.14 samples/sec Loss 10.0849 LearningRate 0.2076 Epoch: 7 Global Step: 72910 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:33:42,046-Speed 5968.41 samples/sec Loss 10.0646 LearningRate 0.2076 Epoch: 7 Global Step: 72920 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:33:48,893-Speed 5983.63 samples/sec Loss 10.0019 LearningRate 0.2076 Epoch: 7 Global Step: 
72930 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:33:55,813-Speed 5920.51 samples/sec Loss 10.0519 LearningRate 0.2075 Epoch: 7 Global Step: 72940 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:34:02,669-Speed 5977.55 samples/sec Loss 10.0580 LearningRate 0.2075 Epoch: 7 Global Step: 72950 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:34:09,522-Speed 5977.17 samples/sec Loss 10.0093 LearningRate 0.2075 Epoch: 7 Global Step: 72960 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:34:16,371-Speed 5982.23 samples/sec Loss 9.9986 LearningRate 0.2074 Epoch: 7 Global Step: 72970 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:34:23,228-Speed 5974.52 samples/sec Loss 10.0169 LearningRate 0.2074 Epoch: 7 Global Step: 72980 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:34:30,087-Speed 5972.96 samples/sec Loss 9.9821 LearningRate 0.2074 Epoch: 7 Global Step: 72990 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:34:36,936-Speed 5980.87 samples/sec Loss 10.0564 LearningRate 0.2074 Epoch: 7 Global Step: 73000 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:34:43,794-Speed 5975.52 samples/sec Loss 10.0584 LearningRate 0.2073 Epoch: 7 Global Step: 73010 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:34:50,671-Speed 5957.36 samples/sec Loss 10.0011 LearningRate 0.2073 Epoch: 7 Global Step: 73020 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:34:57,571-Speed 5941.47 samples/sec Loss 10.1651 LearningRate 0.2073 Epoch: 7 Global Step: 73030 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:35:04,438-Speed 5965.88 samples/sec Loss 10.0621 LearningRate 0.2072 Epoch: 7 Global Step: 73040 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:35:11,304-Speed 5966.66 samples/sec Loss 10.0116 LearningRate 0.2072 Epoch: 7 Global Step: 73050 Fp16 Grad Scale: 131072 
Required: 26 hours Training: 2022-01-08 10:35:18,185-Speed 5953.75 samples/sec Loss 9.9896 LearningRate 0.2072 Epoch: 7 Global Step: 73060 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:35:25,035-Speed 5981.06 samples/sec Loss 10.0111 LearningRate 0.2071 Epoch: 7 Global Step: 73070 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:35:31,899-Speed 5968.20 samples/sec Loss 10.0552 LearningRate 0.2071 Epoch: 7 Global Step: 73080 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:35:38,760-Speed 5971.20 samples/sec Loss 10.0148 LearningRate 0.2071 Epoch: 7 Global Step: 73090 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:35:45,622-Speed 5969.91 samples/sec Loss 10.0137 LearningRate 0.2070 Epoch: 7 Global Step: 73100 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:35:52,477-Speed 5976.65 samples/sec Loss 10.0313 LearningRate 0.2070 Epoch: 7 Global Step: 73110 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:35:59,336-Speed 5972.10 samples/sec Loss 10.0426 LearningRate 0.2070 Epoch: 7 Global Step: 73120 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:36:06,225-Speed 5947.25 samples/sec Loss 9.9597 LearningRate 0.2070 Epoch: 7 Global Step: 73130 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:36:13,066-Speed 5988.42 samples/sec Loss 10.0070 LearningRate 0.2069 Epoch: 7 Global Step: 73140 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:36:19,941-Speed 5958.78 samples/sec Loss 10.0143 LearningRate 0.2069 Epoch: 7 Global Step: 73150 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:36:26,792-Speed 5980.37 samples/sec Loss 9.9880 LearningRate 0.2069 Epoch: 7 Global Step: 73160 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:36:33,644-Speed 5978.38 samples/sec Loss 9.9389 LearningRate 0.2068 Epoch: 7 Global Step: 73170 Fp16 Grad Scale: 131072 Required: 26 hours Training: 
2022-01-08 10:36:40,525-Speed 5952.89 samples/sec Loss 9.9210 LearningRate 0.2068 Epoch: 7 Global Step: 73180 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:36:47,379-Speed 5976.96 samples/sec Loss 10.0104 LearningRate 0.2068 Epoch: 7 Global Step: 73190 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:36:54,222-Speed 5986.61 samples/sec Loss 10.0392 LearningRate 0.2067 Epoch: 7 Global Step: 73200 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:37:01,078-Speed 5975.93 samples/sec Loss 10.0533 LearningRate 0.2067 Epoch: 7 Global Step: 73210 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:37:07,936-Speed 5973.89 samples/sec Loss 10.0256 LearningRate 0.2067 Epoch: 7 Global Step: 73220 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:37:14,826-Speed 5945.17 samples/sec Loss 9.9584 LearningRate 0.2066 Epoch: 7 Global Step: 73230 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:37:21,680-Speed 5977.38 samples/sec Loss 9.9151 LearningRate 0.2066 Epoch: 7 Global Step: 73240 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:37:28,545-Speed 5968.14 samples/sec Loss 10.0181 LearningRate 0.2066 Epoch: 7 Global Step: 73250 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:37:35,409-Speed 5970.47 samples/sec Loss 10.0418 LearningRate 0.2066 Epoch: 7 Global Step: 73260 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:37:42,267-Speed 5973.11 samples/sec Loss 9.9165 LearningRate 0.2065 Epoch: 7 Global Step: 73270 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:37:49,131-Speed 5971.23 samples/sec Loss 10.0475 LearningRate 0.2065 Epoch: 7 Global Step: 73280 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:37:55,978-Speed 5983.70 samples/sec Loss 10.0066 LearningRate 0.2065 Epoch: 7 Global Step: 73290 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:38:02,824-Speed 
5983.73 samples/sec Loss 10.0198 LearningRate 0.2064 Epoch: 7 Global Step: 73300 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:38:09,674-Speed 5983.74 samples/sec Loss 9.9717 LearningRate 0.2064 Epoch: 7 Global Step: 73310 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:38:16,527-Speed 5978.00 samples/sec Loss 9.9789 LearningRate 0.2064 Epoch: 7 Global Step: 73320 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:38:23,385-Speed 5974.71 samples/sec Loss 9.9643 LearningRate 0.2063 Epoch: 7 Global Step: 73330 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:38:30,238-Speed 5977.75 samples/sec Loss 10.0039 LearningRate 0.2063 Epoch: 7 Global Step: 73340 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:38:37,123-Speed 5950.55 samples/sec Loss 9.9993 LearningRate 0.2063 Epoch: 7 Global Step: 73350 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:38:44,013-Speed 5945.52 samples/sec Loss 9.9698 LearningRate 0.2062 Epoch: 7 Global Step: 73360 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:38:50,875-Speed 5970.59 samples/sec Loss 9.9711 LearningRate 0.2062 Epoch: 7 Global Step: 73370 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:38:57,739-Speed 5969.18 samples/sec Loss 9.9428 LearningRate 0.2062 Epoch: 7 Global Step: 73380 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:04,598-Speed 5972.70 samples/sec Loss 9.9719 LearningRate 0.2062 Epoch: 7 Global Step: 73390 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:11,452-Speed 5977.37 samples/sec Loss 10.0124 LearningRate 0.2061 Epoch: 7 Global Step: 73400 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:18,325-Speed 5960.87 samples/sec Loss 9.9797 LearningRate 0.2061 Epoch: 7 Global Step: 73410 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:25,210-Speed 5950.42 samples/sec Loss 9.9093 
LearningRate 0.2061 Epoch: 7 Global Step: 73420 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:32,070-Speed 5971.90 samples/sec Loss 9.9865 LearningRate 0.2060 Epoch: 7 Global Step: 73430 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:38,918-Speed 5982.82 samples/sec Loss 10.0443 LearningRate 0.2060 Epoch: 7 Global Step: 73440 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:45,851-Speed 5909.39 samples/sec Loss 9.9627 LearningRate 0.2060 Epoch: 7 Global Step: 73450 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:52,815-Speed 5882.97 samples/sec Loss 10.1128 LearningRate 0.2059 Epoch: 7 Global Step: 73460 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:39:59,694-Speed 5955.16 samples/sec Loss 9.9950 LearningRate 0.2059 Epoch: 7 Global Step: 73470 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:40:06,543-Speed 5982.12 samples/sec Loss 10.0172 LearningRate 0.2059 Epoch: 7 Global Step: 73480 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:40:13,407-Speed 5968.25 samples/sec Loss 10.0192 LearningRate 0.2058 Epoch: 7 Global Step: 73490 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:40:20,263-Speed 5975.30 samples/sec Loss 9.9874 LearningRate 0.2058 Epoch: 7 Global Step: 73500 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:40:27,108-Speed 5985.41 samples/sec Loss 9.9780 LearningRate 0.2058 Epoch: 7 Global Step: 73510 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:40:33,998-Speed 5946.73 samples/sec Loss 9.9874 LearningRate 0.2058 Epoch: 7 Global Step: 73520 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:40:40,936-Speed 5905.37 samples/sec Loss 9.9817 LearningRate 0.2057 Epoch: 7 Global Step: 73530 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:40:47,795-Speed 5973.03 samples/sec Loss 10.0311 LearningRate 0.2057 Epoch: 7 Global 
Step: 73540 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:40:54,664-Speed 5963.94 samples/sec Loss 9.9042 LearningRate 0.2057 Epoch: 7 Global Step: 73550 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:41:01,565-Speed 5936.39 samples/sec Loss 9.8344 LearningRate 0.2056 Epoch: 7 Global Step: 73560 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:41:08,444-Speed 5955.95 samples/sec Loss 9.9114 LearningRate 0.2056 Epoch: 7 Global Step: 73570 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:41:15,343-Speed 5938.97 samples/sec Loss 10.0045 LearningRate 0.2056 Epoch: 7 Global Step: 73580 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:41:22,208-Speed 5967.38 samples/sec Loss 9.9404 LearningRate 0.2055 Epoch: 7 Global Step: 73590 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:41:29,075-Speed 5964.99 samples/sec Loss 9.9887 LearningRate 0.2055 Epoch: 7 Global Step: 73600 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:41:35,954-Speed 5955.23 samples/sec Loss 9.9425 LearningRate 0.2055 Epoch: 7 Global Step: 73610 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:41:42,813-Speed 5973.29 samples/sec Loss 10.0240 LearningRate 0.2054 Epoch: 7 Global Step: 73620 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:41:49,665-Speed 5978.89 samples/sec Loss 9.9386 LearningRate 0.2054 Epoch: 7 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:41:56,535-Speed 5963.13 samples/sec Loss 9.9028 LearningRate 0.2054 Epoch: 7 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:42:03,414-Speed 5955.74 samples/sec Loss 9.9214 LearningRate 0.2054 Epoch: 7 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:42:10,280-Speed 5967.80 samples/sec Loss 9.9261 LearningRate 0.2053 Epoch: 7 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 
26 hours Training: 2022-01-08 10:42:17,195-Speed 5924.23 samples/sec Loss 10.0051 LearningRate 0.2053 Epoch: 7 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:42:24,129-Speed 5908.04 samples/sec Loss 10.0361 LearningRate 0.2053 Epoch: 7 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:42:31,068-Speed 5904.47 samples/sec Loss 9.9912 LearningRate 0.2052 Epoch: 7 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:42:38,005-Speed 5905.69 samples/sec Loss 9.9383 LearningRate 0.2052 Epoch: 7 Global Step: 73700 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:42:44,898-Speed 5942.98 samples/sec Loss 9.9252 LearningRate 0.2052 Epoch: 7 Global Step: 73710 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:42:51,752-Speed 5976.92 samples/sec Loss 9.9168 LearningRate 0.2051 Epoch: 7 Global Step: 73720 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:42:58,601-Speed 5982.25 samples/sec Loss 10.0044 LearningRate 0.2051 Epoch: 7 Global Step: 73730 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:43:05,465-Speed 5967.64 samples/sec Loss 9.9174 LearningRate 0.2051 Epoch: 7 Global Step: 73740 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:43:12,318-Speed 5978.27 samples/sec Loss 9.9183 LearningRate 0.2050 Epoch: 7 Global Step: 73750 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:43:19,196-Speed 5956.85 samples/sec Loss 10.0018 LearningRate 0.2050 Epoch: 7 Global Step: 73760 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:43:26,055-Speed 5972.94 samples/sec Loss 9.9672 LearningRate 0.2050 Epoch: 7 Global Step: 73770 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:43:32,939-Speed 5951.82 samples/sec Loss 9.9125 LearningRate 0.2050 Epoch: 7 Global Step: 73780 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 
10:43:39,866-Speed 5916.55 samples/sec Loss 9.9906 LearningRate 0.2049 Epoch: 7 Global Step: 73790 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:43:46,731-Speed 5967.62 samples/sec Loss 10.0053 LearningRate 0.2049 Epoch: 7 Global Step: 73800 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:43:53,564-Speed 5995.08 samples/sec Loss 10.0330 LearningRate 0.2049 Epoch: 7 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:00,438-Speed 5959.91 samples/sec Loss 9.9916 LearningRate 0.2048 Epoch: 7 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:07,294-Speed 5975.90 samples/sec Loss 9.9371 LearningRate 0.2048 Epoch: 7 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:14,159-Speed 5967.81 samples/sec Loss 9.9568 LearningRate 0.2048 Epoch: 7 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:21,028-Speed 5964.39 samples/sec Loss 9.9283 LearningRate 0.2047 Epoch: 7 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:27,878-Speed 5980.18 samples/sec Loss 9.9621 LearningRate 0.2047 Epoch: 7 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:34,723-Speed 5984.66 samples/sec Loss 10.0050 LearningRate 0.2047 Epoch: 7 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:41,586-Speed 5969.44 samples/sec Loss 9.9334 LearningRate 0.2046 Epoch: 7 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:48,437-Speed 5979.78 samples/sec Loss 10.0118 LearningRate 0.2046 Epoch: 7 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:44:55,294-Speed 5977.44 samples/sec Loss 9.8754 LearningRate 0.2046 Epoch: 7 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:45:02,183-Speed 5947.01 samples/sec Loss 
9.8480 LearningRate 0.2046 Epoch: 7 Global Step: 73910 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:45:09,041-Speed 5973.61 samples/sec Loss 9.9305 LearningRate 0.2045 Epoch: 7 Global Step: 73920 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:45:15,886-Speed 5985.61 samples/sec Loss 9.9289 LearningRate 0.2045 Epoch: 7 Global Step: 73930 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:45:22,776-Speed 5946.23 samples/sec Loss 9.9648 LearningRate 0.2045 Epoch: 7 Global Step: 73940 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:45:29,652-Speed 5957.52 samples/sec Loss 9.9766 LearningRate 0.2044 Epoch: 7 Global Step: 73950 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:45:36,562-Speed 5930.96 samples/sec Loss 9.9949 LearningRate 0.2044 Epoch: 7 Global Step: 73960 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:45:43,440-Speed 5956.18 samples/sec Loss 10.0429 LearningRate 0.2044 Epoch: 7 Global Step: 73970 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:45:50,297-Speed 5975.10 samples/sec Loss 9.9310 LearningRate 0.2043 Epoch: 7 Global Step: 73980 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:45:57,139-Speed 5987.72 samples/sec Loss 9.9497 LearningRate 0.2043 Epoch: 7 Global Step: 73990 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:46:03,983-Speed 5985.51 samples/sec Loss 9.8888 LearningRate 0.2043 Epoch: 7 Global Step: 74000 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:46:10,826-Speed 5986.83 samples/sec Loss 10.0111 LearningRate 0.2042 Epoch: 7 Global Step: 74010 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:46:17,664-Speed 5990.32 samples/sec Loss 9.9507 LearningRate 0.2042 Epoch: 7 Global Step: 74020 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:46:24,570-Speed 5932.58 samples/sec Loss 9.8546 LearningRate 0.2042 Epoch: 7 
Global Step: 74030 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:46:31,441-Speed 5962.35 samples/sec Loss 9.9119 LearningRate 0.2042 Epoch: 7 Global Step: 74040 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:46:38,295-Speed 5977.04 samples/sec Loss 9.9288 LearningRate 0.2041 Epoch: 7 Global Step: 74050 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:46:45,145-Speed 5980.32 samples/sec Loss 9.9127 LearningRate 0.2041 Epoch: 7 Global Step: 74060 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:46:52,040-Speed 5942.73 samples/sec Loss 9.9978 LearningRate 0.2041 Epoch: 7 Global Step: 74070 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:46:58,916-Speed 5957.74 samples/sec Loss 9.9766 LearningRate 0.2040 Epoch: 7 Global Step: 74080 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:47:05,761-Speed 5984.91 samples/sec Loss 9.9266 LearningRate 0.2040 Epoch: 7 Global Step: 74090 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:47:12,617-Speed 5976.27 samples/sec Loss 9.8917 LearningRate 0.2040 Epoch: 7 Global Step: 74100 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:47:19,471-Speed 5978.40 samples/sec Loss 9.9823 LearningRate 0.2039 Epoch: 7 Global Step: 74110 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:47:26,333-Speed 5970.61 samples/sec Loss 10.0026 LearningRate 0.2039 Epoch: 7 Global Step: 74120 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:47:33,193-Speed 5972.00 samples/sec Loss 9.9687 LearningRate 0.2039 Epoch: 7 Global Step: 74130 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:47:40,057-Speed 5968.34 samples/sec Loss 9.9958 LearningRate 0.2038 Epoch: 7 Global Step: 74140 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:47:46,913-Speed 5975.43 samples/sec Loss 9.8965 LearningRate 0.2038 Epoch: 7 Global Step: 74150 Fp16 Grad Scale: 262144 
Required: 26 hours Training: 2022-01-08 10:47:53,762-Speed 5982.35 samples/sec Loss 9.9520 LearningRate 0.2038 Epoch: 7 Global Step: 74160 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:48:00,619-Speed 5973.95 samples/sec Loss 10.0109 LearningRate 0.2038 Epoch: 7 Global Step: 74170 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:48:07,465-Speed 5983.81 samples/sec Loss 9.9610 LearningRate 0.2037 Epoch: 7 Global Step: 74180 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:48:14,323-Speed 5973.54 samples/sec Loss 9.8954 LearningRate 0.2037 Epoch: 7 Global Step: 74190 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:48:21,170-Speed 5983.51 samples/sec Loss 9.9760 LearningRate 0.2037 Epoch: 7 Global Step: 74200 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:48:28,008-Speed 5990.99 samples/sec Loss 9.9241 LearningRate 0.2036 Epoch: 7 Global Step: 74210 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:48:34,863-Speed 5975.94 samples/sec Loss 9.9010 LearningRate 0.2036 Epoch: 7 Global Step: 74220 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:48:41,733-Speed 5963.63 samples/sec Loss 9.9008 LearningRate 0.2036 Epoch: 7 Global Step: 74230 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:48:48,632-Speed 5938.06 samples/sec Loss 9.8977 LearningRate 0.2035 Epoch: 7 Global Step: 74240 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:48:55,490-Speed 5973.68 samples/sec Loss 9.9060 LearningRate 0.2035 Epoch: 7 Global Step: 74250 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:49:02,346-Speed 5975.46 samples/sec Loss 9.9081 LearningRate 0.2035 Epoch: 7 Global Step: 74260 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:49:09,195-Speed 5981.58 samples/sec Loss 9.8941 LearningRate 0.2035 Epoch: 7 Global Step: 74270 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 
10:49:16,040-Speed 5984.55 samples/sec Loss 9.9339 LearningRate 0.2034 Epoch: 7 Global Step: 74280 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:49:22,909-Speed 5965.88 samples/sec Loss 9.9186 LearningRate 0.2034 Epoch: 7 Global Step: 74290 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:49:29,762-Speed 5978.14 samples/sec Loss 9.9879 LearningRate 0.2034 Epoch: 7 Global Step: 74300 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:49:36,616-Speed 5977.46 samples/sec Loss 9.8998 LearningRate 0.2033 Epoch: 7 Global Step: 74310 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:49:43,481-Speed 5970.63 samples/sec Loss 9.8880 LearningRate 0.2033 Epoch: 7 Global Step: 74320 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:49:50,342-Speed 5971.12 samples/sec Loss 9.9495 LearningRate 0.2033 Epoch: 7 Global Step: 74330 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:49:57,203-Speed 5970.90 samples/sec Loss 9.8648 LearningRate 0.2032 Epoch: 7 Global Step: 74340 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:50:04,066-Speed 5969.69 samples/sec Loss 9.9684 LearningRate 0.2032 Epoch: 7 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:50:10,929-Speed 5968.73 samples/sec Loss 9.9689 LearningRate 0.2032 Epoch: 7 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:50:17,786-Speed 5975.06 samples/sec Loss 9.9787 LearningRate 0.2031 Epoch: 7 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:50:24,645-Speed 5972.72 samples/sec Loss 10.0038 LearningRate 0.2031 Epoch: 7 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:50:31,498-Speed 5977.81 samples/sec Loss 9.9070 LearningRate 0.2031 Epoch: 7 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:50:38,360-Speed 5970.55 samples/sec Loss 
9.9079 LearningRate 0.2031 Epoch: 7 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:50:45,207-Speed 5983.15 samples/sec Loss 9.9022 LearningRate 0.2030 Epoch: 7 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:50:52,061-Speed 5977.91 samples/sec Loss 9.8012 LearningRate 0.2030 Epoch: 7 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:50:58,920-Speed 5972.87 samples/sec Loss 9.8468 LearningRate 0.2030 Epoch: 7 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:51:05,765-Speed 5984.97 samples/sec Loss 9.8728 LearningRate 0.2029 Epoch: 7 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:51:12,620-Speed 5976.28 samples/sec Loss 9.8717 LearningRate 0.2029 Epoch: 7 Global Step: 74450 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:51:19,475-Speed 5975.97 samples/sec Loss 9.9796 LearningRate 0.2029 Epoch: 7 Global Step: 74460 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:51:26,317-Speed 5987.51 samples/sec Loss 9.8820 LearningRate 0.2028 Epoch: 7 Global Step: 74470 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:51:33,169-Speed 5979.27 samples/sec Loss 9.9336 LearningRate 0.2028 Epoch: 7 Global Step: 74480 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:51:40,021-Speed 5978.28 samples/sec Loss 9.9019 LearningRate 0.2028 Epoch: 7 Global Step: 74490 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:51:46,883-Speed 5972.58 samples/sec Loss 9.7905 LearningRate 0.2027 Epoch: 7 Global Step: 74500 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:51:53,726-Speed 5986.57 samples/sec Loss 9.9370 LearningRate 0.2027 Epoch: 7 Global Step: 74510 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:52:00,645-Speed 5921.73 samples/sec Loss 9.8846 LearningRate 0.2027 Epoch: 7 Global Step: 
74520 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:52:07,499-Speed 5977.35 samples/sec Loss 9.9270 LearningRate 0.2027 Epoch: 7 Global Step: 74530 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:52:14,358-Speed 5971.71 samples/sec Loss 9.8306 LearningRate 0.2026 Epoch: 7 Global Step: 74540 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:52:21,204-Speed 5984.64 samples/sec Loss 9.9279 LearningRate 0.2026 Epoch: 7 Global Step: 74550 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:52:28,056-Speed 5978.97 samples/sec Loss 9.9196 LearningRate 0.2026 Epoch: 7 Global Step: 74560 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:52:34,902-Speed 5984.19 samples/sec Loss 9.8261 LearningRate 0.2025 Epoch: 7 Global Step: 74570 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:52:41,741-Speed 5990.54 samples/sec Loss 9.9227 LearningRate 0.2025 Epoch: 7 Global Step: 74580 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:52:48,602-Speed 5971.13 samples/sec Loss 9.8857 LearningRate 0.2025 Epoch: 7 Global Step: 74590 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:52:55,464-Speed 5970.08 samples/sec Loss 9.8627 LearningRate 0.2024 Epoch: 7 Global Step: 74600 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:53:02,330-Speed 5966.90 samples/sec Loss 9.8271 LearningRate 0.2024 Epoch: 7 Global Step: 74610 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:53:09,291-Speed 5885.74 samples/sec Loss 9.8914 LearningRate 0.2024 Epoch: 7 Global Step: 74620 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:53:16,206-Speed 5924.41 samples/sec Loss 9.9350 LearningRate 0.2024 Epoch: 7 Global Step: 74630 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:53:23,071-Speed 5967.17 samples/sec Loss 9.8952 LearningRate 0.2023 Epoch: 7 Global Step: 74640 Fp16 Grad Scale: 131072 Required: 26 
hours Training: 2022-01-08 10:53:29,939-Speed 5965.44 samples/sec Loss 9.8461 LearningRate 0.2023 Epoch: 7 Global Step: 74650 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:53:36,799-Speed 5972.53 samples/sec Loss 9.9091 LearningRate 0.2023 Epoch: 7 Global Step: 74660 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:53:43,657-Speed 5973.31 samples/sec Loss 10.0026 LearningRate 0.2022 Epoch: 7 Global Step: 74670 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:53:50,511-Speed 5977.56 samples/sec Loss 9.8999 LearningRate 0.2022 Epoch: 7 Global Step: 74680 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:53:57,391-Speed 5956.80 samples/sec Loss 9.8206 LearningRate 0.2022 Epoch: 7 Global Step: 74690 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:54:04,249-Speed 5975.95 samples/sec Loss 9.8787 LearningRate 0.2021 Epoch: 7 Global Step: 74700 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:54:11,102-Speed 5981.10 samples/sec Loss 9.9097 LearningRate 0.2021 Epoch: 7 Global Step: 74710 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:54:17,973-Speed 5963.31 samples/sec Loss 9.9274 LearningRate 0.2021 Epoch: 7 Global Step: 74720 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:54:24,831-Speed 5973.99 samples/sec Loss 9.8769 LearningRate 0.2020 Epoch: 7 Global Step: 74730 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:54:31,687-Speed 5975.79 samples/sec Loss 9.9221 LearningRate 0.2020 Epoch: 7 Global Step: 74740 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:54:38,576-Speed 5946.59 samples/sec Loss 9.8491 LearningRate 0.2020 Epoch: 7 Global Step: 74750 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:54:45,465-Speed 5947.28 samples/sec Loss 9.8465 LearningRate 0.2020 Epoch: 7 Global Step: 74760 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 
10:54:52,319-Speed 5978.12 samples/sec Loss 9.9558 LearningRate 0.2019 Epoch: 7 Global Step: 74770 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:54:59,169-Speed 5980.60 samples/sec Loss 9.9455 LearningRate 0.2019 Epoch: 7 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:55:06,042-Speed 5961.34 samples/sec Loss 9.8984 LearningRate 0.2019 Epoch: 7 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:55:12,882-Speed 5988.96 samples/sec Loss 9.8920 LearningRate 0.2018 Epoch: 7 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:55:19,726-Speed 5987.72 samples/sec Loss 9.8992 LearningRate 0.2018 Epoch: 7 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:55:26,588-Speed 5972.84 samples/sec Loss 9.7959 LearningRate 0.2018 Epoch: 7 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:55:33,451-Speed 5970.47 samples/sec Loss 9.8369 LearningRate 0.2017 Epoch: 7 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:55:40,302-Speed 5979.09 samples/sec Loss 9.7988 LearningRate 0.2017 Epoch: 7 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:55:47,143-Speed 5988.76 samples/sec Loss 9.9008 LearningRate 0.2017 Epoch: 7 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:55:53,995-Speed 5978.42 samples/sec Loss 9.9114 LearningRate 0.2017 Epoch: 7 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:56:00,857-Speed 5970.12 samples/sec Loss 9.8766 LearningRate 0.2016 Epoch: 7 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:56:07,722-Speed 5967.93 samples/sec Loss 9.8505 LearningRate 0.2016 Epoch: 7 Global Step: 74880 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:56:14,574-Speed 5981.37 samples/sec Loss 9.8064 
LearningRate 0.2016 Epoch: 7 Global Step: 74890 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:56:21,425-Speed 5980.04 samples/sec Loss 9.8308 LearningRate 0.2015 Epoch: 7 Global Step: 74900 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:56:28,280-Speed 5976.46 samples/sec Loss 9.8633 LearningRate 0.2015 Epoch: 7 Global Step: 74910 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:56:35,133-Speed 5977.95 samples/sec Loss 9.8963 LearningRate 0.2015 Epoch: 7 Global Step: 74920 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:56:42,012-Speed 5955.92 samples/sec Loss 9.9357 LearningRate 0.2014 Epoch: 7 Global Step: 74930 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:56:48,887-Speed 5959.31 samples/sec Loss 9.8607 LearningRate 0.2014 Epoch: 7 Global Step: 74940 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:56:55,739-Speed 5980.42 samples/sec Loss 9.7999 LearningRate 0.2014 Epoch: 7 Global Step: 74950 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:57:02,583-Speed 5985.59 samples/sec Loss 9.9857 LearningRate 0.2013 Epoch: 7 Global Step: 74960 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:57:09,428-Speed 5984.95 samples/sec Loss 9.8447 LearningRate 0.2013 Epoch: 7 Global Step: 74970 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:57:16,369-Speed 5902.99 samples/sec Loss 9.8906 LearningRate 0.2013 Epoch: 7 Global Step: 74980 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:57:23,225-Speed 5975.36 samples/sec Loss 9.8835 LearningRate 0.2013 Epoch: 7 Global Step: 74990 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 10:57:30,089-Speed 5970.39 samples/sec Loss 9.9370 LearningRate 0.2012 Epoch: 7 Global Step: 75000 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:57:56,807-[lfw][75000]XNorm: 23.391407 Training: 2022-01-08 
10:57:56,808-[lfw][75000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-01-08 10:57:56,808-[lfw][75000]Accuracy-Highest: 0.99750 Training: 2022-01-08 10:58:27,710-[cfp_fp][75000]XNorm: 20.369652 Training: 2022-01-08 10:58:27,711-[cfp_fp][75000]Accuracy-Flip: 0.97314+-0.00820 Training: 2022-01-08 10:58:27,713-[cfp_fp][75000]Accuracy-Highest: 0.97686 Training: 2022-01-08 10:58:54,492-[agedb_30][75000]XNorm: 22.560914 Training: 2022-01-08 10:58:54,493-[agedb_30][75000]Accuracy-Flip: 0.96800+-0.00792 Training: 2022-01-08 10:58:54,493-[agedb_30][75000]Accuracy-Highest: 0.96800 Training: 2022-01-08 10:59:01,345-Speed 448.86 samples/sec Loss 9.8661 LearningRate 0.2012 Epoch: 7 Global Step: 75010 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:59:08,186-Speed 5988.74 samples/sec Loss 9.8623 LearningRate 0.2012 Epoch: 7 Global Step: 75020 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:59:15,071-Speed 5950.71 samples/sec Loss 9.9138 LearningRate 0.2011 Epoch: 7 Global Step: 75030 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:59:21,940-Speed 5964.30 samples/sec Loss 9.8377 LearningRate 0.2011 Epoch: 7 Global Step: 75040 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:59:28,856-Speed 5923.66 samples/sec Loss 9.8907 LearningRate 0.2011 Epoch: 7 Global Step: 75050 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:59:35,724-Speed 5964.59 samples/sec Loss 9.9059 LearningRate 0.2010 Epoch: 7 Global Step: 75060 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:59:42,618-Speed 5943.30 samples/sec Loss 9.8782 LearningRate 0.2010 Epoch: 7 Global Step: 75070 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 10:59:49,507-Speed 5946.58 samples/sec Loss 9.9137 LearningRate 0.2010 Epoch: 7 Global Step: 75080 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 10:59:56,400-Speed 5943.99 samples/sec Loss 9.8888 LearningRate 0.2010 Epoch: 7 Global Step: 
75090 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:03,267-Speed 5965.43 samples/sec Loss 9.9988 LearningRate 0.2009 Epoch: 7 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:10,174-Speed 5931.20 samples/sec Loss 9.8769 LearningRate 0.2009 Epoch: 7 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:17,054-Speed 5954.88 samples/sec Loss 9.8333 LearningRate 0.2009 Epoch: 7 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:23,963-Speed 5929.97 samples/sec Loss 9.8276 LearningRate 0.2008 Epoch: 7 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:30,826-Speed 5969.75 samples/sec Loss 9.9001 LearningRate 0.2008 Epoch: 7 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:37,683-Speed 5975.06 samples/sec Loss 9.8614 LearningRate 0.2008 Epoch: 7 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:44,532-Speed 5981.70 samples/sec Loss 9.8142 LearningRate 0.2007 Epoch: 7 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:51,392-Speed 5971.56 samples/sec Loss 9.9512 LearningRate 0.2007 Epoch: 7 Global Step: 75170 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:00:58,285-Speed 5943.78 samples/sec Loss 9.8145 LearningRate 0.2007 Epoch: 7 Global Step: 75180 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:01:05,151-Speed 5967.10 samples/sec Loss 9.8769 LearningRate 0.2006 Epoch: 7 Global Step: 75190 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:01:12,011-Speed 5972.01 samples/sec Loss 9.8497 LearningRate 0.2006 Epoch: 7 Global Step: 75200 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:01:18,874-Speed 5969.33 samples/sec Loss 9.9488 LearningRate 0.2006 Epoch: 7 Global Step: 75210 Fp16 Grad Scale: 131072 Required: 26 hours 
Training: 2022-01-08 11:01:25,722-Speed 5982.68 samples/sec Loss 9.8119 LearningRate 0.2006 Epoch: 7 Global Step: 75220 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:01:32,621-Speed 5937.76 samples/sec Loss 9.8429 LearningRate 0.2005 Epoch: 7 Global Step: 75230 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:01:39,477-Speed 5975.91 samples/sec Loss 9.8838 LearningRate 0.2005 Epoch: 7 Global Step: 75240 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:01:46,345-Speed 5965.14 samples/sec Loss 9.9020 LearningRate 0.2005 Epoch: 7 Global Step: 75250 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:01:53,238-Speed 5943.25 samples/sec Loss 9.8185 LearningRate 0.2004 Epoch: 7 Global Step: 75260 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:02:00,106-Speed 5966.60 samples/sec Loss 9.8728 LearningRate 0.2004 Epoch: 7 Global Step: 75270 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:02:06,976-Speed 5966.25 samples/sec Loss 9.8593 LearningRate 0.2004 Epoch: 7 Global Step: 75280 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:02:13,844-Speed 5964.77 samples/sec Loss 9.8436 LearningRate 0.2003 Epoch: 7 Global Step: 75290 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:02:20,699-Speed 5976.44 samples/sec Loss 9.8876 LearningRate 0.2003 Epoch: 7 Global Step: 75300 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:02:27,567-Speed 5964.74 samples/sec Loss 9.8409 LearningRate 0.2003 Epoch: 7 Global Step: 75310 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:02:34,420-Speed 5978.20 samples/sec Loss 9.7889 LearningRate 0.2003 Epoch: 7 Global Step: 75320 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:02:41,304-Speed 5952.26 samples/sec Loss 9.8511 LearningRate 0.2002 Epoch: 7 Global Step: 75330 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:02:48,164-Speed 
5973.91 samples/sec Loss 9.8549 LearningRate 0.2002 Epoch: 7 Global Step: 75340 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:02:55,037-Speed 5960.92 samples/sec Loss 9.8054 LearningRate 0.2002 Epoch: 7 Global Step: 75350 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:03:01,903-Speed 5966.79 samples/sec Loss 9.8712 LearningRate 0.2001 Epoch: 7 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:03:08,769-Speed 5966.49 samples/sec Loss 9.8696 LearningRate 0.2001 Epoch: 7 Global Step: 75370 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:03:15,633-Speed 5968.47 samples/sec Loss 9.8934 LearningRate 0.2001 Epoch: 7 Global Step: 75380 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:03:22,487-Speed 5977.26 samples/sec Loss 9.8327 LearningRate 0.2000 Epoch: 7 Global Step: 75390 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:03:29,403-Speed 5923.52 samples/sec Loss 9.9169 LearningRate 0.2000 Epoch: 7 Global Step: 75400 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:03:36,268-Speed 5967.24 samples/sec Loss 9.9138 LearningRate 0.2000 Epoch: 7 Global Step: 75410 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:03:43,165-Speed 5943.63 samples/sec Loss 9.7794 LearningRate 0.2000 Epoch: 7 Global Step: 75420 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:03:50,085-Speed 5920.19 samples/sec Loss 9.8948 LearningRate 0.1999 Epoch: 7 Global Step: 75430 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:03:57,019-Speed 5908.91 samples/sec Loss 9.7909 LearningRate 0.1999 Epoch: 7 Global Step: 75440 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:04:03,877-Speed 5973.24 samples/sec Loss 9.8285 LearningRate 0.1999 Epoch: 7 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:04:10,739-Speed 5970.24 samples/sec Loss 9.8102 LearningRate 
0.1998 Epoch: 7 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:04:17,620-Speed 5954.02 samples/sec Loss 9.8087 LearningRate 0.1998 Epoch: 7 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:04:24,484-Speed 5968.69 samples/sec Loss 9.8708 LearningRate 0.1998 Epoch: 7 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:04:31,340-Speed 5975.61 samples/sec Loss 9.7870 LearningRate 0.1997 Epoch: 7 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:04:38,198-Speed 5973.45 samples/sec Loss 9.9104 LearningRate 0.1997 Epoch: 7 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:04:45,075-Speed 5957.27 samples/sec Loss 9.8948 LearningRate 0.1997 Epoch: 7 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:04:51,934-Speed 5977.21 samples/sec Loss 9.8742 LearningRate 0.1996 Epoch: 7 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:04:58,791-Speed 5974.63 samples/sec Loss 9.7520 LearningRate 0.1996 Epoch: 7 Global Step: 75530 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:05:05,672-Speed 5954.05 samples/sec Loss 9.8320 LearningRate 0.1996 Epoch: 7 Global Step: 75540 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:05:12,528-Speed 5975.45 samples/sec Loss 9.8432 LearningRate 0.1996 Epoch: 7 Global Step: 75550 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:05:19,436-Speed 5930.55 samples/sec Loss 9.8553 LearningRate 0.1995 Epoch: 7 Global Step: 75560 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:05:26,313-Speed 5957.45 samples/sec Loss 9.8436 LearningRate 0.1995 Epoch: 7 Global Step: 75570 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:05:33,194-Speed 5953.63 samples/sec Loss 9.7698 LearningRate 0.1995 Epoch: 7 Global Step: 75580 Fp16 Grad Scale: 
131072 Required: 26 hours Training: 2022-01-08 11:05:40,053-Speed 5972.93 samples/sec Loss 9.9043 LearningRate 0.1994 Epoch: 7 Global Step: 75590 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:05:46,955-Speed 5935.25 samples/sec Loss 9.8689 LearningRate 0.1994 Epoch: 7 Global Step: 75600 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:05:53,822-Speed 5966.60 samples/sec Loss 9.7917 LearningRate 0.1994 Epoch: 7 Global Step: 75610 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:00,674-Speed 5978.52 samples/sec Loss 9.8244 LearningRate 0.1993 Epoch: 7 Global Step: 75620 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:07,549-Speed 5959.29 samples/sec Loss 9.7667 LearningRate 0.1993 Epoch: 7 Global Step: 75630 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:14,401-Speed 5978.72 samples/sec Loss 9.9326 LearningRate 0.1993 Epoch: 7 Global Step: 75640 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:21,238-Speed 5991.47 samples/sec Loss 9.8202 LearningRate 0.1993 Epoch: 7 Global Step: 75650 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:28,116-Speed 5957.02 samples/sec Loss 9.8206 LearningRate 0.1992 Epoch: 7 Global Step: 75660 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:34,971-Speed 5975.78 samples/sec Loss 9.8561 LearningRate 0.1992 Epoch: 7 Global Step: 75670 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:41,830-Speed 5972.62 samples/sec Loss 9.8051 LearningRate 0.1992 Epoch: 7 Global Step: 75680 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:48,684-Speed 5977.25 samples/sec Loss 9.8713 LearningRate 0.1991 Epoch: 7 Global Step: 75690 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:06:55,531-Speed 5983.63 samples/sec Loss 9.8888 LearningRate 0.1991 Epoch: 7 Global Step: 75700 Fp16 Grad Scale: 131072 Required: 26 hours Training: 
2022-01-08 11:07:02,387-Speed 5976.91 samples/sec Loss 9.8179 LearningRate 0.1991 Epoch: 7 Global Step: 75710 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:07:09,269-Speed 5953.19 samples/sec Loss 9.8336 LearningRate 0.1990 Epoch: 7 Global Step: 75720 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:07:16,112-Speed 5986.65 samples/sec Loss 9.8023 LearningRate 0.1990 Epoch: 7 Global Step: 75730 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:07:22,970-Speed 5974.19 samples/sec Loss 9.8615 LearningRate 0.1990 Epoch: 7 Global Step: 75740 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:07:29,831-Speed 5970.79 samples/sec Loss 9.8153 LearningRate 0.1990 Epoch: 7 Global Step: 75750 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:07:36,684-Speed 5978.17 samples/sec Loss 9.7460 LearningRate 0.1989 Epoch: 7 Global Step: 75760 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:07:43,603-Speed 5920.73 samples/sec Loss 9.7718 LearningRate 0.1989 Epoch: 7 Global Step: 75770 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:07:50,463-Speed 5972.67 samples/sec Loss 9.7968 LearningRate 0.1989 Epoch: 7 Global Step: 75780 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:07:57,322-Speed 5974.33 samples/sec Loss 9.8150 LearningRate 0.1988 Epoch: 7 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:08:04,180-Speed 5972.58 samples/sec Loss 9.7667 LearningRate 0.1988 Epoch: 7 Global Step: 75800 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:08:11,058-Speed 5957.60 samples/sec Loss 9.8012 LearningRate 0.1988 Epoch: 7 Global Step: 75810 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:08:17,923-Speed 5967.81 samples/sec Loss 9.8082 LearningRate 0.1987 Epoch: 7 Global Step: 75820 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:08:24,769-Speed 5984.05 
samples/sec Loss 9.8084 LearningRate 0.1987 Epoch: 7 Global Step: 75830 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:08:31,636-Speed 5966.00 samples/sec Loss 9.8115 LearningRate 0.1987 Epoch: 7 Global Step: 75840 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:08:38,498-Speed 5970.52 samples/sec Loss 9.8623 LearningRate 0.1987 Epoch: 7 Global Step: 75850 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:08:45,367-Speed 5963.91 samples/sec Loss 9.7898 LearningRate 0.1986 Epoch: 7 Global Step: 75860 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:08:52,254-Speed 5948.82 samples/sec Loss 9.7237 LearningRate 0.1986 Epoch: 7 Global Step: 75870 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:08:59,128-Speed 5962.70 samples/sec Loss 9.8045 LearningRate 0.1986 Epoch: 7 Global Step: 75880 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:09:05,980-Speed 5979.28 samples/sec Loss 9.8494 LearningRate 0.1985 Epoch: 7 Global Step: 75890 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:09:12,850-Speed 5963.28 samples/sec Loss 9.7710 LearningRate 0.1985 Epoch: 7 Global Step: 75900 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:09:19,731-Speed 5954.48 samples/sec Loss 9.8214 LearningRate 0.1985 Epoch: 7 Global Step: 75910 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:09:26,586-Speed 5976.01 samples/sec Loss 9.7732 LearningRate 0.1984 Epoch: 7 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:09:33,440-Speed 5976.99 samples/sec Loss 9.7940 LearningRate 0.1984 Epoch: 7 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:09:40,305-Speed 5968.75 samples/sec Loss 9.8811 LearningRate 0.1984 Epoch: 7 Global Step: 75940 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:09:47,255-Speed 5895.05 samples/sec Loss 9.8224 LearningRate 0.1983 
Epoch: 7 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:09:54,216-Speed 5885.64 samples/sec Loss 9.7928 LearningRate 0.1983 Epoch: 7 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:10:01,100-Speed 5951.22 samples/sec Loss 9.7886 LearningRate 0.1983 Epoch: 7 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:10:07,960-Speed 5972.36 samples/sec Loss 9.7771 LearningRate 0.1983 Epoch: 7 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:10:15,551-Speed 5975.68 samples/sec Loss 9.7438 LearningRate 0.1982 Epoch: 7 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:10:22,405-Speed 5977.62 samples/sec Loss 9.7987 LearningRate 0.1982 Epoch: 7 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:10:29,260-Speed 5976.54 samples/sec Loss 9.8743 LearningRate 0.1982 Epoch: 7 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:10:36,119-Speed 5972.71 samples/sec Loss 9.8284 LearningRate 0.1981 Epoch: 7 Global Step: 76020 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:10:42,975-Speed 5975.48 samples/sec Loss 9.7465 LearningRate 0.1981 Epoch: 7 Global Step: 76030 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:10:49,857-Speed 5952.51 samples/sec Loss 9.6812 LearningRate 0.1981 Epoch: 7 Global Step: 76040 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:10:56,731-Speed 5959.31 samples/sec Loss 9.8149 LearningRate 0.1980 Epoch: 7 Global Step: 76050 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:11:03,622-Speed 5945.52 samples/sec Loss 9.8545 LearningRate 0.1980 Epoch: 7 Global Step: 76060 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:11:10,506-Speed 5951.07 samples/sec Loss 9.7984 LearningRate 0.1980 Epoch: 7 Global Step: 76070 Fp16 Grad Scale: 
131072 Required: 26 hours Training: 2022-01-08 11:11:17,377-Speed 5962.64 samples/sec Loss 9.7632 LearningRate 0.1980 Epoch: 7 Global Step: 76080 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:11:24,233-Speed 5976.22 samples/sec Loss 9.8171 LearningRate 0.1979 Epoch: 7 Global Step: 76090 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:11:31,089-Speed 5975.34 samples/sec Loss 9.7801 LearningRate 0.1979 Epoch: 7 Global Step: 76100 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:11:37,942-Speed 5977.71 samples/sec Loss 9.8155 LearningRate 0.1979 Epoch: 7 Global Step: 76110 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:11:44,806-Speed 5968.33 samples/sec Loss 9.7797 LearningRate 0.1978 Epoch: 7 Global Step: 76120 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:11:51,668-Speed 5969.80 samples/sec Loss 9.9077 LearningRate 0.1978 Epoch: 7 Global Step: 76130 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:11:58,527-Speed 5972.77 samples/sec Loss 9.7723 LearningRate 0.1978 Epoch: 7 Global Step: 76140 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:12:05,393-Speed 5966.72 samples/sec Loss 9.7709 LearningRate 0.1977 Epoch: 7 Global Step: 76150 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:12:12,272-Speed 5955.21 samples/sec Loss 9.8159 LearningRate 0.1977 Epoch: 7 Global Step: 76160 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:12:19,137-Speed 5967.43 samples/sec Loss 9.7984 LearningRate 0.1977 Epoch: 7 Global Step: 76170 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:12:25,990-Speed 5978.15 samples/sec Loss 9.7824 LearningRate 0.1977 Epoch: 7 Global Step: 76180 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:12:32,843-Speed 5978.24 samples/sec Loss 9.8508 LearningRate 0.1976 Epoch: 7 Global Step: 76190 Fp16 Grad Scale: 262144 Required: 26 hours Training: 
2022-01-08 11:12:39,693-Speed 5980.22 samples/sec Loss 9.7996 LearningRate 0.1976 Epoch: 7 Global Step: 76200 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:12:46,556-Speed 5969.28 samples/sec Loss 9.7699 LearningRate 0.1976 Epoch: 7 Global Step: 76210 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:12:53,392-Speed 5992.77 samples/sec Loss 9.8243 LearningRate 0.1975 Epoch: 7 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:00,259-Speed 5966.38 samples/sec Loss 9.8308 LearningRate 0.1975 Epoch: 7 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:07,128-Speed 5964.72 samples/sec Loss 9.8833 LearningRate 0.1975 Epoch: 7 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:14,015-Speed 5948.95 samples/sec Loss 9.7974 LearningRate 0.1974 Epoch: 7 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:21,946-Speed 5165.32 samples/sec Loss 9.7456 LearningRate 0.1974 Epoch: 7 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:28,813-Speed 5966.01 samples/sec Loss 9.8474 LearningRate 0.1974 Epoch: 7 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:35,679-Speed 5966.97 samples/sec Loss 9.7649 LearningRate 0.1974 Epoch: 7 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:42,530-Speed 5979.72 samples/sec Loss 9.8252 LearningRate 0.1973 Epoch: 7 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:49,392-Speed 5969.84 samples/sec Loss 9.7767 LearningRate 0.1973 Epoch: 7 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:13:56,237-Speed 5985.01 samples/sec Loss 9.7582 LearningRate 0.1973 Epoch: 7 Global Step: 76310 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:03,116-Speed 5955.58 samples/sec 
Loss 9.8149 LearningRate 0.1972 Epoch: 7 Global Step: 76320 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:09,981-Speed 5967.14 samples/sec Loss 9.7148 LearningRate 0.1972 Epoch: 7 Global Step: 76330 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:16,840-Speed 5973.58 samples/sec Loss 9.7043 LearningRate 0.1972 Epoch: 7 Global Step: 76340 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:23,696-Speed 5974.47 samples/sec Loss 9.8140 LearningRate 0.1971 Epoch: 7 Global Step: 76350 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:30,687-Speed 5860.36 samples/sec Loss 9.8135 LearningRate 0.1971 Epoch: 7 Global Step: 76360 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:37,557-Speed 5963.59 samples/sec Loss 9.7494 LearningRate 0.1971 Epoch: 7 Global Step: 76370 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:44,414-Speed 5974.01 samples/sec Loss 9.8532 LearningRate 0.1971 Epoch: 7 Global Step: 76380 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:51,270-Speed 5975.77 samples/sec Loss 9.7503 LearningRate 0.1970 Epoch: 7 Global Step: 76390 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:14:58,135-Speed 5969.75 samples/sec Loss 9.7703 LearningRate 0.1970 Epoch: 7 Global Step: 76400 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:15:04,992-Speed 5975.03 samples/sec Loss 9.7466 LearningRate 0.1970 Epoch: 7 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:15:11,843-Speed 5980.31 samples/sec Loss 9.7500 LearningRate 0.1969 Epoch: 7 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:15:21,207-Speed 4374.52 samples/sec Loss 9.7238 LearningRate 0.1969 Epoch: 7 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:15:28,065-Speed 5974.65 samples/sec Loss 9.7144 LearningRate 0.1969 Epoch: 7 Global Step: 
76440 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:15:34,925-Speed 5972.35 samples/sec Loss 9.8415 LearningRate 0.1968 Epoch: 7 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:15:41,781-Speed 5976.64 samples/sec Loss 9.7391 LearningRate 0.1968 Epoch: 7 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:15:48,634-Speed 5977.77 samples/sec Loss 9.8163 LearningRate 0.1968 Epoch: 7 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:15:55,512-Speed 5956.17 samples/sec Loss 9.7419 LearningRate 0.1968 Epoch: 7 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:16:02,364-Speed 5981.46 samples/sec Loss 9.8793 LearningRate 0.1967 Epoch: 7 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:16:09,238-Speed 5959.61 samples/sec Loss 9.8032 LearningRate 0.1967 Epoch: 7 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:16:16,123-Speed 5951.15 samples/sec Loss 9.7177 LearningRate 0.1967 Epoch: 7 Global Step: 76510 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:16:22,984-Speed 5970.60 samples/sec Loss 9.8230 LearningRate 0.1966 Epoch: 7 Global Step: 76520 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:16:29,886-Speed 5935.80 samples/sec Loss 9.7539 LearningRate 0.1966 Epoch: 7 Global Step: 76530 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:16:36,798-Speed 5927.05 samples/sec Loss 9.7879 LearningRate 0.1966 Epoch: 7 Global Step: 76540 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:16:43,650-Speed 5978.60 samples/sec Loss 9.7966 LearningRate 0.1965 Epoch: 7 Global Step: 76550 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:16:50,519-Speed 5964.01 samples/sec Loss 9.6802 LearningRate 0.1965 Epoch: 7 Global Step: 76560 Fp16 Grad Scale: 131072 Required: 26 hours 
Training: 2022-01-08 11:16:57,386-Speed 5965.77 samples/sec Loss 9.6578 LearningRate 0.1965 Epoch: 7 Global Step: 76570 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:17:04,253-Speed 5966.35 samples/sec Loss 9.7603 LearningRate 0.1965 Epoch: 7 Global Step: 76580 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:17:11,118-Speed 5967.10 samples/sec Loss 9.7856 LearningRate 0.1964 Epoch: 7 Global Step: 76590 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:17:18,001-Speed 5952.54 samples/sec Loss 9.7843 LearningRate 0.1964 Epoch: 7 Global Step: 76600 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:17:24,867-Speed 5967.15 samples/sec Loss 9.8387 LearningRate 0.1964 Epoch: 7 Global Step: 76610 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:17:31,724-Speed 5974.15 samples/sec Loss 9.7311 LearningRate 0.1963 Epoch: 7 Global Step: 76620 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:17:38,577-Speed 5978.43 samples/sec Loss 9.7992 LearningRate 0.1963 Epoch: 7 Global Step: 76630 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:17:45,425-Speed 5981.77 samples/sec Loss 9.7368 LearningRate 0.1963 Epoch: 7 Global Step: 76640 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:17:52,308-Speed 5951.90 samples/sec Loss 9.8049 LearningRate 0.1962 Epoch: 7 Global Step: 76650 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:17:59,179-Speed 5962.60 samples/sec Loss 9.7424 LearningRate 0.1962 Epoch: 7 Global Step: 76660 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:18:06,059-Speed 5956.92 samples/sec Loss 9.7322 LearningRate 0.1962 Epoch: 7 Global Step: 76670 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:18:12,913-Speed 5976.57 samples/sec Loss 9.7962 LearningRate 0.1962 Epoch: 7 Global Step: 76680 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:18:19,774-Speed 
5970.89 samples/sec Loss 9.7111 LearningRate 0.1961 Epoch: 7 Global Step: 76690 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:18:26,632-Speed 5973.76 samples/sec Loss 9.7731 LearningRate 0.1961 Epoch: 7 Global Step: 76700 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:18:33,488-Speed 5975.21 samples/sec Loss 9.7503 LearningRate 0.1961 Epoch: 7 Global Step: 76710 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:18:40,346-Speed 5973.45 samples/sec Loss 9.7937 LearningRate 0.1960 Epoch: 7 Global Step: 76720 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:18:47,190-Speed 5986.07 samples/sec Loss 9.7378 LearningRate 0.1960 Epoch: 7 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:18:54,043-Speed 5978.70 samples/sec Loss 9.7873 LearningRate 0.1960 Epoch: 7 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:19:00,883-Speed 5989.04 samples/sec Loss 9.7104 LearningRate 0.1959 Epoch: 7 Global Step: 76750 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:19:07,753-Speed 5962.79 samples/sec Loss 9.6779 LearningRate 0.1959 Epoch: 7 Global Step: 76760 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:19:14,602-Speed 5981.97 samples/sec Loss 9.7230 LearningRate 0.1959 Epoch: 7 Global Step: 76770 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:19:21,467-Speed 5967.33 samples/sec Loss 9.7230 LearningRate 0.1959 Epoch: 7 Global Step: 76780 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:19:28,320-Speed 5978.00 samples/sec Loss 9.7245 LearningRate 0.1958 Epoch: 7 Global Step: 76790 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:19:35,183-Speed 5969.54 samples/sec Loss 9.7638 LearningRate 0.1958 Epoch: 7 Global Step: 76800 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:19:42,033-Speed 5980.82 samples/sec Loss 9.6759 LearningRate 0.1958 
Epoch: 7 Global Step: 76810 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:19:48,892-Speed 5972.49 samples/sec Loss 9.7603 LearningRate 0.1957 Epoch: 7 Global Step: 76820 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:19:55,748-Speed 5976.18 samples/sec Loss 9.7838 LearningRate 0.1957 Epoch: 7 Global Step: 76830 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:20:02,609-Speed 5970.86 samples/sec Loss 9.7337 LearningRate 0.1957 Epoch: 7 Global Step: 76840 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:20:09,462-Speed 5978.62 samples/sec Loss 9.7440 LearningRate 0.1956 Epoch: 7 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:20:16,331-Speed 5964.02 samples/sec Loss 9.7147 LearningRate 0.1956 Epoch: 7 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:20:23,182-Speed 5980.19 samples/sec Loss 9.7735 LearningRate 0.1956 Epoch: 7 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:20:30,109-Speed 5914.01 samples/sec Loss 9.7570 LearningRate 0.1956 Epoch: 7 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:20:36,965-Speed 5975.00 samples/sec Loss 9.7494 LearningRate 0.1955 Epoch: 7 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:20:43,797-Speed 5996.49 samples/sec Loss 9.6655 LearningRate 0.1955 Epoch: 7 Global Step: 76900 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:20:50,640-Speed 5986.55 samples/sec Loss 9.6667 LearningRate 0.1955 Epoch: 7 Global Step: 76910 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:20:57,486-Speed 5983.53 samples/sec Loss 9.7537 LearningRate 0.1954 Epoch: 7 Global Step: 76920 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:21:04,341-Speed 5978.22 samples/sec Loss 9.7719 LearningRate 0.1954 Epoch: 7 Global Step: 76930 Fp16 Grad Scale: 32768 
Required: 26 hours Training: 2022-01-08 11:21:11,194-Speed 5978.23 samples/sec Loss 9.6635 LearningRate 0.1954 Epoch: 7 Global Step: 76940 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:21:18,076-Speed 5952.92 samples/sec Loss 9.7035 LearningRate 0.1953 Epoch: 7 Global Step: 76950 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:21:24,939-Speed 5968.81 samples/sec Loss 9.6889 LearningRate 0.1953 Epoch: 7 Global Step: 76960 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:21:31,805-Speed 5967.08 samples/sec Loss 9.6437 LearningRate 0.1953 Epoch: 7 Global Step: 76970 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:21:38,672-Speed 5966.11 samples/sec Loss 9.8016 LearningRate 0.1953 Epoch: 7 Global Step: 76980 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:21:45,536-Speed 5968.48 samples/sec Loss 9.7654 LearningRate 0.1952 Epoch: 7 Global Step: 76990 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 11:21:52,396-Speed 5972.58 samples/sec Loss 9.8288 LearningRate 0.1952 Epoch: 7 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:21:59,250-Speed 5976.50 samples/sec Loss 9.7937 LearningRate 0.1952 Epoch: 7 Global Step: 77010 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:22:06,111-Speed 5973.96 samples/sec Loss 9.6763 LearningRate 0.1951 Epoch: 7 Global Step: 77020 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:22:12,967-Speed 5975.78 samples/sec Loss 9.6943 LearningRate 0.1951 Epoch: 7 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:22:19,822-Speed 5975.78 samples/sec Loss 9.8525 LearningRate 0.1951 Epoch: 7 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:22:26,726-Speed 5934.01 samples/sec Loss 9.7292 LearningRate 0.1950 Epoch: 7 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 
11:22:33,570-Speed 5985.09 samples/sec Loss 9.7892 LearningRate 0.1950 Epoch: 7 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:22:40,418-Speed 5984.46 samples/sec Loss 9.7376 LearningRate 0.1950 Epoch: 7 Global Step: 77070 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:22:47,275-Speed 5973.76 samples/sec Loss 9.7498 LearningRate 0.1950 Epoch: 7 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:22:54,135-Speed 5972.10 samples/sec Loss 9.6974 LearningRate 0.1949 Epoch: 7 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 11:23:00,992-Speed 5974.50 samples/sec Loss 9.7521 LearningRate 0.1949 Epoch: 7 Global Step: 77100 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:23:12,584-Speed 3533.74 samples/sec Loss 9.6776 LearningRate 0.1949 Epoch: 7 Global Step: 77110 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:23:19,450-Speed 5967.31 samples/sec Loss 9.7044 LearningRate 0.1948 Epoch: 7 Global Step: 77120 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:23:26,342-Speed 5945.00 samples/sec Loss 9.7334 LearningRate 0.1948 Epoch: 7 Global Step: 77130 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:23:33,192-Speed 5980.45 samples/sec Loss 9.7285 LearningRate 0.1948 Epoch: 7 Global Step: 77140 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:23:40,050-Speed 5973.83 samples/sec Loss 9.7064 LearningRate 0.1947 Epoch: 7 Global Step: 77150 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:23:46,965-Speed 5926.37 samples/sec Loss 9.6901 LearningRate 0.1947 Epoch: 7 Global Step: 77160 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:23:53,840-Speed 5958.47 samples/sec Loss 9.7015 LearningRate 0.1947 Epoch: 7 Global Step: 77170 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:24:00,700-Speed 5972.29 samples/sec Loss 
9.6768 LearningRate 0.1947 Epoch: 7 Global Step: 77180 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:24:07,552-Speed 5978.92 samples/sec Loss 9.7202 LearningRate 0.1946 Epoch: 7 Global Step: 77190 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:24:14,422-Speed 5965.69 samples/sec Loss 9.7414 LearningRate 0.1946 Epoch: 7 Global Step: 77200 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:24:21,306-Speed 5951.44 samples/sec Loss 9.7794 LearningRate 0.1946 Epoch: 7 Global Step: 77210 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:24:28,174-Speed 5963.98 samples/sec Loss 9.6775 LearningRate 0.1945 Epoch: 7 Global Step: 77220 Fp16 Grad Scale: 262144 Required: 26 hours Training: 2022-01-08 11:24:35,054-Speed 5955.02 samples/sec Loss 9.6526 LearningRate 0.1945 Epoch: 7 Global Step: 77230 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:24:41,931-Speed 5957.50 samples/sec Loss 9.7504 LearningRate 0.1945 Epoch: 7 Global Step: 77240 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:24:48,790-Speed 5973.04 samples/sec Loss 9.6483 LearningRate 0.1944 Epoch: 7 Global Step: 77250 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:24:55,643-Speed 5978.19 samples/sec Loss 9.7103 LearningRate 0.1944 Epoch: 7 Global Step: 77260 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:25:02,575-Speed 5910.17 samples/sec Loss 9.6605 LearningRate 0.1944 Epoch: 7 Global Step: 77270 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:25:09,423-Speed 5982.34 samples/sec Loss 9.7954 LearningRate 0.1944 Epoch: 7 Global Step: 77280 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:25:16,292-Speed 5964.96 samples/sec Loss 9.7158 LearningRate 0.1943 Epoch: 7 Global Step: 77290 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:25:23,143-Speed 5979.35 samples/sec Loss 9.7524 LearningRate 0.1943 Epoch: 7 Global 
Step: 77300 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:25:30,004-Speed 5971.77 samples/sec Loss 9.6653 LearningRate 0.1943 Epoch: 7 Global Step: 77310 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 11:25:36,874-Speed 5963.36 samples/sec Loss 9.6916 LearningRate 0.1942 Epoch: 7 Global Step: 77320 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:25:43,716-Speed 5986.62 samples/sec Loss 9.7217 LearningRate 0.1942 Epoch: 7 Global Step: 77330 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:25:50,588-Speed 5962.37 samples/sec Loss 9.7035 LearningRate 0.1942 Epoch: 7 Global Step: 77340 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:25:57,438-Speed 5980.52 samples/sec Loss 9.6619 LearningRate 0.1941 Epoch: 7 Global Step: 77350 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:26:04,327-Speed 5946.75 samples/sec Loss 9.6803 LearningRate 0.1941 Epoch: 7 Global Step: 77360 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:26:11,177-Speed 5980.77 samples/sec Loss 9.6905 LearningRate 0.1941 Epoch: 7 Global Step: 77370 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:26:18,023-Speed 5983.83 samples/sec Loss 9.6745 LearningRate 0.1941 Epoch: 7 Global Step: 77380 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:26:24,905-Speed 5954.85 samples/sec Loss 9.7452 LearningRate 0.1940 Epoch: 7 Global Step: 77390 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:26:31,763-Speed 5973.13 samples/sec Loss 9.6548 LearningRate 0.1940 Epoch: 7 Global Step: 77400 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:26:38,622-Speed 5973.92 samples/sec Loss 9.7167 LearningRate 0.1940 Epoch: 7 Global Step: 77410 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:26:45,494-Speed 5961.24 samples/sec Loss 9.6947 LearningRate 0.1939 Epoch: 7 Global Step: 77420 Fp16 Grad Scale: 131072 
Required: 25 hours Training: 2022-01-08 11:26:52,375-Speed 5955.28 samples/sec Loss 9.7208 LearningRate 0.1939 Epoch: 7 Global Step: 77430 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:26:59,222-Speed 5983.62 samples/sec Loss 9.7594 LearningRate 0.1939 Epoch: 7 Global Step: 77440 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:27:06,060-Speed 5991.06 samples/sec Loss 9.7413 LearningRate 0.1938 Epoch: 7 Global Step: 77450 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:27:13,019-Speed 5887.13 samples/sec Loss 9.7070 LearningRate 0.1938 Epoch: 7 Global Step: 77460 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:27:19,876-Speed 5974.64 samples/sec Loss 9.6946 LearningRate 0.1938 Epoch: 7 Global Step: 77470 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:27:26,773-Speed 5940.31 samples/sec Loss 9.7538 LearningRate 0.1938 Epoch: 7 Global Step: 77480 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:27:33,664-Speed 5945.20 samples/sec Loss 9.6800 LearningRate 0.1937 Epoch: 7 Global Step: 77490 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:27:40,553-Speed 5947.26 samples/sec Loss 9.6736 LearningRate 0.1937 Epoch: 7 Global Step: 77500 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:27:47,441-Speed 5947.84 samples/sec Loss 9.6715 LearningRate 0.1937 Epoch: 7 Global Step: 77510 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:27:54,302-Speed 5971.64 samples/sec Loss 9.7354 LearningRate 0.1936 Epoch: 7 Global Step: 77520 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:28:01,156-Speed 5977.39 samples/sec Loss 9.7045 LearningRate 0.1936 Epoch: 7 Global Step: 77530 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:28:08,006-Speed 5981.24 samples/sec Loss 9.6208 LearningRate 0.1936 Epoch: 7 Global Step: 77540 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 
11:28:14,878-Speed 5961.85 samples/sec Loss 9.7040 LearningRate 0.1935 Epoch: 7 Global Step: 77550 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:28:21,721-Speed 5986.37 samples/sec Loss 9.7032 LearningRate 0.1935 Epoch: 7 Global Step: 77560 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:28:28,575-Speed 5977.13 samples/sec Loss 9.7815 LearningRate 0.1935 Epoch: 7 Global Step: 77570 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:28:35,428-Speed 5979.92 samples/sec Loss 9.7471 LearningRate 0.1935 Epoch: 7 Global Step: 77580 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:28:42,296-Speed 5965.09 samples/sec Loss 9.6164 LearningRate 0.1934 Epoch: 7 Global Step: 77590 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:28:49,148-Speed 5978.41 samples/sec Loss 9.6621 LearningRate 0.1934 Epoch: 7 Global Step: 77600 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:28:55,999-Speed 5980.16 samples/sec Loss 9.6391 LearningRate 0.1934 Epoch: 7 Global Step: 77610 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:02,861-Speed 5970.17 samples/sec Loss 9.6814 LearningRate 0.1933 Epoch: 7 Global Step: 77620 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:09,727-Speed 5965.76 samples/sec Loss 9.7232 LearningRate 0.1933 Epoch: 7 Global Step: 77630 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:16,576-Speed 5981.58 samples/sec Loss 9.7006 LearningRate 0.1933 Epoch: 7 Global Step: 77640 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:23,439-Speed 5969.65 samples/sec Loss 9.6273 LearningRate 0.1933 Epoch: 7 Global Step: 77650 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:30,303-Speed 5968.37 samples/sec Loss 9.6421 LearningRate 0.1932 Epoch: 7 Global Step: 77660 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:37,164-Speed 5970.52 samples/sec Loss 
9.7131 LearningRate 0.1932 Epoch: 7 Global Step: 77670 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:44,031-Speed 5966.60 samples/sec Loss 9.7177 LearningRate 0.1932 Epoch: 7 Global Step: 77680 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:50,907-Speed 5957.35 samples/sec Loss 9.6672 LearningRate 0.1931 Epoch: 7 Global Step: 77690 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:29:57,760-Speed 5977.66 samples/sec Loss 9.6993 LearningRate 0.1931 Epoch: 7 Global Step: 77700 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:04,618-Speed 5974.31 samples/sec Loss 9.6446 LearningRate 0.1931 Epoch: 7 Global Step: 77710 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:11,461-Speed 5986.22 samples/sec Loss 9.6196 LearningRate 0.1930 Epoch: 7 Global Step: 77720 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:18,340-Speed 5955.23 samples/sec Loss 9.6208 LearningRate 0.1930 Epoch: 7 Global Step: 77730 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:25,184-Speed 5985.85 samples/sec Loss 9.7250 LearningRate 0.1930 Epoch: 7 Global Step: 77740 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:32,035-Speed 5979.50 samples/sec Loss 9.7224 LearningRate 0.1930 Epoch: 7 Global Step: 77750 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:38,921-Speed 5957.13 samples/sec Loss 9.5628 LearningRate 0.1929 Epoch: 7 Global Step: 77760 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:45,775-Speed 5976.29 samples/sec Loss 9.6909 LearningRate 0.1929 Epoch: 7 Global Step: 77770 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:52,720-Speed 5899.50 samples/sec Loss 9.6547 LearningRate 0.1929 Epoch: 7 Global Step: 77780 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:30:59,566-Speed 5985.54 samples/sec Loss 9.6789 LearningRate 0.1928 Epoch: 7 Global 
Step: 77790 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:31:06,424-Speed 5974.28 samples/sec Loss 9.6545 LearningRate 0.1928 Epoch: 7 Global Step: 77800 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:31:13,275-Speed 5978.87 samples/sec Loss 9.6421 LearningRate 0.1928 Epoch: 7 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:31:20,137-Speed 5970.09 samples/sec Loss 9.6424 LearningRate 0.1927 Epoch: 7 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:31:26,993-Speed 5976.28 samples/sec Loss 9.6624 LearningRate 0.1927 Epoch: 7 Global Step: 77830 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:31:33,833-Speed 5989.42 samples/sec Loss 9.6600 LearningRate 0.1927 Epoch: 7 Global Step: 77840 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:31:40,678-Speed 5984.52 samples/sec Loss 9.7061 LearningRate 0.1927 Epoch: 7 Global Step: 77850 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:31:47,536-Speed 5973.42 samples/sec Loss 9.6754 LearningRate 0.1926 Epoch: 7 Global Step: 77860 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:31:54,390-Speed 5976.75 samples/sec Loss 9.6961 LearningRate 0.1926 Epoch: 7 Global Step: 77870 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:32:01,252-Speed 5970.07 samples/sec Loss 9.6621 LearningRate 0.1926 Epoch: 7 Global Step: 77880 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:32:08,176-Speed 5917.06 samples/sec Loss 9.6221 LearningRate 0.1925 Epoch: 7 Global Step: 77890 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:32:15,098-Speed 5917.89 samples/sec Loss 9.6785 LearningRate 0.1925 Epoch: 7 Global Step: 77900 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:32:22,024-Speed 5914.84 samples/sec Loss 9.7187 LearningRate 0.1925 Epoch: 7 Global Step: 77910 Fp16 Grad Scale: 131072 Required: 25 
hours Training: 2022-01-08 11:32:28,946-Speed 5918.39 samples/sec Loss 9.6155 LearningRate 0.1924 Epoch: 7 Global Step: 77920 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:32:35,882-Speed 5909.82 samples/sec Loss 9.5967 LearningRate 0.1924 Epoch: 7 Global Step: 77930 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:32:42,760-Speed 5956.14 samples/sec Loss 9.6842 LearningRate 0.1924 Epoch: 7 Global Step: 77940 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:32:49,708-Speed 5897.67 samples/sec Loss 9.7107 LearningRate 0.1924 Epoch: 7 Global Step: 77950 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:32:56,557-Speed 5981.41 samples/sec Loss 9.6809 LearningRate 0.1923 Epoch: 7 Global Step: 77960 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:33:03,424-Speed 5968.41 samples/sec Loss 9.6103 LearningRate 0.1923 Epoch: 7 Global Step: 77970 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:33:10,280-Speed 5976.48 samples/sec Loss 9.6049 LearningRate 0.1923 Epoch: 7 Global Step: 77980 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:33:17,125-Speed 5984.38 samples/sec Loss 9.6904 LearningRate 0.1922 Epoch: 7 Global Step: 77990 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:33:23,994-Speed 5964.61 samples/sec Loss 9.6920 LearningRate 0.1922 Epoch: 7 Global Step: 78000 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:33:30,851-Speed 5973.70 samples/sec Loss 9.6797 LearningRate 0.1922 Epoch: 7 Global Step: 78010 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:33:37,709-Speed 5974.14 samples/sec Loss 9.7218 LearningRate 0.1922 Epoch: 7 Global Step: 78020 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:33:44,547-Speed 5991.50 samples/sec Loss 9.6173 LearningRate 0.1921 Epoch: 7 Global Step: 78030 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 
11:33:51,408-Speed 5970.49 samples/sec Loss 9.6496 LearningRate 0.1921 Epoch: 7 Global Step: 78040 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:33:58,264-Speed 5975.62 samples/sec Loss 9.6628 LearningRate 0.1921 Epoch: 7 Global Step: 78050 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:34:05,132-Speed 5965.93 samples/sec Loss 9.6873 LearningRate 0.1920 Epoch: 7 Global Step: 78060 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:34:11,977-Speed 5984.66 samples/sec Loss 9.6727 LearningRate 0.1920 Epoch: 7 Global Step: 78070 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:34:18,851-Speed 5960.28 samples/sec Loss 9.6520 LearningRate 0.1920 Epoch: 7 Global Step: 78080 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:34:25,712-Speed 5971.21 samples/sec Loss 9.6549 LearningRate 0.1919 Epoch: 7 Global Step: 78090 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:34:32,661-Speed 5895.01 samples/sec Loss 9.7628 LearningRate 0.1919 Epoch: 7 Global Step: 78100 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:34:39,551-Speed 5946.22 samples/sec Loss 9.6858 LearningRate 0.1919 Epoch: 7 Global Step: 78110 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:34:46,399-Speed 5982.95 samples/sec Loss 9.6214 LearningRate 0.1919 Epoch: 7 Global Step: 78120 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:34:53,270-Speed 5962.12 samples/sec Loss 9.6497 LearningRate 0.1918 Epoch: 7 Global Step: 78130 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:35:00,149-Speed 5955.91 samples/sec Loss 9.6930 LearningRate 0.1918 Epoch: 7 Global Step: 78140 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:35:07,007-Speed 5973.49 samples/sec Loss 9.6955 LearningRate 0.1918 Epoch: 7 Global Step: 78150 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:35:13,854-Speed 5983.16 samples/sec Loss 
9.7206 LearningRate 0.1917 Epoch: 7 Global Step: 78160 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:35:20,721-Speed 5966.24 samples/sec Loss 9.6850 LearningRate 0.1917 Epoch: 7 Global Step: 78170 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:35:27,576-Speed 5975.81 samples/sec Loss 9.6551 LearningRate 0.1917 Epoch: 7 Global Step: 78180 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:35:34,446-Speed 5964.95 samples/sec Loss 9.6584 LearningRate 0.1916 Epoch: 7 Global Step: 78190 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:35:41,307-Speed 5972.37 samples/sec Loss 9.6161 LearningRate 0.1916 Epoch: 7 Global Step: 78200 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:35:48,160-Speed 5978.55 samples/sec Loss 9.5329 LearningRate 0.1916 Epoch: 7 Global Step: 78210 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:35:55,010-Speed 5980.91 samples/sec Loss 9.6331 LearningRate 0.1916 Epoch: 7 Global Step: 78220 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:36:01,859-Speed 5981.58 samples/sec Loss 9.7121 LearningRate 0.1915 Epoch: 7 Global Step: 78230 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:36:08,711-Speed 5979.92 samples/sec Loss 9.6819 LearningRate 0.1915 Epoch: 7 Global Step: 78240 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:36:15,585-Speed 5964.91 samples/sec Loss 9.6126 LearningRate 0.1915 Epoch: 7 Global Step: 78250 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:36:22,442-Speed 5974.76 samples/sec Loss 9.6676 LearningRate 0.1914 Epoch: 7 Global Step: 78260 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:36:29,310-Speed 5965.06 samples/sec Loss 9.6709 LearningRate 0.1914 Epoch: 7 Global Step: 78270 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:36:36,176-Speed 5966.52 samples/sec Loss 9.6799 LearningRate 0.1914 Epoch: 7 Global 
Step: 78280 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:36:43,066-Speed 5948.38 samples/sec Loss 9.6494 LearningRate 0.1913 Epoch: 7 Global Step: 78290 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:36:49,925-Speed 5975.59 samples/sec Loss 9.6029 LearningRate 0.1913 Epoch: 7 Global Step: 78300 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:36:56,773-Speed 5981.64 samples/sec Loss 9.6233 LearningRate 0.1913 Epoch: 7 Global Step: 78310 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:03,620-Speed 5983.66 samples/sec Loss 9.6943 LearningRate 0.1913 Epoch: 7 Global Step: 78320 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:10,478-Speed 5975.54 samples/sec Loss 9.5528 LearningRate 0.1912 Epoch: 7 Global Step: 78330 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:17,346-Speed 5965.50 samples/sec Loss 9.6277 LearningRate 0.1912 Epoch: 7 Global Step: 78340 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:24,208-Speed 5968.97 samples/sec Loss 9.6756 LearningRate 0.1912 Epoch: 7 Global Step: 78350 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:31,071-Speed 5969.63 samples/sec Loss 9.6094 LearningRate 0.1911 Epoch: 7 Global Step: 78360 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:37,945-Speed 5960.05 samples/sec Loss 9.6569 LearningRate 0.1911 Epoch: 7 Global Step: 78370 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:44,800-Speed 5976.20 samples/sec Loss 9.6556 LearningRate 0.1911 Epoch: 7 Global Step: 78380 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:51,658-Speed 5974.01 samples/sec Loss 9.6149 LearningRate 0.1911 Epoch: 7 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:37:58,511-Speed 5977.18 samples/sec Loss 9.6532 LearningRate 0.1910 Epoch: 7 Global Step: 78400 Fp16 Grad Scale: 131072 
Required: 25 hours Training: 2022-01-08 11:38:05,358-Speed 5982.92 samples/sec Loss 9.7388 LearningRate 0.1910 Epoch: 7 Global Step: 78410 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:38:12,226-Speed 5965.06 samples/sec Loss 9.6528 LearningRate 0.1910 Epoch: 7 Global Step: 78420 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:38:19,080-Speed 5977.44 samples/sec Loss 9.6208 LearningRate 0.1909 Epoch: 7 Global Step: 78430 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:38:25,947-Speed 5965.61 samples/sec Loss 9.4936 LearningRate 0.1909 Epoch: 7 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:38:32,841-Speed 5943.02 samples/sec Loss 9.6574 LearningRate 0.1909 Epoch: 7 Global Step: 78450 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:38:39,697-Speed 5974.72 samples/sec Loss 9.5436 LearningRate 0.1908 Epoch: 7 Global Step: 78460 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:38:46,572-Speed 5960.58 samples/sec Loss 9.5046 LearningRate 0.1908 Epoch: 7 Global Step: 78470 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:38:53,428-Speed 5975.61 samples/sec Loss 9.5768 LearningRate 0.1908 Epoch: 7 Global Step: 78480 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:39:00,305-Speed 5957.34 samples/sec Loss 9.6054 LearningRate 0.1908 Epoch: 7 Global Step: 78490 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:39:07,157-Speed 5978.54 samples/sec Loss 9.6585 LearningRate 0.1907 Epoch: 7 Global Step: 78500 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:39:14,030-Speed 5961.08 samples/sec Loss 9.6160 LearningRate 0.1907 Epoch: 7 Global Step: 78510 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:39:20,887-Speed 5974.78 samples/sec Loss 9.6033 LearningRate 0.1907 Epoch: 7 Global Step: 78520 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 
11:39:27,744-Speed 5976.71 samples/sec Loss 9.6040 LearningRate 0.1906 Epoch: 7 Global Step: 78530 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:39:34,598-Speed 5976.90 samples/sec Loss 9.5513 LearningRate 0.1906 Epoch: 7 Global Step: 78540 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:39:41,500-Speed 5977.37 samples/sec Loss 9.6019 LearningRate 0.1906 Epoch: 7 Global Step: 78550 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:39:48,359-Speed 5973.37 samples/sec Loss 9.5931 LearningRate 0.1905 Epoch: 7 Global Step: 78560 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:39:55,226-Speed 5966.18 samples/sec Loss 9.6342 LearningRate 0.1905 Epoch: 7 Global Step: 78570 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:40:02,074-Speed 5982.07 samples/sec Loss 9.6179 LearningRate 0.1905 Epoch: 7 Global Step: 78580 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:40:08,945-Speed 5963.97 samples/sec Loss 9.5617 LearningRate 0.1905 Epoch: 7 Global Step: 78590 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:40:15,792-Speed 5984.07 samples/sec Loss 9.6373 LearningRate 0.1904 Epoch: 7 Global Step: 78600 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:40:22,651-Speed 5972.44 samples/sec Loss 9.6010 LearningRate 0.1904 Epoch: 7 Global Step: 78610 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:40:29,530-Speed 5955.78 samples/sec Loss 9.6622 LearningRate 0.1904 Epoch: 7 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:40:36,387-Speed 5974.65 samples/sec Loss 9.6177 LearningRate 0.1903 Epoch: 7 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:40:43,259-Speed 5961.35 samples/sec Loss 9.5432 LearningRate 0.1903 Epoch: 7 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:40:50,114-Speed 5975.88 samples/sec Loss 
9.5999 LearningRate 0.1903 Epoch: 7 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:40:56,973-Speed 5973.43 samples/sec Loss 9.5573 LearningRate 0.1903 Epoch: 7 Global Step: 78660 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:41:03,833-Speed 5972.23 samples/sec Loss 9.6367 LearningRate 0.1902 Epoch: 7 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:41:10,685-Speed 5979.17 samples/sec Loss 9.5932 LearningRate 0.1902 Epoch: 7 Global Step: 78680 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:41:17,548-Speed 5969.28 samples/sec Loss 9.5990 LearningRate 0.1902 Epoch: 7 Global Step: 78690 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:41:24,417-Speed 5964.54 samples/sec Loss 9.5971 LearningRate 0.1901 Epoch: 7 Global Step: 78700 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:41:31,283-Speed 5967.20 samples/sec Loss 9.6279 LearningRate 0.1901 Epoch: 7 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:41:38,143-Speed 5971.67 samples/sec Loss 9.6074 LearningRate 0.1901 Epoch: 7 Global Step: 78720 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:41:45,054-Speed 5927.78 samples/sec Loss 9.6150 LearningRate 0.1900 Epoch: 7 Global Step: 78730 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:41:51,913-Speed 5972.92 samples/sec Loss 9.6743 LearningRate 0.1900 Epoch: 7 Global Step: 78740 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:41:58,770-Speed 5974.49 samples/sec Loss 9.6107 LearningRate 0.1900 Epoch: 7 Global Step: 78750 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:42:05,617-Speed 5983.44 samples/sec Loss 9.5691 LearningRate 0.1900 Epoch: 7 Global Step: 78760 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:42:12,476-Speed 5972.42 samples/sec Loss 9.6306 LearningRate 0.1899 Epoch: 7 Global Step: 
78770 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:42:19,359-Speed 5952.97 samples/sec Loss 9.5430 LearningRate 0.1899 Epoch: 7 Global Step: 78780 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:42:26,263-Speed 5933.28 samples/sec Loss 9.5912 LearningRate 0.1899 Epoch: 7 Global Step: 78790 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:42:33,178-Speed 5924.20 samples/sec Loss 9.5605 LearningRate 0.1898 Epoch: 7 Global Step: 78800 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:42:40,062-Speed 5951.44 samples/sec Loss 9.5921 LearningRate 0.1898 Epoch: 7 Global Step: 78810 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:42:46,943-Speed 5953.52 samples/sec Loss 9.5995 LearningRate 0.1898 Epoch: 7 Global Step: 78820 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:42:53,794-Speed 5980.42 samples/sec Loss 9.6135 LearningRate 0.1898 Epoch: 7 Global Step: 78830 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:43:00,657-Speed 5969.15 samples/sec Loss 9.5097 LearningRate 0.1897 Epoch: 7 Global Step: 78840 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:43:07,523-Speed 5966.77 samples/sec Loss 9.6004 LearningRate 0.1897 Epoch: 7 Global Step: 78850 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:43:14,371-Speed 5981.79 samples/sec Loss 9.6341 LearningRate 0.1897 Epoch: 7 Global Step: 78860 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:43:21,224-Speed 5978.02 samples/sec Loss 9.6168 LearningRate 0.1896 Epoch: 7 Global Step: 78870 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:43:28,096-Speed 5962.84 samples/sec Loss 9.5753 LearningRate 0.1896 Epoch: 7 Global Step: 78880 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:43:34,978-Speed 5952.27 samples/sec Loss 9.6524 LearningRate 0.1896 Epoch: 7 Global Step: 78890 Fp16 Grad Scale: 131072 Required: 25 
hours Training: 2022-01-08 11:43:41,828-Speed 5980.50 samples/sec Loss 9.6109 LearningRate 0.1895 Epoch: 7 Global Step: 78900 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:43:48,688-Speed 5973.62 samples/sec Loss 9.5836 LearningRate 0.1895 Epoch: 7 Global Step: 78910 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:43:55,535-Speed 5983.67 samples/sec Loss 9.4990 LearningRate 0.1895 Epoch: 7 Global Step: 78920 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:44:02,397-Speed 5970.59 samples/sec Loss 9.5929 LearningRate 0.1895 Epoch: 7 Global Step: 78930 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:44:09,241-Speed 5987.76 samples/sec Loss 9.6404 LearningRate 0.1894 Epoch: 7 Global Step: 78940 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:44:16,149-Speed 5930.76 samples/sec Loss 9.6790 LearningRate 0.1894 Epoch: 7 Global Step: 78950 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:44:23,026-Speed 5958.03 samples/sec Loss 9.6457 LearningRate 0.1894 Epoch: 7 Global Step: 78960 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:44:29,881-Speed 5976.41 samples/sec Loss 9.6012 LearningRate 0.1893 Epoch: 7 Global Step: 78970 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:44:36,744-Speed 5970.03 samples/sec Loss 9.6226 LearningRate 0.1893 Epoch: 7 Global Step: 78980 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:44:43,596-Speed 5978.77 samples/sec Loss 9.6830 LearningRate 0.1893 Epoch: 7 Global Step: 78990 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:44:50,444-Speed 5981.95 samples/sec Loss 9.6446 LearningRate 0.1893 Epoch: 7 Global Step: 79000 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:44:57,298-Speed 5977.07 samples/sec Loss 9.5844 LearningRate 0.1892 Epoch: 7 Global Step: 79010 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 
11:45:04,155-Speed 5977.99 samples/sec Loss 9.5337 LearningRate 0.1892 Epoch: 7 Global Step: 79020 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:45:11,000-Speed 5984.59 samples/sec Loss 9.6409 LearningRate 0.1892 Epoch: 7 Global Step: 79030 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:45:17,853-Speed 5978.63 samples/sec Loss 9.6007 LearningRate 0.1891 Epoch: 7 Global Step: 79040 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:45:24,713-Speed 5972.44 samples/sec Loss 9.6003 LearningRate 0.1891 Epoch: 7 Global Step: 79050 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:45:31,586-Speed 5960.62 samples/sec Loss 9.5894 LearningRate 0.1891 Epoch: 7 Global Step: 79060 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:45:38,439-Speed 5978.65 samples/sec Loss 9.6359 LearningRate 0.1890 Epoch: 7 Global Step: 79070 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:45:45,358-Speed 5922.81 samples/sec Loss 9.6267 LearningRate 0.1890 Epoch: 7 Global Step: 79080 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:45:52,224-Speed 5966.58 samples/sec Loss 9.6247 LearningRate 0.1890 Epoch: 7 Global Step: 79090 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:45:59,107-Speed 5952.14 samples/sec Loss 9.5552 LearningRate 0.1890 Epoch: 7 Global Step: 79100 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:46:05,988-Speed 5954.45 samples/sec Loss 9.5714 LearningRate 0.1889 Epoch: 7 Global Step: 79110 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:46:12,854-Speed 5966.33 samples/sec Loss 9.6368 LearningRate 0.1889 Epoch: 7 Global Step: 79120 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:46:19,735-Speed 5954.10 samples/sec Loss 9.5669 LearningRate 0.1889 Epoch: 7 Global Step: 79130 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:46:26,591-Speed 5975.34 samples/sec Loss 
9.5393 LearningRate 0.1888 Epoch: 7 Global Step: 79140 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:46:33,442-Speed 5980.13 samples/sec Loss 9.6504 LearningRate 0.1888 Epoch: 7 Global Step: 79150 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:46:40,285-Speed 5986.15 samples/sec Loss 9.5942 LearningRate 0.1888 Epoch: 7 Global Step: 79160 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:46:47,136-Speed 5979.59 samples/sec Loss 9.5823 LearningRate 0.1887 Epoch: 7 Global Step: 79170 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:46:53,989-Speed 5977.78 samples/sec Loss 9.6217 LearningRate 0.1887 Epoch: 7 Global Step: 79180 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:47:00,830-Speed 5988.87 samples/sec Loss 9.5978 LearningRate 0.1887 Epoch: 7 Global Step: 79190 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:47:07,675-Speed 5985.08 samples/sec Loss 9.6152 LearningRate 0.1887 Epoch: 7 Global Step: 79200 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:47:14,563-Speed 5947.81 samples/sec Loss 9.5068 LearningRate 0.1886 Epoch: 7 Global Step: 79210 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:47:21,424-Speed 5970.90 samples/sec Loss 9.5578 LearningRate 0.1886 Epoch: 7 Global Step: 79220 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:47:28,399-Speed 5874.08 samples/sec Loss 9.5335 LearningRate 0.1886 Epoch: 7 Global Step: 79230 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:47:35,266-Speed 5967.53 samples/sec Loss 9.5805 LearningRate 0.1885 Epoch: 7 Global Step: 79240 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:47:42,155-Speed 5946.40 samples/sec Loss 9.5534 LearningRate 0.1885 Epoch: 7 Global Step: 79250 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:47:49,010-Speed 5976.68 samples/sec Loss 9.5377 LearningRate 0.1885 Epoch: 7 Global 
Step: 79260 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:47:55,889-Speed 5954.94 samples/sec Loss 9.6167 LearningRate 0.1885 Epoch: 7 Global Step: 79270 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:48:02,744-Speed 5976.43 samples/sec Loss 9.6468 LearningRate 0.1884 Epoch: 7 Global Step: 79280 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:48:09,666-Speed 5921.47 samples/sec Loss 9.5608 LearningRate 0.1884 Epoch: 7 Global Step: 79290 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:48:16,552-Speed 5948.89 samples/sec Loss 9.6391 LearningRate 0.1884 Epoch: 7 Global Step: 79300 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:48:23,474-Speed 5918.48 samples/sec Loss 9.5947 LearningRate 0.1883 Epoch: 7 Global Step: 79310 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:48:30,382-Speed 5931.22 samples/sec Loss 9.5583 LearningRate 0.1883 Epoch: 7 Global Step: 79320 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:48:37,255-Speed 5960.69 samples/sec Loss 9.5307 LearningRate 0.1883 Epoch: 7 Global Step: 79330 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:48:44,125-Speed 5963.64 samples/sec Loss 9.5643 LearningRate 0.1882 Epoch: 7 Global Step: 79340 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:48:50,978-Speed 5977.37 samples/sec Loss 9.5994 LearningRate 0.1882 Epoch: 7 Global Step: 79350 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:48:57,826-Speed 5982.53 samples/sec Loss 9.5313 LearningRate 0.1882 Epoch: 7 Global Step: 79360 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:49:04,687-Speed 5971.62 samples/sec Loss 9.6137 LearningRate 0.1882 Epoch: 7 Global Step: 79370 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:49:11,545-Speed 5973.57 samples/sec Loss 9.5886 LearningRate 0.1881 Epoch: 7 Global Step: 79380 Fp16 Grad Scale: 65536 Required: 
25 hours Training: 2022-01-08 11:49:18,430-Speed 5949.49 samples/sec Loss 9.6017 LearningRate 0.1881 Epoch: 7 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:49:25,328-Speed 5939.84 samples/sec Loss 9.6249 LearningRate 0.1881 Epoch: 7 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:49:32,193-Speed 5967.17 samples/sec Loss 9.5834 LearningRate 0.1880 Epoch: 7 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:49:39,051-Speed 5974.26 samples/sec Loss 9.5443 LearningRate 0.1880 Epoch: 7 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:49:45,935-Speed 5951.35 samples/sec Loss 9.5734 LearningRate 0.1880 Epoch: 7 Global Step: 79430 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:49:52,791-Speed 5975.34 samples/sec Loss 9.5454 LearningRate 0.1880 Epoch: 7 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:49:59,644-Speed 5978.92 samples/sec Loss 9.6537 LearningRate 0.1879 Epoch: 7 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:50:06,490-Speed 5984.36 samples/sec Loss 9.5864 LearningRate 0.1879 Epoch: 7 Global Step: 79460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:50:13,351-Speed 5971.00 samples/sec Loss 9.5968 LearningRate 0.1879 Epoch: 7 Global Step: 79470 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:50:20,200-Speed 5981.52 samples/sec Loss 9.5078 LearningRate 0.1878 Epoch: 7 Global Step: 79480 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:50:27,046-Speed 5984.15 samples/sec Loss 9.5599 LearningRate 0.1878 Epoch: 7 Global Step: 79490 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:50:33,918-Speed 5961.52 samples/sec Loss 9.5727 LearningRate 0.1878 Epoch: 7 Global Step: 79500 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:50:40,799-Speed 
5953.78 samples/sec Loss 9.4855 LearningRate 0.1877 Epoch: 7 Global Step: 79510 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:50:47,653-Speed 5977.47 samples/sec Loss 9.5125 LearningRate 0.1877 Epoch: 7 Global Step: 79520 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:50:54,508-Speed 5975.98 samples/sec Loss 9.5962 LearningRate 0.1877 Epoch: 7 Global Step: 79530 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:51:01,357-Speed 5981.50 samples/sec Loss 9.6399 LearningRate 0.1877 Epoch: 7 Global Step: 79540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:51:08,266-Speed 5929.89 samples/sec Loss 9.6618 LearningRate 0.1876 Epoch: 7 Global Step: 79550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-01-08 11:51:15,219-Speed 5892.68 samples/sec Loss 9.5404 LearningRate 0.1876 Epoch: 7 Global Step: 79560 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:51:22,071-Speed 5980.18 samples/sec Loss 9.4591 LearningRate 0.1876 Epoch: 7 Global Step: 79570 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:51:28,937-Speed 5966.20 samples/sec Loss 9.6284 LearningRate 0.1875 Epoch: 7 Global Step: 79580 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:51:35,799-Speed 5970.41 samples/sec Loss 9.6079 LearningRate 0.1875 Epoch: 7 Global Step: 79590 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:51:42,644-Speed 5984.64 samples/sec Loss 9.5349 LearningRate 0.1875 Epoch: 7 Global Step: 79600 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:51:49,503-Speed 5973.16 samples/sec Loss 9.5305 LearningRate 0.1875 Epoch: 7 Global Step: 79610 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:51:56,339-Speed 5992.37 samples/sec Loss 9.6283 LearningRate 0.1874 Epoch: 7 Global Step: 79620 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:52:03,199-Speed 5972.61 samples/sec Loss 9.4766 LearningRate 0.1874 
Epoch: 7 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:52:10,040-Speed 5988.66 samples/sec Loss 9.5876 LearningRate 0.1874 Epoch: 7 Global Step: 79640 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:52:16,886-Speed 5984.46 samples/sec Loss 9.5538 LearningRate 0.1873 Epoch: 7 Global Step: 79650 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 11:52:23,762-Speed 5957.66 samples/sec Loss 9.5640 LearningRate 0.1873 Epoch: 7 Global Step: 79660 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:52:30,627-Speed 5968.02 samples/sec Loss 9.5228 LearningRate 0.1873 Epoch: 7 Global Step: 79670 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:52:37,500-Speed 5960.37 samples/sec Loss 9.5331 LearningRate 0.1873 Epoch: 7 Global Step: 79680 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:52:44,363-Speed 5969.30 samples/sec Loss 9.5451 LearningRate 0.1872 Epoch: 7 Global Step: 79690 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:52:51,238-Speed 5959.44 samples/sec Loss 9.5264 LearningRate 0.1872 Epoch: 7 Global Step: 79700 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:52:58,094-Speed 5974.76 samples/sec Loss 9.4435 LearningRate 0.1872 Epoch: 7 Global Step: 79710 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:53:04,947-Speed 5978.50 samples/sec Loss 9.4963 LearningRate 0.1871 Epoch: 7 Global Step: 79720 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:53:11,803-Speed 5976.08 samples/sec Loss 9.5346 LearningRate 0.1871 Epoch: 7 Global Step: 79730 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:53:18,687-Speed 5956.68 samples/sec Loss 9.6122 LearningRate 0.1871 Epoch: 7 Global Step: 79740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 11:53:25,565-Speed 5955.78 samples/sec Loss 9.5094 LearningRate 0.1870 Epoch: 7 Global Step: 79750 Fp16 Grad Scale: 65536 
Required: 25 hours Training: 2022-01-08 11:53:32,424-Speed 5972.78 samples/sec Loss 9.4621 LearningRate 0.1870 Epoch: 7 Global Step: 79760 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:53:39,277-Speed 5977.85 samples/sec Loss 9.4942 LearningRate 0.1870 Epoch: 7 Global Step: 79770 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:53:46,146-Speed 5964.05 samples/sec Loss 9.5875 LearningRate 0.1870 Epoch: 7 Global Step: 79780 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:53:53,014-Speed 5967.89 samples/sec Loss 9.5110 LearningRate 0.1869 Epoch: 7 Global Step: 79790 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:53:59,868-Speed 5977.00 samples/sec Loss 9.4948 LearningRate 0.1869 Epoch: 7 Global Step: 79800 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:54:06,714-Speed 5984.46 samples/sec Loss 9.5054 LearningRate 0.1869 Epoch: 7 Global Step: 79810 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:54:13,596-Speed 5952.91 samples/sec Loss 9.4955 LearningRate 0.1868 Epoch: 7 Global Step: 79820 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:54:20,448-Speed 5978.71 samples/sec Loss 9.4660 LearningRate 0.1868 Epoch: 7 Global Step: 79830 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:54:27,343-Speed 5942.11 samples/sec Loss 9.5510 LearningRate 0.1868 Epoch: 7 Global Step: 79840 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:54:34,189-Speed 5983.46 samples/sec Loss 9.4905 LearningRate 0.1868 Epoch: 7 Global Step: 79850 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:54:41,045-Speed 5975.62 samples/sec Loss 9.5010 LearningRate 0.1867 Epoch: 7 Global Step: 79860 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:54:48,671-Speed 5373.73 samples/sec Loss 9.4900 LearningRate 0.1867 Epoch: 7 Global Step: 79870 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 
11:54:55,540-Speed 5967.35 samples/sec Loss 9.5148 LearningRate 0.1867 Epoch: 7 Global Step: 79880 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:55:02,397-Speed 5973.91 samples/sec Loss 9.5700 LearningRate 0.1866 Epoch: 7 Global Step: 79890 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:55:09,257-Speed 5971.68 samples/sec Loss 9.5856 LearningRate 0.1866 Epoch: 7 Global Step: 79900 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:55:16,128-Speed 5962.93 samples/sec Loss 9.5497 LearningRate 0.1866 Epoch: 7 Global Step: 79910 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:55:23,024-Speed 5941.33 samples/sec Loss 9.4699 LearningRate 0.1865 Epoch: 7 Global Step: 79920 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:55:29,911-Speed 5948.44 samples/sec Loss 9.5203 LearningRate 0.1865 Epoch: 7 Global Step: 79930 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:55:36,789-Speed 5956.34 samples/sec Loss 9.5646 LearningRate 0.1865 Epoch: 7 Global Step: 79940 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:55:43,648-Speed 5973.01 samples/sec Loss 9.5517 LearningRate 0.1865 Epoch: 7 Global Step: 79950 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:55:50,523-Speed 5960.88 samples/sec Loss 9.5407 LearningRate 0.1864 Epoch: 7 Global Step: 79960 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:55:57,372-Speed 5982.22 samples/sec Loss 9.5041 LearningRate 0.1864 Epoch: 7 Global Step: 79970 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:56:04,222-Speed 5982.99 samples/sec Loss 9.5452 LearningRate 0.1864 Epoch: 7 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:56:11,104-Speed 5952.63 samples/sec Loss 9.5323 LearningRate 0.1863 Epoch: 7 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:56:17,976-Speed 5962.58 samples/sec Loss 
9.4844 LearningRate 0.1863 Epoch: 7 Global Step: 80000 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:56:49,243-[lfw][80000]XNorm: 22.646365 Training: 2022-01-08 11:56:49,244-[lfw][80000]Accuracy-Flip: 0.99750+-0.00327 Training: 2022-01-08 11:56:49,245-[lfw][80000]Accuracy-Highest: 0.99750 Training: 2022-01-08 11:57:20,102-[cfp_fp][80000]XNorm: 19.757936 Training: 2022-01-08 11:57:20,102-[cfp_fp][80000]Accuracy-Flip: 0.97786+-0.00659 Training: 2022-01-08 11:57:20,102-[cfp_fp][80000]Accuracy-Highest: 0.97786 Training: 2022-01-08 11:57:46,565-[agedb_30][80000]XNorm: 22.282000 Training: 2022-01-08 11:57:46,566-[agedb_30][80000]Accuracy-Flip: 0.96883+-0.00742 Training: 2022-01-08 11:57:46,566-[agedb_30][80000]Accuracy-Highest: 0.96883 Training: 2022-01-08 11:57:53,425-Speed 429.13 samples/sec Loss 9.5204 LearningRate 0.1863 Epoch: 7 Global Step: 80010 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:58:00,290-Speed 5968.42 samples/sec Loss 9.5827 LearningRate 0.1863 Epoch: 7 Global Step: 80020 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:58:07,142-Speed 5979.23 samples/sec Loss 9.5709 LearningRate 0.1862 Epoch: 7 Global Step: 80030 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:58:13,981-Speed 5990.83 samples/sec Loss 9.5513 LearningRate 0.1862 Epoch: 7 Global Step: 80040 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:58:20,834-Speed 5977.83 samples/sec Loss 9.5179 LearningRate 0.1862 Epoch: 7 Global Step: 80050 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:58:27,705-Speed 5962.07 samples/sec Loss 9.5389 LearningRate 0.1861 Epoch: 7 Global Step: 80060 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:58:34,567-Speed 5970.53 samples/sec Loss 9.5530 LearningRate 0.1861 Epoch: 7 Global Step: 80070 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:58:41,433-Speed 5967.46 samples/sec Loss 9.5194 LearningRate 0.1861 Epoch: 
7 Global Step: 80080 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:58:48,299-Speed 5966.53 samples/sec Loss 9.5177 LearningRate 0.1861 Epoch: 7 Global Step: 80090 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:58:55,169-Speed 5963.10 samples/sec Loss 9.5463 LearningRate 0.1860 Epoch: 7 Global Step: 80100 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:59:02,021-Speed 5979.39 samples/sec Loss 9.5778 LearningRate 0.1860 Epoch: 7 Global Step: 80110 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:59:08,921-Speed 5938.10 samples/sec Loss 9.5347 LearningRate 0.1860 Epoch: 7 Global Step: 80120 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:59:15,766-Speed 5984.23 samples/sec Loss 9.4851 LearningRate 0.1859 Epoch: 7 Global Step: 80130 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:59:22,635-Speed 5964.31 samples/sec Loss 9.5789 LearningRate 0.1859 Epoch: 7 Global Step: 80140 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:59:29,495-Speed 5972.11 samples/sec Loss 9.4104 LearningRate 0.1859 Epoch: 7 Global Step: 80150 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 11:59:36,349-Speed 5977.22 samples/sec Loss 9.4212 LearningRate 0.1858 Epoch: 7 Global Step: 80160 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:59:43,226-Speed 5956.94 samples/sec Loss 9.5122 LearningRate 0.1858 Epoch: 7 Global Step: 80170 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:59:50,078-Speed 5979.72 samples/sec Loss 9.5882 LearningRate 0.1858 Epoch: 7 Global Step: 80180 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 11:59:56,922-Speed 5985.11 samples/sec Loss 9.5256 LearningRate 0.1858 Epoch: 7 Global Step: 80190 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:00:03,771-Speed 5981.14 samples/sec Loss 9.5012 LearningRate 0.1857 Epoch: 7 Global Step: 80200 Fp16 Grad Scale: 
131072 Required: 25 hours Training: 2022-01-08 12:00:10,625-Speed 5978.25 samples/sec Loss 9.4985 LearningRate 0.1857 Epoch: 7 Global Step: 80210 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:00:17,468-Speed 5986.47 samples/sec Loss 9.4694 LearningRate 0.1857 Epoch: 7 Global Step: 80220 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:00:24,324-Speed 5974.90 samples/sec Loss 9.4948 LearningRate 0.1856 Epoch: 7 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:00:31,182-Speed 5974.14 samples/sec Loss 9.5539 LearningRate 0.1856 Epoch: 7 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:00:38,084-Speed 5935.85 samples/sec Loss 9.5141 LearningRate 0.1856 Epoch: 7 Global Step: 80250 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:00:44,971-Speed 5950.14 samples/sec Loss 9.5297 LearningRate 0.1856 Epoch: 7 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:00:51,851-Speed 5953.97 samples/sec Loss 9.4881 LearningRate 0.1855 Epoch: 7 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:00:58,701-Speed 5982.23 samples/sec Loss 9.4859 LearningRate 0.1855 Epoch: 7 Global Step: 80280 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:01:05,552-Speed 5979.99 samples/sec Loss 9.4559 LearningRate 0.1855 Epoch: 7 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:01:12,401-Speed 5981.66 samples/sec Loss 9.6212 LearningRate 0.1854 Epoch: 7 Global Step: 80300 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:01:19,263-Speed 5970.31 samples/sec Loss 9.5068 LearningRate 0.1854 Epoch: 7 Global Step: 80310 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:01:26,113-Speed 5980.58 samples/sec Loss 9.6056 LearningRate 0.1854 Epoch: 7 Global Step: 80320 Fp16 Grad Scale: 131072 Required: 25 hours Training: 
2022-01-08 12:01:32,965-Speed 5978.98 samples/sec Loss 9.5623 LearningRate 0.1853 Epoch: 7 Global Step: 80330 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:01:39,807-Speed 5987.41 samples/sec Loss 9.4717 LearningRate 0.1853 Epoch: 7 Global Step: 80340 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:01:46,665-Speed 5975.09 samples/sec Loss 9.5138 LearningRate 0.1853 Epoch: 7 Global Step: 80350 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:01:53,518-Speed 5978.36 samples/sec Loss 9.4482 LearningRate 0.1853 Epoch: 7 Global Step: 80360 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:02:00,377-Speed 5972.92 samples/sec Loss 9.5298 LearningRate 0.1852 Epoch: 7 Global Step: 80370 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:02:07,237-Speed 5974.76 samples/sec Loss 9.4438 LearningRate 0.1852 Epoch: 7 Global Step: 80380 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:02:14,092-Speed 5985.60 samples/sec Loss 9.4692 LearningRate 0.1852 Epoch: 7 Global Step: 80390 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:02:20,944-Speed 5981.58 samples/sec Loss 9.4284 LearningRate 0.1851 Epoch: 7 Global Step: 80400 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:02:27,826-Speed 5952.89 samples/sec Loss 9.4733 LearningRate 0.1851 Epoch: 7 Global Step: 80410 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:02:34,680-Speed 5977.72 samples/sec Loss 9.4840 LearningRate 0.1851 Epoch: 7 Global Step: 80420 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:02:41,527-Speed 5982.63 samples/sec Loss 9.4532 LearningRate 0.1851 Epoch: 7 Global Step: 80430 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:02:48,386-Speed 5975.98 samples/sec Loss 9.4421 LearningRate 0.1850 Epoch: 7 Global Step: 80440 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:02:55,226-Speed 5990.14 
samples/sec Loss 9.5348 LearningRate 0.1850 Epoch: 7 Global Step: 80450 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:03:02,076-Speed 5981.01 samples/sec Loss 9.4853 LearningRate 0.1850 Epoch: 7 Global Step: 80460 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:03:08,938-Speed 5969.48 samples/sec Loss 9.3668 LearningRate 0.1849 Epoch: 7 Global Step: 80470 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:03:15,793-Speed 5979.40 samples/sec Loss 9.5373 LearningRate 0.1849 Epoch: 7 Global Step: 80480 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:03:22,654-Speed 5971.57 samples/sec Loss 9.4630 LearningRate 0.1849 Epoch: 7 Global Step: 80490 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:03:29,508-Speed 5976.79 samples/sec Loss 9.5376 LearningRate 0.1849 Epoch: 7 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:03:36,387-Speed 5957.77 samples/sec Loss 9.4602 LearningRate 0.1848 Epoch: 7 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:03:43,229-Speed 5988.60 samples/sec Loss 9.5352 LearningRate 0.1848 Epoch: 7 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:03:50,078-Speed 5981.42 samples/sec Loss 9.4863 LearningRate 0.1848 Epoch: 7 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:03:56,942-Speed 5968.19 samples/sec Loss 9.4750 LearningRate 0.1847 Epoch: 7 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:04:03,813-Speed 5963.15 samples/sec Loss 9.5047 LearningRate 0.1847 Epoch: 7 Global Step: 80550 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:04:10,665-Speed 5978.43 samples/sec Loss 9.5001 LearningRate 0.1847 Epoch: 7 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:04:17,522-Speed 5974.91 samples/sec Loss 9.4964 LearningRate 0.1846 Epoch: 7 
Global Step: 80570 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:04:24,396-Speed 5959.42 samples/sec Loss 9.5056 LearningRate 0.1846 Epoch: 7 Global Step: 80580 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:04:31,256-Speed 5972.56 samples/sec Loss 9.4805 LearningRate 0.1846 Epoch: 7 Global Step: 80590 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:04:38,110-Speed 5976.94 samples/sec Loss 9.6440 LearningRate 0.1846 Epoch: 7 Global Step: 80600 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:04:44,995-Speed 5950.72 samples/sec Loss 9.5293 LearningRate 0.1845 Epoch: 7 Global Step: 80610 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:04:51,847-Speed 5978.70 samples/sec Loss 9.5175 LearningRate 0.1845 Epoch: 7 Global Step: 80620 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:04:58,727-Speed 5954.53 samples/sec Loss 9.5153 LearningRate 0.1845 Epoch: 7 Global Step: 80630 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:05:05,597-Speed 5963.46 samples/sec Loss 9.5230 LearningRate 0.1844 Epoch: 7 Global Step: 80640 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:05:12,453-Speed 5974.86 samples/sec Loss 9.4571 LearningRate 0.1844 Epoch: 7 Global Step: 80650 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:05:19,305-Speed 5978.72 samples/sec Loss 9.5047 LearningRate 0.1844 Epoch: 7 Global Step: 80660 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:05:26,173-Speed 5965.52 samples/sec Loss 9.4999 LearningRate 0.1844 Epoch: 7 Global Step: 80670 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:05:33,024-Speed 5979.34 samples/sec Loss 9.5124 LearningRate 0.1843 Epoch: 7 Global Step: 80680 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:05:39,880-Speed 5975.19 samples/sec Loss 9.5498 LearningRate 0.1843 Epoch: 7 Global Step: 80690 Fp16 Grad Scale: 131072 
Required: 25 hours Training: 2022-01-08 12:05:46,750-Speed 5964.15 samples/sec Loss 9.4766 LearningRate 0.1843 Epoch: 7 Global Step: 80700 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:05:53,618-Speed 5964.68 samples/sec Loss 9.5756 LearningRate 0.1842 Epoch: 7 Global Step: 80710 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:06:00,478-Speed 5971.36 samples/sec Loss 9.4559 LearningRate 0.1842 Epoch: 7 Global Step: 80720 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:06:07,323-Speed 5984.92 samples/sec Loss 9.4780 LearningRate 0.1842 Epoch: 7 Global Step: 80730 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:06:14,163-Speed 5988.94 samples/sec Loss 9.4690 LearningRate 0.1842 Epoch: 7 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:06:21,057-Speed 5943.59 samples/sec Loss 9.5478 LearningRate 0.1841 Epoch: 7 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:06:27,913-Speed 5974.73 samples/sec Loss 9.5014 LearningRate 0.1841 Epoch: 7 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:06:34,791-Speed 5956.58 samples/sec Loss 9.5687 LearningRate 0.1841 Epoch: 7 Global Step: 80770 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:06:41,638-Speed 5983.33 samples/sec Loss 9.4663 LearningRate 0.1840 Epoch: 7 Global Step: 80780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:06:48,501-Speed 5969.43 samples/sec Loss 9.5124 LearningRate 0.1840 Epoch: 7 Global Step: 80790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:06:55,383-Speed 5952.24 samples/sec Loss 9.5592 LearningRate 0.1840 Epoch: 7 Global Step: 80800 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:07:02,237-Speed 5977.31 samples/sec Loss 9.3976 LearningRate 0.1840 Epoch: 7 Global Step: 80810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 
12:07:09,088-Speed 5980.05 samples/sec Loss 9.4367 LearningRate 0.1839 Epoch: 7 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:07:15,949-Speed 5973.58 samples/sec Loss 9.4167 LearningRate 0.1839 Epoch: 7 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:07:22,813-Speed 5968.46 samples/sec Loss 9.4742 LearningRate 0.1839 Epoch: 7 Global Step: 80840 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:07:29,702-Speed 5946.30 samples/sec Loss 9.5133 LearningRate 0.1838 Epoch: 7 Global Step: 80850 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:07:36,557-Speed 5976.95 samples/sec Loss 9.4694 LearningRate 0.1838 Epoch: 7 Global Step: 80860 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:07:43,430-Speed 5960.51 samples/sec Loss 9.4055 LearningRate 0.1838 Epoch: 7 Global Step: 80870 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:07:50,284-Speed 5979.27 samples/sec Loss 9.4394 LearningRate 0.1837 Epoch: 7 Global Step: 80880 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:07:57,144-Speed 5972.18 samples/sec Loss 9.4653 LearningRate 0.1837 Epoch: 7 Global Step: 80890 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:08:04,017-Speed 5960.29 samples/sec Loss 9.4052 LearningRate 0.1837 Epoch: 7 Global Step: 80900 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:08:10,868-Speed 5979.99 samples/sec Loss 9.4659 LearningRate 0.1837 Epoch: 7 Global Step: 80910 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:08:17,741-Speed 5959.82 samples/sec Loss 9.4983 LearningRate 0.1836 Epoch: 7 Global Step: 80920 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:08:24,654-Speed 5926.12 samples/sec Loss 9.4641 LearningRate 0.1836 Epoch: 7 Global Step: 80930 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:08:31,548-Speed 5942.87 samples/sec Loss 
9.4927 LearningRate 0.1836 Epoch: 7 Global Step: 80940 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:08:38,426-Speed 5956.79 samples/sec Loss 9.5120 LearningRate 0.1835 Epoch: 7 Global Step: 80950 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:08:45,293-Speed 5965.86 samples/sec Loss 9.4999 LearningRate 0.1835 Epoch: 7 Global Step: 80960 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:08:52,161-Speed 5965.03 samples/sec Loss 9.4568 LearningRate 0.1835 Epoch: 7 Global Step: 80970 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:08:59,027-Speed 5966.91 samples/sec Loss 9.4052 LearningRate 0.1835 Epoch: 7 Global Step: 80980 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:09:05,934-Speed 5931.38 samples/sec Loss 9.4150 LearningRate 0.1834 Epoch: 7 Global Step: 80990 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:09:12,822-Speed 5948.10 samples/sec Loss 9.4296 LearningRate 0.1834 Epoch: 7 Global Step: 81000 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:09:19,672-Speed 5980.71 samples/sec Loss 9.5394 LearningRate 0.1834 Epoch: 7 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:09:26,542-Speed 5963.98 samples/sec Loss 9.4532 LearningRate 0.1833 Epoch: 7 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:09:33,403-Speed 5971.36 samples/sec Loss 9.4439 LearningRate 0.1833 Epoch: 7 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:09:40,288-Speed 5950.19 samples/sec Loss 9.4789 LearningRate 0.1833 Epoch: 7 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:09:47,165-Speed 5957.41 samples/sec Loss 9.5030 LearningRate 0.1833 Epoch: 7 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:09:54,031-Speed 5967.88 samples/sec Loss 9.4417 LearningRate 0.1832 Epoch: 7 Global Step: 
81060 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:10:00,885-Speed 5976.76 samples/sec Loss 9.4316 LearningRate 0.1832 Epoch: 7 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:10:07,877-Speed 5859.85 samples/sec Loss 9.4534 LearningRate 0.1832 Epoch: 7 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:10:14,848-Speed 5876.74 samples/sec Loss 9.4572 LearningRate 0.1831 Epoch: 7 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:10:21,770-Speed 5918.51 samples/sec Loss 9.4684 LearningRate 0.1831 Epoch: 7 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:10:28,660-Speed 5946.96 samples/sec Loss 9.4856 LearningRate 0.1831 Epoch: 7 Global Step: 81110 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:10:35,528-Speed 5964.88 samples/sec Loss 9.4176 LearningRate 0.1831 Epoch: 7 Global Step: 81120 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:10:42,408-Speed 5954.29 samples/sec Loss 9.4307 LearningRate 0.1830 Epoch: 7 Global Step: 81130 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:10:49,257-Speed 5982.09 samples/sec Loss 9.3939 LearningRate 0.1830 Epoch: 7 Global Step: 81140 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:10:56,106-Speed 5981.23 samples/sec Loss 9.4714 LearningRate 0.1830 Epoch: 7 Global Step: 81150 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:11:02,969-Speed 5968.85 samples/sec Loss 9.4783 LearningRate 0.1829 Epoch: 7 Global Step: 81160 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:11:09,830-Speed 5971.03 samples/sec Loss 9.4714 LearningRate 0.1829 Epoch: 7 Global Step: 81170 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:11:16,702-Speed 5962.06 samples/sec Loss 9.5003 LearningRate 0.1829 Epoch: 7 Global Step: 81180 Fp16 Grad Scale: 131072 Required: 25 hours 
Training: 2022-01-08 12:11:23,590-Speed 5947.72 samples/sec Loss 9.4677 LearningRate 0.1828 Epoch: 7 Global Step: 81190 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:11:30,477-Speed 5949.38 samples/sec Loss 9.4692 LearningRate 0.1828 Epoch: 7 Global Step: 81200 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:11:37,362-Speed 5949.92 samples/sec Loss 9.4539 LearningRate 0.1828 Epoch: 7 Global Step: 81210 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:11:44,259-Speed 5939.53 samples/sec Loss 9.4942 LearningRate 0.1828 Epoch: 7 Global Step: 81220 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:11:51,196-Speed 5905.98 samples/sec Loss 9.3586 LearningRate 0.1827 Epoch: 7 Global Step: 81230 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:11:58,056-Speed 5974.26 samples/sec Loss 9.3721 LearningRate 0.1827 Epoch: 7 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:04,910-Speed 5977.05 samples/sec Loss 9.5127 LearningRate 0.1827 Epoch: 7 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:11,753-Speed 5986.80 samples/sec Loss 9.5181 LearningRate 0.1826 Epoch: 7 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:18,598-Speed 5985.79 samples/sec Loss 9.3521 LearningRate 0.1826 Epoch: 7 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:25,462-Speed 5968.76 samples/sec Loss 9.4581 LearningRate 0.1826 Epoch: 7 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:32,321-Speed 5972.34 samples/sec Loss 9.3900 LearningRate 0.1826 Epoch: 7 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:39,187-Speed 5966.96 samples/sec Loss 9.4388 LearningRate 0.1825 Epoch: 7 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:46,062-Speed 5958.53 
samples/sec Loss 9.4019 LearningRate 0.1825 Epoch: 7 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:52,925-Speed 5969.60 samples/sec Loss 9.4093 LearningRate 0.1825 Epoch: 7 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:12:59,779-Speed 5977.79 samples/sec Loss 9.4510 LearningRate 0.1824 Epoch: 7 Global Step: 81330 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:13:06,637-Speed 5973.39 samples/sec Loss 9.4259 LearningRate 0.1824 Epoch: 7 Global Step: 81340 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:13:13,482-Speed 5985.05 samples/sec Loss 9.4301 LearningRate 0.1824 Epoch: 7 Global Step: 81350 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:13:20,335-Speed 5976.97 samples/sec Loss 9.4003 LearningRate 0.1824 Epoch: 7 Global Step: 81360 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:13:27,215-Speed 5955.20 samples/sec Loss 9.4481 LearningRate 0.1823 Epoch: 7 Global Step: 81370 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:13:34,071-Speed 5976.27 samples/sec Loss 9.3942 LearningRate 0.1823 Epoch: 7 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:13:40,937-Speed 5966.40 samples/sec Loss 9.4540 LearningRate 0.1823 Epoch: 7 Global Step: 81390 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:13:47,791-Speed 5977.82 samples/sec Loss 9.4445 LearningRate 0.1822 Epoch: 7 Global Step: 81400 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:13:54,643-Speed 5978.11 samples/sec Loss 9.4403 LearningRate 0.1822 Epoch: 7 Global Step: 81410 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:14:01,490-Speed 5982.56 samples/sec Loss 9.4523 LearningRate 0.1822 Epoch: 7 Global Step: 81420 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:14:08,455-Speed 5882.31 samples/sec Loss 9.3611 LearningRate 0.1822 
Epoch: 7 Global Step: 81430 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:14:15,316-Speed 5971.22 samples/sec Loss 9.4809 LearningRate 0.1821 Epoch: 7 Global Step: 81440 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:14:22,176-Speed 5971.64 samples/sec Loss 9.5191 LearningRate 0.1821 Epoch: 7 Global Step: 81450 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:14:29,061-Speed 5950.20 samples/sec Loss 9.4071 LearningRate 0.1821 Epoch: 7 Global Step: 81460 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:14:35,939-Speed 5956.49 samples/sec Loss 9.3883 LearningRate 0.1820 Epoch: 7 Global Step: 81470 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:14:42,812-Speed 5961.05 samples/sec Loss 9.3989 LearningRate 0.1820 Epoch: 7 Global Step: 81480 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:14:49,703-Speed 5945.02 samples/sec Loss 9.4736 LearningRate 0.1820 Epoch: 7 Global Step: 81490 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:14:56,568-Speed 5967.87 samples/sec Loss 9.3818 LearningRate 0.1820 Epoch: 7 Global Step: 81500 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:15:03,438-Speed 5963.35 samples/sec Loss 9.3853 LearningRate 0.1819 Epoch: 7 Global Step: 81510 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:15:10,297-Speed 5972.66 samples/sec Loss 9.3260 LearningRate 0.1819 Epoch: 7 Global Step: 81520 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:15:17,151-Speed 5978.10 samples/sec Loss 9.3524 LearningRate 0.1819 Epoch: 7 Global Step: 81530 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:15:24,000-Speed 5981.01 samples/sec Loss 9.4105 LearningRate 0.1818 Epoch: 7 Global Step: 81540 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:15:30,875-Speed 5958.50 samples/sec Loss 9.5040 LearningRate 0.1818 Epoch: 7 Global Step: 81550 Fp16 Grad 
Scale: 131072 Required: 25 hours Training: 2022-01-08 12:15:37,739-Speed 5968.31 samples/sec Loss 9.4766 LearningRate 0.1818 Epoch: 7 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:15:44,589-Speed 5981.00 samples/sec Loss 9.5059 LearningRate 0.1817 Epoch: 7 Global Step: 81570 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:15:51,457-Speed 5964.84 samples/sec Loss 9.4718 LearningRate 0.1817 Epoch: 7 Global Step: 81580 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:15:58,310-Speed 5977.64 samples/sec Loss 9.4507 LearningRate 0.1817 Epoch: 7 Global Step: 81590 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:16:05,161-Speed 5979.65 samples/sec Loss 9.4515 LearningRate 0.1817 Epoch: 7 Global Step: 81600 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:16:12,015-Speed 5977.54 samples/sec Loss 9.4610 LearningRate 0.1816 Epoch: 7 Global Step: 81610 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:16:18,869-Speed 5978.29 samples/sec Loss 9.4023 LearningRate 0.1816 Epoch: 7 Global Step: 81620 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:16:25,738-Speed 5964.26 samples/sec Loss 9.4317 LearningRate 0.1816 Epoch: 7 Global Step: 81630 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:16:32,593-Speed 5978.30 samples/sec Loss 9.4201 LearningRate 0.1815 Epoch: 7 Global Step: 81640 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:16:39,457-Speed 5968.40 samples/sec Loss 9.3999 LearningRate 0.1815 Epoch: 7 Global Step: 81650 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:16:46,328-Speed 5962.93 samples/sec Loss 9.4291 LearningRate 0.1815 Epoch: 7 Global Step: 81660 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:16:53,184-Speed 5975.00 samples/sec Loss 9.3615 LearningRate 0.1815 Epoch: 7 Global Step: 81670 Fp16 Grad Scale: 131072 Required: 25 hours Training: 
2022-01-08 12:17:00,040-Speed 5975.79 samples/sec Loss 9.4143 LearningRate 0.1814 Epoch: 7 Global Step: 81680 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:17:06,889-Speed 5981.39 samples/sec Loss 9.4230 LearningRate 0.1814 Epoch: 7 Global Step: 81690 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:17:13,749-Speed 5973.77 samples/sec Loss 9.3473 LearningRate 0.1814 Epoch: 7 Global Step: 81700 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:17:20,603-Speed 5976.95 samples/sec Loss 9.4460 LearningRate 0.1813 Epoch: 7 Global Step: 81710 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:17:27,466-Speed 5969.06 samples/sec Loss 9.3885 LearningRate 0.1813 Epoch: 7 Global Step: 81720 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:17:34,319-Speed 5978.43 samples/sec Loss 9.4636 LearningRate 0.1813 Epoch: 7 Global Step: 81730 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:17:41,179-Speed 5972.42 samples/sec Loss 9.4892 LearningRate 0.1813 Epoch: 7 Global Step: 81740 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:17:48,020-Speed 5988.47 samples/sec Loss 9.4757 LearningRate 0.1812 Epoch: 7 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:17:54,869-Speed 5980.98 samples/sec Loss 9.4230 LearningRate 0.1812 Epoch: 7 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:01,720-Speed 5980.33 samples/sec Loss 9.3762 LearningRate 0.1812 Epoch: 7 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:08,574-Speed 5976.08 samples/sec Loss 9.4517 LearningRate 0.1811 Epoch: 7 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:15,432-Speed 5974.49 samples/sec Loss 9.4461 LearningRate 0.1811 Epoch: 7 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:22,370-Speed 5905.00 
samples/sec Loss 9.3623 LearningRate 0.1811 Epoch: 7 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:29,340-Speed 5877.51 samples/sec Loss 9.4836 LearningRate 0.1811 Epoch: 7 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:36,191-Speed 5980.26 samples/sec Loss 9.4028 LearningRate 0.1810 Epoch: 7 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:43,063-Speed 5961.79 samples/sec Loss 9.3505 LearningRate 0.1810 Epoch: 7 Global Step: 81830 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:49,926-Speed 5969.19 samples/sec Loss 9.3375 LearningRate 0.1810 Epoch: 7 Global Step: 81840 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:18:56,775-Speed 5981.64 samples/sec Loss 9.3848 LearningRate 0.1809 Epoch: 7 Global Step: 81850 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:03,659-Speed 5952.56 samples/sec Loss 9.4702 LearningRate 0.1809 Epoch: 7 Global Step: 81860 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:10,523-Speed 5968.44 samples/sec Loss 9.4133 LearningRate 0.1809 Epoch: 7 Global Step: 81870 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:17,383-Speed 5975.19 samples/sec Loss 9.3162 LearningRate 0.1809 Epoch: 7 Global Step: 81880 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:24,243-Speed 5971.82 samples/sec Loss 9.3981 LearningRate 0.1808 Epoch: 7 Global Step: 81890 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:31,097-Speed 5977.42 samples/sec Loss 9.4172 LearningRate 0.1808 Epoch: 7 Global Step: 81900 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:37,975-Speed 5956.45 samples/sec Loss 9.4302 LearningRate 0.1808 Epoch: 7 Global Step: 81910 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:44,830-Speed 5978.06 samples/sec Loss 9.4550 LearningRate 0.1807 
Epoch: 7 Global Step: 81920 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:51,730-Speed 5939.98 samples/sec Loss 9.4102 LearningRate 0.1807 Epoch: 7 Global Step: 81930 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:19:58,574-Speed 5985.58 samples/sec Loss 9.4177 LearningRate 0.1807 Epoch: 7 Global Step: 81940 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:20:05,428-Speed 5977.16 samples/sec Loss 9.4341 LearningRate 0.1807 Epoch: 7 Global Step: 81950 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:20:12,296-Speed 5965.64 samples/sec Loss 9.4667 LearningRate 0.1806 Epoch: 7 Global Step: 81960 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:20:19,162-Speed 5966.87 samples/sec Loss 9.4528 LearningRate 0.1806 Epoch: 7 Global Step: 81970 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:20:26,028-Speed 5966.80 samples/sec Loss 9.3555 LearningRate 0.1806 Epoch: 7 Global Step: 81980 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:20:32,891-Speed 5969.43 samples/sec Loss 9.3225 LearningRate 0.1805 Epoch: 7 Global Step: 81990 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:20:39,757-Speed 5967.18 samples/sec Loss 9.3505 LearningRate 0.1805 Epoch: 7 Global Step: 82000 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:20:46,698-Speed 5901.93 samples/sec Loss 9.4294 LearningRate 0.1805 Epoch: 7 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:20:53,575-Speed 5957.11 samples/sec Loss 9.4408 LearningRate 0.1805 Epoch: 7 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:21:00,437-Speed 5970.58 samples/sec Loss 9.2990 LearningRate 0.1804 Epoch: 7 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:21:07,291-Speed 5976.63 samples/sec Loss 9.3615 LearningRate 0.1804 Epoch: 7 Global Step: 82040 Fp16 Grad Scale: 
65536 Required: 25 hours Training: 2022-01-08 12:21:14,147-Speed 5974.98 samples/sec Loss 9.3890 LearningRate 0.1804 Epoch: 7 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:21:21,018-Speed 5962.42 samples/sec Loss 9.4276 LearningRate 0.1803 Epoch: 7 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:21:27,875-Speed 5974.61 samples/sec Loss 9.3292 LearningRate 0.1803 Epoch: 7 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:21:34,744-Speed 5963.95 samples/sec Loss 9.3866 LearningRate 0.1803 Epoch: 7 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:21:41,589-Speed 5984.34 samples/sec Loss 9.3424 LearningRate 0.1802 Epoch: 7 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:21:48,446-Speed 5975.25 samples/sec Loss 9.3827 LearningRate 0.1802 Epoch: 7 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:21:55,308-Speed 5970.06 samples/sec Loss 9.3495 LearningRate 0.1802 Epoch: 7 Global Step: 82110 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:22:02,166-Speed 5974.04 samples/sec Loss 9.3806 LearningRate 0.1802 Epoch: 7 Global Step: 82120 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:22:09,046-Speed 5954.08 samples/sec Loss 9.3842 LearningRate 0.1801 Epoch: 7 Global Step: 82130 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:22:15,910-Speed 5969.25 samples/sec Loss 9.3007 LearningRate 0.1801 Epoch: 7 Global Step: 82140 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:22:22,758-Speed 5982.04 samples/sec Loss 9.3782 LearningRate 0.1801 Epoch: 7 Global Step: 82150 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:22:29,611-Speed 5978.40 samples/sec Loss 9.3882 LearningRate 0.1800 Epoch: 7 Global Step: 82160 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 
12:22:36,472-Speed 5971.13 samples/sec Loss 9.3653 LearningRate 0.1800 Epoch: 7 Global Step: 82170 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:22:43,336-Speed 5969.37 samples/sec Loss 9.4015 LearningRate 0.1800 Epoch: 7 Global Step: 82180 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:22:50,194-Speed 5973.36 samples/sec Loss 9.3018 LearningRate 0.1800 Epoch: 7 Global Step: 82190 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:22:57,067-Speed 5960.81 samples/sec Loss 9.3863 LearningRate 0.1799 Epoch: 7 Global Step: 82200 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:23:03,973-Speed 5932.74 samples/sec Loss 9.3903 LearningRate 0.1799 Epoch: 7 Global Step: 82210 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:23:10,878-Speed 5932.88 samples/sec Loss 9.3690 LearningRate 0.1799 Epoch: 7 Global Step: 82220 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:23:17,754-Speed 5960.55 samples/sec Loss 9.4583 LearningRate 0.1798 Epoch: 7 Global Step: 82230 Fp16 Grad Scale: 262144 Required: 25 hours Training: 2022-01-08 12:23:24,604-Speed 5980.24 samples/sec Loss 9.4296 LearningRate 0.1798 Epoch: 7 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:23:31,447-Speed 5987.41 samples/sec Loss 9.3510 LearningRate 0.1798 Epoch: 7 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:23:38,326-Speed 5955.18 samples/sec Loss 9.3413 LearningRate 0.1798 Epoch: 7 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:23:45,185-Speed 5973.07 samples/sec Loss 9.3923 LearningRate 0.1797 Epoch: 7 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:23:52,044-Speed 5972.18 samples/sec Loss 9.3903 LearningRate 0.1797 Epoch: 7 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:23:58,889-Speed 5985.09 samples/sec Loss 
9.3929 LearningRate 0.1797 Epoch: 7 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:24:05,762-Speed 5961.16 samples/sec Loss 9.3719 LearningRate 0.1796 Epoch: 7 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:24:12,625-Speed 5969.54 samples/sec Loss 9.4120 LearningRate 0.1796 Epoch: 7 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:24:19,498-Speed 5960.98 samples/sec Loss 9.3064 LearningRate 0.1796 Epoch: 7 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:24:26,351-Speed 5977.83 samples/sec Loss 9.3345 LearningRate 0.1796 Epoch: 7 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 12:24:33,196-Speed 5985.17 samples/sec Loss 9.4427 LearningRate 0.1795 Epoch: 7 Global Step: 82340 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:24:40,047-Speed 5979.84 samples/sec Loss 9.3771 LearningRate 0.1795 Epoch: 7 Global Step: 82350 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:24:46,899-Speed 5981.63 samples/sec Loss 9.3438 LearningRate 0.1795 Epoch: 7 Global Step: 82360 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:24:53,763-Speed 5968.70 samples/sec Loss 9.4032 LearningRate 0.1794 Epoch: 7 Global Step: 82370 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:25:00,607-Speed 5985.35 samples/sec Loss 9.3591 LearningRate 0.1794 Epoch: 7 Global Step: 82380 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:25:07,463-Speed 5975.64 samples/sec Loss 9.3669 LearningRate 0.1794 Epoch: 7 Global Step: 82390 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 12:25:14,321-Speed 5974.00 samples/sec Loss 9.3405 LearningRate 0.1794 Epoch: 7 Global Step: 82400 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:25:21,172-Speed 5979.10 samples/sec Loss 9.3519 LearningRate 0.1793 Epoch: 7 Global Step: 
82410 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:25:28,023-Speed 5980.36 samples/sec Loss 9.3753 LearningRate 0.1793 Epoch: 7 Global Step: 82420 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:25:34,883-Speed 5973.48 samples/sec Loss 9.3377 LearningRate 0.1793 Epoch: 7 Global Step: 82430 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:25:41,741-Speed 5973.18 samples/sec Loss 9.4138 LearningRate 0.1792 Epoch: 7 Global Step: 82440 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:25:48,589-Speed 5982.80 samples/sec Loss 9.4312 LearningRate 0.1792 Epoch: 7 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:25:55,463-Speed 5959.69 samples/sec Loss 9.4258 LearningRate 0.1792 Epoch: 7 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:26:02,315-Speed 5979.10 samples/sec Loss 9.4408 LearningRate 0.1792 Epoch: 7 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:26:09,183-Speed 5964.24 samples/sec Loss 9.3987 LearningRate 0.1791 Epoch: 7 Global Step: 82480 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:26:16,044-Speed 5971.51 samples/sec Loss 9.3713 LearningRate 0.1791 Epoch: 7 Global Step: 82490 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:26:22,894-Speed 5980.99 samples/sec Loss 9.4469 LearningRate 0.1791 Epoch: 7 Global Step: 82500 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:26:29,747-Speed 5978.00 samples/sec Loss 9.2891 LearningRate 0.1790 Epoch: 7 Global Step: 82510 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:26:36,591-Speed 5985.58 samples/sec Loss 9.3214 LearningRate 0.1790 Epoch: 7 Global Step: 82520 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:26:43,448-Speed 5975.19 samples/sec Loss 9.3626 LearningRate 0.1790 Epoch: 7 Global Step: 82530 Fp16 Grad Scale: 65536 Required: 24 hours 
Training: 2022-01-08 12:26:50,300-Speed 5978.30 samples/sec Loss 9.3128 LearningRate 0.1790 Epoch: 7 Global Step: 82540 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:26:57,151-Speed 5980.01 samples/sec Loss 9.3767 LearningRate 0.1789 Epoch: 7 Global Step: 82550 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:04,027-Speed 5957.48 samples/sec Loss 9.3289 LearningRate 0.1789 Epoch: 7 Global Step: 82560 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:10,887-Speed 5972.49 samples/sec Loss 9.3703 LearningRate 0.1789 Epoch: 7 Global Step: 82570 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:17,775-Speed 5947.32 samples/sec Loss 9.3494 LearningRate 0.1788 Epoch: 7 Global Step: 82580 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:24,626-Speed 5979.99 samples/sec Loss 9.2729 LearningRate 0.1788 Epoch: 7 Global Step: 82590 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:31,477-Speed 5980.10 samples/sec Loss 9.3508 LearningRate 0.1788 Epoch: 7 Global Step: 82600 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:38,341-Speed 5968.79 samples/sec Loss 9.3023 LearningRate 0.1788 Epoch: 7 Global Step: 82610 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:45,198-Speed 5973.97 samples/sec Loss 9.3243 LearningRate 0.1787 Epoch: 7 Global Step: 82620 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:52,048-Speed 5980.80 samples/sec Loss 9.3755 LearningRate 0.1787 Epoch: 7 Global Step: 82630 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:27:58,918-Speed 5963.72 samples/sec Loss 9.3797 LearningRate 0.1787 Epoch: 7 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:28:05,774-Speed 5974.72 samples/sec Loss 9.4342 LearningRate 0.1786 Epoch: 7 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:28:12,635-Speed 
5971.67 samples/sec Loss 9.4006 LearningRate 0.1786 Epoch: 7 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:28:19,491-Speed 5975.72 samples/sec Loss 9.3303 LearningRate 0.1786 Epoch: 7 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:28:26,362-Speed 5962.02 samples/sec Loss 9.3459 LearningRate 0.1786 Epoch: 7 Global Step: 82680 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:28:33,226-Speed 5968.27 samples/sec Loss 9.3757 LearningRate 0.1785 Epoch: 7 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:28:40,095-Speed 5964.20 samples/sec Loss 9.3725 LearningRate 0.1785 Epoch: 7 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:28:46,955-Speed 5972.09 samples/sec Loss 9.3575 LearningRate 0.1785 Epoch: 7 Global Step: 82710 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:28:53,823-Speed 5965.55 samples/sec Loss 9.3783 LearningRate 0.1784 Epoch: 7 Global Step: 82720 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:00,666-Speed 5987.07 samples/sec Loss 9.3288 LearningRate 0.1784 Epoch: 7 Global Step: 82730 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:07,514-Speed 5982.95 samples/sec Loss 9.3107 LearningRate 0.1784 Epoch: 7 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:14,395-Speed 5955.27 samples/sec Loss 9.4096 LearningRate 0.1784 Epoch: 7 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:21,248-Speed 5980.18 samples/sec Loss 9.2857 LearningRate 0.1783 Epoch: 7 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:28,100-Speed 5978.79 samples/sec Loss 9.3808 LearningRate 0.1783 Epoch: 7 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:34,949-Speed 5981.16 samples/sec Loss 9.4254 LearningRate 0.1783 
Epoch: 7 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:41,801-Speed 5979.14 samples/sec Loss 9.3201 LearningRate 0.1782 Epoch: 7 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:48,696-Speed 5942.56 samples/sec Loss 9.3382 LearningRate 0.1782 Epoch: 7 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:29:55,545-Speed 5981.34 samples/sec Loss 9.3581 LearningRate 0.1782 Epoch: 7 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:30:02,395-Speed 5980.70 samples/sec Loss 9.3504 LearningRate 0.1782 Epoch: 7 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:30:09,242-Speed 5983.31 samples/sec Loss 9.3402 LearningRate 0.1781 Epoch: 7 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:30:16,114-Speed 5961.85 samples/sec Loss 9.3632 LearningRate 0.1781 Epoch: 7 Global Step: 82840 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:30:22,985-Speed 5962.01 samples/sec Loss 9.3331 LearningRate 0.1781 Epoch: 7 Global Step: 82850 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:30:29,850-Speed 5967.74 samples/sec Loss 9.3210 LearningRate 0.1780 Epoch: 7 Global Step: 82860 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:30:36,710-Speed 5972.23 samples/sec Loss 9.3947 LearningRate 0.1780 Epoch: 7 Global Step: 82870 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:30:43,577-Speed 5966.35 samples/sec Loss 9.3186 LearningRate 0.1780 Epoch: 7 Global Step: 82880 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:30:50,430-Speed 5978.01 samples/sec Loss 9.3535 LearningRate 0.1780 Epoch: 7 Global Step: 82890 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:30:57,281-Speed 5980.25 samples/sec Loss 9.2777 LearningRate 0.1779 Epoch: 7 Global Step: 82900 Fp16 Grad Scale: 
131072 Required: 24 hours Training: 2022-01-08 12:31:04,179-Speed 5939.22 samples/sec Loss 9.3568 LearningRate 0.1779 Epoch: 7 Global Step: 82910 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:31:11,039-Speed 5971.38 samples/sec Loss 9.3770 LearningRate 0.1779 Epoch: 7 Global Step: 82920 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:31:17,888-Speed 5981.81 samples/sec Loss 9.3649 LearningRate 0.1778 Epoch: 7 Global Step: 82930 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:31:24,745-Speed 5978.11 samples/sec Loss 9.3819 LearningRate 0.1778 Epoch: 7 Global Step: 82940 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:31:31,603-Speed 5973.16 samples/sec Loss 9.3072 LearningRate 0.1778 Epoch: 7 Global Step: 82950 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:31:56,699-Speed 1632.45 samples/sec Loss 9.3326 LearningRate 0.1778 Epoch: 8 Global Step: 82960 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:32:03,510-Speed 6015.42 samples/sec Loss 9.3149 LearningRate 0.1777 Epoch: 8 Global Step: 82970 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:32:10,322-Speed 6013.51 samples/sec Loss 9.3662 LearningRate 0.1777 Epoch: 8 Global Step: 82980 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:32:17,153-Speed 5997.11 samples/sec Loss 9.3140 LearningRate 0.1777 Epoch: 8 Global Step: 82990 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:32:24,008-Speed 5977.07 samples/sec Loss 9.3551 LearningRate 0.1776 Epoch: 8 Global Step: 83000 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:32:30,858-Speed 5980.74 samples/sec Loss 9.3382 LearningRate 0.1776 Epoch: 8 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:32:37,730-Speed 5961.53 samples/sec Loss 9.3018 LearningRate 0.1776 Epoch: 8 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 24 hours Training: 
2022-01-08 12:32:44,632-Speed 5936.56 samples/sec Loss 9.2431 LearningRate 0.1776 Epoch: 8 Global Step: 83030 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:32:51,526-Speed 5941.98 samples/sec Loss 9.2942 LearningRate 0.1775 Epoch: 8 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:32:58,418-Speed 5944.63 samples/sec Loss 9.3675 LearningRate 0.1775 Epoch: 8 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:33:06,180-Speed 5278.27 samples/sec Loss 9.3641 LearningRate 0.1775 Epoch: 8 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:33:13,057-Speed 5956.97 samples/sec Loss 9.2384 LearningRate 0.1774 Epoch: 8 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:33:19,935-Speed 5956.37 samples/sec Loss 9.3371 LearningRate 0.1774 Epoch: 8 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:33:26,850-Speed 5925.13 samples/sec Loss 9.3046 LearningRate 0.1774 Epoch: 8 Global Step: 83090 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:33:33,717-Speed 5966.00 samples/sec Loss 9.2502 LearningRate 0.1774 Epoch: 8 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:33:40,590-Speed 5960.99 samples/sec Loss 9.3197 LearningRate 0.1773 Epoch: 8 Global Step: 83110 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:33:47,463-Speed 5960.52 samples/sec Loss 9.3337 LearningRate 0.1773 Epoch: 8 Global Step: 83120 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:33:54,306-Speed 5987.07 samples/sec Loss 9.2741 LearningRate 0.1773 Epoch: 8 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:01,162-Speed 5975.34 samples/sec Loss 9.3103 LearningRate 0.1772 Epoch: 8 Global Step: 83140 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:08,044-Speed 5953.00 samples/sec 
Loss 9.3321 LearningRate 0.1772 Epoch: 8 Global Step: 83150 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:14,894-Speed 5980.58 samples/sec Loss 9.3517 LearningRate 0.1772 Epoch: 8 Global Step: 83160 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:21,741-Speed 5983.41 samples/sec Loss 9.3512 LearningRate 0.1772 Epoch: 8 Global Step: 83170 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:28,604-Speed 5969.52 samples/sec Loss 9.3126 LearningRate 0.1771 Epoch: 8 Global Step: 83180 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:35,496-Speed 5944.03 samples/sec Loss 9.3281 LearningRate 0.1771 Epoch: 8 Global Step: 83190 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:42,341-Speed 5984.81 samples/sec Loss 9.2284 LearningRate 0.1771 Epoch: 8 Global Step: 83200 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:49,203-Speed 5970.96 samples/sec Loss 9.3418 LearningRate 0.1770 Epoch: 8 Global Step: 83210 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:34:56,067-Speed 5967.92 samples/sec Loss 9.3135 LearningRate 0.1770 Epoch: 8 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:35:02,920-Speed 5978.42 samples/sec Loss 9.3041 LearningRate 0.1770 Epoch: 8 Global Step: 83230 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:35:12,005-Speed 4509.36 samples/sec Loss 9.2651 LearningRate 0.1770 Epoch: 8 Global Step: 83240 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:35:18,855-Speed 5980.64 samples/sec Loss 9.3013 LearningRate 0.1769 Epoch: 8 Global Step: 83250 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:35:25,709-Speed 5978.11 samples/sec Loss 9.2488 LearningRate 0.1769 Epoch: 8 Global Step: 83260 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:35:32,569-Speed 5971.86 samples/sec Loss 9.2812 LearningRate 0.1769 Epoch: 8 Global 
Step: 83270 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:35:39,438-Speed 5967.50 samples/sec Loss 9.3236 LearningRate 0.1768 Epoch: 8 Global Step: 83280 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:35:46,291-Speed 5977.83 samples/sec Loss 9.3103 LearningRate 0.1768 Epoch: 8 Global Step: 83290 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:35:53,146-Speed 5975.98 samples/sec Loss 9.2511 LearningRate 0.1768 Epoch: 8 Global Step: 83300 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:36:00,006-Speed 5972.54 samples/sec Loss 9.3437 LearningRate 0.1768 Epoch: 8 Global Step: 83310 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:36:06,860-Speed 5977.13 samples/sec Loss 9.3050 LearningRate 0.1767 Epoch: 8 Global Step: 83320 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:36:13,722-Speed 5970.37 samples/sec Loss 9.3420 LearningRate 0.1767 Epoch: 8 Global Step: 83330 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:36:20,585-Speed 5968.98 samples/sec Loss 9.3579 LearningRate 0.1767 Epoch: 8 Global Step: 83340 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:36:27,441-Speed 5976.28 samples/sec Loss 9.3557 LearningRate 0.1766 Epoch: 8 Global Step: 83350 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:36:34,306-Speed 5966.95 samples/sec Loss 9.3796 LearningRate 0.1766 Epoch: 8 Global Step: 83360 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:36:41,162-Speed 5975.55 samples/sec Loss 9.2559 LearningRate 0.1766 Epoch: 8 Global Step: 83370 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:36:48,015-Speed 5980.79 samples/sec Loss 9.3300 LearningRate 0.1766 Epoch: 8 Global Step: 83380 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:36:54,861-Speed 5984.22 samples/sec Loss 9.3196 LearningRate 0.1765 Epoch: 8 Global Step: 83390 Fp16 Grad Scale: 131072 
Required: 24 hours Training: 2022-01-08 12:37:01,714-Speed 5978.20 samples/sec Loss 9.3146 LearningRate 0.1765 Epoch: 8 Global Step: 83400 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:37:08,587-Speed 5961.02 samples/sec Loss 9.3554 LearningRate 0.1765 Epoch: 8 Global Step: 83410 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:37:15,447-Speed 5971.36 samples/sec Loss 9.3322 LearningRate 0.1764 Epoch: 8 Global Step: 83420 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:37:22,353-Speed 5932.40 samples/sec Loss 9.3520 LearningRate 0.1764 Epoch: 8 Global Step: 83430 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:37:29,290-Speed 5905.86 samples/sec Loss 9.2462 LearningRate 0.1764 Epoch: 8 Global Step: 83440 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:37:36,141-Speed 5979.52 samples/sec Loss 9.3202 LearningRate 0.1764 Epoch: 8 Global Step: 83450 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:37:43,014-Speed 5960.96 samples/sec Loss 9.2772 LearningRate 0.1763 Epoch: 8 Global Step: 83460 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:37:49,869-Speed 5977.04 samples/sec Loss 9.2625 LearningRate 0.1763 Epoch: 8 Global Step: 83470 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:37:56,721-Speed 5978.78 samples/sec Loss 9.2800 LearningRate 0.1763 Epoch: 8 Global Step: 83480 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:38:03,586-Speed 5968.39 samples/sec Loss 9.2925 LearningRate 0.1762 Epoch: 8 Global Step: 83490 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:38:10,447-Speed 5971.11 samples/sec Loss 9.3398 LearningRate 0.1762 Epoch: 8 Global Step: 83500 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:38:17,305-Speed 5973.00 samples/sec Loss 9.2845 LearningRate 0.1762 Epoch: 8 Global Step: 83510 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 
12:38:24,154-Speed 5982.07 samples/sec Loss 9.2797 LearningRate 0.1762 Epoch: 8 Global Step: 83520 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:38:31,007-Speed 5980.31 samples/sec Loss 9.3045 LearningRate 0.1761 Epoch: 8 Global Step: 83530 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:38:37,871-Speed 5968.00 samples/sec Loss 9.3594 LearningRate 0.1761 Epoch: 8 Global Step: 83540 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:38:44,725-Speed 5977.22 samples/sec Loss 9.2563 LearningRate 0.1761 Epoch: 8 Global Step: 83550 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:38:51,585-Speed 5971.89 samples/sec Loss 9.3222 LearningRate 0.1760 Epoch: 8 Global Step: 83560 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:38:58,471-Speed 5949.46 samples/sec Loss 9.2670 LearningRate 0.1760 Epoch: 8 Global Step: 83570 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:39:05,319-Speed 5982.22 samples/sec Loss 9.2776 LearningRate 0.1760 Epoch: 8 Global Step: 83580 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:39:12,196-Speed 5959.24 samples/sec Loss 9.2889 LearningRate 0.1760 Epoch: 8 Global Step: 83590 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:39:19,346-Speed 5729.42 samples/sec Loss 9.2587 LearningRate 0.1759 Epoch: 8 Global Step: 83600 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:39:26,213-Speed 5965.82 samples/sec Loss 9.3014 LearningRate 0.1759 Epoch: 8 Global Step: 83610 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:39:33,041-Speed 5999.70 samples/sec Loss 9.2892 LearningRate 0.1759 Epoch: 8 Global Step: 83620 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:39:39,898-Speed 5974.09 samples/sec Loss 9.3063 LearningRate 0.1758 Epoch: 8 Global Step: 83630 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:39:46,801-Speed 5935.17 samples/sec Loss 
9.3307 LearningRate 0.1758 Epoch: 8 Global Step: 83640 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:39:53,696-Speed 5941.61 samples/sec Loss 9.3232 LearningRate 0.1758 Epoch: 8 Global Step: 83650 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:40:00,551-Speed 5976.26 samples/sec Loss 9.3208 LearningRate 0.1758 Epoch: 8 Global Step: 83660 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:40:07,409-Speed 5974.55 samples/sec Loss 9.2937 LearningRate 0.1757 Epoch: 8 Global Step: 83670 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:40:14,266-Speed 5974.98 samples/sec Loss 9.2890 LearningRate 0.1757 Epoch: 8 Global Step: 83680 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:40:21,143-Speed 5957.05 samples/sec Loss 9.3312 LearningRate 0.1757 Epoch: 8 Global Step: 83690 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:40:28,009-Speed 5966.44 samples/sec Loss 9.2508 LearningRate 0.1756 Epoch: 8 Global Step: 83700 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:40:34,863-Speed 5977.74 samples/sec Loss 9.3031 LearningRate 0.1756 Epoch: 8 Global Step: 83710 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-01-08 12:40:41,715-Speed 5978.76 samples/sec Loss 9.2568 LearningRate 0.1756 Epoch: 8 Global Step: 83720 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:40:48,572-Speed 5974.71 samples/sec Loss 9.2891 LearningRate 0.1756 Epoch: 8 Global Step: 83730 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:40:55,435-Speed 5971.74 samples/sec Loss 9.2838 LearningRate 0.1755 Epoch: 8 Global Step: 83740 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:41:02,318-Speed 5952.33 samples/sec Loss 9.2026 LearningRate 0.1755 Epoch: 8 Global Step: 83750 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:41:09,180-Speed 5970.21 samples/sec Loss 9.2620 LearningRate 0.1755 Epoch: 8 Global Step: 83760 
Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:41:16,049-Speed 5964.19 samples/sec Loss 9.2606 LearningRate 0.1754 Epoch: 8 Global Step: 83770 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:41:22,940-Speed 5944.24 samples/sec Loss 9.1789 LearningRate 0.1754 Epoch: 8 Global Step: 83780 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:41:29,821-Speed 5954.05 samples/sec Loss 9.3218 LearningRate 0.1754 Epoch: 8 Global Step: 83790 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:41:36,691-Speed 5966.71 samples/sec Loss 9.2382 LearningRate 0.1754 Epoch: 8 Global Step: 83800 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:41:43,580-Speed 5946.01 samples/sec Loss 9.2551 LearningRate 0.1753 Epoch: 8 Global Step: 83810 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 12:41:50,439-Speed 5973.80 samples/sec Loss 9.3349 LearningRate 0.1753 Epoch: 8 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:41:57,296-Speed 5975.24 samples/sec Loss 9.2444 LearningRate 0.1753 Epoch: 8 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:42:04,146-Speed 5980.15 samples/sec Loss 9.3824 LearningRate 0.1752 Epoch: 8 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:42:11,030-Speed 5951.45 samples/sec Loss 9.2622 LearningRate 0.1752 Epoch: 8 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:42:17,900-Speed 5963.15 samples/sec Loss 9.3022 LearningRate 0.1752 Epoch: 8 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:42:24,749-Speed 5981.56 samples/sec Loss 9.3407 LearningRate 0.1752 Epoch: 8 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:42:31,616-Speed 5966.18 samples/sec Loss 9.2133 LearningRate 0.1751 Epoch: 8 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 24 hours Training: 
2022-01-08 12:42:38,499-Speed 5952.35 samples/sec Loss 9.2167 LearningRate 0.1751 Epoch: 8 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:42:45,372-Speed 5961.32 samples/sec Loss 9.2295 LearningRate 0.1751 Epoch: 8 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:42:52,220-Speed 5982.25 samples/sec Loss 9.2751 LearningRate 0.1751 Epoch: 8 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 12:42:59,091-Speed 5962.50 samples/sec Loss 9.2489 LearningRate 0.1750 Epoch: 8 Global Step: 83920 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:43:05,947-Speed 5974.94 samples/sec Loss 9.2996 LearningRate 0.1750 Epoch: 8 Global Step: 83930 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:43:12,811-Speed 5969.41 samples/sec Loss 9.2724 LearningRate 0.1750 Epoch: 8 Global Step: 83940 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:43:19,673-Speed 5969.71 samples/sec Loss 9.2711 LearningRate 0.1749 Epoch: 8 Global Step: 83950 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:43:26,551-Speed 5956.37 samples/sec Loss 9.2846 LearningRate 0.1749 Epoch: 8 Global Step: 83960 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:43:33,415-Speed 5968.27 samples/sec Loss 9.2197 LearningRate 0.1749 Epoch: 8 Global Step: 83970 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:43:40,270-Speed 5978.15 samples/sec Loss 9.1881 LearningRate 0.1749 Epoch: 8 Global Step: 83980 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:43:47,193-Speed 5918.05 samples/sec Loss 9.2973 LearningRate 0.1748 Epoch: 8 Global Step: 83990 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:43:54,118-Speed 5915.79 samples/sec Loss 9.3061 LearningRate 0.1748 Epoch: 8 Global Step: 84000 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:44:01,048-Speed 5913.38 
samples/sec Loss 9.2599 LearningRate 0.1748 Epoch: 8 Global Step: 84010 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:44:07,979-Speed 5910.72 samples/sec Loss 9.2366 LearningRate 0.1747 Epoch: 8 Global Step: 84020 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:44:14,884-Speed 5932.78 samples/sec Loss 9.2632 LearningRate 0.1747 Epoch: 8 Global Step: 84030 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:44:21,748-Speed 5970.57 samples/sec Loss 9.2182 LearningRate 0.1747 Epoch: 8 Global Step: 84040 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:44:28,605-Speed 5973.94 samples/sec Loss 9.2809 LearningRate 0.1747 Epoch: 8 Global Step: 84050 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:44:35,478-Speed 5960.81 samples/sec Loss 9.2668 LearningRate 0.1746 Epoch: 8 Global Step: 84060 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:44:42,379-Speed 5937.14 samples/sec Loss 9.2509 LearningRate 0.1746 Epoch: 8 Global Step: 84070 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:44:49,248-Speed 5963.59 samples/sec Loss 9.2307 LearningRate 0.1746 Epoch: 8 Global Step: 84080 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:44:56,142-Speed 5942.77 samples/sec Loss 9.2306 LearningRate 0.1745 Epoch: 8 Global Step: 84090 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:45:03,008-Speed 5966.83 samples/sec Loss 9.2252 LearningRate 0.1745 Epoch: 8 Global Step: 84100 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:45:09,862-Speed 5977.24 samples/sec Loss 9.3093 LearningRate 0.1745 Epoch: 8 Global Step: 84110 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:45:16,713-Speed 5979.45 samples/sec Loss 9.2129 LearningRate 0.1745 Epoch: 8 Global Step: 84120 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:45:23,568-Speed 5976.96 samples/sec Loss 9.2221 LearningRate 0.1744 
Epoch: 8 Global Step: 84130 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:45:30,425-Speed 5974.48 samples/sec Loss 9.2411 LearningRate 0.1744 Epoch: 8 Global Step: 84140 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:45:37,305-Speed 5955.05 samples/sec Loss 9.2405 LearningRate 0.1744 Epoch: 8 Global Step: 84150 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:45:44,177-Speed 5960.48 samples/sec Loss 9.2824 LearningRate 0.1743 Epoch: 8 Global Step: 84160 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:45:51,040-Speed 5969.63 samples/sec Loss 9.2219 LearningRate 0.1743 Epoch: 8 Global Step: 84170 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:45:57,894-Speed 5977.26 samples/sec Loss 9.2562 LearningRate 0.1743 Epoch: 8 Global Step: 84180 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:46:04,761-Speed 5966.28 samples/sec Loss 9.2478 LearningRate 0.1743 Epoch: 8 Global Step: 84190 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:46:11,630-Speed 5963.95 samples/sec Loss 9.1947 LearningRate 0.1742 Epoch: 8 Global Step: 84200 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:46:18,513-Speed 5952.05 samples/sec Loss 9.2393 LearningRate 0.1742 Epoch: 8 Global Step: 84210 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:46:25,399-Speed 5949.10 samples/sec Loss 9.1917 LearningRate 0.1742 Epoch: 8 Global Step: 84220 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:46:32,259-Speed 5972.18 samples/sec Loss 9.2458 LearningRate 0.1741 Epoch: 8 Global Step: 84230 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:46:39,143-Speed 5951.28 samples/sec Loss 9.2890 LearningRate 0.1741 Epoch: 8 Global Step: 84240 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:46:46,010-Speed 5968.92 samples/sec Loss 9.2025 LearningRate 0.1741 Epoch: 8 Global Step: 84250 Fp16 Grad 
Scale: 262144 Required: 24 hours Training: 2022-01-08 12:46:52,867-Speed 5973.94 samples/sec Loss 9.2095 LearningRate 0.1741 Epoch: 8 Global Step: 84260 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:46:59,730-Speed 5969.86 samples/sec Loss 9.1316 LearningRate 0.1740 Epoch: 8 Global Step: 84270 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:47:06,598-Speed 5965.19 samples/sec Loss 9.2094 LearningRate 0.1740 Epoch: 8 Global Step: 84280 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:47:13,470-Speed 5960.61 samples/sec Loss 9.2477 LearningRate 0.1740 Epoch: 8 Global Step: 84290 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:47:20,349-Speed 5955.45 samples/sec Loss 9.2627 LearningRate 0.1739 Epoch: 8 Global Step: 84300 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:47:27,200-Speed 5980.02 samples/sec Loss 9.2882 LearningRate 0.1739 Epoch: 8 Global Step: 84310 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:47:34,056-Speed 5975.24 samples/sec Loss 9.3144 LearningRate 0.1739 Epoch: 8 Global Step: 84320 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:47:40,904-Speed 5982.13 samples/sec Loss 9.1903 LearningRate 0.1739 Epoch: 8 Global Step: 84330 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:47:47,755-Speed 5980.21 samples/sec Loss 9.1579 LearningRate 0.1738 Epoch: 8 Global Step: 84340 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:47:54,623-Speed 5964.79 samples/sec Loss 9.2362 LearningRate 0.1738 Epoch: 8 Global Step: 84350 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:48:01,477-Speed 5976.94 samples/sec Loss 9.2279 LearningRate 0.1738 Epoch: 8 Global Step: 84360 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:48:08,347-Speed 5963.91 samples/sec Loss 9.3288 LearningRate 0.1737 Epoch: 8 Global Step: 84370 Fp16 Grad Scale: 131072 Required: 24 hours Training: 
2022-01-08 12:48:15,215-Speed 5964.49 samples/sec Loss 9.2505 LearningRate 0.1737 Epoch: 8 Global Step: 84380 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:48:22,071-Speed 5975.86 samples/sec Loss 9.2715 LearningRate 0.1737 Epoch: 8 Global Step: 84390 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:48:28,970-Speed 5938.99 samples/sec Loss 9.1754 LearningRate 0.1737 Epoch: 8 Global Step: 84400 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:48:35,847-Speed 5956.41 samples/sec Loss 9.2321 LearningRate 0.1736 Epoch: 8 Global Step: 84410 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:48:42,729-Speed 5953.85 samples/sec Loss 9.2410 LearningRate 0.1736 Epoch: 8 Global Step: 84420 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:48:49,591-Speed 5970.16 samples/sec Loss 9.2785 LearningRate 0.1736 Epoch: 8 Global Step: 84430 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:48:56,454-Speed 5969.35 samples/sec Loss 9.2281 LearningRate 0.1736 Epoch: 8 Global Step: 84440 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:03,322-Speed 5965.40 samples/sec Loss 9.3417 LearningRate 0.1735 Epoch: 8 Global Step: 84450 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:10,175-Speed 5977.97 samples/sec Loss 9.2163 LearningRate 0.1735 Epoch: 8 Global Step: 84460 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:17,030-Speed 5976.22 samples/sec Loss 9.2313 LearningRate 0.1735 Epoch: 8 Global Step: 84470 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:23,905-Speed 5959.35 samples/sec Loss 9.2344 LearningRate 0.1734 Epoch: 8 Global Step: 84480 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:30,787-Speed 5953.37 samples/sec Loss 9.2234 LearningRate 0.1734 Epoch: 8 Global Step: 84490 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:37,682-Speed 5942.20 
samples/sec Loss 9.2556 LearningRate 0.1734 Epoch: 8 Global Step: 84500 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:44,577-Speed 5941.61 samples/sec Loss 9.1572 LearningRate 0.1734 Epoch: 8 Global Step: 84510 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:51,455-Speed 5956.84 samples/sec Loss 9.2565 LearningRate 0.1733 Epoch: 8 Global Step: 84520 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:49:58,314-Speed 5972.19 samples/sec Loss 9.3060 LearningRate 0.1733 Epoch: 8 Global Step: 84530 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:50:05,175-Speed 5971.84 samples/sec Loss 9.2420 LearningRate 0.1733 Epoch: 8 Global Step: 84540 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:50:12,029-Speed 5978.86 samples/sec Loss 9.1420 LearningRate 0.1732 Epoch: 8 Global Step: 84550 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:50:18,913-Speed 5951.23 samples/sec Loss 9.2545 LearningRate 0.1732 Epoch: 8 Global Step: 84560 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:50:25,776-Speed 5969.57 samples/sec Loss 9.1300 LearningRate 0.1732 Epoch: 8 Global Step: 84570 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:50:32,644-Speed 5965.56 samples/sec Loss 9.2417 LearningRate 0.1732 Epoch: 8 Global Step: 84580 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:50:39,511-Speed 5966.06 samples/sec Loss 9.1548 LearningRate 0.1731 Epoch: 8 Global Step: 84590 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:50:46,378-Speed 5966.30 samples/sec Loss 9.2347 LearningRate 0.1731 Epoch: 8 Global Step: 84600 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:50:53,261-Speed 5951.75 samples/sec Loss 9.1339 LearningRate 0.1731 Epoch: 8 Global Step: 84610 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:51:00,150-Speed 5946.42 samples/sec Loss 9.1976 LearningRate 0.1730 
Epoch: 8 Global Step: 84620 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:51:07,020-Speed 5963.65 samples/sec Loss 9.1197 LearningRate 0.1730 Epoch: 8 Global Step: 84630 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:51:13,888-Speed 5965.29 samples/sec Loss 9.2453 LearningRate 0.1730 Epoch: 8 Global Step: 84640 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:51:20,749-Speed 5971.02 samples/sec Loss 9.2499 LearningRate 0.1730 Epoch: 8 Global Step: 84650 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:51:27,611-Speed 5972.20 samples/sec Loss 9.1812 LearningRate 0.1729 Epoch: 8 Global Step: 84660 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:51:34,475-Speed 5968.77 samples/sec Loss 9.2049 LearningRate 0.1729 Epoch: 8 Global Step: 84670 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:51:41,407-Speed 5909.95 samples/sec Loss 9.1857 LearningRate 0.1729 Epoch: 8 Global Step: 84680 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:51:48,271-Speed 5968.54 samples/sec Loss 9.1767 LearningRate 0.1728 Epoch: 8 Global Step: 84690 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:51:55,152-Speed 5956.05 samples/sec Loss 9.1340 LearningRate 0.1728 Epoch: 8 Global Step: 84700 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:52:02,016-Speed 5968.43 samples/sec Loss 9.1858 LearningRate 0.1728 Epoch: 8 Global Step: 84710 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:52:08,893-Speed 5957.45 samples/sec Loss 9.1093 LearningRate 0.1728 Epoch: 8 Global Step: 84720 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:52:15,769-Speed 5958.03 samples/sec Loss 9.1635 LearningRate 0.1727 Epoch: 8 Global Step: 84730 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:52:22,627-Speed 5973.57 samples/sec Loss 9.1566 LearningRate 0.1727 Epoch: 8 Global Step: 84740 Fp16 Grad 
Scale: 131072 Required: 24 hours Training: 2022-01-08 12:52:29,489-Speed 5970.78 samples/sec Loss 9.2486 LearningRate 0.1727 Epoch: 8 Global Step: 84750 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:52:36,348-Speed 5976.23 samples/sec Loss 9.2900 LearningRate 0.1726 Epoch: 8 Global Step: 84760 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:52:43,324-Speed 5872.72 samples/sec Loss 9.1816 LearningRate 0.1726 Epoch: 8 Global Step: 84770 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:52:50,169-Speed 5984.37 samples/sec Loss 9.2363 LearningRate 0.1726 Epoch: 8 Global Step: 84780 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:52:57,094-Speed 5916.51 samples/sec Loss 9.1633 LearningRate 0.1726 Epoch: 8 Global Step: 84790 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:53:04,019-Speed 5915.14 samples/sec Loss 9.1856 LearningRate 0.1725 Epoch: 8 Global Step: 84800 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:53:10,919-Speed 5939.96 samples/sec Loss 9.2298 LearningRate 0.1725 Epoch: 8 Global Step: 84810 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:53:17,780-Speed 5971.92 samples/sec Loss 9.2275 LearningRate 0.1725 Epoch: 8 Global Step: 84820 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:53:24,643-Speed 5968.59 samples/sec Loss 9.2969 LearningRate 0.1725 Epoch: 8 Global Step: 84830 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:53:31,530-Speed 5948.80 samples/sec Loss 9.2136 LearningRate 0.1724 Epoch: 8 Global Step: 84840 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:53:38,429-Speed 5941.21 samples/sec Loss 9.2247 LearningRate 0.1724 Epoch: 8 Global Step: 84850 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:53:45,321-Speed 5944.38 samples/sec Loss 9.1299 LearningRate 0.1724 Epoch: 8 Global Step: 84860 Fp16 Grad Scale: 131072 Required: 24 hours Training: 
2022-01-08 12:53:52,233-Speed 5926.70 samples/sec Loss 9.1953 LearningRate 0.1723 Epoch: 8 Global Step: 84870 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:53:59,143-Speed 5929.05 samples/sec Loss 9.1544 LearningRate 0.1723 Epoch: 8 Global Step: 84880 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:54:06,022-Speed 5955.37 samples/sec Loss 9.2645 LearningRate 0.1723 Epoch: 8 Global Step: 84890 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:54:12,908-Speed 5950.05 samples/sec Loss 9.1332 LearningRate 0.1723 Epoch: 8 Global Step: 84900 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:54:19,818-Speed 5929.21 samples/sec Loss 9.1482 LearningRate 0.1722 Epoch: 8 Global Step: 84910 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:54:26,692-Speed 5960.20 samples/sec Loss 9.1359 LearningRate 0.1722 Epoch: 8 Global Step: 84920 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:54:33,562-Speed 5962.99 samples/sec Loss 9.1913 LearningRate 0.1722 Epoch: 8 Global Step: 84930 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:54:40,432-Speed 5963.61 samples/sec Loss 9.1355 LearningRate 0.1721 Epoch: 8 Global Step: 84940 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:54:47,281-Speed 5981.22 samples/sec Loss 9.2300 LearningRate 0.1721 Epoch: 8 Global Step: 84950 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:54:54,142-Speed 5974.83 samples/sec Loss 9.2231 LearningRate 0.1721 Epoch: 8 Global Step: 84960 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:55:01,016-Speed 5960.13 samples/sec Loss 9.0993 LearningRate 0.1721 Epoch: 8 Global Step: 84970 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:55:07,899-Speed 5951.49 samples/sec Loss 9.1361 LearningRate 0.1720 Epoch: 8 Global Step: 84980 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:55:14,764-Speed 5968.83 
samples/sec Loss 9.1688 LearningRate 0.1720 Epoch: 8 Global Step: 84990 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:55:21,640-Speed 5958.25 samples/sec Loss 9.2153 LearningRate 0.1720 Epoch: 8 Global Step: 85000 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:55:59,326-[lfw][85000]XNorm: 24.676382 Training: 2022-01-08 12:55:59,327-[lfw][85000]Accuracy-Flip: 0.99683+-0.00241 Training: 2022-01-08 12:55:59,328-[lfw][85000]Accuracy-Highest: 0.99750 Training: 2022-01-08 12:56:30,214-[cfp_fp][85000]XNorm: 21.869296 Training: 2022-01-08 12:56:30,215-[cfp_fp][85000]Accuracy-Flip: 0.98114+-0.00498 Training: 2022-01-08 12:56:30,216-[cfp_fp][85000]Accuracy-Highest: 0.98114 Training: 2022-01-08 12:56:56,978-[agedb_30][85000]XNorm: 24.313713 Training: 2022-01-08 12:56:56,979-[agedb_30][85000]Accuracy-Flip: 0.96583+-0.00704 Training: 2022-01-08 12:56:56,979-[agedb_30][85000]Accuracy-Highest: 0.96883 Training: 2022-01-08 12:57:03,848-Speed 400.76 samples/sec Loss 9.2098 LearningRate 0.1719 Epoch: 8 Global Step: 85010 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:57:10,707-Speed 5973.50 samples/sec Loss 9.2582 LearningRate 0.1719 Epoch: 8 Global Step: 85020 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:57:17,604-Speed 5940.15 samples/sec Loss 9.2257 LearningRate 0.1719 Epoch: 8 Global Step: 85030 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:57:24,476-Speed 5962.41 samples/sec Loss 9.1773 LearningRate 0.1719 Epoch: 8 Global Step: 85040 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:57:31,350-Speed 5959.95 samples/sec Loss 9.1134 LearningRate 0.1718 Epoch: 8 Global Step: 85050 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:57:38,384-Speed 5824.04 samples/sec Loss 9.1807 LearningRate 0.1718 Epoch: 8 Global Step: 85060 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:57:45,255-Speed 5963.04 samples/sec Loss 9.2165 
LearningRate 0.1718 Epoch: 8 Global Step: 85070 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:57:52,142-Speed 5948.35 samples/sec Loss 9.1850 LearningRate 0.1717 Epoch: 8 Global Step: 85080 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:57:59,022-Speed 5954.35 samples/sec Loss 9.1437 LearningRate 0.1717 Epoch: 8 Global Step: 85090 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:58:05,891-Speed 5964.82 samples/sec Loss 9.1489 LearningRate 0.1717 Epoch: 8 Global Step: 85100 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:58:12,808-Speed 5921.94 samples/sec Loss 9.1823 LearningRate 0.1717 Epoch: 8 Global Step: 85110 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:58:19,701-Speed 5943.90 samples/sec Loss 9.1345 LearningRate 0.1716 Epoch: 8 Global Step: 85120 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:58:26,555-Speed 5977.76 samples/sec Loss 9.1905 LearningRate 0.1716 Epoch: 8 Global Step: 85130 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:58:33,410-Speed 5975.95 samples/sec Loss 9.1434 LearningRate 0.1716 Epoch: 8 Global Step: 85140 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:58:40,263-Speed 5978.62 samples/sec Loss 9.1532 LearningRate 0.1716 Epoch: 8 Global Step: 85150 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:58:47,110-Speed 5983.11 samples/sec Loss 9.2167 LearningRate 0.1715 Epoch: 8 Global Step: 85160 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:58:54,118-Speed 5846.11 samples/sec Loss 9.1580 LearningRate 0.1715 Epoch: 8 Global Step: 85170 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:59:01,067-Speed 5895.71 samples/sec Loss 9.2088 LearningRate 0.1715 Epoch: 8 Global Step: 85180 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:59:07,978-Speed 5927.64 samples/sec Loss 9.2017 LearningRate 0.1714 Epoch: 8 Global Step: 
85190 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:59:14,828-Speed 5981.41 samples/sec Loss 9.1566 LearningRate 0.1714 Epoch: 8 Global Step: 85200 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:59:21,786-Speed 5887.69 samples/sec Loss 9.1372 LearningRate 0.1714 Epoch: 8 Global Step: 85210 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:59:28,657-Speed 5962.90 samples/sec Loss 9.1684 LearningRate 0.1714 Epoch: 8 Global Step: 85220 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:59:35,506-Speed 5981.41 samples/sec Loss 9.1776 LearningRate 0.1713 Epoch: 8 Global Step: 85230 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 12:59:42,351-Speed 5984.77 samples/sec Loss 9.1444 LearningRate 0.1713 Epoch: 8 Global Step: 85240 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:59:49,202-Speed 5980.27 samples/sec Loss 9.1900 LearningRate 0.1713 Epoch: 8 Global Step: 85250 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 12:59:56,054-Speed 5978.98 samples/sec Loss 9.1562 LearningRate 0.1712 Epoch: 8 Global Step: 85260 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:00:02,958-Speed 5934.57 samples/sec Loss 9.1667 LearningRate 0.1712 Epoch: 8 Global Step: 85270 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:00:09,848-Speed 5945.56 samples/sec Loss 9.1693 LearningRate 0.1712 Epoch: 8 Global Step: 85280 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:00:16,695-Speed 5983.65 samples/sec Loss 9.2392 LearningRate 0.1712 Epoch: 8 Global Step: 85290 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:00:23,542-Speed 5982.73 samples/sec Loss 9.2009 LearningRate 0.1711 Epoch: 8 Global Step: 85300 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:00:30,397-Speed 5977.09 samples/sec Loss 9.1353 LearningRate 0.1711 Epoch: 8 Global Step: 85310 Fp16 Grad Scale: 131072 Required: 24 
hours Training: 2022-01-08 13:00:37,253-Speed 5975.19 samples/sec Loss 9.1449 LearningRate 0.1711 Epoch: 8 Global Step: 85320 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:00:44,126-Speed 5961.46 samples/sec Loss 9.1522 LearningRate 0.1710 Epoch: 8 Global Step: 85330 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:00:51,000-Speed 5959.58 samples/sec Loss 9.1556 LearningRate 0.1710 Epoch: 8 Global Step: 85340 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:00:58,220-Speed 5674.53 samples/sec Loss 9.1916 LearningRate 0.1710 Epoch: 8 Global Step: 85350 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:01:05,193-Speed 5875.07 samples/sec Loss 9.1733 LearningRate 0.1710 Epoch: 8 Global Step: 85360 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:01:12,169-Speed 5874.91 samples/sec Loss 9.2379 LearningRate 0.1709 Epoch: 8 Global Step: 85370 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:01:19,034-Speed 5967.72 samples/sec Loss 9.2023 LearningRate 0.1709 Epoch: 8 Global Step: 85380 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:01:25,904-Speed 5963.26 samples/sec Loss 9.1868 LearningRate 0.1709 Epoch: 8 Global Step: 85390 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:01:32,803-Speed 5938.70 samples/sec Loss 9.1770 LearningRate 0.1709 Epoch: 8 Global Step: 85400 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:01:39,743-Speed 5903.02 samples/sec Loss 9.1124 LearningRate 0.1708 Epoch: 8 Global Step: 85410 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:01:46,615-Speed 5961.23 samples/sec Loss 9.1397 LearningRate 0.1708 Epoch: 8 Global Step: 85420 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:01:53,477-Speed 5970.14 samples/sec Loss 9.1964 LearningRate 0.1708 Epoch: 8 Global Step: 85430 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 
13:02:00,342-Speed 5967.91 samples/sec Loss 9.1821 LearningRate 0.1707 Epoch: 8 Global Step: 85440 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:02:07,213-Speed 5962.64 samples/sec Loss 9.2031 LearningRate 0.1707 Epoch: 8 Global Step: 85450 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:02:14,082-Speed 5964.67 samples/sec Loss 9.1691 LearningRate 0.1707 Epoch: 8 Global Step: 85460 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:02:20,939-Speed 5974.63 samples/sec Loss 9.1557 LearningRate 0.1707 Epoch: 8 Global Step: 85470 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:02:27,808-Speed 5965.19 samples/sec Loss 9.0612 LearningRate 0.1706 Epoch: 8 Global Step: 85480 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:02:34,660-Speed 5978.53 samples/sec Loss 9.1282 LearningRate 0.1706 Epoch: 8 Global Step: 85490 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:02:41,542-Speed 5952.93 samples/sec Loss 9.1552 LearningRate 0.1706 Epoch: 8 Global Step: 85500 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:02:48,406-Speed 5970.55 samples/sec Loss 9.1449 LearningRate 0.1705 Epoch: 8 Global Step: 85510 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:02:55,258-Speed 5978.99 samples/sec Loss 9.1371 LearningRate 0.1705 Epoch: 8 Global Step: 85520 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:02,114-Speed 5974.72 samples/sec Loss 9.1829 LearningRate 0.1705 Epoch: 8 Global Step: 85530 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:09,026-Speed 5927.80 samples/sec Loss 9.1258 LearningRate 0.1705 Epoch: 8 Global Step: 85540 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:15,878-Speed 5978.50 samples/sec Loss 9.1269 LearningRate 0.1704 Epoch: 8 Global Step: 85550 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:22,750-Speed 5961.23 samples/sec Loss 
9.1389 LearningRate 0.1704 Epoch: 8 Global Step: 85560 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:29,603-Speed 5978.68 samples/sec Loss 9.1890 LearningRate 0.1704 Epoch: 8 Global Step: 85570 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:36,479-Speed 5958.59 samples/sec Loss 9.0745 LearningRate 0.1703 Epoch: 8 Global Step: 85580 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:43,321-Speed 5987.20 samples/sec Loss 9.1074 LearningRate 0.1703 Epoch: 8 Global Step: 85590 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:50,180-Speed 5973.16 samples/sec Loss 9.1281 LearningRate 0.1703 Epoch: 8 Global Step: 85600 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:03:57,048-Speed 5965.17 samples/sec Loss 9.1559 LearningRate 0.1703 Epoch: 8 Global Step: 85610 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:04:03,908-Speed 5972.42 samples/sec Loss 9.2488 LearningRate 0.1702 Epoch: 8 Global Step: 85620 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:04:10,779-Speed 5962.81 samples/sec Loss 9.1424 LearningRate 0.1702 Epoch: 8 Global Step: 85630 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:04:17,652-Speed 5960.91 samples/sec Loss 9.1537 LearningRate 0.1702 Epoch: 8 Global Step: 85640 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:04:24,520-Speed 5964.62 samples/sec Loss 9.0688 LearningRate 0.1702 Epoch: 8 Global Step: 85650 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:04:31,371-Speed 5979.81 samples/sec Loss 9.1092 LearningRate 0.1701 Epoch: 8 Global Step: 85660 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:04:38,229-Speed 5973.46 samples/sec Loss 9.2079 LearningRate 0.1701 Epoch: 8 Global Step: 85670 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:04:45,087-Speed 5973.62 samples/sec Loss 9.1774 LearningRate 0.1701 Epoch: 8 Global 
Step: 85680 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:04:51,935-Speed 5982.46 samples/sec Loss 9.1632 LearningRate 0.1700 Epoch: 8 Global Step: 85690 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:04:58,835-Speed 5937.57 samples/sec Loss 9.1125 LearningRate 0.1700 Epoch: 8 Global Step: 85700 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:05:05,690-Speed 5976.70 samples/sec Loss 9.1089 LearningRate 0.1700 Epoch: 8 Global Step: 85710 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:05:12,547-Speed 5974.80 samples/sec Loss 9.1441 LearningRate 0.1700 Epoch: 8 Global Step: 85720 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:05:19,412-Speed 5967.57 samples/sec Loss 9.1668 LearningRate 0.1699 Epoch: 8 Global Step: 85730 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:05:26,294-Speed 5953.02 samples/sec Loss 9.1385 LearningRate 0.1699 Epoch: 8 Global Step: 85740 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:05:33,168-Speed 5960.54 samples/sec Loss 9.1835 LearningRate 0.1699 Epoch: 8 Global Step: 85750 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:05:40,057-Speed 5947.18 samples/sec Loss 9.1322 LearningRate 0.1698 Epoch: 8 Global Step: 85760 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:05:46,940-Speed 5952.35 samples/sec Loss 9.1868 LearningRate 0.1698 Epoch: 8 Global Step: 85770 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:05:53,797-Speed 5975.11 samples/sec Loss 9.1734 LearningRate 0.1698 Epoch: 8 Global Step: 85780 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:06:00,671-Speed 5959.40 samples/sec Loss 9.0615 LearningRate 0.1698 Epoch: 8 Global Step: 85790 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:06:07,570-Speed 5938.29 samples/sec Loss 9.0766 LearningRate 0.1697 Epoch: 8 Global Step: 85800 Fp16 Grad Scale: 131072 
Required: 24 hours Training: 2022-01-08 13:06:14,450-Speed 5957.14 samples/sec Loss 9.1488 LearningRate 0.1697 Epoch: 8 Global Step: 85810 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:06:21,321-Speed 5962.64 samples/sec Loss 9.1407 LearningRate 0.1697 Epoch: 8 Global Step: 85820 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:06:28,191-Speed 5962.68 samples/sec Loss 9.1682 LearningRate 0.1696 Epoch: 8 Global Step: 85830 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:06:35,063-Speed 5963.60 samples/sec Loss 9.0593 LearningRate 0.1696 Epoch: 8 Global Step: 85840 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:06:41,934-Speed 5962.62 samples/sec Loss 9.0953 LearningRate 0.1696 Epoch: 8 Global Step: 85850 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:06:48,787-Speed 5977.78 samples/sec Loss 9.1345 LearningRate 0.1696 Epoch: 8 Global Step: 85860 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:06:55,644-Speed 5975.53 samples/sec Loss 9.0832 LearningRate 0.1695 Epoch: 8 Global Step: 85870 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:07:02,514-Speed 5963.69 samples/sec Loss 9.1766 LearningRate 0.1695 Epoch: 8 Global Step: 85880 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:07:09,382-Speed 5964.52 samples/sec Loss 9.1310 LearningRate 0.1695 Epoch: 8 Global Step: 85890 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:07:16,285-Speed 5935.18 samples/sec Loss 9.1443 LearningRate 0.1695 Epoch: 8 Global Step: 85900 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:07:23,144-Speed 5972.82 samples/sec Loss 9.1509 LearningRate 0.1694 Epoch: 8 Global Step: 85910 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:07:30,010-Speed 5966.65 samples/sec Loss 9.1465 LearningRate 0.1694 Epoch: 8 Global Step: 85920 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 
13:07:36,880-Speed 5964.80 samples/sec Loss 9.1192 LearningRate 0.1694 Epoch: 8 Global Step: 85930 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:07:43,744-Speed 5968.90 samples/sec Loss 9.1625 LearningRate 0.1693 Epoch: 8 Global Step: 85940 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:07:50,626-Speed 5954.74 samples/sec Loss 9.1470 LearningRate 0.1693 Epoch: 8 Global Step: 85950 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:07:57,518-Speed 5944.61 samples/sec Loss 9.1628 LearningRate 0.1693 Epoch: 8 Global Step: 85960 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:08:04,451-Speed 5908.94 samples/sec Loss 9.1270 LearningRate 0.1693 Epoch: 8 Global Step: 85970 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:08:11,325-Speed 5960.38 samples/sec Loss 9.0889 LearningRate 0.1692 Epoch: 8 Global Step: 85980 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:08:18,186-Speed 5971.36 samples/sec Loss 9.0986 LearningRate 0.1692 Epoch: 8 Global Step: 85990 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:08:25,051-Speed 5967.94 samples/sec Loss 9.0863 LearningRate 0.1692 Epoch: 8 Global Step: 86000 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:08:31,917-Speed 5966.95 samples/sec Loss 9.1037 LearningRate 0.1691 Epoch: 8 Global Step: 86010 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:08:38,779-Speed 5970.26 samples/sec Loss 9.1353 LearningRate 0.1691 Epoch: 8 Global Step: 86020 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:08:45,656-Speed 5956.83 samples/sec Loss 9.1735 LearningRate 0.1691 Epoch: 8 Global Step: 86030 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:08:52,525-Speed 5963.94 samples/sec Loss 9.1280 LearningRate 0.1691 Epoch: 8 Global Step: 86040 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:08:59,408-Speed 5952.02 samples/sec Loss 
9.1007 LearningRate 0.1690 Epoch: 8 Global Step: 86050 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:09:06,268-Speed 5973.32 samples/sec Loss 9.1471 LearningRate 0.1690 Epoch: 8 Global Step: 86060 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:09:13,132-Speed 5968.50 samples/sec Loss 9.1841 LearningRate 0.1690 Epoch: 8 Global Step: 86070 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:09:19,989-Speed 5973.96 samples/sec Loss 9.0865 LearningRate 0.1690 Epoch: 8 Global Step: 86080 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:09:26,875-Speed 5952.44 samples/sec Loss 9.1124 LearningRate 0.1689 Epoch: 8 Global Step: 86090 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:09:33,728-Speed 5977.73 samples/sec Loss 9.0930 LearningRate 0.1689 Epoch: 8 Global Step: 86100 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:09:40,595-Speed 5965.97 samples/sec Loss 9.1295 LearningRate 0.1689 Epoch: 8 Global Step: 86110 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 13:09:47,472-Speed 5957.32 samples/sec Loss 9.2116 LearningRate 0.1688 Epoch: 8 Global Step: 86120 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:09:54,358-Speed 5949.20 samples/sec Loss 9.1026 LearningRate 0.1688 Epoch: 8 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:01,207-Speed 5981.99 samples/sec Loss 9.0902 LearningRate 0.1688 Epoch: 8 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:08,081-Speed 5959.24 samples/sec Loss 9.1448 LearningRate 0.1688 Epoch: 8 Global Step: 86150 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:14,980-Speed 5938.87 samples/sec Loss 9.1428 LearningRate 0.1687 Epoch: 8 Global Step: 86160 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:21,843-Speed 5969.60 samples/sec Loss 9.1121 LearningRate 0.1687 Epoch: 8 Global Step: 86170 
Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:28,705-Speed 5969.79 samples/sec Loss 9.0934 LearningRate 0.1687 Epoch: 8 Global Step: 86180 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:35,582-Speed 5957.38 samples/sec Loss 9.0483 LearningRate 0.1686 Epoch: 8 Global Step: 86190 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:42,468-Speed 5950.17 samples/sec Loss 9.0755 LearningRate 0.1686 Epoch: 8 Global Step: 86200 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:49,362-Speed 5942.51 samples/sec Loss 9.0332 LearningRate 0.1686 Epoch: 8 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:10:56,241-Speed 5954.99 samples/sec Loss 9.0686 LearningRate 0.1686 Epoch: 8 Global Step: 86220 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:11:03,112-Speed 5963.21 samples/sec Loss 9.1465 LearningRate 0.1685 Epoch: 8 Global Step: 86230 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:11:09,974-Speed 5970.34 samples/sec Loss 9.0854 LearningRate 0.1685 Epoch: 8 Global Step: 86240 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:11:16,829-Speed 5975.75 samples/sec Loss 9.2217 LearningRate 0.1685 Epoch: 8 Global Step: 86250 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:11:23,698-Speed 5964.30 samples/sec Loss 9.1201 LearningRate 0.1685 Epoch: 8 Global Step: 86260 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:11:30,559-Speed 5971.72 samples/sec Loss 9.1551 LearningRate 0.1684 Epoch: 8 Global Step: 86270 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:11:37,441-Speed 5952.75 samples/sec Loss 9.1008 LearningRate 0.1684 Epoch: 8 Global Step: 86280 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:11:44,302-Speed 5970.95 samples/sec Loss 9.0636 LearningRate 0.1684 Epoch: 8 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 24 hours 
Training: 2022-01-08 13:11:51,169-Speed 5966.59 samples/sec Loss 9.1015 LearningRate 0.1683 Epoch: 8 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:11:58,027-Speed 5973.63 samples/sec Loss 9.0883 LearningRate 0.1683 Epoch: 8 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:12:04,903-Speed 5958.85 samples/sec Loss 9.1180 LearningRate 0.1683 Epoch: 8 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:12:11,782-Speed 5955.51 samples/sec Loss 9.1147 LearningRate 0.1683 Epoch: 8 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:12:18,649-Speed 5965.73 samples/sec Loss 9.1432 LearningRate 0.1682 Epoch: 8 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:12:25,509-Speed 5971.95 samples/sec Loss 9.0465 LearningRate 0.1682 Epoch: 8 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:12:32,373-Speed 5969.20 samples/sec Loss 9.1730 LearningRate 0.1682 Epoch: 8 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:12:39,229-Speed 5974.56 samples/sec Loss 9.0540 LearningRate 0.1681 Epoch: 8 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:12:46,097-Speed 5966.14 samples/sec Loss 9.0253 LearningRate 0.1681 Epoch: 8 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:12:52,995-Speed 5938.84 samples/sec Loss 9.0970 LearningRate 0.1681 Epoch: 8 Global Step: 86390 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:12:59,886-Speed 5945.83 samples/sec Loss 9.0205 LearningRate 0.1681 Epoch: 8 Global Step: 86400 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:13:06,752-Speed 5967.00 samples/sec Loss 9.1114 LearningRate 0.1680 Epoch: 8 Global Step: 86410 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:13:13,624-Speed 5962.01 
samples/sec Loss 9.1092 LearningRate 0.1680 Epoch: 8 Global Step: 86420 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:13:20,491-Speed 5965.71 samples/sec Loss 9.0737 LearningRate 0.1680 Epoch: 8 Global Step: 86430 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:13:27,349-Speed 5973.71 samples/sec Loss 9.0363 LearningRate 0.1680 Epoch: 8 Global Step: 86440 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:13:34,213-Speed 5969.01 samples/sec Loss 9.0941 LearningRate 0.1679 Epoch: 8 Global Step: 86450 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:13:41,081-Speed 5965.35 samples/sec Loss 9.0115 LearningRate 0.1679 Epoch: 8 Global Step: 86460 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:13:47,942-Speed 5970.93 samples/sec Loss 9.0584 LearningRate 0.1679 Epoch: 8 Global Step: 86470 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:13:54,806-Speed 5968.40 samples/sec Loss 9.0554 LearningRate 0.1678 Epoch: 8 Global Step: 86480 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:14:01,681-Speed 5959.26 samples/sec Loss 9.0908 LearningRate 0.1678 Epoch: 8 Global Step: 86490 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:14:11,145-Speed 4328.57 samples/sec Loss 9.0950 LearningRate 0.1678 Epoch: 8 Global Step: 86500 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:14:17,995-Speed 5981.48 samples/sec Loss 9.0964 LearningRate 0.1678 Epoch: 8 Global Step: 86510 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:14:24,874-Speed 5954.75 samples/sec Loss 9.1014 LearningRate 0.1677 Epoch: 8 Global Step: 86520 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:14:31,736-Speed 5971.15 samples/sec Loss 9.0775 LearningRate 0.1677 Epoch: 8 Global Step: 86530 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:14:38,631-Speed 5941.45 samples/sec Loss 9.0739 LearningRate 0.1677 
Epoch: 8 Global Step: 86540 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:14:45,483-Speed 5978.32 samples/sec Loss 9.0832 LearningRate 0.1676 Epoch: 8 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:14:52,344-Speed 5971.67 samples/sec Loss 9.0505 LearningRate 0.1676 Epoch: 8 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:14:59,210-Speed 5966.66 samples/sec Loss 9.1199 LearningRate 0.1676 Epoch: 8 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:15:06,073-Speed 5969.04 samples/sec Loss 9.1205 LearningRate 0.1676 Epoch: 8 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:15:12,929-Speed 5975.98 samples/sec Loss 9.1344 LearningRate 0.1675 Epoch: 8 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:15:19,786-Speed 5974.24 samples/sec Loss 9.0481 LearningRate 0.1675 Epoch: 8 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:15:26,647-Speed 5971.25 samples/sec Loss 9.0665 LearningRate 0.1675 Epoch: 8 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:15:33,514-Speed 5966.02 samples/sec Loss 9.0846 LearningRate 0.1675 Epoch: 8 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:15:40,369-Speed 5976.50 samples/sec Loss 9.0986 LearningRate 0.1674 Epoch: 8 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:15:47,254-Speed 5950.27 samples/sec Loss 8.9905 LearningRate 0.1674 Epoch: 8 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:15:54,129-Speed 5959.03 samples/sec Loss 9.1102 LearningRate 0.1674 Epoch: 8 Global Step: 86650 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:16:00,996-Speed 5965.68 samples/sec Loss 9.0621 LearningRate 0.1673 Epoch: 8 Global Step: 86660 Fp16 Grad Scale: 131072 
Required: 24 hours Training: 2022-01-08 13:16:07,856-Speed 5972.50 samples/sec Loss 9.0461 LearningRate 0.1673 Epoch: 8 Global Step: 86670 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:16:14,725-Speed 5964.37 samples/sec Loss 9.1278 LearningRate 0.1673 Epoch: 8 Global Step: 86680 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:16:21,599-Speed 5959.59 samples/sec Loss 9.0922 LearningRate 0.1673 Epoch: 8 Global Step: 86690 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:16:28,460-Speed 5971.27 samples/sec Loss 9.0892 LearningRate 0.1672 Epoch: 8 Global Step: 86700 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:16:35,330-Speed 5963.37 samples/sec Loss 9.1281 LearningRate 0.1672 Epoch: 8 Global Step: 86710 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:16:42,200-Speed 5963.56 samples/sec Loss 9.0035 LearningRate 0.1672 Epoch: 8 Global Step: 86720 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:16:49,108-Speed 5930.24 samples/sec Loss 9.0183 LearningRate 0.1671 Epoch: 8 Global Step: 86730 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:16:55,972-Speed 5969.06 samples/sec Loss 9.0700 LearningRate 0.1671 Epoch: 8 Global Step: 86740 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:17:02,840-Speed 5965.48 samples/sec Loss 9.0867 LearningRate 0.1671 Epoch: 8 Global Step: 86750 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:17:09,711-Speed 5961.84 samples/sec Loss 8.9989 LearningRate 0.1671 Epoch: 8 Global Step: 86760 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:17:16,580-Speed 5964.60 samples/sec Loss 9.1123 LearningRate 0.1670 Epoch: 8 Global Step: 86770 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:17:23,454-Speed 5959.49 samples/sec Loss 9.0577 LearningRate 0.1670 Epoch: 8 Global Step: 86780 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 
13:17:30,307-Speed 5978.14 samples/sec Loss 9.1001 LearningRate 0.1670 Epoch: 8 Global Step: 86790 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:17:37,174-Speed 5966.12 samples/sec Loss 9.0533 LearningRate 0.1670 Epoch: 8 Global Step: 86800 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:17:44,051-Speed 5956.92 samples/sec Loss 8.9999 LearningRate 0.1669 Epoch: 8 Global Step: 86810 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:17:50,915-Speed 5968.30 samples/sec Loss 8.9591 LearningRate 0.1669 Epoch: 8 Global Step: 86820 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:17:57,791-Speed 5958.71 samples/sec Loss 9.0752 LearningRate 0.1669 Epoch: 8 Global Step: 86830 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:18:04,691-Speed 5939.96 samples/sec Loss 9.0282 LearningRate 0.1668 Epoch: 8 Global Step: 86840 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:18:11,531-Speed 5989.29 samples/sec Loss 9.0941 LearningRate 0.1668 Epoch: 8 Global Step: 86850 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:18:18,404-Speed 5960.49 samples/sec Loss 9.0041 LearningRate 0.1668 Epoch: 8 Global Step: 86860 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:18:25,270-Speed 5967.26 samples/sec Loss 9.0224 LearningRate 0.1668 Epoch: 8 Global Step: 86870 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:18:32,134-Speed 5968.66 samples/sec Loss 9.0144 LearningRate 0.1667 Epoch: 8 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:18:38,998-Speed 5968.38 samples/sec Loss 9.0535 LearningRate 0.1667 Epoch: 8 Global Step: 86890 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:18:45,863-Speed 5967.84 samples/sec Loss 9.0819 LearningRate 0.1667 Epoch: 8 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:18:52,717-Speed 5977.02 samples/sec Loss 
9.1402 LearningRate 0.1666 Epoch: 8 Global Step: 86910 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:18:59,563-Speed 5983.82 samples/sec Loss 9.0872 LearningRate 0.1666 Epoch: 8 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:19:06,443-Speed 5954.59 samples/sec Loss 9.0546 LearningRate 0.1666 Epoch: 8 Global Step: 86930 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:19:13,298-Speed 5976.49 samples/sec Loss 9.0061 LearningRate 0.1666 Epoch: 8 Global Step: 86940 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:19:20,214-Speed 5924.24 samples/sec Loss 8.9606 LearningRate 0.1665 Epoch: 8 Global Step: 86950 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:19:27,101-Speed 5949.64 samples/sec Loss 9.0020 LearningRate 0.1665 Epoch: 8 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:19:33,981-Speed 5954.68 samples/sec Loss 8.9649 LearningRate 0.1665 Epoch: 8 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:19:40,842-Speed 5971.33 samples/sec Loss 9.0208 LearningRate 0.1665 Epoch: 8 Global Step: 86980 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:19:47,705-Speed 5969.54 samples/sec Loss 9.0839 LearningRate 0.1664 Epoch: 8 Global Step: 86990 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:19:54,582-Speed 5956.83 samples/sec Loss 9.1570 LearningRate 0.1664 Epoch: 8 Global Step: 87000 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:20:01,452-Speed 5963.69 samples/sec Loss 9.0541 LearningRate 0.1664 Epoch: 8 Global Step: 87010 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:20:08,315-Speed 5971.61 samples/sec Loss 9.0047 LearningRate 0.1663 Epoch: 8 Global Step: 87020 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:20:15,183-Speed 5964.97 samples/sec Loss 9.0520 LearningRate 0.1663 Epoch: 8 Global Step: 
87030 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:20:22,053-Speed 5963.96 samples/sec Loss 9.1114 LearningRate 0.1663 Epoch: 8 Global Step: 87040 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:20:28,933-Speed 5954.91 samples/sec Loss 9.0957 LearningRate 0.1663 Epoch: 8 Global Step: 87050 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:20:35,799-Speed 5966.08 samples/sec Loss 9.0549 LearningRate 0.1662 Epoch: 8 Global Step: 87060 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:20:42,662-Speed 5969.45 samples/sec Loss 9.0519 LearningRate 0.1662 Epoch: 8 Global Step: 87070 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:20:49,528-Speed 5966.92 samples/sec Loss 9.0337 LearningRate 0.1662 Epoch: 8 Global Step: 87080 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:20:56,467-Speed 5904.58 samples/sec Loss 9.0329 LearningRate 0.1661 Epoch: 8 Global Step: 87090 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:21:03,319-Speed 5978.93 samples/sec Loss 9.0001 LearningRate 0.1661 Epoch: 8 Global Step: 87100 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:21:10,174-Speed 5976.99 samples/sec Loss 9.0399 LearningRate 0.1661 Epoch: 8 Global Step: 87110 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:21:17,043-Speed 5963.45 samples/sec Loss 8.9887 LearningRate 0.1661 Epoch: 8 Global Step: 87120 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:21:25,426-Speed 4887.14 samples/sec Loss 8.9950 LearningRate 0.1660 Epoch: 8 Global Step: 87130 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:21:32,308-Speed 5954.41 samples/sec Loss 8.9937 LearningRate 0.1660 Epoch: 8 Global Step: 87140 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:21:39,162-Speed 5976.85 samples/sec Loss 9.0880 LearningRate 0.1660 Epoch: 8 Global Step: 87150 Fp16 Grad Scale: 131072 Required: 24 
hours Training: 2022-01-08 13:21:46,037-Speed 5959.39 samples/sec Loss 8.9811 LearningRate 0.1660 Epoch: 8 Global Step: 87160 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:21:52,907-Speed 5964.89 samples/sec Loss 9.0048 LearningRate 0.1659 Epoch: 8 Global Step: 87170 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:21:59,760-Speed 5977.31 samples/sec Loss 8.9834 LearningRate 0.1659 Epoch: 8 Global Step: 87180 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:22:06,600-Speed 5989.88 samples/sec Loss 9.0431 LearningRate 0.1659 Epoch: 8 Global Step: 87190 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:22:13,456-Speed 5975.46 samples/sec Loss 8.9971 LearningRate 0.1658 Epoch: 8 Global Step: 87200 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:22:20,323-Speed 5965.59 samples/sec Loss 9.0475 LearningRate 0.1658 Epoch: 8 Global Step: 87210 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:22:27,190-Speed 5965.70 samples/sec Loss 9.0003 LearningRate 0.1658 Epoch: 8 Global Step: 87220 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:22:34,052-Speed 5973.76 samples/sec Loss 9.0575 LearningRate 0.1658 Epoch: 8 Global Step: 87230 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:22:40,903-Speed 5979.47 samples/sec Loss 9.0065 LearningRate 0.1657 Epoch: 8 Global Step: 87240 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:22:47,751-Speed 5985.09 samples/sec Loss 8.9979 LearningRate 0.1657 Epoch: 8 Global Step: 87250 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:22:54,613-Speed 5969.55 samples/sec Loss 9.1282 LearningRate 0.1657 Epoch: 8 Global Step: 87260 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:23:01,478-Speed 5968.34 samples/sec Loss 9.0170 LearningRate 0.1657 Epoch: 8 Global Step: 87270 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 
13:23:08,340-Speed 5969.73 samples/sec Loss 8.9908 LearningRate 0.1656 Epoch: 8 Global Step: 87280 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:23:15,197-Speed 5975.50 samples/sec Loss 9.0760 LearningRate 0.1656 Epoch: 8 Global Step: 87290 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:23:22,082-Speed 5950.09 samples/sec Loss 9.0539 LearningRate 0.1656 Epoch: 8 Global Step: 87300 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 13:23:28,951-Speed 5964.15 samples/sec Loss 9.0606 LearningRate 0.1655 Epoch: 8 Global Step: 87310 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:23:35,806-Speed 5977.10 samples/sec Loss 9.0505 LearningRate 0.1655 Epoch: 8 Global Step: 87320 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:23:42,664-Speed 5973.73 samples/sec Loss 8.9265 LearningRate 0.1655 Epoch: 8 Global Step: 87330 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:23:49,560-Speed 5941.57 samples/sec Loss 9.0285 LearningRate 0.1655 Epoch: 8 Global Step: 87340 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:23:56,457-Speed 5948.00 samples/sec Loss 9.0931 LearningRate 0.1654 Epoch: 8 Global Step: 87350 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:03,328-Speed 5962.20 samples/sec Loss 8.9937 LearningRate 0.1654 Epoch: 8 Global Step: 87360 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:10,197-Speed 5964.01 samples/sec Loss 8.9847 LearningRate 0.1654 Epoch: 8 Global Step: 87370 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:17,046-Speed 5982.86 samples/sec Loss 8.9755 LearningRate 0.1653 Epoch: 8 Global Step: 87380 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:23,903-Speed 5975.03 samples/sec Loss 9.0489 LearningRate 0.1653 Epoch: 8 Global Step: 87390 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:30,780-Speed 5957.51 samples/sec Loss 
9.0075 LearningRate 0.1653 Epoch: 8 Global Step: 87400 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:37,631-Speed 5979.60 samples/sec Loss 9.0850 LearningRate 0.1653 Epoch: 8 Global Step: 87410 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:44,478-Speed 5983.24 samples/sec Loss 9.0399 LearningRate 0.1652 Epoch: 8 Global Step: 87420 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:51,321-Speed 5987.58 samples/sec Loss 8.9592 LearningRate 0.1652 Epoch: 8 Global Step: 87430 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:24:58,175-Speed 5977.80 samples/sec Loss 8.9702 LearningRate 0.1652 Epoch: 8 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:05,052-Speed 5957.18 samples/sec Loss 9.0045 LearningRate 0.1652 Epoch: 8 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:11,909-Speed 5974.86 samples/sec Loss 9.0413 LearningRate 0.1651 Epoch: 8 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:18,782-Speed 5960.72 samples/sec Loss 9.0003 LearningRate 0.1651 Epoch: 8 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:25,642-Speed 5971.53 samples/sec Loss 8.9786 LearningRate 0.1651 Epoch: 8 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:32,490-Speed 5983.49 samples/sec Loss 8.9050 LearningRate 0.1650 Epoch: 8 Global Step: 87490 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:39,338-Speed 5982.60 samples/sec Loss 8.9706 LearningRate 0.1650 Epoch: 8 Global Step: 87500 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:46,212-Speed 5959.49 samples/sec Loss 8.9474 LearningRate 0.1650 Epoch: 8 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:53,078-Speed 5967.43 samples/sec Loss 9.0083 LearningRate 0.1650 Epoch: 8 Global Step: 
87520 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:25:59,932-Speed 5977.17 samples/sec Loss 8.9973 LearningRate 0.1649 Epoch: 8 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 13:26:06,793-Speed 5971.66 samples/sec Loss 9.0851 LearningRate 0.1649 Epoch: 8 Global Step: 87540 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 13:26:13,656-Speed 5969.09 samples/sec Loss 9.0116 LearningRate 0.1649 Epoch: 8 Global Step: 87550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:26:21,009-Speed 5571.54 samples/sec Loss 8.8967 LearningRate 0.1649 Epoch: 8 Global Step: 87560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:26:27,876-Speed 5965.70 samples/sec Loss 8.9699 LearningRate 0.1648 Epoch: 8 Global Step: 87570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:26:34,774-Speed 5939.30 samples/sec Loss 9.1285 LearningRate 0.1648 Epoch: 8 Global Step: 87580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:26:41,648-Speed 5960.40 samples/sec Loss 9.0058 LearningRate 0.1648 Epoch: 8 Global Step: 87590 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:26:48,500-Speed 5978.72 samples/sec Loss 9.0037 LearningRate 0.1647 Epoch: 8 Global Step: 87600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:26:55,391-Speed 5945.36 samples/sec Loss 8.9320 LearningRate 0.1647 Epoch: 8 Global Step: 87610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:27:02,260-Speed 5963.84 samples/sec Loss 8.9895 LearningRate 0.1647 Epoch: 8 Global Step: 87620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:27:09,138-Speed 5956.44 samples/sec Loss 9.0776 LearningRate 0.1647 Epoch: 8 Global Step: 87630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:27:16,005-Speed 5965.74 samples/sec Loss 8.9920 LearningRate 0.1646 Epoch: 8 Global Step: 87640 Fp16 Grad Scale: 262144 Required: 23 
hours Training: 2022-01-08 13:27:22,886-Speed 5954.60 samples/sec Loss 9.0130 LearningRate 0.1646 Epoch: 8 Global Step: 87650 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:27:29,758-Speed 5961.47 samples/sec Loss 8.9959 LearningRate 0.1646 Epoch: 8 Global Step: 87660 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:27:36,622-Speed 5968.36 samples/sec Loss 9.0034 LearningRate 0.1646 Epoch: 8 Global Step: 87670 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:27:43,494-Speed 5961.94 samples/sec Loss 8.9714 LearningRate 0.1645 Epoch: 8 Global Step: 87680 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:27:50,337-Speed 5986.35 samples/sec Loss 9.0013 LearningRate 0.1645 Epoch: 8 Global Step: 87690 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:27:57,229-Speed 5946.22 samples/sec Loss 8.9459 LearningRate 0.1645 Epoch: 8 Global Step: 87700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:28:04,078-Speed 5981.69 samples/sec Loss 8.9295 LearningRate 0.1644 Epoch: 8 Global Step: 87710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:28:10,923-Speed 5984.90 samples/sec Loss 9.0471 LearningRate 0.1644 Epoch: 8 Global Step: 87720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:28:17,778-Speed 5976.02 samples/sec Loss 9.0276 LearningRate 0.1644 Epoch: 8 Global Step: 87730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:28:24,624-Speed 5984.51 samples/sec Loss 8.9952 LearningRate 0.1644 Epoch: 8 Global Step: 87740 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:28:31,476-Speed 5978.64 samples/sec Loss 8.9399 LearningRate 0.1643 Epoch: 8 Global Step: 87750 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:28:38,315-Speed 5989.61 samples/sec Loss 8.9694 LearningRate 0.1643 Epoch: 8 Global Step: 87760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 
13:28:45,184-Speed 5965.77 samples/sec Loss 9.0154 LearningRate 0.1643 Epoch: 8 Global Step: 87770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:28:52,032-Speed 5982.42 samples/sec Loss 8.9978 LearningRate 0.1642 Epoch: 8 Global Step: 87780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:28:58,893-Speed 5971.15 samples/sec Loss 8.9846 LearningRate 0.1642 Epoch: 8 Global Step: 87790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:29:05,736-Speed 5986.92 samples/sec Loss 8.9323 LearningRate 0.1642 Epoch: 8 Global Step: 87800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:29:12,615-Speed 5954.92 samples/sec Loss 8.9552 LearningRate 0.1642 Epoch: 8 Global Step: 87810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:29:19,469-Speed 5977.26 samples/sec Loss 8.9894 LearningRate 0.1641 Epoch: 8 Global Step: 87820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:29:26,309-Speed 5989.57 samples/sec Loss 9.0009 LearningRate 0.1641 Epoch: 8 Global Step: 87830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:29:33,175-Speed 5966.59 samples/sec Loss 9.0020 LearningRate 0.1641 Epoch: 8 Global Step: 87840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:29:40,057-Speed 5956.41 samples/sec Loss 8.9751 LearningRate 0.1641 Epoch: 8 Global Step: 87850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:29:46,927-Speed 5963.91 samples/sec Loss 8.9595 LearningRate 0.1640 Epoch: 8 Global Step: 87860 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:29:53,787-Speed 5971.73 samples/sec Loss 8.9925 LearningRate 0.1640 Epoch: 8 Global Step: 87870 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:30:00,648-Speed 5971.64 samples/sec Loss 8.9351 LearningRate 0.1640 Epoch: 8 Global Step: 87880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:30:07,538-Speed 5946.13 samples/sec Loss 
9.0021 LearningRate 0.1639 Epoch: 8 Global Step: 87890 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:30:14,432-Speed 5942.32 samples/sec Loss 9.0458 LearningRate 0.1639 Epoch: 8 Global Step: 87900 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:30:21,274-Speed 5988.15 samples/sec Loss 9.0007 LearningRate 0.1639 Epoch: 8 Global Step: 87910 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:30:28,127-Speed 5981.42 samples/sec Loss 8.9538 LearningRate 0.1639 Epoch: 8 Global Step: 87920 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:30:35,037-Speed 5928.50 samples/sec Loss 9.0013 LearningRate 0.1638 Epoch: 8 Global Step: 87930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:30:41,909-Speed 5961.64 samples/sec Loss 8.9211 LearningRate 0.1638 Epoch: 8 Global Step: 87940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:30:48,796-Speed 5948.96 samples/sec Loss 8.9954 LearningRate 0.1638 Epoch: 8 Global Step: 87950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:30:55,642-Speed 5984.64 samples/sec Loss 8.9615 LearningRate 0.1638 Epoch: 8 Global Step: 87960 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:02,538-Speed 5940.46 samples/sec Loss 8.9678 LearningRate 0.1637 Epoch: 8 Global Step: 87970 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:09,400-Speed 5971.19 samples/sec Loss 8.8999 LearningRate 0.1637 Epoch: 8 Global Step: 87980 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:16,239-Speed 5989.63 samples/sec Loss 9.0366 LearningRate 0.1637 Epoch: 8 Global Step: 87990 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:23,127-Speed 5948.80 samples/sec Loss 8.9250 LearningRate 0.1636 Epoch: 8 Global Step: 88000 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:29,983-Speed 5978.49 samples/sec Loss 9.0020 LearningRate 0.1636 Epoch: 8 Global Step: 
88010 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:36,847-Speed 5967.72 samples/sec Loss 9.0246 LearningRate 0.1636 Epoch: 8 Global Step: 88020 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:43,716-Speed 5964.73 samples/sec Loss 8.9016 LearningRate 0.1636 Epoch: 8 Global Step: 88030 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:50,574-Speed 5973.41 samples/sec Loss 8.9854 LearningRate 0.1635 Epoch: 8 Global Step: 88040 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:31:57,431-Speed 5974.06 samples/sec Loss 8.9731 LearningRate 0.1635 Epoch: 8 Global Step: 88050 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:32:04,327-Speed 5942.02 samples/sec Loss 8.9761 LearningRate 0.1635 Epoch: 8 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:32:11,171-Speed 5986.44 samples/sec Loss 9.0159 LearningRate 0.1635 Epoch: 8 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:32:18,050-Speed 5955.63 samples/sec Loss 8.9598 LearningRate 0.1634 Epoch: 8 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:32:24,903-Speed 5978.24 samples/sec Loss 9.0476 LearningRate 0.1634 Epoch: 8 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:32:31,758-Speed 5976.71 samples/sec Loss 8.9219 LearningRate 0.1634 Epoch: 8 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:32:38,603-Speed 5984.65 samples/sec Loss 8.9267 LearningRate 0.1633 Epoch: 8 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:32:45,515-Speed 5976.90 samples/sec Loss 8.9529 LearningRate 0.1633 Epoch: 8 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:32:52,391-Speed 5958.37 samples/sec Loss 8.9393 LearningRate 0.1633 Epoch: 8 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 23 hours 
Training: 2022-01-08 13:32:59,266-Speed 5961.34 samples/sec Loss 8.9432 LearningRate 0.1633 Epoch: 8 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:33:06,113-Speed 5982.70 samples/sec Loss 8.8700 LearningRate 0.1632 Epoch: 8 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:33:12,973-Speed 5972.46 samples/sec Loss 8.9178 LearningRate 0.1632 Epoch: 8 Global Step: 88160 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:33:19,826-Speed 5980.24 samples/sec Loss 8.9539 LearningRate 0.1632 Epoch: 8 Global Step: 88170 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:33:26,734-Speed 5930.26 samples/sec Loss 8.8745 LearningRate 0.1632 Epoch: 8 Global Step: 88180 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:33:33,592-Speed 5973.53 samples/sec Loss 8.8297 LearningRate 0.1631 Epoch: 8 Global Step: 88190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:33:40,447-Speed 5978.55 samples/sec Loss 8.9633 LearningRate 0.1631 Epoch: 8 Global Step: 88200 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:33:47,325-Speed 5955.86 samples/sec Loss 8.9309 LearningRate 0.1631 Epoch: 8 Global Step: 88210 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:33:54,216-Speed 5945.94 samples/sec Loss 8.9317 LearningRate 0.1630 Epoch: 8 Global Step: 88220 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:34:01,108-Speed 5945.70 samples/sec Loss 8.9243 LearningRate 0.1630 Epoch: 8 Global Step: 88230 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:34:07,983-Speed 5958.97 samples/sec Loss 8.9939 LearningRate 0.1630 Epoch: 8 Global Step: 88240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:34:14,829-Speed 5984.40 samples/sec Loss 8.9234 LearningRate 0.1630 Epoch: 8 Global Step: 88250 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:34:21,675-Speed 5983.90 
samples/sec Loss 8.8985 LearningRate 0.1629 Epoch: 8 Global Step: 88260 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:34:28,514-Speed 5990.28 samples/sec Loss 8.9564 LearningRate 0.1629 Epoch: 8 Global Step: 88270 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:34:35,369-Speed 5977.13 samples/sec Loss 8.9944 LearningRate 0.1629 Epoch: 8 Global Step: 88280 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:34:42,214-Speed 5985.38 samples/sec Loss 8.9421 LearningRate 0.1629 Epoch: 8 Global Step: 88290 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:34:49,084-Speed 5962.70 samples/sec Loss 8.9756 LearningRate 0.1628 Epoch: 8 Global Step: 88300 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:34:55,942-Speed 5974.18 samples/sec Loss 8.9768 LearningRate 0.1628 Epoch: 8 Global Step: 88310 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:35:02,803-Speed 5971.13 samples/sec Loss 9.0328 LearningRate 0.1628 Epoch: 8 Global Step: 88320 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:35:09,663-Speed 5972.09 samples/sec Loss 8.9216 LearningRate 0.1627 Epoch: 8 Global Step: 88330 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:35:16,531-Speed 5965.50 samples/sec Loss 8.9036 LearningRate 0.1627 Epoch: 8 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:35:23,403-Speed 5961.00 samples/sec Loss 8.9301 LearningRate 0.1627 Epoch: 8 Global Step: 88350 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:35:30,252-Speed 5981.61 samples/sec Loss 8.9843 LearningRate 0.1627 Epoch: 8 Global Step: 88360 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:35:37,106-Speed 5977.76 samples/sec Loss 8.9709 LearningRate 0.1626 Epoch: 8 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:35:43,954-Speed 5982.66 samples/sec Loss 8.9751 LearningRate 0.1626 Epoch: 8 
Global Step: 88380 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:35:50,793-Speed 5990.04 samples/sec Loss 8.9935 LearningRate 0.1626 Epoch: 8 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:35:57,645-Speed 5981.03 samples/sec Loss 8.9237 LearningRate 0.1626 Epoch: 8 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:36:04,494-Speed 5981.88 samples/sec Loss 8.9167 LearningRate 0.1625 Epoch: 8 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:36:11,339-Speed 5984.92 samples/sec Loss 8.8726 LearningRate 0.1625 Epoch: 8 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:36:18,214-Speed 5960.11 samples/sec Loss 8.8997 LearningRate 0.1625 Epoch: 8 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:36:25,066-Speed 5979.87 samples/sec Loss 8.9484 LearningRate 0.1624 Epoch: 8 Global Step: 88440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:36:31,922-Speed 5975.70 samples/sec Loss 8.9433 LearningRate 0.1624 Epoch: 8 Global Step: 88450 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:36:38,795-Speed 5960.44 samples/sec Loss 8.9498 LearningRate 0.1624 Epoch: 8 Global Step: 88460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:36:45,647-Speed 5981.88 samples/sec Loss 8.9681 LearningRate 0.1624 Epoch: 8 Global Step: 88470 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:36:52,498-Speed 5979.78 samples/sec Loss 8.9842 LearningRate 0.1623 Epoch: 8 Global Step: 88480 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:36:59,359-Speed 5973.85 samples/sec Loss 8.9557 LearningRate 0.1623 Epoch: 8 Global Step: 88490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:37:06,237-Speed 5956.25 samples/sec Loss 9.0095 LearningRate 0.1623 Epoch: 8 Global Step: 88500 Fp16 Grad Scale: 131072 
Required: 23 hours Training: 2022-01-08 13:37:13,127-Speed 5946.10 samples/sec Loss 9.0182 LearningRate 0.1623 Epoch: 8 Global Step: 88510 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:37:19,984-Speed 5974.41 samples/sec Loss 8.9445 LearningRate 0.1622 Epoch: 8 Global Step: 88520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:37:26,867-Speed 5952.61 samples/sec Loss 8.8899 LearningRate 0.1622 Epoch: 8 Global Step: 88530 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:37:33,742-Speed 5958.67 samples/sec Loss 8.9649 LearningRate 0.1622 Epoch: 8 Global Step: 88540 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:37:40,580-Speed 5991.76 samples/sec Loss 8.9161 LearningRate 0.1621 Epoch: 8 Global Step: 88550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:37:47,449-Speed 5964.58 samples/sec Loss 9.0259 LearningRate 0.1621 Epoch: 8 Global Step: 88560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:37:54,320-Speed 5962.53 samples/sec Loss 9.0078 LearningRate 0.1621 Epoch: 8 Global Step: 88570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:38:01,174-Speed 5980.59 samples/sec Loss 8.9144 LearningRate 0.1621 Epoch: 8 Global Step: 88580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:38:08,038-Speed 5969.11 samples/sec Loss 8.9212 LearningRate 0.1620 Epoch: 8 Global Step: 88590 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:38:14,899-Speed 5970.82 samples/sec Loss 8.8849 LearningRate 0.1620 Epoch: 8 Global Step: 88600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:38:21,776-Speed 5959.19 samples/sec Loss 8.9159 LearningRate 0.1620 Epoch: 8 Global Step: 88610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:38:28,652-Speed 5958.43 samples/sec Loss 8.8900 LearningRate 0.1620 Epoch: 8 Global Step: 88620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 
13:38:35,591-Speed 5904.17 samples/sec Loss 8.9791 LearningRate 0.1619 Epoch: 8 Global Step: 88630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:38:42,446-Speed 5976.01 samples/sec Loss 8.8470 LearningRate 0.1619 Epoch: 8 Global Step: 88640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:38:49,315-Speed 5964.32 samples/sec Loss 8.9011 LearningRate 0.1619 Epoch: 8 Global Step: 88650 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:38:56,173-Speed 5973.77 samples/sec Loss 8.9565 LearningRate 0.1618 Epoch: 8 Global Step: 88660 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:03,021-Speed 5986.66 samples/sec Loss 8.9569 LearningRate 0.1618 Epoch: 8 Global Step: 88670 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:10,341-Speed 5596.19 samples/sec Loss 8.9810 LearningRate 0.1618 Epoch: 8 Global Step: 88680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:17,181-Speed 5989.31 samples/sec Loss 8.9875 LearningRate 0.1618 Epoch: 8 Global Step: 88690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:24,029-Speed 5982.88 samples/sec Loss 8.9104 LearningRate 0.1617 Epoch: 8 Global Step: 88700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:30,873-Speed 5985.87 samples/sec Loss 8.8650 LearningRate 0.1617 Epoch: 8 Global Step: 88710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:37,745-Speed 5960.55 samples/sec Loss 8.8801 LearningRate 0.1617 Epoch: 8 Global Step: 88720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:44,639-Speed 5943.47 samples/sec Loss 8.9347 LearningRate 0.1617 Epoch: 8 Global Step: 88730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:51,497-Speed 5973.58 samples/sec Loss 8.9274 LearningRate 0.1616 Epoch: 8 Global Step: 88740 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:39:58,364-Speed 5964.82 samples/sec Loss 
8.9152 LearningRate 0.1616 Epoch: 8 Global Step: 88750 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:40:05,314-Speed 5895.04 samples/sec Loss 9.0013 LearningRate 0.1616 Epoch: 8 Global Step: 88760 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:40:12,164-Speed 5984.81 samples/sec Loss 8.9193 LearningRate 0.1615 Epoch: 8 Global Step: 88770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:40:19,016-Speed 5978.35 samples/sec Loss 8.8248 LearningRate 0.1615 Epoch: 8 Global Step: 88780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:40:25,876-Speed 5974.37 samples/sec Loss 8.8953 LearningRate 0.1615 Epoch: 8 Global Step: 88790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:40:32,724-Speed 5983.19 samples/sec Loss 8.8991 LearningRate 0.1615 Epoch: 8 Global Step: 88800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:40:39,578-Speed 5976.86 samples/sec Loss 8.9382 LearningRate 0.1614 Epoch: 8 Global Step: 88810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:40:46,457-Speed 5954.85 samples/sec Loss 8.9055 LearningRate 0.1614 Epoch: 8 Global Step: 88820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:40:53,306-Speed 5982.32 samples/sec Loss 8.9617 LearningRate 0.1614 Epoch: 8 Global Step: 88830 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:00,156-Speed 5980.64 samples/sec Loss 8.9340 LearningRate 0.1614 Epoch: 8 Global Step: 88840 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:07,022-Speed 5966.45 samples/sec Loss 8.8874 LearningRate 0.1613 Epoch: 8 Global Step: 88850 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:13,871-Speed 5982.81 samples/sec Loss 8.9086 LearningRate 0.1613 Epoch: 8 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:20,721-Speed 5980.03 samples/sec Loss 8.7981 LearningRate 0.1613 Epoch: 8 Global 
Step: 88870 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:27,570-Speed 5981.90 samples/sec Loss 8.9013 LearningRate 0.1612 Epoch: 8 Global Step: 88880 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:34,430-Speed 5972.36 samples/sec Loss 8.8847 LearningRate 0.1612 Epoch: 8 Global Step: 88890 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:41,288-Speed 5973.86 samples/sec Loss 8.8981 LearningRate 0.1612 Epoch: 8 Global Step: 88900 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:48,141-Speed 5978.23 samples/sec Loss 8.9457 LearningRate 0.1612 Epoch: 8 Global Step: 88910 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:41:55,014-Speed 5960.32 samples/sec Loss 8.9379 LearningRate 0.1611 Epoch: 8 Global Step: 88920 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:42:01,863-Speed 5981.86 samples/sec Loss 8.9414 LearningRate 0.1611 Epoch: 8 Global Step: 88930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:42:08,713-Speed 5982.57 samples/sec Loss 8.8656 LearningRate 0.1611 Epoch: 8 Global Step: 88940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:42:15,562-Speed 5981.57 samples/sec Loss 8.9063 LearningRate 0.1611 Epoch: 8 Global Step: 88950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:42:22,415-Speed 5978.48 samples/sec Loss 8.9721 LearningRate 0.1610 Epoch: 8 Global Step: 88960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:42:29,266-Speed 5979.95 samples/sec Loss 8.8874 LearningRate 0.1610 Epoch: 8 Global Step: 88970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:42:36,124-Speed 5977.22 samples/sec Loss 8.8265 LearningRate 0.1610 Epoch: 8 Global Step: 88980 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:42:42,985-Speed 5970.06 samples/sec Loss 8.8698 LearningRate 0.1609 Epoch: 8 Global Step: 88990 Fp16 Grad Scale: 131072 Required: 23 
hours Training: 2022-01-08 13:42:49,843-Speed 5974.21 samples/sec Loss 8.9209 LearningRate 0.1609 Epoch: 8 Global Step: 89000 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:42:56,704-Speed 5971.39 samples/sec Loss 8.9412 LearningRate 0.1609 Epoch: 8 Global Step: 89010 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:43:03,567-Speed 5968.73 samples/sec Loss 8.8864 LearningRate 0.1609 Epoch: 8 Global Step: 89020 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:43:10,431-Speed 5968.07 samples/sec Loss 8.8744 LearningRate 0.1608 Epoch: 8 Global Step: 89030 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:43:17,313-Speed 5955.99 samples/sec Loss 8.9189 LearningRate 0.1608 Epoch: 8 Global Step: 89040 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 13:43:24,169-Speed 5975.04 samples/sec Loss 8.8064 LearningRate 0.1608 Epoch: 8 Global Step: 89050 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:43:31,024-Speed 5976.44 samples/sec Loss 8.9146 LearningRate 0.1608 Epoch: 8 Global Step: 89060 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:43:37,881-Speed 5974.83 samples/sec Loss 8.9243 LearningRate 0.1607 Epoch: 8 Global Step: 89070 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:43:44,746-Speed 5970.13 samples/sec Loss 8.8691 LearningRate 0.1607 Epoch: 8 Global Step: 89080 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:43:51,591-Speed 5985.39 samples/sec Loss 8.8902 LearningRate 0.1607 Epoch: 8 Global Step: 89090 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:43:58,460-Speed 5964.55 samples/sec Loss 8.8844 LearningRate 0.1606 Epoch: 8 Global Step: 89100 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:44:05,328-Speed 5964.77 samples/sec Loss 8.8403 LearningRate 0.1606 Epoch: 8 Global Step: 89110 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 
13:44:12,194-Speed 5966.49 samples/sec Loss 8.8595 LearningRate 0.1606 Epoch: 8 Global Step: 89120 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:44:19,053-Speed 5973.24 samples/sec Loss 8.8719 LearningRate 0.1606 Epoch: 8 Global Step: 89130 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:44:25,917-Speed 5968.42 samples/sec Loss 8.9143 LearningRate 0.1605 Epoch: 8 Global Step: 89140 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:44:32,764-Speed 5983.46 samples/sec Loss 8.8234 LearningRate 0.1605 Epoch: 8 Global Step: 89150 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:44:39,621-Speed 5975.36 samples/sec Loss 8.9040 LearningRate 0.1605 Epoch: 8 Global Step: 89160 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:44:46,472-Speed 5979.10 samples/sec Loss 8.8932 LearningRate 0.1605 Epoch: 8 Global Step: 89170 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:44:53,331-Speed 5973.02 samples/sec Loss 8.8627 LearningRate 0.1604 Epoch: 8 Global Step: 89180 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:45:00,212-Speed 5953.64 samples/sec Loss 8.8891 LearningRate 0.1604 Epoch: 8 Global Step: 89190 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:45:07,075-Speed 5968.79 samples/sec Loss 8.8625 LearningRate 0.1604 Epoch: 8 Global Step: 89200 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:45:13,942-Speed 5966.30 samples/sec Loss 8.7971 LearningRate 0.1603 Epoch: 8 Global Step: 89210 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:45:21,465-Speed 5446.01 samples/sec Loss 8.8313 LearningRate 0.1603 Epoch: 8 Global Step: 89220 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:45:28,329-Speed 5968.28 samples/sec Loss 8.8647 LearningRate 0.1603 Epoch: 8 Global Step: 89230 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:45:35,186-Speed 5974.60 samples/sec Loss 8.9280 
LearningRate 0.1603 Epoch: 8 Global Step: 89240 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:45:42,042-Speed 5975.56 samples/sec Loss 8.7865 LearningRate 0.1602 Epoch: 8 Global Step: 89250 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:45:48,894-Speed 5978.52 samples/sec Loss 8.9154 LearningRate 0.1602 Epoch: 8 Global Step: 89260 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:45:55,834-Speed 5903.71 samples/sec Loss 8.9396 LearningRate 0.1602 Epoch: 8 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:02,710-Speed 5958.58 samples/sec Loss 8.8531 LearningRate 0.1602 Epoch: 8 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:09,575-Speed 5969.76 samples/sec Loss 8.8998 LearningRate 0.1601 Epoch: 8 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:16,442-Speed 5965.43 samples/sec Loss 8.8553 LearningRate 0.1601 Epoch: 8 Global Step: 89300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:23,309-Speed 5965.91 samples/sec Loss 8.8922 LearningRate 0.1601 Epoch: 8 Global Step: 89310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:30,170-Speed 5971.26 samples/sec Loss 8.8818 LearningRate 0.1600 Epoch: 8 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:37,027-Speed 5974.32 samples/sec Loss 8.9378 LearningRate 0.1600 Epoch: 8 Global Step: 89330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:43,884-Speed 5974.84 samples/sec Loss 8.9489 LearningRate 0.1600 Epoch: 8 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:50,733-Speed 5981.70 samples/sec Loss 8.8868 LearningRate 0.1600 Epoch: 8 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:46:57,593-Speed 5972.26 samples/sec Loss 8.9523 LearningRate 0.1599 Epoch: 8 Global Step: 89360 Fp16 
Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:47:04,447-Speed 5978.79 samples/sec Loss 8.9176 LearningRate 0.1599 Epoch: 8 Global Step: 89370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:47:11,334-Speed 5948.39 samples/sec Loss 8.8580 LearningRate 0.1599 Epoch: 8 Global Step: 89380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:47:18,272-Speed 5904.39 samples/sec Loss 8.8826 LearningRate 0.1599 Epoch: 8 Global Step: 89390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:47:25,210-Speed 5906.23 samples/sec Loss 8.8834 LearningRate 0.1598 Epoch: 8 Global Step: 89400 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:47:32,091-Speed 5954.08 samples/sec Loss 8.8119 LearningRate 0.1598 Epoch: 8 Global Step: 89410 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:47:38,948-Speed 5975.17 samples/sec Loss 8.8741 LearningRate 0.1598 Epoch: 8 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:47:45,799-Speed 5979.50 samples/sec Loss 8.8371 LearningRate 0.1597 Epoch: 8 Global Step: 89430 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:47:52,649-Speed 5980.58 samples/sec Loss 8.8240 LearningRate 0.1597 Epoch: 8 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:47:59,493-Speed 5987.84 samples/sec Loss 8.9122 LearningRate 0.1597 Epoch: 8 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:48:06,370-Speed 5957.18 samples/sec Loss 8.8995 LearningRate 0.1597 Epoch: 8 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:48:13,222-Speed 5978.93 samples/sec Loss 8.7924 LearningRate 0.1596 Epoch: 8 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:48:20,077-Speed 5976.82 samples/sec Loss 8.8372 LearningRate 0.1596 Epoch: 8 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 23 hours Training: 
2022-01-08 13:48:26,959-Speed 5953.35 samples/sec Loss 8.8484 LearningRate 0.1596 Epoch: 8 Global Step: 89490 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:48:33,808-Speed 5981.29 samples/sec Loss 8.8039 LearningRate 0.1596 Epoch: 8 Global Step: 89500 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:48:40,678-Speed 5963.19 samples/sec Loss 8.9483 LearningRate 0.1595 Epoch: 8 Global Step: 89510 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:48:47,523-Speed 5984.67 samples/sec Loss 8.8351 LearningRate 0.1595 Epoch: 8 Global Step: 89520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:48:54,386-Speed 5969.93 samples/sec Loss 8.8409 LearningRate 0.1595 Epoch: 8 Global Step: 89530 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:01,239-Speed 5978.10 samples/sec Loss 8.8836 LearningRate 0.1595 Epoch: 8 Global Step: 89540 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:08,106-Speed 5966.22 samples/sec Loss 8.9540 LearningRate 0.1594 Epoch: 8 Global Step: 89550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:14,984-Speed 5956.13 samples/sec Loss 8.8603 LearningRate 0.1594 Epoch: 8 Global Step: 89560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:21,847-Speed 5972.11 samples/sec Loss 8.8894 LearningRate 0.1594 Epoch: 8 Global Step: 89570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:28,735-Speed 5947.02 samples/sec Loss 8.8228 LearningRate 0.1593 Epoch: 8 Global Step: 89580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:35,671-Speed 5906.79 samples/sec Loss 8.7977 LearningRate 0.1593 Epoch: 8 Global Step: 89590 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:42,563-Speed 5945.08 samples/sec Loss 8.8426 LearningRate 0.1593 Epoch: 8 Global Step: 89600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:49,418-Speed 5976.27 
samples/sec Loss 8.8273 LearningRate 0.1593 Epoch: 8 Global Step: 89610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:49:56,278-Speed 5971.31 samples/sec Loss 8.8686 LearningRate 0.1592 Epoch: 8 Global Step: 89620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:03,127-Speed 5981.91 samples/sec Loss 8.8714 LearningRate 0.1592 Epoch: 8 Global Step: 89630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:09,992-Speed 5967.96 samples/sec Loss 8.9098 LearningRate 0.1592 Epoch: 8 Global Step: 89640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:16,958-Speed 5881.08 samples/sec Loss 8.8855 LearningRate 0.1592 Epoch: 8 Global Step: 89650 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:23,811-Speed 5978.29 samples/sec Loss 8.9023 LearningRate 0.1591 Epoch: 8 Global Step: 89660 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:30,661-Speed 5980.28 samples/sec Loss 8.8127 LearningRate 0.1591 Epoch: 8 Global Step: 89670 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:37,523-Speed 5970.20 samples/sec Loss 8.8148 LearningRate 0.1591 Epoch: 8 Global Step: 89680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:44,376-Speed 5979.04 samples/sec Loss 8.8867 LearningRate 0.1590 Epoch: 8 Global Step: 89690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:51,244-Speed 5966.10 samples/sec Loss 8.8274 LearningRate 0.1590 Epoch: 8 Global Step: 89700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:50:58,101-Speed 5973.84 samples/sec Loss 8.8386 LearningRate 0.1590 Epoch: 8 Global Step: 89710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:04,951-Speed 5981.23 samples/sec Loss 8.9232 LearningRate 0.1590 Epoch: 8 Global Step: 89720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:11,797-Speed 5983.91 samples/sec Loss 8.8387 LearningRate 0.1589 
Epoch: 8 Global Step: 89730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:18,666-Speed 5964.29 samples/sec Loss 8.8294 LearningRate 0.1589 Epoch: 8 Global Step: 89740 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:25,512-Speed 5984.42 samples/sec Loss 8.8622 LearningRate 0.1589 Epoch: 8 Global Step: 89750 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:32,373-Speed 5972.46 samples/sec Loss 8.8419 LearningRate 0.1589 Epoch: 8 Global Step: 89760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:39,243-Speed 5965.92 samples/sec Loss 8.8720 LearningRate 0.1588 Epoch: 8 Global Step: 89770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:46,109-Speed 5967.10 samples/sec Loss 8.8005 LearningRate 0.1588 Epoch: 8 Global Step: 89780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:52,963-Speed 5977.29 samples/sec Loss 8.8235 LearningRate 0.1588 Epoch: 8 Global Step: 89790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:51:59,848-Speed 5949.72 samples/sec Loss 8.8342 LearningRate 0.1587 Epoch: 8 Global Step: 89800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:52:06,714-Speed 5967.48 samples/sec Loss 8.8656 LearningRate 0.1587 Epoch: 8 Global Step: 89810 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:52:13,565-Speed 5979.63 samples/sec Loss 8.8688 LearningRate 0.1587 Epoch: 8 Global Step: 89820 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:52:20,418-Speed 5978.08 samples/sec Loss 8.8897 LearningRate 0.1587 Epoch: 8 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:52:27,288-Speed 5964.06 samples/sec Loss 8.8955 LearningRate 0.1586 Epoch: 8 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:52:34,155-Speed 5966.28 samples/sec Loss 8.8578 LearningRate 0.1586 Epoch: 8 Global Step: 89850 Fp16 Grad Scale: 
65536 Required: 23 hours Training: 2022-01-08 13:52:41,005-Speed 5979.99 samples/sec Loss 8.8489 LearningRate 0.1586 Epoch: 8 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:52:47,873-Speed 5965.23 samples/sec Loss 8.8473 LearningRate 0.1586 Epoch: 8 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:52:54,748-Speed 5961.94 samples/sec Loss 8.8437 LearningRate 0.1585 Epoch: 8 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:53:01,603-Speed 5975.80 samples/sec Loss 8.8072 LearningRate 0.1585 Epoch: 8 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:53:08,496-Speed 5943.23 samples/sec Loss 8.7818 LearningRate 0.1585 Epoch: 8 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:53:15,388-Speed 5946.32 samples/sec Loss 8.8597 LearningRate 0.1585 Epoch: 8 Global Step: 89910 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:53:22,274-Speed 5949.40 samples/sec Loss 8.7564 LearningRate 0.1584 Epoch: 8 Global Step: 89920 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:53:29,200-Speed 5915.28 samples/sec Loss 8.8522 LearningRate 0.1584 Epoch: 8 Global Step: 89930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:53:36,064-Speed 5969.89 samples/sec Loss 8.8964 LearningRate 0.1584 Epoch: 8 Global Step: 89940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:53:42,916-Speed 5979.66 samples/sec Loss 8.8028 LearningRate 0.1583 Epoch: 8 Global Step: 89950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:53:49,782-Speed 5966.33 samples/sec Loss 8.8560 LearningRate 0.1583 Epoch: 8 Global Step: 89960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:53:56,632-Speed 5983.87 samples/sec Loss 8.8233 LearningRate 0.1583 Epoch: 8 Global Step: 89970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 
13:54:03,478-Speed 5983.63 samples/sec Loss 8.8749 LearningRate 0.1583 Epoch: 8 Global Step: 89980 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:54:10,390-Speed 5927.36 samples/sec Loss 8.7826 LearningRate 0.1582 Epoch: 8 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:54:17,238-Speed 5982.02 samples/sec Loss 8.7688 LearningRate 0.1582 Epoch: 8 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:54:44,267-[lfw][90000]XNorm: 24.866661 Training: 2022-01-08 13:54:44,268-[lfw][90000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-01-08 13:54:44,269-[lfw][90000]Accuracy-Highest: 0.99750 Training: 2022-01-08 13:55:15,331-[cfp_fp][90000]XNorm: 21.526887 Training: 2022-01-08 13:55:15,332-[cfp_fp][90000]Accuracy-Flip: 0.97757+-0.01110 Training: 2022-01-08 13:55:15,333-[cfp_fp][90000]Accuracy-Highest: 0.98114 Training: 2022-01-08 13:55:42,142-[agedb_30][90000]XNorm: 23.877042 Training: 2022-01-08 13:55:42,143-[agedb_30][90000]Accuracy-Flip: 0.96650+-0.00848 Training: 2022-01-08 13:55:42,144-[agedb_30][90000]Accuracy-Highest: 0.96883 Training: 2022-01-08 13:55:49,002-Speed 446.37 samples/sec Loss 8.8322 LearningRate 0.1582 Epoch: 8 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:55:55,843-Speed 5988.87 samples/sec Loss 8.8285 LearningRate 0.1582 Epoch: 8 Global Step: 90020 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:56:02,697-Speed 5979.60 samples/sec Loss 8.8735 LearningRate 0.1581 Epoch: 8 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:56:09,570-Speed 5960.52 samples/sec Loss 8.8437 LearningRate 0.1581 Epoch: 8 Global Step: 90040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:56:16,428-Speed 5973.69 samples/sec Loss 8.7870 LearningRate 0.1581 Epoch: 8 Global Step: 90050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:56:23,348-Speed 5920.78 samples/sec 
Loss 8.8299 LearningRate 0.1580 Epoch: 8 Global Step: 90060 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:56:30,241-Speed 5943.83 samples/sec Loss 8.8102 LearningRate 0.1580 Epoch: 8 Global Step: 90070 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:56:37,134-Speed 5942.92 samples/sec Loss 8.7856 LearningRate 0.1580 Epoch: 8 Global Step: 90080 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:56:44,028-Speed 5943.08 samples/sec Loss 8.8004 LearningRate 0.1580 Epoch: 8 Global Step: 90090 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:56:50,907-Speed 5954.75 samples/sec Loss 8.7866 LearningRate 0.1579 Epoch: 8 Global Step: 90100 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:56:57,791-Speed 5951.05 samples/sec Loss 8.8302 LearningRate 0.1579 Epoch: 8 Global Step: 90110 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:57:04,663-Speed 5963.18 samples/sec Loss 8.8127 LearningRate 0.1579 Epoch: 8 Global Step: 90120 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:57:11,527-Speed 5968.29 samples/sec Loss 8.7702 LearningRate 0.1579 Epoch: 8 Global Step: 90130 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:57:18,533-Speed 5847.38 samples/sec Loss 8.7165 LearningRate 0.1578 Epoch: 8 Global Step: 90140 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:57:25,374-Speed 5989.64 samples/sec Loss 8.8046 LearningRate 0.1578 Epoch: 8 Global Step: 90150 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:57:32,311-Speed 5905.60 samples/sec Loss 8.8150 LearningRate 0.1578 Epoch: 8 Global Step: 90160 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:57:39,267-Speed 5889.98 samples/sec Loss 8.8387 LearningRate 0.1578 Epoch: 8 Global Step: 90170 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:57:46,262-Speed 5857.36 samples/sec Loss 8.7182 LearningRate 0.1577 Epoch: 8 Global 
Step: 90180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:57:53,192-Speed 5911.62 samples/sec Loss 8.8097 LearningRate 0.1577 Epoch: 8 Global Step: 90190 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:58:00,129-Speed 5905.51 samples/sec Loss 8.7723 LearningRate 0.1577 Epoch: 8 Global Step: 90200 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:58:06,979-Speed 5980.76 samples/sec Loss 8.7812 LearningRate 0.1576 Epoch: 8 Global Step: 90210 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:58:13,838-Speed 5973.28 samples/sec Loss 8.7902 LearningRate 0.1576 Epoch: 8 Global Step: 90220 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:58:20,715-Speed 5957.13 samples/sec Loss 8.8311 LearningRate 0.1576 Epoch: 8 Global Step: 90230 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:58:27,584-Speed 5965.00 samples/sec Loss 8.7915 LearningRate 0.1576 Epoch: 8 Global Step: 90240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 13:58:34,470-Speed 5949.64 samples/sec Loss 8.8638 LearningRate 0.1575 Epoch: 8 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:58:41,379-Speed 5929.70 samples/sec Loss 8.8211 LearningRate 0.1575 Epoch: 8 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:58:48,238-Speed 5972.90 samples/sec Loss 8.7821 LearningRate 0.1575 Epoch: 8 Global Step: 90270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:58:55,109-Speed 5962.95 samples/sec Loss 8.7424 LearningRate 0.1575 Epoch: 8 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:59:01,983-Speed 5959.72 samples/sec Loss 8.7903 LearningRate 0.1574 Epoch: 8 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:59:08,833-Speed 5980.76 samples/sec Loss 8.7778 LearningRate 0.1574 Epoch: 8 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 23 hours 
Training: 2022-01-08 13:59:15,696-Speed 5969.15 samples/sec Loss 8.8553 LearningRate 0.1574 Epoch: 8 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:59:22,552-Speed 5975.59 samples/sec Loss 8.8228 LearningRate 0.1573 Epoch: 8 Global Step: 90320 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:59:29,405-Speed 5978.66 samples/sec Loss 8.8579 LearningRate 0.1573 Epoch: 8 Global Step: 90330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:59:36,275-Speed 5963.35 samples/sec Loss 8.8069 LearningRate 0.1573 Epoch: 8 Global Step: 90340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 13:59:43,138-Speed 5969.50 samples/sec Loss 8.8342 LearningRate 0.1573 Epoch: 8 Global Step: 90350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:59:49,999-Speed 5970.95 samples/sec Loss 8.7974 LearningRate 0.1572 Epoch: 8 Global Step: 90360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 13:59:56,864-Speed 5967.62 samples/sec Loss 8.8307 LearningRate 0.1572 Epoch: 8 Global Step: 90370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:00:03,810-Speed 5897.87 samples/sec Loss 8.7767 LearningRate 0.1572 Epoch: 8 Global Step: 90380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:00:10,663-Speed 5977.67 samples/sec Loss 8.8424 LearningRate 0.1572 Epoch: 8 Global Step: 90390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:00:17,518-Speed 5976.72 samples/sec Loss 8.7851 LearningRate 0.1571 Epoch: 8 Global Step: 90400 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:00:24,380-Speed 5969.42 samples/sec Loss 8.7581 LearningRate 0.1571 Epoch: 8 Global Step: 90410 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:00:31,233-Speed 5978.07 samples/sec Loss 8.8125 LearningRate 0.1571 Epoch: 8 Global Step: 90420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:00:38,113-Speed 5955.12 
samples/sec Loss 8.7776 LearningRate 0.1571 Epoch: 8 Global Step: 90430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:00:44,968-Speed 5976.37 samples/sec Loss 8.8421 LearningRate 0.1570 Epoch: 8 Global Step: 90440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:00:51,870-Speed 5935.60 samples/sec Loss 8.7963 LearningRate 0.1570 Epoch: 8 Global Step: 90450 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:00:58,736-Speed 5967.17 samples/sec Loss 8.7915 LearningRate 0.1570 Epoch: 8 Global Step: 90460 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:01:05,595-Speed 5972.78 samples/sec Loss 8.7540 LearningRate 0.1569 Epoch: 8 Global Step: 90470 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:01:12,455-Speed 5972.10 samples/sec Loss 8.6955 LearningRate 0.1569 Epoch: 8 Global Step: 90480 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:01:19,320-Speed 5967.12 samples/sec Loss 8.7677 LearningRate 0.1569 Epoch: 8 Global Step: 90490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:01:26,175-Speed 5975.95 samples/sec Loss 8.8275 LearningRate 0.1569 Epoch: 8 Global Step: 90500 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:01:35,735-Speed 5976.44 samples/sec Loss 8.8126 LearningRate 0.1568 Epoch: 8 Global Step: 90510 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:01:42,583-Speed 5981.77 samples/sec Loss 8.8110 LearningRate 0.1568 Epoch: 8 Global Step: 90520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:01:49,449-Speed 5967.51 samples/sec Loss 8.8310 LearningRate 0.1568 Epoch: 8 Global Step: 90530 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:01:56,331-Speed 5952.64 samples/sec Loss 8.7948 LearningRate 0.1568 Epoch: 8 Global Step: 90540 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:02:03,228-Speed 5940.28 samples/sec Loss 8.8155 LearningRate 0.1567 
Epoch: 8 Global Step: 90550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:02:10,089-Speed 5971.72 samples/sec Loss 8.7384 LearningRate 0.1567 Epoch: 8 Global Step: 90560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:02:16,982-Speed 5943.71 samples/sec Loss 8.8435 LearningRate 0.1567 Epoch: 8 Global Step: 90570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:02:23,855-Speed 5960.36 samples/sec Loss 8.8408 LearningRate 0.1566 Epoch: 8 Global Step: 90580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:02:30,740-Speed 5950.59 samples/sec Loss 8.7715 LearningRate 0.1566 Epoch: 8 Global Step: 90590 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:02:37,642-Speed 5937.15 samples/sec Loss 8.8306 LearningRate 0.1566 Epoch: 8 Global Step: 90600 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:02:44,505-Speed 5969.18 samples/sec Loss 8.8359 LearningRate 0.1566 Epoch: 8 Global Step: 90610 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:02:51,373-Speed 5965.45 samples/sec Loss 8.7444 LearningRate 0.1565 Epoch: 8 Global Step: 90620 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:02:58,254-Speed 5953.51 samples/sec Loss 8.8094 LearningRate 0.1565 Epoch: 8 Global Step: 90630 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:03:05,117-Speed 5969.47 samples/sec Loss 8.7487 LearningRate 0.1565 Epoch: 8 Global Step: 90640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:03:12,004-Speed 5948.99 samples/sec Loss 8.7642 LearningRate 0.1565 Epoch: 8 Global Step: 90650 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:03:18,863-Speed 5972.76 samples/sec Loss 8.8067 LearningRate 0.1564 Epoch: 8 Global Step: 90660 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:03:25,715-Speed 5978.57 samples/sec Loss 8.8246 LearningRate 0.1564 Epoch: 8 Global Step: 90670 Fp16 Grad 
Scale: 131072 Required: 23 hours Training: 2022-01-08 14:03:32,602-Speed 5949.04 samples/sec Loss 8.7327 LearningRate 0.1564 Epoch: 8 Global Step: 90680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:03:39,462-Speed 5971.74 samples/sec Loss 8.7446 LearningRate 0.1564 Epoch: 8 Global Step: 90690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:03:46,330-Speed 5965.23 samples/sec Loss 8.7691 LearningRate 0.1563 Epoch: 8 Global Step: 90700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:03:53,188-Speed 5973.35 samples/sec Loss 8.7597 LearningRate 0.1563 Epoch: 8 Global Step: 90710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:04:00,052-Speed 5968.66 samples/sec Loss 8.7874 LearningRate 0.1563 Epoch: 8 Global Step: 90720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:04:06,910-Speed 5973.99 samples/sec Loss 8.7433 LearningRate 0.1562 Epoch: 8 Global Step: 90730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:04:13,767-Speed 5974.61 samples/sec Loss 8.7528 LearningRate 0.1562 Epoch: 8 Global Step: 90740 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:04:20,624-Speed 5974.72 samples/sec Loss 8.8368 LearningRate 0.1562 Epoch: 8 Global Step: 90750 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:04:27,465-Speed 5988.00 samples/sec Loss 8.7997 LearningRate 0.1562 Epoch: 8 Global Step: 90760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:04:34,317-Speed 5981.51 samples/sec Loss 8.7352 LearningRate 0.1561 Epoch: 8 Global Step: 90770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:04:41,163-Speed 5983.99 samples/sec Loss 8.7741 LearningRate 0.1561 Epoch: 8 Global Step: 90780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:04:48,026-Speed 5968.76 samples/sec Loss 8.7565 LearningRate 0.1561 Epoch: 8 Global Step: 90790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 
2022-01-08 14:04:54,873-Speed 5983.48 samples/sec Loss 8.7061 LearningRate 0.1561 Epoch: 8 Global Step: 90800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:05:01,747-Speed 5963.50 samples/sec Loss 8.7970 LearningRate 0.1560 Epoch: 8 Global Step: 90810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:05:08,615-Speed 5964.59 samples/sec Loss 8.7714 LearningRate 0.1560 Epoch: 8 Global Step: 90820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:05:15,481-Speed 5966.81 samples/sec Loss 8.7875 LearningRate 0.1560 Epoch: 8 Global Step: 90830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:05:22,344-Speed 5970.35 samples/sec Loss 8.7616 LearningRate 0.1560 Epoch: 8 Global Step: 90840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:05:29,194-Speed 5980.53 samples/sec Loss 8.7711 LearningRate 0.1559 Epoch: 8 Global Step: 90850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:05:36,058-Speed 5968.40 samples/sec Loss 8.7871 LearningRate 0.1559 Epoch: 8 Global Step: 90860 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:05:42,925-Speed 5968.60 samples/sec Loss 8.7453 LearningRate 0.1559 Epoch: 8 Global Step: 90870 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:05:49,776-Speed 5979.73 samples/sec Loss 8.7420 LearningRate 0.1558 Epoch: 8 Global Step: 90880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:05:56,628-Speed 5978.62 samples/sec Loss 8.7220 LearningRate 0.1558 Epoch: 8 Global Step: 90890 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:03,476-Speed 5983.28 samples/sec Loss 8.7672 LearningRate 0.1558 Epoch: 8 Global Step: 90900 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:10,342-Speed 5965.92 samples/sec Loss 8.7720 LearningRate 0.1558 Epoch: 8 Global Step: 90910 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:17,219-Speed 5959.86 
samples/sec Loss 8.7571 LearningRate 0.1557 Epoch: 8 Global Step: 90920 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:24,086-Speed 5967.79 samples/sec Loss 8.8040 LearningRate 0.1557 Epoch: 8 Global Step: 90930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:30,971-Speed 5950.48 samples/sec Loss 8.7024 LearningRate 0.1557 Epoch: 8 Global Step: 90940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:37,831-Speed 5971.84 samples/sec Loss 8.8043 LearningRate 0.1557 Epoch: 8 Global Step: 90950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:44,682-Speed 5980.27 samples/sec Loss 8.7667 LearningRate 0.1556 Epoch: 8 Global Step: 90960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:51,524-Speed 5986.59 samples/sec Loss 8.6794 LearningRate 0.1556 Epoch: 8 Global Step: 90970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:06:58,374-Speed 5981.57 samples/sec Loss 8.7875 LearningRate 0.1556 Epoch: 8 Global Step: 90980 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:07:05,237-Speed 5969.21 samples/sec Loss 8.8027 LearningRate 0.1556 Epoch: 8 Global Step: 90990 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:07:12,100-Speed 5969.16 samples/sec Loss 8.6854 LearningRate 0.1555 Epoch: 8 Global Step: 91000 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:07:18,961-Speed 5971.56 samples/sec Loss 8.8077 LearningRate 0.1555 Epoch: 8 Global Step: 91010 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:07:25,839-Speed 5956.55 samples/sec Loss 8.7699 LearningRate 0.1555 Epoch: 8 Global Step: 91020 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:07:32,707-Speed 5965.29 samples/sec Loss 8.7667 LearningRate 0.1554 Epoch: 8 Global Step: 91030 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:07:39,593-Speed 5952.04 samples/sec Loss 8.8465 LearningRate 0.1554 
Epoch: 8 Global Step: 91040 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:07:46,454-Speed 5973.79 samples/sec Loss 8.7529 LearningRate 0.1554 Epoch: 8 Global Step: 91050 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:07:53,308-Speed 5976.45 samples/sec Loss 8.7619 LearningRate 0.1554 Epoch: 8 Global Step: 91060 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:08:00,169-Speed 5973.57 samples/sec Loss 8.7746 LearningRate 0.1553 Epoch: 8 Global Step: 91070 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:08:07,035-Speed 5967.04 samples/sec Loss 8.8075 LearningRate 0.1553 Epoch: 8 Global Step: 91080 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:08:13,890-Speed 5976.44 samples/sec Loss 8.7435 LearningRate 0.1553 Epoch: 8 Global Step: 91090 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:08:20,770-Speed 5954.81 samples/sec Loss 8.6939 LearningRate 0.1553 Epoch: 8 Global Step: 91100 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:08:27,636-Speed 5966.06 samples/sec Loss 8.7221 LearningRate 0.1552 Epoch: 8 Global Step: 91110 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:08:34,490-Speed 5978.08 samples/sec Loss 8.7192 LearningRate 0.1552 Epoch: 8 Global Step: 91120 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:08:41,371-Speed 5952.89 samples/sec Loss 8.7490 LearningRate 0.1552 Epoch: 8 Global Step: 91130 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:08:48,243-Speed 5961.73 samples/sec Loss 8.7753 LearningRate 0.1552 Epoch: 8 Global Step: 91140 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:08:55,092-Speed 5981.53 samples/sec Loss 8.7402 LearningRate 0.1551 Epoch: 8 Global Step: 91150 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:09:01,953-Speed 5972.06 samples/sec Loss 8.7855 LearningRate 0.1551 Epoch: 8 Global Step: 91160 Fp16 Grad 
Scale: 131072 Required: 23 hours Training: 2022-01-08 14:09:08,785-Speed 5996.62 samples/sec Loss 8.7488 LearningRate 0.1551 Epoch: 8 Global Step: 91170 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:09:15,685-Speed 5937.30 samples/sec Loss 8.8082 LearningRate 0.1550 Epoch: 8 Global Step: 91180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:09:22,544-Speed 5972.50 samples/sec Loss 8.7820 LearningRate 0.1550 Epoch: 8 Global Step: 91190 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:09:29,405-Speed 5970.92 samples/sec Loss 8.6945 LearningRate 0.1550 Epoch: 8 Global Step: 91200 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:09:36,251-Speed 5984.82 samples/sec Loss 8.7543 LearningRate 0.1550 Epoch: 8 Global Step: 91210 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:09:43,099-Speed 5981.83 samples/sec Loss 8.7611 LearningRate 0.1549 Epoch: 8 Global Step: 91220 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:09:49,980-Speed 5953.17 samples/sec Loss 8.7171 LearningRate 0.1549 Epoch: 8 Global Step: 91230 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:09:56,832-Speed 5979.21 samples/sec Loss 8.6918 LearningRate 0.1549 Epoch: 8 Global Step: 91240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:10:03,679-Speed 5983.83 samples/sec Loss 8.7371 LearningRate 0.1549 Epoch: 8 Global Step: 91250 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:10:10,536-Speed 5974.80 samples/sec Loss 8.7278 LearningRate 0.1548 Epoch: 8 Global Step: 91260 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:10:17,398-Speed 5970.34 samples/sec Loss 8.6512 LearningRate 0.1548 Epoch: 8 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:10:24,244-Speed 5984.43 samples/sec Loss 8.7994 LearningRate 0.1548 Epoch: 8 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 
14:10:31,087-Speed 5986.99 samples/sec Loss 8.7026 LearningRate 0.1548 Epoch: 8 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:10:37,952-Speed 5967.93 samples/sec Loss 8.7431 LearningRate 0.1547 Epoch: 8 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:10:44,798-Speed 5984.09 samples/sec Loss 8.7323 LearningRate 0.1547 Epoch: 8 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:10:51,645-Speed 5985.91 samples/sec Loss 8.7379 LearningRate 0.1547 Epoch: 8 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:10:58,502-Speed 5973.96 samples/sec Loss 8.7978 LearningRate 0.1546 Epoch: 8 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:11:05,371-Speed 5964.54 samples/sec Loss 8.7099 LearningRate 0.1546 Epoch: 8 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:11:12,219-Speed 5982.80 samples/sec Loss 8.7564 LearningRate 0.1546 Epoch: 8 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:11:19,081-Speed 5970.10 samples/sec Loss 8.7462 LearningRate 0.1546 Epoch: 8 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:11:25,950-Speed 5964.12 samples/sec Loss 8.7125 LearningRate 0.1545 Epoch: 8 Global Step: 91370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:11:32,828-Speed 5956.93 samples/sec Loss 8.7315 LearningRate 0.1545 Epoch: 8 Global Step: 91380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:11:39,682-Speed 5976.46 samples/sec Loss 8.7329 LearningRate 0.1545 Epoch: 8 Global Step: 91390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:11:46,544-Speed 5970.33 samples/sec Loss 8.6834 LearningRate 0.1545 Epoch: 8 Global Step: 91400 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:11:53,393-Speed 5983.40 samples/sec Loss 8.6628 
LearningRate 0.1544 Epoch: 8 Global Step: 91410 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:12:00,250-Speed 5974.39 samples/sec Loss 8.7249 LearningRate 0.1544 Epoch: 8 Global Step: 91420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:12:07,131-Speed 5953.81 samples/sec Loss 8.7299 LearningRate 0.1544 Epoch: 8 Global Step: 91430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:12:14,000-Speed 5964.74 samples/sec Loss 8.6337 LearningRate 0.1544 Epoch: 8 Global Step: 91440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:12:20,867-Speed 5965.26 samples/sec Loss 8.7234 LearningRate 0.1543 Epoch: 8 Global Step: 91450 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:12:27,737-Speed 5964.16 samples/sec Loss 8.6975 LearningRate 0.1543 Epoch: 8 Global Step: 91460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:12:34,596-Speed 5972.53 samples/sec Loss 8.6844 LearningRate 0.1543 Epoch: 8 Global Step: 91470 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:12:41,454-Speed 5973.95 samples/sec Loss 8.7350 LearningRate 0.1542 Epoch: 8 Global Step: 91480 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:12:48,304-Speed 5980.56 samples/sec Loss 8.6877 LearningRate 0.1542 Epoch: 8 Global Step: 91490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:12:55,150-Speed 5987.58 samples/sec Loss 8.7238 LearningRate 0.1542 Epoch: 8 Global Step: 91500 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:13:02,021-Speed 5962.57 samples/sec Loss 8.7496 LearningRate 0.1542 Epoch: 8 Global Step: 91510 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:13:08,888-Speed 5967.57 samples/sec Loss 8.7354 LearningRate 0.1541 Epoch: 8 Global Step: 91520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:13:15,750-Speed 5970.38 samples/sec Loss 8.7150 LearningRate 0.1541 Epoch: 8 Global Step: 
91530 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:13:22,608-Speed 5973.43 samples/sec Loss 8.7362 LearningRate 0.1541 Epoch: 8 Global Step: 91540 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:13:29,475-Speed 5965.80 samples/sec Loss 8.7260 LearningRate 0.1541 Epoch: 8 Global Step: 91550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:13:36,334-Speed 5973.88 samples/sec Loss 8.7559 LearningRate 0.1540 Epoch: 8 Global Step: 91560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:13:43,187-Speed 5977.89 samples/sec Loss 8.8022 LearningRate 0.1540 Epoch: 8 Global Step: 91570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:13:50,052-Speed 5967.52 samples/sec Loss 8.7596 LearningRate 0.1540 Epoch: 8 Global Step: 91580 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:13:56,924-Speed 5961.28 samples/sec Loss 8.6740 LearningRate 0.1540 Epoch: 8 Global Step: 91590 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:14:03,783-Speed 5973.15 samples/sec Loss 8.6914 LearningRate 0.1539 Epoch: 8 Global Step: 91600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:14:10,648-Speed 5969.32 samples/sec Loss 8.6591 LearningRate 0.1539 Epoch: 8 Global Step: 91610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:14:17,519-Speed 5962.74 samples/sec Loss 8.6839 LearningRate 0.1539 Epoch: 8 Global Step: 91620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:14:24,384-Speed 5967.28 samples/sec Loss 8.7239 LearningRate 0.1538 Epoch: 8 Global Step: 91630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:14:31,257-Speed 5960.34 samples/sec Loss 8.6736 LearningRate 0.1538 Epoch: 8 Global Step: 91640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:14:38,121-Speed 5972.25 samples/sec Loss 8.6788 LearningRate 0.1538 Epoch: 8 Global Step: 91650 Fp16 Grad Scale: 131072 Required: 23 
hours Training: 2022-01-08 14:14:44,965-Speed 5985.37 samples/sec Loss 8.6665 LearningRate 0.1538 Epoch: 8 Global Step: 91660 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:14:51,881-Speed 5923.12 samples/sec Loss 8.7218 LearningRate 0.1537 Epoch: 8 Global Step: 91670 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:14:58,726-Speed 5984.77 samples/sec Loss 8.6720 LearningRate 0.1537 Epoch: 8 Global Step: 91680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:15:05,587-Speed 5971.41 samples/sec Loss 8.6858 LearningRate 0.1537 Epoch: 8 Global Step: 91690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:15:12,442-Speed 5976.37 samples/sec Loss 8.6817 LearningRate 0.1537 Epoch: 8 Global Step: 91700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:15:19,296-Speed 5976.90 samples/sec Loss 8.7448 LearningRate 0.1536 Epoch: 8 Global Step: 91710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:15:26,182-Speed 5949.19 samples/sec Loss 8.7116 LearningRate 0.1536 Epoch: 8 Global Step: 91720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:15:33,034-Speed 5979.76 samples/sec Loss 8.6965 LearningRate 0.1536 Epoch: 8 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:15:39,903-Speed 5964.36 samples/sec Loss 8.7323 LearningRate 0.1536 Epoch: 8 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:15:46,773-Speed 5963.16 samples/sec Loss 8.6852 LearningRate 0.1535 Epoch: 8 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:15:53,639-Speed 5966.74 samples/sec Loss 8.6404 LearningRate 0.1535 Epoch: 8 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:16:00,518-Speed 5955.97 samples/sec Loss 8.7037 LearningRate 0.1535 Epoch: 8 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:16:07,375-Speed 
5974.22 samples/sec Loss 8.6228 LearningRate 0.1534 Epoch: 8 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:16:14,232-Speed 5974.26 samples/sec Loss 8.6404 LearningRate 0.1534 Epoch: 8 Global Step: 91790 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:16:21,079-Speed 5983.90 samples/sec Loss 8.6540 LearningRate 0.1534 Epoch: 8 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:16:27,941-Speed 5969.70 samples/sec Loss 8.6293 LearningRate 0.1534 Epoch: 8 Global Step: 91810 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:16:34,807-Speed 5966.52 samples/sec Loss 8.6406 LearningRate 0.1533 Epoch: 8 Global Step: 91820 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:16:41,669-Speed 5969.94 samples/sec Loss 8.6968 LearningRate 0.1533 Epoch: 8 Global Step: 91830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:16:48,530-Speed 5970.97 samples/sec Loss 8.6926 LearningRate 0.1533 Epoch: 8 Global Step: 91840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:16:55,397-Speed 5966.32 samples/sec Loss 8.7252 LearningRate 0.1533 Epoch: 8 Global Step: 91850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:17:02,261-Speed 5968.15 samples/sec Loss 8.6732 LearningRate 0.1532 Epoch: 8 Global Step: 91860 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:17:09,112-Speed 5979.53 samples/sec Loss 8.6543 LearningRate 0.1532 Epoch: 8 Global Step: 91870 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:17:15,958-Speed 5983.96 samples/sec Loss 8.6015 LearningRate 0.1532 Epoch: 8 Global Step: 91880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:17:22,800-Speed 5987.91 samples/sec Loss 8.6925 LearningRate 0.1532 Epoch: 8 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:17:29,646-Speed 5984.00 samples/sec Loss 8.7021 LearningRate 
0.1531 Epoch: 8 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:17:36,497-Speed 5980.11 samples/sec Loss 8.7745 LearningRate 0.1531 Epoch: 8 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:17:43,360-Speed 5969.10 samples/sec Loss 8.6995 LearningRate 0.1531 Epoch: 8 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:17:50,213-Speed 5977.70 samples/sec Loss 8.7094 LearningRate 0.1530 Epoch: 8 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:17:57,065-Speed 5978.48 samples/sec Loss 8.6336 LearningRate 0.1530 Epoch: 8 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:18:03,939-Speed 5960.02 samples/sec Loss 8.6409 LearningRate 0.1530 Epoch: 8 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:18:10,787-Speed 5981.75 samples/sec Loss 8.6769 LearningRate 0.1530 Epoch: 8 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:18:17,661-Speed 5960.42 samples/sec Loss 8.6893 LearningRate 0.1529 Epoch: 8 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:18:24,518-Speed 5973.95 samples/sec Loss 8.7155 LearningRate 0.1529 Epoch: 8 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:18:31,376-Speed 5974.35 samples/sec Loss 8.6404 LearningRate 0.1529 Epoch: 8 Global Step: 91990 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:18:38,248-Speed 5963.26 samples/sec Loss 8.6834 LearningRate 0.1529 Epoch: 8 Global Step: 92000 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:18:45,127-Speed 5955.47 samples/sec Loss 8.6923 LearningRate 0.1528 Epoch: 8 Global Step: 92010 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:18:52,003-Speed 5957.81 samples/sec Loss 8.6766 LearningRate 0.1528 Epoch: 8 Global Step: 92020 Fp16 Grad Scale: 
131072 Required: 23 hours Training: 2022-01-08 14:18:58,875-Speed 5962.02 samples/sec Loss 8.7165 LearningRate 0.1528 Epoch: 8 Global Step: 92030 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:19:05,734-Speed 5973.82 samples/sec Loss 8.7094 LearningRate 0.1528 Epoch: 8 Global Step: 92040 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:19:12,617-Speed 5952.18 samples/sec Loss 8.6178 LearningRate 0.1527 Epoch: 8 Global Step: 92050 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:19:19,484-Speed 5966.03 samples/sec Loss 8.6893 LearningRate 0.1527 Epoch: 8 Global Step: 92060 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:19:26,331-Speed 5983.15 samples/sec Loss 8.6357 LearningRate 0.1527 Epoch: 8 Global Step: 92070 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:19:33,211-Speed 5954.80 samples/sec Loss 8.6496 LearningRate 0.1527 Epoch: 8 Global Step: 92080 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:19:40,085-Speed 5959.94 samples/sec Loss 8.6777 LearningRate 0.1526 Epoch: 8 Global Step: 92090 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:19:46,940-Speed 5976.00 samples/sec Loss 8.6432 LearningRate 0.1526 Epoch: 8 Global Step: 92100 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:19:53,801-Speed 5971.44 samples/sec Loss 8.6499 LearningRate 0.1526 Epoch: 8 Global Step: 92110 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:20:00,649-Speed 5982.43 samples/sec Loss 8.7233 LearningRate 0.1525 Epoch: 8 Global Step: 92120 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:20:07,505-Speed 5974.78 samples/sec Loss 8.6755 LearningRate 0.1525 Epoch: 8 Global Step: 92130 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:20:14,380-Speed 5958.93 samples/sec Loss 8.6844 LearningRate 0.1525 Epoch: 8 Global Step: 92140 Fp16 Grad Scale: 262144 Required: 23 hours Training: 
2022-01-08 14:20:21,260-Speed 5955.44 samples/sec Loss 8.6387 LearningRate 0.1525 Epoch: 8 Global Step: 92150 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:20:28,124-Speed 5968.89 samples/sec Loss 8.7214 LearningRate 0.1524 Epoch: 8 Global Step: 92160 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:20:34,987-Speed 5969.25 samples/sec Loss 8.7215 LearningRate 0.1524 Epoch: 8 Global Step: 92170 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:20:41,838-Speed 5979.25 samples/sec Loss 8.6413 LearningRate 0.1524 Epoch: 8 Global Step: 92180 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:20:48,698-Speed 5972.73 samples/sec Loss 8.6019 LearningRate 0.1524 Epoch: 8 Global Step: 92190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:20:55,572-Speed 5959.18 samples/sec Loss 8.6799 LearningRate 0.1523 Epoch: 8 Global Step: 92200 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:21:02,442-Speed 5964.04 samples/sec Loss 8.6838 LearningRate 0.1523 Epoch: 8 Global Step: 92210 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:21:09,314-Speed 5961.90 samples/sec Loss 8.6041 LearningRate 0.1523 Epoch: 8 Global Step: 92220 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:21:16,161-Speed 5983.55 samples/sec Loss 8.6918 LearningRate 0.1523 Epoch: 8 Global Step: 92230 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:21:23,019-Speed 5973.92 samples/sec Loss 8.6391 LearningRate 0.1522 Epoch: 8 Global Step: 92240 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:21:29,877-Speed 5974.16 samples/sec Loss 8.6723 LearningRate 0.1522 Epoch: 8 Global Step: 92250 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:21:36,721-Speed 5985.33 samples/sec Loss 8.6222 LearningRate 0.1522 Epoch: 8 Global Step: 92260 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-01-08 14:21:43,572-Speed 5979.89 
samples/sec Loss 8.7579 LearningRate 0.1521 Epoch: 8 Global Step: 92270 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:21:50,452-Speed 5954.04 samples/sec Loss 8.6978 LearningRate 0.1521 Epoch: 8 Global Step: 92280 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:21:57,310-Speed 5974.12 samples/sec Loss 8.5962 LearningRate 0.1521 Epoch: 8 Global Step: 92290 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:22:04,182-Speed 5961.01 samples/sec Loss 8.6437 LearningRate 0.1521 Epoch: 8 Global Step: 92300 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:22:11,061-Speed 5955.52 samples/sec Loss 8.6621 LearningRate 0.1520 Epoch: 8 Global Step: 92310 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:22:17,958-Speed 5940.12 samples/sec Loss 8.6956 LearningRate 0.1520 Epoch: 8 Global Step: 92320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:22:24,822-Speed 5968.82 samples/sec Loss 8.6270 LearningRate 0.1520 Epoch: 8 Global Step: 92330 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:22:31,666-Speed 5985.82 samples/sec Loss 8.6053 LearningRate 0.1520 Epoch: 8 Global Step: 92340 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:22:38,548-Speed 5952.47 samples/sec Loss 8.6238 LearningRate 0.1519 Epoch: 8 Global Step: 92350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:22:45,395-Speed 5984.26 samples/sec Loss 8.6701 LearningRate 0.1519 Epoch: 8 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:22:52,254-Speed 5973.23 samples/sec Loss 8.6481 LearningRate 0.1519 Epoch: 8 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:22:59,135-Speed 5953.19 samples/sec Loss 8.6283 LearningRate 0.1519 Epoch: 8 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:23:05,983-Speed 5982.64 samples/sec Loss 8.6727 LearningRate 0.1518 
Epoch: 8 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:23:12,874-Speed 5945.07 samples/sec Loss 8.6602 LearningRate 0.1518 Epoch: 8 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:23:19,729-Speed 5976.32 samples/sec Loss 8.6816 LearningRate 0.1518 Epoch: 8 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:23:26,610-Speed 5953.97 samples/sec Loss 8.6619 LearningRate 0.1518 Epoch: 8 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:23:33,473-Speed 5971.80 samples/sec Loss 8.6146 LearningRate 0.1517 Epoch: 8 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:23:40,317-Speed 5984.81 samples/sec Loss 8.5019 LearningRate 0.1517 Epoch: 8 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:23:47,177-Speed 5973.88 samples/sec Loss 8.6827 LearningRate 0.1517 Epoch: 8 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:23:54,031-Speed 5977.14 samples/sec Loss 8.6909 LearningRate 0.1516 Epoch: 8 Global Step: 92460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:24:00,901-Speed 5963.64 samples/sec Loss 8.6716 LearningRate 0.1516 Epoch: 8 Global Step: 92470 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:24:07,773-Speed 5961.25 samples/sec Loss 8.6681 LearningRate 0.1516 Epoch: 8 Global Step: 92480 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:24:14,637-Speed 5972.11 samples/sec Loss 8.6608 LearningRate 0.1516 Epoch: 8 Global Step: 92490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 14:24:21,483-Speed 5983.95 samples/sec Loss 8.6268 LearningRate 0.1515 Epoch: 8 Global Step: 92500 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:24:28,354-Speed 5962.66 samples/sec Loss 8.6471 LearningRate 0.1515 Epoch: 8 Global Step: 92510 Fp16 Grad Scale: 32768 
Required: 23 hours Training: 2022-01-08 14:24:35,196-Speed 5987.52 samples/sec Loss 8.6829 LearningRate 0.1515 Epoch: 8 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:24:42,102-Speed 5932.49 samples/sec Loss 8.6276 LearningRate 0.1515 Epoch: 8 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:24:49,020-Speed 5922.18 samples/sec Loss 8.7300 LearningRate 0.1514 Epoch: 8 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:24:55,881-Speed 5971.06 samples/sec Loss 8.6137 LearningRate 0.1514 Epoch: 8 Global Step: 92550 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:25:02,735-Speed 5977.63 samples/sec Loss 8.6129 LearningRate 0.1514 Epoch: 8 Global Step: 92560 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:25:09,593-Speed 5973.68 samples/sec Loss 8.6814 LearningRate 0.1514 Epoch: 8 Global Step: 92570 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:25:16,455-Speed 5970.57 samples/sec Loss 8.6388 LearningRate 0.1513 Epoch: 8 Global Step: 92580 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:25:23,331-Speed 5957.57 samples/sec Loss 8.5286 LearningRate 0.1513 Epoch: 8 Global Step: 92590 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:25:30,205-Speed 5960.36 samples/sec Loss 8.5870 LearningRate 0.1513 Epoch: 8 Global Step: 92600 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 14:25:37,072-Speed 5966.93 samples/sec Loss 8.6627 LearningRate 0.1513 Epoch: 8 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:25:43,929-Speed 5974.55 samples/sec Loss 8.6787 LearningRate 0.1512 Epoch: 8 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 14:25:50,801-Speed 5961.38 samples/sec Loss 8.6827 LearningRate 0.1512 Epoch: 8 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 
14:25:57,699-Speed 5939.18 samples/sec Loss 8.6082 LearningRate 0.1512 Epoch: 8 Global Step: 92640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:26:04,629-Speed 5911.68 samples/sec Loss 8.6374 LearningRate 0.1511 Epoch: 8 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:26:11,560-Speed 5910.49 samples/sec Loss 8.6986 LearningRate 0.1511 Epoch: 8 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:26:18,483-Speed 5917.50 samples/sec Loss 8.6325 LearningRate 0.1511 Epoch: 8 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:26:25,409-Speed 5914.55 samples/sec Loss 8.6980 LearningRate 0.1511 Epoch: 8 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:26:32,340-Speed 5911.39 samples/sec Loss 8.6094 LearningRate 0.1510 Epoch: 8 Global Step: 92690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:26:39,210-Speed 5962.85 samples/sec Loss 8.6171 LearningRate 0.1510 Epoch: 8 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:26:46,061-Speed 5980.01 samples/sec Loss 8.5834 LearningRate 0.1510 Epoch: 8 Global Step: 92710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:26:52,924-Speed 5969.45 samples/sec Loss 8.6277 LearningRate 0.1510 Epoch: 8 Global Step: 92720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:26:59,832-Speed 5930.03 samples/sec Loss 8.5973 LearningRate 0.1509 Epoch: 8 Global Step: 92730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:27:06,747-Speed 5924.25 samples/sec Loss 8.6158 LearningRate 0.1509 Epoch: 8 Global Step: 92740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:27:13,606-Speed 5973.45 samples/sec Loss 8.6008 LearningRate 0.1509 Epoch: 8 Global Step: 92750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:27:20,462-Speed 5975.73 samples/sec Loss 8.6375 
LearningRate 0.1509 Epoch: 8 Global Step: 92760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:27:27,327-Speed 5967.62 samples/sec Loss 8.5733 LearningRate 0.1508 Epoch: 8 Global Step: 92770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:27:34,192-Speed 5967.91 samples/sec Loss 8.6267 LearningRate 0.1508 Epoch: 8 Global Step: 92780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:27:41,073-Speed 5953.48 samples/sec Loss 8.6769 LearningRate 0.1508 Epoch: 8 Global Step: 92790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:27:47,956-Speed 5951.92 samples/sec Loss 8.6366 LearningRate 0.1508 Epoch: 8 Global Step: 92800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:27:54,834-Speed 5956.49 samples/sec Loss 8.6460 LearningRate 0.1507 Epoch: 8 Global Step: 92810 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:28:01,709-Speed 5959.71 samples/sec Loss 8.7022 LearningRate 0.1507 Epoch: 8 Global Step: 92820 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:28:08,563-Speed 5976.94 samples/sec Loss 8.6539 LearningRate 0.1507 Epoch: 8 Global Step: 92830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:28:15,414-Speed 5980.05 samples/sec Loss 8.6353 LearningRate 0.1506 Epoch: 8 Global Step: 92840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:28:22,276-Speed 5970.63 samples/sec Loss 8.6022 LearningRate 0.1506 Epoch: 8 Global Step: 92850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:28:29,160-Speed 5950.83 samples/sec Loss 8.5930 LearningRate 0.1506 Epoch: 8 Global Step: 92860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:28:36,014-Speed 5977.33 samples/sec Loss 8.6232 LearningRate 0.1506 Epoch: 8 Global Step: 92870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:28:42,890-Speed 5958.05 samples/sec Loss 8.6098 LearningRate 0.1505 Epoch: 8 Global Step: 
92880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:28:49,749-Speed 5973.18 samples/sec Loss 8.5442 LearningRate 0.1505 Epoch: 8 Global Step: 92890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:28:56,634-Speed 5951.40 samples/sec Loss 8.6919 LearningRate 0.1505 Epoch: 8 Global Step: 92900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:29:03,494-Speed 5972.65 samples/sec Loss 8.5724 LearningRate 0.1505 Epoch: 8 Global Step: 92910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:29:10,343-Speed 5981.34 samples/sec Loss 8.6986 LearningRate 0.1504 Epoch: 8 Global Step: 92920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:29:17,219-Speed 5958.72 samples/sec Loss 8.5772 LearningRate 0.1504 Epoch: 8 Global Step: 92930 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:29:24,066-Speed 5982.47 samples/sec Loss 8.5965 LearningRate 0.1504 Epoch: 8 Global Step: 92940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:29:30,927-Speed 5970.74 samples/sec Loss 8.5502 LearningRate 0.1504 Epoch: 8 Global Step: 92950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:29:37,784-Speed 5974.20 samples/sec Loss 8.5456 LearningRate 0.1503 Epoch: 8 Global Step: 92960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:29:44,641-Speed 5975.21 samples/sec Loss 8.5947 LearningRate 0.1503 Epoch: 8 Global Step: 92970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:29:51,489-Speed 5981.74 samples/sec Loss 8.5404 LearningRate 0.1503 Epoch: 8 Global Step: 92980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:29:58,367-Speed 5955.95 samples/sec Loss 8.5706 LearningRate 0.1503 Epoch: 8 Global Step: 92990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:30:05,264-Speed 5940.91 samples/sec Loss 8.5979 LearningRate 0.1502 Epoch: 8 Global Step: 93000 Fp16 Grad Scale: 131072 Required: 22 
hours Training: 2022-01-08 14:30:12,119-Speed 5976.01 samples/sec Loss 8.6329 LearningRate 0.1502 Epoch: 8 Global Step: 93010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:30:18,994-Speed 5958.99 samples/sec Loss 8.6584 LearningRate 0.1502 Epoch: 8 Global Step: 93020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:30:25,845-Speed 5980.50 samples/sec Loss 8.5946 LearningRate 0.1501 Epoch: 8 Global Step: 93030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:30:32,708-Speed 5969.16 samples/sec Loss 8.5647 LearningRate 0.1501 Epoch: 8 Global Step: 93040 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:30:39,553-Speed 5985.17 samples/sec Loss 8.6360 LearningRate 0.1501 Epoch: 8 Global Step: 93050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:30:46,431-Speed 5956.10 samples/sec Loss 8.6178 LearningRate 0.1501 Epoch: 8 Global Step: 93060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:30:53,283-Speed 5979.28 samples/sec Loss 8.6141 LearningRate 0.1500 Epoch: 8 Global Step: 93070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:31:00,150-Speed 5966.30 samples/sec Loss 8.5823 LearningRate 0.1500 Epoch: 8 Global Step: 93080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:31:07,017-Speed 5965.52 samples/sec Loss 8.6059 LearningRate 0.1500 Epoch: 8 Global Step: 93090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:31:13,945-Speed 5913.46 samples/sec Loss 8.6187 LearningRate 0.1500 Epoch: 8 Global Step: 93100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:31:20,807-Speed 5970.44 samples/sec Loss 8.6291 LearningRate 0.1499 Epoch: 8 Global Step: 93110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:31:27,678-Speed 5963.01 samples/sec Loss 8.6043 LearningRate 0.1499 Epoch: 8 Global Step: 93120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 
14:31:34,541-Speed 5969.29 samples/sec Loss 8.6315 LearningRate 0.1499 Epoch: 8 Global Step: 93130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:31:41,418-Speed 5957.65 samples/sec Loss 8.6111 LearningRate 0.1499 Epoch: 8 Global Step: 93140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:31:48,292-Speed 5962.05 samples/sec Loss 8.6330 LearningRate 0.1498 Epoch: 8 Global Step: 93150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:31:55,161-Speed 5964.29 samples/sec Loss 8.5966 LearningRate 0.1498 Epoch: 8 Global Step: 93160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:02,017-Speed 5975.36 samples/sec Loss 8.5948 LearningRate 0.1498 Epoch: 8 Global Step: 93170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:08,906-Speed 5946.73 samples/sec Loss 8.5613 LearningRate 0.1498 Epoch: 8 Global Step: 93180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:15,752-Speed 5983.93 samples/sec Loss 8.6189 LearningRate 0.1497 Epoch: 8 Global Step: 93190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:22,626-Speed 5960.24 samples/sec Loss 8.6492 LearningRate 0.1497 Epoch: 8 Global Step: 93200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:29,504-Speed 5956.73 samples/sec Loss 8.6294 LearningRate 0.1497 Epoch: 8 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:36,359-Speed 5976.29 samples/sec Loss 8.5223 LearningRate 0.1496 Epoch: 8 Global Step: 93220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:43,227-Speed 5964.84 samples/sec Loss 8.6294 LearningRate 0.1496 Epoch: 8 Global Step: 93230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:50,073-Speed 5983.82 samples/sec Loss 8.6466 LearningRate 0.1496 Epoch: 8 Global Step: 93240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:32:56,936-Speed 5968.84 samples/sec Loss 8.5670 
LearningRate 0.1496 Epoch: 8 Global Step: 93250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:33:03,785-Speed 5981.42 samples/sec Loss 8.5639 LearningRate 0.1495 Epoch: 8 Global Step: 93260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:33:10,657-Speed 5961.92 samples/sec Loss 8.5819 LearningRate 0.1495 Epoch: 8 Global Step: 93270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:33:17,546-Speed 5946.98 samples/sec Loss 8.6346 LearningRate 0.1495 Epoch: 8 Global Step: 93280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:33:24,400-Speed 5977.49 samples/sec Loss 8.6010 LearningRate 0.1495 Epoch: 8 Global Step: 93290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:33:31,267-Speed 5965.85 samples/sec Loss 8.6214 LearningRate 0.1494 Epoch: 8 Global Step: 93300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:33:38,137-Speed 5963.09 samples/sec Loss 8.5980 LearningRate 0.1494 Epoch: 8 Global Step: 93310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:33:45,025-Speed 5949.83 samples/sec Loss 8.6122 LearningRate 0.1494 Epoch: 8 Global Step: 93320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:34:10,090-Speed 1634.28 samples/sec Loss 8.6018 LearningRate 0.1494 Epoch: 9 Global Step: 93330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:34:16,960-Speed 5964.04 samples/sec Loss 8.6006 LearningRate 0.1493 Epoch: 9 Global Step: 93340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:34:23,869-Speed 5929.61 samples/sec Loss 8.6021 LearningRate 0.1493 Epoch: 9 Global Step: 93350 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:34:30,786-Speed 5923.34 samples/sec Loss 8.5335 LearningRate 0.1493 Epoch: 9 Global Step: 93360 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:34:37,681-Speed 5942.55 samples/sec Loss 8.5867 LearningRate 0.1493 Epoch: 9 Global Step: 
93370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:34:44,584-Speed 5934.45 samples/sec Loss 8.5694 LearningRate 0.1492 Epoch: 9 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:34:51,455-Speed 5962.71 samples/sec Loss 8.6281 LearningRate 0.1492 Epoch: 9 Global Step: 93390 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:34:58,291-Speed 5993.44 samples/sec Loss 8.6078 LearningRate 0.1492 Epoch: 9 Global Step: 93400 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:35:05,130-Speed 5990.04 samples/sec Loss 8.5538 LearningRate 0.1491 Epoch: 9 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:35:11,984-Speed 5977.32 samples/sec Loss 8.5810 LearningRate 0.1491 Epoch: 9 Global Step: 93420 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:35:18,834-Speed 5980.95 samples/sec Loss 8.5631 LearningRate 0.1491 Epoch: 9 Global Step: 93430 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:35:25,704-Speed 5963.13 samples/sec Loss 8.5577 LearningRate 0.1491 Epoch: 9 Global Step: 93440 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:35:32,572-Speed 5965.38 samples/sec Loss 8.5963 LearningRate 0.1490 Epoch: 9 Global Step: 93450 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:35:39,495-Speed 5917.62 samples/sec Loss 8.6240 LearningRate 0.1490 Epoch: 9 Global Step: 93460 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:35:46,417-Speed 5919.30 samples/sec Loss 8.5147 LearningRate 0.1490 Epoch: 9 Global Step: 93470 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:35:53,398-Speed 5868.39 samples/sec Loss 8.5592 LearningRate 0.1490 Epoch: 9 Global Step: 93480 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 14:36:00,361-Speed 5884.56 samples/sec Loss 8.5592 LearningRate 0.1489 Epoch: 9 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 22 hours 
Training: 2022-01-08 14:36:07,271-Speed 5929.00 samples/sec Loss 8.5497 LearningRate 0.1489 Epoch: 9 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:36:14,196-Speed 5915.80 samples/sec Loss 8.5039 LearningRate 0.1489 Epoch: 9 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:36:21,093-Speed 5940.37 samples/sec Loss 8.5950 LearningRate 0.1489 Epoch: 9 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:36:28,004-Speed 5927.47 samples/sec Loss 8.5369 LearningRate 0.1488 Epoch: 9 Global Step: 93530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:36:34,907-Speed 5934.78 samples/sec Loss 8.5058 LearningRate 0.1488 Epoch: 9 Global Step: 93540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:36:41,817-Speed 5928.90 samples/sec Loss 8.6027 LearningRate 0.1488 Epoch: 9 Global Step: 93550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:36:48,676-Speed 5972.98 samples/sec Loss 8.6153 LearningRate 0.1488 Epoch: 9 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:36:55,536-Speed 5972.31 samples/sec Loss 8.5984 LearningRate 0.1487 Epoch: 9 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:37:02,394-Speed 5973.54 samples/sec Loss 8.4898 LearningRate 0.1487 Epoch: 9 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:37:09,246-Speed 5979.30 samples/sec Loss 8.4640 LearningRate 0.1487 Epoch: 9 Global Step: 93590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:37:16,104-Speed 5974.05 samples/sec Loss 8.5527 LearningRate 0.1487 Epoch: 9 Global Step: 93600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:37:22,950-Speed 5983.64 samples/sec Loss 8.6179 LearningRate 0.1486 Epoch: 9 Global Step: 93610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:37:29,807-Speed 5974.74 
samples/sec Loss 8.5476 LearningRate 0.1486 Epoch: 9 Global Step: 93620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:37:36,677-Speed 5963.32 samples/sec Loss 8.5651 LearningRate 0.1486 Epoch: 9 Global Step: 93630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:37:43,538-Speed 5970.36 samples/sec Loss 8.5095 LearningRate 0.1485 Epoch: 9 Global Step: 93640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:37:50,414-Speed 5958.27 samples/sec Loss 8.5645 LearningRate 0.1485 Epoch: 9 Global Step: 93650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:37:57,271-Speed 5974.94 samples/sec Loss 8.5903 LearningRate 0.1485 Epoch: 9 Global Step: 93660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:38:04,223-Speed 5892.92 samples/sec Loss 8.5522 LearningRate 0.1485 Epoch: 9 Global Step: 93670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:38:11,094-Speed 5962.66 samples/sec Loss 8.5857 LearningRate 0.1484 Epoch: 9 Global Step: 93680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:38:17,935-Speed 5988.39 samples/sec Loss 8.6214 LearningRate 0.1484 Epoch: 9 Global Step: 93690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:38:24,795-Speed 5972.28 samples/sec Loss 8.5957 LearningRate 0.1484 Epoch: 9 Global Step: 93700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:38:31,652-Speed 5974.51 samples/sec Loss 8.5012 LearningRate 0.1484 Epoch: 9 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:38:38,533-Speed 5953.19 samples/sec Loss 8.6468 LearningRate 0.1483 Epoch: 9 Global Step: 93720 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:38:45,434-Speed 5937.16 samples/sec Loss 8.5930 LearningRate 0.1483 Epoch: 9 Global Step: 93730 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:38:52,333-Speed 5939.03 samples/sec Loss 8.5796 LearningRate 0.1483 
Epoch: 9 Global Step: 93740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:38:59,200-Speed 5969.38 samples/sec Loss 8.5654 LearningRate 0.1483 Epoch: 9 Global Step: 93750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:39:06,065-Speed 5967.44 samples/sec Loss 8.5725 LearningRate 0.1482 Epoch: 9 Global Step: 93760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:39:12,983-Speed 5921.70 samples/sec Loss 8.5157 LearningRate 0.1482 Epoch: 9 Global Step: 93770 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:39:19,834-Speed 5979.43 samples/sec Loss 8.5580 LearningRate 0.1482 Epoch: 9 Global Step: 93780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:39:26,682-Speed 5983.01 samples/sec Loss 8.5990 LearningRate 0.1482 Epoch: 9 Global Step: 93790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:39:33,545-Speed 5971.65 samples/sec Loss 8.5310 LearningRate 0.1481 Epoch: 9 Global Step: 93800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:39:40,419-Speed 5959.20 samples/sec Loss 8.6051 LearningRate 0.1481 Epoch: 9 Global Step: 93810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:39:47,286-Speed 5970.56 samples/sec Loss 8.5811 LearningRate 0.1481 Epoch: 9 Global Step: 93820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:39:54,144-Speed 5975.70 samples/sec Loss 8.5989 LearningRate 0.1481 Epoch: 9 Global Step: 93830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:40:01,006-Speed 5970.63 samples/sec Loss 8.5320 LearningRate 0.1480 Epoch: 9 Global Step: 93840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:40:07,858-Speed 5978.84 samples/sec Loss 8.5167 LearningRate 0.1480 Epoch: 9 Global Step: 93850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:40:14,715-Speed 5974.48 samples/sec Loss 8.4865 LearningRate 0.1480 Epoch: 9 Global Step: 93860 Fp16 Grad Scale: 
131072 Required: 22 hours Training: 2022-01-08 14:40:21,584-Speed 5964.02 samples/sec Loss 8.5588 LearningRate 0.1479 Epoch: 9 Global Step: 93870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:40:28,447-Speed 5969.65 samples/sec Loss 8.4700 LearningRate 0.1479 Epoch: 9 Global Step: 93880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:40:35,301-Speed 5977.50 samples/sec Loss 8.4780 LearningRate 0.1479 Epoch: 9 Global Step: 93890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:40:42,176-Speed 5958.77 samples/sec Loss 8.6121 LearningRate 0.1479 Epoch: 9 Global Step: 93900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:40:49,035-Speed 5973.71 samples/sec Loss 8.5202 LearningRate 0.1478 Epoch: 9 Global Step: 93910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:40:55,892-Speed 5974.37 samples/sec Loss 8.5411 LearningRate 0.1478 Epoch: 9 Global Step: 93920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:41:02,741-Speed 5981.87 samples/sec Loss 8.5682 LearningRate 0.1478 Epoch: 9 Global Step: 93930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:41:09,596-Speed 5975.43 samples/sec Loss 8.5380 LearningRate 0.1478 Epoch: 9 Global Step: 93940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:41:16,450-Speed 5977.05 samples/sec Loss 8.5079 LearningRate 0.1477 Epoch: 9 Global Step: 93950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:41:23,308-Speed 5973.81 samples/sec Loss 8.4924 LearningRate 0.1477 Epoch: 9 Global Step: 93960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:41:30,164-Speed 5975.69 samples/sec Loss 8.5613 LearningRate 0.1477 Epoch: 9 Global Step: 93970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:41:37,016-Speed 5978.94 samples/sec Loss 8.5533 LearningRate 0.1477 Epoch: 9 Global Step: 93980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 
2022-01-08 14:41:43,976-Speed 5886.62 samples/sec Loss 8.5687 LearningRate 0.1476 Epoch: 9 Global Step: 93990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:41:50,863-Speed 5949.93 samples/sec Loss 8.4304 LearningRate 0.1476 Epoch: 9 Global Step: 94000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:41:57,718-Speed 5975.84 samples/sec Loss 8.5611 LearningRate 0.1476 Epoch: 9 Global Step: 94010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:42:04,564-Speed 5983.48 samples/sec Loss 8.5431 LearningRate 0.1476 Epoch: 9 Global Step: 94020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:42:11,415-Speed 5979.75 samples/sec Loss 8.5164 LearningRate 0.1475 Epoch: 9 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:42:18,288-Speed 5960.62 samples/sec Loss 8.5568 LearningRate 0.1475 Epoch: 9 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:42:25,133-Speed 5985.11 samples/sec Loss 8.5038 LearningRate 0.1475 Epoch: 9 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:42:32,007-Speed 5959.57 samples/sec Loss 8.5626 LearningRate 0.1475 Epoch: 9 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:42:38,860-Speed 5978.31 samples/sec Loss 8.5771 LearningRate 0.1474 Epoch: 9 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:42:45,710-Speed 5980.49 samples/sec Loss 8.5879 LearningRate 0.1474 Epoch: 9 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:42:52,560-Speed 5980.03 samples/sec Loss 8.4936 LearningRate 0.1474 Epoch: 9 Global Step: 94090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:42:59,428-Speed 5967.55 samples/sec Loss 8.5467 LearningRate 0.1473 Epoch: 9 Global Step: 94100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:43:06,289-Speed 5971.19 samples/sec 
Loss 8.5786 LearningRate 0.1473 Epoch: 9 Global Step: 94110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:43:13,150-Speed 5971.28 samples/sec Loss 8.5228 LearningRate 0.1473 Epoch: 9 Global Step: 94120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:43:20,039-Speed 5947.05 samples/sec Loss 8.4929 LearningRate 0.1473 Epoch: 9 Global Step: 94130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:43:26,909-Speed 5967.08 samples/sec Loss 8.5082 LearningRate 0.1472 Epoch: 9 Global Step: 94140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:43:33,770-Speed 5970.99 samples/sec Loss 8.5144 LearningRate 0.1472 Epoch: 9 Global Step: 94150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:43:40,621-Speed 5979.51 samples/sec Loss 8.4777 LearningRate 0.1472 Epoch: 9 Global Step: 94160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:43:47,476-Speed 5976.15 samples/sec Loss 8.5247 LearningRate 0.1472 Epoch: 9 Global Step: 94170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:43:54,354-Speed 5957.17 samples/sec Loss 8.5446 LearningRate 0.1471 Epoch: 9 Global Step: 94180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:44:01,209-Speed 5975.95 samples/sec Loss 8.5224 LearningRate 0.1471 Epoch: 9 Global Step: 94190 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:44:08,062-Speed 5978.70 samples/sec Loss 8.5389 LearningRate 0.1471 Epoch: 9 Global Step: 94200 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:44:14,924-Speed 5971.04 samples/sec Loss 8.6043 LearningRate 0.1471 Epoch: 9 Global Step: 94210 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:44:21,803-Speed 5955.80 samples/sec Loss 8.5643 LearningRate 0.1470 Epoch: 9 Global Step: 94220 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:44:28,673-Speed 5963.95 samples/sec Loss 8.5091 LearningRate 0.1470 Epoch: 9 
Global Step: 94230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:44:35,531-Speed 5973.81 samples/sec Loss 8.5107 LearningRate 0.1470 Epoch: 9 Global Step: 94240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:44:42,386-Speed 5976.77 samples/sec Loss 8.5990 LearningRate 0.1470 Epoch: 9 Global Step: 94250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:44:49,263-Speed 5959.85 samples/sec Loss 8.4748 LearningRate 0.1469 Epoch: 9 Global Step: 94260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:44:56,125-Speed 5970.44 samples/sec Loss 8.5106 LearningRate 0.1469 Epoch: 9 Global Step: 94270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:45:03,006-Speed 5953.45 samples/sec Loss 8.5284 LearningRate 0.1469 Epoch: 9 Global Step: 94280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:45:09,861-Speed 5977.15 samples/sec Loss 8.5269 LearningRate 0.1469 Epoch: 9 Global Step: 94290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:45:16,721-Speed 5973.53 samples/sec Loss 8.5217 LearningRate 0.1468 Epoch: 9 Global Step: 94300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:45:23,555-Speed 5994.38 samples/sec Loss 8.5393 LearningRate 0.1468 Epoch: 9 Global Step: 94310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:45:30,408-Speed 5978.21 samples/sec Loss 8.6126 LearningRate 0.1468 Epoch: 9 Global Step: 94320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:45:37,258-Speed 5980.79 samples/sec Loss 8.5081 LearningRate 0.1468 Epoch: 9 Global Step: 94330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:45:44,117-Speed 5972.91 samples/sec Loss 8.4557 LearningRate 0.1467 Epoch: 9 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:45:51,036-Speed 5921.52 samples/sec Loss 8.5024 LearningRate 0.1467 Epoch: 9 Global Step: 94350 Fp16 Grad Scale: 65536 
Required: 22 hours Training: 2022-01-08 14:45:57,898-Speed 5970.22 samples/sec Loss 8.5188 LearningRate 0.1467 Epoch: 9 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:46:04,751-Speed 5979.48 samples/sec Loss 8.5121 LearningRate 0.1466 Epoch: 9 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:46:11,631-Speed 5954.51 samples/sec Loss 8.4240 LearningRate 0.1466 Epoch: 9 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:46:18,483-Speed 5979.68 samples/sec Loss 8.4269 LearningRate 0.1466 Epoch: 9 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:46:25,344-Speed 5971.07 samples/sec Loss 8.5239 LearningRate 0.1466 Epoch: 9 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:46:32,203-Speed 5972.90 samples/sec Loss 8.5564 LearningRate 0.1465 Epoch: 9 Global Step: 94410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:46:39,065-Speed 5970.79 samples/sec Loss 8.4707 LearningRate 0.1465 Epoch: 9 Global Step: 94420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:46:45,930-Speed 5969.58 samples/sec Loss 8.5504 LearningRate 0.1465 Epoch: 9 Global Step: 94430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:46:52,783-Speed 5977.95 samples/sec Loss 8.5255 LearningRate 0.1465 Epoch: 9 Global Step: 94440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:46:59,645-Speed 5970.21 samples/sec Loss 8.4512 LearningRate 0.1464 Epoch: 9 Global Step: 94450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:47:06,505-Speed 5972.34 samples/sec Loss 8.5675 LearningRate 0.1464 Epoch: 9 Global Step: 94460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:47:13,383-Speed 5958.13 samples/sec Loss 8.4981 LearningRate 0.1464 Epoch: 9 Global Step: 94470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 
14:47:20,233-Speed 5980.63 samples/sec Loss 8.5204 LearningRate 0.1464 Epoch: 9 Global Step: 94480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:47:27,104-Speed 5962.53 samples/sec Loss 8.5080 LearningRate 0.1463 Epoch: 9 Global Step: 94490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:47:33,947-Speed 5987.00 samples/sec Loss 8.5033 LearningRate 0.1463 Epoch: 9 Global Step: 94500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:47:40,792-Speed 5984.75 samples/sec Loss 8.4324 LearningRate 0.1463 Epoch: 9 Global Step: 94510 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:47:47,655-Speed 5972.30 samples/sec Loss 8.4608 LearningRate 0.1463 Epoch: 9 Global Step: 94520 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:47:54,525-Speed 5963.66 samples/sec Loss 8.4531 LearningRate 0.1462 Epoch: 9 Global Step: 94530 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:48:01,377-Speed 5978.40 samples/sec Loss 8.4875 LearningRate 0.1462 Epoch: 9 Global Step: 94540 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:48:08,263-Speed 5949.76 samples/sec Loss 8.5204 LearningRate 0.1462 Epoch: 9 Global Step: 94550 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:48:15,161-Speed 5939.41 samples/sec Loss 8.5140 LearningRate 0.1462 Epoch: 9 Global Step: 94560 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:48:22,035-Speed 5960.24 samples/sec Loss 8.4840 LearningRate 0.1461 Epoch: 9 Global Step: 94570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:48:28,903-Speed 5964.88 samples/sec Loss 8.5002 LearningRate 0.1461 Epoch: 9 Global Step: 94580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:48:35,767-Speed 5969.24 samples/sec Loss 8.5241 LearningRate 0.1461 Epoch: 9 Global Step: 94590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:48:42,616-Speed 5981.75 samples/sec Loss 
8.4691 LearningRate 0.1461 Epoch: 9 Global Step: 94600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:48:49,477-Speed 5970.79 samples/sec Loss 8.4837 LearningRate 0.1460 Epoch: 9 Global Step: 94610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:48:56,330-Speed 5978.49 samples/sec Loss 8.4813 LearningRate 0.1460 Epoch: 9 Global Step: 94620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:49:03,205-Speed 5958.41 samples/sec Loss 8.4594 LearningRate 0.1460 Epoch: 9 Global Step: 94630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:49:10,078-Speed 5961.20 samples/sec Loss 8.5356 LearningRate 0.1459 Epoch: 9 Global Step: 94640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:49:16,946-Speed 5964.87 samples/sec Loss 8.5254 LearningRate 0.1459 Epoch: 9 Global Step: 94650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:49:23,790-Speed 5985.26 samples/sec Loss 8.4482 LearningRate 0.1459 Epoch: 9 Global Step: 94660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:49:30,659-Speed 5964.36 samples/sec Loss 8.5002 LearningRate 0.1459 Epoch: 9 Global Step: 94670 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:49:37,514-Speed 5975.74 samples/sec Loss 8.4142 LearningRate 0.1458 Epoch: 9 Global Step: 94680 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:49:44,348-Speed 5997.67 samples/sec Loss 8.4980 LearningRate 0.1458 Epoch: 9 Global Step: 94690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:49:51,199-Speed 5979.69 samples/sec Loss 8.5033 LearningRate 0.1458 Epoch: 9 Global Step: 94700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:49:58,057-Speed 5973.87 samples/sec Loss 8.5505 LearningRate 0.1458 Epoch: 9 Global Step: 94710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:50:04,921-Speed 5968.01 samples/sec Loss 8.4542 LearningRate 0.1457 Epoch: 9 Global 
Step: 94720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:50:11,774-Speed 5978.26 samples/sec Loss 8.4480 LearningRate 0.1457 Epoch: 9 Global Step: 94730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:50:18,619-Speed 5985.00 samples/sec Loss 8.5048 LearningRate 0.1457 Epoch: 9 Global Step: 94740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:50:25,494-Speed 5958.44 samples/sec Loss 8.4821 LearningRate 0.1457 Epoch: 9 Global Step: 94750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:50:32,362-Speed 5965.51 samples/sec Loss 8.5401 LearningRate 0.1456 Epoch: 9 Global Step: 94760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:50:39,204-Speed 5987.18 samples/sec Loss 8.5126 LearningRate 0.1456 Epoch: 9 Global Step: 94770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:50:46,047-Speed 5986.56 samples/sec Loss 8.4650 LearningRate 0.1456 Epoch: 9 Global Step: 94780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:50:52,966-Speed 5920.55 samples/sec Loss 8.4403 LearningRate 0.1456 Epoch: 9 Global Step: 94790 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:50:59,853-Speed 5948.61 samples/sec Loss 8.5058 LearningRate 0.1455 Epoch: 9 Global Step: 94800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:51:06,702-Speed 5981.73 samples/sec Loss 8.5397 LearningRate 0.1455 Epoch: 9 Global Step: 94810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:51:13,562-Speed 5971.91 samples/sec Loss 8.5284 LearningRate 0.1455 Epoch: 9 Global Step: 94820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:51:20,423-Speed 5971.30 samples/sec Loss 8.4941 LearningRate 0.1455 Epoch: 9 Global Step: 94830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:51:27,275-Speed 5979.49 samples/sec Loss 8.4885 LearningRate 0.1454 Epoch: 9 Global Step: 94840 Fp16 Grad Scale: 131072 
Required: 22 hours Training: 2022-01-08 14:51:34,153-Speed 5957.62 samples/sec Loss 8.5219 LearningRate 0.1454 Epoch: 9 Global Step: 94850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:51:41,011-Speed 5975.31 samples/sec Loss 8.4710 LearningRate 0.1454 Epoch: 9 Global Step: 94860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:51:47,865-Speed 5977.95 samples/sec Loss 8.4645 LearningRate 0.1454 Epoch: 9 Global Step: 94870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:51:54,717-Speed 5981.60 samples/sec Loss 8.5146 LearningRate 0.1453 Epoch: 9 Global Step: 94880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:52:01,577-Speed 5972.64 samples/sec Loss 8.4724 LearningRate 0.1453 Epoch: 9 Global Step: 94890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:52:08,423-Speed 5983.93 samples/sec Loss 8.5015 LearningRate 0.1453 Epoch: 9 Global Step: 94900 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:52:15,274-Speed 5979.30 samples/sec Loss 8.4068 LearningRate 0.1452 Epoch: 9 Global Step: 94910 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:52:22,136-Speed 5970.65 samples/sec Loss 8.4158 LearningRate 0.1452 Epoch: 9 Global Step: 94920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:52:28,996-Speed 5975.29 samples/sec Loss 8.5034 LearningRate 0.1452 Epoch: 9 Global Step: 94930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:52:35,952-Speed 5889.73 samples/sec Loss 8.4773 LearningRate 0.1452 Epoch: 9 Global Step: 94940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:52:42,825-Speed 5960.64 samples/sec Loss 8.4502 LearningRate 0.1451 Epoch: 9 Global Step: 94950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:52:49,696-Speed 5962.71 samples/sec Loss 8.4352 LearningRate 0.1451 Epoch: 9 Global Step: 94960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 
14:52:56,569-Speed 5961.30 samples/sec Loss 8.4770 LearningRate 0.1451 Epoch: 9 Global Step: 94970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:53:03,448-Speed 5955.65 samples/sec Loss 8.4380 LearningRate 0.1451 Epoch: 9 Global Step: 94980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:53:10,314-Speed 5966.88 samples/sec Loss 8.4307 LearningRate 0.1450 Epoch: 9 Global Step: 94990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:53:17,172-Speed 5973.67 samples/sec Loss 8.4288 LearningRate 0.1450 Epoch: 9 Global Step: 95000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:53:43,723-[lfw][95000]XNorm: 22.545525 Training: 2022-01-08 14:53:43,724-[lfw][95000]Accuracy-Flip: 0.99700+-0.00267 Training: 2022-01-08 14:53:43,725-[lfw][95000]Accuracy-Highest: 0.99750 Training: 2022-01-08 14:54:14,771-[cfp_fp][95000]XNorm: 19.472820 Training: 2022-01-08 14:54:14,772-[cfp_fp][95000]Accuracy-Flip: 0.97957+-0.00808 Training: 2022-01-08 14:54:14,773-[cfp_fp][95000]Accuracy-Highest: 0.98114 Training: 2022-01-08 14:54:41,583-[agedb_30][95000]XNorm: 21.829587 Training: 2022-01-08 14:54:41,583-[agedb_30][95000]Accuracy-Flip: 0.97150+-0.00762 Training: 2022-01-08 14:54:41,584-[agedb_30][95000]Accuracy-Highest: 0.97150 Training: 2022-01-08 14:54:48,440-Speed 448.79 samples/sec Loss 8.4634 LearningRate 0.1450 Epoch: 9 Global Step: 95010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:54:55,300-Speed 5978.92 samples/sec Loss 8.4394 LearningRate 0.1450 Epoch: 9 Global Step: 95020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:02,157-Speed 5988.70 samples/sec Loss 8.4882 LearningRate 0.1449 Epoch: 9 Global Step: 95030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:09,015-Speed 5974.23 samples/sec Loss 8.4639 LearningRate 0.1449 Epoch: 9 Global Step: 95040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:15,858-Speed 5986.55 
samples/sec Loss 8.4449 LearningRate 0.1449 Epoch: 9 Global Step: 95050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:22,713-Speed 5975.90 samples/sec Loss 8.3620 LearningRate 0.1449 Epoch: 9 Global Step: 95060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:29,587-Speed 5962.62 samples/sec Loss 8.3835 LearningRate 0.1448 Epoch: 9 Global Step: 95070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:36,563-Speed 5873.04 samples/sec Loss 8.4737 LearningRate 0.1448 Epoch: 9 Global Step: 95080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:43,446-Speed 5951.63 samples/sec Loss 8.5341 LearningRate 0.1448 Epoch: 9 Global Step: 95090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:50,326-Speed 5959.89 samples/sec Loss 8.4734 LearningRate 0.1448 Epoch: 9 Global Step: 95100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:55:57,188-Speed 5972.18 samples/sec Loss 8.4913 LearningRate 0.1447 Epoch: 9 Global Step: 95110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:56:04,066-Speed 5956.68 samples/sec Loss 8.4386 LearningRate 0.1447 Epoch: 9 Global Step: 95120 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 14:56:10,911-Speed 5984.81 samples/sec Loss 8.4819 LearningRate 0.1447 Epoch: 9 Global Step: 95130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:56:17,772-Speed 5971.36 samples/sec Loss 8.4183 LearningRate 0.1447 Epoch: 9 Global Step: 95140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:56:24,620-Speed 5981.76 samples/sec Loss 8.4623 LearningRate 0.1446 Epoch: 9 Global Step: 95150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:56:31,475-Speed 5976.69 samples/sec Loss 8.4561 LearningRate 0.1446 Epoch: 9 Global Step: 95160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:56:38,337-Speed 5970.38 samples/sec Loss 8.4387 LearningRate 0.1446 
Epoch: 9 Global Step: 95170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:56:45,207-Speed 5962.79 samples/sec Loss 8.4338 LearningRate 0.1446 Epoch: 9 Global Step: 95180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:56:52,054-Speed 5983.23 samples/sec Loss 8.3854 LearningRate 0.1445 Epoch: 9 Global Step: 95190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:56:58,916-Speed 5970.59 samples/sec Loss 8.4721 LearningRate 0.1445 Epoch: 9 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:57:05,784-Speed 5964.46 samples/sec Loss 8.4417 LearningRate 0.1445 Epoch: 9 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:57:12,644-Speed 5972.06 samples/sec Loss 8.4605 LearningRate 0.1444 Epoch: 9 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:57:19,506-Speed 5970.24 samples/sec Loss 8.4064 LearningRate 0.1444 Epoch: 9 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:57:26,356-Speed 5983.30 samples/sec Loss 8.4413 LearningRate 0.1444 Epoch: 9 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:57:33,218-Speed 5970.21 samples/sec Loss 8.5052 LearningRate 0.1444 Epoch: 9 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:57:40,052-Speed 5994.24 samples/sec Loss 8.4274 LearningRate 0.1443 Epoch: 9 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:57:46,970-Speed 5922.27 samples/sec Loss 8.4579 LearningRate 0.1443 Epoch: 9 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:57:53,922-Speed 5893.32 samples/sec Loss 8.4356 LearningRate 0.1443 Epoch: 9 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:58:00,836-Speed 5925.12 samples/sec Loss 8.3978 LearningRate 0.1443 Epoch: 9 Global Step: 95290 Fp16 Grad Scale: 65536 
Required: 22 hours Training: 2022-01-08 14:58:07,722-Speed 5950.36 samples/sec Loss 8.4149 LearningRate 0.1442 Epoch: 9 Global Step: 95300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:58:14,591-Speed 5963.37 samples/sec Loss 8.3628 LearningRate 0.1442 Epoch: 9 Global Step: 95310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:58:21,433-Speed 5987.87 samples/sec Loss 8.4107 LearningRate 0.1442 Epoch: 9 Global Step: 95320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:58:28,278-Speed 5984.98 samples/sec Loss 8.4953 LearningRate 0.1442 Epoch: 9 Global Step: 95330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:58:35,124-Speed 5983.69 samples/sec Loss 8.4198 LearningRate 0.1441 Epoch: 9 Global Step: 95340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:58:41,976-Speed 5979.18 samples/sec Loss 8.5021 LearningRate 0.1441 Epoch: 9 Global Step: 95350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:58:48,828-Speed 5978.34 samples/sec Loss 8.4859 LearningRate 0.1441 Epoch: 9 Global Step: 95360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:58:55,677-Speed 5981.73 samples/sec Loss 8.4718 LearningRate 0.1441 Epoch: 9 Global Step: 95370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 14:59:02,527-Speed 5980.05 samples/sec Loss 8.4545 LearningRate 0.1440 Epoch: 9 Global Step: 95380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:59:09,382-Speed 5975.21 samples/sec Loss 8.5070 LearningRate 0.1440 Epoch: 9 Global Step: 95390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:59:16,238-Speed 5976.20 samples/sec Loss 8.4452 LearningRate 0.1440 Epoch: 9 Global Step: 95400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:59:23,207-Speed 5878.20 samples/sec Loss 8.4474 LearningRate 0.1440 Epoch: 9 Global Step: 95410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 
14:59:30,089-Speed 5953.27 samples/sec Loss 8.3916 LearningRate 0.1439 Epoch: 9 Global Step: 95420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:59:36,942-Speed 5977.53 samples/sec Loss 8.4204 LearningRate 0.1439 Epoch: 9 Global Step: 95430 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:59:43,808-Speed 5966.44 samples/sec Loss 8.4021 LearningRate 0.1439 Epoch: 9 Global Step: 95440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:59:50,652-Speed 5986.66 samples/sec Loss 8.4189 LearningRate 0.1439 Epoch: 9 Global Step: 95450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 14:59:57,494-Speed 5987.28 samples/sec Loss 8.4680 LearningRate 0.1438 Epoch: 9 Global Step: 95460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:00:04,346-Speed 5979.06 samples/sec Loss 8.4123 LearningRate 0.1438 Epoch: 9 Global Step: 95470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:00:11,190-Speed 5985.89 samples/sec Loss 8.3728 LearningRate 0.1438 Epoch: 9 Global Step: 95480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:00:18,050-Speed 5971.51 samples/sec Loss 8.4114 LearningRate 0.1438 Epoch: 9 Global Step: 95490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:00:24,900-Speed 5981.32 samples/sec Loss 8.4818 LearningRate 0.1437 Epoch: 9 Global Step: 95500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:00:31,745-Speed 5984.50 samples/sec Loss 8.4626 LearningRate 0.1437 Epoch: 9 Global Step: 95510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:00:38,619-Speed 5959.50 samples/sec Loss 8.4089 LearningRate 0.1437 Epoch: 9 Global Step: 95520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:00:45,493-Speed 5959.98 samples/sec Loss 8.3127 LearningRate 0.1437 Epoch: 9 Global Step: 95530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:00:52,380-Speed 5948.36 samples/sec Loss 
8.4802 LearningRate 0.1436 Epoch: 9 Global Step: 95540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:00:59,243-Speed 5971.58 samples/sec Loss 8.4205 LearningRate 0.1436 Epoch: 9 Global Step: 95550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:01:06,105-Speed 5970.32 samples/sec Loss 8.3881 LearningRate 0.1436 Epoch: 9 Global Step: 95560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:01:12,984-Speed 5955.40 samples/sec Loss 8.4143 LearningRate 0.1435 Epoch: 9 Global Step: 95570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:01:19,841-Speed 5974.59 samples/sec Loss 8.5028 LearningRate 0.1435 Epoch: 9 Global Step: 95580 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:01:26,682-Speed 5988.70 samples/sec Loss 8.4154 LearningRate 0.1435 Epoch: 9 Global Step: 95590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:01:33,534-Speed 5979.03 samples/sec Loss 8.3527 LearningRate 0.1435 Epoch: 9 Global Step: 95600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:01:40,389-Speed 5976.03 samples/sec Loss 8.4355 LearningRate 0.1434 Epoch: 9 Global Step: 95610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:01:47,247-Speed 5973.60 samples/sec Loss 8.4150 LearningRate 0.1434 Epoch: 9 Global Step: 95620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:01:54,110-Speed 5969.47 samples/sec Loss 8.3903 LearningRate 0.1434 Epoch: 9 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:02:00,957-Speed 5983.68 samples/sec Loss 8.3490 LearningRate 0.1434 Epoch: 9 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:02:07,842-Speed 5950.26 samples/sec Loss 8.3753 LearningRate 0.1433 Epoch: 9 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:02:14,719-Speed 5956.69 samples/sec Loss 8.3956 LearningRate 0.1433 Epoch: 9 Global Step: 
95660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:02:21,593-Speed 5961.00 samples/sec Loss 8.4381 LearningRate 0.1433 Epoch: 9 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:02:28,463-Speed 5963.84 samples/sec Loss 8.4177 LearningRate 0.1433 Epoch: 9 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:02:35,317-Speed 5977.13 samples/sec Loss 8.4060 LearningRate 0.1432 Epoch: 9 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:02:42,165-Speed 5984.15 samples/sec Loss 8.4420 LearningRate 0.1432 Epoch: 9 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:02:49,024-Speed 5972.74 samples/sec Loss 8.4102 LearningRate 0.1432 Epoch: 9 Global Step: 95710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:02:55,883-Speed 5973.25 samples/sec Loss 8.4111 LearningRate 0.1432 Epoch: 9 Global Step: 95720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:03:02,746-Speed 5968.79 samples/sec Loss 8.4903 LearningRate 0.1431 Epoch: 9 Global Step: 95730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:03:09,620-Speed 5960.47 samples/sec Loss 8.5000 LearningRate 0.1431 Epoch: 9 Global Step: 95740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:03:16,487-Speed 5965.39 samples/sec Loss 8.4118 LearningRate 0.1431 Epoch: 9 Global Step: 95750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:03:23,356-Speed 5964.76 samples/sec Loss 8.4275 LearningRate 0.1431 Epoch: 9 Global Step: 95760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:03:30,238-Speed 5952.65 samples/sec Loss 8.3920 LearningRate 0.1430 Epoch: 9 Global Step: 95770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:03:37,101-Speed 5969.65 samples/sec Loss 8.3880 LearningRate 0.1430 Epoch: 9 Global Step: 95780 Fp16 Grad Scale: 131072 Required: 22 hours 
Training: 2022-01-08 15:03:43,967-Speed 5966.34 samples/sec Loss 8.4160 LearningRate 0.1430 Epoch: 9 Global Step: 95790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:03:50,825-Speed 5974.13 samples/sec Loss 8.4547 LearningRate 0.1430 Epoch: 9 Global Step: 95800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:03:57,680-Speed 5976.32 samples/sec Loss 8.3791 LearningRate 0.1429 Epoch: 9 Global Step: 95810 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:04:04,543-Speed 5968.87 samples/sec Loss 8.4244 LearningRate 0.1429 Epoch: 9 Global Step: 95820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:04:11,390-Speed 5983.84 samples/sec Loss 8.3792 LearningRate 0.1429 Epoch: 9 Global Step: 95830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:04:18,248-Speed 5974.27 samples/sec Loss 8.4533 LearningRate 0.1429 Epoch: 9 Global Step: 95840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:04:25,088-Speed 5989.38 samples/sec Loss 8.4107 LearningRate 0.1428 Epoch: 9 Global Step: 95850 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:04:31,974-Speed 5948.92 samples/sec Loss 8.3669 LearningRate 0.1428 Epoch: 9 Global Step: 95860 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:04:38,844-Speed 5963.18 samples/sec Loss 8.3996 LearningRate 0.1428 Epoch: 9 Global Step: 95870 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:04:45,708-Speed 5968.55 samples/sec Loss 8.4004 LearningRate 0.1428 Epoch: 9 Global Step: 95880 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:04:52,557-Speed 5981.77 samples/sec Loss 8.4378 LearningRate 0.1427 Epoch: 9 Global Step: 95890 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:04:59,432-Speed 5958.50 samples/sec Loss 8.3625 LearningRate 0.1427 Epoch: 9 Global Step: 95900 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:05:06,284-Speed 5979.04 
samples/sec Loss 8.3206 LearningRate 0.1427 Epoch: 9 Global Step: 95910 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:05:13,136-Speed 5978.40 samples/sec Loss 8.3369 LearningRate 0.1427 Epoch: 9 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:05:19,993-Speed 5975.69 samples/sec Loss 8.4505 LearningRate 0.1426 Epoch: 9 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:05:26,846-Speed 5977.73 samples/sec Loss 8.3538 LearningRate 0.1426 Epoch: 9 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:05:33,707-Speed 5970.76 samples/sec Loss 8.4741 LearningRate 0.1426 Epoch: 9 Global Step: 95950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:05:40,566-Speed 5973.26 samples/sec Loss 8.3989 LearningRate 0.1426 Epoch: 9 Global Step: 95960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:05:47,485-Speed 5920.66 samples/sec Loss 8.4722 LearningRate 0.1425 Epoch: 9 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:05:54,424-Speed 5904.16 samples/sec Loss 8.3912 LearningRate 0.1425 Epoch: 9 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:01,340-Speed 5923.15 samples/sec Loss 8.3972 LearningRate 0.1425 Epoch: 9 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:08,199-Speed 5973.16 samples/sec Loss 8.4312 LearningRate 0.1424 Epoch: 9 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:15,057-Speed 5973.41 samples/sec Loss 8.3672 LearningRate 0.1424 Epoch: 9 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:21,923-Speed 5966.89 samples/sec Loss 8.3665 LearningRate 0.1424 Epoch: 9 Global Step: 96020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:28,788-Speed 5968.84 samples/sec Loss 8.4062 LearningRate 0.1424 Epoch: 9 
Global Step: 96030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:35,708-Speed 5920.32 samples/sec Loss 8.4285 LearningRate 0.1423 Epoch: 9 Global Step: 96040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:42,600-Speed 5943.89 samples/sec Loss 8.4188 LearningRate 0.1423 Epoch: 9 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:49,477-Speed 5958.12 samples/sec Loss 8.3967 LearningRate 0.1423 Epoch: 9 Global Step: 96060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:06:56,327-Speed 5980.28 samples/sec Loss 8.3804 LearningRate 0.1423 Epoch: 9 Global Step: 96070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:07:03,181-Speed 5977.71 samples/sec Loss 8.3786 LearningRate 0.1422 Epoch: 9 Global Step: 96080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:07:10,028-Speed 5982.41 samples/sec Loss 8.3455 LearningRate 0.1422 Epoch: 9 Global Step: 96090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:07:16,876-Speed 5982.97 samples/sec Loss 8.4753 LearningRate 0.1422 Epoch: 9 Global Step: 96100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:07:23,718-Speed 5987.42 samples/sec Loss 8.4000 LearningRate 0.1422 Epoch: 9 Global Step: 96110 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:07:30,563-Speed 5984.51 samples/sec Loss 8.3709 LearningRate 0.1421 Epoch: 9 Global Step: 96120 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:07:37,422-Speed 5973.33 samples/sec Loss 8.3593 LearningRate 0.1421 Epoch: 9 Global Step: 96130 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:07:44,301-Speed 5954.77 samples/sec Loss 8.4109 LearningRate 0.1421 Epoch: 9 Global Step: 96140 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:07:51,169-Speed 5965.35 samples/sec Loss 8.3501 LearningRate 0.1421 Epoch: 9 Global Step: 96150 Fp16 Grad Scale: 32768 Required: 
22 hours Training: 2022-01-08 15:07:58,032-Speed 5968.64 samples/sec Loss 8.4638 LearningRate 0.1420 Epoch: 9 Global Step: 96160 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:08:04,896-Speed 5968.84 samples/sec Loss 8.3681 LearningRate 0.1420 Epoch: 9 Global Step: 96170 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:08:11,760-Speed 5969.42 samples/sec Loss 8.3789 LearningRate 0.1420 Epoch: 9 Global Step: 96180 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:08:18,615-Speed 5976.75 samples/sec Loss 8.3841 LearningRate 0.1420 Epoch: 9 Global Step: 96190 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:08:25,495-Speed 5953.93 samples/sec Loss 8.3360 LearningRate 0.1419 Epoch: 9 Global Step: 96200 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:08:32,350-Speed 5976.91 samples/sec Loss 8.3489 LearningRate 0.1419 Epoch: 9 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:08:39,217-Speed 5966.38 samples/sec Loss 8.3010 LearningRate 0.1419 Epoch: 9 Global Step: 96220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:08:46,082-Speed 5967.61 samples/sec Loss 8.3657 LearningRate 0.1419 Epoch: 9 Global Step: 96230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:08:52,937-Speed 5975.77 samples/sec Loss 8.4454 LearningRate 0.1418 Epoch: 9 Global Step: 96240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:08:59,861-Speed 5917.48 samples/sec Loss 8.3511 LearningRate 0.1418 Epoch: 9 Global Step: 96250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:09:06,745-Speed 5950.94 samples/sec Loss 8.3929 LearningRate 0.1418 Epoch: 9 Global Step: 96260 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:09:13,616-Speed 5962.41 samples/sec Loss 8.3657 LearningRate 0.1418 Epoch: 9 Global Step: 96270 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:09:20,468-Speed 
5979.67 samples/sec Loss 8.4458 LearningRate 0.1417 Epoch: 9 Global Step: 96280 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:09:27,335-Speed 5965.39 samples/sec Loss 8.3895 LearningRate 0.1417 Epoch: 9 Global Step: 96290 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:09:34,194-Speed 5973.07 samples/sec Loss 8.3733 LearningRate 0.1417 Epoch: 9 Global Step: 96300 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:09:41,087-Speed 5943.84 samples/sec Loss 8.4680 LearningRate 0.1417 Epoch: 9 Global Step: 96310 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:09:47,961-Speed 5959.43 samples/sec Loss 8.4305 LearningRate 0.1416 Epoch: 9 Global Step: 96320 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:09:54,854-Speed 5943.81 samples/sec Loss 8.4254 LearningRate 0.1416 Epoch: 9 Global Step: 96330 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:10:01,707-Speed 5978.38 samples/sec Loss 8.3950 LearningRate 0.1416 Epoch: 9 Global Step: 96340 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:10:08,580-Speed 5960.87 samples/sec Loss 8.3683 LearningRate 0.1416 Epoch: 9 Global Step: 96350 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:10:15,450-Speed 5963.64 samples/sec Loss 8.3843 LearningRate 0.1415 Epoch: 9 Global Step: 96360 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:10:22,301-Speed 5982.30 samples/sec Loss 8.3933 LearningRate 0.1415 Epoch: 9 Global Step: 96370 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 15:10:29,173-Speed 5960.76 samples/sec Loss 8.3623 LearningRate 0.1415 Epoch: 9 Global Step: 96380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:10:36,023-Speed 5985.38 samples/sec Loss 8.3621 LearningRate 0.1415 Epoch: 9 Global Step: 96390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:10:42,880-Speed 5973.61 samples/sec Loss 8.4056 LearningRate 0.1414 
Epoch: 9 Global Step: 96400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:10:49,731-Speed 5979.78 samples/sec Loss 8.3184 LearningRate 0.1414 Epoch: 9 Global Step: 96410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:10:56,581-Speed 5980.18 samples/sec Loss 8.3489 LearningRate 0.1414 Epoch: 9 Global Step: 96420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:11:03,429-Speed 5982.65 samples/sec Loss 8.3755 LearningRate 0.1414 Epoch: 9 Global Step: 96430 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:11:10,284-Speed 5977.31 samples/sec Loss 8.3948 LearningRate 0.1413 Epoch: 9 Global Step: 96440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:11:17,167-Speed 5952.14 samples/sec Loss 8.3987 LearningRate 0.1413 Epoch: 9 Global Step: 96450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:11:24,023-Speed 5975.18 samples/sec Loss 8.3968 LearningRate 0.1413 Epoch: 9 Global Step: 96460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:11:30,903-Speed 5954.99 samples/sec Loss 8.4277 LearningRate 0.1412 Epoch: 9 Global Step: 96470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 15:11:37,781-Speed 5956.76 samples/sec Loss 8.3924 LearningRate 0.1412 Epoch: 9 Global Step: 96480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:11:44,632-Speed 5979.62 samples/sec Loss 8.4070 LearningRate 0.1412 Epoch: 9 Global Step: 96490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:11:51,499-Speed 5966.20 samples/sec Loss 8.3515 LearningRate 0.1412 Epoch: 9 Global Step: 96500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:11:58,358-Speed 5980.54 samples/sec Loss 8.3485 LearningRate 0.1411 Epoch: 9 Global Step: 96510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:12:05,216-Speed 5973.57 samples/sec Loss 8.3403 LearningRate 0.1411 Epoch: 9 Global Step: 96520 Fp16 Grad Scale: 131072 
Required: 22 hours Training: 2022-01-08 15:12:12,090-Speed 5959.81 samples/sec Loss 8.3820 LearningRate 0.1411 Epoch: 9 Global Step: 96530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:12:18,954-Speed 5968.18 samples/sec Loss 8.3230 LearningRate 0.1411 Epoch: 9 Global Step: 96540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:12:25,834-Speed 5955.09 samples/sec Loss 8.2899 LearningRate 0.1410 Epoch: 9 Global Step: 96550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:12:32,689-Speed 5976.28 samples/sec Loss 8.3845 LearningRate 0.1410 Epoch: 9 Global Step: 96560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:12:39,560-Speed 5962.86 samples/sec Loss 8.3024 LearningRate 0.1410 Epoch: 9 Global Step: 96570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:12:46,426-Speed 5966.46 samples/sec Loss 8.3523 LearningRate 0.1410 Epoch: 9 Global Step: 96580 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:12:53,280-Speed 5977.59 samples/sec Loss 8.3950 LearningRate 0.1409 Epoch: 9 Global Step: 96590 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:13:00,140-Speed 5973.39 samples/sec Loss 8.3815 LearningRate 0.1409 Epoch: 9 Global Step: 96600 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:13:06,992-Speed 5979.60 samples/sec Loss 8.3813 LearningRate 0.1409 Epoch: 9 Global Step: 96610 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:13:13,877-Speed 5959.61 samples/sec Loss 8.3583 LearningRate 0.1409 Epoch: 9 Global Step: 96620 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:13:20,744-Speed 5965.39 samples/sec Loss 8.3729 LearningRate 0.1408 Epoch: 9 Global Step: 96630 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:13:27,597-Speed 5977.91 samples/sec Loss 8.3514 LearningRate 0.1408 Epoch: 9 Global Step: 96640 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 
15:13:34,463-Speed 5966.45 samples/sec Loss 8.3170 LearningRate 0.1408 Epoch: 9 Global Step: 96650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:13:41,330-Speed 5965.41 samples/sec Loss 8.4102 LearningRate 0.1408 Epoch: 9 Global Step: 96660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:13:48,197-Speed 5966.27 samples/sec Loss 8.3227 LearningRate 0.1407 Epoch: 9 Global Step: 96670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:13:55,065-Speed 5965.22 samples/sec Loss 8.3194 LearningRate 0.1407 Epoch: 9 Global Step: 96680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:14:01,924-Speed 5972.49 samples/sec Loss 8.3454 LearningRate 0.1407 Epoch: 9 Global Step: 96690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:14:08,794-Speed 5966.26 samples/sec Loss 8.3308 LearningRate 0.1407 Epoch: 9 Global Step: 96700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:14:15,654-Speed 5972.74 samples/sec Loss 8.3379 LearningRate 0.1406 Epoch: 9 Global Step: 96710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:14:22,510-Speed 5974.86 samples/sec Loss 8.3213 LearningRate 0.1406 Epoch: 9 Global Step: 96720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:14:29,401-Speed 5945.22 samples/sec Loss 8.3437 LearningRate 0.1406 Epoch: 9 Global Step: 96730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:14:36,260-Speed 5973.24 samples/sec Loss 8.3192 LearningRate 0.1406 Epoch: 9 Global Step: 96740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:14:43,151-Speed 5945.23 samples/sec Loss 8.3132 LearningRate 0.1405 Epoch: 9 Global Step: 96750 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:14:50,014-Speed 5969.90 samples/sec Loss 8.3918 LearningRate 0.1405 Epoch: 9 Global Step: 96760 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:14:56,879-Speed 5967.01 samples/sec Loss 
8.2619 LearningRate 0.1405 Epoch: 9 Global Step: 96770 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:15:03,732-Speed 5978.13 samples/sec Loss 8.3256 LearningRate 0.1405 Epoch: 9 Global Step: 96780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:15:10,584-Speed 5978.72 samples/sec Loss 8.3987 LearningRate 0.1404 Epoch: 9 Global Step: 96790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:15:17,447-Speed 5970.66 samples/sec Loss 8.3794 LearningRate 0.1404 Epoch: 9 Global Step: 96800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:15:24,302-Speed 5975.36 samples/sec Loss 8.3483 LearningRate 0.1404 Epoch: 9 Global Step: 96810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:15:31,149-Speed 5983.73 samples/sec Loss 8.2937 LearningRate 0.1404 Epoch: 9 Global Step: 96820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:15:38,005-Speed 5975.25 samples/sec Loss 8.3325 LearningRate 0.1403 Epoch: 9 Global Step: 96830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:15:44,877-Speed 5964.17 samples/sec Loss 8.3130 LearningRate 0.1403 Epoch: 9 Global Step: 96840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:15:51,753-Speed 5958.26 samples/sec Loss 8.3472 LearningRate 0.1403 Epoch: 9 Global Step: 96850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:15:58,611-Speed 5972.77 samples/sec Loss 8.3188 LearningRate 0.1403 Epoch: 9 Global Step: 96860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:16:05,464-Speed 5981.45 samples/sec Loss 8.3073 LearningRate 0.1402 Epoch: 9 Global Step: 96870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:16:12,345-Speed 5953.59 samples/sec Loss 8.3368 LearningRate 0.1402 Epoch: 9 Global Step: 96880 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:16:19,194-Speed 5982.04 samples/sec Loss 8.3244 LearningRate 0.1402 Epoch: 9 Global 
Step: 96890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:16:26,050-Speed 5976.26 samples/sec Loss 8.3671 LearningRate 0.1402 Epoch: 9 Global Step: 96900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:16:32,895-Speed 5984.82 samples/sec Loss 8.3994 LearningRate 0.1401 Epoch: 9 Global Step: 96910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:16:39,750-Speed 5975.86 samples/sec Loss 8.3429 LearningRate 0.1401 Epoch: 9 Global Step: 96920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:16:46,630-Speed 5955.38 samples/sec Loss 8.3431 LearningRate 0.1401 Epoch: 9 Global Step: 96930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:16:53,521-Speed 5945.32 samples/sec Loss 8.2962 LearningRate 0.1401 Epoch: 9 Global Step: 96940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:17:00,371-Speed 5980.77 samples/sec Loss 8.4429 LearningRate 0.1400 Epoch: 9 Global Step: 96950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:17:07,223-Speed 5978.97 samples/sec Loss 8.3358 LearningRate 0.1400 Epoch: 9 Global Step: 96960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:17:14,072-Speed 5983.69 samples/sec Loss 8.3365 LearningRate 0.1400 Epoch: 9 Global Step: 96970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:17:20,933-Speed 5971.32 samples/sec Loss 8.3554 LearningRate 0.1400 Epoch: 9 Global Step: 96980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:17:27,785-Speed 5978.11 samples/sec Loss 8.3545 LearningRate 0.1399 Epoch: 9 Global Step: 96990 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:17:34,647-Speed 5970.66 samples/sec Loss 8.3328 LearningRate 0.1399 Epoch: 9 Global Step: 97000 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:17:41,510-Speed 5969.42 samples/sec Loss 8.3448 LearningRate 0.1399 Epoch: 9 Global Step: 97010 Fp16 Grad Scale: 262144 
Required: 22 hours Training: 2022-01-08 15:17:48,359-Speed 5981.63 samples/sec Loss 8.3645 LearningRate 0.1399 Epoch: 9 Global Step: 97020 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:17:55,244-Speed 5950.71 samples/sec Loss 8.3597 LearningRate 0.1398 Epoch: 9 Global Step: 97030 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:18:02,098-Speed 5976.85 samples/sec Loss 8.3061 LearningRate 0.1398 Epoch: 9 Global Step: 97040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:18:08,986-Speed 5948.25 samples/sec Loss 8.3026 LearningRate 0.1398 Epoch: 9 Global Step: 97050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:18:15,893-Speed 5931.10 samples/sec Loss 8.2879 LearningRate 0.1397 Epoch: 9 Global Step: 97060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:18:22,753-Speed 5972.11 samples/sec Loss 8.2855 LearningRate 0.1397 Epoch: 9 Global Step: 97070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:18:29,616-Speed 5969.36 samples/sec Loss 8.3109 LearningRate 0.1397 Epoch: 9 Global Step: 97080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:18:36,571-Speed 5890.74 samples/sec Loss 8.2593 LearningRate 0.1397 Epoch: 9 Global Step: 97090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:18:43,443-Speed 5961.28 samples/sec Loss 8.2925 LearningRate 0.1396 Epoch: 9 Global Step: 97100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:18:50,294-Speed 5979.70 samples/sec Loss 8.2591 LearningRate 0.1396 Epoch: 9 Global Step: 97110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:18:57,151-Speed 5974.99 samples/sec Loss 8.3029 LearningRate 0.1396 Epoch: 9 Global Step: 97120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:19:04,003-Speed 5978.27 samples/sec Loss 8.3612 LearningRate 0.1396 Epoch: 9 Global Step: 97130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 
15:19:10,867-Speed 5968.68 samples/sec Loss 8.2615 LearningRate 0.1395 Epoch: 9 Global Step: 97140 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:19:17,719-Speed 5980.76 samples/sec Loss 8.2903 LearningRate 0.1395 Epoch: 9 Global Step: 97150 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:19:24,579-Speed 5975.32 samples/sec Loss 8.2707 LearningRate 0.1395 Epoch: 9 Global Step: 97160 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:19:31,436-Speed 5974.34 samples/sec Loss 8.2932 LearningRate 0.1395 Epoch: 9 Global Step: 97170 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:19:38,321-Speed 5949.91 samples/sec Loss 8.2741 LearningRate 0.1394 Epoch: 9 Global Step: 97180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:19:45,171-Speed 5982.05 samples/sec Loss 8.3311 LearningRate 0.1394 Epoch: 9 Global Step: 97190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:19:52,031-Speed 5971.84 samples/sec Loss 8.2866 LearningRate 0.1394 Epoch: 9 Global Step: 97200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:19:58,891-Speed 5971.72 samples/sec Loss 8.3543 LearningRate 0.1394 Epoch: 9 Global Step: 97210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:20:05,855-Speed 5882.99 samples/sec Loss 8.2688 LearningRate 0.1393 Epoch: 9 Global Step: 97220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:20:12,856-Speed 5853.02 samples/sec Loss 8.3828 LearningRate 0.1393 Epoch: 9 Global Step: 97230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:20:19,709-Speed 5978.54 samples/sec Loss 8.3579 LearningRate 0.1393 Epoch: 9 Global Step: 97240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:20:26,561-Speed 5979.49 samples/sec Loss 8.2823 LearningRate 0.1393 Epoch: 9 Global Step: 97250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:20:33,412-Speed 5979.25 samples/sec Loss 
8.2964 LearningRate 0.1392 Epoch: 9 Global Step: 97260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:20:40,377-Speed 5883.03 samples/sec Loss 8.2977 LearningRate 0.1392 Epoch: 9 Global Step: 97270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:20:47,378-Speed 5852.62 samples/sec Loss 8.2688 LearningRate 0.1392 Epoch: 9 Global Step: 97280 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:20:54,323-Speed 5898.98 samples/sec Loss 8.3110 LearningRate 0.1392 Epoch: 9 Global Step: 97290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:01,297-Speed 5874.71 samples/sec Loss 8.2903 LearningRate 0.1391 Epoch: 9 Global Step: 97300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:08,140-Speed 5986.88 samples/sec Loss 8.2418 LearningRate 0.1391 Epoch: 9 Global Step: 97310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:14,992-Speed 5979.33 samples/sec Loss 8.2061 LearningRate 0.1391 Epoch: 9 Global Step: 97320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:21,871-Speed 5956.09 samples/sec Loss 8.3449 LearningRate 0.1391 Epoch: 9 Global Step: 97330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:28,736-Speed 5967.24 samples/sec Loss 8.3462 LearningRate 0.1390 Epoch: 9 Global Step: 97340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:35,600-Speed 5968.46 samples/sec Loss 8.2842 LearningRate 0.1390 Epoch: 9 Global Step: 97350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:42,488-Speed 5947.76 samples/sec Loss 8.2633 LearningRate 0.1390 Epoch: 9 Global Step: 97360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:49,347-Speed 5974.68 samples/sec Loss 8.3623 LearningRate 0.1390 Epoch: 9 Global Step: 97370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:21:56,200-Speed 5977.99 samples/sec Loss 8.3407 LearningRate 0.1389 Epoch: 9 Global 
Step: 97380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:03,064-Speed 5968.12 samples/sec Loss 8.2615 LearningRate 0.1389 Epoch: 9 Global Step: 97390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:09,940-Speed 5958.63 samples/sec Loss 8.3161 LearningRate 0.1389 Epoch: 9 Global Step: 97400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:16,794-Speed 5977.97 samples/sec Loss 8.3301 LearningRate 0.1389 Epoch: 9 Global Step: 97410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:23,670-Speed 5958.27 samples/sec Loss 8.2898 LearningRate 0.1388 Epoch: 9 Global Step: 97420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:30,569-Speed 5937.57 samples/sec Loss 8.3544 LearningRate 0.1388 Epoch: 9 Global Step: 97430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:37,427-Speed 5974.32 samples/sec Loss 8.2872 LearningRate 0.1388 Epoch: 9 Global Step: 97440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:44,282-Speed 5976.98 samples/sec Loss 8.2930 LearningRate 0.1388 Epoch: 9 Global Step: 97450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:51,144-Speed 5973.38 samples/sec Loss 8.3039 LearningRate 0.1387 Epoch: 9 Global Step: 97460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:22:58,000-Speed 5975.18 samples/sec Loss 8.2258 LearningRate 0.1387 Epoch: 9 Global Step: 97470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:23:04,861-Speed 5970.47 samples/sec Loss 8.2841 LearningRate 0.1387 Epoch: 9 Global Step: 97480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:23:11,710-Speed 5981.84 samples/sec Loss 8.2683 LearningRate 0.1387 Epoch: 9 Global Step: 97490 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:23:18,569-Speed 5972.91 samples/sec Loss 8.3439 LearningRate 0.1386 Epoch: 9 Global Step: 97500 Fp16 Grad Scale: 262144 
Required: 22 hours Training: 2022-01-08 15:23:25,424-Speed 5976.06 samples/sec Loss 8.3595 LearningRate 0.1386 Epoch: 9 Global Step: 97510 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:23:32,304-Speed 5957.13 samples/sec Loss 8.2271 LearningRate 0.1386 Epoch: 9 Global Step: 97520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:23:39,172-Speed 5964.84 samples/sec Loss 8.3356 LearningRate 0.1386 Epoch: 9 Global Step: 97530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:23:46,019-Speed 5983.37 samples/sec Loss 8.2181 LearningRate 0.1385 Epoch: 9 Global Step: 97540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:23:52,880-Speed 5971.03 samples/sec Loss 8.2866 LearningRate 0.1385 Epoch: 9 Global Step: 97550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:23:59,735-Speed 5976.64 samples/sec Loss 8.3274 LearningRate 0.1385 Epoch: 9 Global Step: 97560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:24:06,604-Speed 5964.74 samples/sec Loss 8.2674 LearningRate 0.1385 Epoch: 9 Global Step: 97570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:24:13,467-Speed 5969.23 samples/sec Loss 8.2679 LearningRate 0.1384 Epoch: 9 Global Step: 97580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:24:20,332-Speed 5969.69 samples/sec Loss 8.2621 LearningRate 0.1384 Epoch: 9 Global Step: 97590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:24:27,178-Speed 5983.84 samples/sec Loss 8.3020 LearningRate 0.1384 Epoch: 9 Global Step: 97600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:24:34,032-Speed 5977.37 samples/sec Loss 8.2743 LearningRate 0.1384 Epoch: 9 Global Step: 97610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:24:40,898-Speed 5966.79 samples/sec Loss 8.3558 LearningRate 0.1383 Epoch: 9 Global Step: 97620 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 
15:24:47,761-Speed 5969.40 samples/sec Loss 8.3771 LearningRate 0.1383 Epoch: 9 Global Step: 97630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:24:54,621-Speed 5971.37 samples/sec Loss 8.2599 LearningRate 0.1383 Epoch: 9 Global Step: 97640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:01,493-Speed 5962.35 samples/sec Loss 8.2968 LearningRate 0.1383 Epoch: 9 Global Step: 97650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:08,374-Speed 5953.46 samples/sec Loss 8.2643 LearningRate 0.1382 Epoch: 9 Global Step: 97660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:15,220-Speed 5983.73 samples/sec Loss 8.2625 LearningRate 0.1382 Epoch: 9 Global Step: 97670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:22,072-Speed 5979.33 samples/sec Loss 8.2031 LearningRate 0.1382 Epoch: 9 Global Step: 97680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:28,945-Speed 5960.34 samples/sec Loss 8.2362 LearningRate 0.1382 Epoch: 9 Global Step: 97690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:35,852-Speed 5931.80 samples/sec Loss 8.2200 LearningRate 0.1381 Epoch: 9 Global Step: 97700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:42,705-Speed 5978.16 samples/sec Loss 8.2570 LearningRate 0.1381 Epoch: 9 Global Step: 97710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:49,583-Speed 5956.51 samples/sec Loss 8.3377 LearningRate 0.1381 Epoch: 9 Global Step: 97720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 15:25:56,452-Speed 5964.12 samples/sec Loss 8.2374 LearningRate 0.1381 Epoch: 9 Global Step: 97730 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 15:26:03,314-Speed 5970.36 samples/sec Loss 8.3224 LearningRate 0.1380 Epoch: 9 Global Step: 97740 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:26:10,172-Speed 5976.45 samples/sec Loss 
8.2956 LearningRate 0.1380 Epoch: 9 Global Step: 97750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:26:17,017-Speed 5984.73 samples/sec Loss 8.2910 LearningRate 0.1380 Epoch: 9 Global Step: 97760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:26:23,879-Speed 5970.15 samples/sec Loss 8.2502 LearningRate 0.1380 Epoch: 9 Global Step: 97770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:26:30,738-Speed 5973.49 samples/sec Loss 8.2795 LearningRate 0.1379 Epoch: 9 Global Step: 97780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:26:37,603-Speed 5968.89 samples/sec Loss 8.2746 LearningRate 0.1379 Epoch: 9 Global Step: 97790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:26:44,450-Speed 5983.36 samples/sec Loss 8.2756 LearningRate 0.1379 Epoch: 9 Global Step: 97800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:26:51,316-Speed 5967.44 samples/sec Loss 8.2783 LearningRate 0.1379 Epoch: 9 Global Step: 97810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:26:58,182-Speed 5967.03 samples/sec Loss 8.2767 LearningRate 0.1378 Epoch: 9 Global Step: 97820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:27:05,032-Speed 5981.54 samples/sec Loss 8.2690 LearningRate 0.1378 Epoch: 9 Global Step: 97830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:27:11,887-Speed 5976.28 samples/sec Loss 8.2621 LearningRate 0.1378 Epoch: 9 Global Step: 97840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:27:18,750-Speed 5968.78 samples/sec Loss 8.2846 LearningRate 0.1378 Epoch: 9 Global Step: 97850 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:27:25,605-Speed 5977.21 samples/sec Loss 8.2962 LearningRate 0.1377 Epoch: 9 Global Step: 97860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:27:32,461-Speed 5975.23 samples/sec Loss 8.2346 LearningRate 0.1377 Epoch: 9 Global 
Step: 97870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:27:39,389-Speed 5913.59 samples/sec Loss 8.2725 LearningRate 0.1377 Epoch: 9 Global Step: 97880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:27:46,245-Speed 5975.92 samples/sec Loss 8.2901 LearningRate 0.1377 Epoch: 9 Global Step: 97890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:27:53,095-Speed 5980.17 samples/sec Loss 8.3533 LearningRate 0.1376 Epoch: 9 Global Step: 97900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:27:59,957-Speed 5970.12 samples/sec Loss 8.2865 LearningRate 0.1376 Epoch: 9 Global Step: 97910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:28:06,858-Speed 5938.03 samples/sec Loss 8.2713 LearningRate 0.1376 Epoch: 9 Global Step: 97920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:28:13,755-Speed 5940.17 samples/sec Loss 8.2311 LearningRate 0.1376 Epoch: 9 Global Step: 97930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:28:20,611-Speed 5975.68 samples/sec Loss 8.2068 LearningRate 0.1375 Epoch: 9 Global Step: 97940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:28:27,501-Speed 5945.60 samples/sec Loss 8.2877 LearningRate 0.1375 Epoch: 9 Global Step: 97950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:28:34,391-Speed 5945.91 samples/sec Loss 8.2512 LearningRate 0.1375 Epoch: 9 Global Step: 97960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:28:41,247-Speed 5976.03 samples/sec Loss 8.2536 LearningRate 0.1375 Epoch: 9 Global Step: 97970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:28:48,098-Speed 5979.87 samples/sec Loss 8.2541 LearningRate 0.1374 Epoch: 9 Global Step: 97980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:28:54,968-Speed 5963.13 samples/sec Loss 8.2490 LearningRate 0.1374 Epoch: 9 Global Step: 97990 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-01-08 15:29:01,827-Speed 5973.38 samples/sec Loss 8.1416 LearningRate 0.1374 Epoch: 9 Global Step: 98000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:29:08,692-Speed 5967.58 samples/sec Loss 8.2225 LearningRate 0.1374 Epoch: 9 Global Step: 98010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:29:15,538-Speed 5983.84 samples/sec Loss 8.3038 LearningRate 0.1373 Epoch: 9 Global Step: 98020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:29:22,406-Speed 5965.71 samples/sec Loss 8.2993 LearningRate 0.1373 Epoch: 9 Global Step: 98030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:29:29,279-Speed 5960.70 samples/sec Loss 8.2742 LearningRate 0.1373 Epoch: 9 Global Step: 98040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:29:36,139-Speed 5972.10 samples/sec Loss 8.2663 LearningRate 0.1373 Epoch: 9 Global Step: 98050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:29:43,037-Speed 5940.05 samples/sec Loss 8.2489 LearningRate 0.1372 Epoch: 9 Global Step: 98060 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:29:49,894-Speed 5974.44 samples/sec Loss 8.2492 LearningRate 0.1372 Epoch: 9 Global Step: 98070 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:29:56,749-Speed 5978.29 samples/sec Loss 8.2580 LearningRate 0.1372 Epoch: 9 Global Step: 98080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:30:03,621-Speed 5962.41 samples/sec Loss 8.2940 LearningRate 0.1372 Epoch: 9 Global Step: 98090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:30:10,487-Speed 5966.70 samples/sec Loss 8.2041 LearningRate 0.1371 Epoch: 9 Global Step: 98100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:30:17,377-Speed 5945.87 samples/sec Loss 8.2132 LearningRate 0.1371 Epoch: 9 Global Step: 98110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 
15:30:24,228-Speed 5980.08 samples/sec Loss 8.2686 LearningRate 0.1371 Epoch: 9 Global Step: 98120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:30:31,086-Speed 5973.91 samples/sec Loss 8.2292 LearningRate 0.1371 Epoch: 9 Global Step: 98130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:30:37,990-Speed 5934.12 samples/sec Loss 8.2460 LearningRate 0.1370 Epoch: 9 Global Step: 98140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:30:44,862-Speed 5961.69 samples/sec Loss 8.2650 LearningRate 0.1370 Epoch: 9 Global Step: 98150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:30:51,733-Speed 5962.65 samples/sec Loss 8.1891 LearningRate 0.1370 Epoch: 9 Global Step: 98160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:30:58,585-Speed 5979.49 samples/sec Loss 8.3401 LearningRate 0.1370 Epoch: 9 Global Step: 98170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:31:05,460-Speed 5959.18 samples/sec Loss 8.2303 LearningRate 0.1369 Epoch: 9 Global Step: 98180 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:31:12,325-Speed 5968.23 samples/sec Loss 8.2491 LearningRate 0.1369 Epoch: 9 Global Step: 98190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:31:19,180-Speed 5975.49 samples/sec Loss 8.3168 LearningRate 0.1369 Epoch: 9 Global Step: 98200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:31:26,049-Speed 5964.99 samples/sec Loss 8.2436 LearningRate 0.1369 Epoch: 9 Global Step: 98210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:31:32,912-Speed 5969.59 samples/sec Loss 8.2803 LearningRate 0.1368 Epoch: 9 Global Step: 98220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:31:39,778-Speed 5966.69 samples/sec Loss 8.2225 LearningRate 0.1368 Epoch: 9 Global Step: 98230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:31:46,644-Speed 5967.04 samples/sec Loss 
8.2615 LearningRate 0.1368 Epoch: 9 Global Step: 98240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:31:53,495-Speed 5979.85 samples/sec Loss 8.2266 LearningRate 0.1368 Epoch: 9 Global Step: 98250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:00,366-Speed 5962.65 samples/sec Loss 8.2555 LearningRate 0.1367 Epoch: 9 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:07,253-Speed 5949.47 samples/sec Loss 8.2067 LearningRate 0.1367 Epoch: 9 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:14,109-Speed 5975.48 samples/sec Loss 8.2822 LearningRate 0.1367 Epoch: 9 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:20,973-Speed 5970.20 samples/sec Loss 8.1825 LearningRate 0.1367 Epoch: 9 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:27,870-Speed 5939.86 samples/sec Loss 8.2746 LearningRate 0.1366 Epoch: 9 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:34,745-Speed 5958.54 samples/sec Loss 8.2554 LearningRate 0.1366 Epoch: 9 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:41,593-Speed 5983.12 samples/sec Loss 8.2397 LearningRate 0.1366 Epoch: 9 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:48,511-Speed 5923.02 samples/sec Loss 8.2082 LearningRate 0.1366 Epoch: 9 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:32:55,399-Speed 5947.60 samples/sec Loss 8.1974 LearningRate 0.1365 Epoch: 9 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:33:02,288-Speed 5946.65 samples/sec Loss 8.1584 LearningRate 0.1365 Epoch: 9 Global Step: 98350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:33:09,147-Speed 5973.67 samples/sec Loss 8.2138 LearningRate 0.1365 Epoch: 9 Global Step: 
98360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:33:15,991-Speed 5985.85 samples/sec Loss 8.2126 LearningRate 0.1365 Epoch: 9 Global Step: 98370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:33:22,871-Speed 5953.88 samples/sec Loss 8.1624 LearningRate 0.1364 Epoch: 9 Global Step: 98380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:33:29,736-Speed 5967.69 samples/sec Loss 8.2622 LearningRate 0.1364 Epoch: 9 Global Step: 98390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:33:36,601-Speed 5968.11 samples/sec Loss 8.2412 LearningRate 0.1364 Epoch: 9 Global Step: 98400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:33:43,462-Speed 5970.25 samples/sec Loss 8.2111 LearningRate 0.1363 Epoch: 9 Global Step: 98410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:33:50,350-Speed 5949.52 samples/sec Loss 8.2805 LearningRate 0.1363 Epoch: 9 Global Step: 98420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:33:57,191-Speed 5988.40 samples/sec Loss 8.2371 LearningRate 0.1363 Epoch: 9 Global Step: 98430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:34:04,051-Speed 5971.50 samples/sec Loss 8.1165 LearningRate 0.1363 Epoch: 9 Global Step: 98440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:34:10,897-Speed 5984.77 samples/sec Loss 8.2776 LearningRate 0.1362 Epoch: 9 Global Step: 98450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:34:17,762-Speed 5967.33 samples/sec Loss 8.1864 LearningRate 0.1362 Epoch: 9 Global Step: 98460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:34:24,640-Speed 5956.68 samples/sec Loss 8.1777 LearningRate 0.1362 Epoch: 9 Global Step: 98470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:34:31,520-Speed 5953.93 samples/sec Loss 8.2214 LearningRate 0.1362 Epoch: 9 Global Step: 98480 Fp16 Grad Scale: 131072 Required: 21 
hours Training: 2022-01-08 15:34:38,363-Speed 5986.84 samples/sec Loss 8.2398 LearningRate 0.1361 Epoch: 9 Global Step: 98490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:34:45,216-Speed 5977.85 samples/sec Loss 8.1783 LearningRate 0.1361 Epoch: 9 Global Step: 98500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:34:52,089-Speed 5960.63 samples/sec Loss 8.1769 LearningRate 0.1361 Epoch: 9 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:34:58,942-Speed 5977.55 samples/sec Loss 8.2323 LearningRate 0.1361 Epoch: 9 Global Step: 98520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:35:05,804-Speed 5970.55 samples/sec Loss 8.2366 LearningRate 0.1360 Epoch: 9 Global Step: 98530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:35:12,650-Speed 5984.47 samples/sec Loss 8.2106 LearningRate 0.1360 Epoch: 9 Global Step: 98540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:35:19,529-Speed 5955.87 samples/sec Loss 8.1485 LearningRate 0.1360 Epoch: 9 Global Step: 98550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:35:26,385-Speed 5975.31 samples/sec Loss 8.1816 LearningRate 0.1360 Epoch: 9 Global Step: 98560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:35:33,225-Speed 5989.37 samples/sec Loss 8.2244 LearningRate 0.1359 Epoch: 9 Global Step: 98570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:35:40,069-Speed 5985.21 samples/sec Loss 8.2433 LearningRate 0.1359 Epoch: 9 Global Step: 98580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:35:46,936-Speed 5965.70 samples/sec Loss 8.2864 LearningRate 0.1359 Epoch: 9 Global Step: 98590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:35:53,817-Speed 5954.46 samples/sec Loss 8.2135 LearningRate 0.1359 Epoch: 9 Global Step: 98600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:00,699-Speed 5952.65 
samples/sec Loss 8.2224 LearningRate 0.1358 Epoch: 9 Global Step: 98610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:07,595-Speed 5940.83 samples/sec Loss 8.2727 LearningRate 0.1358 Epoch: 9 Global Step: 98620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:14,454-Speed 5975.36 samples/sec Loss 8.2061 LearningRate 0.1358 Epoch: 9 Global Step: 98630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:21,301-Speed 5983.66 samples/sec Loss 8.1858 LearningRate 0.1358 Epoch: 9 Global Step: 98640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:28,158-Speed 5973.94 samples/sec Loss 8.2357 LearningRate 0.1358 Epoch: 9 Global Step: 98650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:35,012-Speed 5977.57 samples/sec Loss 8.1357 LearningRate 0.1357 Epoch: 9 Global Step: 98660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:41,870-Speed 5973.69 samples/sec Loss 8.1950 LearningRate 0.1357 Epoch: 9 Global Step: 98670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:48,747-Speed 5957.08 samples/sec Loss 8.1684 LearningRate 0.1357 Epoch: 9 Global Step: 98680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:36:55,599-Speed 5979.10 samples/sec Loss 8.1856 LearningRate 0.1357 Epoch: 9 Global Step: 98690 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:37:02,481-Speed 5953.16 samples/sec Loss 8.1883 LearningRate 0.1356 Epoch: 9 Global Step: 98700 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:37:09,334-Speed 5978.23 samples/sec Loss 8.1893 LearningRate 0.1356 Epoch: 9 Global Step: 98710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:37:16,206-Speed 5961.03 samples/sec Loss 8.2013 LearningRate 0.1356 Epoch: 9 Global Step: 98720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:37:23,059-Speed 5978.33 samples/sec Loss 8.1913 LearningRate 0.1356 
Epoch: 9 Global Step: 98730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:37:29,915-Speed 5975.64 samples/sec Loss 8.2242 LearningRate 0.1355 Epoch: 9 Global Step: 98740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:37:36,775-Speed 5972.38 samples/sec Loss 8.2279 LearningRate 0.1355 Epoch: 9 Global Step: 98750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:37:43,629-Speed 5976.44 samples/sec Loss 8.2080 LearningRate 0.1355 Epoch: 9 Global Step: 98760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:37:50,495-Speed 5966.82 samples/sec Loss 8.2050 LearningRate 0.1355 Epoch: 9 Global Step: 98770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:37:57,370-Speed 5959.58 samples/sec Loss 8.1973 LearningRate 0.1354 Epoch: 9 Global Step: 98780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:38:04,244-Speed 5959.48 samples/sec Loss 8.1549 LearningRate 0.1354 Epoch: 9 Global Step: 98790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:38:11,098-Speed 5977.63 samples/sec Loss 8.2171 LearningRate 0.1354 Epoch: 9 Global Step: 98800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:38:17,960-Speed 5969.79 samples/sec Loss 8.2406 LearningRate 0.1354 Epoch: 9 Global Step: 98810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:38:24,811-Speed 5980.17 samples/sec Loss 8.2407 LearningRate 0.1353 Epoch: 9 Global Step: 98820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:38:31,675-Speed 5969.19 samples/sec Loss 8.2067 LearningRate 0.1353 Epoch: 9 Global Step: 98830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:38:38,524-Speed 5981.07 samples/sec Loss 8.2349 LearningRate 0.1353 Epoch: 9 Global Step: 98840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:38:45,401-Speed 5960.47 samples/sec Loss 8.2162 LearningRate 0.1353 Epoch: 9 Global Step: 98850 Fp16 Grad Scale: 
65536 Required: 21 hours Training: 2022-01-08 15:38:52,256-Speed 5975.96 samples/sec Loss 8.2269 LearningRate 0.1352 Epoch: 9 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:38:59,139-Speed 5952.75 samples/sec Loss 8.2069 LearningRate 0.1352 Epoch: 9 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:39:05,988-Speed 5981.21 samples/sec Loss 8.1926 LearningRate 0.1352 Epoch: 9 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:39:12,871-Speed 5952.21 samples/sec Loss 8.1943 LearningRate 0.1352 Epoch: 9 Global Step: 98890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:39:19,721-Speed 5983.43 samples/sec Loss 8.2658 LearningRate 0.1351 Epoch: 9 Global Step: 98900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:39:26,590-Speed 5963.93 samples/sec Loss 8.1996 LearningRate 0.1351 Epoch: 9 Global Step: 98910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:39:33,446-Speed 5975.50 samples/sec Loss 8.2428 LearningRate 0.1351 Epoch: 9 Global Step: 98920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:39:40,300-Speed 5977.27 samples/sec Loss 8.2411 LearningRate 0.1351 Epoch: 9 Global Step: 98930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:39:47,151-Speed 5978.84 samples/sec Loss 8.1901 LearningRate 0.1350 Epoch: 9 Global Step: 98940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:39:54,002-Speed 5981.23 samples/sec Loss 8.1448 LearningRate 0.1350 Epoch: 9 Global Step: 98950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:40:00,888-Speed 5949.20 samples/sec Loss 8.1953 LearningRate 0.1350 Epoch: 9 Global Step: 98960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:40:07,728-Speed 5988.71 samples/sec Loss 8.2079 LearningRate 0.1350 Epoch: 9 Global Step: 98970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 
15:40:14,597-Speed 5964.48 samples/sec Loss 8.1665 LearningRate 0.1349 Epoch: 9 Global Step: 98980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:40:21,478-Speed 5955.95 samples/sec Loss 8.2327 LearningRate 0.1349 Epoch: 9 Global Step: 98990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:40:28,353-Speed 5976.17 samples/sec Loss 8.1561 LearningRate 0.1349 Epoch: 9 Global Step: 99000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:40:35,214-Speed 5970.29 samples/sec Loss 8.2375 LearningRate 0.1349 Epoch: 9 Global Step: 99010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:40:42,092-Speed 5957.22 samples/sec Loss 8.2395 LearningRate 0.1348 Epoch: 9 Global Step: 99020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:40:48,951-Speed 5972.78 samples/sec Loss 8.1773 LearningRate 0.1348 Epoch: 9 Global Step: 99030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:40:55,807-Speed 5975.50 samples/sec Loss 8.1764 LearningRate 0.1348 Epoch: 9 Global Step: 99040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:41:02,652-Speed 5984.79 samples/sec Loss 8.1724 LearningRate 0.1348 Epoch: 9 Global Step: 99050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:41:09,511-Speed 5972.71 samples/sec Loss 8.2102 LearningRate 0.1347 Epoch: 9 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:41:16,369-Speed 5973.81 samples/sec Loss 8.1815 LearningRate 0.1347 Epoch: 9 Global Step: 99070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:41:23,241-Speed 5961.53 samples/sec Loss 8.2094 LearningRate 0.1347 Epoch: 9 Global Step: 99080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:41:30,110-Speed 5964.60 samples/sec Loss 8.1206 LearningRate 0.1347 Epoch: 9 Global Step: 99090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:41:36,975-Speed 5967.00 samples/sec Loss 8.1377 
LearningRate 0.1346 Epoch: 9 Global Step: 99100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:41:43,832-Speed 5974.86 samples/sec Loss 8.2276 LearningRate 0.1346 Epoch: 9 Global Step: 99110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:41:50,714-Speed 5953.87 samples/sec Loss 8.2005 LearningRate 0.1346 Epoch: 9 Global Step: 99120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:41:57,611-Speed 5940.98 samples/sec Loss 8.1623 LearningRate 0.1346 Epoch: 9 Global Step: 99130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:42:04,461-Speed 5980.82 samples/sec Loss 8.1180 LearningRate 0.1345 Epoch: 9 Global Step: 99140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:42:11,344-Speed 5954.25 samples/sec Loss 8.1001 LearningRate 0.1345 Epoch: 9 Global Step: 99150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:42:18,191-Speed 5983.25 samples/sec Loss 8.2057 LearningRate 0.1345 Epoch: 9 Global Step: 99160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:42:25,056-Speed 5967.76 samples/sec Loss 8.1412 LearningRate 0.1345 Epoch: 9 Global Step: 99170 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:42:31,929-Speed 5961.25 samples/sec Loss 8.1758 LearningRate 0.1344 Epoch: 9 Global Step: 99180 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:42:38,815-Speed 5949.40 samples/sec Loss 8.2563 LearningRate 0.1344 Epoch: 9 Global Step: 99190 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:42:45,668-Speed 5978.54 samples/sec Loss 8.1552 LearningRate 0.1344 Epoch: 9 Global Step: 99200 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:42:52,526-Speed 5973.22 samples/sec Loss 8.1841 LearningRate 0.1344 Epoch: 9 Global Step: 99210 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:42:59,370-Speed 5986.98 samples/sec Loss 8.1932 LearningRate 0.1343 Epoch: 9 Global Step: 
99220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:43:06,239-Speed 5963.79 samples/sec Loss 8.0924 LearningRate 0.1343 Epoch: 9 Global Step: 99230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:43:13,080-Speed 5988.26 samples/sec Loss 8.1686 LearningRate 0.1343 Epoch: 9 Global Step: 99240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:43:19,929-Speed 5981.52 samples/sec Loss 8.1840 LearningRate 0.1343 Epoch: 9 Global Step: 99250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:43:26,784-Speed 5977.10 samples/sec Loss 8.1794 LearningRate 0.1342 Epoch: 9 Global Step: 99260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:43:33,653-Speed 5964.17 samples/sec Loss 8.1802 LearningRate 0.1342 Epoch: 9 Global Step: 99270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:43:40,513-Speed 5972.34 samples/sec Loss 8.1780 LearningRate 0.1342 Epoch: 9 Global Step: 99280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:43:47,382-Speed 5963.71 samples/sec Loss 8.1523 LearningRate 0.1342 Epoch: 9 Global Step: 99290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:43:54,319-Speed 5905.61 samples/sec Loss 8.1522 LearningRate 0.1341 Epoch: 9 Global Step: 99300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:44:01,175-Speed 5978.13 samples/sec Loss 8.1329 LearningRate 0.1341 Epoch: 9 Global Step: 99310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:44:08,024-Speed 5981.97 samples/sec Loss 8.1847 LearningRate 0.1341 Epoch: 9 Global Step: 99320 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:44:14,976-Speed 5893.05 samples/sec Loss 8.1259 LearningRate 0.1341 Epoch: 9 Global Step: 99330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:44:21,827-Speed 5979.65 samples/sec Loss 8.1821 LearningRate 0.1340 Epoch: 9 Global Step: 99340 Fp16 Grad Scale: 131072 Required: 21 
hours Training: 2022-01-08 15:44:28,682-Speed 5976.69 samples/sec Loss 8.1532 LearningRate 0.1340 Epoch: 9 Global Step: 99350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:44:35,539-Speed 5974.33 samples/sec Loss 8.1619 LearningRate 0.1340 Epoch: 9 Global Step: 99360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:44:42,402-Speed 5969.58 samples/sec Loss 8.1505 LearningRate 0.1340 Epoch: 9 Global Step: 99370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:44:49,273-Speed 5962.61 samples/sec Loss 8.1431 LearningRate 0.1339 Epoch: 9 Global Step: 99380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:44:56,133-Speed 5971.68 samples/sec Loss 8.1613 LearningRate 0.1339 Epoch: 9 Global Step: 99390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:45:02,983-Speed 5980.85 samples/sec Loss 8.1450 LearningRate 0.1339 Epoch: 9 Global Step: 99400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:45:09,854-Speed 5962.75 samples/sec Loss 8.1842 LearningRate 0.1339 Epoch: 9 Global Step: 99410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:45:16,727-Speed 5960.60 samples/sec Loss 8.1714 LearningRate 0.1338 Epoch: 9 Global Step: 99420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:45:23,597-Speed 5965.66 samples/sec Loss 8.1860 LearningRate 0.1338 Epoch: 9 Global Step: 99430 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:45:30,440-Speed 5986.69 samples/sec Loss 8.1792 LearningRate 0.1338 Epoch: 9 Global Step: 99440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:45:37,284-Speed 5985.23 samples/sec Loss 8.2296 LearningRate 0.1338 Epoch: 9 Global Step: 99450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:45:44,131-Speed 5983.51 samples/sec Loss 8.1709 LearningRate 0.1337 Epoch: 9 Global Step: 99460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 
15:45:50,991-Speed 5971.32 samples/sec Loss 8.1073 LearningRate 0.1337 Epoch: 9 Global Step: 99470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:45:57,886-Speed 5942.70 samples/sec Loss 8.1364 LearningRate 0.1337 Epoch: 9 Global Step: 99480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:46:04,744-Speed 5973.85 samples/sec Loss 8.1453 LearningRate 0.1337 Epoch: 9 Global Step: 99490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:46:11,611-Speed 5966.05 samples/sec Loss 8.1031 LearningRate 0.1336 Epoch: 9 Global Step: 99500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:46:18,463-Speed 5978.44 samples/sec Loss 8.0761 LearningRate 0.1336 Epoch: 9 Global Step: 99510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:46:25,320-Speed 5975.49 samples/sec Loss 8.1261 LearningRate 0.1336 Epoch: 9 Global Step: 99520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:46:32,201-Speed 5953.94 samples/sec Loss 8.1693 LearningRate 0.1336 Epoch: 9 Global Step: 99530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:46:39,077-Speed 5958.70 samples/sec Loss 8.1564 LearningRate 0.1335 Epoch: 9 Global Step: 99540 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:46:45,933-Speed 5975.11 samples/sec Loss 8.1098 LearningRate 0.1335 Epoch: 9 Global Step: 99550 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:46:52,786-Speed 5978.68 samples/sec Loss 8.1123 LearningRate 0.1335 Epoch: 9 Global Step: 99560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:46:59,635-Speed 5981.06 samples/sec Loss 8.1017 LearningRate 0.1335 Epoch: 9 Global Step: 99570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:47:06,487-Speed 5979.35 samples/sec Loss 8.0916 LearningRate 0.1334 Epoch: 9 Global Step: 99580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:47:13,336-Speed 5982.01 samples/sec Loss 
8.2304 LearningRate 0.1334 Epoch: 9 Global Step: 99590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:47:20,184-Speed 5982.54 samples/sec Loss 8.2088 LearningRate 0.1334 Epoch: 9 Global Step: 99600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:47:27,031-Speed 5988.06 samples/sec Loss 8.1875 LearningRate 0.1334 Epoch: 9 Global Step: 99610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:47:33,885-Speed 5977.16 samples/sec Loss 8.1500 LearningRate 0.1333 Epoch: 9 Global Step: 99620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:47:40,749-Speed 5969.17 samples/sec Loss 8.1679 LearningRate 0.1333 Epoch: 9 Global Step: 99630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:47:47,609-Speed 5971.89 samples/sec Loss 8.0741 LearningRate 0.1333 Epoch: 9 Global Step: 99640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:47:54,459-Speed 5980.34 samples/sec Loss 8.1762 LearningRate 0.1333 Epoch: 9 Global Step: 99650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:48:01,317-Speed 5973.16 samples/sec Loss 8.1143 LearningRate 0.1332 Epoch: 9 Global Step: 99660 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:48:08,211-Speed 5943.79 samples/sec Loss 8.1443 LearningRate 0.1332 Epoch: 9 Global Step: 99670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:48:15,087-Speed 5957.73 samples/sec Loss 8.1361 LearningRate 0.1332 Epoch: 9 Global Step: 99680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:48:21,944-Speed 5974.42 samples/sec Loss 8.1877 LearningRate 0.1332 Epoch: 9 Global Step: 99690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:48:28,838-Speed 5943.30 samples/sec Loss 8.1331 LearningRate 0.1331 Epoch: 9 Global Step: 99700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:48:35,690-Speed 5978.96 samples/sec Loss 8.1540 LearningRate 0.1331 Epoch: 9 Global 
Step: 99710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:48:42,536-Speed 5983.86 samples/sec Loss 8.1786 LearningRate 0.1331 Epoch: 9 Global Step: 99720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:48:49,418-Speed 5952.16 samples/sec Loss 8.1174 LearningRate 0.1331 Epoch: 9 Global Step: 99730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:48:56,274-Speed 5975.34 samples/sec Loss 8.1406 LearningRate 0.1330 Epoch: 9 Global Step: 99740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:49:03,116-Speed 5987.65 samples/sec Loss 8.1625 LearningRate 0.1330 Epoch: 9 Global Step: 99750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:49:09,985-Speed 5964.73 samples/sec Loss 8.1631 LearningRate 0.1330 Epoch: 9 Global Step: 99760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:49:16,907-Speed 5918.91 samples/sec Loss 8.2018 LearningRate 0.1330 Epoch: 9 Global Step: 99770 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:49:23,763-Speed 5974.93 samples/sec Loss 8.1783 LearningRate 0.1329 Epoch: 9 Global Step: 99780 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:49:30,597-Speed 5995.07 samples/sec Loss 8.1305 LearningRate 0.1329 Epoch: 9 Global Step: 99790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:49:37,446-Speed 5980.71 samples/sec Loss 8.0963 LearningRate 0.1329 Epoch: 9 Global Step: 99800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:49:44,304-Speed 5974.61 samples/sec Loss 8.2086 LearningRate 0.1329 Epoch: 9 Global Step: 99810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:49:51,154-Speed 5980.60 samples/sec Loss 8.1213 LearningRate 0.1328 Epoch: 9 Global Step: 99820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:49:58,024-Speed 5963.44 samples/sec Loss 8.1410 LearningRate 0.1328 Epoch: 9 Global Step: 99830 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-01-08 15:50:04,882-Speed 5973.76 samples/sec Loss 8.1486 LearningRate 0.1328 Epoch: 9 Global Step: 99840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:50:11,726-Speed 5985.32 samples/sec Loss 8.1659 LearningRate 0.1328 Epoch: 9 Global Step: 99850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:50:18,584-Speed 5974.52 samples/sec Loss 8.1164 LearningRate 0.1327 Epoch: 9 Global Step: 99860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:50:25,463-Speed 5955.92 samples/sec Loss 8.1636 LearningRate 0.1327 Epoch: 9 Global Step: 99870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:50:32,343-Speed 5954.45 samples/sec Loss 8.0989 LearningRate 0.1327 Epoch: 9 Global Step: 99880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:50:39,198-Speed 5976.54 samples/sec Loss 8.0624 LearningRate 0.1327 Epoch: 9 Global Step: 99890 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:50:46,072-Speed 5959.99 samples/sec Loss 8.0713 LearningRate 0.1326 Epoch: 9 Global Step: 99900 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:50:52,926-Speed 5977.25 samples/sec Loss 8.1521 LearningRate 0.1326 Epoch: 9 Global Step: 99910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:50:59,791-Speed 5967.17 samples/sec Loss 8.1443 LearningRate 0.1326 Epoch: 9 Global Step: 99920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:51:06,634-Speed 5986.79 samples/sec Loss 8.0925 LearningRate 0.1326 Epoch: 9 Global Step: 99930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:51:13,497-Speed 5969.21 samples/sec Loss 8.0720 LearningRate 0.1325 Epoch: 9 Global Step: 99940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:51:20,373-Speed 5958.42 samples/sec Loss 8.1456 LearningRate 0.1325 Epoch: 9 Global Step: 99950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 
15:51:27,243-Speed 5963.56 samples/sec Loss 8.1441 LearningRate 0.1325 Epoch: 9 Global Step: 99960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:51:34,126-Speed 5952.80 samples/sec Loss 8.1028 LearningRate 0.1325 Epoch: 9 Global Step: 99970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:51:40,990-Speed 5968.35 samples/sec Loss 8.1817 LearningRate 0.1324 Epoch: 9 Global Step: 99980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:51:47,868-Speed 5956.09 samples/sec Loss 8.0969 LearningRate 0.1324 Epoch: 9 Global Step: 99990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:51:54,716-Speed 5982.40 samples/sec Loss 8.1212 LearningRate 0.1324 Epoch: 9 Global Step: 100000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:52:21,676-[lfw][100000]XNorm: 25.080002 Training: 2022-01-08 15:52:21,677-[lfw][100000]Accuracy-Flip: 0.99717+-0.00269 Training: 2022-01-08 15:52:21,678-[lfw][100000]Accuracy-Highest: 0.99750 Training: 2022-01-08 15:52:52,927-[cfp_fp][100000]XNorm: 22.028425 Training: 2022-01-08 15:52:52,928-[cfp_fp][100000]Accuracy-Flip: 0.98257+-0.00530 Training: 2022-01-08 15:52:52,929-[cfp_fp][100000]Accuracy-Highest: 0.98257 Training: 2022-01-08 15:53:19,915-[agedb_30][100000]XNorm: 24.118641 Training: 2022-01-08 15:53:19,916-[agedb_30][100000]Accuracy-Flip: 0.96950+-0.00667 Training: 2022-01-08 15:53:19,916-[agedb_30][100000]Accuracy-Highest: 0.97150 Training: 2022-01-08 15:53:26,763-Speed 445.00 samples/sec Loss 8.1594 LearningRate 0.1324 Epoch: 9 Global Step: 100010 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 15:53:33,579-Speed 6011.99 samples/sec Loss 8.1189 LearningRate 0.1324 Epoch: 9 Global Step: 100020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:53:40,442-Speed 5969.54 samples/sec Loss 8.1343 LearningRate 0.1323 Epoch: 9 Global Step: 100030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:53:47,309-Speed 
5966.83 samples/sec Loss 8.0438 LearningRate 0.1323 Epoch: 9 Global Step: 100040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:53:54,158-Speed 5981.65 samples/sec Loss 8.1311 LearningRate 0.1323 Epoch: 9 Global Step: 100050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:54:01,035-Speed 5957.72 samples/sec Loss 8.1485 LearningRate 0.1323 Epoch: 9 Global Step: 100060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:54:07,922-Speed 5949.20 samples/sec Loss 8.1221 LearningRate 0.1322 Epoch: 9 Global Step: 100070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:54:14,806-Speed 5953.32 samples/sec Loss 8.0612 LearningRate 0.1322 Epoch: 9 Global Step: 100080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:54:21,692-Speed 5949.35 samples/sec Loss 8.1228 LearningRate 0.1322 Epoch: 9 Global Step: 100090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:54:28,590-Speed 5952.70 samples/sec Loss 8.1454 LearningRate 0.1322 Epoch: 9 Global Step: 100100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:54:35,495-Speed 5947.63 samples/sec Loss 8.1453 LearningRate 0.1321 Epoch: 9 Global Step: 100110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:54:42,471-Speed 5977.76 samples/sec Loss 8.1020 LearningRate 0.1321 Epoch: 9 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:54:49,354-Speed 5950.98 samples/sec Loss 8.1753 LearningRate 0.1321 Epoch: 9 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:54:56,284-Speed 5912.21 samples/sec Loss 8.0929 LearningRate 0.1321 Epoch: 9 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:55:03,141-Speed 5974.75 samples/sec Loss 8.1083 LearningRate 0.1320 Epoch: 9 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:55:09,992-Speed 5979.86 samples/sec Loss 8.0710 
LearningRate 0.1320 Epoch: 9 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:55:16,887-Speed 5941.16 samples/sec Loss 8.1171 LearningRate 0.1320 Epoch: 9 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:55:23,753-Speed 5966.27 samples/sec Loss 8.1434 LearningRate 0.1320 Epoch: 9 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:55:30,612-Speed 5973.15 samples/sec Loss 8.0478 LearningRate 0.1319 Epoch: 9 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:55:37,473-Speed 5971.12 samples/sec Loss 8.0923 LearningRate 0.1319 Epoch: 9 Global Step: 100200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:55:44,326-Speed 5977.81 samples/sec Loss 8.0735 LearningRate 0.1319 Epoch: 9 Global Step: 100210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:55:51,209-Speed 5952.15 samples/sec Loss 8.0818 LearningRate 0.1319 Epoch: 9 Global Step: 100220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:55:58,088-Speed 5955.39 samples/sec Loss 8.1112 LearningRate 0.1318 Epoch: 9 Global Step: 100230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:56:04,958-Speed 5963.68 samples/sec Loss 8.0430 LearningRate 0.1318 Epoch: 9 Global Step: 100240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:56:11,825-Speed 5966.80 samples/sec Loss 8.0827 LearningRate 0.1318 Epoch: 9 Global Step: 100250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:56:18,688-Speed 5969.52 samples/sec Loss 8.1107 LearningRate 0.1318 Epoch: 9 Global Step: 100260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:56:25,526-Speed 5993.71 samples/sec Loss 8.1608 LearningRate 0.1317 Epoch: 9 Global Step: 100270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:56:32,373-Speed 5983.17 samples/sec Loss 8.1320 LearningRate 0.1317 Epoch: 9 Global 
Step: 100280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:56:39,239-Speed 5967.01 samples/sec Loss 8.1750 LearningRate 0.1317 Epoch: 9 Global Step: 100290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:56:46,100-Speed 5971.15 samples/sec Loss 8.1075 LearningRate 0.1317 Epoch: 9 Global Step: 100300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:56:52,957-Speed 5974.80 samples/sec Loss 8.0190 LearningRate 0.1316 Epoch: 9 Global Step: 100310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:56:59,804-Speed 5983.77 samples/sec Loss 8.1246 LearningRate 0.1316 Epoch: 9 Global Step: 100320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:57:06,685-Speed 5953.40 samples/sec Loss 8.0684 LearningRate 0.1316 Epoch: 9 Global Step: 100330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:57:13,541-Speed 5975.84 samples/sec Loss 8.0488 LearningRate 0.1316 Epoch: 9 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:57:20,409-Speed 5964.59 samples/sec Loss 8.0747 LearningRate 0.1315 Epoch: 9 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:57:27,289-Speed 5955.44 samples/sec Loss 8.0770 LearningRate 0.1315 Epoch: 9 Global Step: 100360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:57:34,150-Speed 5971.04 samples/sec Loss 8.1299 LearningRate 0.1315 Epoch: 9 Global Step: 100370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:57:41,009-Speed 5972.61 samples/sec Loss 8.1465 LearningRate 0.1315 Epoch: 9 Global Step: 100380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:57:47,861-Speed 5979.21 samples/sec Loss 8.1514 LearningRate 0.1314 Epoch: 9 Global Step: 100390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:57:54,710-Speed 5981.51 samples/sec Loss 8.1138 LearningRate 0.1314 Epoch: 9 Global Step: 100400 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-01-08 15:58:01,554-Speed 5986.08 samples/sec Loss 8.0883 LearningRate 0.1314 Epoch: 9 Global Step: 100410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:58:08,410-Speed 5975.15 samples/sec Loss 8.0766 LearningRate 0.1314 Epoch: 9 Global Step: 100420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:58:15,268-Speed 5973.47 samples/sec Loss 8.0545 LearningRate 0.1313 Epoch: 9 Global Step: 100430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:58:22,125-Speed 5974.45 samples/sec Loss 8.1000 LearningRate 0.1313 Epoch: 9 Global Step: 100440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:58:29,003-Speed 5956.57 samples/sec Loss 8.0365 LearningRate 0.1313 Epoch: 9 Global Step: 100450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:58:35,873-Speed 5963.74 samples/sec Loss 8.0943 LearningRate 0.1313 Epoch: 9 Global Step: 100460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:58:42,733-Speed 5972.28 samples/sec Loss 8.0646 LearningRate 0.1312 Epoch: 9 Global Step: 100470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:58:49,592-Speed 5972.92 samples/sec Loss 8.0641 LearningRate 0.1312 Epoch: 9 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:58:56,451-Speed 5973.21 samples/sec Loss 8.0334 LearningRate 0.1312 Epoch: 9 Global Step: 100490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:59:03,318-Speed 5966.22 samples/sec Loss 8.0887 LearningRate 0.1312 Epoch: 9 Global Step: 100500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:59:10,167-Speed 5981.97 samples/sec Loss 8.0572 LearningRate 0.1311 Epoch: 9 Global Step: 100510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 15:59:17,032-Speed 5975.41 samples/sec Loss 8.1135 LearningRate 0.1311 Epoch: 9 Global Step: 100520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 
15:59:23,873-Speed 5988.83 samples/sec Loss 8.0883 LearningRate 0.1311 Epoch: 9 Global Step: 100530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:59:30,739-Speed 5969.23 samples/sec Loss 8.0874 LearningRate 0.1311 Epoch: 9 Global Step: 100540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:59:37,616-Speed 5956.69 samples/sec Loss 8.0648 LearningRate 0.1310 Epoch: 9 Global Step: 100550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:59:44,469-Speed 5978.25 samples/sec Loss 8.0698 LearningRate 0.1310 Epoch: 9 Global Step: 100560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:59:51,329-Speed 5972.45 samples/sec Loss 8.1062 LearningRate 0.1310 Epoch: 9 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 15:59:58,193-Speed 5968.25 samples/sec Loss 8.0687 LearningRate 0.1310 Epoch: 9 Global Step: 100580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:00:05,070-Speed 5959.30 samples/sec Loss 8.0641 LearningRate 0.1309 Epoch: 9 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:00:11,922-Speed 5980.98 samples/sec Loss 8.0442 LearningRate 0.1309 Epoch: 9 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:00:18,782-Speed 5972.60 samples/sec Loss 8.0250 LearningRate 0.1309 Epoch: 9 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:00:25,631-Speed 5981.02 samples/sec Loss 8.0493 LearningRate 0.1309 Epoch: 9 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:00:32,496-Speed 5967.51 samples/sec Loss 8.0190 LearningRate 0.1309 Epoch: 9 Global Step: 100630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:00:39,356-Speed 5971.83 samples/sec Loss 8.0346 LearningRate 0.1308 Epoch: 9 Global Step: 100640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:00:46,229-Speed 5961.73 samples/sec 
Loss 8.0620 LearningRate 0.1308 Epoch: 9 Global Step: 100650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:00:53,091-Speed 5970.37 samples/sec Loss 8.0641 LearningRate 0.1308 Epoch: 9 Global Step: 100660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:00:59,943-Speed 5979.01 samples/sec Loss 8.1076 LearningRate 0.1308 Epoch: 9 Global Step: 100670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:01:06,794-Speed 5979.60 samples/sec Loss 8.0600 LearningRate 0.1307 Epoch: 9 Global Step: 100680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:01:13,655-Speed 5971.05 samples/sec Loss 8.1224 LearningRate 0.1307 Epoch: 9 Global Step: 100690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:01:20,533-Speed 5959.54 samples/sec Loss 8.0297 LearningRate 0.1307 Epoch: 9 Global Step: 100700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:01:27,421-Speed 5947.74 samples/sec Loss 8.0738 LearningRate 0.1307 Epoch: 9 Global Step: 100710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:01:34,320-Speed 5938.83 samples/sec Loss 8.1454 LearningRate 0.1306 Epoch: 9 Global Step: 100720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:01:41,180-Speed 5971.42 samples/sec Loss 8.1043 LearningRate 0.1306 Epoch: 9 Global Step: 100730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:01:48,061-Speed 5953.99 samples/sec Loss 8.0992 LearningRate 0.1306 Epoch: 9 Global Step: 100740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:01:54,927-Speed 5966.70 samples/sec Loss 8.0596 LearningRate 0.1306 Epoch: 9 Global Step: 100750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:02:01,773-Speed 5983.75 samples/sec Loss 8.0392 LearningRate 0.1305 Epoch: 9 Global Step: 100760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:02:08,627-Speed 5977.13 samples/sec Loss 8.0400 LearningRate 0.1305 
Epoch: 9 Global Step: 100770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:02:15,507-Speed 5955.29 samples/sec Loss 8.0186 LearningRate 0.1305 Epoch: 9 Global Step: 100780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:02:22,370-Speed 5969.27 samples/sec Loss 8.1210 LearningRate 0.1305 Epoch: 9 Global Step: 100790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:02:29,229-Speed 5972.79 samples/sec Loss 8.0707 LearningRate 0.1304 Epoch: 9 Global Step: 100800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:02:36,085-Speed 5975.28 samples/sec Loss 8.0917 LearningRate 0.1304 Epoch: 9 Global Step: 100810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:02:42,959-Speed 5959.68 samples/sec Loss 8.0491 LearningRate 0.1304 Epoch: 9 Global Step: 100820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:02:49,827-Speed 5965.84 samples/sec Loss 8.0290 LearningRate 0.1304 Epoch: 9 Global Step: 100830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:02:56,718-Speed 5944.45 samples/sec Loss 8.1027 LearningRate 0.1303 Epoch: 9 Global Step: 100840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:03:03,572-Speed 5977.51 samples/sec Loss 8.0505 LearningRate 0.1303 Epoch: 9 Global Step: 100850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:03:10,416-Speed 5985.89 samples/sec Loss 8.0704 LearningRate 0.1303 Epoch: 9 Global Step: 100860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:03:17,265-Speed 5984.04 samples/sec Loss 8.0200 LearningRate 0.1303 Epoch: 9 Global Step: 100870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:03:24,195-Speed 5911.66 samples/sec Loss 8.0643 LearningRate 0.1302 Epoch: 9 Global Step: 100880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:03:31,122-Speed 5914.24 samples/sec Loss 7.9500 LearningRate 0.1302 Epoch: 9 Global Step: 100890 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-01-08 16:03:38,011-Speed 5948.57 samples/sec Loss 8.0554 LearningRate 0.1302 Epoch: 9 Global Step: 100900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:03:44,874-Speed 5969.40 samples/sec Loss 8.0805 LearningRate 0.1302 Epoch: 9 Global Step: 100910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:03:51,838-Speed 5882.93 samples/sec Loss 8.0960 LearningRate 0.1301 Epoch: 9 Global Step: 100920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:03:58,703-Speed 5968.09 samples/sec Loss 8.0876 LearningRate 0.1301 Epoch: 9 Global Step: 100930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:04:05,573-Speed 5963.54 samples/sec Loss 8.0335 LearningRate 0.1301 Epoch: 9 Global Step: 100940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:04:12,421-Speed 5982.50 samples/sec Loss 8.0340 LearningRate 0.1301 Epoch: 9 Global Step: 100950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:04:19,276-Speed 5975.89 samples/sec Loss 8.0046 LearningRate 0.1300 Epoch: 9 Global Step: 100960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:04:26,132-Speed 5976.13 samples/sec Loss 8.0404 LearningRate 0.1300 Epoch: 9 Global Step: 100970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:04:32,977-Speed 5984.26 samples/sec Loss 8.0844 LearningRate 0.1300 Epoch: 9 Global Step: 100980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:04:39,851-Speed 5960.24 samples/sec Loss 8.0353 LearningRate 0.1300 Epoch: 9 Global Step: 100990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:04:46,722-Speed 5962.66 samples/sec Loss 8.0840 LearningRate 0.1299 Epoch: 9 Global Step: 101000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:04:53,595-Speed 5960.95 samples/sec Loss 8.0908 LearningRate 0.1299 Epoch: 9 Global Step: 101010 Fp16 Grad Scale: 131072 Required: 21 hours 
Training: 2022-01-08 16:05:00,474-Speed 5957.13 samples/sec Loss 8.1207 LearningRate 0.1299 Epoch: 9 Global Step: 101020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:05:07,352-Speed 5956.11 samples/sec Loss 8.0525 LearningRate 0.1299 Epoch: 9 Global Step: 101030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:05:14,214-Speed 5970.34 samples/sec Loss 8.0484 LearningRate 0.1298 Epoch: 9 Global Step: 101040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:05:21,074-Speed 5971.98 samples/sec Loss 8.1113 LearningRate 0.1298 Epoch: 9 Global Step: 101050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:05:27,938-Speed 5968.51 samples/sec Loss 8.0541 LearningRate 0.1298 Epoch: 9 Global Step: 101060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:05:34,788-Speed 5981.15 samples/sec Loss 8.0352 LearningRate 0.1298 Epoch: 9 Global Step: 101070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:05:41,668-Speed 5954.37 samples/sec Loss 8.0635 LearningRate 0.1298 Epoch: 9 Global Step: 101080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:05:48,541-Speed 5959.97 samples/sec Loss 8.1014 LearningRate 0.1297 Epoch: 9 Global Step: 101090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:05:55,425-Speed 5955.75 samples/sec Loss 8.0628 LearningRate 0.1297 Epoch: 9 Global Step: 101100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:06:02,284-Speed 5972.55 samples/sec Loss 8.0448 LearningRate 0.1297 Epoch: 9 Global Step: 101110 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 16:06:09,177-Speed 5943.16 samples/sec Loss 8.0158 LearningRate 0.1297 Epoch: 9 Global Step: 101120 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 16:06:16,030-Speed 5978.60 samples/sec Loss 8.1284 LearningRate 0.1296 Epoch: 9 Global Step: 101130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 
16:06:22,892-Speed 5970.45 samples/sec Loss 7.9992 LearningRate 0.1296 Epoch: 9 Global Step: 101140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:06:29,754-Speed 5970.16 samples/sec Loss 7.9808 LearningRate 0.1296 Epoch: 9 Global Step: 101150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:06:36,622-Speed 5965.19 samples/sec Loss 8.0437 LearningRate 0.1296 Epoch: 9 Global Step: 101160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:06:43,494-Speed 5961.84 samples/sec Loss 8.0324 LearningRate 0.1295 Epoch: 9 Global Step: 101170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:06:50,389-Speed 5940.88 samples/sec Loss 7.9935 LearningRate 0.1295 Epoch: 9 Global Step: 101180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:06:57,240-Speed 5980.43 samples/sec Loss 8.0222 LearningRate 0.1295 Epoch: 9 Global Step: 101190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:07:04,100-Speed 5972.03 samples/sec Loss 8.0030 LearningRate 0.1295 Epoch: 9 Global Step: 101200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:07:10,961-Speed 5971.34 samples/sec Loss 8.0489 LearningRate 0.1294 Epoch: 9 Global Step: 101210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:07:17,823-Speed 5970.23 samples/sec Loss 8.0476 LearningRate 0.1294 Epoch: 9 Global Step: 101220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:07:24,682-Speed 5973.35 samples/sec Loss 8.0478 LearningRate 0.1294 Epoch: 9 Global Step: 101230 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 16:07:31,573-Speed 5945.55 samples/sec Loss 7.9503 LearningRate 0.1294 Epoch: 9 Global Step: 101240 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 16:07:38,421-Speed 5982.55 samples/sec Loss 8.0173 LearningRate 0.1293 Epoch: 9 Global Step: 101250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:07:45,273-Speed 5978.82 
samples/sec Loss 8.0673 LearningRate 0.1293 Epoch: 9 Global Step: 101260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:07:52,135-Speed 5970.67 samples/sec Loss 8.0119 LearningRate 0.1293 Epoch: 9 Global Step: 101270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:07:59,015-Speed 5954.43 samples/sec Loss 8.0549 LearningRate 0.1293 Epoch: 9 Global Step: 101280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:08:05,907-Speed 5944.25 samples/sec Loss 8.0138 LearningRate 0.1292 Epoch: 9 Global Step: 101290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:08:12,774-Speed 5965.54 samples/sec Loss 8.0791 LearningRate 0.1292 Epoch: 9 Global Step: 101300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:08:19,642-Speed 5964.68 samples/sec Loss 8.0241 LearningRate 0.1292 Epoch: 9 Global Step: 101310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:08:26,493-Speed 5980.14 samples/sec Loss 8.0530 LearningRate 0.1292 Epoch: 9 Global Step: 101320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:08:33,421-Speed 5913.78 samples/sec Loss 8.0069 LearningRate 0.1291 Epoch: 9 Global Step: 101330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:08:40,335-Speed 5925.27 samples/sec Loss 8.0136 LearningRate 0.1291 Epoch: 9 Global Step: 101340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:08:47,188-Speed 5977.43 samples/sec Loss 8.0312 LearningRate 0.1291 Epoch: 9 Global Step: 101350 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 16:08:54,153-Speed 5882.22 samples/sec Loss 8.0502 LearningRate 0.1291 Epoch: 9 Global Step: 101360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:09:01,007-Speed 5977.70 samples/sec Loss 8.0218 LearningRate 0.1290 Epoch: 9 Global Step: 101370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:09:07,879-Speed 5960.95 samples/sec Loss 8.0726 
LearningRate 0.1290 Epoch: 9 Global Step: 101380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:09:14,729-Speed 5980.29 samples/sec Loss 7.9944 LearningRate 0.1290 Epoch: 9 Global Step: 101390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:09:21,602-Speed 5961.64 samples/sec Loss 8.0283 LearningRate 0.1290 Epoch: 9 Global Step: 101400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:09:28,471-Speed 5963.33 samples/sec Loss 8.0612 LearningRate 0.1289 Epoch: 9 Global Step: 101410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:09:35,336-Speed 5970.06 samples/sec Loss 8.0606 LearningRate 0.1289 Epoch: 9 Global Step: 101420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:09:42,197-Speed 5971.84 samples/sec Loss 8.0805 LearningRate 0.1289 Epoch: 9 Global Step: 101430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:09:49,066-Speed 5963.80 samples/sec Loss 8.0504 LearningRate 0.1289 Epoch: 9 Global Step: 101440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:09:55,934-Speed 5965.51 samples/sec Loss 8.0412 LearningRate 0.1288 Epoch: 9 Global Step: 101450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:02,789-Speed 5976.40 samples/sec Loss 8.0002 LearningRate 0.1288 Epoch: 9 Global Step: 101460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:09,663-Speed 5960.03 samples/sec Loss 8.0368 LearningRate 0.1288 Epoch: 9 Global Step: 101470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:16,518-Speed 5976.44 samples/sec Loss 8.0146 LearningRate 0.1288 Epoch: 9 Global Step: 101480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:23,388-Speed 5963.66 samples/sec Loss 7.9451 LearningRate 0.1288 Epoch: 9 Global Step: 101490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:30,276-Speed 5950.01 samples/sec Loss 8.0614 LearningRate 0.1287 Epoch: 9 Global 
Step: 101500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:37,131-Speed 5976.70 samples/sec Loss 8.0538 LearningRate 0.1287 Epoch: 9 Global Step: 101510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:44,012-Speed 5954.18 samples/sec Loss 8.0036 LearningRate 0.1287 Epoch: 9 Global Step: 101520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:50,875-Speed 5969.01 samples/sec Loss 8.0168 LearningRate 0.1287 Epoch: 9 Global Step: 101530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:10:57,736-Speed 5971.52 samples/sec Loss 8.0282 LearningRate 0.1286 Epoch: 9 Global Step: 101540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:11:04,627-Speed 5945.28 samples/sec Loss 8.0542 LearningRate 0.1286 Epoch: 9 Global Step: 101550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:11:11,515-Speed 5950.41 samples/sec Loss 7.9903 LearningRate 0.1286 Epoch: 9 Global Step: 101560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:11:18,390-Speed 5958.99 samples/sec Loss 7.9817 LearningRate 0.1286 Epoch: 9 Global Step: 101570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:11:25,262-Speed 5961.64 samples/sec Loss 8.0308 LearningRate 0.1285 Epoch: 9 Global Step: 101580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:11:32,132-Speed 5963.10 samples/sec Loss 8.0359 LearningRate 0.1285 Epoch: 9 Global Step: 101590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:11:38,990-Speed 5974.30 samples/sec Loss 8.0297 LearningRate 0.1285 Epoch: 9 Global Step: 101600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:11:45,839-Speed 5982.05 samples/sec Loss 8.0865 LearningRate 0.1285 Epoch: 9 Global Step: 101610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:11:52,686-Speed 5982.37 samples/sec Loss 7.9830 LearningRate 0.1284 Epoch: 9 Global Step: 101620 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-01-08 16:11:59,564-Speed 5957.56 samples/sec Loss 8.0699 LearningRate 0.1284 Epoch: 9 Global Step: 101630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:12:06,482-Speed 5921.84 samples/sec Loss 7.9886 LearningRate 0.1284 Epoch: 9 Global Step: 101640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:12:13,336-Speed 5977.37 samples/sec Loss 8.0489 LearningRate 0.1284 Epoch: 9 Global Step: 101650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:12:20,199-Speed 5969.52 samples/sec Loss 7.9798 LearningRate 0.1283 Epoch: 9 Global Step: 101660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:12:27,059-Speed 5971.65 samples/sec Loss 7.9687 LearningRate 0.1283 Epoch: 9 Global Step: 101670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:12:33,914-Speed 5976.57 samples/sec Loss 7.9670 LearningRate 0.1283 Epoch: 9 Global Step: 101680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:12:40,775-Speed 5971.13 samples/sec Loss 8.0576 LearningRate 0.1283 Epoch: 9 Global Step: 101690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:12:47,629-Speed 5977.68 samples/sec Loss 8.0329 LearningRate 0.1282 Epoch: 9 Global Step: 101700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:12:54,482-Speed 5977.98 samples/sec Loss 8.0034 LearningRate 0.1282 Epoch: 9 Global Step: 101710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:13:01,344-Speed 5970.11 samples/sec Loss 8.0031 LearningRate 0.1282 Epoch: 9 Global Step: 101720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:13:08,201-Speed 5974.83 samples/sec Loss 7.9504 LearningRate 0.1282 Epoch: 9 Global Step: 101730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:13:15,053-Speed 5978.91 samples/sec Loss 8.0967 LearningRate 0.1281 Epoch: 9 Global Step: 101740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 
2022-01-08 16:13:21,902-Speed 5981.06 samples/sec Loss 8.0610 LearningRate 0.1281 Epoch: 9 Global Step: 101750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:13:28,756-Speed 5977.02 samples/sec Loss 8.0020 LearningRate 0.1281 Epoch: 9 Global Step: 101760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:13:35,629-Speed 5960.78 samples/sec Loss 7.9898 LearningRate 0.1281 Epoch: 9 Global Step: 101770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:13:42,475-Speed 5984.36 samples/sec Loss 7.9795 LearningRate 0.1280 Epoch: 9 Global Step: 101780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:13:49,360-Speed 5949.80 samples/sec Loss 8.0074 LearningRate 0.1280 Epoch: 9 Global Step: 101790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:13:56,236-Speed 5958.43 samples/sec Loss 7.9902 LearningRate 0.1280 Epoch: 9 Global Step: 101800 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 16:14:03,107-Speed 5962.79 samples/sec Loss 7.9761 LearningRate 0.1280 Epoch: 9 Global Step: 101810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:14:09,961-Speed 5977.00 samples/sec Loss 7.9427 LearningRate 0.1279 Epoch: 9 Global Step: 101820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:14:16,816-Speed 5976.71 samples/sec Loss 7.9655 LearningRate 0.1279 Epoch: 9 Global Step: 101830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:14:23,689-Speed 5960.99 samples/sec Loss 8.0027 LearningRate 0.1279 Epoch: 9 Global Step: 101840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:14:30,556-Speed 5965.42 samples/sec Loss 8.0062 LearningRate 0.1279 Epoch: 9 Global Step: 101850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:14:37,419-Speed 5970.07 samples/sec Loss 7.9645 LearningRate 0.1279 Epoch: 9 Global Step: 101860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:14:44,287-Speed 
5965.53 samples/sec Loss 7.9278 LearningRate 0.1278 Epoch: 9 Global Step: 101870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:14:51,217-Speed 5911.68 samples/sec Loss 7.9436 LearningRate 0.1278 Epoch: 9 Global Step: 101880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:14:58,121-Speed 5933.51 samples/sec Loss 8.0365 LearningRate 0.1278 Epoch: 9 Global Step: 101890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:15:05,002-Speed 5954.49 samples/sec Loss 8.0468 LearningRate 0.1278 Epoch: 9 Global Step: 101900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:15:11,846-Speed 5986.39 samples/sec Loss 8.0181 LearningRate 0.1277 Epoch: 9 Global Step: 101910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:15:18,718-Speed 5961.59 samples/sec Loss 7.9921 LearningRate 0.1277 Epoch: 9 Global Step: 101920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:15:25,555-Speed 5991.87 samples/sec Loss 7.9819 LearningRate 0.1277 Epoch: 9 Global Step: 101930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:15:32,439-Speed 5951.40 samples/sec Loss 7.9853 LearningRate 0.1277 Epoch: 9 Global Step: 101940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:15:39,292-Speed 5978.88 samples/sec Loss 7.9915 LearningRate 0.1276 Epoch: 9 Global Step: 101950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:15:46,170-Speed 5956.23 samples/sec Loss 8.0151 LearningRate 0.1276 Epoch: 9 Global Step: 101960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:15:53,030-Speed 5972.92 samples/sec Loss 8.0120 LearningRate 0.1276 Epoch: 9 Global Step: 101970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:15:59,886-Speed 5974.37 samples/sec Loss 8.0254 LearningRate 0.1276 Epoch: 9 Global Step: 101980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:16:06,765-Speed 5955.75 samples/sec Loss 8.0053 
LearningRate 0.1275 Epoch: 9 Global Step: 101990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:16:13,667-Speed 5935.60 samples/sec Loss 7.9798 LearningRate 0.1275 Epoch: 9 Global Step: 102000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:16:20,535-Speed 5965.81 samples/sec Loss 7.9608 LearningRate 0.1275 Epoch: 9 Global Step: 102010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:16:27,390-Speed 5976.29 samples/sec Loss 7.9428 LearningRate 0.1275 Epoch: 9 Global Step: 102020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:16:34,263-Speed 5960.67 samples/sec Loss 7.9537 LearningRate 0.1274 Epoch: 9 Global Step: 102030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:16:41,123-Speed 5972.62 samples/sec Loss 7.9705 LearningRate 0.1274 Epoch: 9 Global Step: 102040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:16:47,972-Speed 5981.86 samples/sec Loss 8.0115 LearningRate 0.1274 Epoch: 9 Global Step: 102050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:16:54,842-Speed 5963.18 samples/sec Loss 7.9295 LearningRate 0.1274 Epoch: 9 Global Step: 102060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:17:01,692-Speed 5981.12 samples/sec Loss 7.9564 LearningRate 0.1273 Epoch: 9 Global Step: 102070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:17:08,576-Speed 5951.61 samples/sec Loss 7.9466 LearningRate 0.1273 Epoch: 9 Global Step: 102080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:17:15,466-Speed 5945.88 samples/sec Loss 7.9737 LearningRate 0.1273 Epoch: 9 Global Step: 102090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:17:22,323-Speed 5977.54 samples/sec Loss 7.9894 LearningRate 0.1273 Epoch: 9 Global Step: 102100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:17:29,192-Speed 5963.78 samples/sec Loss 7.9493 LearningRate 0.1272 Epoch: 9 Global 
Step: 102110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:17:36,051-Speed 5973.82 samples/sec Loss 7.9793 LearningRate 0.1272 Epoch: 9 Global Step: 102120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:17:42,922-Speed 5962.19 samples/sec Loss 7.9504 LearningRate 0.1272 Epoch: 9 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:17:49,773-Speed 5979.18 samples/sec Loss 7.9686 LearningRate 0.1272 Epoch: 9 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:17:56,633-Speed 5972.36 samples/sec Loss 7.9845 LearningRate 0.1272 Epoch: 9 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:18:03,492-Speed 5972.37 samples/sec Loss 7.9278 LearningRate 0.1271 Epoch: 9 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:18:10,347-Speed 5977.80 samples/sec Loss 7.9780 LearningRate 0.1271 Epoch: 9 Global Step: 102170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:18:17,218-Speed 5961.63 samples/sec Loss 7.9439 LearningRate 0.1271 Epoch: 9 Global Step: 102180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:18:24,188-Speed 5878.24 samples/sec Loss 7.9952 LearningRate 0.1271 Epoch: 9 Global Step: 102190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:18:31,080-Speed 5944.85 samples/sec Loss 7.9729 LearningRate 0.1270 Epoch: 9 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:18:37,946-Speed 5966.62 samples/sec Loss 7.9654 LearningRate 0.1270 Epoch: 9 Global Step: 102210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:18:44,802-Speed 5975.39 samples/sec Loss 7.9577 LearningRate 0.1270 Epoch: 9 Global Step: 102220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:18:51,648-Speed 5985.37 samples/sec Loss 7.9358 LearningRate 0.1270 Epoch: 9 Global Step: 102230 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-01-08 16:18:58,518-Speed 5966.12 samples/sec Loss 7.9421 LearningRate 0.1269 Epoch: 9 Global Step: 102240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:19:05,379-Speed 5970.69 samples/sec Loss 7.9919 LearningRate 0.1269 Epoch: 9 Global Step: 102250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:19:12,241-Speed 5970.38 samples/sec Loss 8.0067 LearningRate 0.1269 Epoch: 9 Global Step: 102260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:19:19,107-Speed 5967.09 samples/sec Loss 8.0104 LearningRate 0.1269 Epoch: 9 Global Step: 102270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:19:25,995-Speed 5948.07 samples/sec Loss 8.0049 LearningRate 0.1268 Epoch: 9 Global Step: 102280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:19:32,865-Speed 5962.83 samples/sec Loss 7.9465 LearningRate 0.1268 Epoch: 9 Global Step: 102290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:19:39,724-Speed 5973.64 samples/sec Loss 7.9633 LearningRate 0.1268 Epoch: 9 Global Step: 102300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:19:46,567-Speed 5986.41 samples/sec Loss 7.9705 LearningRate 0.1268 Epoch: 9 Global Step: 102310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:19:53,419-Speed 5978.92 samples/sec Loss 7.9047 LearningRate 0.1267 Epoch: 9 Global Step: 102320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:20:00,268-Speed 5982.36 samples/sec Loss 7.9304 LearningRate 0.1267 Epoch: 9 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:20:07,162-Speed 5941.53 samples/sec Loss 7.9196 LearningRate 0.1267 Epoch: 9 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:20:14,010-Speed 5983.33 samples/sec Loss 7.9660 LearningRate 0.1267 Epoch: 9 Global Step: 102350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-01-08 16:20:20,896-Speed 5949.15 samples/sec Loss 7.9289 LearningRate 0.1266 Epoch: 9 Global Step: 102360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:20:27,747-Speed 5980.55 samples/sec Loss 7.9795 LearningRate 0.1266 Epoch: 9 Global Step: 102370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:20:34,606-Speed 5973.30 samples/sec Loss 7.9271 LearningRate 0.1266 Epoch: 9 Global Step: 102380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:20:41,776-Speed 5713.44 samples/sec Loss 7.9256 LearningRate 0.1266 Epoch: 9 Global Step: 102390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:20:48,627-Speed 5980.18 samples/sec Loss 7.9357 LearningRate 0.1265 Epoch: 9 Global Step: 102400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:20:55,478-Speed 5980.23 samples/sec Loss 7.9882 LearningRate 0.1265 Epoch: 9 Global Step: 102410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:02,339-Speed 5971.48 samples/sec Loss 7.9109 LearningRate 0.1265 Epoch: 9 Global Step: 102420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:09,202-Speed 5968.65 samples/sec Loss 7.9593 LearningRate 0.1265 Epoch: 9 Global Step: 102430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:16,055-Speed 5978.46 samples/sec Loss 7.9064 LearningRate 0.1265 Epoch: 9 Global Step: 102440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:22,916-Speed 5970.71 samples/sec Loss 7.8977 LearningRate 0.1264 Epoch: 9 Global Step: 102450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:29,789-Speed 5961.43 samples/sec Loss 7.9519 LearningRate 0.1264 Epoch: 9 Global Step: 102460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:36,975-Speed 5701.22 samples/sec Loss 7.9887 LearningRate 0.1264 Epoch: 9 Global Step: 102470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:43,852-Speed 
5956.46 samples/sec Loss 8.0610 LearningRate 0.1264 Epoch: 9 Global Step: 102480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:50,703-Speed 5980.14 samples/sec Loss 7.9254 LearningRate 0.1263 Epoch: 9 Global Step: 102490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:21:57,555-Speed 5979.70 samples/sec Loss 7.9422 LearningRate 0.1263 Epoch: 9 Global Step: 102500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:22:04,409-Speed 5977.32 samples/sec Loss 7.9379 LearningRate 0.1263 Epoch: 9 Global Step: 102510 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 16:22:11,261-Speed 5978.80 samples/sec Loss 7.9046 LearningRate 0.1263 Epoch: 9 Global Step: 102520 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-08 16:22:18,117-Speed 5976.04 samples/sec Loss 7.9515 LearningRate 0.1262 Epoch: 9 Global Step: 102530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:22:24,979-Speed 5971.23 samples/sec Loss 7.9411 LearningRate 0.1262 Epoch: 9 Global Step: 102540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:22:31,858-Speed 5955.75 samples/sec Loss 8.0014 LearningRate 0.1262 Epoch: 9 Global Step: 102550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:22:38,746-Speed 5947.19 samples/sec Loss 7.9756 LearningRate 0.1262 Epoch: 9 Global Step: 102560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:22:45,649-Speed 5937.38 samples/sec Loss 7.9930 LearningRate 0.1261 Epoch: 9 Global Step: 102570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:22:52,572-Speed 5917.61 samples/sec Loss 7.9413 LearningRate 0.1261 Epoch: 9 Global Step: 102580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:22:59,500-Speed 5914.68 samples/sec Loss 7.9789 LearningRate 0.1261 Epoch: 9 Global Step: 102590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:23:06,352-Speed 5978.82 samples/sec Loss 7.9199 
LearningRate 0.1261 Epoch: 9 Global Step: 102600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:23:13,211-Speed 5972.87 samples/sec Loss 7.9693 LearningRate 0.1260 Epoch: 9 Global Step: 102610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:23:20,076-Speed 5967.03 samples/sec Loss 7.9211 LearningRate 0.1260 Epoch: 9 Global Step: 102620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:23:26,910-Speed 5994.75 samples/sec Loss 7.9579 LearningRate 0.1260 Epoch: 9 Global Step: 102630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:23:33,778-Speed 5965.33 samples/sec Loss 7.9349 LearningRate 0.1260 Epoch: 9 Global Step: 102640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:23:40,642-Speed 5968.42 samples/sec Loss 7.9461 LearningRate 0.1259 Epoch: 9 Global Step: 102650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:23:47,515-Speed 5961.07 samples/sec Loss 7.9405 LearningRate 0.1259 Epoch: 9 Global Step: 102660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:23:54,386-Speed 5962.84 samples/sec Loss 7.9511 LearningRate 0.1259 Epoch: 9 Global Step: 102670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:24:01,238-Speed 5978.62 samples/sec Loss 7.9687 LearningRate 0.1259 Epoch: 9 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:24:08,086-Speed 5982.64 samples/sec Loss 7.9720 LearningRate 0.1258 Epoch: 9 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:24:14,947-Speed 5970.93 samples/sec Loss 7.9547 LearningRate 0.1258 Epoch: 9 Global Step: 102700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:24:21,801-Speed 5978.16 samples/sec Loss 7.8661 LearningRate 0.1258 Epoch: 9 Global Step: 102710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:24:28,672-Speed 5961.99 samples/sec Loss 7.9357 LearningRate 0.1258 Epoch: 9 Global 
Step: 102720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:24:35,528-Speed 5974.88 samples/sec Loss 7.8850 LearningRate 0.1258 Epoch: 9 Global Step: 102730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:24:42,387-Speed 5972.86 samples/sec Loss 7.9032 LearningRate 0.1257 Epoch: 9 Global Step: 102740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 16:24:49,238-Speed 5979.82 samples/sec Loss 7.9451 LearningRate 0.1257 Epoch: 9 Global Step: 102750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:24:56,119-Speed 5955.49 samples/sec Loss 7.9756 LearningRate 0.1257 Epoch: 9 Global Step: 102760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:25:02,980-Speed 5971.54 samples/sec Loss 7.9550 LearningRate 0.1257 Epoch: 9 Global Step: 102770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:25:09,846-Speed 5966.72 samples/sec Loss 7.9479 LearningRate 0.1256 Epoch: 9 Global Step: 102780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:25:16,703-Speed 5975.45 samples/sec Loss 7.9308 LearningRate 0.1256 Epoch: 9 Global Step: 102790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:25:23,564-Speed 5971.83 samples/sec Loss 7.9151 LearningRate 0.1256 Epoch: 9 Global Step: 102800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:25:30,424-Speed 5971.36 samples/sec Loss 7.9087 LearningRate 0.1256 Epoch: 9 Global Step: 102810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 16:25:37,294-Speed 5963.49 samples/sec Loss 7.9812 LearningRate 0.1255 Epoch: 9 Global Step: 102820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:25:44,143-Speed 5981.77 samples/sec Loss 7.9403 LearningRate 0.1255 Epoch: 9 Global Step: 102830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:25:51,002-Speed 5972.41 samples/sec Loss 7.9065 LearningRate 0.1255 Epoch: 9 Global Step: 102840 Fp16 Grad Scale: 
131072 Required: 20 hours Training: 2022-01-08 16:25:57,848-Speed 5985.65 samples/sec Loss 7.9173 LearningRate 0.1255 Epoch: 9 Global Step: 102850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:04,710-Speed 5969.94 samples/sec Loss 7.9679 LearningRate 0.1254 Epoch: 9 Global Step: 102860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:11,605-Speed 5942.19 samples/sec Loss 7.8884 LearningRate 0.1254 Epoch: 9 Global Step: 102870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:18,453-Speed 5982.51 samples/sec Loss 7.9972 LearningRate 0.1254 Epoch: 9 Global Step: 102880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:25,312-Speed 5973.06 samples/sec Loss 7.9612 LearningRate 0.1254 Epoch: 9 Global Step: 102890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:32,186-Speed 5959.19 samples/sec Loss 7.9098 LearningRate 0.1253 Epoch: 9 Global Step: 102900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:39,037-Speed 5980.85 samples/sec Loss 7.9269 LearningRate 0.1253 Epoch: 9 Global Step: 102910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:45,892-Speed 5976.20 samples/sec Loss 7.8665 LearningRate 0.1253 Epoch: 9 Global Step: 102920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:52,757-Speed 5967.42 samples/sec Loss 7.9324 LearningRate 0.1253 Epoch: 9 Global Step: 102930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:26:59,608-Speed 5980.43 samples/sec Loss 7.9153 LearningRate 0.1252 Epoch: 9 Global Step: 102940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:27:06,531-Speed 5918.31 samples/sec Loss 7.9029 LearningRate 0.1252 Epoch: 9 Global Step: 102950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:27:13,404-Speed 5962.02 samples/sec Loss 7.8932 LearningRate 0.1252 Epoch: 9 Global Step: 102960 Fp16 Grad Scale: 131072 Required: 20 hours 
Training: 2022-01-08 16:27:20,280-Speed 5957.75 samples/sec Loss 7.9377 LearningRate 0.1252 Epoch: 9 Global Step: 102970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:27:27,149-Speed 5964.76 samples/sec Loss 7.9184 LearningRate 0.1252 Epoch: 9 Global Step: 102980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:27:34,000-Speed 5979.53 samples/sec Loss 7.9212 LearningRate 0.1251 Epoch: 9 Global Step: 102990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:27:41,093-Speed 5778.19 samples/sec Loss 7.9012 LearningRate 0.1251 Epoch: 9 Global Step: 103000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:27:47,962-Speed 5964.92 samples/sec Loss 7.9132 LearningRate 0.1251 Epoch: 9 Global Step: 103010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:27:54,851-Speed 5946.82 samples/sec Loss 7.9377 LearningRate 0.1251 Epoch: 9 Global Step: 103020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:28:01,708-Speed 5974.23 samples/sec Loss 7.9235 LearningRate 0.1250 Epoch: 9 Global Step: 103030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:28:08,551-Speed 5986.70 samples/sec Loss 7.9135 LearningRate 0.1250 Epoch: 9 Global Step: 103040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:28:15,430-Speed 5955.74 samples/sec Loss 7.9384 LearningRate 0.1250 Epoch: 9 Global Step: 103050 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 16:28:22,276-Speed 5985.30 samples/sec Loss 7.9124 LearningRate 0.1250 Epoch: 9 Global Step: 103060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:28:29,137-Speed 5970.39 samples/sec Loss 7.8983 LearningRate 0.1249 Epoch: 9 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:28:35,980-Speed 5987.18 samples/sec Loss 7.8704 LearningRate 0.1249 Epoch: 9 Global Step: 103080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 
16:28:42,838-Speed 5974.45 samples/sec Loss 7.8948 LearningRate 0.1249 Epoch: 9 Global Step: 103090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:28:49,697-Speed 5972.74 samples/sec Loss 7.9396 LearningRate 0.1249 Epoch: 9 Global Step: 103100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:28:56,554-Speed 5974.23 samples/sec Loss 7.9275 LearningRate 0.1248 Epoch: 9 Global Step: 103110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:29:03,416-Speed 5973.28 samples/sec Loss 7.9568 LearningRate 0.1248 Epoch: 9 Global Step: 103120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:29:10,301-Speed 5950.50 samples/sec Loss 7.9413 LearningRate 0.1248 Epoch: 9 Global Step: 103130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:29:17,147-Speed 5984.86 samples/sec Loss 7.9246 LearningRate 0.1248 Epoch: 9 Global Step: 103140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:29:24,010-Speed 5970.16 samples/sec Loss 7.8499 LearningRate 0.1247 Epoch: 9 Global Step: 103150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:29:30,874-Speed 5967.75 samples/sec Loss 7.8714 LearningRate 0.1247 Epoch: 9 Global Step: 103160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:29:37,725-Speed 5980.60 samples/sec Loss 7.8808 LearningRate 0.1247 Epoch: 9 Global Step: 103170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:29:44,583-Speed 5973.09 samples/sec Loss 7.9550 LearningRate 0.1247 Epoch: 9 Global Step: 103180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:29:51,439-Speed 5976.09 samples/sec Loss 7.8711 LearningRate 0.1247 Epoch: 9 Global Step: 103190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:29:58,347-Speed 5930.71 samples/sec Loss 7.9431 LearningRate 0.1246 Epoch: 9 Global Step: 103200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:30:05,211-Speed 5967.98 samples/sec 
Loss 7.8862 LearningRate 0.1246 Epoch: 9 Global Step: 103210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:30:12,062-Speed 5979.58 samples/sec Loss 7.9432 LearningRate 0.1246 Epoch: 9 Global Step: 103220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:30:18,918-Speed 5978.66 samples/sec Loss 7.8650 LearningRate 0.1246 Epoch: 9 Global Step: 103230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:30:25,795-Speed 5958.48 samples/sec Loss 7.8894 LearningRate 0.1245 Epoch: 9 Global Step: 103240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:30:32,660-Speed 5967.55 samples/sec Loss 7.9577 LearningRate 0.1245 Epoch: 9 Global Step: 103250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:30:39,572-Speed 5928.02 samples/sec Loss 7.9007 LearningRate 0.1245 Epoch: 9 Global Step: 103260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:30:46,434-Speed 5969.74 samples/sec Loss 7.8958 LearningRate 0.1245 Epoch: 9 Global Step: 103270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:30:53,297-Speed 5969.69 samples/sec Loss 7.9321 LearningRate 0.1244 Epoch: 9 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:31:00,160-Speed 5968.22 samples/sec Loss 7.9153 LearningRate 0.1244 Epoch: 9 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:31:07,038-Speed 5957.01 samples/sec Loss 7.8388 LearningRate 0.1244 Epoch: 9 Global Step: 103300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:31:13,882-Speed 5985.88 samples/sec Loss 7.9057 LearningRate 0.1244 Epoch: 9 Global Step: 103310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:31:20,753-Speed 5964.60 samples/sec Loss 7.8221 LearningRate 0.1243 Epoch: 9 Global Step: 103320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:31:27,607-Speed 5980.69 samples/sec Loss 7.8558 LearningRate 0.1243 Epoch: 9 
Global Step: 103330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:31:34,456-Speed 5981.23 samples/sec Loss 7.8145 LearningRate 0.1243 Epoch: 9 Global Step: 103340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:31:41,314-Speed 5974.06 samples/sec Loss 7.8983 LearningRate 0.1243 Epoch: 9 Global Step: 103350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:31:48,165-Speed 5980.23 samples/sec Loss 7.8659 LearningRate 0.1242 Epoch: 9 Global Step: 103360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:31:55,023-Speed 5973.67 samples/sec Loss 7.8627 LearningRate 0.1242 Epoch: 9 Global Step: 103370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:32:01,875-Speed 5978.18 samples/sec Loss 7.8381 LearningRate 0.1242 Epoch: 9 Global Step: 103380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:32:08,760-Speed 5952.07 samples/sec Loss 7.8813 LearningRate 0.1242 Epoch: 9 Global Step: 103390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:32:15,599-Speed 5989.87 samples/sec Loss 7.9104 LearningRate 0.1241 Epoch: 9 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:32:22,444-Speed 5985.33 samples/sec Loss 7.8553 LearningRate 0.1241 Epoch: 9 Global Step: 103410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:32:29,294-Speed 5980.28 samples/sec Loss 7.9206 LearningRate 0.1241 Epoch: 9 Global Step: 103420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:32:36,141-Speed 5983.26 samples/sec Loss 7.8658 LearningRate 0.1241 Epoch: 9 Global Step: 103430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:32:43,003-Speed 5970.42 samples/sec Loss 7.8572 LearningRate 0.1241 Epoch: 9 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:32:49,885-Speed 5955.57 samples/sec Loss 7.9407 LearningRate 0.1240 Epoch: 9 Global Step: 103450 Fp16 Grad Scale: 
65536 Required: 20 hours Training: 2022-01-08 16:32:56,739-Speed 5977.28 samples/sec Loss 7.9246 LearningRate 0.1240 Epoch: 9 Global Step: 103460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:33:03,603-Speed 5968.31 samples/sec Loss 7.8993 LearningRate 0.1240 Epoch: 9 Global Step: 103470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:33:10,452-Speed 5983.63 samples/sec Loss 7.8637 LearningRate 0.1240 Epoch: 9 Global Step: 103480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:33:17,309-Speed 5974.83 samples/sec Loss 7.8972 LearningRate 0.1239 Epoch: 9 Global Step: 103490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:33:24,585-Speed 5631.40 samples/sec Loss 7.8375 LearningRate 0.1239 Epoch: 9 Global Step: 103500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:33:31,485-Speed 5936.43 samples/sec Loss 7.8265 LearningRate 0.1239 Epoch: 9 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:33:38,331-Speed 5987.03 samples/sec Loss 7.8909 LearningRate 0.1239 Epoch: 9 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:33:45,192-Speed 5970.86 samples/sec Loss 7.8133 LearningRate 0.1238 Epoch: 9 Global Step: 103530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:33:52,065-Speed 5963.21 samples/sec Loss 7.9389 LearningRate 0.1238 Epoch: 9 Global Step: 103540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:33:58,942-Speed 5957.06 samples/sec Loss 7.9132 LearningRate 0.1238 Epoch: 9 Global Step: 103550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:34:05,793-Speed 5979.84 samples/sec Loss 7.8999 LearningRate 0.1238 Epoch: 9 Global Step: 103560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:34:12,637-Speed 5987.43 samples/sec Loss 7.8665 LearningRate 0.1237 Epoch: 9 Global Step: 103570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-01-08 16:34:19,487-Speed 5981.76 samples/sec Loss 7.9298 LearningRate 0.1237 Epoch: 9 Global Step: 103580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:34:26,350-Speed 5969.51 samples/sec Loss 7.8719 LearningRate 0.1237 Epoch: 9 Global Step: 103590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:34:33,208-Speed 5973.48 samples/sec Loss 7.9314 LearningRate 0.1237 Epoch: 9 Global Step: 103600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:34:40,083-Speed 5959.46 samples/sec Loss 7.8619 LearningRate 0.1236 Epoch: 9 Global Step: 103610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:34:46,936-Speed 5977.59 samples/sec Loss 7.8347 LearningRate 0.1236 Epoch: 9 Global Step: 103620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:34:53,782-Speed 5985.03 samples/sec Loss 7.9591 LearningRate 0.1236 Epoch: 9 Global Step: 103630 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 16:35:00,641-Speed 5972.18 samples/sec Loss 7.9082 LearningRate 0.1236 Epoch: 9 Global Step: 103640 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 16:35:07,502-Speed 5971.43 samples/sec Loss 7.8994 LearningRate 0.1236 Epoch: 9 Global Step: 103650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:35:14,351-Speed 5981.18 samples/sec Loss 7.9169 LearningRate 0.1235 Epoch: 9 Global Step: 103660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:35:21,203-Speed 5978.94 samples/sec Loss 7.8780 LearningRate 0.1235 Epoch: 9 Global Step: 103670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:35:28,057-Speed 5976.85 samples/sec Loss 7.8454 LearningRate 0.1235 Epoch: 9 Global Step: 103680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:35:34,903-Speed 5986.11 samples/sec Loss 7.9296 LearningRate 0.1235 Epoch: 9 Global Step: 103690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:00,018-Speed 
1630.93 samples/sec Loss 7.8552 LearningRate 0.1234 Epoch: 10 Global Step: 103700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:06,869-Speed 5980.26 samples/sec Loss 7.8226 LearningRate 0.1234 Epoch: 10 Global Step: 103710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:13,718-Speed 5982.54 samples/sec Loss 7.9298 LearningRate 0.1234 Epoch: 10 Global Step: 103720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:20,545-Speed 6000.31 samples/sec Loss 7.8478 LearningRate 0.1234 Epoch: 10 Global Step: 103730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:27,377-Speed 5996.92 samples/sec Loss 7.9069 LearningRate 0.1233 Epoch: 10 Global Step: 103740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:34,211-Speed 5996.24 samples/sec Loss 7.8546 LearningRate 0.1233 Epoch: 10 Global Step: 103750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:41,059-Speed 5982.45 samples/sec Loss 7.9140 LearningRate 0.1233 Epoch: 10 Global Step: 103760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:47,913-Speed 5979.46 samples/sec Loss 7.8489 LearningRate 0.1233 Epoch: 10 Global Step: 103770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:36:54,763-Speed 5982.04 samples/sec Loss 7.7977 LearningRate 0.1232 Epoch: 10 Global Step: 103780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:37:01,623-Speed 5971.75 samples/sec Loss 7.8381 LearningRate 0.1232 Epoch: 10 Global Step: 103790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:37:08,498-Speed 5960.00 samples/sec Loss 7.8763 LearningRate 0.1232 Epoch: 10 Global Step: 103800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:37:15,355-Speed 5975.73 samples/sec Loss 7.8645 LearningRate 0.1232 Epoch: 10 Global Step: 103810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:37:22,223-Speed 5964.56 samples/sec 
Loss 7.8491 LearningRate 0.1231 Epoch: 10 Global Step: 103820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:37:29,103-Speed 5955.21 samples/sec Loss 7.8099 LearningRate 0.1231 Epoch: 10 Global Step: 103830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:37:35,975-Speed 5961.87 samples/sec Loss 7.8310 LearningRate 0.1231 Epoch: 10 Global Step: 103840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:37:42,830-Speed 5975.99 samples/sec Loss 7.9224 LearningRate 0.1231 Epoch: 10 Global Step: 103850 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 16:37:49,712-Speed 5953.64 samples/sec Loss 7.7877 LearningRate 0.1231 Epoch: 10 Global Step: 103860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:37:56,585-Speed 5960.52 samples/sec Loss 7.8722 LearningRate 0.1230 Epoch: 10 Global Step: 103870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:03,453-Speed 5965.16 samples/sec Loss 7.8317 LearningRate 0.1230 Epoch: 10 Global Step: 103880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:10,334-Speed 5954.43 samples/sec Loss 7.8749 LearningRate 0.1230 Epoch: 10 Global Step: 103890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:17,227-Speed 5943.44 samples/sec Loss 7.8126 LearningRate 0.1230 Epoch: 10 Global Step: 103900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:24,111-Speed 5951.52 samples/sec Loss 7.8090 LearningRate 0.1229 Epoch: 10 Global Step: 103910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:30,976-Speed 5967.44 samples/sec Loss 7.8723 LearningRate 0.1229 Epoch: 10 Global Step: 103920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:37,853-Speed 5958.59 samples/sec Loss 7.7899 LearningRate 0.1229 Epoch: 10 Global Step: 103930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:44,739-Speed 5949.53 samples/sec Loss 7.8019 
LearningRate 0.1229 Epoch: 10 Global Step: 103940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:51,608-Speed 5966.37 samples/sec Loss 7.8786 LearningRate 0.1228 Epoch: 10 Global Step: 103950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:38:58,464-Speed 5978.81 samples/sec Loss 7.8657 LearningRate 0.1228 Epoch: 10 Global Step: 103960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:39:05,330-Speed 5966.69 samples/sec Loss 7.8031 LearningRate 0.1228 Epoch: 10 Global Step: 103970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:39:12,471-Speed 5744.18 samples/sec Loss 7.8580 LearningRate 0.1228 Epoch: 10 Global Step: 103980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:39:19,377-Speed 5932.93 samples/sec Loss 7.8545 LearningRate 0.1227 Epoch: 10 Global Step: 103990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:39:26,382-Speed 5848.22 samples/sec Loss 7.8184 LearningRate 0.1227 Epoch: 10 Global Step: 104000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:39:33,335-Speed 5892.44 samples/sec Loss 7.8395 LearningRate 0.1227 Epoch: 10 Global Step: 104010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:39:40,214-Speed 5956.32 samples/sec Loss 7.9562 LearningRate 0.1227 Epoch: 10 Global Step: 104020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:39:47,081-Speed 5965.39 samples/sec Loss 7.8951 LearningRate 0.1226 Epoch: 10 Global Step: 104030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:39:53,936-Speed 5976.02 samples/sec Loss 7.8949 LearningRate 0.1226 Epoch: 10 Global Step: 104040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:40:00,806-Speed 5965.35 samples/sec Loss 7.8579 LearningRate 0.1226 Epoch: 10 Global Step: 104050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:40:07,662-Speed 5974.42 samples/sec Loss 7.8889 LearningRate 0.1226 Epoch: 
10 Global Step: 104060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:40:14,534-Speed 5961.78 samples/sec Loss 7.8290 LearningRate 0.1226 Epoch: 10 Global Step: 104070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:40:21,391-Speed 5974.69 samples/sec Loss 7.8852 LearningRate 0.1225 Epoch: 10 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:40:28,260-Speed 5963.90 samples/sec Loss 7.8842 LearningRate 0.1225 Epoch: 10 Global Step: 104090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:40:35,127-Speed 5967.40 samples/sec Loss 7.8853 LearningRate 0.1225 Epoch: 10 Global Step: 104100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:40:41,992-Speed 5967.04 samples/sec Loss 7.7875 LearningRate 0.1225 Epoch: 10 Global Step: 104110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:40:48,859-Speed 5965.60 samples/sec Loss 7.9328 LearningRate 0.1224 Epoch: 10 Global Step: 104120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:40:55,715-Speed 5975.83 samples/sec Loss 7.8537 LearningRate 0.1224 Epoch: 10 Global Step: 104130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:41:02,579-Speed 5969.17 samples/sec Loss 7.7897 LearningRate 0.1224 Epoch: 10 Global Step: 104140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:41:09,434-Speed 5976.22 samples/sec Loss 7.8531 LearningRate 0.1224 Epoch: 10 Global Step: 104150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:41:16,305-Speed 5962.37 samples/sec Loss 7.8285 LearningRate 0.1223 Epoch: 10 Global Step: 104160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:41:23,161-Speed 5976.32 samples/sec Loss 7.8777 LearningRate 0.1223 Epoch: 10 Global Step: 104170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:41:30,006-Speed 5984.61 samples/sec Loss 7.8771 LearningRate 0.1223 Epoch: 10 Global Step: 104180 
Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:41:36,878-Speed 5961.60 samples/sec Loss 7.7752 LearningRate 0.1223 Epoch: 10 Global Step: 104190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:41:43,738-Speed 5972.50 samples/sec Loss 7.8373 LearningRate 0.1222 Epoch: 10 Global Step: 104200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:41:50,601-Speed 5969.06 samples/sec Loss 7.8369 LearningRate 0.1222 Epoch: 10 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:41:57,494-Speed 5943.45 samples/sec Loss 7.7981 LearningRate 0.1222 Epoch: 10 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:42:04,350-Speed 5976.03 samples/sec Loss 7.9214 LearningRate 0.1222 Epoch: 10 Global Step: 104230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:42:11,219-Speed 5964.79 samples/sec Loss 7.8069 LearningRate 0.1222 Epoch: 10 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:42:18,099-Speed 5956.02 samples/sec Loss 7.8003 LearningRate 0.1221 Epoch: 10 Global Step: 104250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:42:24,948-Speed 5982.22 samples/sec Loss 7.8577 LearningRate 0.1221 Epoch: 10 Global Step: 104260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:42:31,811-Speed 5969.77 samples/sec Loss 7.8130 LearningRate 0.1221 Epoch: 10 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:42:38,663-Speed 5979.12 samples/sec Loss 7.8050 LearningRate 0.1221 Epoch: 10 Global Step: 104280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:42:45,513-Speed 5981.93 samples/sec Loss 7.8682 LearningRate 0.1220 Epoch: 10 Global Step: 104290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:42:52,643-Speed 5746.11 samples/sec Loss 7.7937 LearningRate 0.1220 Epoch: 10 Global Step: 104300 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-01-08 16:42:59,503-Speed 5971.97 samples/sec Loss 7.8434 LearningRate 0.1220 Epoch: 10 Global Step: 104310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:43:06,350-Speed 5983.33 samples/sec Loss 7.7789 LearningRate 0.1220 Epoch: 10 Global Step: 104320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:43:13,196-Speed 5983.66 samples/sec Loss 7.8892 LearningRate 0.1219 Epoch: 10 Global Step: 104330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:43:20,040-Speed 5986.01 samples/sec Loss 7.8188 LearningRate 0.1219 Epoch: 10 Global Step: 104340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:43:26,900-Speed 5972.12 samples/sec Loss 7.8439 LearningRate 0.1219 Epoch: 10 Global Step: 104350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:43:33,760-Speed 5971.41 samples/sec Loss 7.8626 LearningRate 0.1219 Epoch: 10 Global Step: 104360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:43:40,613-Speed 5978.00 samples/sec Loss 7.7153 LearningRate 0.1218 Epoch: 10 Global Step: 104370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:43:47,484-Speed 5963.74 samples/sec Loss 7.7848 LearningRate 0.1218 Epoch: 10 Global Step: 104380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:43:54,343-Speed 5972.89 samples/sec Loss 7.7993 LearningRate 0.1218 Epoch: 10 Global Step: 104390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:44:01,208-Speed 5967.88 samples/sec Loss 7.8163 LearningRate 0.1218 Epoch: 10 Global Step: 104400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:44:08,074-Speed 5967.38 samples/sec Loss 7.7668 LearningRate 0.1217 Epoch: 10 Global Step: 104410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:44:14,943-Speed 5963.81 samples/sec Loss 7.9083 LearningRate 0.1217 Epoch: 10 Global Step: 104420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-01-08 16:44:21,815-Speed 5962.23 samples/sec Loss 7.8113 LearningRate 0.1217 Epoch: 10 Global Step: 104430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:44:28,676-Speed 5970.68 samples/sec Loss 7.8712 LearningRate 0.1217 Epoch: 10 Global Step: 104440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:44:35,533-Speed 5975.18 samples/sec Loss 7.8008 LearningRate 0.1217 Epoch: 10 Global Step: 104450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:44:42,406-Speed 5961.87 samples/sec Loss 7.7943 LearningRate 0.1216 Epoch: 10 Global Step: 104460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:44:49,262-Speed 5975.08 samples/sec Loss 7.7864 LearningRate 0.1216 Epoch: 10 Global Step: 104470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:44:56,107-Speed 5985.01 samples/sec Loss 7.7578 LearningRate 0.1216 Epoch: 10 Global Step: 104480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:45:02,953-Speed 5984.49 samples/sec Loss 7.8724 LearningRate 0.1216 Epoch: 10 Global Step: 104490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:45:09,906-Speed 5891.95 samples/sec Loss 7.8203 LearningRate 0.1215 Epoch: 10 Global Step: 104500 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 16:45:16,748-Speed 5987.73 samples/sec Loss 7.7791 LearningRate 0.1215 Epoch: 10 Global Step: 104510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:45:23,597-Speed 5981.98 samples/sec Loss 7.8335 LearningRate 0.1215 Epoch: 10 Global Step: 104520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:45:30,470-Speed 5960.85 samples/sec Loss 7.7820 LearningRate 0.1215 Epoch: 10 Global Step: 104530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:45:37,329-Speed 5972.37 samples/sec Loss 7.7838 LearningRate 0.1214 Epoch: 10 Global Step: 104540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 
16:45:44,176-Speed 5984.17 samples/sec Loss 7.8623 LearningRate 0.1214 Epoch: 10 Global Step: 104550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:45:51,021-Speed 5985.24 samples/sec Loss 7.7921 LearningRate 0.1214 Epoch: 10 Global Step: 104560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:45:57,866-Speed 5985.56 samples/sec Loss 7.8519 LearningRate 0.1214 Epoch: 10 Global Step: 104570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:46:04,708-Speed 5987.55 samples/sec Loss 7.8423 LearningRate 0.1213 Epoch: 10 Global Step: 104580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:46:11,562-Speed 5978.05 samples/sec Loss 7.8276 LearningRate 0.1213 Epoch: 10 Global Step: 104590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:46:18,440-Speed 5956.23 samples/sec Loss 7.8537 LearningRate 0.1213 Epoch: 10 Global Step: 104600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:46:25,301-Speed 5970.82 samples/sec Loss 7.8911 LearningRate 0.1213 Epoch: 10 Global Step: 104610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:46:32,173-Speed 5961.91 samples/sec Loss 7.8001 LearningRate 0.1213 Epoch: 10 Global Step: 104620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:46:39,023-Speed 5979.81 samples/sec Loss 7.7886 LearningRate 0.1212 Epoch: 10 Global Step: 104630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:46:45,885-Speed 5971.96 samples/sec Loss 7.7248 LearningRate 0.1212 Epoch: 10 Global Step: 104640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:46:52,748-Speed 5968.77 samples/sec Loss 7.7991 LearningRate 0.1212 Epoch: 10 Global Step: 104650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:46:59,600-Speed 5978.28 samples/sec Loss 7.7700 LearningRate 0.1212 Epoch: 10 Global Step: 104660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:47:06,450-Speed 5983.21 samples/sec 
Loss 7.7955 LearningRate 0.1211 Epoch: 10 Global Step: 104670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-01-08 16:47:13,336-Speed 5950.07 samples/sec Loss 7.7611 LearningRate 0.1211 Epoch: 10 Global Step: 104680 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:47:20,205-Speed 5963.83 samples/sec Loss 7.7592 LearningRate 0.1211 Epoch: 10 Global Step: 104690 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:47:27,109-Speed 5934.15 samples/sec Loss 7.7672 LearningRate 0.1211 Epoch: 10 Global Step: 104700 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:47:33,974-Speed 5968.80 samples/sec Loss 7.7771 LearningRate 0.1210 Epoch: 10 Global Step: 104710 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:47:40,842-Speed 5964.59 samples/sec Loss 7.8222 LearningRate 0.1210 Epoch: 10 Global Step: 104720 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:47:47,688-Speed 5984.31 samples/sec Loss 7.8307 LearningRate 0.1210 Epoch: 10 Global Step: 104730 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:47:54,540-Speed 5979.36 samples/sec Loss 7.8445 LearningRate 0.1210 Epoch: 10 Global Step: 104740 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:48:01,392-Speed 5979.93 samples/sec Loss 7.7884 LearningRate 0.1209 Epoch: 10 Global Step: 104750 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:48:08,306-Speed 5925.59 samples/sec Loss 7.7746 LearningRate 0.1209 Epoch: 10 Global Step: 104760 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:48:15,181-Speed 5959.48 samples/sec Loss 7.8235 LearningRate 0.1209 Epoch: 10 Global Step: 104770 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-01-08 16:48:22,037-Speed 5975.05 samples/sec Loss 7.7935 LearningRate 0.1209 Epoch: 10 Global Step: 104780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:48:28,925-Speed 5947.81 samples/sec Loss 7.7382 LearningRate 0.1209 
Epoch: 10 Global Step: 104790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:48:35,803-Speed 5957.86 samples/sec Loss 7.8635 LearningRate 0.1208 Epoch: 10 Global Step: 104800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:48:42,763-Speed 5885.59 samples/sec Loss 7.7461 LearningRate 0.1208 Epoch: 10 Global Step: 104810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:48:49,722-Speed 5887.50 samples/sec Loss 7.7941 LearningRate 0.1208 Epoch: 10 Global Step: 104820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:48:56,675-Speed 5893.38 samples/sec Loss 7.8460 LearningRate 0.1208 Epoch: 10 Global Step: 104830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:49:03,523-Speed 5982.72 samples/sec Loss 7.8249 LearningRate 0.1207 Epoch: 10 Global Step: 104840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:49:10,387-Speed 5970.36 samples/sec Loss 7.7997 LearningRate 0.1207 Epoch: 10 Global Step: 104850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:49:17,290-Speed 5934.86 samples/sec Loss 7.7444 LearningRate 0.1207 Epoch: 10 Global Step: 104860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:49:24,135-Speed 5984.64 samples/sec Loss 7.8606 LearningRate 0.1207 Epoch: 10 Global Step: 104870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 16:49:30,994-Speed 5972.86 samples/sec Loss 7.8015 LearningRate 0.1206 Epoch: 10 Global Step: 104880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:49:37,849-Speed 5976.72 samples/sec Loss 7.7298 LearningRate 0.1206 Epoch: 10 Global Step: 104890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:49:44,774-Speed 5916.91 samples/sec Loss 7.7634 LearningRate 0.1206 Epoch: 10 Global Step: 104900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:49:51,644-Speed 5963.71 samples/sec Loss 7.7838 LearningRate 0.1206 Epoch: 10 Global Step: 104910 
Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:49:58,590-Speed 5900.79 samples/sec Loss 7.7580 LearningRate 0.1205 Epoch: 10 Global Step: 104920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:50:05,458-Speed 5965.01 samples/sec Loss 7.8448 LearningRate 0.1205 Epoch: 10 Global Step: 104930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:50:12,308-Speed 5980.77 samples/sec Loss 7.7529 LearningRate 0.1205 Epoch: 10 Global Step: 104940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:50:19,162-Speed 5977.70 samples/sec Loss 7.7780 LearningRate 0.1205 Epoch: 10 Global Step: 104950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:50:26,024-Speed 5969.91 samples/sec Loss 7.7843 LearningRate 0.1205 Epoch: 10 Global Step: 104960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:50:32,873-Speed 5981.68 samples/sec Loss 7.6963 LearningRate 0.1204 Epoch: 10 Global Step: 104970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:50:39,773-Speed 5938.00 samples/sec Loss 7.7658 LearningRate 0.1204 Epoch: 10 Global Step: 104980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:50:46,634-Speed 5971.05 samples/sec Loss 7.7526 LearningRate 0.1204 Epoch: 10 Global Step: 104990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:50:53,505-Speed 5962.87 samples/sec Loss 7.8529 LearningRate 0.1204 Epoch: 10 Global Step: 105000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:51:20,317-[lfw][105000]XNorm: 23.686709 Training: 2022-01-08 16:51:20,318-[lfw][105000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-01-08 16:51:20,319-[lfw][105000]Accuracy-Highest: 0.99767 Training: 2022-01-08 16:51:51,369-[cfp_fp][105000]XNorm: 20.803627 Training: 2022-01-08 16:51:51,370-[cfp_fp][105000]Accuracy-Flip: 0.98357+-0.00767 Training: 2022-01-08 16:51:51,371-[cfp_fp][105000]Accuracy-Highest: 0.98357 Training: 2022-01-08 
16:52:18,162-[agedb_30][105000]XNorm: 23.038324 Training: 2022-01-08 16:52:18,163-[agedb_30][105000]Accuracy-Flip: 0.97200+-0.00710 Training: 2022-01-08 16:52:18,163-[agedb_30][105000]Accuracy-Highest: 0.97200 Training: 2022-01-08 16:52:25,009-Speed 447.64 samples/sec Loss 7.7539 LearningRate 0.1203 Epoch: 10 Global Step: 105010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:52:31,852-Speed 5987.43 samples/sec Loss 7.7663 LearningRate 0.1203 Epoch: 10 Global Step: 105020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:52:38,703-Speed 5979.69 samples/sec Loss 7.7631 LearningRate 0.1203 Epoch: 10 Global Step: 105030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:52:45,553-Speed 5981.72 samples/sec Loss 7.7813 LearningRate 0.1203 Epoch: 10 Global Step: 105040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:52:52,425-Speed 5961.42 samples/sec Loss 7.7248 LearningRate 0.1202 Epoch: 10 Global Step: 105050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:52:59,282-Speed 5975.16 samples/sec Loss 7.7999 LearningRate 0.1202 Epoch: 10 Global Step: 105060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:53:06,164-Speed 5953.05 samples/sec Loss 7.8065 LearningRate 0.1202 Epoch: 10 Global Step: 105070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:53:13,017-Speed 5978.43 samples/sec Loss 7.8162 LearningRate 0.1202 Epoch: 10 Global Step: 105080 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 16:53:19,894-Speed 5956.80 samples/sec Loss 7.7137 LearningRate 0.1201 Epoch: 10 Global Step: 105090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:53:26,793-Speed 5937.82 samples/sec Loss 7.7572 LearningRate 0.1201 Epoch: 10 Global Step: 105100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:53:33,703-Speed 5928.74 samples/sec Loss 7.8124 LearningRate 0.1201 Epoch: 10 Global Step: 105110 Fp16 Grad Scale: 
131072 Required: 20 hours Training: 2022-01-08 16:53:40,605-Speed 5936.12 samples/sec Loss 7.7785 LearningRate 0.1201 Epoch: 10 Global Step: 105120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:53:47,510-Speed 5933.75 samples/sec Loss 7.7972 LearningRate 0.1201 Epoch: 10 Global Step: 105130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:53:54,380-Speed 5962.62 samples/sec Loss 7.7438 LearningRate 0.1200 Epoch: 10 Global Step: 105140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:54:01,247-Speed 5966.05 samples/sec Loss 7.7831 LearningRate 0.1200 Epoch: 10 Global Step: 105150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:54:08,120-Speed 5960.67 samples/sec Loss 7.7032 LearningRate 0.1200 Epoch: 10 Global Step: 105160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:54:14,973-Speed 5978.14 samples/sec Loss 7.7520 LearningRate 0.1200 Epoch: 10 Global Step: 105170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:54:21,821-Speed 5982.37 samples/sec Loss 7.7995 LearningRate 0.1199 Epoch: 10 Global Step: 105180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:54:28,665-Speed 5986.31 samples/sec Loss 7.7746 LearningRate 0.1199 Epoch: 10 Global Step: 105190 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 16:54:35,560-Speed 5941.52 samples/sec Loss 7.7296 LearningRate 0.1199 Epoch: 10 Global Step: 105200 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 16:54:42,410-Speed 5980.21 samples/sec Loss 7.8644 LearningRate 0.1199 Epoch: 10 Global Step: 105210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:54:49,259-Speed 5981.26 samples/sec Loss 7.8001 LearningRate 0.1198 Epoch: 10 Global Step: 105220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:54:56,113-Speed 5976.96 samples/sec Loss 7.7169 LearningRate 0.1198 Epoch: 10 Global Step: 105230 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-01-08 16:55:02,974-Speed 5971.49 samples/sec Loss 7.7740 LearningRate 0.1198 Epoch: 10 Global Step: 105240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:55:09,828-Speed 5977.18 samples/sec Loss 7.7600 LearningRate 0.1198 Epoch: 10 Global Step: 105250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:55:16,702-Speed 5959.91 samples/sec Loss 7.7680 LearningRate 0.1197 Epoch: 10 Global Step: 105260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:55:23,557-Speed 5978.76 samples/sec Loss 7.7349 LearningRate 0.1197 Epoch: 10 Global Step: 105270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:55:30,401-Speed 5985.69 samples/sec Loss 7.7906 LearningRate 0.1197 Epoch: 10 Global Step: 105280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:55:37,257-Speed 5975.46 samples/sec Loss 7.7649 LearningRate 0.1197 Epoch: 10 Global Step: 105290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:55:44,124-Speed 5967.65 samples/sec Loss 7.7564 LearningRate 0.1197 Epoch: 10 Global Step: 105300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:55:50,961-Speed 5991.95 samples/sec Loss 7.7807 LearningRate 0.1196 Epoch: 10 Global Step: 105310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:55:57,854-Speed 5943.55 samples/sec Loss 7.7421 LearningRate 0.1196 Epoch: 10 Global Step: 105320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:56:04,706-Speed 5978.54 samples/sec Loss 7.7369 LearningRate 0.1196 Epoch: 10 Global Step: 105330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:56:11,579-Speed 5961.24 samples/sec Loss 7.7375 LearningRate 0.1196 Epoch: 10 Global Step: 105340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:56:18,459-Speed 5954.24 samples/sec Loss 7.7297 LearningRate 0.1195 Epoch: 10 Global Step: 105350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 
2022-01-08 16:56:25,303-Speed 5986.10 samples/sec Loss 7.7810 LearningRate 0.1195 Epoch: 10 Global Step: 105360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:56:32,185-Speed 5956.82 samples/sec Loss 7.7060 LearningRate 0.1195 Epoch: 10 Global Step: 105370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:56:39,033-Speed 5982.55 samples/sec Loss 7.7418 LearningRate 0.1195 Epoch: 10 Global Step: 105380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:56:45,884-Speed 5979.97 samples/sec Loss 7.7826 LearningRate 0.1194 Epoch: 10 Global Step: 105390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:56:52,765-Speed 5954.04 samples/sec Loss 7.6851 LearningRate 0.1194 Epoch: 10 Global Step: 105400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:56:59,630-Speed 5967.93 samples/sec Loss 7.7721 LearningRate 0.1194 Epoch: 10 Global Step: 105410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:57:06,483-Speed 5978.07 samples/sec Loss 7.7764 LearningRate 0.1194 Epoch: 10 Global Step: 105420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:57:13,337-Speed 5977.11 samples/sec Loss 7.6915 LearningRate 0.1193 Epoch: 10 Global Step: 105430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:57:20,199-Speed 5970.40 samples/sec Loss 7.6853 LearningRate 0.1193 Epoch: 10 Global Step: 105440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:57:27,076-Speed 5957.21 samples/sec Loss 7.7087 LearningRate 0.1193 Epoch: 10 Global Step: 105450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:57:33,967-Speed 5944.92 samples/sec Loss 7.7353 LearningRate 0.1193 Epoch: 10 Global Step: 105460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:57:40,823-Speed 5975.48 samples/sec Loss 7.8076 LearningRate 0.1193 Epoch: 10 Global Step: 105470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:57:47,780-Speed 
5891.43 samples/sec Loss 7.7773 LearningRate 0.1192 Epoch: 10 Global Step: 105480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:57:54,674-Speed 5942.54 samples/sec Loss 7.7188 LearningRate 0.1192 Epoch: 10 Global Step: 105490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:58:01,559-Speed 5949.90 samples/sec Loss 7.7604 LearningRate 0.1192 Epoch: 10 Global Step: 105500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:58:08,413-Speed 5977.43 samples/sec Loss 7.7393 LearningRate 0.1192 Epoch: 10 Global Step: 105510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 16:58:15,279-Speed 5966.45 samples/sec Loss 7.7902 LearningRate 0.1191 Epoch: 10 Global Step: 105520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:58:22,140-Speed 5971.52 samples/sec Loss 7.7542 LearningRate 0.1191 Epoch: 10 Global Step: 105530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:58:29,053-Speed 5926.67 samples/sec Loss 7.7906 LearningRate 0.1191 Epoch: 10 Global Step: 105540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:58:35,940-Speed 5952.45 samples/sec Loss 7.7952 LearningRate 0.1191 Epoch: 10 Global Step: 105550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:58:42,795-Speed 5975.66 samples/sec Loss 7.7722 LearningRate 0.1190 Epoch: 10 Global Step: 105560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:58:49,674-Speed 5955.97 samples/sec Loss 7.7423 LearningRate 0.1190 Epoch: 10 Global Step: 105570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:58:56,527-Speed 5978.03 samples/sec Loss 7.6746 LearningRate 0.1190 Epoch: 10 Global Step: 105580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:03,388-Speed 5970.89 samples/sec Loss 7.7506 LearningRate 0.1190 Epoch: 10 Global Step: 105590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:10,236-Speed 5982.65 samples/sec 
Loss 7.6727 LearningRate 0.1190 Epoch: 10 Global Step: 105600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:17,140-Speed 5935.55 samples/sec Loss 7.7232 LearningRate 0.1189 Epoch: 10 Global Step: 105610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:24,001-Speed 5970.90 samples/sec Loss 7.7251 LearningRate 0.1189 Epoch: 10 Global Step: 105620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:30,919-Speed 5923.08 samples/sec Loss 7.7755 LearningRate 0.1189 Epoch: 10 Global Step: 105630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:37,797-Speed 5956.93 samples/sec Loss 7.7051 LearningRate 0.1189 Epoch: 10 Global Step: 105640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:44,672-Speed 5959.02 samples/sec Loss 7.7302 LearningRate 0.1188 Epoch: 10 Global Step: 105650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:51,533-Speed 5970.62 samples/sec Loss 7.7080 LearningRate 0.1188 Epoch: 10 Global Step: 105660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 16:59:58,394-Speed 5978.01 samples/sec Loss 7.7756 LearningRate 0.1188 Epoch: 10 Global Step: 105670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:00:05,245-Speed 5978.95 samples/sec Loss 7.7581 LearningRate 0.1188 Epoch: 10 Global Step: 105680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:00:12,111-Speed 5967.14 samples/sec Loss 7.7674 LearningRate 0.1187 Epoch: 10 Global Step: 105690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:00:19,032-Speed 5918.75 samples/sec Loss 7.7187 LearningRate 0.1187 Epoch: 10 Global Step: 105700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:00:25,886-Speed 5977.22 samples/sec Loss 7.7752 LearningRate 0.1187 Epoch: 10 Global Step: 105710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:00:32,754-Speed 5975.23 samples/sec Loss 7.7314 
LearningRate 0.1187 Epoch: 10 Global Step: 105720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:00:39,628-Speed 5959.83 samples/sec Loss 7.7011 LearningRate 0.1186 Epoch: 10 Global Step: 105730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:00:46,488-Speed 5971.41 samples/sec Loss 7.7165 LearningRate 0.1186 Epoch: 10 Global Step: 105740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:00:53,345-Speed 5975.12 samples/sec Loss 7.7058 LearningRate 0.1186 Epoch: 10 Global Step: 105750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:01:00,207-Speed 5970.30 samples/sec Loss 7.7152 LearningRate 0.1186 Epoch: 10 Global Step: 105760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:01:07,066-Speed 5972.80 samples/sec Loss 7.7125 LearningRate 0.1186 Epoch: 10 Global Step: 105770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:01:13,946-Speed 5954.58 samples/sec Loss 7.6962 LearningRate 0.1185 Epoch: 10 Global Step: 105780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:01:20,818-Speed 5961.73 samples/sec Loss 7.6909 LearningRate 0.1185 Epoch: 10 Global Step: 105790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:01:27,690-Speed 5962.01 samples/sec Loss 7.7002 LearningRate 0.1185 Epoch: 10 Global Step: 105800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:01:34,546-Speed 5975.71 samples/sec Loss 7.6535 LearningRate 0.1185 Epoch: 10 Global Step: 105810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:01:41,419-Speed 5960.75 samples/sec Loss 7.6644 LearningRate 0.1184 Epoch: 10 Global Step: 105820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:01:48,278-Speed 5973.36 samples/sec Loss 7.8131 LearningRate 0.1184 Epoch: 10 Global Step: 105830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:01:55,150-Speed 5963.12 samples/sec Loss 7.6628 LearningRate 0.1184 
Epoch: 10 Global Step: 105840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:02:02,030-Speed 5955.22 samples/sec Loss 7.7403 LearningRate 0.1184 Epoch: 10 Global Step: 105850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:02:09,009-Speed 5870.12 samples/sec Loss 7.6850 LearningRate 0.1183 Epoch: 10 Global Step: 105860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:02:15,897-Speed 5948.00 samples/sec Loss 7.7018 LearningRate 0.1183 Epoch: 10 Global Step: 105870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:02:22,780-Speed 5953.30 samples/sec Loss 7.6315 LearningRate 0.1183 Epoch: 10 Global Step: 105880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:02:29,674-Speed 5942.49 samples/sec Loss 7.7083 LearningRate 0.1183 Epoch: 10 Global Step: 105890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:02:36,637-Speed 5883.72 samples/sec Loss 7.6744 LearningRate 0.1183 Epoch: 10 Global Step: 105900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:02:43,525-Speed 5948.54 samples/sec Loss 7.7462 LearningRate 0.1182 Epoch: 10 Global Step: 105910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:02:50,399-Speed 5959.38 samples/sec Loss 7.7075 LearningRate 0.1182 Epoch: 10 Global Step: 105920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:02:57,273-Speed 5962.50 samples/sec Loss 7.7263 LearningRate 0.1182 Epoch: 10 Global Step: 105930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:03:04,133-Speed 5972.16 samples/sec Loss 7.6920 LearningRate 0.1182 Epoch: 10 Global Step: 105940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:03:10,999-Speed 5966.24 samples/sec Loss 7.7662 LearningRate 0.1181 Epoch: 10 Global Step: 105950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:03:17,888-Speed 5946.89 samples/sec Loss 7.7201 LearningRate 0.1181 Epoch: 10 Global Step: 
105960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:03:24,756-Speed 5965.93 samples/sec Loss 7.6861 LearningRate 0.1181 Epoch: 10 Global Step: 105970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:03:31,611-Speed 5975.37 samples/sec Loss 7.6781 LearningRate 0.1181 Epoch: 10 Global Step: 105980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:03:38,484-Speed 5961.51 samples/sec Loss 7.7123 LearningRate 0.1180 Epoch: 10 Global Step: 105990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:03:45,364-Speed 5954.35 samples/sec Loss 7.7198 LearningRate 0.1180 Epoch: 10 Global Step: 106000 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:03:52,244-Speed 5954.39 samples/sec Loss 7.6522 LearningRate 0.1180 Epoch: 10 Global Step: 106010 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:03:59,094-Speed 5980.38 samples/sec Loss 7.7397 LearningRate 0.1180 Epoch: 10 Global Step: 106020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:04:05,947-Speed 5978.69 samples/sec Loss 7.6569 LearningRate 0.1179 Epoch: 10 Global Step: 106030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:04:12,818-Speed 5961.09 samples/sec Loss 7.6347 LearningRate 0.1179 Epoch: 10 Global Step: 106040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:04:19,680-Speed 5970.53 samples/sec Loss 7.7201 LearningRate 0.1179 Epoch: 10 Global Step: 106050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:04:26,563-Speed 5952.76 samples/sec Loss 7.7405 LearningRate 0.1179 Epoch: 10 Global Step: 106060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:04:33,428-Speed 5966.91 samples/sec Loss 7.7301 LearningRate 0.1179 Epoch: 10 Global Step: 106070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:04:40,282-Speed 5977.23 samples/sec Loss 7.6951 LearningRate 0.1178 Epoch: 10 Global Step: 106080 Fp16 Grad Scale: 
65536 Required: 20 hours Training: 2022-01-08 17:04:47,139-Speed 5973.92 samples/sec Loss 7.6323 LearningRate 0.1178 Epoch: 10 Global Step: 106090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:04:54,001-Speed 5970.16 samples/sec Loss 7.7124 LearningRate 0.1178 Epoch: 10 Global Step: 106100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:05:00,869-Speed 5965.13 samples/sec Loss 7.6989 LearningRate 0.1178 Epoch: 10 Global Step: 106110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:05:07,719-Speed 5980.79 samples/sec Loss 7.7445 LearningRate 0.1177 Epoch: 10 Global Step: 106120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:05:14,574-Speed 5975.75 samples/sec Loss 7.7069 LearningRate 0.1177 Epoch: 10 Global Step: 106130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:05:21,436-Speed 5970.37 samples/sec Loss 7.7198 LearningRate 0.1177 Epoch: 10 Global Step: 106140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:05:28,298-Speed 5970.33 samples/sec Loss 7.7703 LearningRate 0.1177 Epoch: 10 Global Step: 106150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:05:35,150-Speed 5978.54 samples/sec Loss 7.7007 LearningRate 0.1176 Epoch: 10 Global Step: 106160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:05:42,006-Speed 5975.73 samples/sec Loss 7.7396 LearningRate 0.1176 Epoch: 10 Global Step: 106170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:05:48,881-Speed 5959.87 samples/sec Loss 7.6975 LearningRate 0.1176 Epoch: 10 Global Step: 106180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:05:55,734-Speed 5977.34 samples/sec Loss 7.6846 LearningRate 0.1176 Epoch: 10 Global Step: 106190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:06:02,595-Speed 5970.96 samples/sec Loss 7.6993 LearningRate 0.1176 Epoch: 10 Global Step: 106200 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-01-08 17:06:09,447-Speed 5978.94 samples/sec Loss 7.6620 LearningRate 0.1175 Epoch: 10 Global Step: 106210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:06:16,307-Speed 5971.76 samples/sec Loss 7.6843 LearningRate 0.1175 Epoch: 10 Global Step: 106220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:06:23,169-Speed 5970.80 samples/sec Loss 7.6430 LearningRate 0.1175 Epoch: 10 Global Step: 106230 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:06:30,040-Speed 5964.89 samples/sec Loss 7.6151 LearningRate 0.1175 Epoch: 10 Global Step: 106240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:06:36,911-Speed 5973.83 samples/sec Loss 7.7072 LearningRate 0.1174 Epoch: 10 Global Step: 106250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:06:43,778-Speed 5966.39 samples/sec Loss 7.6748 LearningRate 0.1174 Epoch: 10 Global Step: 106260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:06:50,665-Speed 5950.57 samples/sec Loss 7.7615 LearningRate 0.1174 Epoch: 10 Global Step: 106270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:06:57,559-Speed 5941.96 samples/sec Loss 7.7429 LearningRate 0.1174 Epoch: 10 Global Step: 106280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:07:04,456-Speed 5940.53 samples/sec Loss 7.7280 LearningRate 0.1173 Epoch: 10 Global Step: 106290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:07:11,315-Speed 5972.39 samples/sec Loss 7.6690 LearningRate 0.1173 Epoch: 10 Global Step: 106300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:07:18,169-Speed 5977.32 samples/sec Loss 7.6520 LearningRate 0.1173 Epoch: 10 Global Step: 106310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:07:25,023-Speed 5976.64 samples/sec Loss 7.6662 LearningRate 0.1173 Epoch: 10 Global Step: 106320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-01-08 17:07:31,888-Speed 5967.51 samples/sec Loss 7.6552 LearningRate 0.1173 Epoch: 10 Global Step: 106330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:07:38,747-Speed 5972.91 samples/sec Loss 7.7265 LearningRate 0.1172 Epoch: 10 Global Step: 106340 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:07:45,594-Speed 5984.81 samples/sec Loss 7.6878 LearningRate 0.1172 Epoch: 10 Global Step: 106350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:07:52,449-Speed 5976.85 samples/sec Loss 7.6678 LearningRate 0.1172 Epoch: 10 Global Step: 106360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:07:59,318-Speed 5963.48 samples/sec Loss 7.6898 LearningRate 0.1172 Epoch: 10 Global Step: 106370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:08:06,205-Speed 5949.20 samples/sec Loss 7.6352 LearningRate 0.1171 Epoch: 10 Global Step: 106380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:08:13,060-Speed 5976.13 samples/sec Loss 7.6456 LearningRate 0.1171 Epoch: 10 Global Step: 106390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:08:19,916-Speed 5975.35 samples/sec Loss 7.6492 LearningRate 0.1171 Epoch: 10 Global Step: 106400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:08:26,772-Speed 5974.85 samples/sec Loss 7.6792 LearningRate 0.1171 Epoch: 10 Global Step: 106410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:08:33,647-Speed 5959.46 samples/sec Loss 7.6883 LearningRate 0.1170 Epoch: 10 Global Step: 106420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:08:40,503-Speed 5974.89 samples/sec Loss 7.7703 LearningRate 0.1170 Epoch: 10 Global Step: 106430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:08:47,358-Speed 5975.92 samples/sec Loss 7.6734 LearningRate 0.1170 Epoch: 10 Global Step: 106440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 
17:08:54,206-Speed 5982.41 samples/sec Loss 7.7563 LearningRate 0.1170 Epoch: 10 Global Step: 106450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:09:01,066-Speed 5971.15 samples/sec Loss 7.6586 LearningRate 0.1169 Epoch: 10 Global Step: 106460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:09:07,920-Speed 5977.82 samples/sec Loss 7.6729 LearningRate 0.1169 Epoch: 10 Global Step: 106470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:09:14,782-Speed 5970.05 samples/sec Loss 7.7336 LearningRate 0.1169 Epoch: 10 Global Step: 106480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:09:21,653-Speed 5962.12 samples/sec Loss 7.7043 LearningRate 0.1169 Epoch: 10 Global Step: 106490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:09:28,500-Speed 5983.04 samples/sec Loss 7.6495 LearningRate 0.1169 Epoch: 10 Global Step: 106500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:09:35,348-Speed 5982.29 samples/sec Loss 7.7604 LearningRate 0.1168 Epoch: 10 Global Step: 106510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:09:42,264-Speed 5923.77 samples/sec Loss 7.6328 LearningRate 0.1168 Epoch: 10 Global Step: 106520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:09:49,123-Speed 5973.15 samples/sec Loss 7.7044 LearningRate 0.1168 Epoch: 10 Global Step: 106530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:09:55,980-Speed 5974.84 samples/sec Loss 7.5745 LearningRate 0.1168 Epoch: 10 Global Step: 106540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:10:02,837-Speed 5973.89 samples/sec Loss 7.6505 LearningRate 0.1167 Epoch: 10 Global Step: 106550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:10:09,688-Speed 5979.54 samples/sec Loss 7.6909 LearningRate 0.1167 Epoch: 10 Global Step: 106560 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:10:16,529-Speed 
5988.70 samples/sec Loss 7.6717 LearningRate 0.1167 Epoch: 10 Global Step: 106570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:10:23,395-Speed 5967.11 samples/sec Loss 7.6604 LearningRate 0.1167 Epoch: 10 Global Step: 106580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:10:30,256-Speed 5970.43 samples/sec Loss 7.6437 LearningRate 0.1166 Epoch: 10 Global Step: 106590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:10:37,122-Speed 5967.33 samples/sec Loss 7.6766 LearningRate 0.1166 Epoch: 10 Global Step: 106600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:10:43,979-Speed 5974.39 samples/sec Loss 7.6051 LearningRate 0.1166 Epoch: 10 Global Step: 106610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:10:50,844-Speed 5968.12 samples/sec Loss 7.6147 LearningRate 0.1166 Epoch: 10 Global Step: 106620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:10:57,690-Speed 5984.25 samples/sec Loss 7.6288 LearningRate 0.1166 Epoch: 10 Global Step: 106630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:11:04,565-Speed 5958.41 samples/sec Loss 7.7263 LearningRate 0.1165 Epoch: 10 Global Step: 106640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:11:11,416-Speed 5979.82 samples/sec Loss 7.6863 LearningRate 0.1165 Epoch: 10 Global Step: 106650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:11:18,294-Speed 5958.91 samples/sec Loss 7.6727 LearningRate 0.1165 Epoch: 10 Global Step: 106660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:11:25,139-Speed 5985.26 samples/sec Loss 7.6911 LearningRate 0.1165 Epoch: 10 Global Step: 106670 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:11:31,985-Speed 5983.91 samples/sec Loss 7.6294 LearningRate 0.1164 Epoch: 10 Global Step: 106680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:11:38,836-Speed 5979.89 samples/sec 
Loss 7.6230 LearningRate 0.1164 Epoch: 10 Global Step: 106690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:11:45,697-Speed 5971.29 samples/sec Loss 7.6678 LearningRate 0.1164 Epoch: 10 Global Step: 106700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:11:52,649-Speed 5892.67 samples/sec Loss 7.6990 LearningRate 0.1164 Epoch: 10 Global Step: 106710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:11:59,510-Speed 5971.28 samples/sec Loss 7.6900 LearningRate 0.1163 Epoch: 10 Global Step: 106720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:12:06,354-Speed 5985.40 samples/sec Loss 7.6168 LearningRate 0.1163 Epoch: 10 Global Step: 106730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:12:13,219-Speed 5967.51 samples/sec Loss 7.6891 LearningRate 0.1163 Epoch: 10 Global Step: 106740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:12:20,091-Speed 5961.60 samples/sec Loss 7.6130 LearningRate 0.1163 Epoch: 10 Global Step: 106750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:12:26,939-Speed 5982.21 samples/sec Loss 7.6476 LearningRate 0.1163 Epoch: 10 Global Step: 106760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:12:33,788-Speed 5981.57 samples/sec Loss 7.7045 LearningRate 0.1162 Epoch: 10 Global Step: 106770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:12:40,632-Speed 5985.81 samples/sec Loss 7.6655 LearningRate 0.1162 Epoch: 10 Global Step: 106780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:12:47,485-Speed 5977.44 samples/sec Loss 7.6620 LearningRate 0.1162 Epoch: 10 Global Step: 106790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:12:54,342-Speed 5974.29 samples/sec Loss 7.6621 LearningRate 0.1162 Epoch: 10 Global Step: 106800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:13:01,207-Speed 5968.06 samples/sec Loss 7.6719 
LearningRate 0.1161 Epoch: 10 Global Step: 106810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:13:08,058-Speed 5979.36 samples/sec Loss 7.6303 LearningRate 0.1161 Epoch: 10 Global Step: 106820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:13:14,917-Speed 5973.15 samples/sec Loss 7.7005 LearningRate 0.1161 Epoch: 10 Global Step: 106830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:13:21,775-Speed 5973.80 samples/sec Loss 7.6301 LearningRate 0.1161 Epoch: 10 Global Step: 106840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:13:28,624-Speed 5981.36 samples/sec Loss 7.7048 LearningRate 0.1160 Epoch: 10 Global Step: 106850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:13:35,474-Speed 5980.50 samples/sec Loss 7.6204 LearningRate 0.1160 Epoch: 10 Global Step: 106860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:13:42,341-Speed 5965.99 samples/sec Loss 7.7093 LearningRate 0.1160 Epoch: 10 Global Step: 106870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:13:49,208-Speed 5965.43 samples/sec Loss 7.6454 LearningRate 0.1160 Epoch: 10 Global Step: 106880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:13:56,066-Speed 5974.00 samples/sec Loss 7.6008 LearningRate 0.1160 Epoch: 10 Global Step: 106890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:14:02,930-Speed 5968.15 samples/sec Loss 7.6247 LearningRate 0.1159 Epoch: 10 Global Step: 106900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:14:09,818-Speed 5948.16 samples/sec Loss 7.5985 LearningRate 0.1159 Epoch: 10 Global Step: 106910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:14:16,680-Speed 5970.45 samples/sec Loss 7.5653 LearningRate 0.1159 Epoch: 10 Global Step: 106920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:14:23,527-Speed 5983.41 samples/sec Loss 7.6057 LearningRate 0.1159 Epoch: 
10 Global Step: 106930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:14:30,376-Speed 5981.17 samples/sec Loss 7.5915 LearningRate 0.1158 Epoch: 10 Global Step: 106940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:14:37,236-Speed 5974.88 samples/sec Loss 7.6235 LearningRate 0.1158 Epoch: 10 Global Step: 106950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:14:44,093-Speed 5974.59 samples/sec Loss 7.6165 LearningRate 0.1158 Epoch: 10 Global Step: 106960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:14:50,959-Speed 5966.20 samples/sec Loss 7.6819 LearningRate 0.1158 Epoch: 10 Global Step: 106970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:14:57,816-Speed 5974.38 samples/sec Loss 7.7382 LearningRate 0.1157 Epoch: 10 Global Step: 106980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:15:04,672-Speed 5975.36 samples/sec Loss 7.5993 LearningRate 0.1157 Epoch: 10 Global Step: 106990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:15:11,528-Speed 5974.89 samples/sec Loss 7.5872 LearningRate 0.1157 Epoch: 10 Global Step: 107000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:15:18,387-Speed 5973.60 samples/sec Loss 7.6216 LearningRate 0.1157 Epoch: 10 Global Step: 107010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:15:25,239-Speed 5979.26 samples/sec Loss 7.6063 LearningRate 0.1157 Epoch: 10 Global Step: 107020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:15:32,124-Speed 5953.17 samples/sec Loss 7.6687 LearningRate 0.1156 Epoch: 10 Global Step: 107030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:15:39,010-Speed 5949.11 samples/sec Loss 7.6415 LearningRate 0.1156 Epoch: 10 Global Step: 107040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:15:45,868-Speed 5974.18 samples/sec Loss 7.6459 LearningRate 0.1156 Epoch: 10 Global Step: 
107050 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:15:52,738-Speed 5963.84 samples/sec Loss 7.5752 LearningRate 0.1156 Epoch: 10 Global Step: 107060 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:15:59,603-Speed 5966.96 samples/sec Loss 7.6085 LearningRate 0.1155 Epoch: 10 Global Step: 107070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:16:06,454-Speed 5980.35 samples/sec Loss 7.6073 LearningRate 0.1155 Epoch: 10 Global Step: 107080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:16:13,305-Speed 5979.94 samples/sec Loss 7.6941 LearningRate 0.1155 Epoch: 10 Global Step: 107090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:16:20,176-Speed 5964.03 samples/sec Loss 7.6580 LearningRate 0.1155 Epoch: 10 Global Step: 107100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:16:27,071-Speed 5944.77 samples/sec Loss 7.6368 LearningRate 0.1154 Epoch: 10 Global Step: 107110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:16:33,952-Speed 5953.21 samples/sec Loss 7.5958 LearningRate 0.1154 Epoch: 10 Global Step: 107120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:16:40,807-Speed 5976.56 samples/sec Loss 7.6667 LearningRate 0.1154 Epoch: 10 Global Step: 107130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:16:47,665-Speed 5973.87 samples/sec Loss 7.5764 LearningRate 0.1154 Epoch: 10 Global Step: 107140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:16:54,508-Speed 5986.33 samples/sec Loss 7.5980 LearningRate 0.1154 Epoch: 10 Global Step: 107150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:17:01,358-Speed 5980.70 samples/sec Loss 7.6097 LearningRate 0.1153 Epoch: 10 Global Step: 107160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:17:08,211-Speed 5978.68 samples/sec Loss 7.5740 LearningRate 0.1153 Epoch: 10 Global Step: 107170 Fp16 Grad Scale: 
65536 Required: 20 hours Training: 2022-01-08 17:17:15,079-Speed 5964.93 samples/sec Loss 7.6633 LearningRate 0.1153 Epoch: 10 Global Step: 107180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:17:21,939-Speed 5971.80 samples/sec Loss 7.6785 LearningRate 0.1153 Epoch: 10 Global Step: 107190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:17:28,798-Speed 5973.40 samples/sec Loss 7.6491 LearningRate 0.1152 Epoch: 10 Global Step: 107200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:17:35,649-Speed 5979.89 samples/sec Loss 7.5834 LearningRate 0.1152 Epoch: 10 Global Step: 107210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:17:42,499-Speed 5980.58 samples/sec Loss 7.6038 LearningRate 0.1152 Epoch: 10 Global Step: 107220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:17:49,387-Speed 5947.83 samples/sec Loss 7.6344 LearningRate 0.1152 Epoch: 10 Global Step: 107230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:17:56,280-Speed 5942.90 samples/sec Loss 7.6407 LearningRate 0.1151 Epoch: 10 Global Step: 107240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:18:03,143-Speed 5969.83 samples/sec Loss 7.6320 LearningRate 0.1151 Epoch: 10 Global Step: 107250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:18:09,995-Speed 5979.11 samples/sec Loss 7.6250 LearningRate 0.1151 Epoch: 10 Global Step: 107260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:18:16,848-Speed 5978.21 samples/sec Loss 7.5919 LearningRate 0.1151 Epoch: 10 Global Step: 107270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:18:23,717-Speed 5964.55 samples/sec Loss 7.6717 LearningRate 0.1151 Epoch: 10 Global Step: 107280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:18:30,573-Speed 5974.94 samples/sec Loss 7.6324 LearningRate 0.1150 Epoch: 10 Global Step: 107290 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-01-08 17:18:37,438-Speed 5967.52 samples/sec Loss 7.5748 LearningRate 0.1150 Epoch: 10 Global Step: 107300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:18:44,305-Speed 5965.96 samples/sec Loss 7.6473 LearningRate 0.1150 Epoch: 10 Global Step: 107310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:18:51,171-Speed 5967.62 samples/sec Loss 7.6228 LearningRate 0.1150 Epoch: 10 Global Step: 107320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:18:58,032-Speed 5970.68 samples/sec Loss 7.6131 LearningRate 0.1149 Epoch: 10 Global Step: 107330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:19:04,935-Speed 5935.36 samples/sec Loss 7.6117 LearningRate 0.1149 Epoch: 10 Global Step: 107340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:19:11,817-Speed 5952.07 samples/sec Loss 7.6354 LearningRate 0.1149 Epoch: 10 Global Step: 107350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:19:18,782-Speed 5882.66 samples/sec Loss 7.7118 LearningRate 0.1149 Epoch: 10 Global Step: 107360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:19:25,650-Speed 5964.40 samples/sec Loss 7.6502 LearningRate 0.1148 Epoch: 10 Global Step: 107370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:19:32,526-Speed 5957.98 samples/sec Loss 7.6183 LearningRate 0.1148 Epoch: 10 Global Step: 107380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:19:39,409-Speed 5952.40 samples/sec Loss 7.5716 LearningRate 0.1148 Epoch: 10 Global Step: 107390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:19:46,279-Speed 5963.76 samples/sec Loss 7.5642 LearningRate 0.1148 Epoch: 10 Global Step: 107400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:19:53,128-Speed 5981.45 samples/sec Loss 7.5721 LearningRate 0.1148 Epoch: 10 Global Step: 107410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 
17:19:59,988-Speed 5972.16 samples/sec Loss 7.6055 LearningRate 0.1147 Epoch: 10 Global Step: 107420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:20:06,918-Speed 5914.28 samples/sec Loss 7.5762 LearningRate 0.1147 Epoch: 10 Global Step: 107430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:20:13,790-Speed 5961.75 samples/sec Loss 7.5733 LearningRate 0.1147 Epoch: 10 Global Step: 107440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:20:20,640-Speed 5979.98 samples/sec Loss 7.5937 LearningRate 0.1147 Epoch: 10 Global Step: 107450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:20:27,499-Speed 5974.38 samples/sec Loss 7.6405 LearningRate 0.1146 Epoch: 10 Global Step: 107460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:20:34,347-Speed 5982.20 samples/sec Loss 7.6887 LearningRate 0.1146 Epoch: 10 Global Step: 107470 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:20:41,192-Speed 5985.16 samples/sec Loss 7.6607 LearningRate 0.1146 Epoch: 10 Global Step: 107480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:20:48,061-Speed 5964.51 samples/sec Loss 7.5355 LearningRate 0.1146 Epoch: 10 Global Step: 107490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:20:54,927-Speed 5967.34 samples/sec Loss 7.6299 LearningRate 0.1146 Epoch: 10 Global Step: 107500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:21:01,789-Speed 5970.31 samples/sec Loss 7.5837 LearningRate 0.1145 Epoch: 10 Global Step: 107510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:21:08,642-Speed 5978.15 samples/sec Loss 7.5901 LearningRate 0.1145 Epoch: 10 Global Step: 107520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:21:15,498-Speed 5975.11 samples/sec Loss 7.5683 LearningRate 0.1145 Epoch: 10 Global Step: 107530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:21:22,373-Speed 
5959.32 samples/sec Loss 7.5771 LearningRate 0.1145 Epoch: 10 Global Step: 107540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:21:29,238-Speed 5968.35 samples/sec Loss 7.5558 LearningRate 0.1144 Epoch: 10 Global Step: 107550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:21:36,098-Speed 5971.66 samples/sec Loss 7.6315 LearningRate 0.1144 Epoch: 10 Global Step: 107560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:21:42,977-Speed 5955.67 samples/sec Loss 7.5395 LearningRate 0.1144 Epoch: 10 Global Step: 107570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:21:49,838-Speed 5971.81 samples/sec Loss 7.6042 LearningRate 0.1144 Epoch: 10 Global Step: 107580 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:21:56,704-Speed 5966.15 samples/sec Loss 7.6310 LearningRate 0.1143 Epoch: 10 Global Step: 107590 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-08 17:22:03,565-Speed 5970.83 samples/sec Loss 7.5678 LearningRate 0.1143 Epoch: 10 Global Step: 107600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:22:10,413-Speed 5984.76 samples/sec Loss 7.5964 LearningRate 0.1143 Epoch: 10 Global Step: 107610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:22:17,268-Speed 5977.00 samples/sec Loss 7.5595 LearningRate 0.1143 Epoch: 10 Global Step: 107620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:22:24,134-Speed 5966.60 samples/sec Loss 7.6281 LearningRate 0.1143 Epoch: 10 Global Step: 107630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:22:30,991-Speed 5974.14 samples/sec Loss 7.6006 LearningRate 0.1142 Epoch: 10 Global Step: 107640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:22:37,842-Speed 5980.13 samples/sec Loss 7.5975 LearningRate 0.1142 Epoch: 10 Global Step: 107650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:22:44,694-Speed 5978.84 samples/sec 
Loss 7.5978 LearningRate 0.1142 Epoch: 10 Global Step: 107660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:22:51,568-Speed 5959.11 samples/sec Loss 7.5592 LearningRate 0.1142 Epoch: 10 Global Step: 107670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:22:58,428-Speed 5973.14 samples/sec Loss 7.5614 LearningRate 0.1141 Epoch: 10 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:23:05,278-Speed 5979.75 samples/sec Loss 7.5581 LearningRate 0.1141 Epoch: 10 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:23:12,167-Speed 5946.71 samples/sec Loss 7.6376 LearningRate 0.1141 Epoch: 10 Global Step: 107700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:23:19,061-Speed 5946.38 samples/sec Loss 7.6294 LearningRate 0.1141 Epoch: 10 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:23:25,948-Speed 5947.88 samples/sec Loss 7.5631 LearningRate 0.1140 Epoch: 10 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:23:32,803-Speed 5977.11 samples/sec Loss 7.5730 LearningRate 0.1140 Epoch: 10 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:23:39,666-Speed 5970.96 samples/sec Loss 7.5562 LearningRate 0.1140 Epoch: 10 Global Step: 107740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:23:46,562-Speed 5940.39 samples/sec Loss 7.5537 LearningRate 0.1140 Epoch: 10 Global Step: 107750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:23:53,437-Speed 5960.21 samples/sec Loss 7.6047 LearningRate 0.1140 Epoch: 10 Global Step: 107760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:24:00,319-Speed 5952.56 samples/sec Loss 7.6166 LearningRate 0.1139 Epoch: 10 Global Step: 107770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:24:07,172-Speed 5978.27 samples/sec Loss 7.5518 LearningRate 
0.1139 Epoch: 10 Global Step: 107780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:24:14,027-Speed 5976.45 samples/sec Loss 7.5544 LearningRate 0.1139 Epoch: 10 Global Step: 107790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:24:20,892-Speed 5967.20 samples/sec Loss 7.6107 LearningRate 0.1139 Epoch: 10 Global Step: 107800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:24:27,753-Speed 5970.44 samples/sec Loss 7.5889 LearningRate 0.1138 Epoch: 10 Global Step: 107810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:24:34,598-Speed 5985.52 samples/sec Loss 7.6156 LearningRate 0.1138 Epoch: 10 Global Step: 107820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:24:41,471-Speed 5961.07 samples/sec Loss 7.5741 LearningRate 0.1138 Epoch: 10 Global Step: 107830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:24:48,343-Speed 5961.40 samples/sec Loss 7.5319 LearningRate 0.1138 Epoch: 10 Global Step: 107840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:24:55,201-Speed 5974.19 samples/sec Loss 7.5393 LearningRate 0.1137 Epoch: 10 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:25:02,077-Speed 5958.26 samples/sec Loss 7.5318 LearningRate 0.1137 Epoch: 10 Global Step: 107860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:25:08,935-Speed 5974.00 samples/sec Loss 7.5461 LearningRate 0.1137 Epoch: 10 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:25:15,793-Speed 5973.75 samples/sec Loss 7.5725 LearningRate 0.1137 Epoch: 10 Global Step: 107880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:25:22,663-Speed 5963.90 samples/sec Loss 7.5278 LearningRate 0.1137 Epoch: 10 Global Step: 107890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:25:29,522-Speed 5972.33 samples/sec Loss 7.5346 LearningRate 0.1136 Epoch: 10 Global 
Step: 107900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 17:25:36,369-Speed 5983.41 samples/sec Loss 7.5666 LearningRate 0.1136 Epoch: 10 Global Step: 107910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:25:43,226-Speed 5974.72 samples/sec Loss 7.5816 LearningRate 0.1136 Epoch: 10 Global Step: 107920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 17:25:50,076-Speed 5980.51 samples/sec Loss 7.5327 LearningRate 0.1136 Epoch: 10 Global Step: 107930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:25:56,941-Speed 5968.25 samples/sec Loss 7.5378 LearningRate 0.1135 Epoch: 10 Global Step: 107940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:26:03,784-Speed 5987.10 samples/sec Loss 7.5943 LearningRate 0.1135 Epoch: 10 Global Step: 107950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:26:10,654-Speed 5962.07 samples/sec Loss 7.5218 LearningRate 0.1135 Epoch: 10 Global Step: 107960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:26:17,600-Speed 5901.60 samples/sec Loss 7.4394 LearningRate 0.1135 Epoch: 10 Global Step: 107970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:26:24,622-Speed 5834.09 samples/sec Loss 7.5880 LearningRate 0.1135 Epoch: 10 Global Step: 107980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:26:31,553-Speed 5911.16 samples/sec Loss 7.5780 LearningRate 0.1134 Epoch: 10 Global Step: 107990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:26:38,413-Speed 5972.14 samples/sec Loss 7.5326 LearningRate 0.1134 Epoch: 10 Global Step: 108000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:26:45,247-Speed 5994.40 samples/sec Loss 7.5427 LearningRate 0.1134 Epoch: 10 Global Step: 108010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:26:52,101-Speed 5976.19 samples/sec Loss 7.5711 LearningRate 0.1134 Epoch: 10 Global Step: 108020 Fp16 
Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:26:58,959-Speed 5974.28 samples/sec Loss 7.5460 LearningRate 0.1133 Epoch: 10 Global Step: 108030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:27:05,811-Speed 5981.15 samples/sec Loss 7.6009 LearningRate 0.1133 Epoch: 10 Global Step: 108040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:27:12,659-Speed 5981.88 samples/sec Loss 7.5689 LearningRate 0.1133 Epoch: 10 Global Step: 108050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:27:19,521-Speed 5970.81 samples/sec Loss 7.5627 LearningRate 0.1133 Epoch: 10 Global Step: 108060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:27:26,400-Speed 5955.78 samples/sec Loss 7.5644 LearningRate 0.1132 Epoch: 10 Global Step: 108070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:27:33,291-Speed 5944.27 samples/sec Loss 7.5955 LearningRate 0.1132 Epoch: 10 Global Step: 108080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:27:40,162-Speed 5962.82 samples/sec Loss 7.6258 LearningRate 0.1132 Epoch: 10 Global Step: 108090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:27:47,048-Speed 5949.88 samples/sec Loss 7.5916 LearningRate 0.1132 Epoch: 10 Global Step: 108100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:27:53,892-Speed 5985.43 samples/sec Loss 7.5289 LearningRate 0.1132 Epoch: 10 Global Step: 108110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:28:00,772-Speed 5955.44 samples/sec Loss 7.5185 LearningRate 0.1131 Epoch: 10 Global Step: 108120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:28:07,641-Speed 5964.47 samples/sec Loss 7.5417 LearningRate 0.1131 Epoch: 10 Global Step: 108130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:28:14,486-Speed 5984.24 samples/sec Loss 7.5273 LearningRate 0.1131 Epoch: 10 Global Step: 108140 Fp16 Grad Scale: 131072 Required: 
19 hours Training: 2022-01-08 17:28:21,354-Speed 5966.57 samples/sec Loss 7.5610 LearningRate 0.1131 Epoch: 10 Global Step: 108150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:28:28,206-Speed 5980.16 samples/sec Loss 7.5177 LearningRate 0.1130 Epoch: 10 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:28:35,041-Speed 5993.71 samples/sec Loss 7.5939 LearningRate 0.1130 Epoch: 10 Global Step: 108170 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:28:41,917-Speed 5960.82 samples/sec Loss 7.5583 LearningRate 0.1130 Epoch: 10 Global Step: 108180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:28:48,820-Speed 5934.42 samples/sec Loss 7.5644 LearningRate 0.1130 Epoch: 10 Global Step: 108190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:28:55,692-Speed 5961.96 samples/sec Loss 7.5270 LearningRate 0.1130 Epoch: 10 Global Step: 108200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:29:02,540-Speed 5983.95 samples/sec Loss 7.5802 LearningRate 0.1129 Epoch: 10 Global Step: 108210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:29:09,418-Speed 5956.34 samples/sec Loss 7.5423 LearningRate 0.1129 Epoch: 10 Global Step: 108220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:29:16,261-Speed 5986.42 samples/sec Loss 7.5599 LearningRate 0.1129 Epoch: 10 Global Step: 108230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:29:23,110-Speed 5984.21 samples/sec Loss 7.5051 LearningRate 0.1129 Epoch: 10 Global Step: 108240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:29:29,967-Speed 5974.31 samples/sec Loss 7.6013 LearningRate 0.1128 Epoch: 10 Global Step: 108250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:29:36,865-Speed 5939.11 samples/sec Loss 7.5051 LearningRate 0.1128 Epoch: 10 Global Step: 108260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 
17:29:43,736-Speed 5962.13 samples/sec Loss 7.5988 LearningRate 0.1128 Epoch: 10 Global Step: 108270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:29:50,597-Speed 5971.27 samples/sec Loss 7.5866 LearningRate 0.1128 Epoch: 10 Global Step: 108280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:29:57,453-Speed 5974.92 samples/sec Loss 7.4988 LearningRate 0.1127 Epoch: 10 Global Step: 108290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:30:04,315-Speed 5970.47 samples/sec Loss 7.4987 LearningRate 0.1127 Epoch: 10 Global Step: 108300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:30:11,181-Speed 5967.22 samples/sec Loss 7.4950 LearningRate 0.1127 Epoch: 10 Global Step: 108310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:30:18,050-Speed 5964.45 samples/sec Loss 7.5163 LearningRate 0.1127 Epoch: 10 Global Step: 108320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:30:24,930-Speed 5954.27 samples/sec Loss 7.5203 LearningRate 0.1127 Epoch: 10 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:30:31,901-Speed 5877.55 samples/sec Loss 7.5787 LearningRate 0.1126 Epoch: 10 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:30:38,781-Speed 5953.87 samples/sec Loss 7.5723 LearningRate 0.1126 Epoch: 10 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:30:45,686-Speed 5933.94 samples/sec Loss 7.5624 LearningRate 0.1126 Epoch: 10 Global Step: 108360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:30:52,551-Speed 5967.03 samples/sec Loss 7.5515 LearningRate 0.1126 Epoch: 10 Global Step: 108370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:30:59,406-Speed 5976.16 samples/sec Loss 7.5381 LearningRate 0.1125 Epoch: 10 Global Step: 108380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:31:06,269-Speed 5969.79 
samples/sec Loss 7.5163 LearningRate 0.1125 Epoch: 10 Global Step: 108390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:31:13,131-Speed 5969.85 samples/sec Loss 7.4937 LearningRate 0.1125 Epoch: 10 Global Step: 108400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:31:19,978-Speed 5983.11 samples/sec Loss 7.5433 LearningRate 0.1125 Epoch: 10 Global Step: 108410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:31:26,847-Speed 5964.14 samples/sec Loss 7.4792 LearningRate 0.1125 Epoch: 10 Global Step: 108420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:31:33,714-Speed 5966.13 samples/sec Loss 7.5833 LearningRate 0.1124 Epoch: 10 Global Step: 108430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:31:40,560-Speed 5984.16 samples/sec Loss 7.5692 LearningRate 0.1124 Epoch: 10 Global Step: 108440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:31:47,428-Speed 5965.18 samples/sec Loss 7.4860 LearningRate 0.1124 Epoch: 10 Global Step: 108450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:31:54,296-Speed 5965.34 samples/sec Loss 7.5311 LearningRate 0.1124 Epoch: 10 Global Step: 108460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:01,145-Speed 5981.37 samples/sec Loss 7.5116 LearningRate 0.1123 Epoch: 10 Global Step: 108470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:08,007-Speed 5970.40 samples/sec Loss 7.5152 LearningRate 0.1123 Epoch: 10 Global Step: 108480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:14,869-Speed 5969.91 samples/sec Loss 7.5499 LearningRate 0.1123 Epoch: 10 Global Step: 108490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:21,865-Speed 5856.17 samples/sec Loss 7.4803 LearningRate 0.1123 Epoch: 10 Global Step: 108500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:28,727-Speed 5970.20 samples/sec Loss 
7.4934 LearningRate 0.1122 Epoch: 10 Global Step: 108510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:35,581-Speed 5977.31 samples/sec Loss 7.5284 LearningRate 0.1122 Epoch: 10 Global Step: 108520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:42,432-Speed 5980.12 samples/sec Loss 7.4927 LearningRate 0.1122 Epoch: 10 Global Step: 108530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:49,335-Speed 5934.31 samples/sec Loss 7.5260 LearningRate 0.1122 Epoch: 10 Global Step: 108540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:32:56,268-Speed 5909.23 samples/sec Loss 7.5459 LearningRate 0.1122 Epoch: 10 Global Step: 108550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:33:03,161-Speed 5943.25 samples/sec Loss 7.5878 LearningRate 0.1121 Epoch: 10 Global Step: 108560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:33:10,054-Speed 5944.15 samples/sec Loss 7.5755 LearningRate 0.1121 Epoch: 10 Global Step: 108570 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 17:33:16,909-Speed 5975.84 samples/sec Loss 7.5807 LearningRate 0.1121 Epoch: 10 Global Step: 108580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:33:23,768-Speed 5972.43 samples/sec Loss 7.4804 LearningRate 0.1121 Epoch: 10 Global Step: 108590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:33:30,636-Speed 5965.62 samples/sec Loss 7.5263 LearningRate 0.1120 Epoch: 10 Global Step: 108600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:33:37,514-Speed 5956.23 samples/sec Loss 7.5312 LearningRate 0.1120 Epoch: 10 Global Step: 108610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:33:44,432-Speed 5921.37 samples/sec Loss 7.5098 LearningRate 0.1120 Epoch: 10 Global Step: 108620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:33:51,291-Speed 5973.02 samples/sec Loss 7.4886 LearningRate 
0.1120 Epoch: 10 Global Step: 108630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:33:58,183-Speed 5944.96 samples/sec Loss 7.4717 LearningRate 0.1120 Epoch: 10 Global Step: 108640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:34:05,036-Speed 5977.19 samples/sec Loss 7.5712 LearningRate 0.1119 Epoch: 10 Global Step: 108650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:34:11,904-Speed 5965.52 samples/sec Loss 7.5304 LearningRate 0.1119 Epoch: 10 Global Step: 108660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:34:18,763-Speed 5972.80 samples/sec Loss 7.5128 LearningRate 0.1119 Epoch: 10 Global Step: 108670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:34:25,615-Speed 5978.93 samples/sec Loss 7.5060 LearningRate 0.1119 Epoch: 10 Global Step: 108680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:34:32,460-Speed 5985.38 samples/sec Loss 7.4916 LearningRate 0.1118 Epoch: 10 Global Step: 108690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:34:39,331-Speed 5962.54 samples/sec Loss 7.4616 LearningRate 0.1118 Epoch: 10 Global Step: 108700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:34:46,199-Speed 5965.16 samples/sec Loss 7.5074 LearningRate 0.1118 Epoch: 10 Global Step: 108710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:34:53,120-Speed 5919.26 samples/sec Loss 7.4830 LearningRate 0.1118 Epoch: 10 Global Step: 108720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:00,054-Speed 5908.51 samples/sec Loss 7.4803 LearningRate 0.1117 Epoch: 10 Global Step: 108730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:06,895-Speed 5987.54 samples/sec Loss 7.4962 LearningRate 0.1117 Epoch: 10 Global Step: 108740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:13,745-Speed 5980.98 samples/sec Loss 7.5230 LearningRate 0.1117 Epoch: 10 Global 
Step: 108750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:20,619-Speed 5959.70 samples/sec Loss 7.4766 LearningRate 0.1117 Epoch: 10 Global Step: 108760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:27,475-Speed 5975.43 samples/sec Loss 7.5133 LearningRate 0.1117 Epoch: 10 Global Step: 108770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:34,320-Speed 5984.80 samples/sec Loss 7.4674 LearningRate 0.1116 Epoch: 10 Global Step: 108780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:41,185-Speed 5968.04 samples/sec Loss 7.5124 LearningRate 0.1116 Epoch: 10 Global Step: 108790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:48,036-Speed 5979.63 samples/sec Loss 7.5803 LearningRate 0.1116 Epoch: 10 Global Step: 108800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:35:54,890-Speed 5977.51 samples/sec Loss 7.5147 LearningRate 0.1116 Epoch: 10 Global Step: 108810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:01,742-Speed 5978.63 samples/sec Loss 7.5622 LearningRate 0.1115 Epoch: 10 Global Step: 108820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:08,641-Speed 5938.71 samples/sec Loss 7.5636 LearningRate 0.1115 Epoch: 10 Global Step: 108830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:15,570-Speed 5912.07 samples/sec Loss 7.4509 LearningRate 0.1115 Epoch: 10 Global Step: 108840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:22,435-Speed 5967.45 samples/sec Loss 7.4992 LearningRate 0.1115 Epoch: 10 Global Step: 108850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:29,300-Speed 5967.68 samples/sec Loss 7.5009 LearningRate 0.1115 Epoch: 10 Global Step: 108860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:36,165-Speed 5967.19 samples/sec Loss 7.5259 LearningRate 0.1114 Epoch: 10 Global Step: 108870 Fp16 
Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:43,063-Speed 5939.53 samples/sec Loss 7.5121 LearningRate 0.1114 Epoch: 10 Global Step: 108880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:49,907-Speed 5985.79 samples/sec Loss 7.4779 LearningRate 0.1114 Epoch: 10 Global Step: 108890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:36:56,874-Speed 5880.12 samples/sec Loss 7.4128 LearningRate 0.1114 Epoch: 10 Global Step: 108900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:37:03,751-Speed 5957.57 samples/sec Loss 7.5377 LearningRate 0.1113 Epoch: 10 Global Step: 108910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:37:10,605-Speed 5976.55 samples/sec Loss 7.4855 LearningRate 0.1113 Epoch: 10 Global Step: 108920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:37:17,459-Speed 5977.10 samples/sec Loss 7.4772 LearningRate 0.1113 Epoch: 10 Global Step: 108930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:37:24,310-Speed 5979.59 samples/sec Loss 7.4792 LearningRate 0.1113 Epoch: 10 Global Step: 108940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:37:31,176-Speed 5966.99 samples/sec Loss 7.4972 LearningRate 0.1112 Epoch: 10 Global Step: 108950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:37:38,046-Speed 5963.89 samples/sec Loss 7.4504 LearningRate 0.1112 Epoch: 10 Global Step: 108960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:37:44,900-Speed 5977.04 samples/sec Loss 7.4669 LearningRate 0.1112 Epoch: 10 Global Step: 108970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:37:51,773-Speed 5960.42 samples/sec Loss 7.5284 LearningRate 0.1112 Epoch: 10 Global Step: 108980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:37:58,639-Speed 5966.77 samples/sec Loss 7.4872 LearningRate 0.1112 Epoch: 10 Global Step: 108990 Fp16 Grad Scale: 65536 
Required: 19 hours Training: 2022-01-08 17:38:05,498-Speed 5972.34 samples/sec Loss 7.5012 LearningRate 0.1111 Epoch: 10 Global Step: 109000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:38:12,349-Speed 5980.29 samples/sec Loss 7.4536 LearningRate 0.1111 Epoch: 10 Global Step: 109010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:38:19,208-Speed 5972.86 samples/sec Loss 7.5161 LearningRate 0.1111 Epoch: 10 Global Step: 109020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:38:26,082-Speed 5960.46 samples/sec Loss 7.4497 LearningRate 0.1111 Epoch: 10 Global Step: 109030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:38:32,945-Speed 5970.10 samples/sec Loss 7.4838 LearningRate 0.1110 Epoch: 10 Global Step: 109040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:38:39,812-Speed 5965.20 samples/sec Loss 7.4709 LearningRate 0.1110 Epoch: 10 Global Step: 109050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:38:46,676-Speed 5968.40 samples/sec Loss 7.4587 LearningRate 0.1110 Epoch: 10 Global Step: 109060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:38:53,535-Speed 5972.58 samples/sec Loss 7.4570 LearningRate 0.1110 Epoch: 10 Global Step: 109070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:39:00,394-Speed 5973.01 samples/sec Loss 7.3944 LearningRate 0.1110 Epoch: 10 Global Step: 109080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:39:07,285-Speed 5945.20 samples/sec Loss 7.5756 LearningRate 0.1109 Epoch: 10 Global Step: 109090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:39:14,131-Speed 5984.41 samples/sec Loss 7.5479 LearningRate 0.1109 Epoch: 10 Global Step: 109100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:39:21,010-Speed 5955.32 samples/sec Loss 7.5055 LearningRate 0.1109 Epoch: 10 Global Step: 109110 Fp16 Grad Scale: 131072 Required: 19 hours 
Training: 2022-01-08 17:39:27,868-Speed 5973.31 samples/sec Loss 7.4515 LearningRate 0.1109 Epoch: 10 Global Step: 109120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:39:34,724-Speed 5976.32 samples/sec Loss 7.4520 LearningRate 0.1108 Epoch: 10 Global Step: 109130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:39:41,568-Speed 5985.33 samples/sec Loss 7.4796 LearningRate 0.1108 Epoch: 10 Global Step: 109140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:39:48,443-Speed 5959.11 samples/sec Loss 7.4850 LearningRate 0.1108 Epoch: 10 Global Step: 109150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:39:55,302-Speed 5973.88 samples/sec Loss 7.5030 LearningRate 0.1108 Epoch: 10 Global Step: 109160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:40:02,157-Speed 5976.20 samples/sec Loss 7.5161 LearningRate 0.1108 Epoch: 10 Global Step: 109170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:40:09,056-Speed 5938.54 samples/sec Loss 7.5090 LearningRate 0.1107 Epoch: 10 Global Step: 109180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:40:15,962-Speed 5932.28 samples/sec Loss 7.4834 LearningRate 0.1107 Epoch: 10 Global Step: 109190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:40:22,852-Speed 5946.43 samples/sec Loss 7.4258 LearningRate 0.1107 Epoch: 10 Global Step: 109200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:40:29,751-Speed 5938.46 samples/sec Loss 7.4670 LearningRate 0.1107 Epoch: 10 Global Step: 109210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:40:36,616-Speed 5967.80 samples/sec Loss 7.4949 LearningRate 0.1106 Epoch: 10 Global Step: 109220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:40:43,490-Speed 5959.43 samples/sec Loss 7.4706 LearningRate 0.1106 Epoch: 10 Global Step: 109230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 
17:40:50,352-Speed 5970.99 samples/sec Loss 7.4971 LearningRate 0.1106 Epoch: 10 Global Step: 109240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:40:57,225-Speed 5960.48 samples/sec Loss 7.4770 LearningRate 0.1106 Epoch: 10 Global Step: 109250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:41:04,092-Speed 5966.22 samples/sec Loss 7.4591 LearningRate 0.1105 Epoch: 10 Global Step: 109260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:41:10,943-Speed 5979.80 samples/sec Loss 7.4965 LearningRate 0.1105 Epoch: 10 Global Step: 109270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:41:17,795-Speed 5978.73 samples/sec Loss 7.4575 LearningRate 0.1105 Epoch: 10 Global Step: 109280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:41:24,648-Speed 5978.65 samples/sec Loss 7.4486 LearningRate 0.1105 Epoch: 10 Global Step: 109290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:41:31,518-Speed 5963.20 samples/sec Loss 7.4673 LearningRate 0.1105 Epoch: 10 Global Step: 109300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:41:38,384-Speed 5967.14 samples/sec Loss 7.5152 LearningRate 0.1104 Epoch: 10 Global Step: 109310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:41:45,256-Speed 5961.61 samples/sec Loss 7.4680 LearningRate 0.1104 Epoch: 10 Global Step: 109320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:41:52,135-Speed 5955.28 samples/sec Loss 7.5237 LearningRate 0.1104 Epoch: 10 Global Step: 109330 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 17:41:59,020-Speed 5950.48 samples/sec Loss 7.4288 LearningRate 0.1104 Epoch: 10 Global Step: 109340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:42:05,914-Speed 5942.91 samples/sec Loss 7.5156 LearningRate 0.1103 Epoch: 10 Global Step: 109350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:42:12,800-Speed 
5949.29 samples/sec Loss 7.4588 LearningRate 0.1103 Epoch: 10 Global Step: 109360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:42:19,645-Speed 5984.92 samples/sec Loss 7.4452 LearningRate 0.1103 Epoch: 10 Global Step: 109370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:42:26,495-Speed 5980.70 samples/sec Loss 7.4805 LearningRate 0.1103 Epoch: 10 Global Step: 109380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:42:33,349-Speed 5976.90 samples/sec Loss 7.4303 LearningRate 0.1103 Epoch: 10 Global Step: 109390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:42:40,212-Speed 5970.10 samples/sec Loss 7.4705 LearningRate 0.1102 Epoch: 10 Global Step: 109400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:42:47,069-Speed 5974.33 samples/sec Loss 7.4670 LearningRate 0.1102 Epoch: 10 Global Step: 109410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:42:53,931-Speed 5970.72 samples/sec Loss 7.5174 LearningRate 0.1102 Epoch: 10 Global Step: 109420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:43:00,771-Speed 5989.42 samples/sec Loss 7.4407 LearningRate 0.1102 Epoch: 10 Global Step: 109430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:43:07,640-Speed 5964.21 samples/sec Loss 7.4570 LearningRate 0.1101 Epoch: 10 Global Step: 109440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:43:14,493-Speed 5977.76 samples/sec Loss 7.4873 LearningRate 0.1101 Epoch: 10 Global Step: 109450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:43:21,358-Speed 5968.06 samples/sec Loss 7.4553 LearningRate 0.1101 Epoch: 10 Global Step: 109460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:43:28,202-Speed 5984.85 samples/sec Loss 7.4272 LearningRate 0.1101 Epoch: 10 Global Step: 109470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:43:35,065-Speed 5969.01 samples/sec Loss 
7.3913 LearningRate 0.1101 Epoch: 10 Global Step: 109480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:43:41,923-Speed 5973.94 samples/sec Loss 7.4971 LearningRate 0.1100 Epoch: 10 Global Step: 109490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:43:48,772-Speed 5981.84 samples/sec Loss 7.4097 LearningRate 0.1100 Epoch: 10 Global Step: 109500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:43:55,643-Speed 5962.78 samples/sec Loss 7.4212 LearningRate 0.1100 Epoch: 10 Global Step: 109510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:44:02,513-Speed 5963.42 samples/sec Loss 7.4602 LearningRate 0.1100 Epoch: 10 Global Step: 109520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:44:09,362-Speed 5981.92 samples/sec Loss 7.3830 LearningRate 0.1099 Epoch: 10 Global Step: 109530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:44:16,234-Speed 5961.28 samples/sec Loss 7.4431 LearningRate 0.1099 Epoch: 10 Global Step: 109540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:44:23,207-Speed 5880.95 samples/sec Loss 7.4595 LearningRate 0.1099 Epoch: 10 Global Step: 109550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:44:30,181-Speed 5874.40 samples/sec Loss 7.4487 LearningRate 0.1099 Epoch: 10 Global Step: 109560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:44:37,070-Speed 5947.24 samples/sec Loss 7.4176 LearningRate 0.1099 Epoch: 10 Global Step: 109570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:44:43,918-Speed 5983.88 samples/sec Loss 7.4386 LearningRate 0.1098 Epoch: 10 Global Step: 109580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:44:51,072-Speed 5726.54 samples/sec Loss 7.4186 LearningRate 0.1098 Epoch: 10 Global Step: 109590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:44:57,948-Speed 5958.61 samples/sec Loss 7.5088 LearningRate 0.1098 
Epoch: 10 Global Step: 109600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:45:04,805-Speed 5974.69 samples/sec Loss 7.5842 LearningRate 0.1098 Epoch: 10 Global Step: 109610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:45:11,652-Speed 5983.01 samples/sec Loss 7.5439 LearningRate 0.1097 Epoch: 10 Global Step: 109620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:45:18,519-Speed 5965.66 samples/sec Loss 7.5299 LearningRate 0.1097 Epoch: 10 Global Step: 109630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:45:25,371-Speed 5980.37 samples/sec Loss 7.4682 LearningRate 0.1097 Epoch: 10 Global Step: 109640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:45:32,228-Speed 5974.31 samples/sec Loss 7.4468 LearningRate 0.1097 Epoch: 10 Global Step: 109650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:45:39,076-Speed 5982.43 samples/sec Loss 7.5132 LearningRate 0.1096 Epoch: 10 Global Step: 109660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:45:45,938-Speed 5971.10 samples/sec Loss 7.4564 LearningRate 0.1096 Epoch: 10 Global Step: 109670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:45:52,790-Speed 5978.83 samples/sec Loss 7.3970 LearningRate 0.1096 Epoch: 10 Global Step: 109680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:45:59,635-Speed 5985.14 samples/sec Loss 7.4191 LearningRate 0.1096 Epoch: 10 Global Step: 109690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:46:06,480-Speed 5985.24 samples/sec Loss 7.4409 LearningRate 0.1096 Epoch: 10 Global Step: 109700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:46:13,343-Speed 5969.29 samples/sec Loss 7.3625 LearningRate 0.1095 Epoch: 10 Global Step: 109710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:46:20,205-Speed 5970.52 samples/sec Loss 7.4050 LearningRate 0.1095 Epoch: 10 Global Step: 109720 
Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:46:27,076-Speed 5962.47 samples/sec Loss 7.3850 LearningRate 0.1095 Epoch: 10 Global Step: 109730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:46:33,942-Speed 5966.41 samples/sec Loss 7.4923 LearningRate 0.1095 Epoch: 10 Global Step: 109740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:46:40,809-Speed 5966.64 samples/sec Loss 7.4577 LearningRate 0.1094 Epoch: 10 Global Step: 109750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:46:47,656-Speed 5982.88 samples/sec Loss 7.4856 LearningRate 0.1094 Epoch: 10 Global Step: 109760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:46:54,506-Speed 5982.72 samples/sec Loss 7.4598 LearningRate 0.1094 Epoch: 10 Global Step: 109770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:47:01,360-Speed 5978.18 samples/sec Loss 7.4101 LearningRate 0.1094 Epoch: 10 Global Step: 109780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:47:08,307-Speed 5897.61 samples/sec Loss 7.4378 LearningRate 0.1094 Epoch: 10 Global Step: 109790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:47:15,231-Speed 5916.19 samples/sec Loss 7.4948 LearningRate 0.1093 Epoch: 10 Global Step: 109800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:47:22,157-Speed 5915.68 samples/sec Loss 7.4380 LearningRate 0.1093 Epoch: 10 Global Step: 109810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:47:29,081-Speed 5916.96 samples/sec Loss 7.5022 LearningRate 0.1093 Epoch: 10 Global Step: 109820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:47:36,025-Speed 5898.84 samples/sec Loss 7.3652 LearningRate 0.1093 Epoch: 10 Global Step: 109830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:47:42,952-Speed 5914.85 samples/sec Loss 7.4172 LearningRate 0.1092 Epoch: 10 Global Step: 109840 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-01-08 17:47:49,855-Speed 5934.58 samples/sec Loss 7.3834 LearningRate 0.1092 Epoch: 10 Global Step: 109850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:47:56,773-Speed 5922.10 samples/sec Loss 7.4472 LearningRate 0.1092 Epoch: 10 Global Step: 109860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:48:03,696-Speed 5917.34 samples/sec Loss 7.3754 LearningRate 0.1092 Epoch: 10 Global Step: 109870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:48:10,601-Speed 5933.05 samples/sec Loss 7.3794 LearningRate 0.1092 Epoch: 10 Global Step: 109880 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 17:48:17,534-Speed 5909.26 samples/sec Loss 7.4213 LearningRate 0.1091 Epoch: 10 Global Step: 109890 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 17:48:24,420-Speed 5949.84 samples/sec Loss 7.4333 LearningRate 0.1091 Epoch: 10 Global Step: 109900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:48:31,331-Speed 5927.52 samples/sec Loss 7.4148 LearningRate 0.1091 Epoch: 10 Global Step: 109910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:48:38,214-Speed 5952.21 samples/sec Loss 7.4880 LearningRate 0.1091 Epoch: 10 Global Step: 109920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:48:45,077-Speed 5969.16 samples/sec Loss 7.4018 LearningRate 0.1090 Epoch: 10 Global Step: 109930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:48:51,944-Speed 5966.44 samples/sec Loss 7.4367 LearningRate 0.1090 Epoch: 10 Global Step: 109940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:48:58,901-Speed 5888.58 samples/sec Loss 7.4271 LearningRate 0.1090 Epoch: 10 Global Step: 109950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:49:05,763-Speed 5969.79 samples/sec Loss 7.3763 LearningRate 0.1090 Epoch: 10 Global Step: 109960 Fp16 Grad Scale: 131072 Required: 19 hours 
Training: 2022-01-08 17:49:12,621-Speed 5973.96 samples/sec Loss 7.4033 LearningRate 0.1090 Epoch: 10 Global Step: 109970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:49:19,470-Speed 5981.38 samples/sec Loss 7.4417 LearningRate 0.1089 Epoch: 10 Global Step: 109980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:49:26,333-Speed 5969.29 samples/sec Loss 7.3959 LearningRate 0.1089 Epoch: 10 Global Step: 109990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:49:33,177-Speed 5985.93 samples/sec Loss 7.4159 LearningRate 0.1089 Epoch: 10 Global Step: 110000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:49:59,867-[lfw][110000]XNorm: 22.489254 Training: 2022-01-08 17:49:59,868-[lfw][110000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-01-08 17:49:59,868-[lfw][110000]Accuracy-Highest: 0.99783 Training: 2022-01-08 17:50:30,697-[cfp_fp][110000]XNorm: 19.859721 Training: 2022-01-08 17:50:30,698-[cfp_fp][110000]Accuracy-Flip: 0.98057+-0.00613 Training: 2022-01-08 17:50:30,699-[cfp_fp][110000]Accuracy-Highest: 0.98357 Training: 2022-01-08 17:50:57,367-[agedb_30][110000]XNorm: 21.727236 Training: 2022-01-08 17:50:57,368-[agedb_30][110000]Accuracy-Flip: 0.96850+-0.00647 Training: 2022-01-08 17:50:57,368-[agedb_30][110000]Accuracy-Highest: 0.97200 Training: 2022-01-08 17:51:04,224-Speed 449.88 samples/sec Loss 7.4067 LearningRate 0.1089 Epoch: 10 Global Step: 110010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:51:11,116-Speed 5945.95 samples/sec Loss 7.4301 LearningRate 0.1088 Epoch: 10 Global Step: 110020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:51:18,076-Speed 5886.12 samples/sec Loss 7.4576 LearningRate 0.1088 Epoch: 10 Global Step: 110030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:51:24,927-Speed 5979.63 samples/sec Loss 7.4126 LearningRate 0.1088 Epoch: 10 Global Step: 110040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-01-08 17:51:31,796-Speed 5964.20 samples/sec Loss 7.3935 LearningRate 0.1088 Epoch: 10 Global Step: 110050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:51:38,664-Speed 5965.56 samples/sec Loss 7.3637 LearningRate 0.1088 Epoch: 10 Global Step: 110060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:51:45,539-Speed 5961.09 samples/sec Loss 7.4348 LearningRate 0.1087 Epoch: 10 Global Step: 110070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:51:52,416-Speed 5956.61 samples/sec Loss 7.3654 LearningRate 0.1087 Epoch: 10 Global Step: 110080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:51:59,279-Speed 5969.15 samples/sec Loss 7.4266 LearningRate 0.1087 Epoch: 10 Global Step: 110090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:52:06,162-Speed 5954.79 samples/sec Loss 7.3873 LearningRate 0.1087 Epoch: 10 Global Step: 110100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:52:13,035-Speed 5960.26 samples/sec Loss 7.4282 LearningRate 0.1086 Epoch: 10 Global Step: 110110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:52:19,904-Speed 5964.42 samples/sec Loss 7.4589 LearningRate 0.1086 Epoch: 10 Global Step: 110120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:52:26,771-Speed 5965.50 samples/sec Loss 7.4172 LearningRate 0.1086 Epoch: 10 Global Step: 110130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:52:33,627-Speed 5975.20 samples/sec Loss 7.4838 LearningRate 0.1086 Epoch: 10 Global Step: 110140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:52:40,517-Speed 5946.29 samples/sec Loss 7.4093 LearningRate 0.1086 Epoch: 10 Global Step: 110150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:52:47,379-Speed 5970.52 samples/sec Loss 7.3929 LearningRate 0.1085 Epoch: 10 Global Step: 110160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 
17:52:54,245-Speed 5966.73 samples/sec Loss 7.4564 LearningRate 0.1085 Epoch: 10 Global Step: 110170 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 17:53:01,114-Speed 5964.11 samples/sec Loss 7.4164 LearningRate 0.1085 Epoch: 10 Global Step: 110180 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 17:53:07,990-Speed 5957.98 samples/sec Loss 7.3989 LearningRate 0.1085 Epoch: 10 Global Step: 110190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:53:14,852-Speed 5970.32 samples/sec Loss 7.4356 LearningRate 0.1084 Epoch: 10 Global Step: 110200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:53:21,701-Speed 5981.97 samples/sec Loss 7.4361 LearningRate 0.1084 Epoch: 10 Global Step: 110210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:53:28,557-Speed 5975.46 samples/sec Loss 7.4413 LearningRate 0.1084 Epoch: 10 Global Step: 110220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:53:35,404-Speed 5983.47 samples/sec Loss 7.4217 LearningRate 0.1084 Epoch: 10 Global Step: 110230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:53:42,247-Speed 5986.87 samples/sec Loss 7.3389 LearningRate 0.1084 Epoch: 10 Global Step: 110240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:53:49,087-Speed 5988.83 samples/sec Loss 7.3828 LearningRate 0.1083 Epoch: 10 Global Step: 110250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:53:55,938-Speed 5979.64 samples/sec Loss 7.4038 LearningRate 0.1083 Epoch: 10 Global Step: 110260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:54:02,791-Speed 5978.28 samples/sec Loss 7.3843 LearningRate 0.1083 Epoch: 10 Global Step: 110270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:54:09,640-Speed 5981.89 samples/sec Loss 7.4403 LearningRate 0.1083 Epoch: 10 Global Step: 110280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:54:16,499-Speed 
5972.41 samples/sec Loss 7.4091 LearningRate 0.1082 Epoch: 10 Global Step: 110290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:54:23,377-Speed 5956.95 samples/sec Loss 7.4115 LearningRate 0.1082 Epoch: 10 Global Step: 110300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:54:30,253-Speed 5958.38 samples/sec Loss 7.3554 LearningRate 0.1082 Epoch: 10 Global Step: 110310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:54:37,098-Speed 5984.17 samples/sec Loss 7.4583 LearningRate 0.1082 Epoch: 10 Global Step: 110320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:54:43,951-Speed 5980.54 samples/sec Loss 7.3978 LearningRate 0.1082 Epoch: 10 Global Step: 110330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:54:50,809-Speed 5977.36 samples/sec Loss 7.4416 LearningRate 0.1081 Epoch: 10 Global Step: 110340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:54:57,658-Speed 5980.93 samples/sec Loss 7.4339 LearningRate 0.1081 Epoch: 10 Global Step: 110350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:55:04,524-Speed 5968.10 samples/sec Loss 7.4270 LearningRate 0.1081 Epoch: 10 Global Step: 110360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:55:11,381-Speed 5974.35 samples/sec Loss 7.3873 LearningRate 0.1081 Epoch: 10 Global Step: 110370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:55:18,269-Speed 5947.81 samples/sec Loss 7.4266 LearningRate 0.1080 Epoch: 10 Global Step: 110380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:55:25,118-Speed 5980.96 samples/sec Loss 7.4043 LearningRate 0.1080 Epoch: 10 Global Step: 110390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:55:31,987-Speed 5965.18 samples/sec Loss 7.3513 LearningRate 0.1080 Epoch: 10 Global Step: 110400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:55:38,845-Speed 5972.55 samples/sec 
Loss 7.3343 LearningRate 0.1080 Epoch: 10 Global Step: 110410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:55:45,717-Speed 5961.52 samples/sec Loss 7.3560 LearningRate 0.1080 Epoch: 10 Global Step: 110420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:55:52,571-Speed 5977.85 samples/sec Loss 7.3826 LearningRate 0.1079 Epoch: 10 Global Step: 110430 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 17:55:59,430-Speed 5971.94 samples/sec Loss 7.3737 LearningRate 0.1079 Epoch: 10 Global Step: 110440 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 17:56:06,253-Speed 6004.63 samples/sec Loss 7.4276 LearningRate 0.1079 Epoch: 10 Global Step: 110450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:56:13,124-Speed 5962.36 samples/sec Loss 7.4500 LearningRate 0.1079 Epoch: 10 Global Step: 110460 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:56:19,980-Speed 5975.24 samples/sec Loss 7.3768 LearningRate 0.1078 Epoch: 10 Global Step: 110470 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:56:26,847-Speed 5966.80 samples/sec Loss 7.4014 LearningRate 0.1078 Epoch: 10 Global Step: 110480 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:56:33,725-Speed 5956.46 samples/sec Loss 7.4482 LearningRate 0.1078 Epoch: 10 Global Step: 110490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:56:40,611-Speed 5949.75 samples/sec Loss 7.3904 LearningRate 0.1078 Epoch: 10 Global Step: 110500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:56:47,461-Speed 5981.31 samples/sec Loss 7.4134 LearningRate 0.1078 Epoch: 10 Global Step: 110510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:56:54,314-Speed 5980.98 samples/sec Loss 7.3320 LearningRate 0.1077 Epoch: 10 Global Step: 110520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:57:01,169-Speed 5975.97 samples/sec Loss 7.4036 LearningRate 
0.1077 Epoch: 10 Global Step: 110530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:57:08,026-Speed 5975.07 samples/sec Loss 7.4333 LearningRate 0.1077 Epoch: 10 Global Step: 110540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 17:57:14,884-Speed 5973.75 samples/sec Loss 7.3865 LearningRate 0.1077 Epoch: 10 Global Step: 110550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:57:21,760-Speed 5958.25 samples/sec Loss 7.3681 LearningRate 0.1076 Epoch: 10 Global Step: 110560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:57:28,619-Speed 5975.57 samples/sec Loss 7.3860 LearningRate 0.1076 Epoch: 10 Global Step: 110570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:57:35,478-Speed 5972.47 samples/sec Loss 7.3800 LearningRate 0.1076 Epoch: 10 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:57:42,338-Speed 5971.56 samples/sec Loss 7.4409 LearningRate 0.1076 Epoch: 10 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:57:49,246-Speed 5930.55 samples/sec Loss 7.3543 LearningRate 0.1076 Epoch: 10 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:57:56,120-Speed 5961.16 samples/sec Loss 7.3528 LearningRate 0.1075 Epoch: 10 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:58:03,001-Speed 5953.83 samples/sec Loss 7.3582 LearningRate 0.1075 Epoch: 10 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:58:09,857-Speed 5975.65 samples/sec Loss 7.3337 LearningRate 0.1075 Epoch: 10 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:58:16,723-Speed 5967.24 samples/sec Loss 7.3426 LearningRate 0.1075 Epoch: 10 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:58:23,573-Speed 5980.07 samples/sec Loss 7.3237 LearningRate 0.1074 Epoch: 10 Global Step: 
110650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:58:30,442-Speed 5966.29 samples/sec Loss 7.3237 LearningRate 0.1074 Epoch: 10 Global Step: 110660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:58:37,323-Speed 5953.54 samples/sec Loss 7.3787 LearningRate 0.1074 Epoch: 10 Global Step: 110670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:58:44,227-Speed 5933.89 samples/sec Loss 7.3854 LearningRate 0.1074 Epoch: 10 Global Step: 110680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:58:51,118-Speed 5945.25 samples/sec Loss 7.3755 LearningRate 0.1074 Epoch: 10 Global Step: 110690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:58:57,978-Speed 5974.24 samples/sec Loss 7.4014 LearningRate 0.1073 Epoch: 10 Global Step: 110700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:59:04,835-Speed 5974.60 samples/sec Loss 7.3842 LearningRate 0.1073 Epoch: 10 Global Step: 110710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:59:11,699-Speed 5968.01 samples/sec Loss 7.3761 LearningRate 0.1073 Epoch: 10 Global Step: 110720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:59:18,552-Speed 5978.38 samples/sec Loss 7.3735 LearningRate 0.1073 Epoch: 10 Global Step: 110730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 17:59:25,426-Speed 5960.02 samples/sec Loss 7.4042 LearningRate 0.1072 Epoch: 10 Global Step: 110740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:59:32,271-Speed 5984.72 samples/sec Loss 7.3652 LearningRate 0.1072 Epoch: 10 Global Step: 110750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:59:39,138-Speed 5966.19 samples/sec Loss 7.3482 LearningRate 0.1072 Epoch: 10 Global Step: 110760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:59:45,984-Speed 5983.63 samples/sec Loss 7.4506 LearningRate 0.1072 Epoch: 10 Global Step: 110770 Fp16 Grad 
Scale: 65536 Required: 19 hours Training: 2022-01-08 17:59:52,862-Speed 5956.65 samples/sec Loss 7.3787 LearningRate 0.1072 Epoch: 10 Global Step: 110780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 17:59:59,715-Speed 5978.06 samples/sec Loss 7.3367 LearningRate 0.1071 Epoch: 10 Global Step: 110790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:00:06,572-Speed 5974.62 samples/sec Loss 7.2994 LearningRate 0.1071 Epoch: 10 Global Step: 110800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:00:13,435-Speed 5969.67 samples/sec Loss 7.3800 LearningRate 0.1071 Epoch: 10 Global Step: 110810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:00:20,288-Speed 5977.87 samples/sec Loss 7.3668 LearningRate 0.1071 Epoch: 10 Global Step: 110820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:00:27,164-Speed 5958.31 samples/sec Loss 7.3424 LearningRate 0.1070 Epoch: 10 Global Step: 110830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:00:34,048-Speed 5951.54 samples/sec Loss 7.3503 LearningRate 0.1070 Epoch: 10 Global Step: 110840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:00:40,904-Speed 5975.24 samples/sec Loss 7.3512 LearningRate 0.1070 Epoch: 10 Global Step: 110850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:00:47,759-Speed 5976.02 samples/sec Loss 7.3111 LearningRate 0.1070 Epoch: 10 Global Step: 110860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:00:54,612-Speed 5977.90 samples/sec Loss 7.3277 LearningRate 0.1070 Epoch: 10 Global Step: 110870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:01:01,476-Speed 5968.58 samples/sec Loss 7.4074 LearningRate 0.1069 Epoch: 10 Global Step: 110880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:01:08,368-Speed 5944.28 samples/sec Loss 7.3953 LearningRate 0.1069 Epoch: 10 Global Step: 110890 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-01-08 18:01:15,223-Speed 5977.70 samples/sec Loss 7.3732 LearningRate 0.1069 Epoch: 10 Global Step: 110900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:01:22,100-Speed 5957.62 samples/sec Loss 7.3347 LearningRate 0.1069 Epoch: 10 Global Step: 110910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:01:28,960-Speed 5971.48 samples/sec Loss 7.3178 LearningRate 0.1068 Epoch: 10 Global Step: 110920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:01:35,871-Speed 5928.55 samples/sec Loss 7.3569 LearningRate 0.1068 Epoch: 10 Global Step: 110930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:01:42,727-Speed 5975.86 samples/sec Loss 7.3431 LearningRate 0.1068 Epoch: 10 Global Step: 110940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:01:49,585-Speed 5973.84 samples/sec Loss 7.3037 LearningRate 0.1068 Epoch: 10 Global Step: 110950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:01:56,430-Speed 5985.41 samples/sec Loss 7.3370 LearningRate 0.1068 Epoch: 10 Global Step: 110960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:02:03,288-Speed 5973.65 samples/sec Loss 7.3581 LearningRate 0.1067 Epoch: 10 Global Step: 110970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:02:10,163-Speed 5958.85 samples/sec Loss 7.3473 LearningRate 0.1067 Epoch: 10 Global Step: 110980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:02:17,001-Speed 5991.25 samples/sec Loss 7.2935 LearningRate 0.1067 Epoch: 10 Global Step: 110990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:02:23,846-Speed 5985.47 samples/sec Loss 7.3775 LearningRate 0.1067 Epoch: 10 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:02:30,713-Speed 5966.02 samples/sec Loss 7.3338 LearningRate 0.1066 Epoch: 10 Global Step: 111010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 
18:02:37,590-Speed 5957.12 samples/sec Loss 7.3795 LearningRate 0.1066 Epoch: 10 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:02:44,464-Speed 5960.25 samples/sec Loss 7.3585 LearningRate 0.1066 Epoch: 10 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:02:51,336-Speed 5961.29 samples/sec Loss 7.3396 LearningRate 0.1066 Epoch: 10 Global Step: 111040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:02:58,227-Speed 5945.11 samples/sec Loss 7.3105 LearningRate 0.1066 Epoch: 10 Global Step: 111050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:03:05,088-Speed 5972.16 samples/sec Loss 7.3964 LearningRate 0.1065 Epoch: 10 Global Step: 111060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:03:11,963-Speed 5959.21 samples/sec Loss 7.3312 LearningRate 0.1065 Epoch: 10 Global Step: 111070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:03:18,841-Speed 5956.43 samples/sec Loss 7.3371 LearningRate 0.1065 Epoch: 10 Global Step: 111080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:03:25,706-Speed 5967.60 samples/sec Loss 7.3677 LearningRate 0.1065 Epoch: 10 Global Step: 111090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:03:32,553-Speed 5983.45 samples/sec Loss 7.2698 LearningRate 0.1064 Epoch: 10 Global Step: 111100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:03:39,416-Speed 5969.42 samples/sec Loss 7.3250 LearningRate 0.1064 Epoch: 10 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:03:46,266-Speed 5980.63 samples/sec Loss 7.3602 LearningRate 0.1064 Epoch: 10 Global Step: 111120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:03:53,123-Speed 5974.08 samples/sec Loss 7.3218 LearningRate 0.1064 Epoch: 10 Global Step: 111130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:03:59,968-Speed 5984.71 
samples/sec Loss 7.2759 LearningRate 0.1064 Epoch: 10 Global Step: 111140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:04:06,819-Speed 5979.72 samples/sec Loss 7.4083 LearningRate 0.1063 Epoch: 10 Global Step: 111150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:04:13,670-Speed 5979.93 samples/sec Loss 7.3773 LearningRate 0.1063 Epoch: 10 Global Step: 111160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:04:20,545-Speed 5958.71 samples/sec Loss 7.3641 LearningRate 0.1063 Epoch: 10 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:04:27,419-Speed 5959.49 samples/sec Loss 7.2853 LearningRate 0.1063 Epoch: 10 Global Step: 111180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:04:34,260-Speed 5989.06 samples/sec Loss 7.3887 LearningRate 0.1062 Epoch: 10 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:04:41,111-Speed 5981.66 samples/sec Loss 7.3735 LearningRate 0.1062 Epoch: 10 Global Step: 111200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:04:47,969-Speed 5974.04 samples/sec Loss 7.3872 LearningRate 0.1062 Epoch: 10 Global Step: 111210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:04:54,823-Speed 5976.36 samples/sec Loss 7.3861 LearningRate 0.1062 Epoch: 10 Global Step: 111220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:05:01,675-Speed 5978.75 samples/sec Loss 7.3402 LearningRate 0.1062 Epoch: 10 Global Step: 111230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:05:08,543-Speed 5965.21 samples/sec Loss 7.3870 LearningRate 0.1061 Epoch: 10 Global Step: 111240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:05:15,418-Speed 5959.10 samples/sec Loss 7.4023 LearningRate 0.1061 Epoch: 10 Global Step: 111250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:05:22,391-Speed 5875.31 samples/sec Loss 7.3442 
LearningRate 0.1061 Epoch: 10 Global Step: 111260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:05:29,239-Speed 5982.83 samples/sec Loss 7.2859 LearningRate 0.1061 Epoch: 10 Global Step: 111270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:05:36,105-Speed 5966.85 samples/sec Loss 7.2875 LearningRate 0.1060 Epoch: 10 Global Step: 111280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:05:43,007-Speed 5937.09 samples/sec Loss 7.2833 LearningRate 0.1060 Epoch: 10 Global Step: 111290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:05:49,858-Speed 5979.78 samples/sec Loss 7.3597 LearningRate 0.1060 Epoch: 10 Global Step: 111300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:05:56,719-Speed 5971.16 samples/sec Loss 7.3174 LearningRate 0.1060 Epoch: 10 Global Step: 111310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:06:03,570-Speed 5979.46 samples/sec Loss 7.3101 LearningRate 0.1060 Epoch: 10 Global Step: 111320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:06:10,420-Speed 5981.53 samples/sec Loss 7.3510 LearningRate 0.1059 Epoch: 10 Global Step: 111330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:06:17,316-Speed 5940.62 samples/sec Loss 7.3047 LearningRate 0.1059 Epoch: 10 Global Step: 111340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:06:24,178-Speed 5970.73 samples/sec Loss 7.3554 LearningRate 0.1059 Epoch: 10 Global Step: 111350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:06:31,054-Speed 5958.53 samples/sec Loss 7.3372 LearningRate 0.1059 Epoch: 10 Global Step: 111360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:06:37,945-Speed 5944.79 samples/sec Loss 7.3359 LearningRate 0.1058 Epoch: 10 Global Step: 111370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:06:44,804-Speed 5972.90 samples/sec Loss 7.3260 LearningRate 0.1058 Epoch: 10 
Global Step: 111380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:06:51,654-Speed 5980.83 samples/sec Loss 7.3952 LearningRate 0.1058 Epoch: 10 Global Step: 111390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:06:58,547-Speed 5943.31 samples/sec Loss 7.2770 LearningRate 0.1058 Epoch: 10 Global Step: 111400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:07:05,427-Speed 5954.33 samples/sec Loss 7.3154 LearningRate 0.1058 Epoch: 10 Global Step: 111410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:07:12,299-Speed 5963.27 samples/sec Loss 7.3572 LearningRate 0.1057 Epoch: 10 Global Step: 111420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:07:19,152-Speed 5977.24 samples/sec Loss 7.2983 LearningRate 0.1057 Epoch: 10 Global Step: 111430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:07:25,997-Speed 5985.37 samples/sec Loss 7.3207 LearningRate 0.1057 Epoch: 10 Global Step: 111440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:07:32,858-Speed 5971.42 samples/sec Loss 7.3779 LearningRate 0.1057 Epoch: 10 Global Step: 111450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:07:39,719-Speed 5970.80 samples/sec Loss 7.3662 LearningRate 0.1056 Epoch: 10 Global Step: 111460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:07:46,586-Speed 5966.18 samples/sec Loss 7.2948 LearningRate 0.1056 Epoch: 10 Global Step: 111470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:07:53,469-Speed 5952.87 samples/sec Loss 7.3103 LearningRate 0.1056 Epoch: 10 Global Step: 111480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:00,334-Speed 5967.42 samples/sec Loss 7.3586 LearningRate 0.1056 Epoch: 10 Global Step: 111490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:07,186-Speed 5980.90 samples/sec Loss 7.3259 LearningRate 0.1056 Epoch: 10 Global Step: 111500 
Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:14,037-Speed 5980.39 samples/sec Loss 7.2514 LearningRate 0.1055 Epoch: 10 Global Step: 111510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:20,880-Speed 5986.15 samples/sec Loss 7.3532 LearningRate 0.1055 Epoch: 10 Global Step: 111520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:27,876-Speed 5856.56 samples/sec Loss 7.3329 LearningRate 0.1055 Epoch: 10 Global Step: 111530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:34,722-Speed 5983.89 samples/sec Loss 7.2480 LearningRate 0.1055 Epoch: 10 Global Step: 111540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:41,574-Speed 5978.56 samples/sec Loss 7.2719 LearningRate 0.1054 Epoch: 10 Global Step: 111550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:48,435-Speed 5971.22 samples/sec Loss 7.3461 LearningRate 0.1054 Epoch: 10 Global Step: 111560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:08:55,295-Speed 5972.38 samples/sec Loss 7.3322 LearningRate 0.1054 Epoch: 10 Global Step: 111570 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 18:09:02,184-Speed 5946.98 samples/sec Loss 7.3201 LearningRate 0.1054 Epoch: 10 Global Step: 111580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:09:09,043-Speed 5974.61 samples/sec Loss 7.2448 LearningRate 0.1054 Epoch: 10 Global Step: 111590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:09:15,894-Speed 5979.40 samples/sec Loss 7.2878 LearningRate 0.1053 Epoch: 10 Global Step: 111600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:09:22,753-Speed 5972.63 samples/sec Loss 7.2932 LearningRate 0.1053 Epoch: 10 Global Step: 111610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:09:29,607-Speed 5977.66 samples/sec Loss 7.2817 LearningRate 0.1053 Epoch: 10 Global Step: 111620 Fp16 Grad Scale: 
131072 Required: 19 hours Training: 2022-01-08 18:09:36,476-Speed 5964.41 samples/sec Loss 7.2902 LearningRate 0.1053 Epoch: 10 Global Step: 111630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:09:43,337-Speed 5970.55 samples/sec Loss 7.3288 LearningRate 0.1053 Epoch: 10 Global Step: 111640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:09:50,193-Speed 5977.77 samples/sec Loss 7.3314 LearningRate 0.1052 Epoch: 10 Global Step: 111650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:09:57,056-Speed 5970.06 samples/sec Loss 7.3978 LearningRate 0.1052 Epoch: 10 Global Step: 111660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:10:03,917-Speed 5970.89 samples/sec Loss 7.2882 LearningRate 0.1052 Epoch: 10 Global Step: 111670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:10:10,760-Speed 5987.10 samples/sec Loss 7.3444 LearningRate 0.1052 Epoch: 10 Global Step: 111680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:10:17,628-Speed 5964.92 samples/sec Loss 7.3053 LearningRate 0.1051 Epoch: 10 Global Step: 111690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:10:24,498-Speed 5962.37 samples/sec Loss 7.2128 LearningRate 0.1051 Epoch: 10 Global Step: 111700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:10:31,371-Speed 5960.88 samples/sec Loss 7.3240 LearningRate 0.1051 Epoch: 10 Global Step: 111710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:10:38,230-Speed 5972.98 samples/sec Loss 7.2942 LearningRate 0.1051 Epoch: 10 Global Step: 111720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:10:45,094-Speed 5968.60 samples/sec Loss 7.3026 LearningRate 0.1051 Epoch: 10 Global Step: 111730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:10:51,954-Speed 5972.36 samples/sec Loss 7.3503 LearningRate 0.1050 Epoch: 10 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-01-08 18:10:58,821-Speed 5969.54 samples/sec Loss 7.3629 LearningRate 0.1050 Epoch: 10 Global Step: 111750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:11:05,685-Speed 5968.32 samples/sec Loss 7.2868 LearningRate 0.1050 Epoch: 10 Global Step: 111760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:11:12,546-Speed 5971.30 samples/sec Loss 7.3334 LearningRate 0.1050 Epoch: 10 Global Step: 111770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:11:19,417-Speed 5963.13 samples/sec Loss 7.3114 LearningRate 0.1049 Epoch: 10 Global Step: 111780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:11:26,283-Speed 5966.48 samples/sec Loss 7.3348 LearningRate 0.1049 Epoch: 10 Global Step: 111790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:11:33,157-Speed 5960.38 samples/sec Loss 7.3061 LearningRate 0.1049 Epoch: 10 Global Step: 111800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:11:40,030-Speed 5961.05 samples/sec Loss 7.2889 LearningRate 0.1049 Epoch: 10 Global Step: 111810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:11:46,882-Speed 5978.48 samples/sec Loss 7.2922 LearningRate 0.1049 Epoch: 10 Global Step: 111820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:11:53,735-Speed 5977.85 samples/sec Loss 7.3180 LearningRate 0.1048 Epoch: 10 Global Step: 111830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:12:00,586-Speed 5979.57 samples/sec Loss 7.3291 LearningRate 0.1048 Epoch: 10 Global Step: 111840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:12:07,482-Speed 5941.36 samples/sec Loss 7.2685 LearningRate 0.1048 Epoch: 10 Global Step: 111850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:12:14,419-Speed 5905.29 samples/sec Loss 7.2592 LearningRate 0.1048 Epoch: 10 Global Step: 111860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 
18:12:21,351-Speed 5910.47 samples/sec Loss 7.2812 LearningRate 0.1047 Epoch: 10 Global Step: 111870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:12:28,276-Speed 5915.03 samples/sec Loss 7.3114 LearningRate 0.1047 Epoch: 10 Global Step: 111880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:12:35,173-Speed 5940.76 samples/sec Loss 7.2883 LearningRate 0.1047 Epoch: 10 Global Step: 111890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:12:42,036-Speed 5969.25 samples/sec Loss 7.3323 LearningRate 0.1047 Epoch: 10 Global Step: 111900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:12:48,900-Speed 5968.84 samples/sec Loss 7.3360 LearningRate 0.1047 Epoch: 10 Global Step: 111910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:12:55,773-Speed 5960.51 samples/sec Loss 7.3545 LearningRate 0.1046 Epoch: 10 Global Step: 111920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:13:02,630-Speed 5975.19 samples/sec Loss 7.2328 LearningRate 0.1046 Epoch: 10 Global Step: 111930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:13:09,509-Speed 5954.97 samples/sec Loss 7.3355 LearningRate 0.1046 Epoch: 10 Global Step: 111940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:13:16,397-Speed 5948.62 samples/sec Loss 7.2784 LearningRate 0.1046 Epoch: 10 Global Step: 111950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:13:23,253-Speed 5974.79 samples/sec Loss 7.3078 LearningRate 0.1045 Epoch: 10 Global Step: 111960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:13:30,129-Speed 5957.81 samples/sec Loss 7.2807 LearningRate 0.1045 Epoch: 10 Global Step: 111970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:13:36,984-Speed 5976.42 samples/sec Loss 7.2696 LearningRate 0.1045 Epoch: 10 Global Step: 111980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:13:43,830-Speed 
5984.26 samples/sec Loss 7.2575 LearningRate 0.1045 Epoch: 10 Global Step: 111990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:13:50,684-Speed 5976.38 samples/sec Loss 7.2549 LearningRate 0.1045 Epoch: 10 Global Step: 112000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:13:57,527-Speed 5987.75 samples/sec Loss 7.3021 LearningRate 0.1044 Epoch: 10 Global Step: 112010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:04,374-Speed 5982.90 samples/sec Loss 7.2439 LearningRate 0.1044 Epoch: 10 Global Step: 112020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:11,224-Speed 5979.83 samples/sec Loss 7.2054 LearningRate 0.1044 Epoch: 10 Global Step: 112030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:18,076-Speed 5984.24 samples/sec Loss 7.3185 LearningRate 0.1044 Epoch: 10 Global Step: 112040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:24,943-Speed 5966.91 samples/sec Loss 7.2623 LearningRate 0.1044 Epoch: 10 Global Step: 112050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:31,810-Speed 5965.39 samples/sec Loss 7.2782 LearningRate 0.1043 Epoch: 10 Global Step: 112060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:38,653-Speed 5989.95 samples/sec Loss 7.2413 LearningRate 0.1043 Epoch: 10 Global Step: 112070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:45,524-Speed 5962.30 samples/sec Loss 7.3279 LearningRate 0.1043 Epoch: 10 Global Step: 112080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:52,421-Speed 5939.99 samples/sec Loss 7.2667 LearningRate 0.1043 Epoch: 10 Global Step: 112090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:14:59,271-Speed 5980.53 samples/sec Loss 7.2573 LearningRate 0.1042 Epoch: 10 Global Step: 112100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:15:06,133-Speed 5970.82 samples/sec Loss 7.2665 
LearningRate 0.1042 Epoch: 10 Global Step: 112110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:15:12,982-Speed 5981.47 samples/sec Loss 7.2490 LearningRate 0.1042 Epoch: 10 Global Step: 112120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:15:19,862-Speed 5954.11 samples/sec Loss 7.2984 LearningRate 0.1042 Epoch: 10 Global Step: 112130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:15:26,711-Speed 5982.58 samples/sec Loss 7.3019 LearningRate 0.1042 Epoch: 10 Global Step: 112140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:15:33,570-Speed 5972.24 samples/sec Loss 7.3537 LearningRate 0.1041 Epoch: 10 Global Step: 112150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:15:40,496-Speed 5914.97 samples/sec Loss 7.3214 LearningRate 0.1041 Epoch: 10 Global Step: 112160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:15:47,346-Speed 5981.34 samples/sec Loss 7.2616 LearningRate 0.1041 Epoch: 10 Global Step: 112170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:15:54,197-Speed 5979.58 samples/sec Loss 7.1950 LearningRate 0.1041 Epoch: 10 Global Step: 112180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:16:01,053-Speed 5975.31 samples/sec Loss 7.2786 LearningRate 0.1040 Epoch: 10 Global Step: 112190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:16:07,950-Speed 5940.72 samples/sec Loss 7.2153 LearningRate 0.1040 Epoch: 10 Global Step: 112200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:16:14,794-Speed 5985.12 samples/sec Loss 7.2017 LearningRate 0.1040 Epoch: 10 Global Step: 112210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:16:21,647-Speed 5978.40 samples/sec Loss 7.2535 LearningRate 0.1040 Epoch: 10 Global Step: 112220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:16:28,487-Speed 5989.31 samples/sec Loss 7.2623 LearningRate 0.1040 Epoch: 10 
Global Step: 112230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:16:35,333-Speed 5983.68 samples/sec Loss 7.2017 LearningRate 0.1039 Epoch: 10 Global Step: 112240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:16:42,208-Speed 5964.49 samples/sec Loss 7.2537 LearningRate 0.1039 Epoch: 10 Global Step: 112250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:16:49,069-Speed 5979.44 samples/sec Loss 7.3030 LearningRate 0.1039 Epoch: 10 Global Step: 112260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:16:55,942-Speed 5960.63 samples/sec Loss 7.3285 LearningRate 0.1039 Epoch: 10 Global Step: 112270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:17:02,810-Speed 5964.61 samples/sec Loss 7.2524 LearningRate 0.1038 Epoch: 10 Global Step: 112280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:17:09,668-Speed 5974.17 samples/sec Loss 7.2004 LearningRate 0.1038 Epoch: 10 Global Step: 112290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:17:16,523-Speed 5975.66 samples/sec Loss 7.3273 LearningRate 0.1038 Epoch: 10 Global Step: 112300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:17:23,398-Speed 5960.02 samples/sec Loss 7.2806 LearningRate 0.1038 Epoch: 10 Global Step: 112310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:17:30,272-Speed 5959.66 samples/sec Loss 7.3112 LearningRate 0.1038 Epoch: 10 Global Step: 112320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:17:37,137-Speed 5967.56 samples/sec Loss 7.3472 LearningRate 0.1037 Epoch: 10 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:17:43,992-Speed 5977.09 samples/sec Loss 7.2656 LearningRate 0.1037 Epoch: 10 Global Step: 112340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:17:50,851-Speed 5972.83 samples/sec Loss 7.2995 LearningRate 0.1037 Epoch: 10 Global Step: 112350 
Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:17:57,734-Speed 5951.22 samples/sec Loss 7.2796 LearningRate 0.1037 Epoch: 10 Global Step: 112360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:18:04,582-Speed 5983.14 samples/sec Loss 7.2327 LearningRate 0.1037 Epoch: 10 Global Step: 112370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:18:11,464-Speed 5952.86 samples/sec Loss 7.2990 LearningRate 0.1036 Epoch: 10 Global Step: 112380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:18:18,349-Speed 5949.99 samples/sec Loss 7.1537 LearningRate 0.1036 Epoch: 10 Global Step: 112390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:18:25,197-Speed 5982.51 samples/sec Loss 7.2179 LearningRate 0.1036 Epoch: 10 Global Step: 112400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:18:32,042-Speed 5985.39 samples/sec Loss 7.2305 LearningRate 0.1036 Epoch: 10 Global Step: 112410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:18:38,886-Speed 5986.01 samples/sec Loss 7.3126 LearningRate 0.1035 Epoch: 10 Global Step: 112420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:18:45,757-Speed 5965.47 samples/sec Loss 7.2807 LearningRate 0.1035 Epoch: 10 Global Step: 112430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:18:52,616-Speed 5974.58 samples/sec Loss 7.3410 LearningRate 0.1035 Epoch: 10 Global Step: 112440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:18:59,479-Speed 5969.86 samples/sec Loss 7.2852 LearningRate 0.1035 Epoch: 10 Global Step: 112450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:19:06,325-Speed 5985.50 samples/sec Loss 7.2445 LearningRate 0.1035 Epoch: 10 Global Step: 112460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:19:13,195-Speed 5964.07 samples/sec Loss 7.2668 LearningRate 0.1034 Epoch: 10 Global Step: 112470 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-01-08 18:19:20,117-Speed 5917.70 samples/sec Loss 7.1943 LearningRate 0.1034 Epoch: 10 Global Step: 112480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:19:27,037-Speed 5922.46 samples/sec Loss 7.2284 LearningRate 0.1034 Epoch: 10 Global Step: 112490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:19:33,880-Speed 5989.76 samples/sec Loss 7.1778 LearningRate 0.1034 Epoch: 10 Global Step: 112500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:19:40,733-Speed 5977.33 samples/sec Loss 7.1954 LearningRate 0.1033 Epoch: 10 Global Step: 112510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:19:47,627-Speed 5943.01 samples/sec Loss 7.2189 LearningRate 0.1033 Epoch: 10 Global Step: 112520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:19:54,534-Speed 5931.53 samples/sec Loss 7.2589 LearningRate 0.1033 Epoch: 10 Global Step: 112530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:20:01,405-Speed 5962.40 samples/sec Loss 7.1878 LearningRate 0.1033 Epoch: 10 Global Step: 112540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:20:08,283-Speed 5957.24 samples/sec Loss 7.2168 LearningRate 0.1033 Epoch: 10 Global Step: 112550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:20:15,134-Speed 5980.00 samples/sec Loss 7.2194 LearningRate 0.1032 Epoch: 10 Global Step: 112560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:20:21,994-Speed 5971.61 samples/sec Loss 7.1753 LearningRate 0.1032 Epoch: 10 Global Step: 112570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:20:28,881-Speed 5948.60 samples/sec Loss 7.2011 LearningRate 0.1032 Epoch: 10 Global Step: 112580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:20:35,752-Speed 5965.65 samples/sec Loss 7.2314 LearningRate 0.1032 Epoch: 10 Global Step: 112590 Fp16 Grad Scale: 131072 Required: 19 hours 
Training: 2022-01-08 18:20:42,607-Speed 5975.11 samples/sec Loss 7.2637 LearningRate 0.1032 Epoch: 10 Global Step: 112600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:20:49,461-Speed 5977.66 samples/sec Loss 7.2161 LearningRate 0.1031 Epoch: 10 Global Step: 112610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:20:56,323-Speed 5970.50 samples/sec Loss 7.1313 LearningRate 0.1031 Epoch: 10 Global Step: 112620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:21:03,176-Speed 5977.68 samples/sec Loss 7.2729 LearningRate 0.1031 Epoch: 10 Global Step: 112630 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 18:21:10,061-Speed 5950.68 samples/sec Loss 7.2715 LearningRate 0.1031 Epoch: 10 Global Step: 112640 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 18:21:16,930-Speed 5966.44 samples/sec Loss 7.2268 LearningRate 0.1030 Epoch: 10 Global Step: 112650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:21:23,767-Speed 5991.92 samples/sec Loss 7.2290 LearningRate 0.1030 Epoch: 10 Global Step: 112660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:21:30,626-Speed 5972.91 samples/sec Loss 7.2724 LearningRate 0.1030 Epoch: 10 Global Step: 112670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:21:37,485-Speed 5972.44 samples/sec Loss 7.2390 LearningRate 0.1030 Epoch: 10 Global Step: 112680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:21:44,335-Speed 5981.05 samples/sec Loss 7.2659 LearningRate 0.1030 Epoch: 10 Global Step: 112690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:21:51,279-Speed 5899.48 samples/sec Loss 7.2054 LearningRate 0.1029 Epoch: 10 Global Step: 112700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:21:58,130-Speed 5979.67 samples/sec Loss 7.2414 LearningRate 0.1029 Epoch: 10 Global Step: 112710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 
18:22:04,975-Speed 5985.40 samples/sec Loss 7.2101 LearningRate 0.1029 Epoch: 10 Global Step: 112720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:22:11,833-Speed 5973.30 samples/sec Loss 7.1830 LearningRate 0.1029 Epoch: 10 Global Step: 112730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:22:18,708-Speed 5959.19 samples/sec Loss 7.1767 LearningRate 0.1028 Epoch: 10 Global Step: 112740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:22:25,554-Speed 5984.38 samples/sec Loss 7.2546 LearningRate 0.1028 Epoch: 10 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:22:32,398-Speed 5985.68 samples/sec Loss 7.2076 LearningRate 0.1028 Epoch: 10 Global Step: 112760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:22:39,249-Speed 5981.09 samples/sec Loss 7.2076 LearningRate 0.1028 Epoch: 10 Global Step: 112770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:22:46,122-Speed 5960.01 samples/sec Loss 7.2321 LearningRate 0.1028 Epoch: 10 Global Step: 112780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:22:52,986-Speed 5969.19 samples/sec Loss 7.2432 LearningRate 0.1027 Epoch: 10 Global Step: 112790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:22:59,863-Speed 5957.32 samples/sec Loss 7.2224 LearningRate 0.1027 Epoch: 10 Global Step: 112800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:23:06,712-Speed 5981.37 samples/sec Loss 7.2587 LearningRate 0.1027 Epoch: 10 Global Step: 112810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:23:13,572-Speed 5972.27 samples/sec Loss 7.2166 LearningRate 0.1027 Epoch: 10 Global Step: 112820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:23:20,422-Speed 5980.51 samples/sec Loss 7.1975 LearningRate 0.1027 Epoch: 10 Global Step: 112830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:23:27,308-Speed 5948.81 
samples/sec Loss 7.1898 LearningRate 0.1026 Epoch: 10 Global Step: 112840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:23:34,161-Speed 5977.90 samples/sec Loss 7.1918 LearningRate 0.1026 Epoch: 10 Global Step: 112850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:23:41,008-Speed 5983.32 samples/sec Loss 7.2034 LearningRate 0.1026 Epoch: 10 Global Step: 112860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:23:47,872-Speed 5969.16 samples/sec Loss 7.2839 LearningRate 0.1026 Epoch: 10 Global Step: 112870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:23:54,765-Speed 5942.52 samples/sec Loss 7.2756 LearningRate 0.1025 Epoch: 10 Global Step: 112880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:24:01,625-Speed 5972.72 samples/sec Loss 7.1920 LearningRate 0.1025 Epoch: 10 Global Step: 112890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:24:08,474-Speed 5980.82 samples/sec Loss 7.2411 LearningRate 0.1025 Epoch: 10 Global Step: 112900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:24:15,345-Speed 5963.11 samples/sec Loss 7.1924 LearningRate 0.1025 Epoch: 10 Global Step: 112910 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-08 18:24:22,209-Speed 5968.65 samples/sec Loss 7.2486 LearningRate 0.1025 Epoch: 10 Global Step: 112920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:24:29,178-Speed 5878.57 samples/sec Loss 7.1973 LearningRate 0.1024 Epoch: 10 Global Step: 112930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 18:24:36,042-Speed 5968.87 samples/sec Loss 7.2240 LearningRate 0.1024 Epoch: 10 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:24:42,899-Speed 5975.25 samples/sec Loss 7.2651 LearningRate 0.1024 Epoch: 10 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:24:49,775-Speed 5957.70 samples/sec Loss 
7.2450 LearningRate 0.1024 Epoch: 10 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:24:56,718-Speed 5900.97 samples/sec Loss 7.2803 LearningRate 0.1023 Epoch: 10 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:25:03,650-Speed 5910.28 samples/sec Loss 7.2246 LearningRate 0.1023 Epoch: 10 Global Step: 112980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:25:10,547-Speed 5939.43 samples/sec Loss 7.1188 LearningRate 0.1023 Epoch: 10 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:25:17,417-Speed 5966.11 samples/sec Loss 7.2109 LearningRate 0.1023 Epoch: 10 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:25:24,277-Speed 5972.56 samples/sec Loss 7.2146 LearningRate 0.1023 Epoch: 10 Global Step: 113010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 18:25:31,161-Speed 5950.27 samples/sec Loss 7.2620 LearningRate 0.1022 Epoch: 10 Global Step: 113020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:25:38,019-Speed 5974.85 samples/sec Loss 7.1922 LearningRate 0.1022 Epoch: 10 Global Step: 113030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:25:44,899-Speed 5954.51 samples/sec Loss 7.1875 LearningRate 0.1022 Epoch: 10 Global Step: 113040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:25:51,753-Speed 5976.91 samples/sec Loss 7.1519 LearningRate 0.1022 Epoch: 10 Global Step: 113050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:25:58,604-Speed 5980.36 samples/sec Loss 7.2456 LearningRate 0.1022 Epoch: 10 Global Step: 113060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:26:05,469-Speed 5968.26 samples/sec Loss 7.2074 LearningRate 0.1021 Epoch: 10 Global Step: 113070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:26:12,323-Speed 5976.25 samples/sec Loss 7.1742 LearningRate 0.1021 
Epoch: 10 Global Step: 113080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:26:19,204-Speed 5953.99 samples/sec Loss 7.1848 LearningRate 0.1021 Epoch: 10 Global Step: 113090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:26:26,082-Speed 5957.21 samples/sec Loss 7.1888 LearningRate 0.1021 Epoch: 10 Global Step: 113100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:26:32,933-Speed 5979.17 samples/sec Loss 7.1710 LearningRate 0.1020 Epoch: 10 Global Step: 113110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:26:39,817-Speed 5951.73 samples/sec Loss 7.2490 LearningRate 0.1020 Epoch: 10 Global Step: 113120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:26:46,713-Speed 5941.21 samples/sec Loss 7.1689 LearningRate 0.1020 Epoch: 10 Global Step: 113130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:26:54,657-Speed 5156.94 samples/sec Loss 7.1837 LearningRate 0.1020 Epoch: 10 Global Step: 113140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:01,507-Speed 5981.04 samples/sec Loss 7.2847 LearningRate 0.1020 Epoch: 10 Global Step: 113150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:08,375-Speed 5965.02 samples/sec Loss 7.2229 LearningRate 0.1019 Epoch: 10 Global Step: 113160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:15,226-Speed 5982.04 samples/sec Loss 7.1826 LearningRate 0.1019 Epoch: 10 Global Step: 113170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:22,092-Speed 5969.61 samples/sec Loss 7.1775 LearningRate 0.1019 Epoch: 10 Global Step: 113180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:28,964-Speed 5962.06 samples/sec Loss 7.2069 LearningRate 0.1019 Epoch: 10 Global Step: 113190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:35,824-Speed 5971.28 samples/sec Loss 7.1860 LearningRate 0.1018 Epoch: 10 Global 
Step: 113200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:42,691-Speed 5965.89 samples/sec Loss 7.1787 LearningRate 0.1018 Epoch: 10 Global Step: 113210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:49,547-Speed 5975.81 samples/sec Loss 7.2048 LearningRate 0.1018 Epoch: 10 Global Step: 113220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:27:56,400-Speed 5977.74 samples/sec Loss 7.1755 LearningRate 0.1018 Epoch: 10 Global Step: 113230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:28:03,248-Speed 5982.67 samples/sec Loss 7.1485 LearningRate 0.1018 Epoch: 10 Global Step: 113240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:28:10,123-Speed 5958.84 samples/sec Loss 7.2087 LearningRate 0.1017 Epoch: 10 Global Step: 113250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:28:16,980-Speed 5974.82 samples/sec Loss 7.1702 LearningRate 0.1017 Epoch: 10 Global Step: 113260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:28:23,869-Speed 5946.79 samples/sec Loss 7.1067 LearningRate 0.1017 Epoch: 10 Global Step: 113270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:28:30,739-Speed 5963.38 samples/sec Loss 7.2116 LearningRate 0.1017 Epoch: 10 Global Step: 113280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:28:37,588-Speed 5981.36 samples/sec Loss 7.2070 LearningRate 0.1017 Epoch: 10 Global Step: 113290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:28:44,462-Speed 5971.16 samples/sec Loss 7.2044 LearningRate 0.1016 Epoch: 10 Global Step: 113300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:28:51,334-Speed 5961.86 samples/sec Loss 7.2271 LearningRate 0.1016 Epoch: 10 Global Step: 113310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:28:58,216-Speed 5952.38 samples/sec Loss 7.2431 LearningRate 0.1016 Epoch: 10 Global Step: 113320 Fp16 
Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:29:05,070-Speed 5977.80 samples/sec Loss 7.1822 LearningRate 0.1016 Epoch: 10 Global Step: 113330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:29:11,913-Speed 5986.50 samples/sec Loss 7.2056 LearningRate 0.1015 Epoch: 10 Global Step: 113340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:29:18,777-Speed 5967.79 samples/sec Loss 7.1907 LearningRate 0.1015 Epoch: 10 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:29:25,639-Speed 5970.74 samples/sec Loss 7.1886 LearningRate 0.1015 Epoch: 10 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:29:32,520-Speed 5954.56 samples/sec Loss 7.1298 LearningRate 0.1015 Epoch: 10 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:29:39,405-Speed 5949.50 samples/sec Loss 7.1739 LearningRate 0.1015 Epoch: 10 Global Step: 113380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:29:46,290-Speed 5952.83 samples/sec Loss 7.1994 LearningRate 0.1014 Epoch: 10 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:29:53,152-Speed 5970.07 samples/sec Loss 7.1712 LearningRate 0.1014 Epoch: 10 Global Step: 113400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:30:00,009-Speed 5973.97 samples/sec Loss 7.1979 LearningRate 0.1014 Epoch: 10 Global Step: 113410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:30:07,701-Speed 5326.51 samples/sec Loss 7.2410 LearningRate 0.1014 Epoch: 10 Global Step: 113420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:30:14,552-Speed 5979.25 samples/sec Loss 7.1886 LearningRate 0.1014 Epoch: 10 Global Step: 113430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:30:21,400-Speed 5982.74 samples/sec Loss 7.1593 LearningRate 0.1013 Epoch: 10 Global Step: 113440 Fp16 Grad Scale: 131072 
Required: 18 hours Training: 2022-01-08 18:30:28,360-Speed 5886.35 samples/sec Loss 7.1435 LearningRate 0.1013 Epoch: 10 Global Step: 113450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:30:35,208-Speed 5982.09 samples/sec Loss 7.1546 LearningRate 0.1013 Epoch: 10 Global Step: 113460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:30:42,078-Speed 5962.39 samples/sec Loss 7.1808 LearningRate 0.1013 Epoch: 10 Global Step: 113470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:30:48,934-Speed 5975.85 samples/sec Loss 7.0694 LearningRate 0.1012 Epoch: 10 Global Step: 113480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:30:55,810-Speed 5958.41 samples/sec Loss 7.1341 LearningRate 0.1012 Epoch: 10 Global Step: 113490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:31:02,684-Speed 5959.16 samples/sec Loss 7.2303 LearningRate 0.1012 Epoch: 10 Global Step: 113500 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 18:31:09,544-Speed 5972.17 samples/sec Loss 7.1616 LearningRate 0.1012 Epoch: 10 Global Step: 113510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:31:16,405-Speed 5971.72 samples/sec Loss 7.1616 LearningRate 0.1012 Epoch: 10 Global Step: 113520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:31:23,294-Speed 5946.96 samples/sec Loss 7.2012 LearningRate 0.1011 Epoch: 10 Global Step: 113530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:31:30,143-Speed 5981.02 samples/sec Loss 7.2274 LearningRate 0.1011 Epoch: 10 Global Step: 113540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:31:36,997-Speed 5977.78 samples/sec Loss 7.1892 LearningRate 0.1011 Epoch: 10 Global Step: 113550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:31:43,850-Speed 5977.90 samples/sec Loss 7.1752 LearningRate 0.1011 Epoch: 10 Global Step: 113560 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-01-08 18:31:50,834-Speed 5867.05 samples/sec Loss 7.1814 LearningRate 0.1011 Epoch: 10 Global Step: 113570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:31:57,685-Speed 5980.04 samples/sec Loss 7.2349 LearningRate 0.1010 Epoch: 10 Global Step: 113580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:32:04,520-Speed 5993.39 samples/sec Loss 7.1977 LearningRate 0.1010 Epoch: 10 Global Step: 113590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:32:11,365-Speed 5985.53 samples/sec Loss 7.2219 LearningRate 0.1010 Epoch: 10 Global Step: 113600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:32:18,215-Speed 5980.93 samples/sec Loss 7.1727 LearningRate 0.1010 Epoch: 10 Global Step: 113610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:32:25,065-Speed 5980.44 samples/sec Loss 7.2182 LearningRate 0.1009 Epoch: 10 Global Step: 113620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:32:31,934-Speed 5963.88 samples/sec Loss 7.1697 LearningRate 0.1009 Epoch: 10 Global Step: 113630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:32:38,796-Speed 5970.39 samples/sec Loss 7.1813 LearningRate 0.1009 Epoch: 10 Global Step: 113640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:32:45,656-Speed 5971.43 samples/sec Loss 7.1674 LearningRate 0.1009 Epoch: 10 Global Step: 113650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:32:52,552-Speed 5941.87 samples/sec Loss 7.1447 LearningRate 0.1009 Epoch: 10 Global Step: 113660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:32:59,538-Speed 5865.48 samples/sec Loss 7.1718 LearningRate 0.1008 Epoch: 10 Global Step: 113670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:33:06,408-Speed 5963.17 samples/sec Loss 7.1760 LearningRate 0.1008 Epoch: 10 Global Step: 113680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 
18:33:13,392-Speed 5865.84 samples/sec Loss 7.0758 LearningRate 0.1008 Epoch: 10 Global Step: 113690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:33:20,385-Speed 5858.84 samples/sec Loss 7.1581 LearningRate 0.1008 Epoch: 10 Global Step: 113700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:33:27,278-Speed 5943.67 samples/sec Loss 7.1561 LearningRate 0.1007 Epoch: 10 Global Step: 113710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:33:34,147-Speed 5964.28 samples/sec Loss 7.0866 LearningRate 0.1007 Epoch: 10 Global Step: 113720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:33:40,997-Speed 5980.98 samples/sec Loss 7.1593 LearningRate 0.1007 Epoch: 10 Global Step: 113730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:33:47,858-Speed 5971.15 samples/sec Loss 7.1557 LearningRate 0.1007 Epoch: 10 Global Step: 113740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:33:54,712-Speed 5979.33 samples/sec Loss 7.1513 LearningRate 0.1007 Epoch: 10 Global Step: 113750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:01,574-Speed 5969.84 samples/sec Loss 7.1678 LearningRate 0.1006 Epoch: 10 Global Step: 113760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:08,430-Speed 5975.76 samples/sec Loss 7.1258 LearningRate 0.1006 Epoch: 10 Global Step: 113770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:15,272-Speed 5987.28 samples/sec Loss 7.1220 LearningRate 0.1006 Epoch: 10 Global Step: 113780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:22,131-Speed 5973.49 samples/sec Loss 7.1091 LearningRate 0.1006 Epoch: 10 Global Step: 113790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:29,003-Speed 5961.54 samples/sec Loss 7.0986 LearningRate 0.1006 Epoch: 10 Global Step: 113800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:35,862-Speed 
5972.89 samples/sec Loss 7.1909 LearningRate 0.1005 Epoch: 10 Global Step: 113810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:42,712-Speed 5981.08 samples/sec Loss 7.1031 LearningRate 0.1005 Epoch: 10 Global Step: 113820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:49,558-Speed 5983.85 samples/sec Loss 7.1534 LearningRate 0.1005 Epoch: 10 Global Step: 113830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:34:56,407-Speed 5982.37 samples/sec Loss 7.2127 LearningRate 0.1005 Epoch: 10 Global Step: 113840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:35:03,254-Speed 5984.32 samples/sec Loss 7.0997 LearningRate 0.1004 Epoch: 10 Global Step: 113850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:35:10,109-Speed 5976.02 samples/sec Loss 7.1492 LearningRate 0.1004 Epoch: 10 Global Step: 113860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:35:16,962-Speed 5978.80 samples/sec Loss 7.1913 LearningRate 0.1004 Epoch: 10 Global Step: 113870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:35:23,813-Speed 5979.69 samples/sec Loss 7.1634 LearningRate 0.1004 Epoch: 10 Global Step: 113880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:35:30,674-Speed 5971.11 samples/sec Loss 7.1710 LearningRate 0.1004 Epoch: 10 Global Step: 113890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:35:37,541-Speed 5965.64 samples/sec Loss 7.1149 LearningRate 0.1003 Epoch: 10 Global Step: 113900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:35:44,404-Speed 5969.46 samples/sec Loss 7.1142 LearningRate 0.1003 Epoch: 10 Global Step: 113910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:35:51,319-Speed 5924.64 samples/sec Loss 7.1745 LearningRate 0.1003 Epoch: 10 Global Step: 113920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:35:58,172-Speed 5977.41 samples/sec Loss 
7.1902 LearningRate 0.1003 Epoch: 10 Global Step: 113930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:36:05,023-Speed 5980.33 samples/sec Loss 7.1136 LearningRate 0.1003 Epoch: 10 Global Step: 113940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:36:11,870-Speed 5982.41 samples/sec Loss 7.1527 LearningRate 0.1002 Epoch: 10 Global Step: 113950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:36:18,763-Speed 5943.90 samples/sec Loss 7.1801 LearningRate 0.1002 Epoch: 10 Global Step: 113960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:36:25,633-Speed 5963.09 samples/sec Loss 7.1633 LearningRate 0.1002 Epoch: 10 Global Step: 113970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:36:32,507-Speed 5959.60 samples/sec Loss 7.1543 LearningRate 0.1002 Epoch: 10 Global Step: 113980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:36:39,383-Speed 5959.09 samples/sec Loss 7.2331 LearningRate 0.1001 Epoch: 10 Global Step: 113990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:36:46,266-Speed 5952.35 samples/sec Loss 7.1854 LearningRate 0.1001 Epoch: 10 Global Step: 114000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:36:53,129-Speed 5969.06 samples/sec Loss 7.2365 LearningRate 0.1001 Epoch: 10 Global Step: 114010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:36:59,992-Speed 5969.78 samples/sec Loss 7.1628 LearningRate 0.1001 Epoch: 10 Global Step: 114020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:37:06,844-Speed 5979.08 samples/sec Loss 7.1744 LearningRate 0.1001 Epoch: 10 Global Step: 114030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:37:13,686-Speed 5987.17 samples/sec Loss 7.1399 LearningRate 0.1000 Epoch: 10 Global Step: 114040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:37:20,542-Speed 5977.50 samples/sec Loss 7.1499 LearningRate 0.1000 
Epoch: 10 Global Step: 114050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:37:27,412-Speed 5963.12 samples/sec Loss 7.1142 LearningRate 0.1000 Epoch: 10 Global Step: 114060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:37:51,428-Speed 1705.67 samples/sec Loss 7.1571 LearningRate 0.1000 Epoch: 11 Global Step: 114070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:37:58,264-Speed 5993.02 samples/sec Loss 7.1330 LearningRate 0.1000 Epoch: 11 Global Step: 114080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:38:05,107-Speed 5987.42 samples/sec Loss 7.1313 LearningRate 0.0999 Epoch: 11 Global Step: 114090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:38:11,954-Speed 5983.96 samples/sec Loss 7.1215 LearningRate 0.0999 Epoch: 11 Global Step: 114100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:38:18,819-Speed 5967.67 samples/sec Loss 7.1864 LearningRate 0.0999 Epoch: 11 Global Step: 114110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:38:25,702-Speed 5951.44 samples/sec Loss 7.0655 LearningRate 0.0999 Epoch: 11 Global Step: 114120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:38:32,576-Speed 5960.90 samples/sec Loss 7.1030 LearningRate 0.0998 Epoch: 11 Global Step: 114130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:38:39,449-Speed 5962.91 samples/sec Loss 7.1188 LearningRate 0.0998 Epoch: 11 Global Step: 114140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:38:46,321-Speed 5961.26 samples/sec Loss 7.1873 LearningRate 0.0998 Epoch: 11 Global Step: 114150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:38:53,233-Speed 5927.57 samples/sec Loss 7.1540 LearningRate 0.0998 Epoch: 11 Global Step: 114160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:39:00,225-Speed 5859.56 samples/sec Loss 7.1939 LearningRate 0.0998 Epoch: 11 Global Step: 
114170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:39:07,097-Speed 5961.20 samples/sec Loss 7.1496 LearningRate 0.0997 Epoch: 11 Global Step: 114180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:39:13,992-Speed 5941.70 samples/sec Loss 7.1240 LearningRate 0.0997 Epoch: 11 Global Step: 114190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:39:20,860-Speed 5965.69 samples/sec Loss 7.1299 LearningRate 0.0997 Epoch: 11 Global Step: 114200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:39:27,734-Speed 5959.17 samples/sec Loss 7.0933 LearningRate 0.0997 Epoch: 11 Global Step: 114210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:39:34,582-Speed 5982.75 samples/sec Loss 7.0996 LearningRate 0.0997 Epoch: 11 Global Step: 114220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:39:41,427-Speed 5985.42 samples/sec Loss 7.0684 LearningRate 0.0996 Epoch: 11 Global Step: 114230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:39:48,290-Speed 5969.21 samples/sec Loss 7.1138 LearningRate 0.0996 Epoch: 11 Global Step: 114240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:39:55,156-Speed 5966.68 samples/sec Loss 7.1334 LearningRate 0.0996 Epoch: 11 Global Step: 114250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:40:02,006-Speed 5981.20 samples/sec Loss 7.1209 LearningRate 0.0996 Epoch: 11 Global Step: 114260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:40:08,871-Speed 5967.68 samples/sec Loss 7.1245 LearningRate 0.0995 Epoch: 11 Global Step: 114270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:40:15,734-Speed 5969.13 samples/sec Loss 7.1452 LearningRate 0.0995 Epoch: 11 Global Step: 114280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:40:22,584-Speed 5981.37 samples/sec Loss 7.1216 LearningRate 0.0995 Epoch: 11 Global Step: 114290 Fp16 Grad Scale: 
131072 Required: 18 hours Training: 2022-01-08 18:40:29,427-Speed 5986.06 samples/sec Loss 7.1379 LearningRate 0.0995 Epoch: 11 Global Step: 114300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:40:36,263-Speed 5993.07 samples/sec Loss 7.1189 LearningRate 0.0995 Epoch: 11 Global Step: 114310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:40:43,104-Speed 5988.74 samples/sec Loss 7.0890 LearningRate 0.0994 Epoch: 11 Global Step: 114320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:40:49,959-Speed 5976.47 samples/sec Loss 7.1107 LearningRate 0.0994 Epoch: 11 Global Step: 114330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:40:56,811-Speed 5978.83 samples/sec Loss 7.1776 LearningRate 0.0994 Epoch: 11 Global Step: 114340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:41:03,657-Speed 5985.02 samples/sec Loss 7.0901 LearningRate 0.0994 Epoch: 11 Global Step: 114350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:41:10,517-Speed 5972.23 samples/sec Loss 7.0855 LearningRate 0.0994 Epoch: 11 Global Step: 114360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:41:17,386-Speed 5964.56 samples/sec Loss 7.1373 LearningRate 0.0993 Epoch: 11 Global Step: 114370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:41:24,271-Speed 5950.72 samples/sec Loss 7.1223 LearningRate 0.0993 Epoch: 11 Global Step: 114380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:41:31,118-Speed 5982.93 samples/sec Loss 7.1376 LearningRate 0.0993 Epoch: 11 Global Step: 114390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:41:37,975-Speed 5973.88 samples/sec Loss 7.1358 LearningRate 0.0993 Epoch: 11 Global Step: 114400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:41:44,828-Speed 5978.29 samples/sec Loss 7.0785 LearningRate 0.0992 Epoch: 11 Global Step: 114410 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-01-08 18:41:51,706-Speed 5955.76 samples/sec Loss 7.1515 LearningRate 0.0992 Epoch: 11 Global Step: 114420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:41:58,573-Speed 5966.82 samples/sec Loss 7.1426 LearningRate 0.0992 Epoch: 11 Global Step: 114430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:42:05,455-Speed 5952.61 samples/sec Loss 7.0530 LearningRate 0.0992 Epoch: 11 Global Step: 114440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:42:12,318-Speed 5969.34 samples/sec Loss 7.0851 LearningRate 0.0992 Epoch: 11 Global Step: 114450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:42:19,201-Speed 5954.61 samples/sec Loss 7.0548 LearningRate 0.0991 Epoch: 11 Global Step: 114460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:42:26,077-Speed 5959.83 samples/sec Loss 7.1332 LearningRate 0.0991 Epoch: 11 Global Step: 114470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:42:32,970-Speed 5943.35 samples/sec Loss 7.1758 LearningRate 0.0991 Epoch: 11 Global Step: 114480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:42:39,931-Speed 5885.97 samples/sec Loss 7.0914 LearningRate 0.0991 Epoch: 11 Global Step: 114490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:42:46,872-Speed 5901.60 samples/sec Loss 7.1583 LearningRate 0.0991 Epoch: 11 Global Step: 114500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:42:53,721-Speed 5982.02 samples/sec Loss 7.1061 LearningRate 0.0990 Epoch: 11 Global Step: 114510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:43:00,595-Speed 5960.13 samples/sec Loss 7.0959 LearningRate 0.0990 Epoch: 11 Global Step: 114520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:43:07,459-Speed 5968.41 samples/sec Loss 7.0578 LearningRate 0.0990 Epoch: 11 Global Step: 114530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 
18:43:14,332-Speed 5962.04 samples/sec Loss 7.0557 LearningRate 0.0990 Epoch: 11 Global Step: 114540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:43:21,198-Speed 5966.56 samples/sec Loss 7.1336 LearningRate 0.0990 Epoch: 11 Global Step: 114550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:43:28,055-Speed 5975.09 samples/sec Loss 7.1098 LearningRate 0.0989 Epoch: 11 Global Step: 114560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:43:34,951-Speed 5943.38 samples/sec Loss 7.0835 LearningRate 0.0989 Epoch: 11 Global Step: 114570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:43:41,807-Speed 5975.07 samples/sec Loss 7.0536 LearningRate 0.0989 Epoch: 11 Global Step: 114580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:43:48,677-Speed 5962.96 samples/sec Loss 7.1144 LearningRate 0.0989 Epoch: 11 Global Step: 114590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:43:55,549-Speed 5961.78 samples/sec Loss 7.1130 LearningRate 0.0988 Epoch: 11 Global Step: 114600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:44:02,415-Speed 5967.26 samples/sec Loss 7.1083 LearningRate 0.0988 Epoch: 11 Global Step: 114610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:44:09,269-Speed 5976.95 samples/sec Loss 7.0792 LearningRate 0.0988 Epoch: 11 Global Step: 114620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:44:16,169-Speed 5937.97 samples/sec Loss 7.0926 LearningRate 0.0988 Epoch: 11 Global Step: 114630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:44:23,031-Speed 5970.50 samples/sec Loss 7.1169 LearningRate 0.0988 Epoch: 11 Global Step: 114640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:44:29,932-Speed 5936.74 samples/sec Loss 7.1147 LearningRate 0.0987 Epoch: 11 Global Step: 114650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:44:36,799-Speed 
5965.21 samples/sec Loss 7.1055 LearningRate 0.0987 Epoch: 11 Global Step: 114660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:44:43,666-Speed 5966.04 samples/sec Loss 7.1119 LearningRate 0.0987 Epoch: 11 Global Step: 114670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:44:50,520-Speed 5976.81 samples/sec Loss 7.1002 LearningRate 0.0987 Epoch: 11 Global Step: 114680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:44:57,377-Speed 5974.28 samples/sec Loss 7.0701 LearningRate 0.0987 Epoch: 11 Global Step: 114690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:04,261-Speed 5953.33 samples/sec Loss 7.0339 LearningRate 0.0986 Epoch: 11 Global Step: 114700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:11,114-Speed 5977.20 samples/sec Loss 7.0494 LearningRate 0.0986 Epoch: 11 Global Step: 114710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:18,435-Speed 5595.81 samples/sec Loss 7.0857 LearningRate 0.0986 Epoch: 11 Global Step: 114720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:25,336-Speed 5937.25 samples/sec Loss 7.0828 LearningRate 0.0986 Epoch: 11 Global Step: 114730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:32,188-Speed 5978.65 samples/sec Loss 7.1018 LearningRate 0.0985 Epoch: 11 Global Step: 114740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:39,048-Speed 5971.82 samples/sec Loss 7.1109 LearningRate 0.0985 Epoch: 11 Global Step: 114750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:45,896-Speed 5982.06 samples/sec Loss 7.0668 LearningRate 0.0985 Epoch: 11 Global Step: 114760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:52,741-Speed 5985.38 samples/sec Loss 7.0870 LearningRate 0.0985 Epoch: 11 Global Step: 114770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:45:59,597-Speed 5977.10 samples/sec Loss 
7.1405 LearningRate 0.0985 Epoch: 11 Global Step: 114780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:46:06,453-Speed 5975.03 samples/sec Loss 7.0657 LearningRate 0.0984 Epoch: 11 Global Step: 114790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:46:13,311-Speed 5974.22 samples/sec Loss 7.0861 LearningRate 0.0984 Epoch: 11 Global Step: 114800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:46:20,164-Speed 5977.48 samples/sec Loss 7.1009 LearningRate 0.0984 Epoch: 11 Global Step: 114810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:46:27,018-Speed 5976.33 samples/sec Loss 7.0976 LearningRate 0.0984 Epoch: 11 Global Step: 114820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:46:33,866-Speed 5982.42 samples/sec Loss 7.1084 LearningRate 0.0984 Epoch: 11 Global Step: 114830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:46:40,724-Speed 5973.64 samples/sec Loss 7.1008 LearningRate 0.0983 Epoch: 11 Global Step: 114840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:46:47,586-Speed 5969.68 samples/sec Loss 7.0629 LearningRate 0.0983 Epoch: 11 Global Step: 114850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:46:54,979-Speed 5897.85 samples/sec Loss 7.0747 LearningRate 0.0983 Epoch: 11 Global Step: 114860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:47:01,834-Speed 5976.07 samples/sec Loss 7.0679 LearningRate 0.0983 Epoch: 11 Global Step: 114870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:47:08,678-Speed 5985.68 samples/sec Loss 7.0521 LearningRate 0.0982 Epoch: 11 Global Step: 114880 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 18:47:15,520-Speed 5987.98 samples/sec Loss 7.1134 LearningRate 0.0982 Epoch: 11 Global Step: 114890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:47:22,373-Speed 5977.76 samples/sec Loss 7.0866 LearningRate 
0.0982 Epoch: 11 Global Step: 114900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:47:29,249-Speed 5958.24 samples/sec Loss 7.0587 LearningRate 0.0982 Epoch: 11 Global Step: 114910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:47:36,118-Speed 5968.27 samples/sec Loss 7.0778 LearningRate 0.0982 Epoch: 11 Global Step: 114920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:47:42,981-Speed 5969.59 samples/sec Loss 7.0977 LearningRate 0.0981 Epoch: 11 Global Step: 114930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:47:49,846-Speed 5967.26 samples/sec Loss 7.1482 LearningRate 0.0981 Epoch: 11 Global Step: 114940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:47:56,713-Speed 5966.17 samples/sec Loss 7.1004 LearningRate 0.0981 Epoch: 11 Global Step: 114950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:48:03,576-Speed 5969.24 samples/sec Loss 7.0508 LearningRate 0.0981 Epoch: 11 Global Step: 114960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:48:10,431-Speed 5975.82 samples/sec Loss 7.0266 LearningRate 0.0981 Epoch: 11 Global Step: 114970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:48:17,283-Speed 5982.60 samples/sec Loss 7.0441 LearningRate 0.0980 Epoch: 11 Global Step: 114980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:48:24,125-Speed 5987.02 samples/sec Loss 7.0239 LearningRate 0.0980 Epoch: 11 Global Step: 114990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:48:30,971-Speed 5984.23 samples/sec Loss 7.1320 LearningRate 0.0980 Epoch: 11 Global Step: 115000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:48:58,197-[lfw][115000]XNorm: 24.114920 Training: 2022-01-08 18:48:58,197-[lfw][115000]Accuracy-Flip: 0.99750+-0.00281 Training: 2022-01-08 18:48:58,198-[lfw][115000]Accuracy-Highest: 0.99783 Training: 2022-01-08 
18:49:29,614-[cfp_fp][115000]XNorm: 21.190004 Training: 2022-01-08 18:49:29,615-[cfp_fp][115000]Accuracy-Flip: 0.98557+-0.00724 Training: 2022-01-08 18:49:29,616-[cfp_fp][115000]Accuracy-Highest: 0.98557 Training: 2022-01-08 18:49:56,492-[agedb_30][115000]XNorm: 23.470952 Training: 2022-01-08 18:49:56,493-[agedb_30][115000]Accuracy-Flip: 0.97383+-0.00641 Training: 2022-01-08 18:49:56,493-[agedb_30][115000]Accuracy-Highest: 0.97383 Training: 2022-01-08 18:50:03,338-Speed 443.45 samples/sec Loss 7.0247 LearningRate 0.0980 Epoch: 11 Global Step: 115010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:50:10,225-Speed 5950.49 samples/sec Loss 7.1038 LearningRate 0.0980 Epoch: 11 Global Step: 115020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:50:17,106-Speed 5954.30 samples/sec Loss 7.0273 LearningRate 0.0979 Epoch: 11 Global Step: 115030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:50:23,949-Speed 5986.04 samples/sec Loss 7.0485 LearningRate 0.0979 Epoch: 11 Global Step: 115040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:50:30,806-Speed 5975.06 samples/sec Loss 7.0687 LearningRate 0.0979 Epoch: 11 Global Step: 115050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:50:37,663-Speed 5974.36 samples/sec Loss 7.0549 LearningRate 0.0979 Epoch: 11 Global Step: 115060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:50:44,529-Speed 5966.60 samples/sec Loss 7.0303 LearningRate 0.0978 Epoch: 11 Global Step: 115070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:50:51,394-Speed 5967.18 samples/sec Loss 7.0611 LearningRate 0.0978 Epoch: 11 Global Step: 115080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:50:58,281-Speed 5948.50 samples/sec Loss 7.0104 LearningRate 0.0978 Epoch: 11 Global Step: 115090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:51:05,176-Speed 5941.86 samples/sec Loss 7.0891 LearningRate 
0.0978 Epoch: 11 Global Step: 115100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:51:12,085-Speed 5930.42 samples/sec Loss 7.1442 LearningRate 0.0978 Epoch: 11 Global Step: 115110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:51:18,950-Speed 5967.42 samples/sec Loss 7.0969 LearningRate 0.0977 Epoch: 11 Global Step: 115120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:51:25,834-Speed 5950.57 samples/sec Loss 7.0071 LearningRate 0.0977 Epoch: 11 Global Step: 115130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:51:32,695-Speed 5971.51 samples/sec Loss 7.1090 LearningRate 0.0977 Epoch: 11 Global Step: 115140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:51:39,590-Speed 5941.16 samples/sec Loss 7.0864 LearningRate 0.0977 Epoch: 11 Global Step: 115150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:51:46,445-Speed 5976.89 samples/sec Loss 7.0796 LearningRate 0.0977 Epoch: 11 Global Step: 115160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:51:53,300-Speed 5976.09 samples/sec Loss 7.0159 LearningRate 0.0976 Epoch: 11 Global Step: 115170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:00,162-Speed 5969.88 samples/sec Loss 7.0835 LearningRate 0.0976 Epoch: 11 Global Step: 115180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:07,024-Speed 5970.46 samples/sec Loss 7.1043 LearningRate 0.0976 Epoch: 11 Global Step: 115190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:13,879-Speed 5976.01 samples/sec Loss 7.0629 LearningRate 0.0976 Epoch: 11 Global Step: 115200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:20,748-Speed 5963.85 samples/sec Loss 6.9985 LearningRate 0.0975 Epoch: 11 Global Step: 115210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:27,630-Speed 5953.20 samples/sec Loss 7.0555 LearningRate 0.0975 Epoch: 11 Global Step: 
115220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:34,484-Speed 5978.17 samples/sec Loss 7.0842 LearningRate 0.0975 Epoch: 11 Global Step: 115230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:41,337-Speed 5978.13 samples/sec Loss 7.1003 LearningRate 0.0975 Epoch: 11 Global Step: 115240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:48,195-Speed 5973.48 samples/sec Loss 7.1084 LearningRate 0.0975 Epoch: 11 Global Step: 115250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:52:55,056-Speed 5971.14 samples/sec Loss 7.0708 LearningRate 0.0974 Epoch: 11 Global Step: 115260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:53:01,910-Speed 5977.03 samples/sec Loss 7.0381 LearningRate 0.0974 Epoch: 11 Global Step: 115270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:53:08,779-Speed 5964.14 samples/sec Loss 7.0324 LearningRate 0.0974 Epoch: 11 Global Step: 115280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:53:15,654-Speed 5958.66 samples/sec Loss 7.0629 LearningRate 0.0974 Epoch: 11 Global Step: 115290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:53:22,518-Speed 5968.96 samples/sec Loss 7.0724 LearningRate 0.0974 Epoch: 11 Global Step: 115300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:53:29,397-Speed 5954.79 samples/sec Loss 7.0012 LearningRate 0.0973 Epoch: 11 Global Step: 115310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:53:36,286-Speed 5947.63 samples/sec Loss 7.0419 LearningRate 0.0973 Epoch: 11 Global Step: 115320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:53:43,135-Speed 5981.16 samples/sec Loss 7.1391 LearningRate 0.0973 Epoch: 11 Global Step: 115330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:53:50,004-Speed 5964.10 samples/sec Loss 7.0232 LearningRate 0.0973 Epoch: 11 Global Step: 115340 Fp16 Grad Scale: 
131072 Required: 18 hours Training: 2022-01-08 18:53:56,861-Speed 5974.61 samples/sec Loss 7.0873 LearningRate 0.0973 Epoch: 11 Global Step: 115350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:54:03,721-Speed 5971.98 samples/sec Loss 6.9932 LearningRate 0.0972 Epoch: 11 Global Step: 115360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:54:10,597-Speed 5957.72 samples/sec Loss 7.0726 LearningRate 0.0972 Epoch: 11 Global Step: 115370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:54:17,466-Speed 5964.25 samples/sec Loss 7.0597 LearningRate 0.0972 Epoch: 11 Global Step: 115380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:54:24,335-Speed 5964.38 samples/sec Loss 6.9798 LearningRate 0.0972 Epoch: 11 Global Step: 115390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:54:31,220-Speed 5949.92 samples/sec Loss 7.0574 LearningRate 0.0971 Epoch: 11 Global Step: 115400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:54:38,082-Speed 5971.06 samples/sec Loss 7.0742 LearningRate 0.0971 Epoch: 11 Global Step: 115410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:54:44,958-Speed 5958.17 samples/sec Loss 7.0742 LearningRate 0.0971 Epoch: 11 Global Step: 115420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:54:51,809-Speed 5979.19 samples/sec Loss 7.0693 LearningRate 0.0971 Epoch: 11 Global Step: 115430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:54:58,672-Speed 5969.38 samples/sec Loss 6.9947 LearningRate 0.0971 Epoch: 11 Global Step: 115440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:55:05,524-Speed 5978.71 samples/sec Loss 7.0293 LearningRate 0.0970 Epoch: 11 Global Step: 115450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:55:12,380-Speed 5975.52 samples/sec Loss 7.0525 LearningRate 0.0970 Epoch: 11 Global Step: 115460 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-01-08 18:55:19,234-Speed 5976.66 samples/sec Loss 6.9817 LearningRate 0.0970 Epoch: 11 Global Step: 115470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:55:26,103-Speed 5964.53 samples/sec Loss 7.0054 LearningRate 0.0970 Epoch: 11 Global Step: 115480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:55:33,026-Speed 5917.50 samples/sec Loss 6.9947 LearningRate 0.0970 Epoch: 11 Global Step: 115490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:55:39,873-Speed 5982.88 samples/sec Loss 7.1168 LearningRate 0.0969 Epoch: 11 Global Step: 115500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:55:46,745-Speed 5961.81 samples/sec Loss 7.0092 LearningRate 0.0969 Epoch: 11 Global Step: 115510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:55:53,595-Speed 5980.28 samples/sec Loss 6.9953 LearningRate 0.0969 Epoch: 11 Global Step: 115520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:56:00,472-Speed 5957.53 samples/sec Loss 7.0463 LearningRate 0.0969 Epoch: 11 Global Step: 115530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:56:07,330-Speed 5973.65 samples/sec Loss 7.0339 LearningRate 0.0969 Epoch: 11 Global Step: 115540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:56:14,185-Speed 5975.62 samples/sec Loss 7.0572 LearningRate 0.0968 Epoch: 11 Global Step: 115550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:56:21,017-Speed 5996.97 samples/sec Loss 7.0816 LearningRate 0.0968 Epoch: 11 Global Step: 115560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:56:27,872-Speed 5975.96 samples/sec Loss 7.0985 LearningRate 0.0968 Epoch: 11 Global Step: 115570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:56:34,723-Speed 5979.70 samples/sec Loss 7.0541 LearningRate 0.0968 Epoch: 11 Global Step: 115580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 
18:56:41,574-Speed 5981.36 samples/sec Loss 6.9724 LearningRate 0.0967 Epoch: 11 Global Step: 115590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:56:48,459-Speed 5951.38 samples/sec Loss 7.0864 LearningRate 0.0967 Epoch: 11 Global Step: 115600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:56:55,471-Speed 5842.51 samples/sec Loss 7.0695 LearningRate 0.0967 Epoch: 11 Global Step: 115610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:57:02,420-Speed 5895.66 samples/sec Loss 7.0338 LearningRate 0.0967 Epoch: 11 Global Step: 115620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:57:09,267-Speed 5983.08 samples/sec Loss 7.0192 LearningRate 0.0967 Epoch: 11 Global Step: 115630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:57:16,116-Speed 5981.16 samples/sec Loss 6.9890 LearningRate 0.0966 Epoch: 11 Global Step: 115640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:57:22,972-Speed 5975.62 samples/sec Loss 6.9897 LearningRate 0.0966 Epoch: 11 Global Step: 115650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:57:29,829-Speed 5974.85 samples/sec Loss 7.0510 LearningRate 0.0966 Epoch: 11 Global Step: 115660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:57:36,683-Speed 5977.25 samples/sec Loss 7.0229 LearningRate 0.0966 Epoch: 11 Global Step: 115670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:57:43,543-Speed 5971.94 samples/sec Loss 7.0008 LearningRate 0.0966 Epoch: 11 Global Step: 115680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:57:50,406-Speed 5968.27 samples/sec Loss 6.9475 LearningRate 0.0965 Epoch: 11 Global Step: 115690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:57:57,277-Speed 5962.44 samples/sec Loss 7.0172 LearningRate 0.0965 Epoch: 11 Global Step: 115700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:58:04,151-Speed 5959.75 
samples/sec Loss 7.0125 LearningRate 0.0965 Epoch: 11 Global Step: 115710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:58:11,038-Speed 5948.18 samples/sec Loss 6.9871 LearningRate 0.0965 Epoch: 11 Global Step: 115720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:58:17,890-Speed 5979.13 samples/sec Loss 7.0640 LearningRate 0.0965 Epoch: 11 Global Step: 115730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:58:24,739-Speed 5982.25 samples/sec Loss 7.0012 LearningRate 0.0964 Epoch: 11 Global Step: 115740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:58:31,598-Speed 5973.20 samples/sec Loss 7.0124 LearningRate 0.0964 Epoch: 11 Global Step: 115750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:58:38,458-Speed 5971.52 samples/sec Loss 7.0271 LearningRate 0.0964 Epoch: 11 Global Step: 115760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:58:45,354-Speed 5941.49 samples/sec Loss 6.9859 LearningRate 0.0964 Epoch: 11 Global Step: 115770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:58:52,228-Speed 5964.83 samples/sec Loss 6.9743 LearningRate 0.0963 Epoch: 11 Global Step: 115780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:58:59,094-Speed 5966.72 samples/sec Loss 7.0240 LearningRate 0.0963 Epoch: 11 Global Step: 115790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:59:05,953-Speed 5973.23 samples/sec Loss 7.0053 LearningRate 0.0963 Epoch: 11 Global Step: 115800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:59:12,829-Speed 5958.26 samples/sec Loss 7.0040 LearningRate 0.0963 Epoch: 11 Global Step: 115810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 18:59:19,684-Speed 5976.21 samples/sec Loss 7.0270 LearningRate 0.0963 Epoch: 11 Global Step: 115820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:59:26,553-Speed 5964.48 samples/sec Loss 6.9481 
LearningRate 0.0962 Epoch: 11 Global Step: 115830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:59:33,434-Speed 5953.93 samples/sec Loss 6.9914 LearningRate 0.0962 Epoch: 11 Global Step: 115840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:59:40,306-Speed 5962.30 samples/sec Loss 7.0542 LearningRate 0.0962 Epoch: 11 Global Step: 115850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:59:47,174-Speed 5965.08 samples/sec Loss 7.0416 LearningRate 0.0962 Epoch: 11 Global Step: 115860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 18:59:54,025-Speed 5979.62 samples/sec Loss 6.9916 LearningRate 0.0962 Epoch: 11 Global Step: 115870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:00:00,884-Speed 5972.70 samples/sec Loss 7.0080 LearningRate 0.0961 Epoch: 11 Global Step: 115880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:00:07,770-Speed 5949.36 samples/sec Loss 6.9529 LearningRate 0.0961 Epoch: 11 Global Step: 115890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:00:14,625-Speed 5976.65 samples/sec Loss 7.0552 LearningRate 0.0961 Epoch: 11 Global Step: 115900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:00:21,501-Speed 5958.92 samples/sec Loss 7.0038 LearningRate 0.0961 Epoch: 11 Global Step: 115910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:00:28,356-Speed 5977.62 samples/sec Loss 6.9503 LearningRate 0.0961 Epoch: 11 Global Step: 115920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:00:35,212-Speed 5975.87 samples/sec Loss 6.9655 LearningRate 0.0960 Epoch: 11 Global Step: 115930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:00:42,082-Speed 5962.92 samples/sec Loss 7.0182 LearningRate 0.0960 Epoch: 11 Global Step: 115940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:00:48,975-Speed 5943.85 samples/sec Loss 6.9515 LearningRate 0.0960 
Epoch: 11 Global Step: 115950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:00:55,831-Speed 5975.27 samples/sec Loss 6.9896 LearningRate 0.0960 Epoch: 11 Global Step: 115960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:02,677-Speed 5984.45 samples/sec Loss 7.0506 LearningRate 0.0959 Epoch: 11 Global Step: 115970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:09,554-Speed 5958.17 samples/sec Loss 6.9859 LearningRate 0.0959 Epoch: 11 Global Step: 115980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:16,413-Speed 5972.15 samples/sec Loss 6.9682 LearningRate 0.0959 Epoch: 11 Global Step: 115990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:23,272-Speed 5973.58 samples/sec Loss 6.9543 LearningRate 0.0959 Epoch: 11 Global Step: 116000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:30,135-Speed 5971.09 samples/sec Loss 7.0342 LearningRate 0.0959 Epoch: 11 Global Step: 116010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:36,990-Speed 5976.63 samples/sec Loss 7.0212 LearningRate 0.0958 Epoch: 11 Global Step: 116020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:43,867-Speed 5956.58 samples/sec Loss 7.0070 LearningRate 0.0958 Epoch: 11 Global Step: 116030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:50,718-Speed 5980.03 samples/sec Loss 7.0184 LearningRate 0.0958 Epoch: 11 Global Step: 116040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:01:57,590-Speed 5961.51 samples/sec Loss 7.0000 LearningRate 0.0958 Epoch: 11 Global Step: 116050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:02:04,438-Speed 5982.54 samples/sec Loss 7.0352 LearningRate 0.0958 Epoch: 11 Global Step: 116060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:02:11,311-Speed 5960.40 samples/sec Loss 6.9139 LearningRate 0.0957 Epoch: 11 Global Step: 116070 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:02:18,159-Speed 5982.24 samples/sec Loss 6.9709 LearningRate 0.0957 Epoch: 11 Global Step: 116080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:02:25,017-Speed 5973.68 samples/sec Loss 7.0174 LearningRate 0.0957 Epoch: 11 Global Step: 116090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:02:31,892-Speed 5958.82 samples/sec Loss 7.0045 LearningRate 0.0957 Epoch: 11 Global Step: 116100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:02:38,756-Speed 5968.53 samples/sec Loss 7.0530 LearningRate 0.0957 Epoch: 11 Global Step: 116110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:02:45,610-Speed 5977.29 samples/sec Loss 7.0616 LearningRate 0.0956 Epoch: 11 Global Step: 116120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:02:52,472-Speed 5969.95 samples/sec Loss 7.0039 LearningRate 0.0956 Epoch: 11 Global Step: 116130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:02:59,328-Speed 5975.21 samples/sec Loss 7.0023 LearningRate 0.0956 Epoch: 11 Global Step: 116140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:03:06,227-Speed 5937.76 samples/sec Loss 7.0209 LearningRate 0.0956 Epoch: 11 Global Step: 116150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:03:13,143-Speed 5926.68 samples/sec Loss 6.9869 LearningRate 0.0955 Epoch: 11 Global Step: 116160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:03:20,047-Speed 5933.88 samples/sec Loss 7.0002 LearningRate 0.0955 Epoch: 11 Global Step: 116170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:03:26,948-Speed 5936.07 samples/sec Loss 6.9702 LearningRate 0.0955 Epoch: 11 Global Step: 116180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:03:33,829-Speed 5954.27 samples/sec Loss 6.9650 LearningRate 0.0955 Epoch: 11 Global Step: 116190 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-01-08 19:03:40,670-Speed 5988.34 samples/sec Loss 6.9772 LearningRate 0.0955 Epoch: 11 Global Step: 116200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:03:47,532-Speed 5973.96 samples/sec Loss 6.9872 LearningRate 0.0954 Epoch: 11 Global Step: 116210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:03:54,410-Speed 5955.86 samples/sec Loss 6.9980 LearningRate 0.0954 Epoch: 11 Global Step: 116220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:04:01,288-Speed 5956.54 samples/sec Loss 6.9476 LearningRate 0.0954 Epoch: 11 Global Step: 116230 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:04:08,169-Speed 5953.34 samples/sec Loss 7.0025 LearningRate 0.0954 Epoch: 11 Global Step: 116240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:04:15,030-Speed 5971.42 samples/sec Loss 6.9714 LearningRate 0.0954 Epoch: 11 Global Step: 116250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:04:21,884-Speed 5977.09 samples/sec Loss 7.0064 LearningRate 0.0953 Epoch: 11 Global Step: 116260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:04:28,733-Speed 5980.94 samples/sec Loss 6.9362 LearningRate 0.0953 Epoch: 11 Global Step: 116270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:04:35,591-Speed 5976.73 samples/sec Loss 7.0013 LearningRate 0.0953 Epoch: 11 Global Step: 116280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:04:42,463-Speed 5960.92 samples/sec Loss 6.8973 LearningRate 0.0953 Epoch: 11 Global Step: 116290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:04:49,334-Speed 5962.97 samples/sec Loss 6.9895 LearningRate 0.0953 Epoch: 11 Global Step: 116300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:04:56,206-Speed 5962.00 samples/sec Loss 6.9826 LearningRate 0.0952 Epoch: 11 Global Step: 116310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-01-08 19:05:03,086-Speed 5954.10 samples/sec Loss 6.9726 LearningRate 0.0952 Epoch: 11 Global Step: 116320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:05:10,036-Speed 5895.38 samples/sec Loss 6.9508 LearningRate 0.0952 Epoch: 11 Global Step: 116330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:05:16,891-Speed 5976.21 samples/sec Loss 6.9827 LearningRate 0.0952 Epoch: 11 Global Step: 116340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:05:23,740-Speed 5982.36 samples/sec Loss 6.9505 LearningRate 0.0952 Epoch: 11 Global Step: 116350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:05:30,583-Speed 5985.76 samples/sec Loss 6.9820 LearningRate 0.0951 Epoch: 11 Global Step: 116360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:05:37,445-Speed 5970.55 samples/sec Loss 6.9446 LearningRate 0.0951 Epoch: 11 Global Step: 116370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:05:44,302-Speed 5974.38 samples/sec Loss 6.9941 LearningRate 0.0951 Epoch: 11 Global Step: 116380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:05:51,169-Speed 5966.22 samples/sec Loss 7.0254 LearningRate 0.0951 Epoch: 11 Global Step: 116390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:05:58,053-Speed 5950.63 samples/sec Loss 6.9536 LearningRate 0.0950 Epoch: 11 Global Step: 116400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:06:04,928-Speed 5960.02 samples/sec Loss 6.9594 LearningRate 0.0950 Epoch: 11 Global Step: 116410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:06:11,793-Speed 5967.12 samples/sec Loss 7.0157 LearningRate 0.0950 Epoch: 11 Global Step: 116420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:06:18,701-Speed 5931.07 samples/sec Loss 6.9303 LearningRate 0.0950 Epoch: 11 Global Step: 116430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 
19:06:25,567-Speed 5966.14 samples/sec Loss 6.9537 LearningRate 0.0950 Epoch: 11 Global Step: 116440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:06:32,426-Speed 5972.91 samples/sec Loss 6.9794 LearningRate 0.0949 Epoch: 11 Global Step: 116450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:06:39,282-Speed 5975.38 samples/sec Loss 7.0050 LearningRate 0.0949 Epoch: 11 Global Step: 116460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:06:46,136-Speed 5977.73 samples/sec Loss 6.9428 LearningRate 0.0949 Epoch: 11 Global Step: 116470 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:06:53,019-Speed 5953.51 samples/sec Loss 6.9996 LearningRate 0.0949 Epoch: 11 Global Step: 116480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:06:59,886-Speed 5966.05 samples/sec Loss 7.0506 LearningRate 0.0949 Epoch: 11 Global Step: 116490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:07:06,757-Speed 5962.11 samples/sec Loss 6.9937 LearningRate 0.0948 Epoch: 11 Global Step: 116500 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:07:13,632-Speed 5959.25 samples/sec Loss 7.0012 LearningRate 0.0948 Epoch: 11 Global Step: 116510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:07:20,484-Speed 5979.04 samples/sec Loss 6.9466 LearningRate 0.0948 Epoch: 11 Global Step: 116520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:07:27,379-Speed 5942.05 samples/sec Loss 6.9976 LearningRate 0.0948 Epoch: 11 Global Step: 116530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:07:34,250-Speed 5961.93 samples/sec Loss 6.9298 LearningRate 0.0948 Epoch: 11 Global Step: 116540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:07:41,123-Speed 5960.65 samples/sec Loss 7.0082 LearningRate 0.0947 Epoch: 11 Global Step: 116550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:07:47,969-Speed 5984.22 
samples/sec Loss 6.9315 LearningRate 0.0947 Epoch: 11 Global Step: 116560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 19:07:54,815-Speed 5986.80 samples/sec Loss 7.0041 LearningRate 0.0947 Epoch: 11 Global Step: 116570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:01,686-Speed 5962.33 samples/sec Loss 6.9184 LearningRate 0.0947 Epoch: 11 Global Step: 116580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:08,562-Speed 5958.29 samples/sec Loss 6.9446 LearningRate 0.0946 Epoch: 11 Global Step: 116590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:15,435-Speed 5960.67 samples/sec Loss 6.9489 LearningRate 0.0946 Epoch: 11 Global Step: 116600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:22,282-Speed 5983.33 samples/sec Loss 6.9993 LearningRate 0.0946 Epoch: 11 Global Step: 116610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:29,127-Speed 5985.76 samples/sec Loss 6.9893 LearningRate 0.0946 Epoch: 11 Global Step: 116620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:35,977-Speed 5980.25 samples/sec Loss 6.9012 LearningRate 0.0946 Epoch: 11 Global Step: 116630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:42,844-Speed 5965.63 samples/sec Loss 6.9075 LearningRate 0.0945 Epoch: 11 Global Step: 116640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:49,713-Speed 5964.66 samples/sec Loss 6.9580 LearningRate 0.0945 Epoch: 11 Global Step: 116650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:08:56,572-Speed 5972.83 samples/sec Loss 7.0094 LearningRate 0.0945 Epoch: 11 Global Step: 116660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:09:03,424-Speed 5980.10 samples/sec Loss 6.9757 LearningRate 0.0945 Epoch: 11 Global Step: 116670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:09:10,300-Speed 5958.04 samples/sec Loss 6.9799 
LearningRate 0.0945 Epoch: 11 Global Step: 116680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:09:17,156-Speed 5976.78 samples/sec Loss 6.9367 LearningRate 0.0944 Epoch: 11 Global Step: 116690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:09:24,049-Speed 5944.31 samples/sec Loss 6.9495 LearningRate 0.0944 Epoch: 11 Global Step: 116700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:09:30,896-Speed 5983.25 samples/sec Loss 6.9174 LearningRate 0.0944 Epoch: 11 Global Step: 116710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:09:37,751-Speed 5976.18 samples/sec Loss 6.9455 LearningRate 0.0944 Epoch: 11 Global Step: 116720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:09:44,603-Speed 5979.56 samples/sec Loss 6.9603 LearningRate 0.0944 Epoch: 11 Global Step: 116730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:09:51,463-Speed 5972.05 samples/sec Loss 6.9608 LearningRate 0.0943 Epoch: 11 Global Step: 116740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:09:58,312-Speed 5981.76 samples/sec Loss 6.9249 LearningRate 0.0943 Epoch: 11 Global Step: 116750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:10:05,252-Speed 5903.21 samples/sec Loss 6.9063 LearningRate 0.0943 Epoch: 11 Global Step: 116760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:10:12,160-Speed 5929.80 samples/sec Loss 6.9874 LearningRate 0.0943 Epoch: 11 Global Step: 116770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:10:19,063-Speed 5935.02 samples/sec Loss 6.9461 LearningRate 0.0943 Epoch: 11 Global Step: 116780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:10:25,941-Speed 5956.88 samples/sec Loss 6.9568 LearningRate 0.0942 Epoch: 11 Global Step: 116790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:10:32,820-Speed 5955.05 samples/sec Loss 6.9600 LearningRate 0.0942 
Epoch: 11 Global Step: 116800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:10:39,672-Speed 5979.21 samples/sec Loss 6.9434 LearningRate 0.0942 Epoch: 11 Global Step: 116810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:10:46,553-Speed 5954.99 samples/sec Loss 6.8741 LearningRate 0.0942 Epoch: 11 Global Step: 116820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:10:53,444-Speed 5946.47 samples/sec Loss 6.9023 LearningRate 0.0941 Epoch: 11 Global Step: 116830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:00,313-Speed 5963.55 samples/sec Loss 6.9978 LearningRate 0.0941 Epoch: 11 Global Step: 116840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:07,182-Speed 5966.83 samples/sec Loss 7.0034 LearningRate 0.0941 Epoch: 11 Global Step: 116850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:14,056-Speed 5959.36 samples/sec Loss 6.9224 LearningRate 0.0941 Epoch: 11 Global Step: 116860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:20,905-Speed 5981.43 samples/sec Loss 6.9629 LearningRate 0.0941 Epoch: 11 Global Step: 116870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:27,774-Speed 5964.56 samples/sec Loss 6.9563 LearningRate 0.0940 Epoch: 11 Global Step: 116880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:34,645-Speed 5962.09 samples/sec Loss 6.9389 LearningRate 0.0940 Epoch: 11 Global Step: 116890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:41,502-Speed 5974.68 samples/sec Loss 6.9596 LearningRate 0.0940 Epoch: 11 Global Step: 116900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:48,387-Speed 5950.00 samples/sec Loss 6.9322 LearningRate 0.0940 Epoch: 11 Global Step: 116910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:11:55,238-Speed 5982.31 samples/sec Loss 6.9896 LearningRate 0.0940 Epoch: 11 Global 
Step: 116920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:12:02,089-Speed 5980.19 samples/sec Loss 6.9582 LearningRate 0.0939 Epoch: 11 Global Step: 116930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:12:08,948-Speed 5972.36 samples/sec Loss 6.9117 LearningRate 0.0939 Epoch: 11 Global Step: 116940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:12:15,811-Speed 5969.72 samples/sec Loss 6.9249 LearningRate 0.0939 Epoch: 11 Global Step: 116950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:12:22,674-Speed 5969.14 samples/sec Loss 6.8983 LearningRate 0.0939 Epoch: 11 Global Step: 116960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:12:29,561-Speed 5949.12 samples/sec Loss 6.9586 LearningRate 0.0939 Epoch: 11 Global Step: 116970 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 19:12:36,433-Speed 5961.48 samples/sec Loss 6.9920 LearningRate 0.0938 Epoch: 11 Global Step: 116980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:12:43,293-Speed 5972.68 samples/sec Loss 7.0068 LearningRate 0.0938 Epoch: 11 Global Step: 116990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:12:50,156-Speed 5969.21 samples/sec Loss 6.9150 LearningRate 0.0938 Epoch: 11 Global Step: 117000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:12:57,002-Speed 5984.31 samples/sec Loss 6.9171 LearningRate 0.0938 Epoch: 11 Global Step: 117010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:13:03,862-Speed 5971.96 samples/sec Loss 6.9412 LearningRate 0.0938 Epoch: 11 Global Step: 117020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:13:10,710-Speed 5982.47 samples/sec Loss 6.8890 LearningRate 0.0937 Epoch: 11 Global Step: 117030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:13:17,578-Speed 5964.39 samples/sec Loss 6.9426 LearningRate 0.0937 Epoch: 11 Global Step: 117040 Fp16 
Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:13:24,442-Speed 5970.39 samples/sec Loss 6.9595 LearningRate 0.0937 Epoch: 11 Global Step: 117050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:13:31,303-Speed 5971.60 samples/sec Loss 6.9557 LearningRate 0.0937 Epoch: 11 Global Step: 117060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:13:38,169-Speed 5966.15 samples/sec Loss 6.9232 LearningRate 0.0937 Epoch: 11 Global Step: 117070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:13:45,010-Speed 5988.43 samples/sec Loss 6.9564 LearningRate 0.0936 Epoch: 11 Global Step: 117080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:13:51,872-Speed 5971.24 samples/sec Loss 6.9570 LearningRate 0.0936 Epoch: 11 Global Step: 117090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:13:58,715-Speed 5987.08 samples/sec Loss 6.9096 LearningRate 0.0936 Epoch: 11 Global Step: 117100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:14:05,559-Speed 5985.95 samples/sec Loss 6.9536 LearningRate 0.0936 Epoch: 11 Global Step: 117110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:14:12,457-Speed 5938.27 samples/sec Loss 6.8685 LearningRate 0.0935 Epoch: 11 Global Step: 117120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:14:19,388-Speed 5910.41 samples/sec Loss 6.9038 LearningRate 0.0935 Epoch: 11 Global Step: 117130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:14:26,285-Speed 5940.55 samples/sec Loss 6.9476 LearningRate 0.0935 Epoch: 11 Global Step: 117140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:14:33,232-Speed 5896.90 samples/sec Loss 6.9494 LearningRate 0.0935 Epoch: 11 Global Step: 117150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:14:40,084-Speed 5978.80 samples/sec Loss 6.9487 LearningRate 0.0935 Epoch: 11 Global Step: 117160 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-01-08 19:14:46,934-Speed 5980.15 samples/sec Loss 6.9843 LearningRate 0.0934 Epoch: 11 Global Step: 117170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:14:53,894-Speed 5885.82 samples/sec Loss 6.9250 LearningRate 0.0934 Epoch: 11 Global Step: 117180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:15:00,824-Speed 5912.07 samples/sec Loss 6.9093 LearningRate 0.0934 Epoch: 11 Global Step: 117190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:15:07,698-Speed 5959.66 samples/sec Loss 6.9527 LearningRate 0.0934 Epoch: 11 Global Step: 117200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:15:14,554-Speed 5977.14 samples/sec Loss 6.9454 LearningRate 0.0934 Epoch: 11 Global Step: 117210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:15:21,489-Speed 5907.07 samples/sec Loss 6.8909 LearningRate 0.0933 Epoch: 11 Global Step: 117220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:15:28,386-Speed 5940.44 samples/sec Loss 6.9454 LearningRate 0.0933 Epoch: 11 Global Step: 117230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:15:35,240-Speed 5976.96 samples/sec Loss 6.9675 LearningRate 0.0933 Epoch: 11 Global Step: 117240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:15:42,082-Speed 5988.45 samples/sec Loss 6.9232 LearningRate 0.0933 Epoch: 11 Global Step: 117250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:15:48,945-Speed 5969.32 samples/sec Loss 6.9079 LearningRate 0.0933 Epoch: 11 Global Step: 117260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:15:55,819-Speed 5960.23 samples/sec Loss 6.9325 LearningRate 0.0932 Epoch: 11 Global Step: 117270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:16:02,674-Speed 5976.27 samples/sec Loss 6.8582 LearningRate 0.0932 Epoch: 11 Global Step: 117280 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-01-08 19:16:09,534-Speed 5971.25 samples/sec Loss 6.9227 LearningRate 0.0932 Epoch: 11 Global Step: 117290 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 19:16:17,714-Speed 5008.12 samples/sec Loss 6.9527 LearningRate 0.0932 Epoch: 11 Global Step: 117300 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 19:16:24,549-Speed 5994.47 samples/sec Loss 6.8935 LearningRate 0.0932 Epoch: 11 Global Step: 117310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:16:31,406-Speed 5975.01 samples/sec Loss 6.8496 LearningRate 0.0931 Epoch: 11 Global Step: 117320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:16:38,260-Speed 5976.12 samples/sec Loss 6.9014 LearningRate 0.0931 Epoch: 11 Global Step: 117330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:16:45,118-Speed 5974.04 samples/sec Loss 6.9837 LearningRate 0.0931 Epoch: 11 Global Step: 117340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:16:51,965-Speed 5983.73 samples/sec Loss 6.8879 LearningRate 0.0931 Epoch: 11 Global Step: 117350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:16:58,840-Speed 5958.95 samples/sec Loss 6.8937 LearningRate 0.0931 Epoch: 11 Global Step: 117360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:17:05,707-Speed 5967.81 samples/sec Loss 6.8529 LearningRate 0.0930 Epoch: 11 Global Step: 117370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:17:12,576-Speed 5964.40 samples/sec Loss 6.8631 LearningRate 0.0930 Epoch: 11 Global Step: 117380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:17:19,423-Speed 5983.21 samples/sec Loss 6.9068 LearningRate 0.0930 Epoch: 11 Global Step: 117390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:17:26,286-Speed 5969.05 samples/sec Loss 6.9717 LearningRate 0.0930 Epoch: 11 Global Step: 117400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 
2022-01-08 19:17:33,141-Speed 5976.61 samples/sec Loss 6.9494 LearningRate 0.0929 Epoch: 11 Global Step: 117410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 19:17:39,995-Speed 5977.09 samples/sec Loss 6.8552 LearningRate 0.0929 Epoch: 11 Global Step: 117420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 19:17:46,847-Speed 5978.64 samples/sec Loss 6.9318 LearningRate 0.0929 Epoch: 11 Global Step: 117430 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 19:17:53,687-Speed 5991.21 samples/sec Loss 6.8556 LearningRate 0.0929 Epoch: 11 Global Step: 117440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:18:00,544-Speed 5974.29 samples/sec Loss 6.9177 LearningRate 0.0929 Epoch: 11 Global Step: 117450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:18:07,405-Speed 5973.75 samples/sec Loss 6.9120 LearningRate 0.0928 Epoch: 11 Global Step: 117460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:18:14,259-Speed 5976.46 samples/sec Loss 6.8830 LearningRate 0.0928 Epoch: 11 Global Step: 117470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:18:21,135-Speed 5959.04 samples/sec Loss 6.8600 LearningRate 0.0928 Epoch: 11 Global Step: 117480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:18:28,025-Speed 5946.06 samples/sec Loss 6.9490 LearningRate 0.0928 Epoch: 11 Global Step: 117490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:18:34,879-Speed 5976.91 samples/sec Loss 6.8850 LearningRate 0.0928 Epoch: 11 Global Step: 117500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:18:41,734-Speed 5976.67 samples/sec Loss 6.9205 LearningRate 0.0927 Epoch: 11 Global Step: 117510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:18:48,580-Speed 5984.50 samples/sec Loss 6.8429 LearningRate 0.0927 Epoch: 11 Global Step: 117520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 
19:18:55,416-Speed 5993.26 samples/sec Loss 6.9334 LearningRate 0.0927 Epoch: 11 Global Step: 117530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:02,270-Speed 5978.90 samples/sec Loss 6.9580 LearningRate 0.0927 Epoch: 11 Global Step: 117540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:09,160-Speed 5951.46 samples/sec Loss 6.9606 LearningRate 0.0927 Epoch: 11 Global Step: 117550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:16,054-Speed 5943.25 samples/sec Loss 6.8941 LearningRate 0.0926 Epoch: 11 Global Step: 117560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:22,931-Speed 5957.29 samples/sec Loss 6.9066 LearningRate 0.0926 Epoch: 11 Global Step: 117570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:29,792-Speed 5971.36 samples/sec Loss 6.8851 LearningRate 0.0926 Epoch: 11 Global Step: 117580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:36,668-Speed 5957.64 samples/sec Loss 6.8623 LearningRate 0.0926 Epoch: 11 Global Step: 117590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:43,517-Speed 5981.42 samples/sec Loss 6.9282 LearningRate 0.0926 Epoch: 11 Global Step: 117600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:50,366-Speed 5981.76 samples/sec Loss 6.8843 LearningRate 0.0925 Epoch: 11 Global Step: 117610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:19:57,218-Speed 5979.22 samples/sec Loss 6.9207 LearningRate 0.0925 Epoch: 11 Global Step: 117620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:20:04,094-Speed 5958.82 samples/sec Loss 6.8597 LearningRate 0.0925 Epoch: 11 Global Step: 117630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:20:10,951-Speed 5974.48 samples/sec Loss 6.8769 LearningRate 0.0925 Epoch: 11 Global Step: 117640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:20:17,818-Speed 5965.61 
samples/sec Loss 6.8712 LearningRate 0.0925 Epoch: 11 Global Step: 117650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:20:24,688-Speed 5963.71 samples/sec Loss 6.8638 LearningRate 0.0924 Epoch: 11 Global Step: 117660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:20:31,572-Speed 5951.18 samples/sec Loss 6.9173 LearningRate 0.0924 Epoch: 11 Global Step: 117670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:20:38,448-Speed 5958.99 samples/sec Loss 6.8770 LearningRate 0.0924 Epoch: 11 Global Step: 117680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:20:45,302-Speed 5976.85 samples/sec Loss 6.8748 LearningRate 0.0924 Epoch: 11 Global Step: 117690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:20:52,145-Speed 5987.06 samples/sec Loss 6.9599 LearningRate 0.0923 Epoch: 11 Global Step: 117700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:20:58,988-Speed 5986.69 samples/sec Loss 6.8955 LearningRate 0.0923 Epoch: 11 Global Step: 117710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:21:05,848-Speed 5971.75 samples/sec Loss 6.8943 LearningRate 0.0923 Epoch: 11 Global Step: 117720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:21:12,724-Speed 5958.56 samples/sec Loss 6.9175 LearningRate 0.0923 Epoch: 11 Global Step: 117730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:21:19,573-Speed 5981.04 samples/sec Loss 6.9023 LearningRate 0.0923 Epoch: 11 Global Step: 117740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:21:26,452-Speed 5957.06 samples/sec Loss 6.9289 LearningRate 0.0922 Epoch: 11 Global Step: 117750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:21:33,353-Speed 5936.14 samples/sec Loss 6.8098 LearningRate 0.0922 Epoch: 11 Global Step: 117760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:21:40,225-Speed 5963.38 samples/sec Loss 6.8401 
LearningRate 0.0922 Epoch: 11 Global Step: 117770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:21:47,074-Speed 5981.61 samples/sec Loss 6.8903 LearningRate 0.0922 Epoch: 11 Global Step: 117780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:21:53,952-Speed 5956.52 samples/sec Loss 6.8820 LearningRate 0.0922 Epoch: 11 Global Step: 117790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:00,841-Speed 5946.80 samples/sec Loss 6.9050 LearningRate 0.0921 Epoch: 11 Global Step: 117800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:07,695-Speed 5977.30 samples/sec Loss 6.8869 LearningRate 0.0921 Epoch: 11 Global Step: 117810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:14,550-Speed 5976.66 samples/sec Loss 6.8874 LearningRate 0.0921 Epoch: 11 Global Step: 117820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:21,430-Speed 5954.11 samples/sec Loss 6.8188 LearningRate 0.0921 Epoch: 11 Global Step: 117830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:28,330-Speed 5937.98 samples/sec Loss 6.9099 LearningRate 0.0921 Epoch: 11 Global Step: 117840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:35,207-Speed 5957.27 samples/sec Loss 6.8915 LearningRate 0.0920 Epoch: 11 Global Step: 117850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:42,092-Speed 5950.69 samples/sec Loss 6.7850 LearningRate 0.0920 Epoch: 11 Global Step: 117860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:48,960-Speed 5964.79 samples/sec Loss 6.8655 LearningRate 0.0920 Epoch: 11 Global Step: 117870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:22:55,921-Speed 5885.63 samples/sec Loss 6.8568 LearningRate 0.0920 Epoch: 11 Global Step: 117880 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-08 19:23:02,768-Speed 5982.68 samples/sec Loss 6.7979 LearningRate 0.0920 
Epoch: 11 Global Step: 117890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:23:09,599-Speed 5997.95 samples/sec Loss 6.8596 LearningRate 0.0919 Epoch: 11 Global Step: 117900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:23:16,456-Speed 5978.16 samples/sec Loss 6.8994 LearningRate 0.0919 Epoch: 11 Global Step: 117910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:23:23,316-Speed 5971.98 samples/sec Loss 6.8542 LearningRate 0.0919 Epoch: 11 Global Step: 117920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:23:30,210-Speed 5942.92 samples/sec Loss 6.8358 LearningRate 0.0919 Epoch: 11 Global Step: 117930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:23:37,086-Speed 5960.32 samples/sec Loss 6.8487 LearningRate 0.0919 Epoch: 11 Global Step: 117940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:23:43,959-Speed 5962.41 samples/sec Loss 6.8911 LearningRate 0.0918 Epoch: 11 Global Step: 117950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:23:50,823-Speed 5968.48 samples/sec Loss 6.8415 LearningRate 0.0918 Epoch: 11 Global Step: 117960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:23:57,678-Speed 5975.67 samples/sec Loss 6.9289 LearningRate 0.0918 Epoch: 11 Global Step: 117970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:04,536-Speed 5973.75 samples/sec Loss 6.8607 LearningRate 0.0918 Epoch: 11 Global Step: 117980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:11,425-Speed 5947.55 samples/sec Loss 6.9309 LearningRate 0.0918 Epoch: 11 Global Step: 117990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:18,269-Speed 5986.32 samples/sec Loss 6.8926 LearningRate 0.0917 Epoch: 11 Global Step: 118000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:25,119-Speed 5980.64 samples/sec Loss 6.8070 LearningRate 0.0917 Epoch: 11 Global Step: 118010 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:31,975-Speed 5975.54 samples/sec Loss 6.8459 LearningRate 0.0917 Epoch: 11 Global Step: 118020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:38,862-Speed 5948.08 samples/sec Loss 6.9099 LearningRate 0.0917 Epoch: 11 Global Step: 118030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:45,713-Speed 5980.39 samples/sec Loss 6.8954 LearningRate 0.0917 Epoch: 11 Global Step: 118040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:52,620-Speed 5931.99 samples/sec Loss 6.8760 LearningRate 0.0916 Epoch: 11 Global Step: 118050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:24:59,476-Speed 5975.77 samples/sec Loss 6.8747 LearningRate 0.0916 Epoch: 11 Global Step: 118060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:25:06,377-Speed 5936.87 samples/sec Loss 6.8335 LearningRate 0.0916 Epoch: 11 Global Step: 118070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:25:14,831-Speed 4845.37 samples/sec Loss 6.8138 LearningRate 0.0916 Epoch: 11 Global Step: 118080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:25:21,691-Speed 5972.35 samples/sec Loss 6.8717 LearningRate 0.0915 Epoch: 11 Global Step: 118090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 19:25:28,547-Speed 5975.09 samples/sec Loss 6.7953 LearningRate 0.0915 Epoch: 11 Global Step: 118100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:25:35,399-Speed 5979.07 samples/sec Loss 6.8331 LearningRate 0.0915 Epoch: 11 Global Step: 118110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:25:42,249-Speed 5980.73 samples/sec Loss 6.8718 LearningRate 0.0915 Epoch: 11 Global Step: 118120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 19:25:49,108-Speed 5974.86 samples/sec Loss 6.8764 LearningRate 0.0915 Epoch: 11 Global Step: 118130 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-01-08 19:25:55,977-Speed 5964.03 samples/sec Loss 6.8265 LearningRate 0.0914 Epoch: 11 Global Step: 118140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:26:02,848-Speed 5961.89 samples/sec Loss 6.8620 LearningRate 0.0914 Epoch: 11 Global Step: 118150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:26:09,704-Speed 5975.33 samples/sec Loss 6.8475 LearningRate 0.0914 Epoch: 11 Global Step: 118160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:26:16,569-Speed 5968.83 samples/sec Loss 6.8840 LearningRate 0.0914 Epoch: 11 Global Step: 118170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:26:23,447-Speed 5956.42 samples/sec Loss 6.8438 LearningRate 0.0914 Epoch: 11 Global Step: 118180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:26:30,331-Speed 5951.41 samples/sec Loss 6.8557 LearningRate 0.0913 Epoch: 11 Global Step: 118190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:26:37,217-Speed 5952.03 samples/sec Loss 6.8839 LearningRate 0.0913 Epoch: 11 Global Step: 118200 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:26:44,073-Speed 5975.53 samples/sec Loss 6.8206 LearningRate 0.0913 Epoch: 11 Global Step: 118210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:26:50,925-Speed 5980.00 samples/sec Loss 6.8161 LearningRate 0.0913 Epoch: 11 Global Step: 118220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:26:57,784-Speed 5972.93 samples/sec Loss 6.8289 LearningRate 0.0913 Epoch: 11 Global Step: 118230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:27:04,641-Speed 5975.14 samples/sec Loss 6.9115 LearningRate 0.0912 Epoch: 11 Global Step: 118240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:27:11,521-Speed 5954.85 samples/sec Loss 6.8436 LearningRate 0.0912 Epoch: 11 Global Step: 118250 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-01-08 19:27:18,366-Speed 5985.31 samples/sec Loss 6.8798 LearningRate 0.0912 Epoch: 11 Global Step: 118260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:27:25,223-Speed 5974.09 samples/sec Loss 6.7761 LearningRate 0.0912 Epoch: 11 Global Step: 118270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:27:32,082-Speed 5972.65 samples/sec Loss 6.8222 LearningRate 0.0912 Epoch: 11 Global Step: 118280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:27:38,951-Speed 5964.31 samples/sec Loss 6.8557 LearningRate 0.0911 Epoch: 11 Global Step: 118290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:27:45,795-Speed 5985.92 samples/sec Loss 6.8318 LearningRate 0.0911 Epoch: 11 Global Step: 118300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:27:52,640-Speed 5985.00 samples/sec Loss 6.8161 LearningRate 0.0911 Epoch: 11 Global Step: 118310 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:27:59,524-Speed 5953.86 samples/sec Loss 6.8452 LearningRate 0.0911 Epoch: 11 Global Step: 118320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:28:06,367-Speed 5986.33 samples/sec Loss 6.8300 LearningRate 0.0911 Epoch: 11 Global Step: 118330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:28:13,248-Speed 5953.67 samples/sec Loss 6.8802 LearningRate 0.0910 Epoch: 11 Global Step: 118340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:28:20,117-Speed 5964.78 samples/sec Loss 6.8236 LearningRate 0.0910 Epoch: 11 Global Step: 118350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:28:26,958-Speed 5988.50 samples/sec Loss 6.8886 LearningRate 0.0910 Epoch: 11 Global Step: 118360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:28:33,824-Speed 5966.19 samples/sec Loss 6.9133 LearningRate 0.0910 Epoch: 11 Global Step: 118370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-01-08 19:28:40,689-Speed 5967.55 samples/sec Loss 6.8336 LearningRate 0.0910 Epoch: 11 Global Step: 118380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:28:47,542-Speed 5978.84 samples/sec Loss 6.8512 LearningRate 0.0909 Epoch: 11 Global Step: 118390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:28:54,452-Speed 5928.52 samples/sec Loss 6.8329 LearningRate 0.0909 Epoch: 11 Global Step: 118400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:01,313-Speed 5970.90 samples/sec Loss 6.8623 LearningRate 0.0909 Epoch: 11 Global Step: 118410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:08,163-Speed 5981.10 samples/sec Loss 6.7588 LearningRate 0.0909 Epoch: 11 Global Step: 118420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:15,055-Speed 5944.04 samples/sec Loss 6.8120 LearningRate 0.0909 Epoch: 11 Global Step: 118430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:21,907-Speed 5978.58 samples/sec Loss 6.8190 LearningRate 0.0908 Epoch: 11 Global Step: 118440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:29,534-Speed 5371.23 samples/sec Loss 6.8012 LearningRate 0.0908 Epoch: 11 Global Step: 118450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:36,377-Speed 5987.52 samples/sec Loss 6.8443 LearningRate 0.0908 Epoch: 11 Global Step: 118460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:43,355-Speed 5870.48 samples/sec Loss 6.7931 LearningRate 0.0908 Epoch: 11 Global Step: 118470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:50,211-Speed 5975.61 samples/sec Loss 6.8788 LearningRate 0.0907 Epoch: 11 Global Step: 118480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:29:57,062-Speed 5982.25 samples/sec Loss 6.7687 LearningRate 0.0907 Epoch: 11 Global Step: 118490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 
19:30:03,936-Speed 5959.94 samples/sec Loss 6.8472 LearningRate 0.0907 Epoch: 11 Global Step: 118500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:30:10,788-Speed 5978.26 samples/sec Loss 6.8561 LearningRate 0.0907 Epoch: 11 Global Step: 118510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:30:17,659-Speed 5962.48 samples/sec Loss 6.8526 LearningRate 0.0907 Epoch: 11 Global Step: 118520 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:30:24,536-Speed 5958.03 samples/sec Loss 6.7936 LearningRate 0.0906 Epoch: 11 Global Step: 118530 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:30:31,389-Speed 5978.08 samples/sec Loss 6.8426 LearningRate 0.0906 Epoch: 11 Global Step: 118540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:30:38,250-Speed 5971.00 samples/sec Loss 6.8068 LearningRate 0.0906 Epoch: 11 Global Step: 118550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:30:45,117-Speed 5966.07 samples/sec Loss 6.8824 LearningRate 0.0906 Epoch: 11 Global Step: 118560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:30:51,972-Speed 5976.31 samples/sec Loss 6.7889 LearningRate 0.0906 Epoch: 11 Global Step: 118570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:30:58,921-Speed 5895.04 samples/sec Loss 6.7941 LearningRate 0.0905 Epoch: 11 Global Step: 118580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:31:05,769-Speed 5982.93 samples/sec Loss 6.8029 LearningRate 0.0905 Epoch: 11 Global Step: 118590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:31:12,620-Speed 5979.83 samples/sec Loss 6.8218 LearningRate 0.0905 Epoch: 11 Global Step: 118600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:31:19,480-Speed 5971.59 samples/sec Loss 6.8708 LearningRate 0.0905 Epoch: 11 Global Step: 118610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:31:26,370-Speed 
5946.76 samples/sec Loss 6.8354 LearningRate 0.0905 Epoch: 11 Global Step: 118620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:31:33,255-Speed 5949.70 samples/sec Loss 6.8199 LearningRate 0.0904 Epoch: 11 Global Step: 118630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:31:40,104-Speed 5981.51 samples/sec Loss 6.8518 LearningRate 0.0904 Epoch: 11 Global Step: 118640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:31:46,957-Speed 5979.62 samples/sec Loss 6.7927 LearningRate 0.0904 Epoch: 11 Global Step: 118650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:31:53,807-Speed 5980.50 samples/sec Loss 6.8136 LearningRate 0.0904 Epoch: 11 Global Step: 118660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:32:00,666-Speed 5973.07 samples/sec Loss 6.7857 LearningRate 0.0904 Epoch: 11 Global Step: 118670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:32:07,517-Speed 5980.02 samples/sec Loss 6.8322 LearningRate 0.0903 Epoch: 11 Global Step: 118680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:32:14,370-Speed 5978.67 samples/sec Loss 6.8179 LearningRate 0.0903 Epoch: 11 Global Step: 118690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:32:21,240-Speed 5963.20 samples/sec Loss 6.8386 LearningRate 0.0903 Epoch: 11 Global Step: 118700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:32:28,100-Speed 5972.75 samples/sec Loss 6.8314 LearningRate 0.0903 Epoch: 11 Global Step: 118710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:32:34,960-Speed 5970.86 samples/sec Loss 6.7755 LearningRate 0.0903 Epoch: 11 Global Step: 118720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:32:41,840-Speed 5954.70 samples/sec Loss 6.7722 LearningRate 0.0902 Epoch: 11 Global Step: 118730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:32:48,709-Speed 5964.83 samples/sec Loss 
6.8346 LearningRate 0.0902 Epoch: 11 Global Step: 118740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:32:55,574-Speed 5966.46 samples/sec Loss 6.8269 LearningRate 0.0902 Epoch: 11 Global Step: 118750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:33:02,425-Speed 5980.08 samples/sec Loss 6.8184 LearningRate 0.0902 Epoch: 11 Global Step: 118760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:33:09,275-Speed 5980.63 samples/sec Loss 6.8038 LearningRate 0.0902 Epoch: 11 Global Step: 118770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:33:16,127-Speed 5979.27 samples/sec Loss 6.8020 LearningRate 0.0901 Epoch: 11 Global Step: 118780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:33:22,985-Speed 5973.80 samples/sec Loss 6.7996 LearningRate 0.0901 Epoch: 11 Global Step: 118790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:33:29,833-Speed 5982.01 samples/sec Loss 6.8054 LearningRate 0.0901 Epoch: 11 Global Step: 118800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:33:36,678-Speed 5985.06 samples/sec Loss 6.8275 LearningRate 0.0901 Epoch: 11 Global Step: 118810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:33:43,569-Speed 5945.12 samples/sec Loss 6.7502 LearningRate 0.0901 Epoch: 11 Global Step: 118820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:33:50,461-Speed 5944.34 samples/sec Loss 6.7717 LearningRate 0.0900 Epoch: 11 Global Step: 118830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:33:57,332-Speed 5962.18 samples/sec Loss 6.7989 LearningRate 0.0900 Epoch: 11 Global Step: 118840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:34:04,210-Speed 5955.87 samples/sec Loss 6.7961 LearningRate 0.0900 Epoch: 11 Global Step: 118850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:34:11,050-Speed 5990.11 samples/sec Loss 6.8424 LearningRate 0.0900 
Epoch: 11 Global Step: 118860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:34:17,913-Speed 5968.45 samples/sec Loss 6.8347 LearningRate 0.0900 Epoch: 11 Global Step: 118870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:34:24,773-Speed 5973.04 samples/sec Loss 6.8247 LearningRate 0.0899 Epoch: 11 Global Step: 118880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:34:31,639-Speed 5966.74 samples/sec Loss 6.7623 LearningRate 0.0899 Epoch: 11 Global Step: 118890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:34:38,487-Speed 5981.47 samples/sec Loss 6.8490 LearningRate 0.0899 Epoch: 11 Global Step: 118900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:34:45,360-Speed 5961.19 samples/sec Loss 6.7916 LearningRate 0.0899 Epoch: 11 Global Step: 118910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:34:52,224-Speed 5968.55 samples/sec Loss 6.7693 LearningRate 0.0899 Epoch: 11 Global Step: 118920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:34:59,083-Speed 5972.73 samples/sec Loss 6.7931 LearningRate 0.0898 Epoch: 11 Global Step: 118930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:35:05,931-Speed 5982.08 samples/sec Loss 6.8093 LearningRate 0.0898 Epoch: 11 Global Step: 118940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:35:12,785-Speed 5978.03 samples/sec Loss 6.7594 LearningRate 0.0898 Epoch: 11 Global Step: 118950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:35:19,649-Speed 5968.49 samples/sec Loss 6.8600 LearningRate 0.0898 Epoch: 11 Global Step: 118960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:35:26,498-Speed 5981.55 samples/sec Loss 6.7549 LearningRate 0.0898 Epoch: 11 Global Step: 118970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:35:33,351-Speed 5977.72 samples/sec Loss 6.8243 LearningRate 0.0897 Epoch: 11 Global Step: 
118980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:35:40,209-Speed 5973.82 samples/sec Loss 6.8318 LearningRate 0.0897 Epoch: 11 Global Step: 118990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:35:47,061-Speed 5978.66 samples/sec Loss 6.7924 LearningRate 0.0897 Epoch: 11 Global Step: 119000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:35:53,923-Speed 5970.42 samples/sec Loss 6.7921 LearningRate 0.0897 Epoch: 11 Global Step: 119010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:36:00,774-Speed 5980.00 samples/sec Loss 6.7972 LearningRate 0.0897 Epoch: 11 Global Step: 119020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:36:07,606-Speed 5996.51 samples/sec Loss 6.7742 LearningRate 0.0896 Epoch: 11 Global Step: 119030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:36:14,468-Speed 5969.74 samples/sec Loss 6.8389 LearningRate 0.0896 Epoch: 11 Global Step: 119040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:36:21,308-Speed 5989.16 samples/sec Loss 6.8173 LearningRate 0.0896 Epoch: 11 Global Step: 119050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:36:28,217-Speed 5929.39 samples/sec Loss 6.8352 LearningRate 0.0896 Epoch: 11 Global Step: 119060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:36:35,066-Speed 5981.68 samples/sec Loss 6.7448 LearningRate 0.0895 Epoch: 11 Global Step: 119070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:36:41,949-Speed 5952.14 samples/sec Loss 6.7872 LearningRate 0.0895 Epoch: 11 Global Step: 119080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:36:48,824-Speed 5959.30 samples/sec Loss 6.7858 LearningRate 0.0895 Epoch: 11 Global Step: 119090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:36:55,686-Speed 5970.16 samples/sec Loss 6.7952 LearningRate 0.0895 Epoch: 11 Global Step: 119100 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-01-08 19:37:02,538-Speed 5977.98 samples/sec Loss 6.8155 LearningRate 0.0895 Epoch: 11 Global Step: 119110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:37:09,404-Speed 5967.11 samples/sec Loss 6.7943 LearningRate 0.0894 Epoch: 11 Global Step: 119120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:37:16,257-Speed 5981.00 samples/sec Loss 6.7921 LearningRate 0.0894 Epoch: 11 Global Step: 119130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:37:23,104-Speed 5982.40 samples/sec Loss 6.7771 LearningRate 0.0894 Epoch: 11 Global Step: 119140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:37:29,947-Speed 5986.94 samples/sec Loss 6.8110 LearningRate 0.0894 Epoch: 11 Global Step: 119150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:37:36,826-Speed 5955.89 samples/sec Loss 6.8067 LearningRate 0.0894 Epoch: 11 Global Step: 119160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:37:43,682-Speed 5974.88 samples/sec Loss 6.7567 LearningRate 0.0893 Epoch: 11 Global Step: 119170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:37:50,532-Speed 5981.13 samples/sec Loss 6.8185 LearningRate 0.0893 Epoch: 11 Global Step: 119180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:37:57,421-Speed 5946.50 samples/sec Loss 6.7343 LearningRate 0.0893 Epoch: 11 Global Step: 119190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:38:04,273-Speed 5979.27 samples/sec Loss 6.7698 LearningRate 0.0893 Epoch: 11 Global Step: 119200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:38:11,129-Speed 5974.51 samples/sec Loss 6.7911 LearningRate 0.0893 Epoch: 11 Global Step: 119210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:38:17,984-Speed 5976.80 samples/sec Loss 6.8090 LearningRate 0.0892 Epoch: 11 Global Step: 119220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-01-08 19:38:24,831-Speed 5982.19 samples/sec Loss 6.7853 LearningRate 0.0892 Epoch: 11 Global Step: 119230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:38:31,691-Speed 5972.87 samples/sec Loss 6.7614 LearningRate 0.0892 Epoch: 11 Global Step: 119240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:38:38,543-Speed 5981.15 samples/sec Loss 6.7356 LearningRate 0.0892 Epoch: 11 Global Step: 119250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:38:45,393-Speed 5980.39 samples/sec Loss 6.7145 LearningRate 0.0892 Epoch: 11 Global Step: 119260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:38:52,250-Speed 5974.79 samples/sec Loss 6.7888 LearningRate 0.0891 Epoch: 11 Global Step: 119270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:38:59,118-Speed 5967.92 samples/sec Loss 6.7726 LearningRate 0.0891 Epoch: 11 Global Step: 119280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:39:05,980-Speed 5970.24 samples/sec Loss 6.8206 LearningRate 0.0891 Epoch: 11 Global Step: 119290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:39:12,892-Speed 5926.86 samples/sec Loss 6.7445 LearningRate 0.0891 Epoch: 11 Global Step: 119300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:39:19,767-Speed 5959.31 samples/sec Loss 6.7731 LearningRate 0.0891 Epoch: 11 Global Step: 119310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:39:26,633-Speed 5966.80 samples/sec Loss 6.7948 LearningRate 0.0890 Epoch: 11 Global Step: 119320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:39:33,499-Speed 5966.23 samples/sec Loss 6.6995 LearningRate 0.0890 Epoch: 11 Global Step: 119330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:39:40,346-Speed 5983.46 samples/sec Loss 6.7364 LearningRate 0.0890 Epoch: 11 Global Step: 119340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 
19:39:47,335-Speed 5860.97 samples/sec Loss 6.8238 LearningRate 0.0890 Epoch: 11 Global Step: 119350 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:39:54,205-Speed 5966.00 samples/sec Loss 6.7273 LearningRate 0.0890 Epoch: 11 Global Step: 119360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:40:01,059-Speed 5977.57 samples/sec Loss 6.7391 LearningRate 0.0889 Epoch: 11 Global Step: 119370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:40:07,924-Speed 5967.44 samples/sec Loss 6.7198 LearningRate 0.0889 Epoch: 11 Global Step: 119380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:40:14,805-Speed 5953.74 samples/sec Loss 6.7176 LearningRate 0.0889 Epoch: 11 Global Step: 119390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:40:21,715-Speed 5928.93 samples/sec Loss 6.7434 LearningRate 0.0889 Epoch: 11 Global Step: 119400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:40:28,607-Speed 5943.65 samples/sec Loss 6.7807 LearningRate 0.0889 Epoch: 11 Global Step: 119410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:40:35,476-Speed 5964.24 samples/sec Loss 6.7743 LearningRate 0.0888 Epoch: 11 Global Step: 119420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:40:42,336-Speed 5971.79 samples/sec Loss 6.8154 LearningRate 0.0888 Epoch: 11 Global Step: 119430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:40:49,174-Speed 5991.23 samples/sec Loss 6.7953 LearningRate 0.0888 Epoch: 11 Global Step: 119440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:40:56,022-Speed 5982.56 samples/sec Loss 6.7605 LearningRate 0.0888 Epoch: 11 Global Step: 119450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:02,880-Speed 5974.36 samples/sec Loss 6.7950 LearningRate 0.0888 Epoch: 11 Global Step: 119460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:09,752-Speed 
5961.01 samples/sec Loss 6.8768 LearningRate 0.0887 Epoch: 11 Global Step: 119470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:16,601-Speed 5982.01 samples/sec Loss 6.7423 LearningRate 0.0887 Epoch: 11 Global Step: 119480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:23,449-Speed 5982.62 samples/sec Loss 6.7332 LearningRate 0.0887 Epoch: 11 Global Step: 119490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:30,310-Speed 5971.50 samples/sec Loss 6.7360 LearningRate 0.0887 Epoch: 11 Global Step: 119500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:37,194-Speed 5950.55 samples/sec Loss 6.8296 LearningRate 0.0887 Epoch: 11 Global Step: 119510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:44,047-Speed 5977.94 samples/sec Loss 6.7292 LearningRate 0.0886 Epoch: 11 Global Step: 119520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:50,904-Speed 5974.96 samples/sec Loss 6.7687 LearningRate 0.0886 Epoch: 11 Global Step: 119530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:41:57,752-Speed 5982.09 samples/sec Loss 6.7674 LearningRate 0.0886 Epoch: 11 Global Step: 119540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:04,603-Speed 5980.10 samples/sec Loss 6.8350 LearningRate 0.0886 Epoch: 11 Global Step: 119550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:11,445-Speed 5988.07 samples/sec Loss 6.7905 LearningRate 0.0886 Epoch: 11 Global Step: 119560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:18,302-Speed 5974.33 samples/sec Loss 6.7692 LearningRate 0.0885 Epoch: 11 Global Step: 119570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:25,156-Speed 5977.39 samples/sec Loss 6.7826 LearningRate 0.0885 Epoch: 11 Global Step: 119580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:32,039-Speed 5951.99 samples/sec Loss 
6.7119 LearningRate 0.0885 Epoch: 11 Global Step: 119590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:38,893-Speed 5977.42 samples/sec Loss 6.7495 LearningRate 0.0885 Epoch: 11 Global Step: 119600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:45,758-Speed 5967.78 samples/sec Loss 6.7091 LearningRate 0.0885 Epoch: 11 Global Step: 119610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:52,614-Speed 5974.19 samples/sec Loss 6.7682 LearningRate 0.0884 Epoch: 11 Global Step: 119620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:42:59,464-Speed 5981.12 samples/sec Loss 6.7940 LearningRate 0.0884 Epoch: 11 Global Step: 119630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:43:06,321-Speed 5974.41 samples/sec Loss 6.7905 LearningRate 0.0884 Epoch: 11 Global Step: 119640 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:43:13,184-Speed 5969.45 samples/sec Loss 6.8193 LearningRate 0.0884 Epoch: 11 Global Step: 119650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:43:20,064-Speed 5954.32 samples/sec Loss 6.7003 LearningRate 0.0884 Epoch: 11 Global Step: 119660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:43:26,925-Speed 5971.88 samples/sec Loss 6.7669 LearningRate 0.0883 Epoch: 11 Global Step: 119670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:43:33,779-Speed 5976.41 samples/sec Loss 6.7394 LearningRate 0.0883 Epoch: 11 Global Step: 119680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:43:40,623-Speed 5986.60 samples/sec Loss 6.7955 LearningRate 0.0883 Epoch: 11 Global Step: 119690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:43:47,469-Speed 5984.34 samples/sec Loss 6.7480 LearningRate 0.0883 Epoch: 11 Global Step: 119700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:43:54,339-Speed 5963.20 samples/sec Loss 6.7362 LearningRate 
0.0883 Epoch: 11 Global Step: 119710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:44:01,179-Speed 5989.70 samples/sec Loss 6.7463 LearningRate 0.0882 Epoch: 11 Global Step: 119720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:44:08,043-Speed 5970.54 samples/sec Loss 6.7627 LearningRate 0.0882 Epoch: 11 Global Step: 119730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:44:14,918-Speed 5959.25 samples/sec Loss 6.7409 LearningRate 0.0882 Epoch: 11 Global Step: 119740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:44:21,780-Speed 5970.77 samples/sec Loss 6.7754 LearningRate 0.0882 Epoch: 11 Global Step: 119750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:44:28,627-Speed 5982.91 samples/sec Loss 6.7400 LearningRate 0.0882 Epoch: 11 Global Step: 119760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:44:35,501-Speed 5959.84 samples/sec Loss 6.7690 LearningRate 0.0881 Epoch: 11 Global Step: 119770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:44:42,352-Speed 5979.83 samples/sec Loss 6.7232 LearningRate 0.0881 Epoch: 11 Global Step: 119780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:44:49,204-Speed 5978.77 samples/sec Loss 6.7653 LearningRate 0.0881 Epoch: 11 Global Step: 119790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:44:56,074-Speed 5963.43 samples/sec Loss 6.7184 LearningRate 0.0881 Epoch: 11 Global Step: 119800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:45:02,930-Speed 5975.30 samples/sec Loss 6.7541 LearningRate 0.0881 Epoch: 11 Global Step: 119810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:45:09,778-Speed 5981.98 samples/sec Loss 6.7281 LearningRate 0.0880 Epoch: 11 Global Step: 119820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:45:16,622-Speed 5986.22 samples/sec Loss 6.7707 LearningRate 0.0880 Epoch: 11 Global Step: 
119830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:45:23,486-Speed 5968.45 samples/sec Loss 6.7413 LearningRate 0.0880 Epoch: 11 Global Step: 119840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:45:30,442-Speed 5889.60 samples/sec Loss 6.7133 LearningRate 0.0880 Epoch: 11 Global Step: 119850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:45:37,302-Speed 5972.15 samples/sec Loss 6.7301 LearningRate 0.0880 Epoch: 11 Global Step: 119860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:45:44,151-Speed 5980.96 samples/sec Loss 6.6355 LearningRate 0.0879 Epoch: 11 Global Step: 119870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:45:51,008-Speed 5975.29 samples/sec Loss 6.7661 LearningRate 0.0879 Epoch: 11 Global Step: 119880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:45:57,848-Speed 5989.06 samples/sec Loss 6.7835 LearningRate 0.0879 Epoch: 11 Global Step: 119890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:46:04,713-Speed 5967.89 samples/sec Loss 6.6921 LearningRate 0.0879 Epoch: 11 Global Step: 119900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:46:11,595-Speed 5953.34 samples/sec Loss 6.6919 LearningRate 0.0879 Epoch: 11 Global Step: 119910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:46:18,448-Speed 5977.24 samples/sec Loss 6.6977 LearningRate 0.0878 Epoch: 11 Global Step: 119920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:46:25,294-Speed 5984.49 samples/sec Loss 6.6913 LearningRate 0.0878 Epoch: 11 Global Step: 119930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:46:32,146-Speed 5979.57 samples/sec Loss 6.7447 LearningRate 0.0878 Epoch: 11 Global Step: 119940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:46:38,998-Speed 5978.98 samples/sec Loss 6.7327 LearningRate 0.0878 Epoch: 11 Global Step: 119950 Fp16 Grad Scale: 
131072 Required: 17 hours Training: 2022-01-08 19:46:45,869-Speed 5961.95 samples/sec Loss 6.7293 LearningRate 0.0878 Epoch: 11 Global Step: 119960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:46:52,855-Speed 5864.60 samples/sec Loss 6.6819 LearningRate 0.0877 Epoch: 11 Global Step: 119970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:46:59,898-Speed 5816.83 samples/sec Loss 6.6859 LearningRate 0.0877 Epoch: 11 Global Step: 119980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:47:06,745-Speed 5982.95 samples/sec Loss 6.7233 LearningRate 0.0877 Epoch: 11 Global Step: 119990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:47:13,602-Speed 5974.62 samples/sec Loss 6.7342 LearningRate 0.0877 Epoch: 11 Global Step: 120000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:47:40,270-[lfw][120000]XNorm: 23.052176 Training: 2022-01-08 19:47:40,270-[lfw][120000]Accuracy-Flip: 0.99633+-0.00314 Training: 2022-01-08 19:47:40,271-[lfw][120000]Accuracy-Highest: 0.99783 Training: 2022-01-08 19:48:11,130-[cfp_fp][120000]XNorm: 20.136159 Training: 2022-01-08 19:48:11,131-[cfp_fp][120000]Accuracy-Flip: 0.98386+-0.00409 Training: 2022-01-08 19:48:11,132-[cfp_fp][120000]Accuracy-Highest: 0.98557 Training: 2022-01-08 19:48:37,853-[agedb_30][120000]XNorm: 22.390298 Training: 2022-01-08 19:48:37,854-[agedb_30][120000]Accuracy-Flip: 0.97383+-0.00606 Training: 2022-01-08 19:48:37,855-[agedb_30][120000]Accuracy-Highest: 0.97383 Training: 2022-01-08 19:48:44,718-Speed 449.54 samples/sec Loss 6.7477 LearningRate 0.0877 Epoch: 11 Global Step: 120010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:48:51,559-Speed 5988.36 samples/sec Loss 6.7198 LearningRate 0.0876 Epoch: 11 Global Step: 120020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:48:58,394-Speed 5993.71 samples/sec Loss 6.6936 LearningRate 0.0876 Epoch: 11 Global Step: 120030 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-01-08 19:49:05,252-Speed 5973.75 samples/sec Loss 6.7214 LearningRate 0.0876 Epoch: 11 Global Step: 120040 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:49:12,087-Speed 5993.04 samples/sec Loss 6.7723 LearningRate 0.0876 Epoch: 11 Global Step: 120050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:49:18,972-Speed 5951.07 samples/sec Loss 6.7287 LearningRate 0.0876 Epoch: 11 Global Step: 120060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:49:25,838-Speed 5966.25 samples/sec Loss 6.6807 LearningRate 0.0875 Epoch: 11 Global Step: 120070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:49:32,692-Speed 5977.07 samples/sec Loss 6.7174 LearningRate 0.0875 Epoch: 11 Global Step: 120080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:49:39,556-Speed 5968.51 samples/sec Loss 6.7061 LearningRate 0.0875 Epoch: 11 Global Step: 120090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:49:46,439-Speed 5952.68 samples/sec Loss 6.7155 LearningRate 0.0875 Epoch: 11 Global Step: 120100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:49:53,300-Speed 5971.16 samples/sec Loss 6.7111 LearningRate 0.0875 Epoch: 11 Global Step: 120110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:50:00,140-Speed 5988.94 samples/sec Loss 6.7574 LearningRate 0.0874 Epoch: 11 Global Step: 120120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:50:07,009-Speed 5964.45 samples/sec Loss 6.7152 LearningRate 0.0874 Epoch: 11 Global Step: 120130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:50:13,847-Speed 5991.00 samples/sec Loss 6.7221 LearningRate 0.0874 Epoch: 11 Global Step: 120140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:50:20,690-Speed 5986.60 samples/sec Loss 6.7740 LearningRate 0.0874 Epoch: 11 Global Step: 120150 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-01-08 19:50:27,554-Speed 5969.17 samples/sec Loss 6.6609 LearningRate 0.0874 Epoch: 11 Global Step: 120160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:50:34,424-Speed 5962.85 samples/sec Loss 6.7256 LearningRate 0.0873 Epoch: 11 Global Step: 120170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:50:41,266-Speed 5987.99 samples/sec Loss 6.7437 LearningRate 0.0873 Epoch: 11 Global Step: 120180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:50:48,140-Speed 5959.82 samples/sec Loss 6.7021 LearningRate 0.0873 Epoch: 11 Global Step: 120190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:50:55,011-Speed 5962.37 samples/sec Loss 6.6540 LearningRate 0.0873 Epoch: 11 Global Step: 120200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:51:01,853-Speed 5987.52 samples/sec Loss 6.6627 LearningRate 0.0873 Epoch: 11 Global Step: 120210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:51:08,741-Speed 5947.62 samples/sec Loss 6.6633 LearningRate 0.0872 Epoch: 11 Global Step: 120220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:51:15,638-Speed 5940.45 samples/sec Loss 6.6813 LearningRate 0.0872 Epoch: 11 Global Step: 120230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:51:22,500-Speed 5972.22 samples/sec Loss 6.6668 LearningRate 0.0872 Epoch: 11 Global Step: 120240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:51:29,368-Speed 5966.86 samples/sec Loss 6.6556 LearningRate 0.0872 Epoch: 11 Global Step: 120250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:51:36,219-Speed 5979.39 samples/sec Loss 6.7666 LearningRate 0.0872 Epoch: 11 Global Step: 120260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:51:43,078-Speed 5973.45 samples/sec Loss 6.7470 LearningRate 0.0871 Epoch: 11 Global Step: 120270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 
19:51:49,975-Speed 5940.61 samples/sec Loss 6.6484 LearningRate 0.0871 Epoch: 11 Global Step: 120280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:51:56,823-Speed 5981.81 samples/sec Loss 6.6590 LearningRate 0.0871 Epoch: 11 Global Step: 120290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:52:03,678-Speed 5976.76 samples/sec Loss 6.7353 LearningRate 0.0871 Epoch: 11 Global Step: 120300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:52:10,577-Speed 5940.69 samples/sec Loss 6.7034 LearningRate 0.0871 Epoch: 11 Global Step: 120310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:52:17,458-Speed 5953.38 samples/sec Loss 6.6891 LearningRate 0.0870 Epoch: 11 Global Step: 120320 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:52:24,278-Speed 6007.37 samples/sec Loss 6.6950 LearningRate 0.0870 Epoch: 11 Global Step: 120330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:52:31,156-Speed 5956.26 samples/sec Loss 6.6587 LearningRate 0.0870 Epoch: 11 Global Step: 120340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:52:38,041-Speed 5950.36 samples/sec Loss 6.7242 LearningRate 0.0870 Epoch: 11 Global Step: 120350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:52:44,884-Speed 5986.61 samples/sec Loss 6.7417 LearningRate 0.0870 Epoch: 11 Global Step: 120360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:52:51,764-Speed 5955.11 samples/sec Loss 6.7091 LearningRate 0.0869 Epoch: 11 Global Step: 120370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:52:58,609-Speed 5985.47 samples/sec Loss 6.7518 LearningRate 0.0869 Epoch: 11 Global Step: 120380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:53:05,450-Speed 5988.19 samples/sec Loss 6.6885 LearningRate 0.0869 Epoch: 11 Global Step: 120390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:53:12,305-Speed 5975.71 
samples/sec Loss 6.6814 LearningRate 0.0869 Epoch: 11 Global Step: 120400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:53:19,147-Speed 5987.68 samples/sec Loss 6.6817 LearningRate 0.0869 Epoch: 11 Global Step: 120410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:53:26,107-Speed 5887.34 samples/sec Loss 6.7227 LearningRate 0.0868 Epoch: 11 Global Step: 120420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:53:32,957-Speed 5979.87 samples/sec Loss 6.7060 LearningRate 0.0868 Epoch: 11 Global Step: 120430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:53:39,808-Speed 5980.65 samples/sec Loss 6.7010 LearningRate 0.0868 Epoch: 11 Global Step: 120440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:53:46,654-Speed 5984.06 samples/sec Loss 6.7453 LearningRate 0.0868 Epoch: 11 Global Step: 120450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:53:53,586-Speed 5910.08 samples/sec Loss 6.7268 LearningRate 0.0868 Epoch: 11 Global Step: 120460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:54:00,491-Speed 5933.93 samples/sec Loss 6.7284 LearningRate 0.0867 Epoch: 11 Global Step: 120470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:54:07,351-Speed 5971.75 samples/sec Loss 6.7137 LearningRate 0.0867 Epoch: 11 Global Step: 120480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:54:14,212-Speed 5971.32 samples/sec Loss 6.7032 LearningRate 0.0867 Epoch: 11 Global Step: 120490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:54:21,069-Speed 5977.16 samples/sec Loss 6.7054 LearningRate 0.0867 Epoch: 11 Global Step: 120500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:54:27,932-Speed 5969.14 samples/sec Loss 6.7538 LearningRate 0.0867 Epoch: 11 Global Step: 120510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:54:34,786-Speed 5977.39 samples/sec Loss 
6.7310 LearningRate 0.0866 Epoch: 11 Global Step: 120520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:54:41,674-Speed 5947.57 samples/sec Loss 6.6833 LearningRate 0.0866 Epoch: 11 Global Step: 120530 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:54:48,613-Speed 5903.91 samples/sec Loss 6.6819 LearningRate 0.0866 Epoch: 11 Global Step: 120540 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:54:55,460-Speed 5983.12 samples/sec Loss 6.7270 LearningRate 0.0866 Epoch: 11 Global Step: 120550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:02,310-Speed 5980.85 samples/sec Loss 6.6558 LearningRate 0.0866 Epoch: 11 Global Step: 120560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:09,162-Speed 5978.74 samples/sec Loss 6.6935 LearningRate 0.0865 Epoch: 11 Global Step: 120570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:16,007-Speed 5985.80 samples/sec Loss 6.6785 LearningRate 0.0865 Epoch: 11 Global Step: 120580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:22,859-Speed 5978.16 samples/sec Loss 6.6520 LearningRate 0.0865 Epoch: 11 Global Step: 120590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:29,703-Speed 5987.35 samples/sec Loss 6.7126 LearningRate 0.0865 Epoch: 11 Global Step: 120600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:36,556-Speed 5977.84 samples/sec Loss 6.7114 LearningRate 0.0865 Epoch: 11 Global Step: 120610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:43,410-Speed 5979.34 samples/sec Loss 6.6960 LearningRate 0.0864 Epoch: 11 Global Step: 120620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:50,269-Speed 5973.11 samples/sec Loss 6.6778 LearningRate 0.0864 Epoch: 11 Global Step: 120630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:55:57,127-Speed 5973.66 samples/sec Loss 6.7045 LearningRate 
0.0864 Epoch: 11 Global Step: 120640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:56:03,984-Speed 5974.83 samples/sec Loss 6.6808 LearningRate 0.0864 Epoch: 11 Global Step: 120650 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:56:10,850-Speed 5966.91 samples/sec Loss 6.6400 LearningRate 0.0864 Epoch: 11 Global Step: 120660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:56:17,706-Speed 5978.05 samples/sec Loss 6.7391 LearningRate 0.0863 Epoch: 11 Global Step: 120670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:56:24,563-Speed 5974.70 samples/sec Loss 6.6838 LearningRate 0.0863 Epoch: 11 Global Step: 120680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:56:31,421-Speed 5974.05 samples/sec Loss 6.7476 LearningRate 0.0863 Epoch: 11 Global Step: 120690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:56:38,343-Speed 5918.24 samples/sec Loss 6.6389 LearningRate 0.0863 Epoch: 11 Global Step: 120700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:56:45,179-Speed 5993.18 samples/sec Loss 6.6740 LearningRate 0.0863 Epoch: 11 Global Step: 120710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:56:52,049-Speed 5962.64 samples/sec Loss 6.6885 LearningRate 0.0862 Epoch: 11 Global Step: 120720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:56:58,905-Speed 5975.86 samples/sec Loss 6.6894 LearningRate 0.0862 Epoch: 11 Global Step: 120730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:57:05,769-Speed 5975.58 samples/sec Loss 6.6483 LearningRate 0.0862 Epoch: 11 Global Step: 120740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:57:13,460-Speed 5971.58 samples/sec Loss 6.6532 LearningRate 0.0862 Epoch: 11 Global Step: 120750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:57:20,317-Speed 5973.99 samples/sec Loss 6.7062 LearningRate 0.0862 Epoch: 11 Global 
Step: 120760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:57:27,163-Speed 5984.31 samples/sec Loss 6.6750 LearningRate 0.0861 Epoch: 11 Global Step: 120770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:57:34,008-Speed 5985.27 samples/sec Loss 6.6803 LearningRate 0.0861 Epoch: 11 Global Step: 120780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:57:40,853-Speed 5985.94 samples/sec Loss 6.6596 LearningRate 0.0861 Epoch: 11 Global Step: 120790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:57:47,761-Speed 5930.24 samples/sec Loss 6.6816 LearningRate 0.0861 Epoch: 11 Global Step: 120800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:57:54,625-Speed 5968.24 samples/sec Loss 6.6662 LearningRate 0.0861 Epoch: 11 Global Step: 120810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:01,524-Speed 5938.61 samples/sec Loss 6.6759 LearningRate 0.0860 Epoch: 11 Global Step: 120820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:08,390-Speed 5966.87 samples/sec Loss 6.7045 LearningRate 0.0860 Epoch: 11 Global Step: 120830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:15,238-Speed 5982.05 samples/sec Loss 6.7171 LearningRate 0.0860 Epoch: 11 Global Step: 120840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:22,093-Speed 5976.30 samples/sec Loss 6.6551 LearningRate 0.0860 Epoch: 11 Global Step: 120850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:28,946-Speed 5977.94 samples/sec Loss 6.6687 LearningRate 0.0860 Epoch: 11 Global Step: 120860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:35,810-Speed 5968.45 samples/sec Loss 6.6657 LearningRate 0.0859 Epoch: 11 Global Step: 120870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:42,660-Speed 5981.39 samples/sec Loss 6.5967 LearningRate 0.0859 Epoch: 11 Global Step: 120880 Fp16 Grad 
Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:49,526-Speed 5966.20 samples/sec Loss 6.6016 LearningRate 0.0859 Epoch: 11 Global Step: 120890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:58:56,406-Speed 5954.54 samples/sec Loss 6.6980 LearningRate 0.0859 Epoch: 11 Global Step: 120900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:59:03,254-Speed 5982.35 samples/sec Loss 6.7825 LearningRate 0.0859 Epoch: 11 Global Step: 120910 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 19:59:10,102-Speed 5985.93 samples/sec Loss 6.6389 LearningRate 0.0858 Epoch: 11 Global Step: 120920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 19:59:16,946-Speed 5985.71 samples/sec Loss 6.6099 LearningRate 0.0858 Epoch: 11 Global Step: 120930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:59:23,815-Speed 5965.63 samples/sec Loss 6.7182 LearningRate 0.0858 Epoch: 11 Global Step: 120940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:59:30,674-Speed 5972.97 samples/sec Loss 6.6285 LearningRate 0.0858 Epoch: 11 Global Step: 120950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:59:37,522-Speed 5982.20 samples/sec Loss 6.5977 LearningRate 0.0858 Epoch: 11 Global Step: 120960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:59:44,375-Speed 5978.22 samples/sec Loss 6.6666 LearningRate 0.0857 Epoch: 11 Global Step: 120970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:59:51,235-Speed 5972.50 samples/sec Loss 6.6599 LearningRate 0.0857 Epoch: 11 Global Step: 120980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 19:59:58,111-Speed 5958.00 samples/sec Loss 6.6285 LearningRate 0.0857 Epoch: 11 Global Step: 120990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:00:04,968-Speed 5974.72 samples/sec Loss 6.6410 LearningRate 0.0857 Epoch: 11 Global Step: 121000 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-01-08 20:00:11,817-Speed 5981.09 samples/sec Loss 6.6132 LearningRate 0.0857 Epoch: 11 Global Step: 121010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:00:18,699-Speed 5956.08 samples/sec Loss 6.5942 LearningRate 0.0856 Epoch: 11 Global Step: 121020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:00:25,568-Speed 5964.24 samples/sec Loss 6.6770 LearningRate 0.0856 Epoch: 11 Global Step: 121030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:00:32,431-Speed 5968.99 samples/sec Loss 6.6402 LearningRate 0.0856 Epoch: 11 Global Step: 121040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:00:39,296-Speed 5968.57 samples/sec Loss 6.6708 LearningRate 0.0856 Epoch: 11 Global Step: 121050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:00:46,157-Speed 5971.66 samples/sec Loss 6.6455 LearningRate 0.0856 Epoch: 11 Global Step: 121060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:00:53,021-Speed 5967.76 samples/sec Loss 6.7324 LearningRate 0.0855 Epoch: 11 Global Step: 121070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:00:59,886-Speed 5968.12 samples/sec Loss 6.6840 LearningRate 0.0855 Epoch: 11 Global Step: 121080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:01:06,734-Speed 5981.95 samples/sec Loss 6.6920 LearningRate 0.0855 Epoch: 11 Global Step: 121090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:01:13,586-Speed 5980.50 samples/sec Loss 6.6717 LearningRate 0.0855 Epoch: 11 Global Step: 121100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:01:20,447-Speed 5971.78 samples/sec Loss 6.6951 LearningRate 0.0855 Epoch: 11 Global Step: 121110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:01:27,310-Speed 5968.45 samples/sec Loss 6.5960 LearningRate 0.0854 Epoch: 11 Global Step: 121120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 
20:01:34,171-Speed 5971.59 samples/sec Loss 6.6012 LearningRate 0.0854 Epoch: 11 Global Step: 121130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:01:41,064-Speed 5943.34 samples/sec Loss 6.6535 LearningRate 0.0854 Epoch: 11 Global Step: 121140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:01:47,969-Speed 5932.88 samples/sec Loss 6.6521 LearningRate 0.0854 Epoch: 11 Global Step: 121150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:01:54,829-Speed 5972.33 samples/sec Loss 6.6589 LearningRate 0.0854 Epoch: 11 Global Step: 121160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:02:01,690-Speed 5970.76 samples/sec Loss 6.6193 LearningRate 0.0853 Epoch: 11 Global Step: 121170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:02:08,577-Speed 5948.16 samples/sec Loss 6.6637 LearningRate 0.0853 Epoch: 11 Global Step: 121180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:02:15,436-Speed 5973.17 samples/sec Loss 6.6804 LearningRate 0.0853 Epoch: 11 Global Step: 121190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:02:22,302-Speed 5967.28 samples/sec Loss 6.6272 LearningRate 0.0853 Epoch: 11 Global Step: 121200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:02:29,177-Speed 5958.49 samples/sec Loss 6.6324 LearningRate 0.0853 Epoch: 11 Global Step: 121210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:02:36,028-Speed 5979.75 samples/sec Loss 6.6032 LearningRate 0.0852 Epoch: 11 Global Step: 121220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:02:42,898-Speed 5963.20 samples/sec Loss 6.6461 LearningRate 0.0852 Epoch: 11 Global Step: 121230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:02:49,780-Speed 5953.22 samples/sec Loss 6.6495 LearningRate 0.0852 Epoch: 11 Global Step: 121240 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 20:02:56,636-Speed 
5975.24 samples/sec Loss 6.6135 LearningRate 0.0852 Epoch: 11 Global Step: 121250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:03:03,513-Speed 5957.61 samples/sec Loss 6.6528 LearningRate 0.0852 Epoch: 11 Global Step: 121260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:03:10,366-Speed 5977.46 samples/sec Loss 6.6153 LearningRate 0.0851 Epoch: 11 Global Step: 121270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:03:17,215-Speed 5981.80 samples/sec Loss 6.5736 LearningRate 0.0851 Epoch: 11 Global Step: 121280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:03:24,070-Speed 5976.78 samples/sec Loss 6.5665 LearningRate 0.0851 Epoch: 11 Global Step: 121290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:03:30,928-Speed 5972.64 samples/sec Loss 6.7170 LearningRate 0.0851 Epoch: 11 Global Step: 121300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:03:37,776-Speed 5983.16 samples/sec Loss 6.5707 LearningRate 0.0851 Epoch: 11 Global Step: 121310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:03:44,616-Speed 5989.40 samples/sec Loss 6.5855 LearningRate 0.0850 Epoch: 11 Global Step: 121320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:03:51,470-Speed 5977.85 samples/sec Loss 6.5699 LearningRate 0.0850 Epoch: 11 Global Step: 121330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:03:58,326-Speed 5975.18 samples/sec Loss 6.6195 LearningRate 0.0850 Epoch: 11 Global Step: 121340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:04:05,189-Speed 5969.13 samples/sec Loss 6.6221 LearningRate 0.0850 Epoch: 11 Global Step: 121350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:04:12,056-Speed 5966.11 samples/sec Loss 6.6000 LearningRate 0.0850 Epoch: 11 Global Step: 121360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:04:18,903-Speed 5983.17 samples/sec Loss 
6.5891 LearningRate 0.0849 Epoch: 11 Global Step: 121370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:04:25,766-Speed 5969.76 samples/sec Loss 6.7243 LearningRate 0.0849 Epoch: 11 Global Step: 121380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:04:32,626-Speed 5972.24 samples/sec Loss 6.6496 LearningRate 0.0849 Epoch: 11 Global Step: 121390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:04:39,482-Speed 5975.51 samples/sec Loss 6.5538 LearningRate 0.0849 Epoch: 11 Global Step: 121400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:04:46,402-Speed 5920.22 samples/sec Loss 6.5996 LearningRate 0.0849 Epoch: 11 Global Step: 121410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:04:53,268-Speed 5966.85 samples/sec Loss 6.6279 LearningRate 0.0848 Epoch: 11 Global Step: 121420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:00,133-Speed 5967.91 samples/sec Loss 6.6377 LearningRate 0.0848 Epoch: 11 Global Step: 121430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:06,998-Speed 5967.19 samples/sec Loss 6.6246 LearningRate 0.0848 Epoch: 11 Global Step: 121440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:13,854-Speed 5976.55 samples/sec Loss 6.6262 LearningRate 0.0848 Epoch: 11 Global Step: 121450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:20,746-Speed 5945.08 samples/sec Loss 6.6829 LearningRate 0.0848 Epoch: 11 Global Step: 121460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:27,604-Speed 5973.79 samples/sec Loss 6.5525 LearningRate 0.0847 Epoch: 11 Global Step: 121470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:34,481-Speed 5956.83 samples/sec Loss 6.6598 LearningRate 0.0847 Epoch: 11 Global Step: 121480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:41,344-Speed 5969.29 samples/sec Loss 6.6336 LearningRate 0.0847 
Epoch: 11 Global Step: 121490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:48,204-Speed 5971.96 samples/sec Loss 6.6136 LearningRate 0.0847 Epoch: 11 Global Step: 121500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:05:55,080-Speed 5958.07 samples/sec Loss 6.6424 LearningRate 0.0847 Epoch: 11 Global Step: 121510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:06:01,930-Speed 5980.93 samples/sec Loss 6.7175 LearningRate 0.0846 Epoch: 11 Global Step: 121520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:06:08,792-Speed 5970.22 samples/sec Loss 6.6458 LearningRate 0.0846 Epoch: 11 Global Step: 121530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:06:15,674-Speed 5953.50 samples/sec Loss 6.5883 LearningRate 0.0846 Epoch: 11 Global Step: 121540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:06:22,541-Speed 5965.91 samples/sec Loss 6.6254 LearningRate 0.0846 Epoch: 11 Global Step: 121550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:06:29,409-Speed 5964.74 samples/sec Loss 6.5895 LearningRate 0.0846 Epoch: 11 Global Step: 121560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:06:36,274-Speed 5967.52 samples/sec Loss 6.5867 LearningRate 0.0846 Epoch: 11 Global Step: 121570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:06:43,155-Speed 5954.06 samples/sec Loss 6.6062 LearningRate 0.0845 Epoch: 11 Global Step: 121580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:06:50,143-Speed 5863.31 samples/sec Loss 6.6228 LearningRate 0.0845 Epoch: 11 Global Step: 121590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:06:57,004-Speed 5971.37 samples/sec Loss 6.6642 LearningRate 0.0845 Epoch: 11 Global Step: 121600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:07:03,883-Speed 5955.26 samples/sec Loss 6.6527 LearningRate 0.0845 Epoch: 11 Global Step: 
121610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:07:10,761-Speed 5956.17 samples/sec Loss 6.6037 LearningRate 0.0845 Epoch: 11 Global Step: 121620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:07:17,630-Speed 5963.94 samples/sec Loss 6.6182 LearningRate 0.0844 Epoch: 11 Global Step: 121630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:07:24,497-Speed 5965.20 samples/sec Loss 6.5775 LearningRate 0.0844 Epoch: 11 Global Step: 121640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:07:31,361-Speed 5969.44 samples/sec Loss 6.5576 LearningRate 0.0844 Epoch: 11 Global Step: 121650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:07:38,234-Speed 5959.89 samples/sec Loss 6.5968 LearningRate 0.0844 Epoch: 11 Global Step: 121660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:07:45,099-Speed 5967.97 samples/sec Loss 6.6722 LearningRate 0.0844 Epoch: 11 Global Step: 121670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:07:51,966-Speed 5965.30 samples/sec Loss 6.6847 LearningRate 0.0843 Epoch: 11 Global Step: 121680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:07:58,848-Speed 5954.16 samples/sec Loss 6.6167 LearningRate 0.0843 Epoch: 11 Global Step: 121690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:08:05,715-Speed 5966.49 samples/sec Loss 6.5944 LearningRate 0.0843 Epoch: 11 Global Step: 121700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:08:12,621-Speed 5931.82 samples/sec Loss 6.6078 LearningRate 0.0843 Epoch: 11 Global Step: 121710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:08:19,475-Speed 5977.80 samples/sec Loss 6.6894 LearningRate 0.0843 Epoch: 11 Global Step: 121720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:08:26,359-Speed 5951.30 samples/sec Loss 6.5820 LearningRate 0.0842 Epoch: 11 Global Step: 121730 Fp16 Grad Scale: 
131072 Required: 17 hours Training: 2022-01-08 20:08:33,213-Speed 5977.48 samples/sec Loss 6.5990 LearningRate 0.0842 Epoch: 11 Global Step: 121740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:08:40,103-Speed 5946.53 samples/sec Loss 6.5962 LearningRate 0.0842 Epoch: 11 Global Step: 121750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:08:47,021-Speed 5922.05 samples/sec Loss 6.6662 LearningRate 0.0842 Epoch: 11 Global Step: 121760 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-08 20:08:53,909-Speed 5948.81 samples/sec Loss 6.6074 LearningRate 0.0842 Epoch: 11 Global Step: 121770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:09:00,776-Speed 5967.14 samples/sec Loss 6.5892 LearningRate 0.0841 Epoch: 11 Global Step: 121780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:09:07,676-Speed 5937.21 samples/sec Loss 6.6224 LearningRate 0.0841 Epoch: 11 Global Step: 121790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:09:14,549-Speed 5960.56 samples/sec Loss 6.6018 LearningRate 0.0841 Epoch: 11 Global Step: 121800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:09:21,495-Speed 5898.48 samples/sec Loss 6.6161 LearningRate 0.0841 Epoch: 11 Global Step: 121810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:09:28,360-Speed 5967.69 samples/sec Loss 6.5962 LearningRate 0.0841 Epoch: 11 Global Step: 121820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:09:35,379-Speed 5836.81 samples/sec Loss 6.5680 LearningRate 0.0840 Epoch: 11 Global Step: 121830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:09:42,315-Speed 5906.73 samples/sec Loss 6.6007 LearningRate 0.0840 Epoch: 11 Global Step: 121840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:09:49,175-Speed 5971.80 samples/sec Loss 6.5893 LearningRate 0.0840 Epoch: 11 Global Step: 121850 Fp16 Grad Scale: 131072 Required: 17 
hours Training: 2022-01-08 20:09:56,031-Speed 5975.81 samples/sec Loss 6.5997 LearningRate 0.0840 Epoch: 11 Global Step: 121860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:10:02,875-Speed 5986.31 samples/sec Loss 6.6263 LearningRate 0.0840 Epoch: 11 Global Step: 121870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:10:09,744-Speed 5963.96 samples/sec Loss 6.5918 LearningRate 0.0839 Epoch: 11 Global Step: 121880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:10:16,596-Speed 5979.24 samples/sec Loss 6.5916 LearningRate 0.0839 Epoch: 11 Global Step: 121890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:10:23,463-Speed 5965.49 samples/sec Loss 6.5697 LearningRate 0.0839 Epoch: 11 Global Step: 121900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:10:30,343-Speed 5954.84 samples/sec Loss 6.6272 LearningRate 0.0839 Epoch: 11 Global Step: 121910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:10:37,232-Speed 5947.18 samples/sec Loss 6.6139 LearningRate 0.0839 Epoch: 11 Global Step: 121920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:10:44,199-Speed 5879.44 samples/sec Loss 6.6231 LearningRate 0.0838 Epoch: 11 Global Step: 121930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:10:51,059-Speed 5972.21 samples/sec Loss 6.6143 LearningRate 0.0838 Epoch: 11 Global Step: 121940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:10:57,923-Speed 5968.55 samples/sec Loss 6.5661 LearningRate 0.0838 Epoch: 11 Global Step: 121950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:11:04,783-Speed 5971.49 samples/sec Loss 6.6277 LearningRate 0.0838 Epoch: 11 Global Step: 121960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:11:11,631-Speed 5982.67 samples/sec Loss 6.5563 LearningRate 0.0838 Epoch: 11 Global Step: 121970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 
20:11:18,501-Speed 5963.29 samples/sec Loss 6.5138 LearningRate 0.0837 Epoch: 11 Global Step: 121980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:11:25,389-Speed 5947.87 samples/sec Loss 6.6298 LearningRate 0.0837 Epoch: 11 Global Step: 121990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:11:32,251-Speed 5972.24 samples/sec Loss 6.5784 LearningRate 0.0837 Epoch: 11 Global Step: 122000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:11:39,099-Speed 5983.14 samples/sec Loss 6.5810 LearningRate 0.0837 Epoch: 11 Global Step: 122010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:11:45,994-Speed 5941.43 samples/sec Loss 6.5883 LearningRate 0.0837 Epoch: 11 Global Step: 122020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:11:52,843-Speed 5981.67 samples/sec Loss 6.5550 LearningRate 0.0836 Epoch: 11 Global Step: 122030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:11:59,716-Speed 5960.74 samples/sec Loss 6.6469 LearningRate 0.0836 Epoch: 11 Global Step: 122040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:12:06,582-Speed 5968.20 samples/sec Loss 6.5607 LearningRate 0.0836 Epoch: 11 Global Step: 122050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:12:13,454-Speed 5961.38 samples/sec Loss 6.5317 LearningRate 0.0836 Epoch: 11 Global Step: 122060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:12:20,344-Speed 5946.15 samples/sec Loss 6.6133 LearningRate 0.0836 Epoch: 11 Global Step: 122070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:12:27,199-Speed 5975.80 samples/sec Loss 6.6642 LearningRate 0.0835 Epoch: 11 Global Step: 122080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:12:34,060-Speed 5971.28 samples/sec Loss 6.5546 LearningRate 0.0835 Epoch: 11 Global Step: 122090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:12:40,922-Speed 5971.18 
samples/sec Loss 6.5833 LearningRate 0.0835 Epoch: 11 Global Step: 122100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:12:47,787-Speed 5967.44 samples/sec Loss 6.5718 LearningRate 0.0835 Epoch: 11 Global Step: 122110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:12:54,654-Speed 5965.70 samples/sec Loss 6.5696 LearningRate 0.0835 Epoch: 11 Global Step: 122120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:13:01,509-Speed 5976.74 samples/sec Loss 6.6058 LearningRate 0.0835 Epoch: 11 Global Step: 122130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:13:08,406-Speed 5940.97 samples/sec Loss 6.6451 LearningRate 0.0834 Epoch: 11 Global Step: 122140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:13:15,268-Speed 5969.61 samples/sec Loss 6.5845 LearningRate 0.0834 Epoch: 11 Global Step: 122150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:13:22,139-Speed 5965.00 samples/sec Loss 6.6039 LearningRate 0.0834 Epoch: 11 Global Step: 122160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:13:29,004-Speed 5967.74 samples/sec Loss 6.6042 LearningRate 0.0834 Epoch: 11 Global Step: 122170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:13:35,853-Speed 5981.33 samples/sec Loss 6.5814 LearningRate 0.0834 Epoch: 11 Global Step: 122180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:13:42,702-Speed 5981.84 samples/sec Loss 6.6222 LearningRate 0.0833 Epoch: 11 Global Step: 122190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:13:49,560-Speed 5975.66 samples/sec Loss 6.5881 LearningRate 0.0833 Epoch: 11 Global Step: 122200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:13:56,429-Speed 5964.12 samples/sec Loss 6.6023 LearningRate 0.0833 Epoch: 11 Global Step: 122210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:14:03,288-Speed 5973.65 samples/sec Loss 6.5754 
LearningRate 0.0833 Epoch: 11 Global Step: 122220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:14:10,143-Speed 5976.96 samples/sec Loss 6.5937 LearningRate 0.0833 Epoch: 11 Global Step: 122230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:14:17,022-Speed 5954.65 samples/sec Loss 6.5993 LearningRate 0.0832 Epoch: 11 Global Step: 122240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:14:23,882-Speed 5972.08 samples/sec Loss 6.5626 LearningRate 0.0832 Epoch: 11 Global Step: 122250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:14:30,734-Speed 5979.55 samples/sec Loss 6.5137 LearningRate 0.0832 Epoch: 11 Global Step: 122260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:14:37,590-Speed 5975.10 samples/sec Loss 6.5601 LearningRate 0.0832 Epoch: 11 Global Step: 122270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:14:44,431-Speed 5988.62 samples/sec Loss 6.5845 LearningRate 0.0832 Epoch: 11 Global Step: 122280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:14:51,283-Speed 5979.07 samples/sec Loss 6.5892 LearningRate 0.0831 Epoch: 11 Global Step: 122290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:14:58,160-Speed 5956.93 samples/sec Loss 6.6158 LearningRate 0.0831 Epoch: 11 Global Step: 122300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:15:05,027-Speed 5966.10 samples/sec Loss 6.5663 LearningRate 0.0831 Epoch: 11 Global Step: 122310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:15:11,895-Speed 5965.50 samples/sec Loss 6.6785 LearningRate 0.0831 Epoch: 11 Global Step: 122320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:15:18,787-Speed 5944.06 samples/sec Loss 6.5624 LearningRate 0.0831 Epoch: 11 Global Step: 122330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:15:25,651-Speed 5968.33 samples/sec Loss 6.5774 LearningRate 0.0830 Epoch: 
11 Global Step: 122340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:15:32,535-Speed 5951.33 samples/sec Loss 6.5395 LearningRate 0.0830 Epoch: 11 Global Step: 122350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:15:39,405-Speed 5963.24 samples/sec Loss 6.5181 LearningRate 0.0830 Epoch: 11 Global Step: 122360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:15:46,273-Speed 5965.16 samples/sec Loss 6.5464 LearningRate 0.0830 Epoch: 11 Global Step: 122370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-08 20:15:53,128-Speed 5976.47 samples/sec Loss 6.5512 LearningRate 0.0830 Epoch: 11 Global Step: 122380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:15:59,987-Speed 5973.09 samples/sec Loss 6.5245 LearningRate 0.0829 Epoch: 11 Global Step: 122390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:16:06,892-Speed 5933.04 samples/sec Loss 6.5778 LearningRate 0.0829 Epoch: 11 Global Step: 122400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:16:13,805-Speed 5926.24 samples/sec Loss 6.6057 LearningRate 0.0829 Epoch: 11 Global Step: 122410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:16:20,725-Speed 5920.18 samples/sec Loss 6.5737 LearningRate 0.0829 Epoch: 11 Global Step: 122420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:16:27,623-Speed 5939.28 samples/sec Loss 6.6006 LearningRate 0.0829 Epoch: 11 Global Step: 122430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:16:34,484-Speed 5971.42 samples/sec Loss 6.4926 LearningRate 0.0828 Epoch: 11 Global Step: 122440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:16:41,342-Speed 5973.29 samples/sec Loss 6.5061 LearningRate 0.0828 Epoch: 11 Global Step: 122450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:16:48,208-Speed 5967.44 samples/sec Loss 6.5136 LearningRate 0.0828 Epoch: 11 Global Step: 122460 Fp16 
Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:16:55,066-Speed 5973.84 samples/sec Loss 6.5542 LearningRate 0.0828 Epoch: 11 Global Step: 122470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:17:01,920-Speed 5977.36 samples/sec Loss 6.5863 LearningRate 0.0828 Epoch: 11 Global Step: 122480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:17:08,768-Speed 5982.87 samples/sec Loss 6.5741 LearningRate 0.0827 Epoch: 11 Global Step: 122490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:17:15,618-Speed 5979.98 samples/sec Loss 6.4981 LearningRate 0.0827 Epoch: 11 Global Step: 122500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:17:22,491-Speed 5961.24 samples/sec Loss 6.5541 LearningRate 0.0827 Epoch: 11 Global Step: 122510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:17:29,357-Speed 5966.98 samples/sec Loss 6.5958 LearningRate 0.0827 Epoch: 11 Global Step: 122520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:17:36,213-Speed 5975.60 samples/sec Loss 6.5228 LearningRate 0.0827 Epoch: 11 Global Step: 122530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:17:43,075-Speed 5969.97 samples/sec Loss 6.5376 LearningRate 0.0826 Epoch: 11 Global Step: 122540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:17:49,948-Speed 5961.31 samples/sec Loss 6.5672 LearningRate 0.0826 Epoch: 11 Global Step: 122550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:17:56,807-Speed 5973.25 samples/sec Loss 6.5685 LearningRate 0.0826 Epoch: 11 Global Step: 122560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:18:03,651-Speed 5985.72 samples/sec Loss 6.5780 LearningRate 0.0826 Epoch: 11 Global Step: 122570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:18:10,509-Speed 5973.25 samples/sec Loss 6.5845 LearningRate 0.0826 Epoch: 11 Global Step: 122580 Fp16 Grad Scale: 65536 Required: 
17 hours Training: 2022-01-08 20:18:17,367-Speed 5974.39 samples/sec Loss 6.5179 LearningRate 0.0826 Epoch: 11 Global Step: 122590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:18:24,240-Speed 5960.76 samples/sec Loss 6.5418 LearningRate 0.0825 Epoch: 11 Global Step: 122600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:18:31,096-Speed 5975.42 samples/sec Loss 6.5345 LearningRate 0.0825 Epoch: 11 Global Step: 122610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:18:37,956-Speed 5971.78 samples/sec Loss 6.5262 LearningRate 0.0825 Epoch: 11 Global Step: 122620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:18:44,830-Speed 5959.62 samples/sec Loss 6.5488 LearningRate 0.0825 Epoch: 11 Global Step: 122630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:18:52,281-Speed 5499.41 samples/sec Loss 6.5243 LearningRate 0.0825 Epoch: 11 Global Step: 122640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:18:59,151-Speed 5963.45 samples/sec Loss 6.5782 LearningRate 0.0824 Epoch: 11 Global Step: 122650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:19:06,006-Speed 5976.57 samples/sec Loss 6.5911 LearningRate 0.0824 Epoch: 11 Global Step: 122660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:19:12,860-Speed 5977.21 samples/sec Loss 6.5241 LearningRate 0.0824 Epoch: 11 Global Step: 122670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:19:19,718-Speed 5973.95 samples/sec Loss 6.5375 LearningRate 0.0824 Epoch: 11 Global Step: 122680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:19:26,605-Speed 5950.41 samples/sec Loss 6.5299 LearningRate 0.0824 Epoch: 11 Global Step: 122690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:19:33,468-Speed 5969.22 samples/sec Loss 6.5193 LearningRate 0.0823 Epoch: 11 Global Step: 122700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-01-08 20:19:40,343-Speed 5958.71 samples/sec Loss 6.5102 LearningRate 0.0823 Epoch: 11 Global Step: 122710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:19:47,216-Speed 5960.81 samples/sec Loss 6.5059 LearningRate 0.0823 Epoch: 11 Global Step: 122720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:19:54,119-Speed 5934.31 samples/sec Loss 6.5398 LearningRate 0.0823 Epoch: 11 Global Step: 122730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:01,026-Speed 5930.89 samples/sec Loss 6.5760 LearningRate 0.0823 Epoch: 11 Global Step: 122740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:07,937-Speed 5928.20 samples/sec Loss 6.5290 LearningRate 0.0822 Epoch: 11 Global Step: 122750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:14,807-Speed 5963.25 samples/sec Loss 6.6183 LearningRate 0.0822 Epoch: 11 Global Step: 122760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:21,744-Speed 5906.31 samples/sec Loss 6.4624 LearningRate 0.0822 Epoch: 11 Global Step: 122770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:28,599-Speed 5976.84 samples/sec Loss 6.4662 LearningRate 0.0822 Epoch: 11 Global Step: 122780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:35,460-Speed 5970.82 samples/sec Loss 6.5779 LearningRate 0.0822 Epoch: 11 Global Step: 122790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:42,347-Speed 5949.06 samples/sec Loss 6.5080 LearningRate 0.0821 Epoch: 11 Global Step: 122800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:49,241-Speed 5942.21 samples/sec Loss 6.5348 LearningRate 0.0821 Epoch: 11 Global Step: 122810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:20:56,077-Speed 5992.89 samples/sec Loss 6.5372 LearningRate 0.0821 Epoch: 11 Global Step: 122820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:02,925-Speed 
5981.85 samples/sec Loss 6.5418 LearningRate 0.0821 Epoch: 11 Global Step: 122830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:09,806-Speed 5955.33 samples/sec Loss 6.5801 LearningRate 0.0821 Epoch: 11 Global Step: 122840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:16,665-Speed 5972.35 samples/sec Loss 6.5234 LearningRate 0.0820 Epoch: 11 Global Step: 122850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:23,518-Speed 5980.08 samples/sec Loss 6.5023 LearningRate 0.0820 Epoch: 11 Global Step: 122860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:30,380-Speed 5969.62 samples/sec Loss 6.4832 LearningRate 0.0820 Epoch: 11 Global Step: 122870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:37,227-Speed 5983.22 samples/sec Loss 6.4923 LearningRate 0.0820 Epoch: 11 Global Step: 122880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:44,083-Speed 5975.82 samples/sec Loss 6.5197 LearningRate 0.0820 Epoch: 11 Global Step: 122890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:50,947-Speed 5968.36 samples/sec Loss 6.5148 LearningRate 0.0820 Epoch: 11 Global Step: 122900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:21:57,813-Speed 5966.48 samples/sec Loss 6.5307 LearningRate 0.0819 Epoch: 11 Global Step: 122910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:22:04,665-Speed 5978.85 samples/sec Loss 6.4871 LearningRate 0.0819 Epoch: 11 Global Step: 122920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:22:11,515-Speed 5980.75 samples/sec Loss 6.4605 LearningRate 0.0819 Epoch: 11 Global Step: 122930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:22:18,382-Speed 5965.74 samples/sec Loss 6.5189 LearningRate 0.0819 Epoch: 11 Global Step: 122940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:22:25,249-Speed 5966.11 samples/sec Loss 
6.5018 LearningRate 0.0819 Epoch: 11 Global Step: 122950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:22:32,108-Speed 5972.78 samples/sec Loss 6.5282 LearningRate 0.0818 Epoch: 11 Global Step: 122960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:22:38,970-Speed 5970.56 samples/sec Loss 6.5162 LearningRate 0.0818 Epoch: 11 Global Step: 122970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:22:45,828-Speed 5972.74 samples/sec Loss 6.4995 LearningRate 0.0818 Epoch: 11 Global Step: 122980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:22:52,683-Speed 5976.43 samples/sec Loss 6.5139 LearningRate 0.0818 Epoch: 11 Global Step: 122990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:22:59,548-Speed 5967.70 samples/sec Loss 6.5648 LearningRate 0.0818 Epoch: 11 Global Step: 123000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:23:06,455-Speed 5931.74 samples/sec Loss 6.5301 LearningRate 0.0817 Epoch: 11 Global Step: 123010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:23:13,313-Speed 5972.85 samples/sec Loss 6.5023 LearningRate 0.0817 Epoch: 11 Global Step: 123020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:23:20,162-Speed 5982.57 samples/sec Loss 6.5123 LearningRate 0.0817 Epoch: 11 Global Step: 123030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:23:27,011-Speed 5980.95 samples/sec Loss 6.5443 LearningRate 0.0817 Epoch: 11 Global Step: 123040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:23:33,865-Speed 5977.04 samples/sec Loss 6.4724 LearningRate 0.0817 Epoch: 11 Global Step: 123050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:23:40,752-Speed 5948.17 samples/sec Loss 6.5384 LearningRate 0.0816 Epoch: 11 Global Step: 123060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:23:47,606-Speed 5977.93 samples/sec Loss 6.5073 LearningRate 
0.0816 Epoch: 11 Global Step: 123070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:23:54,473-Speed 5966.28 samples/sec Loss 6.5214 LearningRate 0.0816 Epoch: 11 Global Step: 123080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:24:01,359-Speed 5949.76 samples/sec Loss 6.4642 LearningRate 0.0816 Epoch: 11 Global Step: 123090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:24:08,214-Speed 5976.26 samples/sec Loss 6.4605 LearningRate 0.0816 Epoch: 11 Global Step: 123100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-08 20:24:15,064-Speed 5980.67 samples/sec Loss 6.5001 LearningRate 0.0815 Epoch: 11 Global Step: 123110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:24:21,924-Speed 5972.05 samples/sec Loss 6.5338 LearningRate 0.0815 Epoch: 11 Global Step: 123120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:24:28,788-Speed 5967.80 samples/sec Loss 6.5229 LearningRate 0.0815 Epoch: 11 Global Step: 123130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-08 20:24:35,628-Speed 5990.23 samples/sec Loss 6.5372 LearningRate 0.0815 Epoch: 11 Global Step: 123140 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-08 20:24:42,490-Speed 5969.63 samples/sec Loss 6.5333 LearningRate 0.0815 Epoch: 11 Global Step: 123150 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-08 20:24:49,337-Speed 5983.87 samples/sec Loss 6.5135 LearningRate 0.0814 Epoch: 11 Global Step: 123160 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-08 20:24:56,188-Speed 5981.15 samples/sec Loss 6.4992 LearningRate 0.0814 Epoch: 11 Global Step: 123170 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-08 20:25:03,077-Speed 5946.99 samples/sec Loss 6.4838 LearningRate 0.0814 Epoch: 11 Global Step: 123180 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-08 20:25:09,946-Speed 5964.46 samples/sec Loss 6.4865 LearningRate 0.0814 Epoch: 11 Global 
Step: 123190 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-08 20:25:16,810-Speed 5968.35 samples/sec Loss 6.4854 LearningRate 0.0814 Epoch: 11 Global Step: 123200 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-08 20:25:23,659-Speed 5981.05 samples/sec Loss 6.5322 LearningRate 0.0813 Epoch: 11 Global Step: 123210 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-08 20:25:30,505-Speed 5985.07 samples/sec Loss 6.5253 LearningRate 0.0813 Epoch: 11 Global Step: 123220 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 20:25:37,398-Speed 5942.98 samples/sec Loss 6.5683 LearningRate 0.0813 Epoch: 11 Global Step: 123230 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 20:25:44,310-Speed 5927.24 samples/sec Loss 6.5112 LearningRate 0.0813 Epoch: 11 Global Step: 123240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:25:51,182-Speed 5961.47 samples/sec Loss 6.4498 LearningRate 0.0813 Epoch: 11 Global Step: 123250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:25:58,041-Speed 5972.71 samples/sec Loss 6.4226 LearningRate 0.0813 Epoch: 11 Global Step: 123260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:26:04,926-Speed 5959.36 samples/sec Loss 6.5616 LearningRate 0.0812 Epoch: 11 Global Step: 123270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:26:11,787-Speed 5971.67 samples/sec Loss 6.4837 LearningRate 0.0812 Epoch: 11 Global Step: 123280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:26:18,663-Speed 5958.14 samples/sec Loss 6.5168 LearningRate 0.0812 Epoch: 11 Global Step: 123290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:26:25,910-Speed 5653.43 samples/sec Loss 6.5061 LearningRate 0.0812 Epoch: 11 Global Step: 123300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:26:32,772-Speed 5971.07 samples/sec Loss 6.5066 LearningRate 0.0812 Epoch: 11 Global Step: 123310 Fp16 Grad Scale: 
32768 Required: 16 hours Training: 2022-01-08 20:26:39,637-Speed 5967.72 samples/sec Loss 6.4911 LearningRate 0.0811 Epoch: 11 Global Step: 123320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:26:46,510-Speed 5960.40 samples/sec Loss 6.4996 LearningRate 0.0811 Epoch: 11 Global Step: 123330 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:26:53,381-Speed 5963.01 samples/sec Loss 6.4891 LearningRate 0.0811 Epoch: 11 Global Step: 123340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:00,242-Speed 5970.89 samples/sec Loss 6.4884 LearningRate 0.0811 Epoch: 11 Global Step: 123350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:07,098-Speed 5976.19 samples/sec Loss 6.4747 LearningRate 0.0811 Epoch: 11 Global Step: 123360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:13,981-Speed 5952.26 samples/sec Loss 6.5592 LearningRate 0.0810 Epoch: 11 Global Step: 123370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:20,844-Speed 5971.27 samples/sec Loss 6.4917 LearningRate 0.0810 Epoch: 11 Global Step: 123380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:27,709-Speed 5967.21 samples/sec Loss 6.5502 LearningRate 0.0810 Epoch: 11 Global Step: 123390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:34,554-Speed 5986.87 samples/sec Loss 6.5543 LearningRate 0.0810 Epoch: 11 Global Step: 123400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:41,399-Speed 5984.65 samples/sec Loss 6.5272 LearningRate 0.0810 Epoch: 11 Global Step: 123410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:48,243-Speed 5986.06 samples/sec Loss 6.4718 LearningRate 0.0809 Epoch: 11 Global Step: 123420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:27:55,119-Speed 5958.07 samples/sec Loss 6.4466 LearningRate 0.0809 Epoch: 11 Global Step: 123430 Fp16 Grad Scale: 65536 Required: 16 hours 
Training: 2022-01-08 20:28:01,988-Speed 5964.99 samples/sec Loss 6.4316 LearningRate 0.0809 Epoch: 11 Global Step: 123440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:28:08,857-Speed 5963.91 samples/sec Loss 6.4572 LearningRate 0.0809 Epoch: 11 Global Step: 123450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:28:15,712-Speed 5978.15 samples/sec Loss 6.4679 LearningRate 0.0809 Epoch: 11 Global Step: 123460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:28:22,573-Speed 5970.91 samples/sec Loss 6.4984 LearningRate 0.0808 Epoch: 11 Global Step: 123470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:28:29,427-Speed 5977.32 samples/sec Loss 6.5109 LearningRate 0.0808 Epoch: 11 Global Step: 123480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:28:36,281-Speed 5977.31 samples/sec Loss 6.4643 LearningRate 0.0808 Epoch: 11 Global Step: 123490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:28:43,149-Speed 5965.91 samples/sec Loss 6.5155 LearningRate 0.0808 Epoch: 11 Global Step: 123500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:28:50,001-Speed 5977.77 samples/sec Loss 6.4670 LearningRate 0.0808 Epoch: 11 Global Step: 123510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:28:56,855-Speed 5977.37 samples/sec Loss 6.4995 LearningRate 0.0808 Epoch: 11 Global Step: 123520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:29:03,708-Speed 5977.85 samples/sec Loss 6.4560 LearningRate 0.0807 Epoch: 11 Global Step: 123530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:29:10,571-Speed 5969.74 samples/sec Loss 6.4542 LearningRate 0.0807 Epoch: 11 Global Step: 123540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:29:17,430-Speed 5972.63 samples/sec Loss 6.5073 LearningRate 0.0807 Epoch: 11 Global Step: 123550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 
20:29:24,286-Speed 5975.90 samples/sec Loss 6.4787 LearningRate 0.0807 Epoch: 11 Global Step: 123560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:29:31,136-Speed 5980.52 samples/sec Loss 6.4446 LearningRate 0.0807 Epoch: 11 Global Step: 123570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:29:37,990-Speed 5976.67 samples/sec Loss 6.4918 LearningRate 0.0806 Epoch: 11 Global Step: 123580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:29:44,847-Speed 5975.48 samples/sec Loss 6.5315 LearningRate 0.0806 Epoch: 11 Global Step: 123590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:29:51,690-Speed 5987.33 samples/sec Loss 6.5031 LearningRate 0.0806 Epoch: 11 Global Step: 123600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:29:58,553-Speed 5970.39 samples/sec Loss 6.4611 LearningRate 0.0806 Epoch: 11 Global Step: 123610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:30:05,402-Speed 5980.50 samples/sec Loss 6.4579 LearningRate 0.0806 Epoch: 11 Global Step: 123620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:30:12,242-Speed 5990.52 samples/sec Loss 6.4846 LearningRate 0.0805 Epoch: 11 Global Step: 123630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:30:19,101-Speed 5974.60 samples/sec Loss 6.4845 LearningRate 0.0805 Epoch: 11 Global Step: 123640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:30:25,964-Speed 5970.12 samples/sec Loss 6.4988 LearningRate 0.0805 Epoch: 11 Global Step: 123650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:30:32,809-Speed 5985.02 samples/sec Loss 6.4795 LearningRate 0.0805 Epoch: 11 Global Step: 123660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:30:39,684-Speed 5958.55 samples/sec Loss 6.4057 LearningRate 0.0805 Epoch: 11 Global Step: 123670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:30:46,567-Speed 5952.37 
samples/sec Loss 6.5160 LearningRate 0.0804 Epoch: 11 Global Step: 123680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:30:53,414-Speed 5983.50 samples/sec Loss 6.4910 LearningRate 0.0804 Epoch: 11 Global Step: 123690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:31:00,268-Speed 5977.46 samples/sec Loss 6.5079 LearningRate 0.0804 Epoch: 11 Global Step: 123700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:31:07,129-Speed 5971.79 samples/sec Loss 6.4727 LearningRate 0.0804 Epoch: 11 Global Step: 123710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:31:13,970-Speed 5988.76 samples/sec Loss 6.5018 LearningRate 0.0804 Epoch: 11 Global Step: 123720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:31:20,819-Speed 5981.62 samples/sec Loss 6.4739 LearningRate 0.0803 Epoch: 11 Global Step: 123730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:31:27,689-Speed 5962.57 samples/sec Loss 6.5236 LearningRate 0.0803 Epoch: 11 Global Step: 123740 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:31:34,532-Speed 5986.61 samples/sec Loss 6.4769 LearningRate 0.0803 Epoch: 11 Global Step: 123750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:31:41,420-Speed 5947.64 samples/sec Loss 6.4647 LearningRate 0.0803 Epoch: 11 Global Step: 123760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:31:48,279-Speed 5973.10 samples/sec Loss 6.4626 LearningRate 0.0803 Epoch: 11 Global Step: 123770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:31:55,124-Speed 5984.73 samples/sec Loss 6.4455 LearningRate 0.0803 Epoch: 11 Global Step: 123780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:32:01,969-Speed 5984.85 samples/sec Loss 6.5253 LearningRate 0.0802 Epoch: 11 Global Step: 123790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:32:08,807-Speed 5990.98 samples/sec Loss 6.4017 
LearningRate 0.0802 Epoch: 11 Global Step: 123800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:32:15,669-Speed 5970.76 samples/sec Loss 6.5033 LearningRate 0.0802 Epoch: 11 Global Step: 123810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 20:32:22,515-Speed 5984.38 samples/sec Loss 6.4222 LearningRate 0.0802 Epoch: 11 Global Step: 123820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:32:29,384-Speed 5963.16 samples/sec Loss 6.5015 LearningRate 0.0802 Epoch: 11 Global Step: 123830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:32:36,230-Speed 5984.88 samples/sec Loss 6.4491 LearningRate 0.0801 Epoch: 11 Global Step: 123840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:32:43,098-Speed 5965.26 samples/sec Loss 6.4178 LearningRate 0.0801 Epoch: 11 Global Step: 123850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:32:49,946-Speed 5982.51 samples/sec Loss 6.4768 LearningRate 0.0801 Epoch: 11 Global Step: 123860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:32:56,825-Speed 5955.49 samples/sec Loss 6.4890 LearningRate 0.0801 Epoch: 11 Global Step: 123870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:33:03,712-Speed 5948.59 samples/sec Loss 6.4508 LearningRate 0.0801 Epoch: 11 Global Step: 123880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:33:10,575-Speed 5969.21 samples/sec Loss 6.4781 LearningRate 0.0800 Epoch: 11 Global Step: 123890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:33:17,445-Speed 5963.76 samples/sec Loss 6.4569 LearningRate 0.0800 Epoch: 11 Global Step: 123900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:33:24,309-Speed 5969.02 samples/sec Loss 6.5000 LearningRate 0.0800 Epoch: 11 Global Step: 123910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:33:31,158-Speed 5980.87 samples/sec Loss 6.4776 LearningRate 0.0800 Epoch: 11 
Global Step: 123920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:33:38,013-Speed 5979.73 samples/sec Loss 6.5089 LearningRate 0.0800 Epoch: 11 Global Step: 123930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:33:44,892-Speed 5954.89 samples/sec Loss 6.4916 LearningRate 0.0799 Epoch: 11 Global Step: 123940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:33:51,769-Speed 5959.12 samples/sec Loss 6.5089 LearningRate 0.0799 Epoch: 11 Global Step: 123950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:33:58,621-Speed 5978.96 samples/sec Loss 6.4810 LearningRate 0.0799 Epoch: 11 Global Step: 123960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:34:05,476-Speed 5976.10 samples/sec Loss 6.4130 LearningRate 0.0799 Epoch: 11 Global Step: 123970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:34:12,316-Speed 5990.33 samples/sec Loss 6.4674 LearningRate 0.0799 Epoch: 11 Global Step: 123980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:34:19,180-Speed 5968.89 samples/sec Loss 6.4768 LearningRate 0.0798 Epoch: 11 Global Step: 123990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:34:26,027-Speed 5983.09 samples/sec Loss 6.4278 LearningRate 0.0798 Epoch: 11 Global Step: 124000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:34:32,929-Speed 5935.26 samples/sec Loss 6.4107 LearningRate 0.0798 Epoch: 11 Global Step: 124010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:34:39,782-Speed 5978.13 samples/sec Loss 6.4713 LearningRate 0.0798 Epoch: 11 Global Step: 124020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:34:46,638-Speed 5977.57 samples/sec Loss 6.4921 LearningRate 0.0798 Epoch: 11 Global Step: 124030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:34:53,485-Speed 5983.33 samples/sec Loss 6.4252 LearningRate 0.0798 Epoch: 11 Global Step: 124040 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:35:00,337-Speed 5979.28 samples/sec Loss 6.4256 LearningRate 0.0797 Epoch: 11 Global Step: 124050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:35:07,199-Speed 5970.18 samples/sec Loss 6.4601 LearningRate 0.0797 Epoch: 11 Global Step: 124060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:35:14,068-Speed 5965.69 samples/sec Loss 6.4400 LearningRate 0.0797 Epoch: 11 Global Step: 124070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:35:20,943-Speed 5960.12 samples/sec Loss 6.4235 LearningRate 0.0797 Epoch: 11 Global Step: 124080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:35:27,800-Speed 5974.13 samples/sec Loss 6.3719 LearningRate 0.0797 Epoch: 11 Global Step: 124090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:35:34,650-Speed 5981.19 samples/sec Loss 6.4711 LearningRate 0.0796 Epoch: 11 Global Step: 124100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:35:41,499-Speed 5981.39 samples/sec Loss 6.4384 LearningRate 0.0796 Epoch: 11 Global Step: 124110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:35:48,378-Speed 5955.33 samples/sec Loss 6.4508 LearningRate 0.0796 Epoch: 11 Global Step: 124120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:35:55,239-Speed 5971.17 samples/sec Loss 6.4714 LearningRate 0.0796 Epoch: 11 Global Step: 124130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:36:02,102-Speed 5969.20 samples/sec Loss 6.4112 LearningRate 0.0796 Epoch: 11 Global Step: 124140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:36:08,954-Speed 5979.16 samples/sec Loss 6.4424 LearningRate 0.0795 Epoch: 11 Global Step: 124150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:36:15,834-Speed 5955.03 samples/sec Loss 6.4543 LearningRate 0.0795 Epoch: 11 Global Step: 124160 Fp16 Grad Scale: 131072 
Required: 16 hours Training: 2022-01-08 20:36:22,698-Speed 5969.14 samples/sec Loss 6.4009 LearningRate 0.0795 Epoch: 11 Global Step: 124170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:36:29,549-Speed 5980.26 samples/sec Loss 6.4244 LearningRate 0.0795 Epoch: 11 Global Step: 124180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:36:36,462-Speed 5925.42 samples/sec Loss 6.4568 LearningRate 0.0795 Epoch: 11 Global Step: 124190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:36:43,330-Speed 5965.40 samples/sec Loss 6.3734 LearningRate 0.0794 Epoch: 11 Global Step: 124200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:36:50,187-Speed 5974.93 samples/sec Loss 6.4469 LearningRate 0.0794 Epoch: 11 Global Step: 124210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:36:57,047-Speed 5973.62 samples/sec Loss 6.4855 LearningRate 0.0794 Epoch: 11 Global Step: 124220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:37:03,907-Speed 5970.77 samples/sec Loss 6.4053 LearningRate 0.0794 Epoch: 11 Global Step: 124230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:37:10,762-Speed 5978.33 samples/sec Loss 6.3922 LearningRate 0.0794 Epoch: 11 Global Step: 124240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:37:17,613-Speed 5979.71 samples/sec Loss 6.4488 LearningRate 0.0794 Epoch: 11 Global Step: 124250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:37:24,471-Speed 5973.88 samples/sec Loss 6.5116 LearningRate 0.0793 Epoch: 11 Global Step: 124260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:37:31,333-Speed 5971.04 samples/sec Loss 6.4920 LearningRate 0.0793 Epoch: 11 Global Step: 124270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:37:38,208-Speed 5958.36 samples/sec Loss 6.4228 LearningRate 0.0793 Epoch: 11 Global Step: 124280 Fp16 Grad Scale: 262144 Required: 16 hours 
Training: 2022-01-08 20:37:45,049-Speed 5988.88 samples/sec Loss 6.4470 LearningRate 0.0793 Epoch: 11 Global Step: 124290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:37:51,935-Speed 5949.31 samples/sec Loss 6.4177 LearningRate 0.0793 Epoch: 11 Global Step: 124300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:37:58,795-Speed 5974.42 samples/sec Loss 6.4915 LearningRate 0.0792 Epoch: 11 Global Step: 124310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:38:05,649-Speed 5976.99 samples/sec Loss 6.4292 LearningRate 0.0792 Epoch: 11 Global Step: 124320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:38:12,483-Speed 5994.50 samples/sec Loss 6.4101 LearningRate 0.0792 Epoch: 11 Global Step: 124330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:38:19,326-Speed 5987.30 samples/sec Loss 6.4436 LearningRate 0.0792 Epoch: 11 Global Step: 124340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:38:26,173-Speed 5983.46 samples/sec Loss 6.4127 LearningRate 0.0792 Epoch: 11 Global Step: 124350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:38:33,026-Speed 5978.09 samples/sec Loss 6.4283 LearningRate 0.0791 Epoch: 11 Global Step: 124360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:38:39,880-Speed 5976.91 samples/sec Loss 6.4643 LearningRate 0.0791 Epoch: 11 Global Step: 124370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:38:46,735-Speed 5975.80 samples/sec Loss 6.4669 LearningRate 0.0791 Epoch: 11 Global Step: 124380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:38:53,599-Speed 5969.07 samples/sec Loss 6.4294 LearningRate 0.0791 Epoch: 11 Global Step: 124390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:39:00,467-Speed 5965.61 samples/sec Loss 6.4668 LearningRate 0.0791 Epoch: 11 Global Step: 124400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 
20:39:07,321-Speed 5977.05 samples/sec Loss 6.4475 LearningRate 0.0790 Epoch: 11 Global Step: 124410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:39:14,188-Speed 5965.45 samples/sec Loss 6.4436 LearningRate 0.0790 Epoch: 11 Global Step: 124420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:39:21,035-Speed 5982.97 samples/sec Loss 6.4618 LearningRate 0.0790 Epoch: 11 Global Step: 124430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:39:45,397-Speed 1681.47 samples/sec Loss 6.4144 LearningRate 0.0790 Epoch: 12 Global Step: 124440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:39:52,210-Speed 6013.29 samples/sec Loss 6.4372 LearningRate 0.0790 Epoch: 12 Global Step: 124450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:39:59,035-Speed 6005.09 samples/sec Loss 6.3834 LearningRate 0.0790 Epoch: 12 Global Step: 124460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:40:05,869-Speed 5994.68 samples/sec Loss 6.4191 LearningRate 0.0789 Epoch: 12 Global Step: 124470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:40:12,710-Speed 5988.23 samples/sec Loss 6.4641 LearningRate 0.0789 Epoch: 12 Global Step: 124480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:40:19,593-Speed 5974.17 samples/sec Loss 6.4341 LearningRate 0.0789 Epoch: 12 Global Step: 124490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:40:26,454-Speed 5974.78 samples/sec Loss 6.3524 LearningRate 0.0789 Epoch: 12 Global Step: 124500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:40:33,304-Speed 5980.96 samples/sec Loss 6.3986 LearningRate 0.0789 Epoch: 12 Global Step: 124510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:40:40,146-Speed 5987.35 samples/sec Loss 6.4111 LearningRate 0.0788 Epoch: 12 Global Step: 124520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:40:47,017-Speed 
5976.07 samples/sec Loss 6.3288 LearningRate 0.0788 Epoch: 12 Global Step: 124530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:40:53,878-Speed 5971.16 samples/sec Loss 6.3735 LearningRate 0.0788 Epoch: 12 Global Step: 124540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:00,736-Speed 5973.39 samples/sec Loss 6.4148 LearningRate 0.0788 Epoch: 12 Global Step: 124550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:07,607-Speed 5962.29 samples/sec Loss 6.4004 LearningRate 0.0788 Epoch: 12 Global Step: 124560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:14,461-Speed 5977.42 samples/sec Loss 6.4548 LearningRate 0.0787 Epoch: 12 Global Step: 124570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:21,315-Speed 5977.80 samples/sec Loss 6.3865 LearningRate 0.0787 Epoch: 12 Global Step: 124580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:28,177-Speed 5970.44 samples/sec Loss 6.4155 LearningRate 0.0787 Epoch: 12 Global Step: 124590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:35,024-Speed 5982.66 samples/sec Loss 6.3741 LearningRate 0.0787 Epoch: 12 Global Step: 124600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:41,887-Speed 5970.02 samples/sec Loss 6.3847 LearningRate 0.0787 Epoch: 12 Global Step: 124610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:48,769-Speed 5953.25 samples/sec Loss 6.3834 LearningRate 0.0786 Epoch: 12 Global Step: 124620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:41:55,651-Speed 5952.91 samples/sec Loss 6.3818 LearningRate 0.0786 Epoch: 12 Global Step: 124630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:42:02,514-Speed 5969.14 samples/sec Loss 6.4047 LearningRate 0.0786 Epoch: 12 Global Step: 124640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:42:09,377-Speed 5969.45 samples/sec Loss 6.4369 
LearningRate 0.0786 Epoch: 12 Global Step: 124650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:42:16,230-Speed 5978.04 samples/sec Loss 6.3991 LearningRate 0.0786 Epoch: 12 Global Step: 124660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:42:23,073-Speed 5986.69 samples/sec Loss 6.3777 LearningRate 0.0786 Epoch: 12 Global Step: 124670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:42:29,959-Speed 5950.27 samples/sec Loss 6.4503 LearningRate 0.0785 Epoch: 12 Global Step: 124680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:42:38,713-Speed 4679.34 samples/sec Loss 6.4043 LearningRate 0.0785 Epoch: 12 Global Step: 124690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:42:45,552-Speed 5990.00 samples/sec Loss 6.4087 LearningRate 0.0785 Epoch: 12 Global Step: 124700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:42:52,413-Speed 5971.39 samples/sec Loss 6.3385 LearningRate 0.0785 Epoch: 12 Global Step: 124710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:42:59,277-Speed 5968.81 samples/sec Loss 6.4401 LearningRate 0.0785 Epoch: 12 Global Step: 124720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:43:06,129-Speed 5979.30 samples/sec Loss 6.3022 LearningRate 0.0784 Epoch: 12 Global Step: 124730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:43:12,981-Speed 5977.87 samples/sec Loss 6.3650 LearningRate 0.0784 Epoch: 12 Global Step: 124740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:43:19,872-Speed 5945.86 samples/sec Loss 6.4242 LearningRate 0.0784 Epoch: 12 Global Step: 124750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:43:26,728-Speed 5975.75 samples/sec Loss 6.3688 LearningRate 0.0784 Epoch: 12 Global Step: 124760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:43:33,582-Speed 5976.71 samples/sec Loss 6.3707 LearningRate 0.0784 Epoch: 12 
Global Step: 124770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:43:40,438-Speed 5975.79 samples/sec Loss 6.3790 LearningRate 0.0783 Epoch: 12 Global Step: 124780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:43:47,299-Speed 5970.99 samples/sec Loss 6.3657 LearningRate 0.0783 Epoch: 12 Global Step: 124790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:43:54,156-Speed 5974.27 samples/sec Loss 6.4527 LearningRate 0.0783 Epoch: 12 Global Step: 124800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:44:01,024-Speed 5965.30 samples/sec Loss 6.4638 LearningRate 0.0783 Epoch: 12 Global Step: 124810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:44:07,882-Speed 5974.17 samples/sec Loss 6.4034 LearningRate 0.0783 Epoch: 12 Global Step: 124820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:44:14,740-Speed 5973.98 samples/sec Loss 6.4389 LearningRate 0.0782 Epoch: 12 Global Step: 124830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:44:21,596-Speed 5976.02 samples/sec Loss 6.4383 LearningRate 0.0782 Epoch: 12 Global Step: 124840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:44:28,450-Speed 5977.74 samples/sec Loss 6.4068 LearningRate 0.0782 Epoch: 12 Global Step: 124850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:44:35,307-Speed 5973.67 samples/sec Loss 6.3550 LearningRate 0.0782 Epoch: 12 Global Step: 124860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:44:42,153-Speed 5985.08 samples/sec Loss 6.3942 LearningRate 0.0782 Epoch: 12 Global Step: 124870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:44:49,023-Speed 5968.16 samples/sec Loss 6.3792 LearningRate 0.0782 Epoch: 12 Global Step: 124880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:44:55,888-Speed 5967.85 samples/sec Loss 6.4010 LearningRate 0.0781 Epoch: 12 Global Step: 124890 Fp16 Grad 
Scale: 65536 Required: 16 hours Training: 2022-01-08 20:45:02,738-Speed 5981.35 samples/sec Loss 6.3912 LearningRate 0.0781 Epoch: 12 Global Step: 124900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:45:09,599-Speed 5972.68 samples/sec Loss 6.4212 LearningRate 0.0781 Epoch: 12 Global Step: 124910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:45:16,457-Speed 5973.59 samples/sec Loss 6.3101 LearningRate 0.0781 Epoch: 12 Global Step: 124920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:45:23,305-Speed 5982.61 samples/sec Loss 6.4492 LearningRate 0.0781 Epoch: 12 Global Step: 124930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:45:30,153-Speed 5982.92 samples/sec Loss 6.4590 LearningRate 0.0780 Epoch: 12 Global Step: 124940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:45:37,007-Speed 5976.13 samples/sec Loss 6.3979 LearningRate 0.0780 Epoch: 12 Global Step: 124950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:45:43,852-Speed 5985.27 samples/sec Loss 6.3832 LearningRate 0.0780 Epoch: 12 Global Step: 124960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:45:50,702-Speed 5982.30 samples/sec Loss 6.3757 LearningRate 0.0780 Epoch: 12 Global Step: 124970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:45:57,551-Speed 5981.30 samples/sec Loss 6.3777 LearningRate 0.0780 Epoch: 12 Global Step: 124980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:46:04,432-Speed 5953.92 samples/sec Loss 6.4095 LearningRate 0.0779 Epoch: 12 Global Step: 124990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:46:11,285-Speed 5980.93 samples/sec Loss 6.3581 LearningRate 0.0779 Epoch: 12 Global Step: 125000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:46:38,272-[lfw][125000]XNorm: 23.898443 Training: 2022-01-08 20:46:38,272-[lfw][125000]Accuracy-Flip: 0.99750+-0.00291 Training: 
2022-01-08 20:46:38,273-[lfw][125000]Accuracy-Highest: 0.99783 Training: 2022-01-08 20:47:09,337-[cfp_fp][125000]XNorm: 21.006631 Training: 2022-01-08 20:47:09,338-[cfp_fp][125000]Accuracy-Flip: 0.98571+-0.00599 Training: 2022-01-08 20:47:09,339-[cfp_fp][125000]Accuracy-Highest: 0.98571 Training: 2022-01-08 20:47:36,052-[agedb_30][125000]XNorm: 23.742593 Training: 2022-01-08 20:47:36,053-[agedb_30][125000]Accuracy-Flip: 0.97200+-0.00726 Training: 2022-01-08 20:47:36,053-[agedb_30][125000]Accuracy-Highest: 0.97383 Training: 2022-01-08 20:47:42,915-Speed 447.02 samples/sec Loss 6.3715 LearningRate 0.0779 Epoch: 12 Global Step: 125010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:47:49,766-Speed 5979.55 samples/sec Loss 6.2990 LearningRate 0.0779 Epoch: 12 Global Step: 125020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:47:56,630-Speed 5968.94 samples/sec Loss 6.4098 LearningRate 0.0779 Epoch: 12 Global Step: 125030 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-08 20:48:03,484-Speed 5978.20 samples/sec Loss 6.4046 LearningRate 0.0779 Epoch: 12 Global Step: 125040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-08 20:48:10,356-Speed 5961.25 samples/sec Loss 6.4077 LearningRate 0.0778 Epoch: 12 Global Step: 125050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:48:17,220-Speed 5968.90 samples/sec Loss 6.3456 LearningRate 0.0778 Epoch: 12 Global Step: 125060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:48:24,086-Speed 5966.79 samples/sec Loss 6.3168 LearningRate 0.0778 Epoch: 12 Global Step: 125070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:48:30,960-Speed 5959.17 samples/sec Loss 6.3826 LearningRate 0.0778 Epoch: 12 Global Step: 125080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:48:37,871-Speed 5931.33 samples/sec Loss 6.3590 LearningRate 0.0778 Epoch: 12 Global Step: 125090 Fp16 Grad Scale: 131072 Required: 16 
hours Training: 2022-01-08 20:48:44,782-Speed 5928.31 samples/sec Loss 6.4674 LearningRate 0.0777 Epoch: 12 Global Step: 125100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:48:51,671-Speed 5946.72 samples/sec Loss 6.3187 LearningRate 0.0777 Epoch: 12 Global Step: 125110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:48:58,540-Speed 5964.48 samples/sec Loss 6.3556 LearningRate 0.0777 Epoch: 12 Global Step: 125120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:49:05,435-Speed 5942.04 samples/sec Loss 6.3010 LearningRate 0.0777 Epoch: 12 Global Step: 125130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:49:12,304-Speed 5963.70 samples/sec Loss 6.3598 LearningRate 0.0777 Epoch: 12 Global Step: 125140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:49:19,177-Speed 5961.32 samples/sec Loss 6.3267 LearningRate 0.0776 Epoch: 12 Global Step: 125150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:49:26,036-Speed 5972.95 samples/sec Loss 6.3867 LearningRate 0.0776 Epoch: 12 Global Step: 125160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:49:32,891-Speed 5976.16 samples/sec Loss 6.3754 LearningRate 0.0776 Epoch: 12 Global Step: 125170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:49:39,749-Speed 5973.92 samples/sec Loss 6.4115 LearningRate 0.0776 Epoch: 12 Global Step: 125180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:49:46,618-Speed 5964.06 samples/sec Loss 6.3549 LearningRate 0.0776 Epoch: 12 Global Step: 125190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:49:53,476-Speed 5973.70 samples/sec Loss 6.3190 LearningRate 0.0775 Epoch: 12 Global Step: 125200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:50:00,347-Speed 5962.52 samples/sec Loss 6.4004 LearningRate 0.0775 Epoch: 12 Global Step: 125210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 
20:50:07,222-Speed 5959.62 samples/sec Loss 6.3466 LearningRate 0.0775 Epoch: 12 Global Step: 125220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:50:14,073-Speed 5979.69 samples/sec Loss 6.4148 LearningRate 0.0775 Epoch: 12 Global Step: 125230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:50:20,938-Speed 5967.75 samples/sec Loss 6.3940 LearningRate 0.0775 Epoch: 12 Global Step: 125240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:50:27,802-Speed 5971.02 samples/sec Loss 6.3507 LearningRate 0.0775 Epoch: 12 Global Step: 125250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:50:34,674-Speed 5961.98 samples/sec Loss 6.3789 LearningRate 0.0774 Epoch: 12 Global Step: 125260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:50:41,526-Speed 5979.04 samples/sec Loss 6.3667 LearningRate 0.0774 Epoch: 12 Global Step: 125270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:50:48,375-Speed 5981.47 samples/sec Loss 6.3881 LearningRate 0.0774 Epoch: 12 Global Step: 125280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:50:55,232-Speed 5973.48 samples/sec Loss 6.4077 LearningRate 0.0774 Epoch: 12 Global Step: 125290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:51:02,082-Speed 5980.74 samples/sec Loss 6.4146 LearningRate 0.0774 Epoch: 12 Global Step: 125300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:51:08,959-Speed 5959.85 samples/sec Loss 6.3408 LearningRate 0.0773 Epoch: 12 Global Step: 125310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:51:15,817-Speed 5973.15 samples/sec Loss 6.3598 LearningRate 0.0773 Epoch: 12 Global Step: 125320 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-08 20:51:22,673-Speed 5975.92 samples/sec Loss 6.3324 LearningRate 0.0773 Epoch: 12 Global Step: 125330 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-08 20:51:29,511-Speed 
5990.89 samples/sec Loss 6.4050 LearningRate 0.0773 Epoch: 12 Global Step: 125340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:51:36,392-Speed 5954.14 samples/sec Loss 6.3606 LearningRate 0.0773 Epoch: 12 Global Step: 125350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:51:43,251-Speed 5973.02 samples/sec Loss 6.3580 LearningRate 0.0772 Epoch: 12 Global Step: 125360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:51:50,099-Speed 5982.50 samples/sec Loss 6.4052 LearningRate 0.0772 Epoch: 12 Global Step: 125370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:51:56,968-Speed 5964.20 samples/sec Loss 6.3061 LearningRate 0.0772 Epoch: 12 Global Step: 125380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:52:03,854-Speed 5951.41 samples/sec Loss 6.3899 LearningRate 0.0772 Epoch: 12 Global Step: 125390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:52:10,720-Speed 5966.93 samples/sec Loss 6.2946 LearningRate 0.0772 Epoch: 12 Global Step: 125400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:52:17,597-Speed 5957.21 samples/sec Loss 6.2998 LearningRate 0.0772 Epoch: 12 Global Step: 125410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:52:24,468-Speed 5962.80 samples/sec Loss 6.3462 LearningRate 0.0771 Epoch: 12 Global Step: 125420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:52:31,316-Speed 5983.55 samples/sec Loss 6.3715 LearningRate 0.0771 Epoch: 12 Global Step: 125430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:52:38,179-Speed 5969.36 samples/sec Loss 6.3305 LearningRate 0.0771 Epoch: 12 Global Step: 125440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:52:45,078-Speed 5938.58 samples/sec Loss 6.3486 LearningRate 0.0771 Epoch: 12 Global Step: 125450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:52:51,959-Speed 5955.05 samples/sec 
Loss 6.3597 LearningRate 0.0771 Epoch: 12 Global Step: 125460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:52:58,822-Speed 5969.05 samples/sec Loss 6.3289 LearningRate 0.0770 Epoch: 12 Global Step: 125470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:53:05,687-Speed 5967.95 samples/sec Loss 6.2809 LearningRate 0.0770 Epoch: 12 Global Step: 125480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:53:12,559-Speed 5962.24 samples/sec Loss 6.3250 LearningRate 0.0770 Epoch: 12 Global Step: 125490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:53:19,418-Speed 5972.61 samples/sec Loss 6.2930 LearningRate 0.0770 Epoch: 12 Global Step: 125500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:53:26,299-Speed 5964.32 samples/sec Loss 6.3088 LearningRate 0.0770 Epoch: 12 Global Step: 125510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:53:33,152-Speed 5978.88 samples/sec Loss 6.3710 LearningRate 0.0769 Epoch: 12 Global Step: 125520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:53:39,995-Speed 5985.90 samples/sec Loss 6.3182 LearningRate 0.0769 Epoch: 12 Global Step: 125530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:53:46,841-Speed 5984.31 samples/sec Loss 6.2978 LearningRate 0.0769 Epoch: 12 Global Step: 125540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:53:53,703-Speed 5971.91 samples/sec Loss 6.3179 LearningRate 0.0769 Epoch: 12 Global Step: 125550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:54:00,562-Speed 5972.67 samples/sec Loss 6.3507 LearningRate 0.0769 Epoch: 12 Global Step: 125560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:54:07,444-Speed 5952.02 samples/sec Loss 6.3080 LearningRate 0.0769 Epoch: 12 Global Step: 125570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:54:14,328-Speed 5951.39 samples/sec Loss 6.3150 LearningRate 
0.0768 Epoch: 12 Global Step: 125580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:54:21,185-Speed 5974.76 samples/sec Loss 6.3970 LearningRate 0.0768 Epoch: 12 Global Step: 125590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:54:28,068-Speed 5952.01 samples/sec Loss 6.3861 LearningRate 0.0768 Epoch: 12 Global Step: 125600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:54:34,955-Speed 5949.14 samples/sec Loss 6.3291 LearningRate 0.0768 Epoch: 12 Global Step: 125610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:54:41,828-Speed 5959.93 samples/sec Loss 6.3071 LearningRate 0.0768 Epoch: 12 Global Step: 125620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:54:48,683-Speed 5976.74 samples/sec Loss 6.2868 LearningRate 0.0767 Epoch: 12 Global Step: 125630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:54:55,556-Speed 5961.57 samples/sec Loss 6.2972 LearningRate 0.0767 Epoch: 12 Global Step: 125640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:55:02,435-Speed 5955.47 samples/sec Loss 6.3491 LearningRate 0.0767 Epoch: 12 Global Step: 125650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:55:09,285-Speed 5980.56 samples/sec Loss 6.3203 LearningRate 0.0767 Epoch: 12 Global Step: 125660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:55:16,140-Speed 5976.87 samples/sec Loss 6.3398 LearningRate 0.0767 Epoch: 12 Global Step: 125670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:55:22,996-Speed 5974.67 samples/sec Loss 6.3419 LearningRate 0.0766 Epoch: 12 Global Step: 125680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:55:29,851-Speed 5977.08 samples/sec Loss 6.3342 LearningRate 0.0766 Epoch: 12 Global Step: 125690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:55:36,696-Speed 5985.06 samples/sec Loss 6.4079 LearningRate 0.0766 Epoch: 12 Global 
Step: 125700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:55:43,552-Speed 5975.57 samples/sec Loss 6.3352 LearningRate 0.0766 Epoch: 12 Global Step: 125710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:55:50,404-Speed 5978.79 samples/sec Loss 6.4026 LearningRate 0.0766 Epoch: 12 Global Step: 125720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:55:57,292-Speed 5947.79 samples/sec Loss 6.3488 LearningRate 0.0766 Epoch: 12 Global Step: 125730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:56:04,150-Speed 5974.27 samples/sec Loss 6.3197 LearningRate 0.0765 Epoch: 12 Global Step: 125740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:56:11,021-Speed 5962.71 samples/sec Loss 6.3265 LearningRate 0.0765 Epoch: 12 Global Step: 125750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:56:17,877-Speed 5976.27 samples/sec Loss 6.3503 LearningRate 0.0765 Epoch: 12 Global Step: 125760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:56:24,725-Speed 5981.60 samples/sec Loss 6.3608 LearningRate 0.0765 Epoch: 12 Global Step: 125770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:56:31,589-Speed 5968.49 samples/sec Loss 6.3438 LearningRate 0.0765 Epoch: 12 Global Step: 125780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:56:38,441-Speed 5979.12 samples/sec Loss 6.3287 LearningRate 0.0764 Epoch: 12 Global Step: 125790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:56:45,298-Speed 5974.30 samples/sec Loss 6.3495 LearningRate 0.0764 Epoch: 12 Global Step: 125800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:56:52,149-Speed 5979.88 samples/sec Loss 6.3478 LearningRate 0.0764 Epoch: 12 Global Step: 125810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:56:59,017-Speed 5966.04 samples/sec Loss 6.3276 LearningRate 0.0764 Epoch: 12 Global Step: 125820 Fp16 Grad 
Scale: 65536 Required: 16 hours Training: 2022-01-08 20:57:05,874-Speed 5974.03 samples/sec Loss 6.2987 LearningRate 0.0764 Epoch: 12 Global Step: 125830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:57:12,721-Speed 5983.93 samples/sec Loss 6.2942 LearningRate 0.0763 Epoch: 12 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:57:19,585-Speed 5968.06 samples/sec Loss 6.2978 LearningRate 0.0763 Epoch: 12 Global Step: 125850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:57:26,427-Speed 5988.04 samples/sec Loss 6.3707 LearningRate 0.0763 Epoch: 12 Global Step: 125860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:57:33,308-Speed 5953.45 samples/sec Loss 6.3226 LearningRate 0.0763 Epoch: 12 Global Step: 125870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:57:40,164-Speed 5976.23 samples/sec Loss 6.3328 LearningRate 0.0763 Epoch: 12 Global Step: 125880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:57:47,031-Speed 5965.33 samples/sec Loss 6.3418 LearningRate 0.0763 Epoch: 12 Global Step: 125890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:57:53,903-Speed 5961.65 samples/sec Loss 6.2971 LearningRate 0.0762 Epoch: 12 Global Step: 125900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 20:58:00,749-Speed 5984.72 samples/sec Loss 6.3406 LearningRate 0.0762 Epoch: 12 Global Step: 125910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:58:07,632-Speed 5952.45 samples/sec Loss 6.3347 LearningRate 0.0762 Epoch: 12 Global Step: 125920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:58:14,488-Speed 5975.30 samples/sec Loss 6.3454 LearningRate 0.0762 Epoch: 12 Global Step: 125930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:58:21,341-Speed 5980.49 samples/sec Loss 6.3682 LearningRate 0.0762 Epoch: 12 Global Step: 125940 Fp16 Grad Scale: 131072 Required: 16 
hours Training: 2022-01-08 20:58:28,217-Speed 5958.00 samples/sec Loss 6.3142 LearningRate 0.0761 Epoch: 12 Global Step: 125950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:58:35,069-Speed 5978.76 samples/sec Loss 6.3039 LearningRate 0.0761 Epoch: 12 Global Step: 125960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:58:41,943-Speed 5960.02 samples/sec Loss 6.2982 LearningRate 0.0761 Epoch: 12 Global Step: 125970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:58:48,802-Speed 5972.51 samples/sec Loss 6.3525 LearningRate 0.0761 Epoch: 12 Global Step: 125980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:58:55,664-Speed 5973.64 samples/sec Loss 6.3517 LearningRate 0.0761 Epoch: 12 Global Step: 125990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:59:02,530-Speed 5966.46 samples/sec Loss 6.3095 LearningRate 0.0760 Epoch: 12 Global Step: 126000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:59:09,370-Speed 5989.09 samples/sec Loss 6.3344 LearningRate 0.0760 Epoch: 12 Global Step: 126010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:59:16,227-Speed 5974.16 samples/sec Loss 6.2632 LearningRate 0.0760 Epoch: 12 Global Step: 126020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:59:23,091-Speed 5968.89 samples/sec Loss 6.2690 LearningRate 0.0760 Epoch: 12 Global Step: 126030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:59:29,943-Speed 5978.09 samples/sec Loss 6.2819 LearningRate 0.0760 Epoch: 12 Global Step: 126040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:59:36,843-Speed 5937.10 samples/sec Loss 6.2682 LearningRate 0.0760 Epoch: 12 Global Step: 126050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:59:43,704-Speed 5971.84 samples/sec Loss 6.2774 LearningRate 0.0759 Epoch: 12 Global Step: 126060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 
2022-01-08 20:59:50,561-Speed 5974.60 samples/sec Loss 6.3646 LearningRate 0.0759 Epoch: 12 Global Step: 126070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 20:59:57,435-Speed 5960.08 samples/sec Loss 6.2188 LearningRate 0.0759 Epoch: 12 Global Step: 126080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:00:04,287-Speed 5979.17 samples/sec Loss 6.3060 LearningRate 0.0759 Epoch: 12 Global Step: 126090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:00:11,132-Speed 5984.83 samples/sec Loss 6.2949 LearningRate 0.0759 Epoch: 12 Global Step: 126100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:00:18,011-Speed 5955.50 samples/sec Loss 6.2778 LearningRate 0.0758 Epoch: 12 Global Step: 126110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:00:24,869-Speed 5974.07 samples/sec Loss 6.3322 LearningRate 0.0758 Epoch: 12 Global Step: 126120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:00:31,770-Speed 5936.22 samples/sec Loss 6.3280 LearningRate 0.0758 Epoch: 12 Global Step: 126130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:00:38,645-Speed 5958.69 samples/sec Loss 6.2849 LearningRate 0.0758 Epoch: 12 Global Step: 126140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:00:45,502-Speed 5977.56 samples/sec Loss 6.2790 LearningRate 0.0758 Epoch: 12 Global Step: 126150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:00:52,385-Speed 5951.10 samples/sec Loss 6.3156 LearningRate 0.0757 Epoch: 12 Global Step: 126160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:00:59,279-Speed 5944.65 samples/sec Loss 6.2671 LearningRate 0.0757 Epoch: 12 Global Step: 126170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:01:06,133-Speed 5976.97 samples/sec Loss 6.3234 LearningRate 0.0757 Epoch: 12 Global Step: 126180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 
21:01:13,013-Speed 5955.04 samples/sec Loss 6.2926 LearningRate 0.0757 Epoch: 12 Global Step: 126190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:01:19,888-Speed 5959.42 samples/sec Loss 6.2841 LearningRate 0.0757 Epoch: 12 Global Step: 126200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:01:26,753-Speed 5967.74 samples/sec Loss 6.2905 LearningRate 0.0757 Epoch: 12 Global Step: 126210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:01:33,614-Speed 5971.00 samples/sec Loss 6.3457 LearningRate 0.0756 Epoch: 12 Global Step: 126220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:01:40,493-Speed 5962.34 samples/sec Loss 6.2757 LearningRate 0.0756 Epoch: 12 Global Step: 126230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:01:47,359-Speed 5978.96 samples/sec Loss 6.3454 LearningRate 0.0756 Epoch: 12 Global Step: 126240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:01:54,210-Speed 5979.48 samples/sec Loss 6.2721 LearningRate 0.0756 Epoch: 12 Global Step: 126250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:02:01,082-Speed 5962.34 samples/sec Loss 6.2968 LearningRate 0.0756 Epoch: 12 Global Step: 126260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:02:07,965-Speed 5951.73 samples/sec Loss 6.2487 LearningRate 0.0755 Epoch: 12 Global Step: 126270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:02:14,813-Speed 5982.38 samples/sec Loss 6.3264 LearningRate 0.0755 Epoch: 12 Global Step: 126280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:02:21,666-Speed 5978.36 samples/sec Loss 6.2887 LearningRate 0.0755 Epoch: 12 Global Step: 126290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:02:28,547-Speed 5954.24 samples/sec Loss 6.3562 LearningRate 0.0755 Epoch: 12 Global Step: 126300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:02:35,425-Speed 
5955.74 samples/sec Loss 6.2618 LearningRate 0.0755 Epoch: 12 Global Step: 126310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:02:42,288-Speed 5975.58 samples/sec Loss 6.2652 LearningRate 0.0754 Epoch: 12 Global Step: 126320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:02:49,160-Speed 5961.37 samples/sec Loss 6.2933 LearningRate 0.0754 Epoch: 12 Global Step: 126330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:02:56,027-Speed 5965.67 samples/sec Loss 6.3229 LearningRate 0.0754 Epoch: 12 Global Step: 126340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:03:02,889-Speed 5970.50 samples/sec Loss 6.2816 LearningRate 0.0754 Epoch: 12 Global Step: 126350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:03:09,770-Speed 5954.39 samples/sec Loss 6.2736 LearningRate 0.0754 Epoch: 12 Global Step: 126360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:03:16,637-Speed 5965.27 samples/sec Loss 6.2720 LearningRate 0.0754 Epoch: 12 Global Step: 126370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:03:23,494-Speed 5974.67 samples/sec Loss 6.2444 LearningRate 0.0753 Epoch: 12 Global Step: 126380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:03:30,346-Speed 5978.96 samples/sec Loss 6.3133 LearningRate 0.0753 Epoch: 12 Global Step: 126390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:03:37,220-Speed 5959.95 samples/sec Loss 6.2465 LearningRate 0.0753 Epoch: 12 Global Step: 126400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:03:44,190-Speed 5877.45 samples/sec Loss 6.3400 LearningRate 0.0753 Epoch: 12 Global Step: 126410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:03:51,049-Speed 5975.35 samples/sec Loss 6.3107 LearningRate 0.0753 Epoch: 12 Global Step: 126420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:03:57,926-Speed 5956.56 samples/sec Loss 
6.3129 LearningRate 0.0752 Epoch: 12 Global Step: 126430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:04,794-Speed 5966.98 samples/sec Loss 6.3249 LearningRate 0.0752 Epoch: 12 Global Step: 126440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:11,654-Speed 5972.56 samples/sec Loss 6.2712 LearningRate 0.0752 Epoch: 12 Global Step: 126450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:18,538-Speed 5950.96 samples/sec Loss 6.2654 LearningRate 0.0752 Epoch: 12 Global Step: 126460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:25,418-Speed 5955.06 samples/sec Loss 6.2672 LearningRate 0.0752 Epoch: 12 Global Step: 126470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:32,289-Speed 5962.49 samples/sec Loss 6.2812 LearningRate 0.0752 Epoch: 12 Global Step: 126480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:39,257-Speed 5879.82 samples/sec Loss 6.2878 LearningRate 0.0751 Epoch: 12 Global Step: 126490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:46,163-Speed 5932.00 samples/sec Loss 6.3011 LearningRate 0.0751 Epoch: 12 Global Step: 126500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:53,026-Speed 5970.71 samples/sec Loss 6.3216 LearningRate 0.0751 Epoch: 12 Global Step: 126510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:04:59,882-Speed 5975.30 samples/sec Loss 6.2561 LearningRate 0.0751 Epoch: 12 Global Step: 126520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:05:06,776-Speed 5943.16 samples/sec Loss 6.2724 LearningRate 0.0751 Epoch: 12 Global Step: 126530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:05:13,629-Speed 5978.35 samples/sec Loss 6.3001 LearningRate 0.0750 Epoch: 12 Global Step: 126540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:05:20,510-Speed 5953.57 samples/sec Loss 6.2795 LearningRate 
0.0750 Epoch: 12 Global Step: 126550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:05:27,393-Speed 5952.03 samples/sec Loss 6.2293 LearningRate 0.0750 Epoch: 12 Global Step: 126560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:05:34,256-Speed 5969.34 samples/sec Loss 6.2386 LearningRate 0.0750 Epoch: 12 Global Step: 126570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:05:41,107-Speed 5979.45 samples/sec Loss 6.2178 LearningRate 0.0750 Epoch: 12 Global Step: 126580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:05:47,963-Speed 5975.70 samples/sec Loss 6.2163 LearningRate 0.0749 Epoch: 12 Global Step: 126590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:05:54,835-Speed 5961.84 samples/sec Loss 6.2989 LearningRate 0.0749 Epoch: 12 Global Step: 126600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:06:01,682-Speed 5983.45 samples/sec Loss 6.2684 LearningRate 0.0749 Epoch: 12 Global Step: 126610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:06:08,542-Speed 5972.04 samples/sec Loss 6.2996 LearningRate 0.0749 Epoch: 12 Global Step: 126620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:06:15,391-Speed 5981.11 samples/sec Loss 6.2422 LearningRate 0.0749 Epoch: 12 Global Step: 126630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:06:22,252-Speed 5970.87 samples/sec Loss 6.3232 LearningRate 0.0749 Epoch: 12 Global Step: 126640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:06:29,105-Speed 5979.21 samples/sec Loss 6.3243 LearningRate 0.0748 Epoch: 12 Global Step: 126650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:06:35,968-Speed 5969.14 samples/sec Loss 6.2942 LearningRate 0.0748 Epoch: 12 Global Step: 126660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:06:42,821-Speed 5977.64 samples/sec Loss 6.2234 LearningRate 0.0748 Epoch: 12 Global 
Step: 126670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:06:49,692-Speed 5965.28 samples/sec Loss 6.2511 LearningRate 0.0748 Epoch: 12 Global Step: 126680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:06:56,560-Speed 5964.52 samples/sec Loss 6.2959 LearningRate 0.0748 Epoch: 12 Global Step: 126690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:07:03,404-Speed 5986.05 samples/sec Loss 6.2871 LearningRate 0.0747 Epoch: 12 Global Step: 126700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:07:10,258-Speed 5976.65 samples/sec Loss 6.2751 LearningRate 0.0747 Epoch: 12 Global Step: 126710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:07:17,128-Speed 5964.16 samples/sec Loss 6.2172 LearningRate 0.0747 Epoch: 12 Global Step: 126720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:07:23,978-Speed 5979.95 samples/sec Loss 6.2927 LearningRate 0.0747 Epoch: 12 Global Step: 126730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:07:30,828-Speed 5980.51 samples/sec Loss 6.2229 LearningRate 0.0747 Epoch: 12 Global Step: 126740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:07:37,719-Speed 5948.25 samples/sec Loss 6.2507 LearningRate 0.0747 Epoch: 12 Global Step: 126750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:07:44,609-Speed 5945.65 samples/sec Loss 6.2910 LearningRate 0.0746 Epoch: 12 Global Step: 126760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:07:51,520-Speed 5927.64 samples/sec Loss 6.1889 LearningRate 0.0746 Epoch: 12 Global Step: 126770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:07:58,446-Speed 5916.15 samples/sec Loss 6.2528 LearningRate 0.0746 Epoch: 12 Global Step: 126780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:08:05,319-Speed 5959.92 samples/sec Loss 6.2313 LearningRate 0.0746 Epoch: 12 Global Step: 126790 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:08:12,208-Speed 5947.51 samples/sec Loss 6.2741 LearningRate 0.0746 Epoch: 12 Global Step: 126800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:08:19,061-Speed 5978.54 samples/sec Loss 6.2851 LearningRate 0.0745 Epoch: 12 Global Step: 126810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:08:25,907-Speed 5983.35 samples/sec Loss 6.2475 LearningRate 0.0745 Epoch: 12 Global Step: 126820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:08:32,755-Speed 5983.00 samples/sec Loss 6.3185 LearningRate 0.0745 Epoch: 12 Global Step: 126830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:08:39,612-Speed 5974.27 samples/sec Loss 6.2612 LearningRate 0.0745 Epoch: 12 Global Step: 126840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:08:46,476-Speed 5968.42 samples/sec Loss 6.1941 LearningRate 0.0745 Epoch: 12 Global Step: 126850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:08:53,336-Speed 5972.51 samples/sec Loss 6.2103 LearningRate 0.0744 Epoch: 12 Global Step: 126860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:09:00,179-Speed 5987.11 samples/sec Loss 6.2663 LearningRate 0.0744 Epoch: 12 Global Step: 126870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:09:07,061-Speed 5952.46 samples/sec Loss 6.2482 LearningRate 0.0744 Epoch: 12 Global Step: 126880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:09:13,924-Speed 5969.05 samples/sec Loss 6.2677 LearningRate 0.0744 Epoch: 12 Global Step: 126890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:09:20,778-Speed 5977.07 samples/sec Loss 6.2041 LearningRate 0.0744 Epoch: 12 Global Step: 126900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:09:27,652-Speed 5959.61 samples/sec Loss 6.2785 LearningRate 0.0744 Epoch: 12 Global Step: 126910 Fp16 Grad Scale: 131072 
Required: 16 hours Training: 2022-01-08 21:09:34,528-Speed 5958.47 samples/sec Loss 6.2330 LearningRate 0.0743 Epoch: 12 Global Step: 126920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:09:41,385-Speed 5975.09 samples/sec Loss 6.2076 LearningRate 0.0743 Epoch: 12 Global Step: 126930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:09:48,251-Speed 5966.76 samples/sec Loss 6.2441 LearningRate 0.0743 Epoch: 12 Global Step: 126940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:09:55,108-Speed 5974.65 samples/sec Loss 6.1986 LearningRate 0.0743 Epoch: 12 Global Step: 126950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:10:01,974-Speed 5967.02 samples/sec Loss 6.2326 LearningRate 0.0743 Epoch: 12 Global Step: 126960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:10:08,835-Speed 5970.68 samples/sec Loss 6.2195 LearningRate 0.0742 Epoch: 12 Global Step: 126970 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-08 21:10:15,684-Speed 5982.05 samples/sec Loss 6.2442 LearningRate 0.0742 Epoch: 12 Global Step: 126980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:10:22,546-Speed 5970.99 samples/sec Loss 6.1881 LearningRate 0.0742 Epoch: 12 Global Step: 126990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:10:29,403-Speed 5974.05 samples/sec Loss 6.2995 LearningRate 0.0742 Epoch: 12 Global Step: 127000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:10:36,244-Speed 5988.54 samples/sec Loss 6.2999 LearningRate 0.0742 Epoch: 12 Global Step: 127010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:10:43,095-Speed 5979.77 samples/sec Loss 6.2570 LearningRate 0.0742 Epoch: 12 Global Step: 127020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:10:49,948-Speed 5977.62 samples/sec Loss 6.2109 LearningRate 0.0741 Epoch: 12 Global Step: 127030 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-01-08 21:10:56,840-Speed 5944.26 samples/sec Loss 6.1847 LearningRate 0.0741 Epoch: 12 Global Step: 127040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:11:03,684-Speed 5986.05 samples/sec Loss 6.2111 LearningRate 0.0741 Epoch: 12 Global Step: 127050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:11:10,536-Speed 5979.03 samples/sec Loss 6.2594 LearningRate 0.0741 Epoch: 12 Global Step: 127060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:11:17,384-Speed 5981.92 samples/sec Loss 6.2855 LearningRate 0.0741 Epoch: 12 Global Step: 127070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:11:24,227-Speed 5986.97 samples/sec Loss 6.1712 LearningRate 0.0740 Epoch: 12 Global Step: 127080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:11:31,087-Speed 5971.73 samples/sec Loss 6.2888 LearningRate 0.0740 Epoch: 12 Global Step: 127090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:11:37,937-Speed 5981.56 samples/sec Loss 6.2335 LearningRate 0.0740 Epoch: 12 Global Step: 127100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:11:44,790-Speed 5977.98 samples/sec Loss 6.2844 LearningRate 0.0740 Epoch: 12 Global Step: 127110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:11:51,655-Speed 5967.80 samples/sec Loss 6.1742 LearningRate 0.0740 Epoch: 12 Global Step: 127120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:11:58,521-Speed 5966.28 samples/sec Loss 6.2440 LearningRate 0.0739 Epoch: 12 Global Step: 127130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:12:05,382-Speed 5971.71 samples/sec Loss 6.2002 LearningRate 0.0739 Epoch: 12 Global Step: 127140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:12:12,232-Speed 5980.91 samples/sec Loss 6.2053 LearningRate 0.0739 Epoch: 12 Global Step: 127150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 
21:12:19,071-Speed 5990.18 samples/sec Loss 6.2066 LearningRate 0.0739 Epoch: 12 Global Step: 127160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:12:25,934-Speed 5971.94 samples/sec Loss 6.2875 LearningRate 0.0739 Epoch: 12 Global Step: 127170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:12:32,793-Speed 5972.69 samples/sec Loss 6.2184 LearningRate 0.0739 Epoch: 12 Global Step: 127180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:12:39,661-Speed 5965.19 samples/sec Loss 6.2902 LearningRate 0.0738 Epoch: 12 Global Step: 127190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:12:46,522-Speed 5971.60 samples/sec Loss 6.2045 LearningRate 0.0738 Epoch: 12 Global Step: 127200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:12:53,390-Speed 5964.31 samples/sec Loss 6.2204 LearningRate 0.0738 Epoch: 12 Global Step: 127210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:13:00,242-Speed 5979.88 samples/sec Loss 6.2607 LearningRate 0.0738 Epoch: 12 Global Step: 127220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:13:07,110-Speed 5964.95 samples/sec Loss 6.2361 LearningRate 0.0738 Epoch: 12 Global Step: 127230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:13:13,968-Speed 5973.59 samples/sec Loss 6.2333 LearningRate 0.0737 Epoch: 12 Global Step: 127240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:13:20,837-Speed 5964.01 samples/sec Loss 6.2443 LearningRate 0.0737 Epoch: 12 Global Step: 127250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:13:27,691-Speed 5977.74 samples/sec Loss 6.1974 LearningRate 0.0737 Epoch: 12 Global Step: 127260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:13:34,547-Speed 5975.58 samples/sec Loss 6.2149 LearningRate 0.0737 Epoch: 12 Global Step: 127270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:13:41,397-Speed 5980.96 
samples/sec Loss 6.2763 LearningRate 0.0737 Epoch: 12 Global Step: 127280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:13:48,292-Speed 5942.16 samples/sec Loss 6.2529 LearningRate 0.0737 Epoch: 12 Global Step: 127290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:13:55,134-Speed 5987.32 samples/sec Loss 6.2201 LearningRate 0.0736 Epoch: 12 Global Step: 127300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:02,010-Speed 5958.10 samples/sec Loss 6.2723 LearningRate 0.0736 Epoch: 12 Global Step: 127310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:08,856-Speed 5984.85 samples/sec Loss 6.2350 LearningRate 0.0736 Epoch: 12 Global Step: 127320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:15,730-Speed 5960.90 samples/sec Loss 6.1874 LearningRate 0.0736 Epoch: 12 Global Step: 127330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:22,592-Speed 5970.58 samples/sec Loss 6.2511 LearningRate 0.0736 Epoch: 12 Global Step: 127340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:29,455-Speed 5969.80 samples/sec Loss 6.2464 LearningRate 0.0735 Epoch: 12 Global Step: 127350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:36,307-Speed 5978.68 samples/sec Loss 6.2444 LearningRate 0.0735 Epoch: 12 Global Step: 127360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:43,168-Speed 5971.94 samples/sec Loss 6.1849 LearningRate 0.0735 Epoch: 12 Global Step: 127370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:50,012-Speed 5985.77 samples/sec Loss 6.2143 LearningRate 0.0735 Epoch: 12 Global Step: 127380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:14:56,865-Speed 5977.68 samples/sec Loss 6.2143 LearningRate 0.0735 Epoch: 12 Global Step: 127390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:15:03,799-Speed 5908.87 samples/sec Loss 6.1747 
LearningRate 0.0735 Epoch: 12 Global Step: 127400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:15:10,770-Speed 5877.80 samples/sec Loss 6.1948 LearningRate 0.0734 Epoch: 12 Global Step: 127410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:15:17,613-Speed 5986.63 samples/sec Loss 6.2267 LearningRate 0.0734 Epoch: 12 Global Step: 127420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:15:24,481-Speed 5965.04 samples/sec Loss 6.2314 LearningRate 0.0734 Epoch: 12 Global Step: 127430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:15:31,343-Speed 5971.03 samples/sec Loss 6.2671 LearningRate 0.0734 Epoch: 12 Global Step: 127440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:15:38,197-Speed 5976.28 samples/sec Loss 6.2402 LearningRate 0.0734 Epoch: 12 Global Step: 127450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:15:45,080-Speed 5952.75 samples/sec Loss 6.2339 LearningRate 0.0733 Epoch: 12 Global Step: 127460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:15:51,938-Speed 5973.63 samples/sec Loss 6.2126 LearningRate 0.0733 Epoch: 12 Global Step: 127470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:15:58,819-Speed 5953.62 samples/sec Loss 6.2009 LearningRate 0.0733 Epoch: 12 Global Step: 127480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:16:05,693-Speed 5960.41 samples/sec Loss 6.2100 LearningRate 0.0733 Epoch: 12 Global Step: 127490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:16:12,561-Speed 5964.95 samples/sec Loss 6.2358 LearningRate 0.0733 Epoch: 12 Global Step: 127500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:16:19,440-Speed 5955.39 samples/sec Loss 6.2020 LearningRate 0.0733 Epoch: 12 Global Step: 127510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:16:26,301-Speed 5971.69 samples/sec Loss 6.2361 LearningRate 0.0732 Epoch: 12 
Global Step: 127520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:16:33,155-Speed 5976.84 samples/sec Loss 6.1609 LearningRate 0.0732 Epoch: 12 Global Step: 127530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:16:40,030-Speed 5959.79 samples/sec Loss 6.1690 LearningRate 0.0732 Epoch: 12 Global Step: 127540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:16:46,910-Speed 5955.33 samples/sec Loss 6.2295 LearningRate 0.0732 Epoch: 12 Global Step: 127550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:16:53,761-Speed 5980.09 samples/sec Loss 6.2134 LearningRate 0.0732 Epoch: 12 Global Step: 127560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:00,624-Speed 5969.06 samples/sec Loss 6.2294 LearningRate 0.0731 Epoch: 12 Global Step: 127570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:07,478-Speed 5978.88 samples/sec Loss 6.2313 LearningRate 0.0731 Epoch: 12 Global Step: 127580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:14,332-Speed 5979.72 samples/sec Loss 6.2331 LearningRate 0.0731 Epoch: 12 Global Step: 127590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:21,208-Speed 5957.99 samples/sec Loss 6.2398 LearningRate 0.0731 Epoch: 12 Global Step: 127600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:28,048-Speed 5989.51 samples/sec Loss 6.2365 LearningRate 0.0731 Epoch: 12 Global Step: 127610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:34,913-Speed 5967.87 samples/sec Loss 6.1664 LearningRate 0.0730 Epoch: 12 Global Step: 127620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:41,763-Speed 5980.72 samples/sec Loss 6.0810 LearningRate 0.0730 Epoch: 12 Global Step: 127630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:48,617-Speed 5976.81 samples/sec Loss 6.1304 LearningRate 0.0730 Epoch: 12 Global Step: 127640 Fp16 Grad 
Scale: 65536 Required: 16 hours Training: 2022-01-08 21:17:55,470-Speed 5978.77 samples/sec Loss 6.1951 LearningRate 0.0730 Epoch: 12 Global Step: 127650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:18:02,315-Speed 5984.57 samples/sec Loss 6.1866 LearningRate 0.0730 Epoch: 12 Global Step: 127660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:18:09,175-Speed 5972.04 samples/sec Loss 6.2254 LearningRate 0.0730 Epoch: 12 Global Step: 127670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:18:16,038-Speed 5969.32 samples/sec Loss 6.2079 LearningRate 0.0729 Epoch: 12 Global Step: 127680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:18:22,911-Speed 5961.04 samples/sec Loss 6.1989 LearningRate 0.0729 Epoch: 12 Global Step: 127690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:18:29,771-Speed 5971.91 samples/sec Loss 6.1509 LearningRate 0.0729 Epoch: 12 Global Step: 127700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:18:36,626-Speed 5976.38 samples/sec Loss 6.2140 LearningRate 0.0729 Epoch: 12 Global Step: 127710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:18:43,484-Speed 5973.70 samples/sec Loss 6.1769 LearningRate 0.0729 Epoch: 12 Global Step: 127720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:18:50,334-Speed 5981.01 samples/sec Loss 6.1661 LearningRate 0.0728 Epoch: 12 Global Step: 127730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:18:57,188-Speed 5980.19 samples/sec Loss 6.2316 LearningRate 0.0728 Epoch: 12 Global Step: 127740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:19:04,036-Speed 5982.10 samples/sec Loss 6.2815 LearningRate 0.0728 Epoch: 12 Global Step: 127750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:19:10,893-Speed 5974.72 samples/sec Loss 6.2449 LearningRate 0.0728 Epoch: 12 Global Step: 127760 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-01-08 21:19:17,743-Speed 5981.15 samples/sec Loss 6.2058 LearningRate 0.0728 Epoch: 12 Global Step: 127770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:19:24,592-Speed 5981.20 samples/sec Loss 6.2133 LearningRate 0.0728 Epoch: 12 Global Step: 127780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:19:31,511-Speed 5921.03 samples/sec Loss 6.2122 LearningRate 0.0727 Epoch: 12 Global Step: 127790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:19:38,456-Speed 5899.61 samples/sec Loss 6.2138 LearningRate 0.0727 Epoch: 12 Global Step: 127800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:19:45,407-Speed 5893.55 samples/sec Loss 6.2263 LearningRate 0.0727 Epoch: 12 Global Step: 127810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:19:52,269-Speed 5970.12 samples/sec Loss 6.1653 LearningRate 0.0727 Epoch: 12 Global Step: 127820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:19:59,129-Speed 5972.19 samples/sec Loss 6.0943 LearningRate 0.0727 Epoch: 12 Global Step: 127830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:20:05,979-Speed 5980.18 samples/sec Loss 6.1939 LearningRate 0.0726 Epoch: 12 Global Step: 127840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:20:12,831-Speed 5978.40 samples/sec Loss 6.1233 LearningRate 0.0726 Epoch: 12 Global Step: 127850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:20:19,689-Speed 5974.25 samples/sec Loss 6.1886 LearningRate 0.0726 Epoch: 12 Global Step: 127860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:20:26,543-Speed 5977.33 samples/sec Loss 6.2032 LearningRate 0.0726 Epoch: 12 Global Step: 127870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:20:33,417-Speed 5960.01 samples/sec Loss 6.2321 LearningRate 0.0726 Epoch: 12 Global Step: 127880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 
21:20:40,270-Speed 5977.90 samples/sec Loss 6.1747 LearningRate 0.0726 Epoch: 12 Global Step: 127890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:20:47,124-Speed 5976.94 samples/sec Loss 6.1105 LearningRate 0.0725 Epoch: 12 Global Step: 127900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:20:53,974-Speed 5981.72 samples/sec Loss 6.1578 LearningRate 0.0725 Epoch: 12 Global Step: 127910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:21:00,832-Speed 5974.42 samples/sec Loss 6.1920 LearningRate 0.0725 Epoch: 12 Global Step: 127920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:21:07,705-Speed 5960.19 samples/sec Loss 6.2443 LearningRate 0.0725 Epoch: 12 Global Step: 127930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:21:14,619-Speed 5925.61 samples/sec Loss 6.1250 LearningRate 0.0725 Epoch: 12 Global Step: 127940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:21:21,495-Speed 5958.05 samples/sec Loss 6.1502 LearningRate 0.0724 Epoch: 12 Global Step: 127950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:21:28,354-Speed 5972.70 samples/sec Loss 6.1761 LearningRate 0.0724 Epoch: 12 Global Step: 127960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:21:35,199-Speed 5984.77 samples/sec Loss 6.1955 LearningRate 0.0724 Epoch: 12 Global Step: 127970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:21:42,041-Speed 5987.58 samples/sec Loss 6.2055 LearningRate 0.0724 Epoch: 12 Global Step: 127980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:21:49,001-Speed 5886.80 samples/sec Loss 6.1978 LearningRate 0.0724 Epoch: 12 Global Step: 127990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:21:55,971-Speed 5878.31 samples/sec Loss 6.1921 LearningRate 0.0724 Epoch: 12 Global Step: 128000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:02,839-Speed 5965.23 
samples/sec Loss 6.1564 LearningRate 0.0723 Epoch: 12 Global Step: 128010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:09,756-Speed 5922.14 samples/sec Loss 6.1816 LearningRate 0.0723 Epoch: 12 Global Step: 128020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:16,607-Speed 5980.56 samples/sec Loss 6.1864 LearningRate 0.0723 Epoch: 12 Global Step: 128030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:23,461-Speed 5979.99 samples/sec Loss 6.1559 LearningRate 0.0723 Epoch: 12 Global Step: 128040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:30,298-Speed 5990.94 samples/sec Loss 6.2031 LearningRate 0.0723 Epoch: 12 Global Step: 128050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:37,163-Speed 5967.80 samples/sec Loss 6.1853 LearningRate 0.0722 Epoch: 12 Global Step: 128060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:44,009-Speed 5984.73 samples/sec Loss 6.2182 LearningRate 0.0722 Epoch: 12 Global Step: 128070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:50,848-Speed 5989.11 samples/sec Loss 6.1522 LearningRate 0.0722 Epoch: 12 Global Step: 128080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:22:57,690-Speed 5988.25 samples/sec Loss 6.1190 LearningRate 0.0722 Epoch: 12 Global Step: 128090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:23:04,541-Speed 5980.75 samples/sec Loss 6.1946 LearningRate 0.0722 Epoch: 12 Global Step: 128100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-08 21:23:11,361-Speed 6006.46 samples/sec Loss 6.1591 LearningRate 0.0722 Epoch: 12 Global Step: 128110 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:23:18,246-Speed 5950.30 samples/sec Loss 6.1885 LearningRate 0.0721 Epoch: 12 Global Step: 128120 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:23:25,105-Speed 5973.35 samples/sec Loss 6.1508 
LearningRate 0.0721 Epoch: 12 Global Step: 128130 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:23:32,015-Speed 5928.45 samples/sec Loss 6.1654 LearningRate 0.0721 Epoch: 12 Global Step: 128140 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:23:38,897-Speed 5953.13 samples/sec Loss 6.2195 LearningRate 0.0721 Epoch: 12 Global Step: 128150 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:23:45,750-Speed 5978.21 samples/sec Loss 6.1445 LearningRate 0.0721 Epoch: 12 Global Step: 128160 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:23:52,614-Speed 5968.66 samples/sec Loss 6.1484 LearningRate 0.0720 Epoch: 12 Global Step: 128170 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:23:59,488-Speed 5960.98 samples/sec Loss 6.1445 LearningRate 0.0720 Epoch: 12 Global Step: 128180 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:24:06,337-Speed 5981.91 samples/sec Loss 6.2264 LearningRate 0.0720 Epoch: 12 Global Step: 128190 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:24:13,180-Speed 5986.37 samples/sec Loss 6.2152 LearningRate 0.0720 Epoch: 12 Global Step: 128200 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-08 21:24:20,046-Speed 5966.63 samples/sec Loss 6.1531 LearningRate 0.0720 Epoch: 12 Global Step: 128210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:24:26,918-Speed 5962.64 samples/sec Loss 6.0693 LearningRate 0.0720 Epoch: 12 Global Step: 128220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:24:33,780-Speed 5969.72 samples/sec Loss 6.1058 LearningRate 0.0719 Epoch: 12 Global Step: 128230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:24:40,659-Speed 5955.56 samples/sec Loss 6.1518 LearningRate 0.0719 Epoch: 12 Global Step: 128240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:24:47,539-Speed 5955.03 samples/sec Loss 6.1565 LearningRate 0.0719 Epoch: 12 
Global Step: 128250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:24:54,395-Speed 5975.63 samples/sec Loss 6.1306 LearningRate 0.0719 Epoch: 12 Global Step: 128260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:25:01,255-Speed 5971.77 samples/sec Loss 6.2154 LearningRate 0.0719 Epoch: 12 Global Step: 128270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:25:08,135-Speed 5954.70 samples/sec Loss 6.1377 LearningRate 0.0718 Epoch: 12 Global Step: 128280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:25:14,980-Speed 5985.49 samples/sec Loss 6.1008 LearningRate 0.0718 Epoch: 12 Global Step: 128290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:25:21,838-Speed 5973.04 samples/sec Loss 6.1212 LearningRate 0.0718 Epoch: 12 Global Step: 128300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-08 21:25:28,717-Speed 5956.15 samples/sec Loss 6.1790 LearningRate 0.0718 Epoch: 12 Global Step: 128310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:25:35,584-Speed 5965.95 samples/sec Loss 6.1175 LearningRate 0.0718 Epoch: 12 Global Step: 128320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-08 21:25:42,431-Speed 5983.26 samples/sec Loss 6.1569 LearningRate 0.0718 Epoch: 12 Global Step: 128330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:25:49,282-Speed 5981.58 samples/sec Loss 6.1236 LearningRate 0.0717 Epoch: 12 Global Step: 128340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:25:56,137-Speed 5976.30 samples/sec Loss 6.1327 LearningRate 0.0717 Epoch: 12 Global Step: 128350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:26:03,007-Speed 5963.84 samples/sec Loss 6.1838 LearningRate 0.0717 Epoch: 12 Global Step: 128360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:26:09,851-Speed 5986.11 samples/sec Loss 6.1393 LearningRate 0.0717 Epoch: 12 Global Step: 128370 Fp16 Grad 
Scale: 65536 Required: 15 hours Training: 2022-01-08 21:26:16,704-Speed 5978.24 samples/sec Loss 6.0978 LearningRate 0.0717 Epoch: 12 Global Step: 128380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:26:23,558-Speed 5979.61 samples/sec Loss 6.1526 LearningRate 0.0716 Epoch: 12 Global Step: 128390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:26:30,402-Speed 5986.60 samples/sec Loss 6.1471 LearningRate 0.0716 Epoch: 12 Global Step: 128400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:26:37,257-Speed 5975.84 samples/sec Loss 6.1190 LearningRate 0.0716 Epoch: 12 Global Step: 128410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:26:44,115-Speed 5973.76 samples/sec Loss 6.1107 LearningRate 0.0716 Epoch: 12 Global Step: 128420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:26:50,973-Speed 5974.23 samples/sec Loss 6.1267 LearningRate 0.0716 Epoch: 12 Global Step: 128430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:26:57,826-Speed 5977.75 samples/sec Loss 6.1244 LearningRate 0.0716 Epoch: 12 Global Step: 128440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:27:04,704-Speed 5956.94 samples/sec Loss 6.1323 LearningRate 0.0715 Epoch: 12 Global Step: 128450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:27:11,558-Speed 5977.36 samples/sec Loss 6.1288 LearningRate 0.0715 Epoch: 12 Global Step: 128460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:27:18,444-Speed 5949.51 samples/sec Loss 6.1384 LearningRate 0.0715 Epoch: 12 Global Step: 128470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:27:25,317-Speed 5961.12 samples/sec Loss 6.1853 LearningRate 0.0715 Epoch: 12 Global Step: 128480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:27:32,173-Speed 5976.25 samples/sec Loss 6.1334 LearningRate 0.0715 Epoch: 12 Global Step: 128490 Fp16 Grad Scale: 131072 Required: 
15 hours Training: 2022-01-08 21:27:39,032-Speed 5972.81 samples/sec Loss 6.1682 LearningRate 0.0714 Epoch: 12 Global Step: 128500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:27:45,915-Speed 5956.01 samples/sec Loss 6.1901 LearningRate 0.0714 Epoch: 12 Global Step: 128510 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-08 21:27:52,776-Speed 5973.78 samples/sec Loss 6.1794 LearningRate 0.0714 Epoch: 12 Global Step: 128520 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-08 21:27:59,613-Speed 5991.40 samples/sec Loss 6.1977 LearningRate 0.0714 Epoch: 12 Global Step: 128530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:28:06,479-Speed 5967.50 samples/sec Loss 6.1656 LearningRate 0.0714 Epoch: 12 Global Step: 128540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:28:13,326-Speed 5983.47 samples/sec Loss 6.1273 LearningRate 0.0714 Epoch: 12 Global Step: 128550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:28:20,182-Speed 5975.08 samples/sec Loss 6.1267 LearningRate 0.0713 Epoch: 12 Global Step: 128560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:28:27,040-Speed 5974.10 samples/sec Loss 6.1453 LearningRate 0.0713 Epoch: 12 Global Step: 128570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:28:33,890-Speed 5981.16 samples/sec Loss 6.0911 LearningRate 0.0713 Epoch: 12 Global Step: 128580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:28:40,742-Speed 5978.32 samples/sec Loss 6.0999 LearningRate 0.0713 Epoch: 12 Global Step: 128590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:28:47,596-Speed 5976.59 samples/sec Loss 6.1066 LearningRate 0.0713 Epoch: 12 Global Step: 128600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:28:54,454-Speed 5974.14 samples/sec Loss 6.1091 LearningRate 0.0712 Epoch: 12 Global Step: 128610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-01-08 21:29:01,325-Speed 5962.13 samples/sec Loss 6.0676 LearningRate 0.0712 Epoch: 12 Global Step: 128620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:29:08,181-Speed 5975.32 samples/sec Loss 6.0672 LearningRate 0.0712 Epoch: 12 Global Step: 128630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:29:15,031-Speed 5981.10 samples/sec Loss 6.0863 LearningRate 0.0712 Epoch: 12 Global Step: 128640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:29:21,893-Speed 5970.46 samples/sec Loss 6.1060 LearningRate 0.0712 Epoch: 12 Global Step: 128650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:29:28,738-Speed 5985.50 samples/sec Loss 6.1066 LearningRate 0.0712 Epoch: 12 Global Step: 128660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:29:35,587-Speed 5982.18 samples/sec Loss 6.1524 LearningRate 0.0711 Epoch: 12 Global Step: 128670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:29:42,453-Speed 5966.92 samples/sec Loss 6.1504 LearningRate 0.0711 Epoch: 12 Global Step: 128680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:29:49,326-Speed 5960.86 samples/sec Loss 6.1249 LearningRate 0.0711 Epoch: 12 Global Step: 128690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:29:56,202-Speed 5957.71 samples/sec Loss 6.0930 LearningRate 0.0711 Epoch: 12 Global Step: 128700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:30:03,068-Speed 5966.69 samples/sec Loss 6.0993 LearningRate 0.0711 Epoch: 12 Global Step: 128710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:30:09,975-Speed 5931.94 samples/sec Loss 6.1224 LearningRate 0.0710 Epoch: 12 Global Step: 128720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:30:16,834-Speed 5972.53 samples/sec Loss 6.1528 LearningRate 0.0710 Epoch: 12 Global Step: 128730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 
21:30:23,732-Speed 5939.20 samples/sec Loss 6.1507 LearningRate 0.0710 Epoch: 12 Global Step: 128740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:30:30,646-Speed 5925.25 samples/sec Loss 6.0928 LearningRate 0.0710 Epoch: 12 Global Step: 128750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:30:37,526-Speed 5954.80 samples/sec Loss 6.1292 LearningRate 0.0710 Epoch: 12 Global Step: 128760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:30:44,373-Speed 5982.89 samples/sec Loss 6.1059 LearningRate 0.0710 Epoch: 12 Global Step: 128770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:30:51,246-Speed 5960.37 samples/sec Loss 6.0829 LearningRate 0.0709 Epoch: 12 Global Step: 128780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:30:58,115-Speed 5964.00 samples/sec Loss 6.1028 LearningRate 0.0709 Epoch: 12 Global Step: 128790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:31:04,980-Speed 5968.55 samples/sec Loss 6.0778 LearningRate 0.0709 Epoch: 12 Global Step: 128800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:31:11,834-Speed 5976.88 samples/sec Loss 6.1595 LearningRate 0.0709 Epoch: 12 Global Step: 128810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:31:18,706-Speed 5962.23 samples/sec Loss 6.0727 LearningRate 0.0709 Epoch: 12 Global Step: 128820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:31:25,579-Speed 5960.35 samples/sec Loss 6.1084 LearningRate 0.0708 Epoch: 12 Global Step: 128830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:31:32,447-Speed 5965.00 samples/sec Loss 6.1137 LearningRate 0.0708 Epoch: 12 Global Step: 128840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:31:39,309-Speed 5970.75 samples/sec Loss 6.1303 LearningRate 0.0708 Epoch: 12 Global Step: 128850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:31:46,189-Speed 5954.55 
samples/sec Loss 6.1071 LearningRate 0.0708 Epoch: 12 Global Step: 128860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:31:53,044-Speed 5976.20 samples/sec Loss 6.0564 LearningRate 0.0708 Epoch: 12 Global Step: 128870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:00,054-Speed 5846.83 samples/sec Loss 6.0675 LearningRate 0.0708 Epoch: 12 Global Step: 128880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:06,917-Speed 5969.71 samples/sec Loss 6.0715 LearningRate 0.0707 Epoch: 12 Global Step: 128890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:13,785-Speed 5965.51 samples/sec Loss 6.1317 LearningRate 0.0707 Epoch: 12 Global Step: 128900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:20,632-Speed 5982.78 samples/sec Loss 6.1572 LearningRate 0.0707 Epoch: 12 Global Step: 128910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:27,483-Speed 5980.29 samples/sec Loss 6.1099 LearningRate 0.0707 Epoch: 12 Global Step: 128920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:34,335-Speed 5978.69 samples/sec Loss 6.1339 LearningRate 0.0707 Epoch: 12 Global Step: 128930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:41,195-Speed 5971.96 samples/sec Loss 6.1077 LearningRate 0.0707 Epoch: 12 Global Step: 128940 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:48,068-Speed 5961.05 samples/sec Loss 6.0854 LearningRate 0.0706 Epoch: 12 Global Step: 128950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:32:54,937-Speed 5963.90 samples/sec Loss 6.1110 LearningRate 0.0706 Epoch: 12 Global Step: 128960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:33:01,797-Speed 5972.67 samples/sec Loss 6.0512 LearningRate 0.0706 Epoch: 12 Global Step: 128970 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-08 21:33:08,638-Speed 5987.89 samples/sec Loss 
6.1124 LearningRate 0.0706 Epoch: 12 Global Step: 128980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:33:15,505-Speed 5966.14 samples/sec Loss 6.1048 LearningRate 0.0706 Epoch: 12 Global Step: 128990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:33:22,366-Speed 5970.52 samples/sec Loss 6.0978 LearningRate 0.0705 Epoch: 12 Global Step: 129000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:33:29,248-Speed 5953.23 samples/sec Loss 6.1100 LearningRate 0.0705 Epoch: 12 Global Step: 129010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:33:36,126-Speed 5955.92 samples/sec Loss 6.1383 LearningRate 0.0705 Epoch: 12 Global Step: 129020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:33:42,993-Speed 5965.47 samples/sec Loss 6.0854 LearningRate 0.0705 Epoch: 12 Global Step: 129030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:33:49,857-Speed 5968.71 samples/sec Loss 6.0907 LearningRate 0.0705 Epoch: 12 Global Step: 129040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:33:56,727-Speed 5966.36 samples/sec Loss 6.0982 LearningRate 0.0705 Epoch: 12 Global Step: 129050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:03,593-Speed 5966.60 samples/sec Loss 6.0606 LearningRate 0.0704 Epoch: 12 Global Step: 129060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:10,446-Speed 5978.57 samples/sec Loss 6.0712 LearningRate 0.0704 Epoch: 12 Global Step: 129070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:17,282-Speed 5992.59 samples/sec Loss 6.0482 LearningRate 0.0704 Epoch: 12 Global Step: 129080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:24,139-Speed 5976.81 samples/sec Loss 6.0879 LearningRate 0.0704 Epoch: 12 Global Step: 129090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:30,995-Speed 5975.45 samples/sec Loss 6.1098 LearningRate 
0.0704 Epoch: 12 Global Step: 129100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:37,854-Speed 5973.00 samples/sec Loss 6.1398 LearningRate 0.0703 Epoch: 12 Global Step: 129110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:44,711-Speed 5975.20 samples/sec Loss 6.0929 LearningRate 0.0703 Epoch: 12 Global Step: 129120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:51,574-Speed 5969.51 samples/sec Loss 6.1194 LearningRate 0.0703 Epoch: 12 Global Step: 129130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:34:58,491-Speed 5922.61 samples/sec Loss 6.1249 LearningRate 0.0703 Epoch: 12 Global Step: 129140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:35:05,357-Speed 5966.66 samples/sec Loss 6.0830 LearningRate 0.0703 Epoch: 12 Global Step: 129150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:35:12,223-Speed 5966.70 samples/sec Loss 6.0698 LearningRate 0.0703 Epoch: 12 Global Step: 129160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:35:19,074-Speed 5979.65 samples/sec Loss 6.0858 LearningRate 0.0702 Epoch: 12 Global Step: 129170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:35:25,919-Speed 5984.88 samples/sec Loss 6.0708 LearningRate 0.0702 Epoch: 12 Global Step: 129180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:35:32,777-Speed 5973.32 samples/sec Loss 6.1063 LearningRate 0.0702 Epoch: 12 Global Step: 129190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:35:39,631-Speed 5977.62 samples/sec Loss 6.0956 LearningRate 0.0702 Epoch: 12 Global Step: 129200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:35:46,504-Speed 5960.90 samples/sec Loss 6.1393 LearningRate 0.0702 Epoch: 12 Global Step: 129210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:35:53,383-Speed 5955.49 samples/sec Loss 6.1194 LearningRate 0.0701 Epoch: 12 Global 
Step: 129220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:36:00,232-Speed 5981.68 samples/sec Loss 6.0270 LearningRate 0.0701 Epoch: 12 Global Step: 129230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:36:07,100-Speed 5964.74 samples/sec Loss 6.1048 LearningRate 0.0701 Epoch: 12 Global Step: 129240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:36:13,955-Speed 5976.97 samples/sec Loss 6.1399 LearningRate 0.0701 Epoch: 12 Global Step: 129250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:36:20,803-Speed 5981.61 samples/sec Loss 6.0960 LearningRate 0.0701 Epoch: 12 Global Step: 129260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:36:27,708-Speed 5936.73 samples/sec Loss 6.1137 LearningRate 0.0701 Epoch: 12 Global Step: 129270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:36:34,570-Speed 5969.55 samples/sec Loss 6.0962 LearningRate 0.0700 Epoch: 12 Global Step: 129280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:36:41,436-Speed 5967.05 samples/sec Loss 6.0370 LearningRate 0.0700 Epoch: 12 Global Step: 129290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:36:48,300-Speed 5968.48 samples/sec Loss 6.1261 LearningRate 0.0700 Epoch: 12 Global Step: 129300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:36:55,153-Speed 5978.06 samples/sec Loss 6.0938 LearningRate 0.0700 Epoch: 12 Global Step: 129310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:02,010-Speed 5974.32 samples/sec Loss 6.1048 LearningRate 0.0700 Epoch: 12 Global Step: 129320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:08,862-Speed 5979.59 samples/sec Loss 6.0601 LearningRate 0.0699 Epoch: 12 Global Step: 129330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:15,716-Speed 5976.64 samples/sec Loss 6.0820 LearningRate 0.0699 Epoch: 12 Global Step: 129340 Fp16 Grad 
Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:22,563-Speed 5983.31 samples/sec Loss 6.1288 LearningRate 0.0699 Epoch: 12 Global Step: 129350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:29,432-Speed 5964.65 samples/sec Loss 6.0947 LearningRate 0.0699 Epoch: 12 Global Step: 129360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:36,304-Speed 5961.63 samples/sec Loss 6.1195 LearningRate 0.0699 Epoch: 12 Global Step: 129370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:43,179-Speed 5959.26 samples/sec Loss 6.0971 LearningRate 0.0699 Epoch: 12 Global Step: 129380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:50,043-Speed 5968.80 samples/sec Loss 6.0689 LearningRate 0.0698 Epoch: 12 Global Step: 129390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:37:56,888-Speed 5984.76 samples/sec Loss 6.0332 LearningRate 0.0698 Epoch: 12 Global Step: 129400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:38:03,762-Speed 5960.02 samples/sec Loss 6.0503 LearningRate 0.0698 Epoch: 12 Global Step: 129410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:38:10,610-Speed 5982.89 samples/sec Loss 6.0405 LearningRate 0.0698 Epoch: 12 Global Step: 129420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:38:17,469-Speed 5972.38 samples/sec Loss 6.0503 LearningRate 0.0698 Epoch: 12 Global Step: 129430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:38:24,349-Speed 5954.93 samples/sec Loss 6.0729 LearningRate 0.0698 Epoch: 12 Global Step: 129440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:38:31,232-Speed 5952.38 samples/sec Loss 6.0163 LearningRate 0.0697 Epoch: 12 Global Step: 129450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:38:38,083-Speed 5979.35 samples/sec Loss 6.0810 LearningRate 0.0697 Epoch: 12 Global Step: 129460 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-01-08 21:38:44,941-Speed 5976.53 samples/sec Loss 6.0295 LearningRate 0.0697 Epoch: 12 Global Step: 129470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:38:51,809-Speed 5964.94 samples/sec Loss 6.0476 LearningRate 0.0697 Epoch: 12 Global Step: 129480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:38:58,674-Speed 5967.04 samples/sec Loss 6.1066 LearningRate 0.0697 Epoch: 12 Global Step: 129490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:39:05,553-Speed 5956.13 samples/sec Loss 6.0545 LearningRate 0.0696 Epoch: 12 Global Step: 129500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:39:12,412-Speed 5972.97 samples/sec Loss 6.1025 LearningRate 0.0696 Epoch: 12 Global Step: 129510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:39:19,273-Speed 5971.21 samples/sec Loss 6.0572 LearningRate 0.0696 Epoch: 12 Global Step: 129520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:39:26,135-Speed 5970.27 samples/sec Loss 6.0551 LearningRate 0.0696 Epoch: 12 Global Step: 129530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:39:33,024-Speed 5947.17 samples/sec Loss 6.0962 LearningRate 0.0696 Epoch: 12 Global Step: 129540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:39:39,890-Speed 5966.07 samples/sec Loss 6.0779 LearningRate 0.0696 Epoch: 12 Global Step: 129550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:39:46,757-Speed 5966.36 samples/sec Loss 6.0382 LearningRate 0.0695 Epoch: 12 Global Step: 129560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:39:53,613-Speed 5975.94 samples/sec Loss 6.0678 LearningRate 0.0695 Epoch: 12 Global Step: 129570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:40:00,473-Speed 5972.11 samples/sec Loss 6.0621 LearningRate 0.0695 Epoch: 12 Global Step: 129580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 
2022-01-08 21:40:07,332-Speed 5972.42 samples/sec Loss 6.0673 LearningRate 0.0695 Epoch: 12 Global Step: 129590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:40:14,280-Speed 5897.78 samples/sec Loss 6.0206 LearningRate 0.0695 Epoch: 12 Global Step: 129600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:40:21,138-Speed 5973.37 samples/sec Loss 6.0509 LearningRate 0.0694 Epoch: 12 Global Step: 129610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:40:28,064-Speed 5915.32 samples/sec Loss 6.0125 LearningRate 0.0694 Epoch: 12 Global Step: 129620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:40:34,942-Speed 5958.39 samples/sec Loss 6.0436 LearningRate 0.0694 Epoch: 12 Global Step: 129630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:40:41,837-Speed 5941.84 samples/sec Loss 6.0714 LearningRate 0.0694 Epoch: 12 Global Step: 129640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:40:48,792-Speed 5890.52 samples/sec Loss 6.0961 LearningRate 0.0694 Epoch: 12 Global Step: 129650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:40:55,741-Speed 5895.57 samples/sec Loss 6.0424 LearningRate 0.0694 Epoch: 12 Global Step: 129660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:41:02,687-Speed 5897.76 samples/sec Loss 6.0726 LearningRate 0.0693 Epoch: 12 Global Step: 129670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:41:09,590-Speed 5934.65 samples/sec Loss 6.1072 LearningRate 0.0693 Epoch: 12 Global Step: 129680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:41:16,462-Speed 5962.39 samples/sec Loss 6.0653 LearningRate 0.0693 Epoch: 12 Global Step: 129690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:41:23,329-Speed 5965.45 samples/sec Loss 6.0508 LearningRate 0.0693 Epoch: 12 Global Step: 129700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:41:30,204-Speed 
5959.27 samples/sec Loss 6.0293 LearningRate 0.0693 Epoch: 12 Global Step: 129710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:41:37,076-Speed 5962.51 samples/sec Loss 6.0545 LearningRate 0.0693 Epoch: 12 Global Step: 129720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:41:43,958-Speed 5952.50 samples/sec Loss 5.9972 LearningRate 0.0692 Epoch: 12 Global Step: 129730 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:41:50,820-Speed 5970.56 samples/sec Loss 6.0764 LearningRate 0.0692 Epoch: 12 Global Step: 129740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:41:57,690-Speed 5964.40 samples/sec Loss 6.0606 LearningRate 0.0692 Epoch: 12 Global Step: 129750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:42:04,530-Speed 5989.43 samples/sec Loss 6.0010 LearningRate 0.0692 Epoch: 12 Global Step: 129760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:42:11,496-Speed 5881.45 samples/sec Loss 6.0046 LearningRate 0.0692 Epoch: 12 Global Step: 129770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:42:18,404-Speed 5931.67 samples/sec Loss 6.0608 LearningRate 0.0691 Epoch: 12 Global Step: 129780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:42:25,249-Speed 5984.46 samples/sec Loss 6.0560 LearningRate 0.0691 Epoch: 12 Global Step: 129790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:42:32,115-Speed 5967.58 samples/sec Loss 6.0295 LearningRate 0.0691 Epoch: 12 Global Step: 129800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:42:38,981-Speed 5966.33 samples/sec Loss 5.9827 LearningRate 0.0691 Epoch: 12 Global Step: 129810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:42:45,839-Speed 5974.14 samples/sec Loss 6.0544 LearningRate 0.0691 Epoch: 12 Global Step: 129820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:42:52,689-Speed 5981.31 samples/sec Loss 
6.0420 LearningRate 0.0691 Epoch: 12 Global Step: 129830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:42:59,577-Speed 5947.39 samples/sec Loss 6.0539 LearningRate 0.0690 Epoch: 12 Global Step: 129840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:43:06,542-Speed 5882.08 samples/sec Loss 6.0673 LearningRate 0.0690 Epoch: 12 Global Step: 129850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:43:13,402-Speed 5972.78 samples/sec Loss 6.0489 LearningRate 0.0690 Epoch: 12 Global Step: 129860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:43:20,271-Speed 5963.59 samples/sec Loss 6.0150 LearningRate 0.0690 Epoch: 12 Global Step: 129870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:43:27,124-Speed 5980.36 samples/sec Loss 6.0652 LearningRate 0.0690 Epoch: 12 Global Step: 129880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:43:33,994-Speed 5964.49 samples/sec Loss 6.0646 LearningRate 0.0689 Epoch: 12 Global Step: 129890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:43:40,957-Speed 5888.17 samples/sec Loss 6.0563 LearningRate 0.0689 Epoch: 12 Global Step: 129900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:43:47,833-Speed 5957.73 samples/sec Loss 6.0397 LearningRate 0.0689 Epoch: 12 Global Step: 129910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:43:54,685-Speed 5979.37 samples/sec Loss 6.0926 LearningRate 0.0689 Epoch: 12 Global Step: 129920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:44:01,538-Speed 5978.34 samples/sec Loss 6.1100 LearningRate 0.0689 Epoch: 12 Global Step: 129930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:44:08,387-Speed 5981.94 samples/sec Loss 6.0410 LearningRate 0.0689 Epoch: 12 Global Step: 129940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:44:15,289-Speed 5935.17 samples/sec Loss 6.0336 LearningRate 
0.0688 Epoch: 12 Global Step: 129950 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-01-08 21:44:22,220-Speed 5911.29 samples/sec Loss 6.0813 LearningRate 0.0688 Epoch: 12 Global Step: 129960 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-01-08 21:44:29,164-Speed 5898.68 samples/sec Loss 6.0840 LearningRate 0.0688 Epoch: 12 Global Step: 129970 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-01-08 21:44:36,092-Speed 5913.94 samples/sec Loss 6.0218 LearningRate 0.0688 Epoch: 12 Global Step: 129980 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-01-08 21:44:43,035-Speed 5900.24 samples/sec Loss 5.9816 LearningRate 0.0688 Epoch: 12 Global Step: 129990 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-01-08 21:44:49,935-Speed 5937.46 samples/sec Loss 6.0354 LearningRate 0.0688 Epoch: 12 Global Step: 130000 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-01-08 21:45:16,864-[lfw][130000]XNorm: 23.952681
Training: 2022-01-08 21:45:16,865-[lfw][130000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-01-08 21:45:16,865-[lfw][130000]Accuracy-Highest: 0.99783
Training: 2022-01-08 21:45:48,165-[cfp_fp][130000]XNorm: 20.997199
Training: 2022-01-08 21:45:48,166-[cfp_fp][130000]Accuracy-Flip: 0.98586+-0.00627
Training: 2022-01-08 21:45:48,167-[cfp_fp][130000]Accuracy-Highest: 0.98586
Training: 2022-01-08 21:46:15,228-[agedb_30][130000]XNorm: 23.315981
Training: 2022-01-08 21:46:15,229-[agedb_30][130000]Accuracy-Flip: 0.97550+-0.00517
Training: 2022-01-08 21:46:15,229-[agedb_30][130000]Accuracy-Highest: 0.97550
Training: 2022-01-08 21:46:22,069-Speed 444.58 samples/sec Loss 6.0551 LearningRate 0.0687 Epoch: 12 Global Step: 130010 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-01-08 21:46:28,919-Speed 5981.68 samples/sec Loss 6.0376 LearningRate 0.0687 Epoch: 12 Global Step: 130020 Fp16 Grad Scale: 131072 Required: 15 hours
Training: 2022-01-08 21:46:35,771-Speed 5979.41 samples/sec Loss 6.0375 LearningRate 0.0687 Epoch:
12 Global Step: 130030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:46:42,721-Speed 5894.75 samples/sec Loss 6.0127 LearningRate 0.0687 Epoch: 12 Global Step: 130040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:46:49,591-Speed 5963.44 samples/sec Loss 6.0564 LearningRate 0.0687 Epoch: 12 Global Step: 130050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:46:56,460-Speed 5964.45 samples/sec Loss 6.0611 LearningRate 0.0686 Epoch: 12 Global Step: 130060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:47:03,352-Speed 5944.50 samples/sec Loss 5.9903 LearningRate 0.0686 Epoch: 12 Global Step: 130070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:47:10,204-Speed 5978.57 samples/sec Loss 6.0034 LearningRate 0.0686 Epoch: 12 Global Step: 130080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:47:17,060-Speed 5975.91 samples/sec Loss 6.0720 LearningRate 0.0686 Epoch: 12 Global Step: 130090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:47:23,914-Speed 5976.57 samples/sec Loss 6.0584 LearningRate 0.0686 Epoch: 12 Global Step: 130100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:47:30,765-Speed 5980.35 samples/sec Loss 6.0045 LearningRate 0.0686 Epoch: 12 Global Step: 130110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:47:37,619-Speed 5976.56 samples/sec Loss 6.0416 LearningRate 0.0685 Epoch: 12 Global Step: 130120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:47:44,476-Speed 5974.58 samples/sec Loss 6.0649 LearningRate 0.0685 Epoch: 12 Global Step: 130130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:47:51,358-Speed 5952.70 samples/sec Loss 6.0520 LearningRate 0.0685 Epoch: 12 Global Step: 130140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:47:58,199-Speed 5988.25 samples/sec Loss 6.0615 LearningRate 0.0685 Epoch: 12 Global Step: 130150 Fp16 
Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:48:05,057-Speed 5973.90 samples/sec Loss 5.9790 LearningRate 0.0685 Epoch: 12 Global Step: 130160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:48:11,920-Speed 5969.74 samples/sec Loss 5.9935 LearningRate 0.0685 Epoch: 12 Global Step: 130170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:48:18,756-Speed 5992.40 samples/sec Loss 6.0491 LearningRate 0.0684 Epoch: 12 Global Step: 130180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:48:25,611-Speed 5976.64 samples/sec Loss 6.0684 LearningRate 0.0684 Epoch: 12 Global Step: 130190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:48:32,454-Speed 5986.74 samples/sec Loss 5.9730 LearningRate 0.0684 Epoch: 12 Global Step: 130200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:48:39,297-Speed 5986.34 samples/sec Loss 5.9669 LearningRate 0.0684 Epoch: 12 Global Step: 130210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:48:46,167-Speed 5963.94 samples/sec Loss 6.0286 LearningRate 0.0684 Epoch: 12 Global Step: 130220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:48:53,017-Speed 5980.31 samples/sec Loss 5.9758 LearningRate 0.0683 Epoch: 12 Global Step: 130230 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-08 21:48:59,872-Speed 5976.43 samples/sec Loss 5.9311 LearningRate 0.0683 Epoch: 12 Global Step: 130240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:49:06,723-Speed 5979.98 samples/sec Loss 5.9620 LearningRate 0.0683 Epoch: 12 Global Step: 130250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:49:13,567-Speed 5985.96 samples/sec Loss 6.0585 LearningRate 0.0683 Epoch: 12 Global Step: 130260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:49:20,412-Speed 5985.15 samples/sec Loss 6.0145 LearningRate 0.0683 Epoch: 12 Global Step: 130270 Fp16 Grad Scale: 131072 
Required: 15 hours Training: 2022-01-08 21:49:27,271-Speed 5972.77 samples/sec Loss 5.9774 LearningRate 0.0683 Epoch: 12 Global Step: 130280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:49:34,140-Speed 5964.36 samples/sec Loss 6.0215 LearningRate 0.0682 Epoch: 12 Global Step: 130290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:49:40,986-Speed 5984.04 samples/sec Loss 6.0002 LearningRate 0.0682 Epoch: 12 Global Step: 130300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:49:47,839-Speed 5977.75 samples/sec Loss 6.0227 LearningRate 0.0682 Epoch: 12 Global Step: 130310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:49:54,682-Speed 5986.79 samples/sec Loss 6.0180 LearningRate 0.0682 Epoch: 12 Global Step: 130320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:50:01,550-Speed 5965.03 samples/sec Loss 6.0380 LearningRate 0.0682 Epoch: 12 Global Step: 130330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:50:08,405-Speed 5976.40 samples/sec Loss 5.9602 LearningRate 0.0682 Epoch: 12 Global Step: 130340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:50:15,267-Speed 5970.20 samples/sec Loss 6.0890 LearningRate 0.0681 Epoch: 12 Global Step: 130350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:50:22,131-Speed 5969.67 samples/sec Loss 6.0021 LearningRate 0.0681 Epoch: 12 Global Step: 130360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:50:28,969-Speed 5990.61 samples/sec Loss 5.9874 LearningRate 0.0681 Epoch: 12 Global Step: 130370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 21:50:35,815-Speed 5984.03 samples/sec Loss 6.0047 LearningRate 0.0681 Epoch: 12 Global Step: 130380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:50:42,660-Speed 5985.18 samples/sec Loss 5.9444 LearningRate 0.0681 Epoch: 12 Global Step: 130390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-01-08 21:50:49,507-Speed 5982.54 samples/sec Loss 6.0496 LearningRate 0.0680 Epoch: 12 Global Step: 130400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:50:56,384-Speed 5960.05 samples/sec Loss 6.0535 LearningRate 0.0680 Epoch: 12 Global Step: 130410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:51:03,244-Speed 5973.42 samples/sec Loss 6.0015 LearningRate 0.0680 Epoch: 12 Global Step: 130420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:51:10,111-Speed 5965.91 samples/sec Loss 6.0173 LearningRate 0.0680 Epoch: 12 Global Step: 130430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:51:16,961-Speed 5980.28 samples/sec Loss 6.0537 LearningRate 0.0680 Epoch: 12 Global Step: 130440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:51:23,805-Speed 5986.54 samples/sec Loss 6.0080 LearningRate 0.0680 Epoch: 12 Global Step: 130450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:51:30,672-Speed 5965.92 samples/sec Loss 6.0052 LearningRate 0.0679 Epoch: 12 Global Step: 130460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:51:37,525-Speed 5977.44 samples/sec Loss 5.9776 LearningRate 0.0679 Epoch: 12 Global Step: 130470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:51:44,382-Speed 5974.76 samples/sec Loss 6.0004 LearningRate 0.0679 Epoch: 12 Global Step: 130480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:51:51,238-Speed 5978.45 samples/sec Loss 5.9885 LearningRate 0.0679 Epoch: 12 Global Step: 130490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:51:58,099-Speed 5970.94 samples/sec Loss 6.0193 LearningRate 0.0679 Epoch: 12 Global Step: 130500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:52:04,978-Speed 5956.88 samples/sec Loss 5.9428 LearningRate 0.0679 Epoch: 12 Global Step: 130510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 
21:52:11,835-Speed 5978.07 samples/sec Loss 5.9932 LearningRate 0.0678 Epoch: 12 Global Step: 130520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:52:18,709-Speed 5959.32 samples/sec Loss 5.9973 LearningRate 0.0678 Epoch: 12 Global Step: 130530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:52:25,562-Speed 5978.04 samples/sec Loss 6.0326 LearningRate 0.0678 Epoch: 12 Global Step: 130540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:52:32,421-Speed 5973.68 samples/sec Loss 5.9182 LearningRate 0.0678 Epoch: 12 Global Step: 130550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:52:39,268-Speed 5983.14 samples/sec Loss 6.0024 LearningRate 0.0678 Epoch: 12 Global Step: 130560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:52:46,115-Speed 5982.52 samples/sec Loss 5.9854 LearningRate 0.0677 Epoch: 12 Global Step: 130570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:52:52,957-Speed 5988.01 samples/sec Loss 5.9547 LearningRate 0.0677 Epoch: 12 Global Step: 130580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:52:59,909-Speed 5893.13 samples/sec Loss 5.9757 LearningRate 0.0677 Epoch: 12 Global Step: 130590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:53:06,765-Speed 5976.93 samples/sec Loss 5.9555 LearningRate 0.0677 Epoch: 12 Global Step: 130600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:53:13,624-Speed 5973.81 samples/sec Loss 5.9909 LearningRate 0.0677 Epoch: 12 Global Step: 130610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:53:20,579-Speed 5889.92 samples/sec Loss 5.9998 LearningRate 0.0677 Epoch: 12 Global Step: 130620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:53:27,438-Speed 5972.70 samples/sec Loss 6.0560 LearningRate 0.0676 Epoch: 12 Global Step: 130630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:53:34,301-Speed 
5972.33 samples/sec Loss 5.9485 LearningRate 0.0676 Epoch: 12 Global Step: 130640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:53:41,149-Speed 5982.06 samples/sec Loss 6.0106 LearningRate 0.0676 Epoch: 12 Global Step: 130650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:53:47,990-Speed 5991.52 samples/sec Loss 5.9828 LearningRate 0.0676 Epoch: 12 Global Step: 130660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:53:54,837-Speed 5983.06 samples/sec Loss 5.9750 LearningRate 0.0676 Epoch: 12 Global Step: 130670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:54:01,676-Speed 5989.41 samples/sec Loss 5.9685 LearningRate 0.0676 Epoch: 12 Global Step: 130680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:54:08,522-Speed 5984.75 samples/sec Loss 5.9087 LearningRate 0.0675 Epoch: 12 Global Step: 130690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:54:15,423-Speed 5936.73 samples/sec Loss 6.0106 LearningRate 0.0675 Epoch: 12 Global Step: 130700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:54:22,326-Speed 5934.65 samples/sec Loss 5.9615 LearningRate 0.0675 Epoch: 12 Global Step: 130710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:54:29,224-Speed 5939.14 samples/sec Loss 5.9938 LearningRate 0.0675 Epoch: 12 Global Step: 130720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:54:36,119-Speed 5941.98 samples/sec Loss 6.0079 LearningRate 0.0675 Epoch: 12 Global Step: 130730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:54:42,965-Speed 5983.75 samples/sec Loss 6.0394 LearningRate 0.0674 Epoch: 12 Global Step: 130740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:54:49,823-Speed 5973.46 samples/sec Loss 6.0111 LearningRate 0.0674 Epoch: 12 Global Step: 130750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:54:56,708-Speed 5950.73 samples/sec Loss 
5.9673 LearningRate 0.0674 Epoch: 12 Global Step: 130760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:03,560-Speed 5979.14 samples/sec Loss 5.9937 LearningRate 0.0674 Epoch: 12 Global Step: 130770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:10,410-Speed 5980.62 samples/sec Loss 6.0163 LearningRate 0.0674 Epoch: 12 Global Step: 130780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:17,279-Speed 5964.89 samples/sec Loss 6.0111 LearningRate 0.0674 Epoch: 12 Global Step: 130790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:24,133-Speed 5977.01 samples/sec Loss 6.0240 LearningRate 0.0673 Epoch: 12 Global Step: 130800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:30,987-Speed 5977.53 samples/sec Loss 5.9972 LearningRate 0.0673 Epoch: 12 Global Step: 130810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:37,838-Speed 5980.01 samples/sec Loss 5.9850 LearningRate 0.0673 Epoch: 12 Global Step: 130820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:44,688-Speed 5980.42 samples/sec Loss 5.9443 LearningRate 0.0673 Epoch: 12 Global Step: 130830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:51,527-Speed 5990.19 samples/sec Loss 5.9771 LearningRate 0.0673 Epoch: 12 Global Step: 130840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:55:58,381-Speed 5977.10 samples/sec Loss 5.9778 LearningRate 0.0673 Epoch: 12 Global Step: 130850 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-08 21:56:05,238-Speed 5974.56 samples/sec Loss 5.9539 LearningRate 0.0672 Epoch: 12 Global Step: 130860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:56:12,092-Speed 5977.79 samples/sec Loss 5.9347 LearningRate 0.0672 Epoch: 12 Global Step: 130870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:56:18,938-Speed 5985.10 samples/sec Loss 5.9639 LearningRate 
0.0672 Epoch: 12 Global Step: 130880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:56:25,807-Speed 5963.69 samples/sec Loss 5.9084 LearningRate 0.0672 Epoch: 12 Global Step: 130890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:56:32,675-Speed 5965.38 samples/sec Loss 6.0023 LearningRate 0.0672 Epoch: 12 Global Step: 130900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:56:39,543-Speed 5964.75 samples/sec Loss 6.0281 LearningRate 0.0671 Epoch: 12 Global Step: 130910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:56:46,414-Speed 5962.25 samples/sec Loss 5.9485 LearningRate 0.0671 Epoch: 12 Global Step: 130920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:56:53,266-Speed 5979.00 samples/sec Loss 6.0080 LearningRate 0.0671 Epoch: 12 Global Step: 130930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:57:00,167-Speed 5937.51 samples/sec Loss 5.9785 LearningRate 0.0671 Epoch: 12 Global Step: 130940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:57:07,011-Speed 5985.37 samples/sec Loss 5.9502 LearningRate 0.0671 Epoch: 12 Global Step: 130950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:57:13,858-Speed 5983.84 samples/sec Loss 5.9679 LearningRate 0.0671 Epoch: 12 Global Step: 130960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:57:20,722-Speed 5968.33 samples/sec Loss 5.9868 LearningRate 0.0670 Epoch: 12 Global Step: 130970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:57:27,564-Speed 5987.42 samples/sec Loss 5.9334 LearningRate 0.0670 Epoch: 12 Global Step: 130980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:57:34,411-Speed 5983.60 samples/sec Loss 5.9560 LearningRate 0.0670 Epoch: 12 Global Step: 130990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:57:41,267-Speed 5975.35 samples/sec Loss 5.9532 LearningRate 0.0670 Epoch: 12 Global Step: 
131000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:57:48,134-Speed 5965.11 samples/sec Loss 6.0117 LearningRate 0.0670 Epoch: 12 Global Step: 131010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:57:55,000-Speed 5967.22 samples/sec Loss 5.9958 LearningRate 0.0670 Epoch: 12 Global Step: 131020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:58:01,869-Speed 5964.89 samples/sec Loss 5.9851 LearningRate 0.0669 Epoch: 12 Global Step: 131030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:58:08,719-Speed 5980.50 samples/sec Loss 6.0032 LearningRate 0.0669 Epoch: 12 Global Step: 131040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:58:15,587-Speed 5965.43 samples/sec Loss 5.9566 LearningRate 0.0669 Epoch: 12 Global Step: 131050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:58:22,456-Speed 5963.97 samples/sec Loss 5.9031 LearningRate 0.0669 Epoch: 12 Global Step: 131060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:58:29,303-Speed 5983.07 samples/sec Loss 5.9718 LearningRate 0.0669 Epoch: 12 Global Step: 131070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:58:36,171-Speed 5965.57 samples/sec Loss 6.0043 LearningRate 0.0668 Epoch: 12 Global Step: 131080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:58:43,031-Speed 5972.30 samples/sec Loss 5.9564 LearningRate 0.0668 Epoch: 12 Global Step: 131090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:58:49,896-Speed 5967.72 samples/sec Loss 5.9278 LearningRate 0.0668 Epoch: 12 Global Step: 131100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:58:56,762-Speed 5966.52 samples/sec Loss 5.9626 LearningRate 0.0668 Epoch: 12 Global Step: 131110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:59:03,621-Speed 5973.29 samples/sec Loss 5.9646 LearningRate 0.0668 Epoch: 12 Global Step: 131120 Fp16 Grad Scale: 
65536 Required: 15 hours Training: 2022-01-08 21:59:10,475-Speed 5976.81 samples/sec Loss 5.9246 LearningRate 0.0668 Epoch: 12 Global Step: 131130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:59:17,334-Speed 5974.98 samples/sec Loss 5.8884 LearningRate 0.0667 Epoch: 12 Global Step: 131140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:59:24,201-Speed 5968.35 samples/sec Loss 5.9672 LearningRate 0.0667 Epoch: 12 Global Step: 131150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:59:31,096-Speed 5941.29 samples/sec Loss 5.9962 LearningRate 0.0667 Epoch: 12 Global Step: 131160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:59:37,940-Speed 5985.87 samples/sec Loss 5.9475 LearningRate 0.0667 Epoch: 12 Global Step: 131170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 21:59:44,816-Speed 5958.79 samples/sec Loss 5.9804 LearningRate 0.0667 Epoch: 12 Global Step: 131180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:59:51,679-Speed 5968.49 samples/sec Loss 5.9748 LearningRate 0.0667 Epoch: 12 Global Step: 131190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 21:59:58,516-Speed 5993.13 samples/sec Loss 5.9891 LearningRate 0.0666 Epoch: 12 Global Step: 131200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:00:05,382-Speed 5965.96 samples/sec Loss 5.9255 LearningRate 0.0666 Epoch: 12 Global Step: 131210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:00:12,235-Speed 5978.03 samples/sec Loss 5.9479 LearningRate 0.0666 Epoch: 12 Global Step: 131220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:00:19,092-Speed 5974.28 samples/sec Loss 5.9145 LearningRate 0.0666 Epoch: 12 Global Step: 131230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:00:25,943-Speed 5980.62 samples/sec Loss 6.0010 LearningRate 0.0666 Epoch: 12 Global Step: 131240 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-01-08 22:00:32,802-Speed 5972.32 samples/sec Loss 5.9777 LearningRate 0.0666 Epoch: 12 Global Step: 131250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:00:39,679-Speed 5957.21 samples/sec Loss 5.9275 LearningRate 0.0665 Epoch: 12 Global Step: 131260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:00:46,530-Speed 5981.81 samples/sec Loss 5.9785 LearningRate 0.0665 Epoch: 12 Global Step: 131270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:00:53,403-Speed 5960.85 samples/sec Loss 5.9607 LearningRate 0.0665 Epoch: 12 Global Step: 131280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:01:00,253-Speed 5980.80 samples/sec Loss 5.9404 LearningRate 0.0665 Epoch: 12 Global Step: 131290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:01:07,102-Speed 5981.06 samples/sec Loss 5.9370 LearningRate 0.0665 Epoch: 12 Global Step: 131300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:01:13,984-Speed 5952.85 samples/sec Loss 5.9683 LearningRate 0.0664 Epoch: 12 Global Step: 131310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:01:20,852-Speed 5965.05 samples/sec Loss 5.9243 LearningRate 0.0664 Epoch: 12 Global Step: 131320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:01:27,706-Speed 5979.08 samples/sec Loss 5.9623 LearningRate 0.0664 Epoch: 12 Global Step: 131330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:01:34,552-Speed 5983.96 samples/sec Loss 5.9803 LearningRate 0.0664 Epoch: 12 Global Step: 131340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:01:41,398-Speed 5984.25 samples/sec Loss 5.9204 LearningRate 0.0664 Epoch: 12 Global Step: 131350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:01:48,251-Speed 5977.84 samples/sec Loss 5.9091 LearningRate 0.0664 Epoch: 12 Global Step: 131360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 
22:01:55,107-Speed 5975.05 samples/sec Loss 5.9559 LearningRate 0.0663 Epoch: 12 Global Step: 131370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:02:01,978-Speed 5962.33 samples/sec Loss 5.9415 LearningRate 0.0663 Epoch: 12 Global Step: 131380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:02:08,830-Speed 5979.40 samples/sec Loss 5.9463 LearningRate 0.0663 Epoch: 12 Global Step: 131390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:02:15,700-Speed 5962.76 samples/sec Loss 5.9212 LearningRate 0.0663 Epoch: 12 Global Step: 131400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:02:22,568-Speed 5964.72 samples/sec Loss 5.9871 LearningRate 0.0663 Epoch: 12 Global Step: 131410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:02:29,435-Speed 5966.62 samples/sec Loss 5.9488 LearningRate 0.0663 Epoch: 12 Global Step: 131420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:02:36,309-Speed 5959.72 samples/sec Loss 5.9240 LearningRate 0.0662 Epoch: 12 Global Step: 131430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:02:43,179-Speed 5963.71 samples/sec Loss 5.9170 LearningRate 0.0662 Epoch: 12 Global Step: 131440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:02:50,057-Speed 5956.42 samples/sec Loss 5.9160 LearningRate 0.0662 Epoch: 12 Global Step: 131450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:02:56,907-Speed 5980.33 samples/sec Loss 5.9469 LearningRate 0.0662 Epoch: 12 Global Step: 131460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:03:03,810-Speed 5935.02 samples/sec Loss 5.9348 LearningRate 0.0662 Epoch: 12 Global Step: 131470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:03:10,763-Speed 5893.27 samples/sec Loss 5.9038 LearningRate 0.0661 Epoch: 12 Global Step: 131480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:03:17,611-Speed 5982.69 
samples/sec Loss 5.9618 LearningRate 0.0661 Epoch: 12 Global Step: 131490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:03:24,449-Speed 5990.81 samples/sec Loss 5.9170 LearningRate 0.0661 Epoch: 12 Global Step: 131500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:03:31,297-Speed 5982.95 samples/sec Loss 5.9791 LearningRate 0.0661 Epoch: 12 Global Step: 131510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:03:38,168-Speed 5961.62 samples/sec Loss 5.8598 LearningRate 0.0661 Epoch: 12 Global Step: 131520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:03:45,013-Speed 5985.29 samples/sec Loss 5.9357 LearningRate 0.0661 Epoch: 12 Global Step: 131530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:03:51,880-Speed 5969.51 samples/sec Loss 5.9062 LearningRate 0.0660 Epoch: 12 Global Step: 131540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:03:58,719-Speed 5989.78 samples/sec Loss 5.9242 LearningRate 0.0660 Epoch: 12 Global Step: 131550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:04:05,594-Speed 5959.34 samples/sec Loss 5.8832 LearningRate 0.0660 Epoch: 12 Global Step: 131560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:04:12,449-Speed 5976.65 samples/sec Loss 5.9246 LearningRate 0.0660 Epoch: 12 Global Step: 131570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:04:19,291-Speed 5987.46 samples/sec Loss 5.9682 LearningRate 0.0660 Epoch: 12 Global Step: 131580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:04:26,138-Speed 5982.76 samples/sec Loss 5.8882 LearningRate 0.0660 Epoch: 12 Global Step: 131590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:04:33,008-Speed 5963.38 samples/sec Loss 5.9506 LearningRate 0.0659 Epoch: 12 Global Step: 131600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:04:39,867-Speed 5973.02 samples/sec Loss 5.9420 
LearningRate 0.0659 Epoch: 12 Global Step: 131610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:04:46,728-Speed 5971.17 samples/sec Loss 5.9007 LearningRate 0.0659 Epoch: 12 Global Step: 131620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:04:53,573-Speed 5985.68 samples/sec Loss 5.9292 LearningRate 0.0659 Epoch: 12 Global Step: 131630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:00,430-Speed 5973.86 samples/sec Loss 5.8682 LearningRate 0.0659 Epoch: 12 Global Step: 131640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:07,296-Speed 5967.40 samples/sec Loss 5.9244 LearningRate 0.0659 Epoch: 12 Global Step: 131650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:14,167-Speed 5964.80 samples/sec Loss 5.9434 LearningRate 0.0658 Epoch: 12 Global Step: 131660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:21,023-Speed 5974.94 samples/sec Loss 5.9710 LearningRate 0.0658 Epoch: 12 Global Step: 131670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:27,935-Speed 5927.74 samples/sec Loss 5.9003 LearningRate 0.0658 Epoch: 12 Global Step: 131680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:34,804-Speed 5965.40 samples/sec Loss 5.9196 LearningRate 0.0658 Epoch: 12 Global Step: 131690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:41,673-Speed 5964.04 samples/sec Loss 5.9327 LearningRate 0.0658 Epoch: 12 Global Step: 131700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:48,541-Speed 5964.65 samples/sec Loss 5.8850 LearningRate 0.0657 Epoch: 12 Global Step: 131710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:05:55,400-Speed 5973.68 samples/sec Loss 5.9212 LearningRate 0.0657 Epoch: 12 Global Step: 131720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:02,261-Speed 5971.18 samples/sec Loss 5.8642 LearningRate 0.0657 
Epoch: 12 Global Step: 131730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:09,098-Speed 5992.02 samples/sec Loss 5.8763 LearningRate 0.0657 Epoch: 12 Global Step: 131740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:15,946-Speed 5982.76 samples/sec Loss 5.9167 LearningRate 0.0657 Epoch: 12 Global Step: 131750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:22,796-Speed 5982.08 samples/sec Loss 5.8679 LearningRate 0.0657 Epoch: 12 Global Step: 131760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:29,641-Speed 5985.35 samples/sec Loss 5.9016 LearningRate 0.0656 Epoch: 12 Global Step: 131770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:36,495-Speed 5977.88 samples/sec Loss 5.8542 LearningRate 0.0656 Epoch: 12 Global Step: 131780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:43,352-Speed 5974.87 samples/sec Loss 5.9196 LearningRate 0.0656 Epoch: 12 Global Step: 131790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:50,215-Speed 5969.62 samples/sec Loss 5.9184 LearningRate 0.0656 Epoch: 12 Global Step: 131800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:06:57,074-Speed 5973.17 samples/sec Loss 5.8754 LearningRate 0.0656 Epoch: 12 Global Step: 131810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:07:03,918-Speed 5986.11 samples/sec Loss 5.9256 LearningRate 0.0656 Epoch: 12 Global Step: 131820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:07:10,778-Speed 5972.22 samples/sec Loss 5.9092 LearningRate 0.0655 Epoch: 12 Global Step: 131830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:07:17,658-Speed 5956.80 samples/sec Loss 5.9049 LearningRate 0.0655 Epoch: 12 Global Step: 131840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:07:24,502-Speed 5985.86 samples/sec Loss 5.8698 LearningRate 0.0655 Epoch: 12 Global Step: 
131850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:07:31,347-Speed 5985.02 samples/sec Loss 5.9043 LearningRate 0.0655 Epoch: 12 Global Step: 131860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:07:38,216-Speed 5964.29 samples/sec Loss 5.9176 LearningRate 0.0655 Epoch: 12 Global Step: 131870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:07:45,062-Speed 5984.05 samples/sec Loss 5.9477 LearningRate 0.0655 Epoch: 12 Global Step: 131880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:07:51,916-Speed 5977.40 samples/sec Loss 5.9495 LearningRate 0.0654 Epoch: 12 Global Step: 131890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:07:58,761-Speed 5985.15 samples/sec Loss 5.9263 LearningRate 0.0654 Epoch: 12 Global Step: 131900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:08:05,692-Speed 5910.23 samples/sec Loss 5.9185 LearningRate 0.0654 Epoch: 12 Global Step: 131910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:08:12,556-Speed 5968.97 samples/sec Loss 5.8916 LearningRate 0.0654 Epoch: 12 Global Step: 131920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:08:19,398-Speed 5988.17 samples/sec Loss 5.9206 LearningRate 0.0654 Epoch: 12 Global Step: 131930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:08:26,250-Speed 5979.23 samples/sec Loss 5.9001 LearningRate 0.0653 Epoch: 12 Global Step: 131940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:08:33,125-Speed 5958.94 samples/sec Loss 5.8803 LearningRate 0.0653 Epoch: 12 Global Step: 131950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:08:39,982-Speed 5975.28 samples/sec Loss 5.9107 LearningRate 0.0653 Epoch: 12 Global Step: 131960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:08:46,846-Speed 5968.08 samples/sec Loss 5.9134 LearningRate 0.0653 Epoch: 12 Global Step: 131970 Fp16 Grad Scale: 
65536 Required: 15 hours Training: 2022-01-08 22:08:53,824-Speed 5871.06 samples/sec Loss 5.9009 LearningRate 0.0653 Epoch: 12 Global Step: 131980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:09:00,783-Speed 5889.02 samples/sec Loss 5.8862 LearningRate 0.0653 Epoch: 12 Global Step: 131990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:09:07,750-Speed 5879.89 samples/sec Loss 5.9351 LearningRate 0.0652 Epoch: 12 Global Step: 132000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:09:14,596-Speed 5985.84 samples/sec Loss 5.8696 LearningRate 0.0652 Epoch: 12 Global Step: 132010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:09:21,582-Speed 5864.48 samples/sec Loss 5.8534 LearningRate 0.0652 Epoch: 12 Global Step: 132020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:09:28,438-Speed 5975.87 samples/sec Loss 5.9759 LearningRate 0.0652 Epoch: 12 Global Step: 132030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:09:35,294-Speed 5975.87 samples/sec Loss 5.9000 LearningRate 0.0652 Epoch: 12 Global Step: 132040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:09:42,133-Speed 5990.35 samples/sec Loss 5.9249 LearningRate 0.0652 Epoch: 12 Global Step: 132050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:09:49,043-Speed 5928.17 samples/sec Loss 5.8281 LearningRate 0.0651 Epoch: 12 Global Step: 132060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:09:55,904-Speed 5970.99 samples/sec Loss 5.9021 LearningRate 0.0651 Epoch: 12 Global Step: 132070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:10:02,754-Speed 5980.79 samples/sec Loss 5.8615 LearningRate 0.0651 Epoch: 12 Global Step: 132080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:10:09,611-Speed 5975.69 samples/sec Loss 5.9055 LearningRate 0.0651 Epoch: 12 Global Step: 132090 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-01-08 22:10:16,464-Speed 5978.70 samples/sec Loss 5.8334 LearningRate 0.0651 Epoch: 12 Global Step: 132100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:10:23,308-Speed 5985.98 samples/sec Loss 5.8867 LearningRate 0.0651 Epoch: 12 Global Step: 132110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:10:30,169-Speed 5970.30 samples/sec Loss 5.8421 LearningRate 0.0650 Epoch: 12 Global Step: 132120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:10:37,022-Speed 5978.76 samples/sec Loss 5.8768 LearningRate 0.0650 Epoch: 12 Global Step: 132130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:10:43,912-Speed 5945.70 samples/sec Loss 5.9397 LearningRate 0.0650 Epoch: 12 Global Step: 132140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:10:50,759-Speed 5982.86 samples/sec Loss 5.8454 LearningRate 0.0650 Epoch: 12 Global Step: 132150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:10:57,641-Speed 5953.30 samples/sec Loss 5.8704 LearningRate 0.0650 Epoch: 12 Global Step: 132160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:11:04,496-Speed 5977.30 samples/sec Loss 5.8850 LearningRate 0.0650 Epoch: 12 Global Step: 132170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:11:11,338-Speed 5986.59 samples/sec Loss 5.8851 LearningRate 0.0649 Epoch: 12 Global Step: 132180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:11:18,196-Speed 5974.69 samples/sec Loss 5.9112 LearningRate 0.0649 Epoch: 12 Global Step: 132190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:11:25,042-Speed 5983.56 samples/sec Loss 5.9128 LearningRate 0.0649 Epoch: 12 Global Step: 132200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:11:31,897-Speed 5976.61 samples/sec Loss 5.9150 LearningRate 0.0649 Epoch: 12 Global Step: 132210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 
22:11:38,907-Speed 5847.11 samples/sec Loss 5.9222 LearningRate 0.0649 Epoch: 12 Global Step: 132220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:11:45,756-Speed 5981.89 samples/sec Loss 5.8842 LearningRate 0.0648 Epoch: 12 Global Step: 132230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:11:52,602-Speed 5984.06 samples/sec Loss 5.8664 LearningRate 0.0648 Epoch: 12 Global Step: 132240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:11:59,453-Speed 5980.27 samples/sec Loss 5.8863 LearningRate 0.0648 Epoch: 12 Global Step: 132250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:12:06,316-Speed 5969.60 samples/sec Loss 5.8526 LearningRate 0.0648 Epoch: 12 Global Step: 132260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:12:13,194-Speed 5956.65 samples/sec Loss 5.8913 LearningRate 0.0648 Epoch: 12 Global Step: 132270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:12:20,050-Speed 5975.69 samples/sec Loss 5.8430 LearningRate 0.0648 Epoch: 12 Global Step: 132280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:12:26,922-Speed 5962.24 samples/sec Loss 5.8501 LearningRate 0.0647 Epoch: 12 Global Step: 132290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:12:33,786-Speed 5968.87 samples/sec Loss 5.9265 LearningRate 0.0647 Epoch: 12 Global Step: 132300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:12:40,656-Speed 5963.69 samples/sec Loss 5.8449 LearningRate 0.0647 Epoch: 12 Global Step: 132310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:12:47,527-Speed 5962.08 samples/sec Loss 5.8737 LearningRate 0.0647 Epoch: 12 Global Step: 132320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:12:54,381-Speed 5976.95 samples/sec Loss 5.8614 LearningRate 0.0647 Epoch: 12 Global Step: 132330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:01,261-Speed 
5955.36 samples/sec Loss 5.8248 LearningRate 0.0647 Epoch: 12 Global Step: 132340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:08,141-Speed 5954.40 samples/sec Loss 5.8089 LearningRate 0.0646 Epoch: 12 Global Step: 132350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:15,044-Speed 5934.20 samples/sec Loss 5.8354 LearningRate 0.0646 Epoch: 12 Global Step: 132360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:21,945-Speed 5937.21 samples/sec Loss 5.8894 LearningRate 0.0646 Epoch: 12 Global Step: 132370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:28,792-Speed 5982.86 samples/sec Loss 5.9118 LearningRate 0.0646 Epoch: 12 Global Step: 132380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:35,642-Speed 5980.32 samples/sec Loss 5.8615 LearningRate 0.0646 Epoch: 12 Global Step: 132390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:42,548-Speed 5931.45 samples/sec Loss 5.9112 LearningRate 0.0646 Epoch: 12 Global Step: 132400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:49,403-Speed 5976.35 samples/sec Loss 5.8688 LearningRate 0.0645 Epoch: 12 Global Step: 132410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:13:56,255-Speed 5978.95 samples/sec Loss 5.9046 LearningRate 0.0645 Epoch: 12 Global Step: 132420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:14:03,122-Speed 5966.10 samples/sec Loss 5.8705 LearningRate 0.0645 Epoch: 12 Global Step: 132430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:14:09,973-Speed 5983.38 samples/sec Loss 5.9004 LearningRate 0.0645 Epoch: 12 Global Step: 132440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:14:16,824-Speed 5979.36 samples/sec Loss 5.8529 LearningRate 0.0645 Epoch: 12 Global Step: 132450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:14:23,688-Speed 5969.16 samples/sec 
Loss 5.8998 LearningRate 0.0645 Epoch: 12 Global Step: 132460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:14:30,530-Speed 5989.72 samples/sec Loss 5.8579 LearningRate 0.0644 Epoch: 12 Global Step: 132470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:14:37,391-Speed 5970.35 samples/sec Loss 5.8614 LearningRate 0.0644 Epoch: 12 Global Step: 132480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:14:44,278-Speed 5948.65 samples/sec Loss 5.9066 LearningRate 0.0644 Epoch: 12 Global Step: 132490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:14:51,124-Speed 5983.94 samples/sec Loss 5.8542 LearningRate 0.0644 Epoch: 12 Global Step: 132500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:14:57,994-Speed 5965.93 samples/sec Loss 5.8228 LearningRate 0.0644 Epoch: 12 Global Step: 132510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:15:04,863-Speed 5964.42 samples/sec Loss 5.8905 LearningRate 0.0643 Epoch: 12 Global Step: 132520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:15:11,731-Speed 5964.90 samples/sec Loss 5.8556 LearningRate 0.0643 Epoch: 12 Global Step: 132530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:15:18,604-Speed 5961.15 samples/sec Loss 5.8720 LearningRate 0.0643 Epoch: 12 Global Step: 132540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:15:25,453-Speed 5981.02 samples/sec Loss 5.9147 LearningRate 0.0643 Epoch: 12 Global Step: 132550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:15:32,312-Speed 5972.96 samples/sec Loss 5.9011 LearningRate 0.0643 Epoch: 12 Global Step: 132560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:15:39,168-Speed 5975.52 samples/sec Loss 5.8844 LearningRate 0.0643 Epoch: 12 Global Step: 132570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:15:46,035-Speed 5966.08 samples/sec Loss 5.8775 LearningRate 
0.0642 Epoch: 12 Global Step: 132580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:15:52,898-Speed 5968.94 samples/sec Loss 5.8602 LearningRate 0.0642 Epoch: 12 Global Step: 132590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:15:59,741-Speed 5986.92 samples/sec Loss 5.9297 LearningRate 0.0642 Epoch: 12 Global Step: 132600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:16:06,594-Speed 5978.68 samples/sec Loss 5.8518 LearningRate 0.0642 Epoch: 12 Global Step: 132610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:16:13,473-Speed 5957.53 samples/sec Loss 5.8896 LearningRate 0.0642 Epoch: 12 Global Step: 132620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:16:20,346-Speed 5961.10 samples/sec Loss 5.8127 LearningRate 0.0642 Epoch: 12 Global Step: 132630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:16:27,204-Speed 5974.01 samples/sec Loss 5.7949 LearningRate 0.0641 Epoch: 12 Global Step: 132640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:16:34,063-Speed 5972.56 samples/sec Loss 5.8412 LearningRate 0.0641 Epoch: 12 Global Step: 132650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:16:40,939-Speed 5958.26 samples/sec Loss 5.8232 LearningRate 0.0641 Epoch: 12 Global Step: 132660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:16:47,804-Speed 5967.77 samples/sec Loss 5.8279 LearningRate 0.0641 Epoch: 12 Global Step: 132670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:16:54,692-Speed 5948.87 samples/sec Loss 5.8769 LearningRate 0.0641 Epoch: 12 Global Step: 132680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:17:01,588-Speed 5941.25 samples/sec Loss 5.8385 LearningRate 0.0641 Epoch: 12 Global Step: 132690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:17:08,473-Speed 5950.09 samples/sec Loss 5.8283 LearningRate 0.0640 Epoch: 12 Global Step: 
132700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:17:15,356-Speed 5952.54 samples/sec Loss 5.8899 LearningRate 0.0640 Epoch: 12 Global Step: 132710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:17:22,212-Speed 5974.31 samples/sec Loss 5.8551 LearningRate 0.0640 Epoch: 12 Global Step: 132720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:17:29,133-Speed 5919.58 samples/sec Loss 5.8115 LearningRate 0.0640 Epoch: 12 Global Step: 132730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:17:35,986-Speed 5978.78 samples/sec Loss 5.8335 LearningRate 0.0640 Epoch: 12 Global Step: 132740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:17:42,859-Speed 5960.56 samples/sec Loss 5.8580 LearningRate 0.0640 Epoch: 12 Global Step: 132750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:17:49,721-Speed 5970.46 samples/sec Loss 5.7859 LearningRate 0.0639 Epoch: 12 Global Step: 132760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:17:56,574-Speed 5978.11 samples/sec Loss 5.9030 LearningRate 0.0639 Epoch: 12 Global Step: 132770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:18:03,437-Speed 5968.91 samples/sec Loss 5.8146 LearningRate 0.0639 Epoch: 12 Global Step: 132780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:18:10,289-Speed 5979.34 samples/sec Loss 5.8470 LearningRate 0.0639 Epoch: 12 Global Step: 132790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:18:17,142-Speed 5978.75 samples/sec Loss 5.8425 LearningRate 0.0639 Epoch: 12 Global Step: 132800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:18:24,004-Speed 5970.10 samples/sec Loss 5.8104 LearningRate 0.0639 Epoch: 12 Global Step: 132810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:18:30,871-Speed 5969.00 samples/sec Loss 5.7914 LearningRate 0.0638 Epoch: 12 Global Step: 132820 Fp16 Grad Scale: 
65536 Required: 15 hours Training: 2022-01-08 22:18:37,726-Speed 5976.05 samples/sec Loss 5.8847 LearningRate 0.0638 Epoch: 12 Global Step: 132830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:18:44,580-Speed 5977.49 samples/sec Loss 5.8466 LearningRate 0.0638 Epoch: 12 Global Step: 132840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:18:51,442-Speed 5973.03 samples/sec Loss 5.8478 LearningRate 0.0638 Epoch: 12 Global Step: 132850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:18:58,298-Speed 5976.35 samples/sec Loss 5.8468 LearningRate 0.0638 Epoch: 12 Global Step: 132860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:19:05,218-Speed 5920.67 samples/sec Loss 5.8148 LearningRate 0.0637 Epoch: 12 Global Step: 132870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:19:12,080-Speed 5970.06 samples/sec Loss 5.8491 LearningRate 0.0637 Epoch: 12 Global Step: 132880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:19:18,982-Speed 5935.88 samples/sec Loss 5.8454 LearningRate 0.0637 Epoch: 12 Global Step: 132890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:19:25,910-Speed 5913.36 samples/sec Loss 5.8181 LearningRate 0.0637 Epoch: 12 Global Step: 132900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:19:32,826-Speed 5923.52 samples/sec Loss 5.8461 LearningRate 0.0637 Epoch: 12 Global Step: 132910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:19:39,751-Speed 5916.55 samples/sec Loss 5.7826 LearningRate 0.0637 Epoch: 12 Global Step: 132920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:19:46,670-Speed 5921.26 samples/sec Loss 5.8657 LearningRate 0.0636 Epoch: 12 Global Step: 132930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:19:53,572-Speed 5935.31 samples/sec Loss 5.8233 LearningRate 0.0636 Epoch: 12 Global Step: 132940 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-01-08 22:20:00,483-Speed 5928.38 samples/sec Loss 5.8009 LearningRate 0.0636 Epoch: 12 Global Step: 132950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:20:07,393-Speed 5928.14 samples/sec Loss 5.8264 LearningRate 0.0636 Epoch: 12 Global Step: 132960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:20:14,287-Speed 5942.19 samples/sec Loss 5.8153 LearningRate 0.0636 Epoch: 12 Global Step: 132970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:20:21,191-Speed 5934.20 samples/sec Loss 5.7953 LearningRate 0.0636 Epoch: 12 Global Step: 132980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:20:28,103-Speed 5927.08 samples/sec Loss 5.8882 LearningRate 0.0635 Epoch: 12 Global Step: 132990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:20:35,000-Speed 5940.09 samples/sec Loss 5.8393 LearningRate 0.0635 Epoch: 12 Global Step: 133000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:20:41,855-Speed 5978.10 samples/sec Loss 5.7937 LearningRate 0.0635 Epoch: 12 Global Step: 133010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:20:48,726-Speed 5962.26 samples/sec Loss 5.8395 LearningRate 0.0635 Epoch: 12 Global Step: 133020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:20:55,566-Speed 5988.92 samples/sec Loss 5.7850 LearningRate 0.0635 Epoch: 12 Global Step: 133030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:21:02,440-Speed 5960.38 samples/sec Loss 5.9088 LearningRate 0.0635 Epoch: 12 Global Step: 133040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:21:09,303-Speed 5969.36 samples/sec Loss 5.8287 LearningRate 0.0634 Epoch: 12 Global Step: 133050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:21:16,147-Speed 5986.45 samples/sec Loss 5.8318 LearningRate 0.0634 Epoch: 12 Global Step: 133060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 
22:21:23,027-Speed 5954.51 samples/sec Loss 5.8448 LearningRate 0.0634 Epoch: 12 Global Step: 133070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:21:29,902-Speed 5958.43 samples/sec Loss 5.8007 LearningRate 0.0634 Epoch: 12 Global Step: 133080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:21:36,771-Speed 5966.63 samples/sec Loss 5.8252 LearningRate 0.0634 Epoch: 12 Global Step: 133090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:21:43,627-Speed 5975.54 samples/sec Loss 5.8090 LearningRate 0.0634 Epoch: 12 Global Step: 133100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:21:50,482-Speed 5975.63 samples/sec Loss 5.7774 LearningRate 0.0633 Epoch: 12 Global Step: 133110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:21:57,340-Speed 5974.21 samples/sec Loss 5.7974 LearningRate 0.0633 Epoch: 12 Global Step: 133120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:22:04,182-Speed 5987.26 samples/sec Loss 5.8221 LearningRate 0.0633 Epoch: 12 Global Step: 133130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:22:11,036-Speed 5977.48 samples/sec Loss 5.8278 LearningRate 0.0633 Epoch: 12 Global Step: 133140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:22:17,880-Speed 5985.64 samples/sec Loss 5.8221 LearningRate 0.0633 Epoch: 12 Global Step: 133150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-08 22:22:24,731-Speed 5979.97 samples/sec Loss 5.8433 LearningRate 0.0633 Epoch: 12 Global Step: 133160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:22:31,592-Speed 5971.10 samples/sec Loss 5.7472 LearningRate 0.0632 Epoch: 12 Global Step: 133170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:22:38,462-Speed 5963.04 samples/sec Loss 5.7815 LearningRate 0.0632 Epoch: 12 Global Step: 133180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:22:45,333-Speed 5962.98 
samples/sec Loss 5.8108 LearningRate 0.0632 Epoch: 12 Global Step: 133190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:22:52,194-Speed 5970.60 samples/sec Loss 5.8043 LearningRate 0.0632 Epoch: 12 Global Step: 133200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:22:59,054-Speed 5972.41 samples/sec Loss 5.8254 LearningRate 0.0632 Epoch: 12 Global Step: 133210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:23:05,903-Speed 5981.32 samples/sec Loss 5.7882 LearningRate 0.0632 Epoch: 12 Global Step: 133220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:23:12,770-Speed 5965.90 samples/sec Loss 5.8948 LearningRate 0.0631 Epoch: 12 Global Step: 133230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:23:19,628-Speed 5973.77 samples/sec Loss 5.8036 LearningRate 0.0631 Epoch: 12 Global Step: 133240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:23:26,500-Speed 5962.31 samples/sec Loss 5.7430 LearningRate 0.0631 Epoch: 12 Global Step: 133250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:23:33,402-Speed 5935.61 samples/sec Loss 5.8066 LearningRate 0.0631 Epoch: 12 Global Step: 133260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:23:40,256-Speed 5977.99 samples/sec Loss 5.7419 LearningRate 0.0631 Epoch: 12 Global Step: 133270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:23:47,159-Speed 5934.35 samples/sec Loss 5.8120 LearningRate 0.0630 Epoch: 12 Global Step: 133280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:23:54,062-Speed 5935.52 samples/sec Loss 5.8181 LearningRate 0.0630 Epoch: 12 Global Step: 133290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:00,965-Speed 5934.52 samples/sec Loss 5.8225 LearningRate 0.0630 Epoch: 12 Global Step: 133300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:07,884-Speed 5921.31 samples/sec Loss 5.7920 
LearningRate 0.0630 Epoch: 12 Global Step: 133310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:14,784-Speed 5936.60 samples/sec Loss 5.8128 LearningRate 0.0630 Epoch: 12 Global Step: 133320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:21,653-Speed 5964.66 samples/sec Loss 5.7510 LearningRate 0.0630 Epoch: 12 Global Step: 133330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:28,520-Speed 5965.85 samples/sec Loss 5.7811 LearningRate 0.0629 Epoch: 12 Global Step: 133340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:35,402-Speed 5952.49 samples/sec Loss 5.7191 LearningRate 0.0629 Epoch: 12 Global Step: 133350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:42,271-Speed 5964.83 samples/sec Loss 5.8054 LearningRate 0.0629 Epoch: 12 Global Step: 133360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:49,130-Speed 5972.19 samples/sec Loss 5.7708 LearningRate 0.0629 Epoch: 12 Global Step: 133370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:24:55,995-Speed 5967.21 samples/sec Loss 5.8827 LearningRate 0.0629 Epoch: 12 Global Step: 133380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-08 22:25:02,858-Speed 5972.66 samples/sec Loss 5.7823 LearningRate 0.0629 Epoch: 12 Global Step: 133390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:25:09,719-Speed 5972.41 samples/sec Loss 5.8018 LearningRate 0.0628 Epoch: 12 Global Step: 133400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:25:16,577-Speed 5973.70 samples/sec Loss 5.7512 LearningRate 0.0628 Epoch: 12 Global Step: 133410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-08 22:25:23,430-Speed 5978.22 samples/sec Loss 5.7929 LearningRate 0.0628 Epoch: 12 Global Step: 133420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:25:30,327-Speed 5944.94 samples/sec Loss 5.8413 LearningRate 0.0628 Epoch: 
12 Global Step: 133430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:25:37,188-Speed 5970.99 samples/sec Loss 5.7804 LearningRate 0.0628 Epoch: 12 Global Step: 133440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:25:44,040-Speed 5978.78 samples/sec Loss 5.7331 LearningRate 0.0628 Epoch: 12 Global Step: 133450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:25:50,920-Speed 5956.81 samples/sec Loss 5.7974 LearningRate 0.0627 Epoch: 12 Global Step: 133460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:25:57,774-Speed 5976.87 samples/sec Loss 5.8262 LearningRate 0.0627 Epoch: 12 Global Step: 133470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:26:04,652-Speed 5956.77 samples/sec Loss 5.8303 LearningRate 0.0627 Epoch: 12 Global Step: 133480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:26:11,500-Speed 5982.22 samples/sec Loss 5.7864 LearningRate 0.0627 Epoch: 12 Global Step: 133490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:26:18,361-Speed 5971.48 samples/sec Loss 5.8006 LearningRate 0.0627 Epoch: 12 Global Step: 133500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:26:25,225-Speed 5968.31 samples/sec Loss 5.8115 LearningRate 0.0627 Epoch: 12 Global Step: 133510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:26:32,088-Speed 5978.21 samples/sec Loss 5.7957 LearningRate 0.0626 Epoch: 12 Global Step: 133520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:26:38,964-Speed 5957.70 samples/sec Loss 5.7792 LearningRate 0.0626 Epoch: 12 Global Step: 133530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:26:45,836-Speed 5961.48 samples/sec Loss 5.8092 LearningRate 0.0626 Epoch: 12 Global Step: 133540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:26:52,716-Speed 5955.66 samples/sec Loss 5.7443 LearningRate 0.0626 Epoch: 12 Global Step: 133550 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:26:59,576-Speed 5972.10 samples/sec Loss 5.7438 LearningRate 0.0626 Epoch: 12 Global Step: 133560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:27:06,436-Speed 5971.47 samples/sec Loss 5.7660 LearningRate 0.0626 Epoch: 12 Global Step: 133570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:27:13,305-Speed 5969.53 samples/sec Loss 5.7555 LearningRate 0.0625 Epoch: 12 Global Step: 133580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:27:20,228-Speed 5917.89 samples/sec Loss 5.7716 LearningRate 0.0625 Epoch: 12 Global Step: 133590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:27:27,089-Speed 5970.68 samples/sec Loss 5.7575 LearningRate 0.0625 Epoch: 12 Global Step: 133600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:27:33,943-Speed 5977.41 samples/sec Loss 5.7853 LearningRate 0.0625 Epoch: 12 Global Step: 133610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:27:40,791-Speed 5982.92 samples/sec Loss 5.8169 LearningRate 0.0625 Epoch: 12 Global Step: 133620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:27:47,643-Speed 5978.99 samples/sec Loss 5.8307 LearningRate 0.0625 Epoch: 12 Global Step: 133630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:27:54,498-Speed 5978.58 samples/sec Loss 5.7664 LearningRate 0.0624 Epoch: 12 Global Step: 133640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:28:01,361-Speed 5969.14 samples/sec Loss 5.8118 LearningRate 0.0624 Epoch: 12 Global Step: 133650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:28:08,224-Speed 5969.73 samples/sec Loss 5.7787 LearningRate 0.0624 Epoch: 12 Global Step: 133660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:28:15,079-Speed 5976.24 samples/sec Loss 5.7442 LearningRate 0.0624 Epoch: 12 Global Step: 133670 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-01-08 22:28:21,929-Speed 5980.94 samples/sec Loss 5.7563 LearningRate 0.0624 Epoch: 12 Global Step: 133680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:28:28,789-Speed 5971.80 samples/sec Loss 5.7803 LearningRate 0.0624 Epoch: 12 Global Step: 133690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:28:35,653-Speed 5970.24 samples/sec Loss 5.7767 LearningRate 0.0623 Epoch: 12 Global Step: 133700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:28:42,510-Speed 5974.44 samples/sec Loss 5.7434 LearningRate 0.0623 Epoch: 12 Global Step: 133710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:28:49,373-Speed 5969.30 samples/sec Loss 5.7679 LearningRate 0.0623 Epoch: 12 Global Step: 133720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:28:56,223-Speed 5981.15 samples/sec Loss 5.8332 LearningRate 0.0623 Epoch: 12 Global Step: 133730 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:29:03,070-Speed 5982.40 samples/sec Loss 5.7617 LearningRate 0.0623 Epoch: 12 Global Step: 133740 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:29:09,933-Speed 5969.49 samples/sec Loss 5.7544 LearningRate 0.0623 Epoch: 12 Global Step: 133750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:29:16,810-Speed 5957.46 samples/sec Loss 5.8081 LearningRate 0.0622 Epoch: 12 Global Step: 133760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:29:23,676-Speed 5966.84 samples/sec Loss 5.7996 LearningRate 0.0622 Epoch: 12 Global Step: 133770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:29:30,512-Speed 5993.06 samples/sec Loss 5.7854 LearningRate 0.0622 Epoch: 12 Global Step: 133780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:29:37,369-Speed 5974.09 samples/sec Loss 5.7662 LearningRate 0.0622 Epoch: 12 Global Step: 133790 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-01-08 22:29:44,221-Speed 5978.82 samples/sec Loss 5.7784 LearningRate 0.0622 Epoch: 12 Global Step: 133800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:29:51,084-Speed 5969.79 samples/sec Loss 5.7752 LearningRate 0.0622 Epoch: 12 Global Step: 133810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:29:58,001-Speed 5922.86 samples/sec Loss 5.8072 LearningRate 0.0621 Epoch: 12 Global Step: 133820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:30:04,917-Speed 5923.31 samples/sec Loss 5.7149 LearningRate 0.0621 Epoch: 12 Global Step: 133830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:30:11,879-Speed 5884.52 samples/sec Loss 5.7191 LearningRate 0.0621 Epoch: 12 Global Step: 133840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:30:18,866-Speed 5864.08 samples/sec Loss 5.7765 LearningRate 0.0621 Epoch: 12 Global Step: 133850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:30:25,816-Speed 5894.39 samples/sec Loss 5.7401 LearningRate 0.0621 Epoch: 12 Global Step: 133860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:30:32,768-Speed 5893.12 samples/sec Loss 5.7624 LearningRate 0.0620 Epoch: 12 Global Step: 133870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:30:39,623-Speed 5976.94 samples/sec Loss 5.7726 LearningRate 0.0620 Epoch: 12 Global Step: 133880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:30:46,497-Speed 5959.80 samples/sec Loss 5.7681 LearningRate 0.0620 Epoch: 12 Global Step: 133890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:30:53,347-Speed 5981.08 samples/sec Loss 5.7700 LearningRate 0.0620 Epoch: 12 Global Step: 133900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:31:00,186-Speed 5990.53 samples/sec Loss 5.7645 LearningRate 0.0620 Epoch: 12 Global Step: 133910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 
22:31:07,555-Speed 5558.84 samples/sec Loss 5.7512 LearningRate 0.0620 Epoch: 12 Global Step: 133920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:31:14,408-Speed 5978.20 samples/sec Loss 5.7260 LearningRate 0.0619 Epoch: 12 Global Step: 133930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:31:21,273-Speed 5967.84 samples/sec Loss 5.8201 LearningRate 0.0619 Epoch: 12 Global Step: 133940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:31:28,151-Speed 5956.25 samples/sec Loss 5.7931 LearningRate 0.0619 Epoch: 12 Global Step: 133950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:31:35,017-Speed 5966.74 samples/sec Loss 5.7401 LearningRate 0.0619 Epoch: 12 Global Step: 133960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:31:41,865-Speed 5982.73 samples/sec Loss 5.7687 LearningRate 0.0619 Epoch: 12 Global Step: 133970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:31:48,721-Speed 5975.24 samples/sec Loss 5.7459 LearningRate 0.0619 Epoch: 12 Global Step: 133980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:31:55,579-Speed 5973.88 samples/sec Loss 5.7094 LearningRate 0.0618 Epoch: 12 Global Step: 133990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:32:02,425-Speed 5983.99 samples/sec Loss 5.7636 LearningRate 0.0618 Epoch: 12 Global Step: 134000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:32:09,281-Speed 5975.91 samples/sec Loss 5.7233 LearningRate 0.0618 Epoch: 12 Global Step: 134010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:32:16,147-Speed 5966.38 samples/sec Loss 5.7556 LearningRate 0.0618 Epoch: 12 Global Step: 134020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:32:23,004-Speed 5977.65 samples/sec Loss 5.8007 LearningRate 0.0618 Epoch: 12 Global Step: 134030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:32:29,874-Speed 5963.37 
samples/sec Loss 5.7314 LearningRate 0.0618 Epoch: 12 Global Step: 134040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:32:36,735-Speed 5971.23 samples/sec Loss 5.7930 LearningRate 0.0617 Epoch: 12 Global Step: 134050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:32:43,585-Speed 5983.99 samples/sec Loss 5.7988 LearningRate 0.0617 Epoch: 12 Global Step: 134060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:32:50,441-Speed 5975.57 samples/sec Loss 5.7223 LearningRate 0.0617 Epoch: 12 Global Step: 134070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:32:57,329-Speed 5947.82 samples/sec Loss 5.7437 LearningRate 0.0617 Epoch: 12 Global Step: 134080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:33:04,184-Speed 5975.93 samples/sec Loss 5.7553 LearningRate 0.0617 Epoch: 12 Global Step: 134090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:33:11,035-Speed 5979.42 samples/sec Loss 5.7734 LearningRate 0.0617 Epoch: 12 Global Step: 134100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:33:17,904-Speed 5964.98 samples/sec Loss 5.7893 LearningRate 0.0616 Epoch: 12 Global Step: 134110 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-08 22:33:24,756-Speed 5978.76 samples/sec Loss 5.7095 LearningRate 0.0616 Epoch: 12 Global Step: 134120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:33:31,609-Speed 5978.47 samples/sec Loss 5.7842 LearningRate 0.0616 Epoch: 12 Global Step: 134130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:33:38,497-Speed 5947.12 samples/sec Loss 5.7242 LearningRate 0.0616 Epoch: 12 Global Step: 134140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:33:45,354-Speed 5978.71 samples/sec Loss 5.7126 LearningRate 0.0616 Epoch: 12 Global Step: 134150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:33:52,207-Speed 5978.06 samples/sec Loss 
5.7404 LearningRate 0.0616 Epoch: 12 Global Step: 134160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:33:59,054-Speed 5983.04 samples/sec Loss 5.6986 LearningRate 0.0615 Epoch: 12 Global Step: 134170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:34:05,913-Speed 5972.92 samples/sec Loss 5.7564 LearningRate 0.0615 Epoch: 12 Global Step: 134180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:34:12,775-Speed 5969.96 samples/sec Loss 5.7372 LearningRate 0.0615 Epoch: 12 Global Step: 134190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:34:19,621-Speed 5984.37 samples/sec Loss 5.7309 LearningRate 0.0615 Epoch: 12 Global Step: 134200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:34:26,451-Speed 5998.02 samples/sec Loss 5.7592 LearningRate 0.0615 Epoch: 12 Global Step: 134210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:34:33,343-Speed 5943.92 samples/sec Loss 5.7268 LearningRate 0.0615 Epoch: 12 Global Step: 134220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:34:40,222-Speed 5955.34 samples/sec Loss 5.6824 LearningRate 0.0614 Epoch: 12 Global Step: 134230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:34:47,092-Speed 5964.37 samples/sec Loss 5.7354 LearningRate 0.0614 Epoch: 12 Global Step: 134240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:34:53,948-Speed 5974.88 samples/sec Loss 5.7616 LearningRate 0.0614 Epoch: 12 Global Step: 134250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:35:00,798-Speed 5980.21 samples/sec Loss 5.7825 LearningRate 0.0614 Epoch: 12 Global Step: 134260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:35:07,660-Speed 5971.21 samples/sec Loss 5.7011 LearningRate 0.0614 Epoch: 12 Global Step: 134270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:35:14,513-Speed 5976.91 samples/sec Loss 5.7727 LearningRate 0.0614 
Epoch: 12 Global Step: 134280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:35:21,370-Speed 5975.27 samples/sec Loss 5.8049 LearningRate 0.0613 Epoch: 12 Global Step: 134290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:35:28,213-Speed 5986.84 samples/sec Loss 5.7688 LearningRate 0.0613 Epoch: 12 Global Step: 134300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:35:35,071-Speed 5973.51 samples/sec Loss 5.7363 LearningRate 0.0613 Epoch: 12 Global Step: 134310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:35:41,918-Speed 5983.87 samples/sec Loss 5.7587 LearningRate 0.0613 Epoch: 12 Global Step: 134320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:35:48,777-Speed 5972.96 samples/sec Loss 5.7552 LearningRate 0.0613 Epoch: 12 Global Step: 134330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:35:55,678-Speed 5936.51 samples/sec Loss 5.7710 LearningRate 0.0613 Epoch: 12 Global Step: 134340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:36:02,530-Speed 5979.20 samples/sec Loss 5.7572 LearningRate 0.0612 Epoch: 12 Global Step: 134350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:36:09,408-Speed 5956.46 samples/sec Loss 5.7052 LearningRate 0.0612 Epoch: 12 Global Step: 134360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:36:16,265-Speed 5974.40 samples/sec Loss 5.7366 LearningRate 0.0612 Epoch: 12 Global Step: 134370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:36:23,130-Speed 5967.93 samples/sec Loss 5.6962 LearningRate 0.0612 Epoch: 12 Global Step: 134380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:36:30,008-Speed 5956.32 samples/sec Loss 5.7131 LearningRate 0.0612 Epoch: 12 Global Step: 134390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:36:36,867-Speed 5972.63 samples/sec Loss 5.7171 LearningRate 0.0612 Epoch: 12 Global Step: 134400 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:36:43,721-Speed 5977.62 samples/sec Loss 5.6951 LearningRate 0.0611 Epoch: 12 Global Step: 134410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:36:50,602-Speed 5953.41 samples/sec Loss 5.7303 LearningRate 0.0611 Epoch: 12 Global Step: 134420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:36:57,460-Speed 5973.82 samples/sec Loss 5.7374 LearningRate 0.0611 Epoch: 12 Global Step: 134430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:37:04,314-Speed 5977.71 samples/sec Loss 5.6986 LearningRate 0.0611 Epoch: 12 Global Step: 134440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:37:11,169-Speed 5979.03 samples/sec Loss 5.7254 LearningRate 0.0611 Epoch: 12 Global Step: 134450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:37:18,023-Speed 5977.33 samples/sec Loss 5.8017 LearningRate 0.0611 Epoch: 12 Global Step: 134460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:37:24,885-Speed 5970.68 samples/sec Loss 5.7479 LearningRate 0.0610 Epoch: 12 Global Step: 134470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:37:31,732-Speed 5983.61 samples/sec Loss 5.7126 LearningRate 0.0610 Epoch: 12 Global Step: 134480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:37:38,593-Speed 5970.34 samples/sec Loss 5.6425 LearningRate 0.0610 Epoch: 12 Global Step: 134490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:37:45,449-Speed 5975.60 samples/sec Loss 5.7384 LearningRate 0.0610 Epoch: 12 Global Step: 134500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:37:52,305-Speed 5976.13 samples/sec Loss 5.7153 LearningRate 0.0610 Epoch: 12 Global Step: 134510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:37:59,171-Speed 5966.77 samples/sec Loss 5.6853 LearningRate 0.0610 Epoch: 12 Global Step: 134520 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-01-08 22:38:06,026-Speed 5976.50 samples/sec Loss 5.7355 LearningRate 0.0609 Epoch: 12 Global Step: 134530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:38:12,875-Speed 5981.97 samples/sec Loss 5.7383 LearningRate 0.0609 Epoch: 12 Global Step: 134540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:38:19,726-Speed 5979.88 samples/sec Loss 5.6739 LearningRate 0.0609 Epoch: 12 Global Step: 134550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:38:26,594-Speed 5965.35 samples/sec Loss 5.7183 LearningRate 0.0609 Epoch: 12 Global Step: 134560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:38:33,452-Speed 5973.30 samples/sec Loss 5.7153 LearningRate 0.0609 Epoch: 12 Global Step: 134570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:38:40,299-Speed 5982.92 samples/sec Loss 5.7532 LearningRate 0.0609 Epoch: 12 Global Step: 134580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:38:47,187-Speed 5948.41 samples/sec Loss 5.7054 LearningRate 0.0608 Epoch: 12 Global Step: 134590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:38:54,037-Speed 5982.47 samples/sec Loss 5.6911 LearningRate 0.0608 Epoch: 12 Global Step: 134600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:39:00,904-Speed 5965.76 samples/sec Loss 5.6927 LearningRate 0.0608 Epoch: 12 Global Step: 134610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:39:07,762-Speed 5974.09 samples/sec Loss 5.6954 LearningRate 0.0608 Epoch: 12 Global Step: 134620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:39:14,607-Speed 5985.14 samples/sec Loss 5.7023 LearningRate 0.0608 Epoch: 12 Global Step: 134630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:39:21,452-Speed 5985.18 samples/sec Loss 5.7286 LearningRate 0.0608 Epoch: 12 Global Step: 134640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-01-08 22:39:28,303-Speed 5979.96 samples/sec Loss 5.7319 LearningRate 0.0607 Epoch: 12 Global Step: 134650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:39:35,182-Speed 5955.02 samples/sec Loss 5.7767 LearningRate 0.0607 Epoch: 12 Global Step: 134660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:39:42,038-Speed 5975.43 samples/sec Loss 5.7310 LearningRate 0.0607 Epoch: 12 Global Step: 134670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:39:48,908-Speed 5962.98 samples/sec Loss 5.7510 LearningRate 0.0607 Epoch: 12 Global Step: 134680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:39:55,762-Speed 5977.80 samples/sec Loss 5.7093 LearningRate 0.0607 Epoch: 12 Global Step: 134690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:40:02,610-Speed 5981.59 samples/sec Loss 5.6746 LearningRate 0.0607 Epoch: 12 Global Step: 134700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:40:09,461-Speed 5982.06 samples/sec Loss 5.7300 LearningRate 0.0606 Epoch: 12 Global Step: 134710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:40:16,312-Speed 5979.98 samples/sec Loss 5.8128 LearningRate 0.0606 Epoch: 12 Global Step: 134720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:40:23,188-Speed 5958.20 samples/sec Loss 5.6818 LearningRate 0.0606 Epoch: 12 Global Step: 134730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:40:30,036-Speed 5983.60 samples/sec Loss 5.6725 LearningRate 0.0606 Epoch: 12 Global Step: 134740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:40:36,889-Speed 5981.45 samples/sec Loss 5.7669 LearningRate 0.0606 Epoch: 12 Global Step: 134750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:40:43,749-Speed 5971.34 samples/sec Loss 5.7309 LearningRate 0.0606 Epoch: 12 Global Step: 134760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:40:50,602-Speed 
5978.84 samples/sec Loss 5.6914 LearningRate 0.0605 Epoch: 12 Global Step: 134770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:40:57,451-Speed 5981.17 samples/sec Loss 5.7057 LearningRate 0.0605 Epoch: 12 Global Step: 134780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:41:04,316-Speed 5967.03 samples/sec Loss 5.7724 LearningRate 0.0605 Epoch: 12 Global Step: 134790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:41:11,173-Speed 5975.36 samples/sec Loss 5.7283 LearningRate 0.0605 Epoch: 12 Global Step: 134800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:41:34,406-Speed 1763.10 samples/sec Loss 5.7144 LearningRate 0.0605 Epoch: 13 Global Step: 134810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:41:41,258-Speed 5980.42 samples/sec Loss 5.7026 LearningRate 0.0605 Epoch: 13 Global Step: 134820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:41:48,116-Speed 5973.58 samples/sec Loss 5.7033 LearningRate 0.0604 Epoch: 13 Global Step: 134830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:41:54,972-Speed 5975.57 samples/sec Loss 5.7075 LearningRate 0.0604 Epoch: 13 Global Step: 134840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:42:01,806-Speed 5994.91 samples/sec Loss 5.7356 LearningRate 0.0604 Epoch: 13 Global Step: 134850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:42:08,645-Speed 5990.22 samples/sec Loss 5.6948 LearningRate 0.0604 Epoch: 13 Global Step: 134860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:42:15,506-Speed 5971.12 samples/sec Loss 5.7006 LearningRate 0.0604 Epoch: 13 Global Step: 134870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:42:22,364-Speed 5974.29 samples/sec Loss 5.7180 LearningRate 0.0604 Epoch: 13 Global Step: 134880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:42:29,205-Speed 5988.67 samples/sec Loss 
5.7161 LearningRate 0.0603 Epoch: 13 Global Step: 134890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:42:36,063-Speed 5973.00 samples/sec Loss 5.7030 LearningRate 0.0603 Epoch: 13 Global Step: 134900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:42:42,931-Speed 5965.47 samples/sec Loss 5.6743 LearningRate 0.0603 Epoch: 13 Global Step: 134910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:42:49,786-Speed 5977.12 samples/sec Loss 5.6771 LearningRate 0.0603 Epoch: 13 Global Step: 134920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:42:56,645-Speed 5971.75 samples/sec Loss 5.7477 LearningRate 0.0603 Epoch: 13 Global Step: 134930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:43:03,494-Speed 5982.42 samples/sec Loss 5.6415 LearningRate 0.0603 Epoch: 13 Global Step: 134940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:43:10,352-Speed 5976.48 samples/sec Loss 5.6761 LearningRate 0.0602 Epoch: 13 Global Step: 134950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:43:17,219-Speed 5968.72 samples/sec Loss 5.6624 LearningRate 0.0602 Epoch: 13 Global Step: 134960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:43:24,074-Speed 5978.48 samples/sec Loss 5.6336 LearningRate 0.0602 Epoch: 13 Global Step: 134970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:43:30,952-Speed 5956.63 samples/sec Loss 5.6703 LearningRate 0.0602 Epoch: 13 Global Step: 134980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:43:37,847-Speed 5941.35 samples/sec Loss 5.6992 LearningRate 0.0602 Epoch: 13 Global Step: 134990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:43:44,696-Speed 5982.39 samples/sec Loss 5.6893 LearningRate 0.0602 Epoch: 13 Global Step: 135000 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:44:11,505-[lfw][135000]XNorm: 24.448964 Training: 2022-01-08 
22:44:11,506-[lfw][135000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-01-08 22:44:11,506-[lfw][135000]Accuracy-Highest: 0.99783 Training: 2022-01-08 22:44:42,577-[cfp_fp][135000]XNorm: 21.364589 Training: 2022-01-08 22:44:42,578-[cfp_fp][135000]Accuracy-Flip: 0.98586+-0.00559 Training: 2022-01-08 22:44:42,579-[cfp_fp][135000]Accuracy-Highest: 0.98586 Training: 2022-01-08 22:45:09,282-[agedb_30][135000]XNorm: 23.805966 Training: 2022-01-08 22:45:09,283-[agedb_30][135000]Accuracy-Flip: 0.97667+-0.00830 Training: 2022-01-08 22:45:09,284-[agedb_30][135000]Accuracy-Highest: 0.97667 Training: 2022-01-08 22:45:16,122-Speed 448.02 samples/sec Loss 5.6575 LearningRate 0.0601 Epoch: 13 Global Step: 135010 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:45:22,956-Speed 5994.20 samples/sec Loss 5.6415 LearningRate 0.0601 Epoch: 13 Global Step: 135020 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:45:29,791-Speed 5993.14 samples/sec Loss 5.7195 LearningRate 0.0601 Epoch: 13 Global Step: 135030 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:45:36,644-Speed 5978.34 samples/sec Loss 5.6952 LearningRate 0.0601 Epoch: 13 Global Step: 135040 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:45:43,493-Speed 5982.66 samples/sec Loss 5.6696 LearningRate 0.0601 Epoch: 13 Global Step: 135050 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:45:50,344-Speed 5980.88 samples/sec Loss 5.6247 LearningRate 0.0601 Epoch: 13 Global Step: 135060 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:45:57,225-Speed 5956.21 samples/sec Loss 5.7123 LearningRate 0.0600 Epoch: 13 Global Step: 135070 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:46:04,094-Speed 5963.55 samples/sec Loss 5.6249 LearningRate 0.0600 Epoch: 13 Global Step: 135080 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:46:10,961-Speed 5966.00 samples/sec Loss 5.6513 LearningRate 0.0600 Epoch: 13 
Global Step: 135090 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-08 22:46:17,841-Speed 5954.99 samples/sec Loss 5.6624 LearningRate 0.0600 Epoch: 13 Global Step: 135100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:46:24,706-Speed 5967.71 samples/sec Loss 5.6310 LearningRate 0.0600 Epoch: 13 Global Step: 135110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:46:31,575-Speed 5963.90 samples/sec Loss 5.6532 LearningRate 0.0600 Epoch: 13 Global Step: 135120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:46:38,432-Speed 5974.62 samples/sec Loss 5.7127 LearningRate 0.0599 Epoch: 13 Global Step: 135130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:46:45,294-Speed 5969.57 samples/sec Loss 5.6973 LearningRate 0.0599 Epoch: 13 Global Step: 135140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:46:52,159-Speed 5968.96 samples/sec Loss 5.6780 LearningRate 0.0599 Epoch: 13 Global Step: 135150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:46:59,038-Speed 5955.90 samples/sec Loss 5.6775 LearningRate 0.0599 Epoch: 13 Global Step: 135160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:47:05,919-Speed 5953.43 samples/sec Loss 5.7000 LearningRate 0.0599 Epoch: 13 Global Step: 135170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:47:12,815-Speed 5941.26 samples/sec Loss 5.6770 LearningRate 0.0599 Epoch: 13 Global Step: 135180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:47:19,687-Speed 5961.86 samples/sec Loss 5.6150 LearningRate 0.0598 Epoch: 13 Global Step: 135190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:47:26,555-Speed 5964.60 samples/sec Loss 5.7273 LearningRate 0.0598 Epoch: 13 Global Step: 135200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:47:33,426-Speed 5962.83 samples/sec Loss 5.6374 LearningRate 0.0598 Epoch: 13 Global Step: 135210 Fp16 Grad 
Scale: 65536 Required: 14 hours Training: 2022-01-08 22:47:40,297-Speed 5964.23 samples/sec Loss 5.6570 LearningRate 0.0598 Epoch: 13 Global Step: 135220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:47:47,150-Speed 5977.88 samples/sec Loss 5.6843 LearningRate 0.0598 Epoch: 13 Global Step: 135230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:47:54,015-Speed 5967.45 samples/sec Loss 5.7047 LearningRate 0.0598 Epoch: 13 Global Step: 135240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:48:00,889-Speed 5963.26 samples/sec Loss 5.7114 LearningRate 0.0597 Epoch: 13 Global Step: 135250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:48:07,769-Speed 5953.88 samples/sec Loss 5.7057 LearningRate 0.0597 Epoch: 13 Global Step: 135260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:48:14,686-Speed 5923.31 samples/sec Loss 5.7017 LearningRate 0.0597 Epoch: 13 Global Step: 135270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:48:21,549-Speed 5972.68 samples/sec Loss 5.6981 LearningRate 0.0597 Epoch: 13 Global Step: 135280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:48:28,406-Speed 5974.29 samples/sec Loss 5.6088 LearningRate 0.0597 Epoch: 13 Global Step: 135290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:48:35,299-Speed 5943.14 samples/sec Loss 5.6639 LearningRate 0.0597 Epoch: 13 Global Step: 135300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:48:42,186-Speed 5949.10 samples/sec Loss 5.6906 LearningRate 0.0596 Epoch: 13 Global Step: 135310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:48:49,063-Speed 5957.54 samples/sec Loss 5.6336 LearningRate 0.0596 Epoch: 13 Global Step: 135320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:48:55,948-Speed 5950.51 samples/sec Loss 5.7121 LearningRate 0.0596 Epoch: 13 Global Step: 135330 Fp16 Grad Scale: 131072 Required: 14 
hours Training: 2022-01-08 22:49:02,848-Speed 5937.89 samples/sec Loss 5.6353 LearningRate 0.0596 Epoch: 13 Global Step: 135340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:49:09,721-Speed 5961.97 samples/sec Loss 5.6674 LearningRate 0.0596 Epoch: 13 Global Step: 135350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:49:16,577-Speed 5975.66 samples/sec Loss 5.6770 LearningRate 0.0596 Epoch: 13 Global Step: 135360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:49:23,443-Speed 5967.05 samples/sec Loss 5.6313 LearningRate 0.0595 Epoch: 13 Global Step: 135370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:49:30,302-Speed 5973.00 samples/sec Loss 5.5875 LearningRate 0.0595 Epoch: 13 Global Step: 135380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:49:37,163-Speed 5971.11 samples/sec Loss 5.6620 LearningRate 0.0595 Epoch: 13 Global Step: 135390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:49:44,037-Speed 5960.45 samples/sec Loss 5.6108 LearningRate 0.0595 Epoch: 13 Global Step: 135400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:49:50,896-Speed 5972.24 samples/sec Loss 5.6711 LearningRate 0.0595 Epoch: 13 Global Step: 135410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:49:57,765-Speed 5964.75 samples/sec Loss 5.6687 LearningRate 0.0595 Epoch: 13 Global Step: 135420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:50:04,626-Speed 5972.62 samples/sec Loss 5.6392 LearningRate 0.0594 Epoch: 13 Global Step: 135430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:50:11,489-Speed 5969.25 samples/sec Loss 5.6477 LearningRate 0.0594 Epoch: 13 Global Step: 135440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:50:18,341-Speed 5979.05 samples/sec Loss 5.6997 LearningRate 0.0594 Epoch: 13 Global Step: 135450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 
22:50:25,321-Speed 5869.66 samples/sec Loss 5.6881 LearningRate 0.0594 Epoch: 13 Global Step: 135460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:50:32,275-Speed 5891.13 samples/sec Loss 5.6696 LearningRate 0.0594 Epoch: 13 Global Step: 135470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:50:39,246-Speed 5876.75 samples/sec Loss 5.6698 LearningRate 0.0594 Epoch: 13 Global Step: 135480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:50:46,103-Speed 5975.45 samples/sec Loss 5.6353 LearningRate 0.0593 Epoch: 13 Global Step: 135490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:50:52,967-Speed 5967.38 samples/sec Loss 5.6278 LearningRate 0.0593 Epoch: 13 Global Step: 135500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:50:59,826-Speed 5973.31 samples/sec Loss 5.6212 LearningRate 0.0593 Epoch: 13 Global Step: 135510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:51:06,719-Speed 5942.95 samples/sec Loss 5.5986 LearningRate 0.0593 Epoch: 13 Global Step: 135520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:51:13,577-Speed 5973.68 samples/sec Loss 5.6410 LearningRate 0.0593 Epoch: 13 Global Step: 135530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:51:20,431-Speed 5976.90 samples/sec Loss 5.6920 LearningRate 0.0593 Epoch: 13 Global Step: 135540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:51:27,283-Speed 5979.03 samples/sec Loss 5.6399 LearningRate 0.0592 Epoch: 13 Global Step: 135550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:51:34,156-Speed 5961.27 samples/sec Loss 5.6655 LearningRate 0.0592 Epoch: 13 Global Step: 135560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:51:41,016-Speed 5972.02 samples/sec Loss 5.6382 LearningRate 0.0592 Epoch: 13 Global Step: 135570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:51:47,878-Speed 
5970.68 samples/sec Loss 5.6314 LearningRate 0.0592 Epoch: 13 Global Step: 135580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:51:54,737-Speed 5972.34 samples/sec Loss 5.6610 LearningRate 0.0592 Epoch: 13 Global Step: 135590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:52:01,599-Speed 5970.59 samples/sec Loss 5.6369 LearningRate 0.0592 Epoch: 13 Global Step: 135600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:52:08,481-Speed 5953.41 samples/sec Loss 5.6029 LearningRate 0.0591 Epoch: 13 Global Step: 135610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:52:15,334-Speed 5977.16 samples/sec Loss 5.6998 LearningRate 0.0591 Epoch: 13 Global Step: 135620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:52:22,193-Speed 5972.84 samples/sec Loss 5.7089 LearningRate 0.0591 Epoch: 13 Global Step: 135630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:52:29,050-Speed 5975.19 samples/sec Loss 5.7110 LearningRate 0.0591 Epoch: 13 Global Step: 135640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:52:36,003-Speed 5892.29 samples/sec Loss 5.6149 LearningRate 0.0591 Epoch: 13 Global Step: 135650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:52:42,869-Speed 5966.24 samples/sec Loss 5.6845 LearningRate 0.0591 Epoch: 13 Global Step: 135660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:52:49,720-Speed 5980.15 samples/sec Loss 5.6599 LearningRate 0.0590 Epoch: 13 Global Step: 135670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:52:56,571-Speed 5980.10 samples/sec Loss 5.6459 LearningRate 0.0590 Epoch: 13 Global Step: 135680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:53:03,460-Speed 5946.76 samples/sec Loss 5.6309 LearningRate 0.0590 Epoch: 13 Global Step: 135690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:53:10,308-Speed 5982.08 samples/sec Loss 
5.6787 LearningRate 0.0590 Epoch: 13 Global Step: 135700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:53:17,164-Speed 5975.15 samples/sec Loss 5.6534 LearningRate 0.0590 Epoch: 13 Global Step: 135710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 22:53:24,015-Speed 5980.59 samples/sec Loss 5.6758 LearningRate 0.0590 Epoch: 13 Global Step: 135720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:53:30,874-Speed 5973.40 samples/sec Loss 5.6713 LearningRate 0.0589 Epoch: 13 Global Step: 135730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:53:37,720-Speed 5984.06 samples/sec Loss 5.6079 LearningRate 0.0589 Epoch: 13 Global Step: 135740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:53:44,653-Speed 5909.33 samples/sec Loss 5.5948 LearningRate 0.0589 Epoch: 13 Global Step: 135750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:53:51,522-Speed 5964.45 samples/sec Loss 5.5960 LearningRate 0.0589 Epoch: 13 Global Step: 135760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:53:58,377-Speed 5976.15 samples/sec Loss 5.5772 LearningRate 0.0589 Epoch: 13 Global Step: 135770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:54:05,228-Speed 5979.33 samples/sec Loss 5.6444 LearningRate 0.0589 Epoch: 13 Global Step: 135780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:54:12,083-Speed 5976.52 samples/sec Loss 5.5707 LearningRate 0.0588 Epoch: 13 Global Step: 135790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:54:18,967-Speed 5951.38 samples/sec Loss 5.6518 LearningRate 0.0588 Epoch: 13 Global Step: 135800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:54:25,825-Speed 5975.81 samples/sec Loss 5.6081 LearningRate 0.0588 Epoch: 13 Global Step: 135810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:54:32,666-Speed 5988.20 samples/sec Loss 5.6648 LearningRate 0.0588 
Epoch: 13 Global Step: 135820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:54:39,534-Speed 5964.97 samples/sec Loss 5.6176 LearningRate 0.0588 Epoch: 13 Global Step: 135830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:54:46,385-Speed 5980.69 samples/sec Loss 5.6304 LearningRate 0.0588 Epoch: 13 Global Step: 135840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:54:53,278-Speed 5945.58 samples/sec Loss 5.6132 LearningRate 0.0588 Epoch: 13 Global Step: 135850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:55:00,141-Speed 5969.02 samples/sec Loss 5.5922 LearningRate 0.0587 Epoch: 13 Global Step: 135860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:55:06,995-Speed 5977.09 samples/sec Loss 5.6442 LearningRate 0.0587 Epoch: 13 Global Step: 135870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:55:13,880-Speed 5950.68 samples/sec Loss 5.6521 LearningRate 0.0587 Epoch: 13 Global Step: 135880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:55:20,759-Speed 5955.01 samples/sec Loss 5.6494 LearningRate 0.0587 Epoch: 13 Global Step: 135890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:55:27,633-Speed 5960.36 samples/sec Loss 5.5925 LearningRate 0.0587 Epoch: 13 Global Step: 135900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:55:34,499-Speed 5966.28 samples/sec Loss 5.6523 LearningRate 0.0587 Epoch: 13 Global Step: 135910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:55:41,349-Speed 5981.10 samples/sec Loss 5.6603 LearningRate 0.0586 Epoch: 13 Global Step: 135920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:55:48,219-Speed 5963.95 samples/sec Loss 5.6488 LearningRate 0.0586 Epoch: 13 Global Step: 135930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:55:55,077-Speed 5973.72 samples/sec Loss 5.6407 LearningRate 0.0586 Epoch: 13 Global Step: 135940 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:01,937-Speed 5972.00 samples/sec Loss 5.6407 LearningRate 0.0586 Epoch: 13 Global Step: 135950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:08,823-Speed 5950.05 samples/sec Loss 5.6215 LearningRate 0.0586 Epoch: 13 Global Step: 135960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:15,678-Speed 5976.56 samples/sec Loss 5.6102 LearningRate 0.0586 Epoch: 13 Global Step: 135970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:22,539-Speed 5970.65 samples/sec Loss 5.6229 LearningRate 0.0585 Epoch: 13 Global Step: 135980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:29,419-Speed 5954.31 samples/sec Loss 5.6038 LearningRate 0.0585 Epoch: 13 Global Step: 135990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:36,340-Speed 5919.56 samples/sec Loss 5.6086 LearningRate 0.0585 Epoch: 13 Global Step: 136000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:43,207-Speed 5968.56 samples/sec Loss 5.6200 LearningRate 0.0585 Epoch: 13 Global Step: 136010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:50,074-Speed 5965.30 samples/sec Loss 5.5646 LearningRate 0.0585 Epoch: 13 Global Step: 136020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:56:56,941-Speed 5966.40 samples/sec Loss 5.5904 LearningRate 0.0585 Epoch: 13 Global Step: 136030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:57:03,807-Speed 5966.33 samples/sec Loss 5.5816 LearningRate 0.0584 Epoch: 13 Global Step: 136040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:57:10,671-Speed 5968.56 samples/sec Loss 5.5698 LearningRate 0.0584 Epoch: 13 Global Step: 136050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:57:17,548-Speed 5957.48 samples/sec Loss 5.6296 LearningRate 0.0584 Epoch: 13 Global Step: 136060 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-01-08 22:57:24,405-Speed 5975.27 samples/sec Loss 5.6168 LearningRate 0.0584 Epoch: 13 Global Step: 136070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:57:31,281-Speed 5958.96 samples/sec Loss 5.5679 LearningRate 0.0584 Epoch: 13 Global Step: 136080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:57:38,162-Speed 5954.56 samples/sec Loss 5.6235 LearningRate 0.0584 Epoch: 13 Global Step: 136090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:57:45,018-Speed 5975.26 samples/sec Loss 5.6297 LearningRate 0.0583 Epoch: 13 Global Step: 136100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:57:51,859-Speed 5988.32 samples/sec Loss 5.6176 LearningRate 0.0583 Epoch: 13 Global Step: 136110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:57:58,742-Speed 5955.15 samples/sec Loss 5.5682 LearningRate 0.0583 Epoch: 13 Global Step: 136120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:58:05,616-Speed 5959.30 samples/sec Loss 5.5670 LearningRate 0.0583 Epoch: 13 Global Step: 136130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:58:12,469-Speed 5978.54 samples/sec Loss 5.5769 LearningRate 0.0583 Epoch: 13 Global Step: 136140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:58:19,324-Speed 5976.62 samples/sec Loss 5.5691 LearningRate 0.0583 Epoch: 13 Global Step: 136150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:58:26,180-Speed 5975.84 samples/sec Loss 5.6237 LearningRate 0.0582 Epoch: 13 Global Step: 136160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:58:33,055-Speed 5958.59 samples/sec Loss 5.6060 LearningRate 0.0582 Epoch: 13 Global Step: 136170 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 22:58:39,909-Speed 5977.84 samples/sec Loss 5.6810 LearningRate 0.0582 Epoch: 13 Global Step: 136180 Fp16 Grad Scale: 131072 Required: 14 hours 
Training: 2022-01-08 22:58:46,769-Speed 5971.50 samples/sec Loss 5.6305 LearningRate 0.0582 Epoch: 13 Global Step: 136190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:58:53,626-Speed 5974.85 samples/sec Loss 5.5471 LearningRate 0.0582 Epoch: 13 Global Step: 136200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:00,510-Speed 5952.20 samples/sec Loss 5.6316 LearningRate 0.0582 Epoch: 13 Global Step: 136210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:07,377-Speed 5965.38 samples/sec Loss 5.5869 LearningRate 0.0581 Epoch: 13 Global Step: 136220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:14,238-Speed 5971.64 samples/sec Loss 5.5925 LearningRate 0.0581 Epoch: 13 Global Step: 136230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:21,109-Speed 5962.15 samples/sec Loss 5.6287 LearningRate 0.0581 Epoch: 13 Global Step: 136240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:27,956-Speed 5983.24 samples/sec Loss 5.6227 LearningRate 0.0581 Epoch: 13 Global Step: 136250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:34,814-Speed 5975.01 samples/sec Loss 5.5849 LearningRate 0.0581 Epoch: 13 Global Step: 136260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:41,820-Speed 5847.86 samples/sec Loss 5.6347 LearningRate 0.0581 Epoch: 13 Global Step: 136270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:48,783-Speed 5884.13 samples/sec Loss 5.6026 LearningRate 0.0580 Epoch: 13 Global Step: 136280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 22:59:55,644-Speed 5971.38 samples/sec Loss 5.6286 LearningRate 0.0580 Epoch: 13 Global Step: 136290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:00:02,488-Speed 5986.10 samples/sec Loss 5.6120 LearningRate 0.0580 Epoch: 13 Global Step: 136300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 
23:00:19,698-Speed 2380.26 samples/sec Loss 5.5968 LearningRate 0.0580 Epoch: 13 Global Step: 136310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:00:26,537-Speed 5989.70 samples/sec Loss 5.5845 LearningRate 0.0580 Epoch: 13 Global Step: 136320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:00:33,369-Speed 5996.43 samples/sec Loss 5.5745 LearningRate 0.0580 Epoch: 13 Global Step: 136330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:00:40,218-Speed 5982.57 samples/sec Loss 5.6614 LearningRate 0.0579 Epoch: 13 Global Step: 136340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:00:47,075-Speed 5974.36 samples/sec Loss 5.6228 LearningRate 0.0579 Epoch: 13 Global Step: 136350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:00:53,951-Speed 5958.59 samples/sec Loss 5.5622 LearningRate 0.0579 Epoch: 13 Global Step: 136360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:01:00,804-Speed 5977.91 samples/sec Loss 5.6420 LearningRate 0.0579 Epoch: 13 Global Step: 136370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:01:07,786-Speed 5869.66 samples/sec Loss 5.5290 LearningRate 0.0579 Epoch: 13 Global Step: 136380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:01:14,663-Speed 5957.46 samples/sec Loss 5.5618 LearningRate 0.0579 Epoch: 13 Global Step: 136390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:01:21,522-Speed 5972.68 samples/sec Loss 5.5815 LearningRate 0.0579 Epoch: 13 Global Step: 136400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:01:28,378-Speed 5974.99 samples/sec Loss 5.5835 LearningRate 0.0578 Epoch: 13 Global Step: 136410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:01:35,263-Speed 5950.59 samples/sec Loss 5.6203 LearningRate 0.0578 Epoch: 13 Global Step: 136420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:01:42,118-Speed 5976.00 
samples/sec Loss 5.5805 LearningRate 0.0578 Epoch: 13 Global Step: 136430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:01:48,987-Speed 5964.92 samples/sec Loss 5.5425 LearningRate 0.0578 Epoch: 13 Global Step: 136440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:01:55,842-Speed 5975.73 samples/sec Loss 5.6040 LearningRate 0.0578 Epoch: 13 Global Step: 136450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:02:02,739-Speed 5940.07 samples/sec Loss 5.5734 LearningRate 0.0578 Epoch: 13 Global Step: 136460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:02:09,615-Speed 5970.03 samples/sec Loss 5.5462 LearningRate 0.0577 Epoch: 13 Global Step: 136470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:02:16,465-Speed 5980.14 samples/sec Loss 5.5587 LearningRate 0.0577 Epoch: 13 Global Step: 136480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:02:23,305-Speed 5991.52 samples/sec Loss 5.5586 LearningRate 0.0577 Epoch: 13 Global Step: 136490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:02:30,173-Speed 5966.39 samples/sec Loss 5.6011 LearningRate 0.0577 Epoch: 13 Global Step: 136500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:02:37,042-Speed 5963.82 samples/sec Loss 5.5778 LearningRate 0.0577 Epoch: 13 Global Step: 136510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:02:43,906-Speed 5970.72 samples/sec Loss 5.5610 LearningRate 0.0577 Epoch: 13 Global Step: 136520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:02:50,760-Speed 5977.40 samples/sec Loss 5.5752 LearningRate 0.0576 Epoch: 13 Global Step: 136530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:02:57,608-Speed 5982.05 samples/sec Loss 5.5444 LearningRate 0.0576 Epoch: 13 Global Step: 136540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:03:04,521-Speed 5926.45 samples/sec Loss 5.5484 
LearningRate 0.0576 Epoch: 13 Global Step: 136550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:03:11,452-Speed 5910.94 samples/sec Loss 5.5395 LearningRate 0.0576 Epoch: 13 Global Step: 136560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:03:18,310-Speed 5972.90 samples/sec Loss 5.5666 LearningRate 0.0576 Epoch: 13 Global Step: 136570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:03:25,162-Speed 5979.98 samples/sec Loss 5.5447 LearningRate 0.0576 Epoch: 13 Global Step: 136580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:03:32,033-Speed 5962.14 samples/sec Loss 5.5420 LearningRate 0.0575 Epoch: 13 Global Step: 136590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:03:38,888-Speed 5976.31 samples/sec Loss 5.5921 LearningRate 0.0575 Epoch: 13 Global Step: 136600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:03:45,755-Speed 5966.09 samples/sec Loss 5.5881 LearningRate 0.0575 Epoch: 13 Global Step: 136610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:03:52,631-Speed 5958.25 samples/sec Loss 5.5481 LearningRate 0.0575 Epoch: 13 Global Step: 136620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:03:59,461-Speed 5997.58 samples/sec Loss 5.5755 LearningRate 0.0575 Epoch: 13 Global Step: 136630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:04:06,311-Speed 5980.71 samples/sec Loss 5.5998 LearningRate 0.0575 Epoch: 13 Global Step: 136640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:04:13,163-Speed 5979.03 samples/sec Loss 5.5600 LearningRate 0.0574 Epoch: 13 Global Step: 136650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:04:20,024-Speed 5971.52 samples/sec Loss 5.5624 LearningRate 0.0574 Epoch: 13 Global Step: 136660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:04:26,870-Speed 5985.82 samples/sec Loss 5.6042 LearningRate 0.0574 Epoch: 
13 Global Step: 136670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:04:33,733-Speed 5970.80 samples/sec Loss 5.5722 LearningRate 0.0574 Epoch: 13 Global Step: 136680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:04:40,580-Speed 5983.80 samples/sec Loss 5.5413 LearningRate 0.0574 Epoch: 13 Global Step: 136690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:04:47,445-Speed 5967.61 samples/sec Loss 5.5819 LearningRate 0.0574 Epoch: 13 Global Step: 136700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:04:54,301-Speed 5975.64 samples/sec Loss 5.5339 LearningRate 0.0573 Epoch: 13 Global Step: 136710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:05:01,203-Speed 5935.21 samples/sec Loss 5.5092 LearningRate 0.0573 Epoch: 13 Global Step: 136720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:05:08,066-Speed 5969.39 samples/sec Loss 5.5475 LearningRate 0.0573 Epoch: 13 Global Step: 136730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:05:14,950-Speed 5951.68 samples/sec Loss 5.6369 LearningRate 0.0573 Epoch: 13 Global Step: 136740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:05:21,832-Speed 5952.95 samples/sec Loss 5.5436 LearningRate 0.0573 Epoch: 13 Global Step: 136750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:05:28,688-Speed 5975.23 samples/sec Loss 5.5552 LearningRate 0.0573 Epoch: 13 Global Step: 136760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:05:35,559-Speed 5963.08 samples/sec Loss 5.5570 LearningRate 0.0572 Epoch: 13 Global Step: 136770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:05:42,408-Speed 5980.99 samples/sec Loss 5.4925 LearningRate 0.0572 Epoch: 13 Global Step: 136780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:05:49,303-Speed 5943.27 samples/sec Loss 5.5485 LearningRate 0.0572 Epoch: 13 Global Step: 136790 Fp16 
Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:05:56,145-Speed 5989.88 samples/sec Loss 5.5809 LearningRate 0.0572 Epoch: 13 Global Step: 136800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:06:03,006-Speed 5971.01 samples/sec Loss 5.5743 LearningRate 0.0572 Epoch: 13 Global Step: 136810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:06:09,850-Speed 5986.68 samples/sec Loss 5.5532 LearningRate 0.0572 Epoch: 13 Global Step: 136820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:06:16,695-Speed 5985.11 samples/sec Loss 5.6157 LearningRate 0.0572 Epoch: 13 Global Step: 136830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:06:23,540-Speed 5984.68 samples/sec Loss 5.5492 LearningRate 0.0571 Epoch: 13 Global Step: 136840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:06:30,387-Speed 5985.38 samples/sec Loss 5.5474 LearningRate 0.0571 Epoch: 13 Global Step: 136850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:06:37,236-Speed 5982.08 samples/sec Loss 5.5476 LearningRate 0.0571 Epoch: 13 Global Step: 136860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:06:44,075-Speed 5989.49 samples/sec Loss 5.5439 LearningRate 0.0571 Epoch: 13 Global Step: 136870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:06:50,914-Speed 5989.99 samples/sec Loss 5.5509 LearningRate 0.0571 Epoch: 13 Global Step: 136880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:06:57,788-Speed 5962.45 samples/sec Loss 5.5594 LearningRate 0.0571 Epoch: 13 Global Step: 136890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:07:04,634-Speed 5983.27 samples/sec Loss 5.5291 LearningRate 0.0570 Epoch: 13 Global Step: 136900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:07:11,493-Speed 5972.97 samples/sec Loss 5.5137 LearningRate 0.0570 Epoch: 13 Global Step: 136910 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-01-08 23:07:18,346-Speed 5977.99 samples/sec Loss 5.5585 LearningRate 0.0570 Epoch: 13 Global Step: 136920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:07:25,218-Speed 5961.69 samples/sec Loss 5.6094 LearningRate 0.0570 Epoch: 13 Global Step: 136930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:07:32,100-Speed 5953.30 samples/sec Loss 5.5387 LearningRate 0.0570 Epoch: 13 Global Step: 136940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:07:38,973-Speed 5961.68 samples/sec Loss 5.5292 LearningRate 0.0570 Epoch: 13 Global Step: 136950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:07:45,825-Speed 5978.52 samples/sec Loss 5.5167 LearningRate 0.0569 Epoch: 13 Global Step: 136960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:07:52,675-Speed 5981.04 samples/sec Loss 5.5527 LearningRate 0.0569 Epoch: 13 Global Step: 136970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:07:59,527-Speed 5979.89 samples/sec Loss 5.5871 LearningRate 0.0569 Epoch: 13 Global Step: 136980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:08:06,383-Speed 5975.37 samples/sec Loss 5.5555 LearningRate 0.0569 Epoch: 13 Global Step: 136990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:08:13,263-Speed 5954.49 samples/sec Loss 5.5194 LearningRate 0.0569 Epoch: 13 Global Step: 137000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:08:20,131-Speed 5965.79 samples/sec Loss 5.5547 LearningRate 0.0569 Epoch: 13 Global Step: 137010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:08:27,006-Speed 5958.28 samples/sec Loss 5.5123 LearningRate 0.0568 Epoch: 13 Global Step: 137020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:08:33,843-Speed 5991.78 samples/sec Loss 5.5602 LearningRate 0.0568 Epoch: 13 Global Step: 137030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 
23:08:40,701-Speed 5974.30 samples/sec Loss 5.5238 LearningRate 0.0568 Epoch: 13 Global Step: 137040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:08:47,660-Speed 5886.69 samples/sec Loss 5.5223 LearningRate 0.0568 Epoch: 13 Global Step: 137050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:08:54,622-Speed 5885.75 samples/sec Loss 5.5631 LearningRate 0.0568 Epoch: 13 Global Step: 137060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:09:01,585-Speed 5883.69 samples/sec Loss 5.5270 LearningRate 0.0568 Epoch: 13 Global Step: 137070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:09:08,443-Speed 5973.87 samples/sec Loss 5.5499 LearningRate 0.0567 Epoch: 13 Global Step: 137080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:09:15,304-Speed 5971.64 samples/sec Loss 5.5644 LearningRate 0.0567 Epoch: 13 Global Step: 137090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:09:22,182-Speed 5957.32 samples/sec Loss 5.5186 LearningRate 0.0567 Epoch: 13 Global Step: 137100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:09:29,041-Speed 5972.63 samples/sec Loss 5.5393 LearningRate 0.0567 Epoch: 13 Global Step: 137110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:09:35,905-Speed 5968.78 samples/sec Loss 5.5231 LearningRate 0.0567 Epoch: 13 Global Step: 137120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:09:42,749-Speed 5986.72 samples/sec Loss 5.5456 LearningRate 0.0567 Epoch: 13 Global Step: 137130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:09:49,592-Speed 5986.57 samples/sec Loss 5.5238 LearningRate 0.0567 Epoch: 13 Global Step: 137140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:09:56,447-Speed 5976.77 samples/sec Loss 5.4931 LearningRate 0.0566 Epoch: 13 Global Step: 137150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:10:03,294-Speed 5982.86 
samples/sec Loss 5.5484 LearningRate 0.0566 Epoch: 13 Global Step: 137160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:10:10,275-Speed 5868.92 samples/sec Loss 5.5251 LearningRate 0.0566 Epoch: 13 Global Step: 137170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:10:17,146-Speed 5962.59 samples/sec Loss 5.5875 LearningRate 0.0566 Epoch: 13 Global Step: 137180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:10:24,000-Speed 5977.88 samples/sec Loss 5.5498 LearningRate 0.0566 Epoch: 13 Global Step: 137190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:10:30,863-Speed 5968.98 samples/sec Loss 5.5215 LearningRate 0.0566 Epoch: 13 Global Step: 137200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-08 23:10:37,715-Speed 5978.88 samples/sec Loss 5.5746 LearningRate 0.0565 Epoch: 13 Global Step: 137210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:10:44,581-Speed 5966.85 samples/sec Loss 5.4923 LearningRate 0.0565 Epoch: 13 Global Step: 137220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:10:51,421-Speed 5989.16 samples/sec Loss 5.5632 LearningRate 0.0565 Epoch: 13 Global Step: 137230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:10:58,262-Speed 5988.07 samples/sec Loss 5.4803 LearningRate 0.0565 Epoch: 13 Global Step: 137240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:11:05,111-Speed 5982.33 samples/sec Loss 5.5025 LearningRate 0.0565 Epoch: 13 Global Step: 137250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:11:11,972-Speed 5971.02 samples/sec Loss 5.4811 LearningRate 0.0565 Epoch: 13 Global Step: 137260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:11:18,825-Speed 5977.89 samples/sec Loss 5.5073 LearningRate 0.0564 Epoch: 13 Global Step: 137270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:11:25,685-Speed 5972.32 samples/sec Loss 5.4871 
LearningRate 0.0564 Epoch: 13 Global Step: 137280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:11:32,541-Speed 5975.13 samples/sec Loss 5.5188 LearningRate 0.0564 Epoch: 13 Global Step: 137290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:11:39,463-Speed 5918.38 samples/sec Loss 5.5042 LearningRate 0.0564 Epoch: 13 Global Step: 137300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:11:46,315-Speed 5979.07 samples/sec Loss 5.5517 LearningRate 0.0564 Epoch: 13 Global Step: 137310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:11:53,160-Speed 5984.54 samples/sec Loss 5.5431 LearningRate 0.0564 Epoch: 13 Global Step: 137320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:12:00,021-Speed 5973.02 samples/sec Loss 5.5422 LearningRate 0.0563 Epoch: 13 Global Step: 137330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:12:06,896-Speed 5959.09 samples/sec Loss 5.5113 LearningRate 0.0563 Epoch: 13 Global Step: 137340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:12:13,742-Speed 5986.63 samples/sec Loss 5.4762 LearningRate 0.0563 Epoch: 13 Global Step: 137350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:12:20,612-Speed 5962.96 samples/sec Loss 5.5565 LearningRate 0.0563 Epoch: 13 Global Step: 137360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:12:27,469-Speed 5975.71 samples/sec Loss 5.5299 LearningRate 0.0563 Epoch: 13 Global Step: 137370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:12:34,316-Speed 5982.72 samples/sec Loss 5.5296 LearningRate 0.0563 Epoch: 13 Global Step: 137380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:12:41,176-Speed 5971.49 samples/sec Loss 5.5649 LearningRate 0.0562 Epoch: 13 Global Step: 137390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:12:48,032-Speed 5976.21 samples/sec Loss 5.5123 LearningRate 0.0562 Epoch: 
13 Global Step: 137400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:12:54,896-Speed 5967.77 samples/sec Loss 5.5040 LearningRate 0.0562 Epoch: 13 Global Step: 137410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:13:01,770-Speed 5960.32 samples/sec Loss 5.4954 LearningRate 0.0562 Epoch: 13 Global Step: 137420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:13:08,623-Speed 5981.20 samples/sec Loss 5.5346 LearningRate 0.0562 Epoch: 13 Global Step: 137430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:13:15,547-Speed 5916.15 samples/sec Loss 5.5162 LearningRate 0.0562 Epoch: 13 Global Step: 137440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:13:22,422-Speed 5959.34 samples/sec Loss 5.5684 LearningRate 0.0562 Epoch: 13 Global Step: 137450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:13:29,287-Speed 5970.25 samples/sec Loss 5.4936 LearningRate 0.0561 Epoch: 13 Global Step: 137460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:13:36,142-Speed 5976.11 samples/sec Loss 5.5416 LearningRate 0.0561 Epoch: 13 Global Step: 137470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:13:43,019-Speed 5957.93 samples/sec Loss 5.5496 LearningRate 0.0561 Epoch: 13 Global Step: 137480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:13:49,875-Speed 5975.22 samples/sec Loss 5.4918 LearningRate 0.0561 Epoch: 13 Global Step: 137490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:13:56,826-Speed 5894.44 samples/sec Loss 5.4794 LearningRate 0.0561 Epoch: 13 Global Step: 137500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:03,782-Speed 5890.11 samples/sec Loss 5.5031 LearningRate 0.0561 Epoch: 13 Global Step: 137510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:10,647-Speed 5967.27 samples/sec Loss 5.5377 LearningRate 0.0560 Epoch: 13 Global Step: 137520 Fp16 
Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:17,500-Speed 5978.40 samples/sec Loss 5.5508 LearningRate 0.0560 Epoch: 13 Global Step: 137530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:24,417-Speed 5923.50 samples/sec Loss 5.4800 LearningRate 0.0560 Epoch: 13 Global Step: 137540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:31,267-Speed 5980.40 samples/sec Loss 5.4885 LearningRate 0.0560 Epoch: 13 Global Step: 137550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:38,129-Speed 5970.60 samples/sec Loss 5.5407 LearningRate 0.0560 Epoch: 13 Global Step: 137560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:44,995-Speed 5966.97 samples/sec Loss 5.4918 LearningRate 0.0560 Epoch: 13 Global Step: 137570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:51,850-Speed 5975.96 samples/sec Loss 5.5021 LearningRate 0.0559 Epoch: 13 Global Step: 137580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:14:58,728-Speed 5956.00 samples/sec Loss 5.5324 LearningRate 0.0559 Epoch: 13 Global Step: 137590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:15:05,578-Speed 5981.42 samples/sec Loss 5.4798 LearningRate 0.0559 Epoch: 13 Global Step: 137600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:15:12,454-Speed 5958.88 samples/sec Loss 5.5202 LearningRate 0.0559 Epoch: 13 Global Step: 137610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:15:19,329-Speed 5958.19 samples/sec Loss 5.5225 LearningRate 0.0559 Epoch: 13 Global Step: 137620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:15:26,191-Speed 5970.46 samples/sec Loss 5.5147 LearningRate 0.0559 Epoch: 13 Global Step: 137630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:15:33,059-Speed 5966.53 samples/sec Loss 5.4614 LearningRate 0.0558 Epoch: 13 Global Step: 137640 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-01-08 23:15:39,921-Speed 5969.90 samples/sec Loss 5.5659 LearningRate 0.0558 Epoch: 13 Global Step: 137650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:15:46,803-Speed 5952.85 samples/sec Loss 5.4834 LearningRate 0.0558 Epoch: 13 Global Step: 137660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:15:53,660-Speed 5975.09 samples/sec Loss 5.5087 LearningRate 0.0558 Epoch: 13 Global Step: 137670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:16:00,523-Speed 5969.31 samples/sec Loss 5.5114 LearningRate 0.0558 Epoch: 13 Global Step: 137680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:16:07,382-Speed 5972.87 samples/sec Loss 5.5152 LearningRate 0.0558 Epoch: 13 Global Step: 137690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:16:14,264-Speed 5955.55 samples/sec Loss 5.5361 LearningRate 0.0558 Epoch: 13 Global Step: 137700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:16:21,165-Speed 5936.69 samples/sec Loss 5.5111 LearningRate 0.0557 Epoch: 13 Global Step: 137710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:16:28,056-Speed 5945.06 samples/sec Loss 5.5301 LearningRate 0.0557 Epoch: 13 Global Step: 137720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:16:35,024-Speed 5879.47 samples/sec Loss 5.4774 LearningRate 0.0557 Epoch: 13 Global Step: 137730 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:16:41,992-Speed 5879.81 samples/sec Loss 5.5047 LearningRate 0.0557 Epoch: 13 Global Step: 137740 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:16:48,881-Speed 5947.17 samples/sec Loss 5.5611 LearningRate 0.0557 Epoch: 13 Global Step: 137750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:16:55,768-Speed 5948.21 samples/sec Loss 5.4357 LearningRate 0.0557 Epoch: 13 Global Step: 137760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 
2022-01-08 23:17:02,624-Speed 5975.50 samples/sec Loss 5.4850 LearningRate 0.0556 Epoch: 13 Global Step: 137770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:17:09,504-Speed 5954.98 samples/sec Loss 5.5293 LearningRate 0.0556 Epoch: 13 Global Step: 137780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:17:16,423-Speed 5922.01 samples/sec Loss 5.5528 LearningRate 0.0556 Epoch: 13 Global Step: 137790 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:17:23,411-Speed 5863.39 samples/sec Loss 5.4940 LearningRate 0.0556 Epoch: 13 Global Step: 137800 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-08 23:17:30,322-Speed 5927.63 samples/sec Loss 5.5057 LearningRate 0.0556 Epoch: 13 Global Step: 137810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:17:37,173-Speed 5980.49 samples/sec Loss 5.4830 LearningRate 0.0556 Epoch: 13 Global Step: 137820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:17:44,014-Speed 5988.90 samples/sec Loss 5.4952 LearningRate 0.0555 Epoch: 13 Global Step: 137830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:17:50,866-Speed 5980.67 samples/sec Loss 5.4860 LearningRate 0.0555 Epoch: 13 Global Step: 137840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:17:57,703-Speed 5991.36 samples/sec Loss 5.4950 LearningRate 0.0555 Epoch: 13 Global Step: 137850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:18:04,562-Speed 5972.75 samples/sec Loss 5.4666 LearningRate 0.0555 Epoch: 13 Global Step: 137860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:18:11,418-Speed 5976.16 samples/sec Loss 5.5241 LearningRate 0.0555 Epoch: 13 Global Step: 137870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:18:18,289-Speed 5964.68 samples/sec Loss 5.5104 LearningRate 0.0555 Epoch: 13 Global Step: 137880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 
23:18:25,146-Speed 5974.27 samples/sec Loss 5.4884 LearningRate 0.0554 Epoch: 13 Global Step: 137890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:18:31,992-Speed 5983.90 samples/sec Loss 5.5116 LearningRate 0.0554 Epoch: 13 Global Step: 137900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:18:38,822-Speed 5998.26 samples/sec Loss 5.4465 LearningRate 0.0554 Epoch: 13 Global Step: 137910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:18:45,671-Speed 5981.89 samples/sec Loss 5.5183 LearningRate 0.0554 Epoch: 13 Global Step: 137920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:18:52,517-Speed 5984.71 samples/sec Loss 5.5000 LearningRate 0.0554 Epoch: 13 Global Step: 137930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:18:59,436-Speed 5921.85 samples/sec Loss 5.4916 LearningRate 0.0554 Epoch: 13 Global Step: 137940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:19:06,285-Speed 5981.05 samples/sec Loss 5.4585 LearningRate 0.0554 Epoch: 13 Global Step: 137950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:19:13,149-Speed 5968.87 samples/sec Loss 5.4696 LearningRate 0.0553 Epoch: 13 Global Step: 137960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:19:19,993-Speed 5986.19 samples/sec Loss 5.4968 LearningRate 0.0553 Epoch: 13 Global Step: 137970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:19:26,849-Speed 5974.92 samples/sec Loss 5.4866 LearningRate 0.0553 Epoch: 13 Global Step: 137980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:19:33,707-Speed 5973.45 samples/sec Loss 5.4958 LearningRate 0.0553 Epoch: 13 Global Step: 137990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:19:40,562-Speed 5976.21 samples/sec Loss 5.4567 LearningRate 0.0553 Epoch: 13 Global Step: 138000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:19:47,413-Speed 
5979.38 samples/sec Loss 5.4752 LearningRate 0.0553 Epoch: 13 Global Step: 138010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:19:54,266-Speed 5977.96 samples/sec Loss 5.4932 LearningRate 0.0552 Epoch: 13 Global Step: 138020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:20:01,126-Speed 5971.80 samples/sec Loss 5.4376 LearningRate 0.0552 Epoch: 13 Global Step: 138030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:20:07,974-Speed 5982.56 samples/sec Loss 5.4333 LearningRate 0.0552 Epoch: 13 Global Step: 138040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:20:14,822-Speed 5984.62 samples/sec Loss 5.4673 LearningRate 0.0552 Epoch: 13 Global Step: 138050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:20:21,678-Speed 5975.38 samples/sec Loss 5.4663 LearningRate 0.0552 Epoch: 13 Global Step: 138060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:20:28,529-Speed 5979.20 samples/sec Loss 5.4977 LearningRate 0.0552 Epoch: 13 Global Step: 138070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:20:35,400-Speed 5963.42 samples/sec Loss 5.4950 LearningRate 0.0551 Epoch: 13 Global Step: 138080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:20:42,278-Speed 5956.10 samples/sec Loss 5.4801 LearningRate 0.0551 Epoch: 13 Global Step: 138090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:20:49,129-Speed 5979.56 samples/sec Loss 5.4129 LearningRate 0.0551 Epoch: 13 Global Step: 138100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:20:56,033-Speed 5934.38 samples/sec Loss 5.4875 LearningRate 0.0551 Epoch: 13 Global Step: 138110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:21:02,885-Speed 5986.90 samples/sec Loss 5.4537 LearningRate 0.0551 Epoch: 13 Global Step: 138120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:21:09,744-Speed 5972.12 samples/sec Loss 
5.4789 LearningRate 0.0551 Epoch: 13 Global Step: 138130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:21:16,608-Speed 5969.25 samples/sec Loss 5.5332 LearningRate 0.0550 Epoch: 13 Global Step: 138140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:21:23,474-Speed 5966.58 samples/sec Loss 5.4364 LearningRate 0.0550 Epoch: 13 Global Step: 138150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:21:30,318-Speed 5985.72 samples/sec Loss 5.4885 LearningRate 0.0550 Epoch: 13 Global Step: 138160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:21:37,186-Speed 5965.01 samples/sec Loss 5.4694 LearningRate 0.0550 Epoch: 13 Global Step: 138170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:21:44,046-Speed 5972.61 samples/sec Loss 5.4963 LearningRate 0.0550 Epoch: 13 Global Step: 138180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:21:50,919-Speed 5960.98 samples/sec Loss 5.5063 LearningRate 0.0550 Epoch: 13 Global Step: 138190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:21:57,772-Speed 5978.24 samples/sec Loss 5.4620 LearningRate 0.0550 Epoch: 13 Global Step: 138200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:22:04,619-Speed 5985.46 samples/sec Loss 5.4495 LearningRate 0.0549 Epoch: 13 Global Step: 138210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:22:11,472-Speed 5977.97 samples/sec Loss 5.4440 LearningRate 0.0549 Epoch: 13 Global Step: 138220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:22:18,351-Speed 5956.12 samples/sec Loss 5.4477 LearningRate 0.0549 Epoch: 13 Global Step: 138230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:22:25,225-Speed 5959.97 samples/sec Loss 5.4333 LearningRate 0.0549 Epoch: 13 Global Step: 138240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:22:32,089-Speed 5968.25 samples/sec Loss 5.4839 LearningRate 0.0549 
Epoch: 13 Global Step: 138250 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:22:38,936-Speed 5983.80 samples/sec Loss 5.4089 LearningRate 0.0549 Epoch: 13 Global Step: 138260 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:22:45,800-Speed 5971.00 samples/sec Loss 5.4749 LearningRate 0.0548 Epoch: 13 Global Step: 138270 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:22:52,657-Speed 5974.81 samples/sec Loss 5.4368 LearningRate 0.0548 Epoch: 13 Global Step: 138280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:22:59,522-Speed 5967.55 samples/sec Loss 5.4413 LearningRate 0.0548 Epoch: 13 Global Step: 138290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:23:06,390-Speed 5965.28 samples/sec Loss 5.5109 LearningRate 0.0548 Epoch: 13 Global Step: 138300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:23:13,283-Speed 5943.43 samples/sec Loss 5.4908 LearningRate 0.0548 Epoch: 13 Global Step: 138310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:23:20,128-Speed 5985.23 samples/sec Loss 5.4555 LearningRate 0.0548 Epoch: 13 Global Step: 138320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:23:26,998-Speed 5963.71 samples/sec Loss 5.4425 LearningRate 0.0547 Epoch: 13 Global Step: 138330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:23:33,850-Speed 5978.84 samples/sec Loss 5.4310 LearningRate 0.0547 Epoch: 13 Global Step: 138340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:23:40,703-Speed 5977.68 samples/sec Loss 5.4339 LearningRate 0.0547 Epoch: 13 Global Step: 138350 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-08 23:23:47,567-Speed 5969.28 samples/sec Loss 5.4607 LearningRate 0.0547 Epoch: 13 Global Step: 138360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-08 23:23:54,426-Speed 5972.12 samples/sec Loss 5.3832 LearningRate 0.0547 Epoch: 13 Global 
Step: 138370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:01,307-Speed 5954.36 samples/sec Loss 5.4194 LearningRate 0.0547 Epoch: 13 Global Step: 138380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:08,167-Speed 5971.57 samples/sec Loss 5.4502 LearningRate 0.0547 Epoch: 13 Global Step: 138390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:15,049-Speed 5952.88 samples/sec Loss 5.4836 LearningRate 0.0546 Epoch: 13 Global Step: 138400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:21,907-Speed 5974.07 samples/sec Loss 5.4378 LearningRate 0.0546 Epoch: 13 Global Step: 138410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:28,758-Speed 5980.09 samples/sec Loss 5.5087 LearningRate 0.0546 Epoch: 13 Global Step: 138420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:35,659-Speed 5936.27 samples/sec Loss 5.5057 LearningRate 0.0546 Epoch: 13 Global Step: 138430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:42,511-Speed 5979.43 samples/sec Loss 5.4530 LearningRate 0.0546 Epoch: 13 Global Step: 138440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:49,364-Speed 5980.57 samples/sec Loss 5.4651 LearningRate 0.0546 Epoch: 13 Global Step: 138450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:24:56,228-Speed 5968.74 samples/sec Loss 5.4297 LearningRate 0.0545 Epoch: 13 Global Step: 138460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:25:03,096-Speed 5965.19 samples/sec Loss 5.4420 LearningRate 0.0545 Epoch: 13 Global Step: 138470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:25:09,944-Speed 5982.05 samples/sec Loss 5.4637 LearningRate 0.0545 Epoch: 13 Global Step: 138480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:25:16,795-Speed 5980.02 samples/sec Loss 5.4873 LearningRate 0.0545 Epoch: 13 Global Step: 138490 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-01-08 23:25:23,666-Speed 5962.54 samples/sec Loss 5.4351 LearningRate 0.0545 Epoch: 13 Global Step: 138500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:25:30,516-Speed 5980.57 samples/sec Loss 5.4586 LearningRate 0.0545 Epoch: 13 Global Step: 138510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:25:37,393-Speed 5957.69 samples/sec Loss 5.4820 LearningRate 0.0544 Epoch: 13 Global Step: 138520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-08 23:25:44,270-Speed 5957.26 samples/sec Loss 5.4763 LearningRate 0.0544 Epoch: 13 Global Step: 138530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:25:51,123-Speed 5978.15 samples/sec Loss 5.4608 LearningRate 0.0544 Epoch: 13 Global Step: 138540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:25:57,993-Speed 5963.15 samples/sec Loss 5.4547 LearningRate 0.0544 Epoch: 13 Global Step: 138550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:26:04,856-Speed 5971.57 samples/sec Loss 5.4248 LearningRate 0.0544 Epoch: 13 Global Step: 138560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:26:11,727-Speed 5962.17 samples/sec Loss 5.4255 LearningRate 0.0544 Epoch: 13 Global Step: 138570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:26:18,599-Speed 5961.48 samples/sec Loss 5.4558 LearningRate 0.0544 Epoch: 13 Global Step: 138580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:26:25,459-Speed 5971.87 samples/sec Loss 5.4412 LearningRate 0.0543 Epoch: 13 Global Step: 138590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:26:32,346-Speed 5949.23 samples/sec Loss 5.4539 LearningRate 0.0543 Epoch: 13 Global Step: 138600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:26:39,208-Speed 5969.25 samples/sec Loss 5.3861 LearningRate 0.0543 Epoch: 13 Global Step: 138610 Fp16 Grad Scale: 131072 Required: 13 hours 
Training: 2022-01-08 23:26:46,061-Speed 5978.73 samples/sec Loss 5.3963 LearningRate 0.0543 Epoch: 13 Global Step: 138620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:26:52,921-Speed 5972.06 samples/sec Loss 5.4729 LearningRate 0.0543 Epoch: 13 Global Step: 138630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:26:59,784-Speed 5969.11 samples/sec Loss 5.4661 LearningRate 0.0543 Epoch: 13 Global Step: 138640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:27:06,635-Speed 5979.35 samples/sec Loss 5.4001 LearningRate 0.0542 Epoch: 13 Global Step: 138650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:27:13,509-Speed 5962.58 samples/sec Loss 5.4401 LearningRate 0.0542 Epoch: 13 Global Step: 138660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:27:20,389-Speed 5954.48 samples/sec Loss 5.4371 LearningRate 0.0542 Epoch: 13 Global Step: 138670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-08 23:27:27,232-Speed 5986.82 samples/sec Loss 5.4392 LearningRate 0.0542 Epoch: 13 Global Step: 138680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:27:34,080-Speed 5985.04 samples/sec Loss 5.3717 LearningRate 0.0542 Epoch: 13 Global Step: 138690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:27:40,933-Speed 5977.69 samples/sec Loss 5.3938 LearningRate 0.0542 Epoch: 13 Global Step: 138700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:27:47,807-Speed 5959.85 samples/sec Loss 5.4032 LearningRate 0.0541 Epoch: 13 Global Step: 138710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:27:54,664-Speed 5974.28 samples/sec Loss 5.3673 LearningRate 0.0541 Epoch: 13 Global Step: 138720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:28:01,522-Speed 5973.80 samples/sec Loss 5.3725 LearningRate 0.0541 Epoch: 13 Global Step: 138730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 
23:28:08,397-Speed 5959.18 samples/sec Loss 5.4145 LearningRate 0.0541 Epoch: 13 Global Step: 138740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:28:15,249-Speed 5979.20 samples/sec Loss 5.4473 LearningRate 0.0541 Epoch: 13 Global Step: 138750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:28:22,126-Speed 5956.97 samples/sec Loss 5.4094 LearningRate 0.0541 Epoch: 13 Global Step: 138760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:28:28,988-Speed 5971.07 samples/sec Loss 5.4523 LearningRate 0.0541 Epoch: 13 Global Step: 138770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:28:35,849-Speed 5971.20 samples/sec Loss 5.4367 LearningRate 0.0540 Epoch: 13 Global Step: 138780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:28:42,735-Speed 5949.35 samples/sec Loss 5.4092 LearningRate 0.0540 Epoch: 13 Global Step: 138790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:28:49,615-Speed 5954.55 samples/sec Loss 5.3779 LearningRate 0.0540 Epoch: 13 Global Step: 138800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:28:56,473-Speed 5974.41 samples/sec Loss 5.4365 LearningRate 0.0540 Epoch: 13 Global Step: 138810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:29:03,334-Speed 5970.75 samples/sec Loss 5.4265 LearningRate 0.0540 Epoch: 13 Global Step: 138820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:29:10,190-Speed 5975.90 samples/sec Loss 5.4517 LearningRate 0.0540 Epoch: 13 Global Step: 138830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:29:17,033-Speed 5986.09 samples/sec Loss 5.4631 LearningRate 0.0539 Epoch: 13 Global Step: 138840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:29:23,885-Speed 5978.81 samples/sec Loss 5.4402 LearningRate 0.0539 Epoch: 13 Global Step: 138850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:29:30,765-Speed 5955.65 
samples/sec Loss 5.3863 LearningRate 0.0539 Epoch: 13 Global Step: 138860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:29:37,641-Speed 5958.58 samples/sec Loss 5.4367 LearningRate 0.0539 Epoch: 13 Global Step: 138870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:29:44,503-Speed 5970.35 samples/sec Loss 5.4321 LearningRate 0.0539 Epoch: 13 Global Step: 138880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:29:51,361-Speed 5973.28 samples/sec Loss 5.4301 LearningRate 0.0539 Epoch: 13 Global Step: 138890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:29:58,217-Speed 5977.11 samples/sec Loss 5.3878 LearningRate 0.0538 Epoch: 13 Global Step: 138900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:05,061-Speed 5986.01 samples/sec Loss 5.4295 LearningRate 0.0538 Epoch: 13 Global Step: 138910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:11,908-Speed 5983.58 samples/sec Loss 5.4338 LearningRate 0.0538 Epoch: 13 Global Step: 138920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:18,758-Speed 5980.36 samples/sec Loss 5.4407 LearningRate 0.0538 Epoch: 13 Global Step: 138930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:25,614-Speed 5975.83 samples/sec Loss 5.4108 LearningRate 0.0538 Epoch: 13 Global Step: 138940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:32,476-Speed 5969.81 samples/sec Loss 5.3812 LearningRate 0.0538 Epoch: 13 Global Step: 138950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:39,346-Speed 5963.27 samples/sec Loss 5.4174 LearningRate 0.0538 Epoch: 13 Global Step: 138960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:46,209-Speed 5969.32 samples/sec Loss 5.3994 LearningRate 0.0537 Epoch: 13 Global Step: 138970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:53,073-Speed 5970.09 samples/sec Loss 
5.4352 LearningRate 0.0537 Epoch: 13 Global Step: 138980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:30:59,932-Speed 5972.33 samples/sec Loss 5.3541 LearningRate 0.0537 Epoch: 13 Global Step: 138990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:31:06,792-Speed 5972.21 samples/sec Loss 5.4022 LearningRate 0.0537 Epoch: 13 Global Step: 139000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:31:13,654-Speed 5970.07 samples/sec Loss 5.4101 LearningRate 0.0537 Epoch: 13 Global Step: 139010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:31:20,516-Speed 5971.24 samples/sec Loss 5.3809 LearningRate 0.0537 Epoch: 13 Global Step: 139020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:31:27,375-Speed 5973.18 samples/sec Loss 5.4326 LearningRate 0.0536 Epoch: 13 Global Step: 139030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-08 23:31:34,229-Speed 5979.05 samples/sec Loss 5.3650 LearningRate 0.0536 Epoch: 13 Global Step: 139040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:31:41,088-Speed 5973.19 samples/sec Loss 5.4257 LearningRate 0.0536 Epoch: 13 Global Step: 139050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:31:47,953-Speed 5967.41 samples/sec Loss 5.3792 LearningRate 0.0536 Epoch: 13 Global Step: 139060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:31:54,799-Speed 5983.46 samples/sec Loss 5.4162 LearningRate 0.0536 Epoch: 13 Global Step: 139070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:32:01,701-Speed 5935.85 samples/sec Loss 5.4407 LearningRate 0.0536 Epoch: 13 Global Step: 139080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:32:08,567-Speed 5967.01 samples/sec Loss 5.3977 LearningRate 0.0535 Epoch: 13 Global Step: 139090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:32:15,415-Speed 5982.97 samples/sec Loss 5.4271 LearningRate 
0.0535 Epoch: 13 Global Step: 139100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:32:22,266-Speed 5980.27 samples/sec Loss 5.4300 LearningRate 0.0535 Epoch: 13 Global Step: 139110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:32:29,115-Speed 5983.38 samples/sec Loss 5.3544 LearningRate 0.0535 Epoch: 13 Global Step: 139120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:32:35,980-Speed 5967.49 samples/sec Loss 5.4502 LearningRate 0.0535 Epoch: 13 Global Step: 139130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:32:42,865-Speed 5954.36 samples/sec Loss 5.4050 LearningRate 0.0535 Epoch: 13 Global Step: 139140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:32:49,716-Speed 5979.61 samples/sec Loss 5.4014 LearningRate 0.0535 Epoch: 13 Global Step: 139150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:32:56,608-Speed 5944.53 samples/sec Loss 5.3703 LearningRate 0.0534 Epoch: 13 Global Step: 139160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:33:03,465-Speed 5974.81 samples/sec Loss 5.3702 LearningRate 0.0534 Epoch: 13 Global Step: 139170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:33:10,337-Speed 5961.22 samples/sec Loss 5.3325 LearningRate 0.0534 Epoch: 13 Global Step: 139180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:33:17,203-Speed 5967.68 samples/sec Loss 5.3706 LearningRate 0.0534 Epoch: 13 Global Step: 139190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:33:24,158-Speed 5890.76 samples/sec Loss 5.3642 LearningRate 0.0534 Epoch: 13 Global Step: 139200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:33:31,077-Speed 5921.45 samples/sec Loss 5.4298 LearningRate 0.0534 Epoch: 13 Global Step: 139210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:33:37,954-Speed 5957.39 samples/sec Loss 5.4297 LearningRate 0.0533 Epoch: 13 Global Step: 
139220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:33:44,885-Speed 5911.23 samples/sec Loss 5.4391 LearningRate 0.0533 Epoch: 13 Global Step: 139230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:33:51,828-Speed 5900.44 samples/sec Loss 5.3757 LearningRate 0.0533 Epoch: 13 Global Step: 139240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:33:58,765-Speed 5905.47 samples/sec Loss 5.4109 LearningRate 0.0533 Epoch: 13 Global Step: 139250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:34:05,653-Speed 5947.72 samples/sec Loss 5.3743 LearningRate 0.0533 Epoch: 13 Global Step: 139260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:34:12,501-Speed 5982.33 samples/sec Loss 5.4015 LearningRate 0.0533 Epoch: 13 Global Step: 139270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:34:19,346-Speed 5986.32 samples/sec Loss 5.4091 LearningRate 0.0533 Epoch: 13 Global Step: 139280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:34:26,192-Speed 5984.83 samples/sec Loss 5.3568 LearningRate 0.0532 Epoch: 13 Global Step: 139290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:34:33,055-Speed 5968.89 samples/sec Loss 5.3670 LearningRate 0.0532 Epoch: 13 Global Step: 139300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:34:39,929-Speed 5960.71 samples/sec Loss 5.4473 LearningRate 0.0532 Epoch: 13 Global Step: 139310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:34:46,771-Speed 5987.04 samples/sec Loss 5.3275 LearningRate 0.0532 Epoch: 13 Global Step: 139320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:34:53,616-Speed 5985.02 samples/sec Loss 5.3281 LearningRate 0.0532 Epoch: 13 Global Step: 139330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:35:00,461-Speed 5984.71 samples/sec Loss 5.4140 LearningRate 0.0532 Epoch: 13 Global Step: 139340 Fp16 Grad Scale: 
65536 Required: 13 hours Training: 2022-01-08 23:35:07,333-Speed 5961.49 samples/sec Loss 5.3832 LearningRate 0.0531 Epoch: 13 Global Step: 139350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:35:14,188-Speed 5976.83 samples/sec Loss 5.3734 LearningRate 0.0531 Epoch: 13 Global Step: 139360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:35:21,135-Speed 5896.57 samples/sec Loss 5.4106 LearningRate 0.0531 Epoch: 13 Global Step: 139370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:35:28,099-Speed 5883.17 samples/sec Loss 5.3587 LearningRate 0.0531 Epoch: 13 Global Step: 139380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:35:34,957-Speed 5973.81 samples/sec Loss 5.3545 LearningRate 0.0531 Epoch: 13 Global Step: 139390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:35:41,813-Speed 5975.28 samples/sec Loss 5.4270 LearningRate 0.0531 Epoch: 13 Global Step: 139400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:35:48,683-Speed 5966.21 samples/sec Loss 5.3506 LearningRate 0.0530 Epoch: 13 Global Step: 139410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:35:55,537-Speed 5977.51 samples/sec Loss 5.3212 LearningRate 0.0530 Epoch: 13 Global Step: 139420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:36:02,387-Speed 5981.03 samples/sec Loss 5.3608 LearningRate 0.0530 Epoch: 13 Global Step: 139430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:36:09,244-Speed 5975.02 samples/sec Loss 5.3918 LearningRate 0.0530 Epoch: 13 Global Step: 139440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:36:16,118-Speed 5959.40 samples/sec Loss 5.3384 LearningRate 0.0530 Epoch: 13 Global Step: 139450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:36:22,954-Speed 5993.17 samples/sec Loss 5.4349 LearningRate 0.0530 Epoch: 13 Global Step: 139460 Fp16 Grad Scale: 65536 Required: 13 
hours Training: 2022-01-08 23:36:29,804-Speed 5980.86 samples/sec Loss 5.3388 LearningRate 0.0530 Epoch: 13 Global Step: 139470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:36:36,659-Speed 5976.23 samples/sec Loss 5.4588 LearningRate 0.0529 Epoch: 13 Global Step: 139480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:36:43,554-Speed 5942.15 samples/sec Loss 5.3177 LearningRate 0.0529 Epoch: 13 Global Step: 139490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:36:50,442-Speed 5949.43 samples/sec Loss 5.3142 LearningRate 0.0529 Epoch: 13 Global Step: 139500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:36:57,291-Speed 5981.18 samples/sec Loss 5.4157 LearningRate 0.0529 Epoch: 13 Global Step: 139510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:37:04,146-Speed 5977.75 samples/sec Loss 5.3368 LearningRate 0.0529 Epoch: 13 Global Step: 139520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:37:11,015-Speed 5968.92 samples/sec Loss 5.3419 LearningRate 0.0529 Epoch: 13 Global Step: 139530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:37:17,880-Speed 5967.71 samples/sec Loss 5.3331 LearningRate 0.0528 Epoch: 13 Global Step: 139540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:37:24,755-Speed 5959.29 samples/sec Loss 5.3234 LearningRate 0.0528 Epoch: 13 Global Step: 139550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:37:31,618-Speed 5971.02 samples/sec Loss 5.3975 LearningRate 0.0528 Epoch: 13 Global Step: 139560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:37:38,481-Speed 5969.28 samples/sec Loss 5.3791 LearningRate 0.0528 Epoch: 13 Global Step: 139570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:37:45,336-Speed 5976.64 samples/sec Loss 5.3573 LearningRate 0.0528 Epoch: 13 Global Step: 139580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 
23:37:52,199-Speed 5971.77 samples/sec Loss 5.3322 LearningRate 0.0528 Epoch: 13 Global Step: 139590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:37:59,054-Speed 5975.73 samples/sec Loss 5.3910 LearningRate 0.0528 Epoch: 13 Global Step: 139600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:38:05,901-Speed 5983.51 samples/sec Loss 5.3900 LearningRate 0.0527 Epoch: 13 Global Step: 139610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:38:12,776-Speed 5960.54 samples/sec Loss 5.3615 LearningRate 0.0527 Epoch: 13 Global Step: 139620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:38:19,625-Speed 5981.08 samples/sec Loss 5.3294 LearningRate 0.0527 Epoch: 13 Global Step: 139630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:38:26,479-Speed 5977.53 samples/sec Loss 5.3832 LearningRate 0.0527 Epoch: 13 Global Step: 139640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:38:33,327-Speed 5982.71 samples/sec Loss 5.4077 LearningRate 0.0527 Epoch: 13 Global Step: 139650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:38:40,180-Speed 5977.53 samples/sec Loss 5.3803 LearningRate 0.0527 Epoch: 13 Global Step: 139660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:38:47,074-Speed 5942.54 samples/sec Loss 5.3200 LearningRate 0.0526 Epoch: 13 Global Step: 139670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:38:53,932-Speed 5974.23 samples/sec Loss 5.3792 LearningRate 0.0526 Epoch: 13 Global Step: 139680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:39:00,804-Speed 5961.52 samples/sec Loss 5.3463 LearningRate 0.0526 Epoch: 13 Global Step: 139690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:39:07,665-Speed 5971.57 samples/sec Loss 5.3646 LearningRate 0.0526 Epoch: 13 Global Step: 139700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:39:14,523-Speed 5973.69 
samples/sec Loss 5.3677 LearningRate 0.0526 Epoch: 13 Global Step: 139710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:39:21,376-Speed 5978.19 samples/sec Loss 5.3996 LearningRate 0.0526 Epoch: 13 Global Step: 139720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:39:28,256-Speed 5954.80 samples/sec Loss 5.3847 LearningRate 0.0526 Epoch: 13 Global Step: 139730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:39:35,116-Speed 5971.95 samples/sec Loss 5.3318 LearningRate 0.0525 Epoch: 13 Global Step: 139740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:39:41,963-Speed 5983.07 samples/sec Loss 5.3876 LearningRate 0.0525 Epoch: 13 Global Step: 139750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:39:48,899-Speed 5907.12 samples/sec Loss 5.3668 LearningRate 0.0525 Epoch: 13 Global Step: 139760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:39:55,798-Speed 5937.24 samples/sec Loss 5.3542 LearningRate 0.0525 Epoch: 13 Global Step: 139770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:40:02,650-Speed 5979.47 samples/sec Loss 5.4074 LearningRate 0.0525 Epoch: 13 Global Step: 139780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:40:09,494-Speed 5985.96 samples/sec Loss 5.3384 LearningRate 0.0525 Epoch: 13 Global Step: 139790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:40:16,370-Speed 5958.37 samples/sec Loss 5.3522 LearningRate 0.0524 Epoch: 13 Global Step: 139800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:40:23,228-Speed 5973.55 samples/sec Loss 5.3781 LearningRate 0.0524 Epoch: 13 Global Step: 139810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:40:30,084-Speed 5975.83 samples/sec Loss 5.3198 LearningRate 0.0524 Epoch: 13 Global Step: 139820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:40:36,930-Speed 5984.62 samples/sec Loss 5.3434 
LearningRate 0.0524 Epoch: 13 Global Step: 139830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:40:43,791-Speed 5970.82 samples/sec Loss 5.3157 LearningRate 0.0524 Epoch: 13 Global Step: 139840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:40:50,633-Speed 5987.58 samples/sec Loss 5.3077 LearningRate 0.0524 Epoch: 13 Global Step: 139850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:40:57,469-Speed 5992.60 samples/sec Loss 5.3886 LearningRate 0.0523 Epoch: 13 Global Step: 139860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:04,339-Speed 5963.20 samples/sec Loss 5.3801 LearningRate 0.0523 Epoch: 13 Global Step: 139870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:11,192-Speed 5978.42 samples/sec Loss 5.3204 LearningRate 0.0523 Epoch: 13 Global Step: 139880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:18,064-Speed 5961.66 samples/sec Loss 5.3717 LearningRate 0.0523 Epoch: 13 Global Step: 139890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:24,927-Speed 5969.20 samples/sec Loss 5.3243 LearningRate 0.0523 Epoch: 13 Global Step: 139900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:31,775-Speed 5983.09 samples/sec Loss 5.3712 LearningRate 0.0523 Epoch: 13 Global Step: 139910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:38,631-Speed 5977.08 samples/sec Loss 5.2880 LearningRate 0.0523 Epoch: 13 Global Step: 139920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:45,483-Speed 5977.98 samples/sec Loss 5.3552 LearningRate 0.0522 Epoch: 13 Global Step: 139930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:52,344-Speed 5971.50 samples/sec Loss 5.3437 LearningRate 0.0522 Epoch: 13 Global Step: 139940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:41:59,190-Speed 5984.10 samples/sec Loss 5.4130 LearningRate 0.0522 Epoch: 
13 Global Step: 139950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:42:06,039-Speed 5981.07 samples/sec Loss 5.3555 LearningRate 0.0522 Epoch: 13 Global Step: 139960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:42:12,890-Speed 5979.86 samples/sec Loss 5.3211 LearningRate 0.0522 Epoch: 13 Global Step: 139970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:42:19,742-Speed 5979.02 samples/sec Loss 5.3202 LearningRate 0.0522 Epoch: 13 Global Step: 139980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:42:26,610-Speed 5965.11 samples/sec Loss 5.3361 LearningRate 0.0521 Epoch: 13 Global Step: 139990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:42:33,486-Speed 5958.47 samples/sec Loss 5.3189 LearningRate 0.0521 Epoch: 13 Global Step: 140000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:43:00,327-[lfw][140000]XNorm: 22.998499 Training: 2022-01-08 23:43:00,328-[lfw][140000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-01-08 23:43:00,328-[lfw][140000]Accuracy-Highest: 0.99783 Training: 2022-01-08 23:43:31,438-[cfp_fp][140000]XNorm: 20.309371 Training: 2022-01-08 23:43:31,439-[cfp_fp][140000]Accuracy-Flip: 0.98714+-0.00723 Training: 2022-01-08 23:43:31,440-[cfp_fp][140000]Accuracy-Highest: 0.98714 Training: 2022-01-08 23:43:58,303-[agedb_30][140000]XNorm: 22.815771 Training: 2022-01-08 23:43:58,304-[agedb_30][140000]Accuracy-Flip: 0.97617+-0.00619 Training: 2022-01-08 23:43:58,304-[agedb_30][140000]Accuracy-Highest: 0.97667 Training: 2022-01-08 23:44:05,167-Speed 446.77 samples/sec Loss 5.3102 LearningRate 0.0521 Epoch: 13 Global Step: 140010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:44:12,006-Speed 5990.44 samples/sec Loss 5.3503 LearningRate 0.0521 Epoch: 13 Global Step: 140020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:44:18,831-Speed 6002.38 samples/sec Loss 5.3614 LearningRate 0.0521 Epoch: 13 Global 
Step: 140030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:44:25,681-Speed 5981.88 samples/sec Loss 5.3616 LearningRate 0.0521 Epoch: 13 Global Step: 140040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:44:32,532-Speed 5979.83 samples/sec Loss 5.3080 LearningRate 0.0521 Epoch: 13 Global Step: 140050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:44:39,400-Speed 5965.47 samples/sec Loss 5.2886 LearningRate 0.0520 Epoch: 13 Global Step: 140060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:44:46,255-Speed 5976.09 samples/sec Loss 5.3346 LearningRate 0.0520 Epoch: 13 Global Step: 140070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:44:53,122-Speed 5965.46 samples/sec Loss 5.2703 LearningRate 0.0520 Epoch: 13 Global Step: 140080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:44:59,994-Speed 5964.15 samples/sec Loss 5.3270 LearningRate 0.0520 Epoch: 13 Global Step: 140090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:45:06,846-Speed 5980.18 samples/sec Loss 5.3386 LearningRate 0.0520 Epoch: 13 Global Step: 140100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:45:13,690-Speed 5985.05 samples/sec Loss 5.3414 LearningRate 0.0520 Epoch: 13 Global Step: 140110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:45:20,528-Speed 5991.16 samples/sec Loss 5.3389 LearningRate 0.0519 Epoch: 13 Global Step: 140120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:45:27,373-Speed 5985.30 samples/sec Loss 5.3514 LearningRate 0.0519 Epoch: 13 Global Step: 140130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:45:34,213-Speed 5989.34 samples/sec Loss 5.3318 LearningRate 0.0519 Epoch: 13 Global Step: 140140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:45:41,055-Speed 5988.17 samples/sec Loss 5.3166 LearningRate 0.0519 Epoch: 13 Global Step: 140150 Fp16 Grad Scale: 
131072 Required: 13 hours Training: 2022-01-08 23:45:47,895-Speed 5989.45 samples/sec Loss 5.3065 LearningRate 0.0519 Epoch: 13 Global Step: 140160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:45:54,761-Speed 5966.84 samples/sec Loss 5.3466 LearningRate 0.0519 Epoch: 13 Global Step: 140170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:46:01,597-Speed 5993.18 samples/sec Loss 5.2923 LearningRate 0.0519 Epoch: 13 Global Step: 140180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:46:08,440-Speed 5987.66 samples/sec Loss 5.3818 LearningRate 0.0518 Epoch: 13 Global Step: 140190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:46:15,290-Speed 5980.21 samples/sec Loss 5.2748 LearningRate 0.0518 Epoch: 13 Global Step: 140200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:46:22,158-Speed 5965.94 samples/sec Loss 5.3494 LearningRate 0.0518 Epoch: 13 Global Step: 140210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:46:29,042-Speed 5951.93 samples/sec Loss 5.3668 LearningRate 0.0518 Epoch: 13 Global Step: 140220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:46:35,893-Speed 5979.97 samples/sec Loss 5.3005 LearningRate 0.0518 Epoch: 13 Global Step: 140230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:46:42,789-Speed 5940.46 samples/sec Loss 5.3458 LearningRate 0.0518 Epoch: 13 Global Step: 140240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:46:49,636-Speed 5984.30 samples/sec Loss 5.3346 LearningRate 0.0517 Epoch: 13 Global Step: 140250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:46:56,499-Speed 5969.29 samples/sec Loss 5.3142 LearningRate 0.0517 Epoch: 13 Global Step: 140260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:47:03,371-Speed 5964.15 samples/sec Loss 5.3459 LearningRate 0.0517 Epoch: 13 Global Step: 140270 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-01-08 23:47:10,220-Speed 5981.34 samples/sec Loss 5.3535 LearningRate 0.0517 Epoch: 13 Global Step: 140280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:47:17,081-Speed 5971.01 samples/sec Loss 5.3400 LearningRate 0.0517 Epoch: 13 Global Step: 140290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:47:23,938-Speed 5974.68 samples/sec Loss 5.3285 LearningRate 0.0517 Epoch: 13 Global Step: 140300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:47:30,774-Speed 5992.76 samples/sec Loss 5.4060 LearningRate 0.0517 Epoch: 13 Global Step: 140310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:47:37,612-Speed 5991.35 samples/sec Loss 5.3341 LearningRate 0.0516 Epoch: 13 Global Step: 140320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:47:44,481-Speed 5964.43 samples/sec Loss 5.3660 LearningRate 0.0516 Epoch: 13 Global Step: 140330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:47:51,336-Speed 5976.82 samples/sec Loss 5.2959 LearningRate 0.0516 Epoch: 13 Global Step: 140340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:47:58,186-Speed 5980.66 samples/sec Loss 5.3496 LearningRate 0.0516 Epoch: 13 Global Step: 140350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:48:05,029-Speed 5986.83 samples/sec Loss 5.3271 LearningRate 0.0516 Epoch: 13 Global Step: 140360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:48:11,917-Speed 5948.25 samples/sec Loss 5.2906 LearningRate 0.0516 Epoch: 13 Global Step: 140370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:48:18,764-Speed 5983.06 samples/sec Loss 5.3055 LearningRate 0.0515 Epoch: 13 Global Step: 140380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:48:25,611-Speed 5983.10 samples/sec Loss 5.3511 LearningRate 0.0515 Epoch: 13 Global Step: 140390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 
23:48:32,477-Speed 5966.97 samples/sec Loss 5.2542 LearningRate 0.0515 Epoch: 13 Global Step: 140400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:48:39,339-Speed 5970.43 samples/sec Loss 5.3353 LearningRate 0.0515 Epoch: 13 Global Step: 140410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:48:46,201-Speed 5970.61 samples/sec Loss 5.3594 LearningRate 0.0515 Epoch: 13 Global Step: 140420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:48:53,050-Speed 5981.15 samples/sec Loss 5.3448 LearningRate 0.0515 Epoch: 13 Global Step: 140430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:48:59,914-Speed 5968.46 samples/sec Loss 5.3314 LearningRate 0.0515 Epoch: 13 Global Step: 140440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:49:06,777-Speed 5969.91 samples/sec Loss 5.3631 LearningRate 0.0514 Epoch: 13 Global Step: 140450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:49:13,626-Speed 5981.86 samples/sec Loss 5.2946 LearningRate 0.0514 Epoch: 13 Global Step: 140460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:49:20,481-Speed 5975.89 samples/sec Loss 5.3450 LearningRate 0.0514 Epoch: 13 Global Step: 140470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:49:27,359-Speed 5958.81 samples/sec Loss 5.3056 LearningRate 0.0514 Epoch: 13 Global Step: 140480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:49:34,208-Speed 5981.71 samples/sec Loss 5.3392 LearningRate 0.0514 Epoch: 13 Global Step: 140490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:49:41,059-Speed 5979.47 samples/sec Loss 5.2829 LearningRate 0.0514 Epoch: 13 Global Step: 140500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:49:47,922-Speed 5971.46 samples/sec Loss 5.3030 LearningRate 0.0513 Epoch: 13 Global Step: 140510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:49:54,791-Speed 5963.72 
samples/sec Loss 5.2818 LearningRate 0.0513 Epoch: 13 Global Step: 140520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:50:01,641-Speed 5980.74 samples/sec Loss 5.3018 LearningRate 0.0513 Epoch: 13 Global Step: 140530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-08 23:50:08,492-Speed 5980.45 samples/sec Loss 5.2646 LearningRate 0.0513 Epoch: 13 Global Step: 140540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:50:15,345-Speed 5978.18 samples/sec Loss 5.2615 LearningRate 0.0513 Epoch: 13 Global Step: 140550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:50:22,186-Speed 5988.61 samples/sec Loss 5.2705 LearningRate 0.0513 Epoch: 13 Global Step: 140560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:50:29,035-Speed 5981.02 samples/sec Loss 5.2738 LearningRate 0.0513 Epoch: 13 Global Step: 140570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:50:35,884-Speed 5982.20 samples/sec Loss 5.3542 LearningRate 0.0512 Epoch: 13 Global Step: 140580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:50:42,743-Speed 5972.76 samples/sec Loss 5.3065 LearningRate 0.0512 Epoch: 13 Global Step: 140590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:50:49,598-Speed 5976.49 samples/sec Loss 5.2955 LearningRate 0.0512 Epoch: 13 Global Step: 140600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:50:56,455-Speed 5974.90 samples/sec Loss 5.2737 LearningRate 0.0512 Epoch: 13 Global Step: 140610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:51:03,304-Speed 5981.04 samples/sec Loss 5.3223 LearningRate 0.0512 Epoch: 13 Global Step: 140620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:51:10,151-Speed 5984.00 samples/sec Loss 5.2982 LearningRate 0.0512 Epoch: 13 Global Step: 140630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:51:16,997-Speed 5984.21 samples/sec Loss 5.2935 
LearningRate 0.0511 Epoch: 13 Global Step: 140640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:51:23,865-Speed 5964.77 samples/sec Loss 5.2896 LearningRate 0.0511 Epoch: 13 Global Step: 140650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:51:30,725-Speed 5971.43 samples/sec Loss 5.2731 LearningRate 0.0511 Epoch: 13 Global Step: 140660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:51:37,603-Speed 5959.82 samples/sec Loss 5.2702 LearningRate 0.0511 Epoch: 13 Global Step: 140670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:51:44,474-Speed 5961.99 samples/sec Loss 5.3142 LearningRate 0.0511 Epoch: 13 Global Step: 140680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:51:51,383-Speed 5929.60 samples/sec Loss 5.2789 LearningRate 0.0511 Epoch: 13 Global Step: 140690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:51:58,251-Speed 5965.69 samples/sec Loss 5.2793 LearningRate 0.0511 Epoch: 13 Global Step: 140700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:52:05,110-Speed 5972.60 samples/sec Loss 5.2551 LearningRate 0.0510 Epoch: 13 Global Step: 140710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:52:11,964-Speed 5976.88 samples/sec Loss 5.2220 LearningRate 0.0510 Epoch: 13 Global Step: 140720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:52:18,835-Speed 5962.86 samples/sec Loss 5.2684 LearningRate 0.0510 Epoch: 13 Global Step: 140730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:52:25,711-Speed 5957.96 samples/sec Loss 5.2904 LearningRate 0.0510 Epoch: 13 Global Step: 140740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:52:32,582-Speed 5962.40 samples/sec Loss 5.3000 LearningRate 0.0510 Epoch: 13 Global Step: 140750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:52:39,451-Speed 5964.69 samples/sec Loss 5.2840 LearningRate 0.0510 
Epoch: 13 Global Step: 140760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:52:46,299-Speed 5981.86 samples/sec Loss 5.2194 LearningRate 0.0509 Epoch: 13 Global Step: 140770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:52:53,159-Speed 5972.26 samples/sec Loss 5.3279 LearningRate 0.0509 Epoch: 13 Global Step: 140780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:00,059-Speed 5937.58 samples/sec Loss 5.2789 LearningRate 0.0509 Epoch: 13 Global Step: 140790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:06,906-Speed 5983.73 samples/sec Loss 5.2908 LearningRate 0.0509 Epoch: 13 Global Step: 140800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:13,762-Speed 5974.91 samples/sec Loss 5.3056 LearningRate 0.0509 Epoch: 13 Global Step: 140810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:20,625-Speed 5969.99 samples/sec Loss 5.2926 LearningRate 0.0509 Epoch: 13 Global Step: 140820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:27,480-Speed 5975.59 samples/sec Loss 5.2335 LearningRate 0.0509 Epoch: 13 Global Step: 140830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:34,332-Speed 5978.94 samples/sec Loss 5.1916 LearningRate 0.0508 Epoch: 13 Global Step: 140840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:41,193-Speed 5970.97 samples/sec Loss 5.2631 LearningRate 0.0508 Epoch: 13 Global Step: 140850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:48,057-Speed 5968.51 samples/sec Loss 5.2574 LearningRate 0.0508 Epoch: 13 Global Step: 140860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:53:54,913-Speed 5976.27 samples/sec Loss 5.2583 LearningRate 0.0508 Epoch: 13 Global Step: 140870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:01,766-Speed 5977.93 samples/sec Loss 5.2988 LearningRate 0.0508 Epoch: 13 Global Step: 140880 
Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:08,661-Speed 5941.78 samples/sec Loss 5.2815 LearningRate 0.0508 Epoch: 13 Global Step: 140890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:15,531-Speed 5963.46 samples/sec Loss 5.2625 LearningRate 0.0507 Epoch: 13 Global Step: 140900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:22,386-Speed 5976.35 samples/sec Loss 5.3321 LearningRate 0.0507 Epoch: 13 Global Step: 140910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:29,282-Speed 5941.10 samples/sec Loss 5.3163 LearningRate 0.0507 Epoch: 13 Global Step: 140920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:36,138-Speed 5974.85 samples/sec Loss 5.3076 LearningRate 0.0507 Epoch: 13 Global Step: 140930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:42,994-Speed 5975.98 samples/sec Loss 5.2777 LearningRate 0.0507 Epoch: 13 Global Step: 140940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:49,944-Speed 5894.18 samples/sec Loss 5.2566 LearningRate 0.0507 Epoch: 13 Global Step: 140950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:54:56,796-Speed 5978.77 samples/sec Loss 5.2275 LearningRate 0.0507 Epoch: 13 Global Step: 140960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:55:03,671-Speed 5959.49 samples/sec Loss 5.2508 LearningRate 0.0506 Epoch: 13 Global Step: 140970 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-08 23:55:10,518-Speed 5983.45 samples/sec Loss 5.2761 LearningRate 0.0506 Epoch: 13 Global Step: 140980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:55:17,385-Speed 5965.68 samples/sec Loss 5.2871 LearningRate 0.0506 Epoch: 13 Global Step: 140990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:55:24,251-Speed 5966.76 samples/sec Loss 5.2794 LearningRate 0.0506 Epoch: 13 Global Step: 141000 Fp16 Grad Scale: 
131072 Required: 13 hours Training: 2022-01-08 23:55:31,107-Speed 5975.59 samples/sec Loss 5.2565 LearningRate 0.0506 Epoch: 13 Global Step: 141010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:55:37,964-Speed 5975.27 samples/sec Loss 5.2778 LearningRate 0.0506 Epoch: 13 Global Step: 141020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:55:44,832-Speed 5965.42 samples/sec Loss 5.2239 LearningRate 0.0506 Epoch: 13 Global Step: 141030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:55:51,710-Speed 5956.14 samples/sec Loss 5.2176 LearningRate 0.0505 Epoch: 13 Global Step: 141040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:55:58,588-Speed 5956.18 samples/sec Loss 5.2793 LearningRate 0.0505 Epoch: 13 Global Step: 141050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:56:05,440-Speed 5978.70 samples/sec Loss 5.2512 LearningRate 0.0505 Epoch: 13 Global Step: 141060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:56:12,298-Speed 5973.99 samples/sec Loss 5.2430 LearningRate 0.0505 Epoch: 13 Global Step: 141070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:56:19,157-Speed 5971.80 samples/sec Loss 5.2519 LearningRate 0.0505 Epoch: 13 Global Step: 141080 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-08 23:56:26,028-Speed 5963.69 samples/sec Loss 5.2518 LearningRate 0.0505 Epoch: 13 Global Step: 141090 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-08 23:56:32,893-Speed 5967.56 samples/sec Loss 5.2718 LearningRate 0.0504 Epoch: 13 Global Step: 141100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-08 23:56:39,736-Speed 5987.13 samples/sec Loss 5.2220 LearningRate 0.0504 Epoch: 13 Global Step: 141110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:56:46,596-Speed 5971.91 samples/sec Loss 5.2406 LearningRate 0.0504 Epoch: 13 Global Step: 141120 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-01-08 23:56:53,458-Speed 5969.71 samples/sec Loss 5.2650 LearningRate 0.0504 Epoch: 13 Global Step: 141130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:57:00,318-Speed 5972.34 samples/sec Loss 5.3432 LearningRate 0.0504 Epoch: 13 Global Step: 141140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:57:07,176-Speed 5974.12 samples/sec Loss 5.2869 LearningRate 0.0504 Epoch: 13 Global Step: 141150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:57:14,030-Speed 5976.41 samples/sec Loss 5.2305 LearningRate 0.0504 Epoch: 13 Global Step: 141160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:57:20,901-Speed 5962.79 samples/sec Loss 5.2621 LearningRate 0.0503 Epoch: 13 Global Step: 141170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:57:27,775-Speed 5962.44 samples/sec Loss 5.3123 LearningRate 0.0503 Epoch: 13 Global Step: 141180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:57:34,650-Speed 5958.19 samples/sec Loss 5.2036 LearningRate 0.0503 Epoch: 13 Global Step: 141190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:57:41,504-Speed 5977.23 samples/sec Loss 5.2471 LearningRate 0.0503 Epoch: 13 Global Step: 141200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:57:48,369-Speed 5968.47 samples/sec Loss 5.2750 LearningRate 0.0503 Epoch: 13 Global Step: 141210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:57:55,230-Speed 5970.69 samples/sec Loss 5.2633 LearningRate 0.0503 Epoch: 13 Global Step: 141220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:58:02,091-Speed 5971.52 samples/sec Loss 5.2882 LearningRate 0.0502 Epoch: 13 Global Step: 141230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-08 23:58:08,984-Speed 5943.75 samples/sec Loss 5.2538 LearningRate 0.0502 Epoch: 13 Global Step: 141240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 
23:58:15,839-Speed 5975.86 samples/sec Loss 5.2974 LearningRate 0.0502 Epoch: 13 Global Step: 141250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:58:22,723-Speed 5951.73 samples/sec Loss 5.2728 LearningRate 0.0502 Epoch: 13 Global Step: 141260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:58:29,592-Speed 5965.56 samples/sec Loss 5.2458 LearningRate 0.0502 Epoch: 13 Global Step: 141270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:58:36,472-Speed 5954.56 samples/sec Loss 5.2547 LearningRate 0.0502 Epoch: 13 Global Step: 141280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:58:43,337-Speed 5968.76 samples/sec Loss 5.2766 LearningRate 0.0502 Epoch: 13 Global Step: 141290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:58:50,223-Speed 5950.29 samples/sec Loss 5.2292 LearningRate 0.0501 Epoch: 13 Global Step: 141300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:58:57,091-Speed 5964.55 samples/sec Loss 5.2340 LearningRate 0.0501 Epoch: 13 Global Step: 141310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:03,946-Speed 5976.27 samples/sec Loss 5.3079 LearningRate 0.0501 Epoch: 13 Global Step: 141320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:10,823-Speed 5957.70 samples/sec Loss 5.2585 LearningRate 0.0501 Epoch: 13 Global Step: 141330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:17,684-Speed 5971.84 samples/sec Loss 5.2514 LearningRate 0.0501 Epoch: 13 Global Step: 141340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:24,641-Speed 5888.51 samples/sec Loss 5.2961 LearningRate 0.0501 Epoch: 13 Global Step: 141350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:31,626-Speed 5865.74 samples/sec Loss 5.1953 LearningRate 0.0500 Epoch: 13 Global Step: 141360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:38,542-Speed 
5924.17 samples/sec Loss 5.2300 LearningRate 0.0500 Epoch: 13 Global Step: 141370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:45,405-Speed 5969.14 samples/sec Loss 5.2094 LearningRate 0.0500 Epoch: 13 Global Step: 141380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:52,262-Speed 5974.91 samples/sec Loss 5.2897 LearningRate 0.0500 Epoch: 13 Global Step: 141390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-08 23:59:59,124-Speed 5970.12 samples/sec Loss 5.2495 LearningRate 0.0500 Epoch: 13 Global Step: 141400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:00:05,996-Speed 5961.64 samples/sec Loss 5.2824 LearningRate 0.0500 Epoch: 13 Global Step: 141410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:00:12,871-Speed 5958.85 samples/sec Loss 5.2707 LearningRate 0.0500 Epoch: 13 Global Step: 141420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:00:19,725-Speed 5977.02 samples/sec Loss 5.2677 LearningRate 0.0499 Epoch: 13 Global Step: 141430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:00:26,576-Speed 5979.61 samples/sec Loss 5.2034 LearningRate 0.0499 Epoch: 13 Global Step: 141440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-09 00:00:33,435-Speed 5973.00 samples/sec Loss 5.2932 LearningRate 0.0499 Epoch: 13 Global Step: 141450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:00:40,294-Speed 5971.87 samples/sec Loss 5.2433 LearningRate 0.0499 Epoch: 13 Global Step: 141460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:00:47,178-Speed 5953.77 samples/sec Loss 5.2318 LearningRate 0.0499 Epoch: 13 Global Step: 141470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:00:54,037-Speed 5972.88 samples/sec Loss 5.2333 LearningRate 0.0499 Epoch: 13 Global Step: 141480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:00,899-Speed 5969.51 samples/sec 
Loss 5.2278 LearningRate 0.0499 Epoch: 13 Global Step: 141490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:07,759-Speed 5971.92 samples/sec Loss 5.2497 LearningRate 0.0498 Epoch: 13 Global Step: 141500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:14,614-Speed 5977.43 samples/sec Loss 5.1935 LearningRate 0.0498 Epoch: 13 Global Step: 141510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:21,475-Speed 5970.58 samples/sec Loss 5.2167 LearningRate 0.0498 Epoch: 13 Global Step: 141520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:28,383-Speed 5930.64 samples/sec Loss 5.2549 LearningRate 0.0498 Epoch: 13 Global Step: 141530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:35,234-Speed 5979.46 samples/sec Loss 5.2235 LearningRate 0.0498 Epoch: 13 Global Step: 141540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:42,088-Speed 5977.53 samples/sec Loss 5.2292 LearningRate 0.0498 Epoch: 13 Global Step: 141550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:48,958-Speed 5962.99 samples/sec Loss 5.2009 LearningRate 0.0497 Epoch: 13 Global Step: 141560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:01:55,836-Speed 5956.33 samples/sec Loss 5.2181 LearningRate 0.0497 Epoch: 13 Global Step: 141570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:02:02,707-Speed 5962.34 samples/sec Loss 5.1966 LearningRate 0.0497 Epoch: 13 Global Step: 141580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:02:09,593-Speed 5949.37 samples/sec Loss 5.2426 LearningRate 0.0497 Epoch: 13 Global Step: 141590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:02:16,446-Speed 5978.42 samples/sec Loss 5.3107 LearningRate 0.0497 Epoch: 13 Global Step: 141600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:02:23,296-Speed 5981.00 samples/sec Loss 5.2253 
LearningRate 0.0497 Epoch: 13 Global Step: 141610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:02:30,178-Speed 5952.55 samples/sec Loss 5.2015 LearningRate 0.0497 Epoch: 13 Global Step: 141620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:02:37,064-Speed 5950.02 samples/sec Loss 5.2053 LearningRate 0.0496 Epoch: 13 Global Step: 141630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:02:43,926-Speed 5970.40 samples/sec Loss 5.2309 LearningRate 0.0496 Epoch: 13 Global Step: 141640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:02:50,816-Speed 5947.52 samples/sec Loss 5.2460 LearningRate 0.0496 Epoch: 13 Global Step: 141650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:02:57,680-Speed 5968.80 samples/sec Loss 5.2833 LearningRate 0.0496 Epoch: 13 Global Step: 141660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:03:04,576-Speed 5940.78 samples/sec Loss 5.2006 LearningRate 0.0496 Epoch: 13 Global Step: 141670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:03:11,450-Speed 5959.55 samples/sec Loss 5.2217 LearningRate 0.0496 Epoch: 13 Global Step: 141680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:03:18,313-Speed 5970.08 samples/sec Loss 5.2058 LearningRate 0.0495 Epoch: 13 Global Step: 141690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:03:25,164-Speed 5979.30 samples/sec Loss 5.2405 LearningRate 0.0495 Epoch: 13 Global Step: 141700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:03:32,119-Speed 5891.53 samples/sec Loss 5.2325 LearningRate 0.0495 Epoch: 13 Global Step: 141710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:03:39,086-Speed 5879.71 samples/sec Loss 5.2103 LearningRate 0.0495 Epoch: 13 Global Step: 141720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:03:46,042-Speed 5889.93 samples/sec Loss 5.1875 LearningRate 0.0495 Epoch: 13 
Global Step: 141730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:03:52,999-Speed 5889.13 samples/sec Loss 5.2424 LearningRate 0.0495 Epoch: 13 Global Step: 141740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:03:59,874-Speed 5959.36 samples/sec Loss 5.2147 LearningRate 0.0495 Epoch: 13 Global Step: 141750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:04:06,768-Speed 5941.79 samples/sec Loss 5.2162 LearningRate 0.0494 Epoch: 13 Global Step: 141760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:04:13,637-Speed 5964.06 samples/sec Loss 5.2237 LearningRate 0.0494 Epoch: 13 Global Step: 141770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:04:20,510-Speed 5961.01 samples/sec Loss 5.2159 LearningRate 0.0494 Epoch: 13 Global Step: 141780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:04:27,363-Speed 5977.83 samples/sec Loss 5.2605 LearningRate 0.0494 Epoch: 13 Global Step: 141790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:04:34,219-Speed 5974.57 samples/sec Loss 5.2199 LearningRate 0.0494 Epoch: 13 Global Step: 141800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:04:41,077-Speed 5973.88 samples/sec Loss 5.1932 LearningRate 0.0494 Epoch: 13 Global Step: 141810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:04:47,951-Speed 5959.66 samples/sec Loss 5.2810 LearningRate 0.0494 Epoch: 13 Global Step: 141820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:04:54,800-Speed 5981.94 samples/sec Loss 5.2504 LearningRate 0.0493 Epoch: 13 Global Step: 141830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:01,658-Speed 5973.81 samples/sec Loss 5.2195 LearningRate 0.0493 Epoch: 13 Global Step: 141840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:08,514-Speed 5974.40 samples/sec Loss 5.1869 LearningRate 0.0493 Epoch: 13 Global Step: 141850 Fp16 
Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:15,380-Speed 5967.50 samples/sec Loss 5.2641 LearningRate 0.0493 Epoch: 13 Global Step: 141860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:22,281-Speed 5936.61 samples/sec Loss 5.2022 LearningRate 0.0493 Epoch: 13 Global Step: 141870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:29,129-Speed 5982.60 samples/sec Loss 5.2580 LearningRate 0.0493 Epoch: 13 Global Step: 141880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:35,995-Speed 5967.28 samples/sec Loss 5.2345 LearningRate 0.0492 Epoch: 13 Global Step: 141890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:42,851-Speed 5975.30 samples/sec Loss 5.2285 LearningRate 0.0492 Epoch: 13 Global Step: 141900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:49,728-Speed 5957.41 samples/sec Loss 5.2266 LearningRate 0.0492 Epoch: 13 Global Step: 141910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:05:56,594-Speed 5968.97 samples/sec Loss 5.2174 LearningRate 0.0492 Epoch: 13 Global Step: 141920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:06:03,442-Speed 5981.92 samples/sec Loss 5.2038 LearningRate 0.0492 Epoch: 13 Global Step: 141930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:06:10,302-Speed 5972.17 samples/sec Loss 5.2119 LearningRate 0.0492 Epoch: 13 Global Step: 141940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:06:17,161-Speed 5972.98 samples/sec Loss 5.0970 LearningRate 0.0492 Epoch: 13 Global Step: 141950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:06:24,026-Speed 5968.58 samples/sec Loss 5.1620 LearningRate 0.0491 Epoch: 13 Global Step: 141960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:06:30,876-Speed 5980.39 samples/sec Loss 5.1613 LearningRate 0.0491 Epoch: 13 Global Step: 141970 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-01-09 00:06:37,726-Speed 5981.42 samples/sec Loss 5.1390 LearningRate 0.0491 Epoch: 13 Global Step: 141980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:06:44,691-Speed 5881.61 samples/sec Loss 5.2138 LearningRate 0.0491 Epoch: 13 Global Step: 141990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:06:51,558-Speed 5966.41 samples/sec Loss 5.1495 LearningRate 0.0491 Epoch: 13 Global Step: 142000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:06:58,409-Speed 5979.80 samples/sec Loss 5.1770 LearningRate 0.0491 Epoch: 13 Global Step: 142010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:07:05,267-Speed 5973.14 samples/sec Loss 5.2081 LearningRate 0.0491 Epoch: 13 Global Step: 142020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:07:12,155-Speed 5947.48 samples/sec Loss 5.1895 LearningRate 0.0490 Epoch: 13 Global Step: 142030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:07:19,030-Speed 5959.55 samples/sec Loss 5.1704 LearningRate 0.0490 Epoch: 13 Global Step: 142040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:07:25,894-Speed 5968.32 samples/sec Loss 5.2129 LearningRate 0.0490 Epoch: 13 Global Step: 142050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:07:32,750-Speed 5975.76 samples/sec Loss 5.2707 LearningRate 0.0490 Epoch: 13 Global Step: 142060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:07:39,596-Speed 5984.92 samples/sec Loss 5.1817 LearningRate 0.0490 Epoch: 13 Global Step: 142070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:07:46,481-Speed 5952.66 samples/sec Loss 5.2333 LearningRate 0.0490 Epoch: 13 Global Step: 142080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:07:53,357-Speed 5958.36 samples/sec Loss 5.1461 LearningRate 0.0489 Epoch: 13 Global Step: 142090 Fp16 Grad Scale: 131072 Required: 13 hours 
Training: 2022-01-09 00:08:00,206-Speed 5981.50 samples/sec Loss 5.1558 LearningRate 0.0489 Epoch: 13 Global Step: 142100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:08:07,046-Speed 5989.25 samples/sec Loss 5.1896 LearningRate 0.0489 Epoch: 13 Global Step: 142110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:08:13,893-Speed 5982.65 samples/sec Loss 5.2155 LearningRate 0.0489 Epoch: 13 Global Step: 142120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:08:20,759-Speed 5966.36 samples/sec Loss 5.1874 LearningRate 0.0489 Epoch: 13 Global Step: 142130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:08:27,616-Speed 5975.06 samples/sec Loss 5.2177 LearningRate 0.0489 Epoch: 13 Global Step: 142140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:08:34,480-Speed 5968.93 samples/sec Loss 5.2145 LearningRate 0.0489 Epoch: 13 Global Step: 142150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:08:41,349-Speed 5963.66 samples/sec Loss 5.1508 LearningRate 0.0488 Epoch: 13 Global Step: 142160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:08:48,201-Speed 5982.68 samples/sec Loss 5.1673 LearningRate 0.0488 Epoch: 13 Global Step: 142170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:08:55,061-Speed 5971.08 samples/sec Loss 5.1950 LearningRate 0.0488 Epoch: 13 Global Step: 142180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:09:01,927-Speed 5967.64 samples/sec Loss 5.2002 LearningRate 0.0488 Epoch: 13 Global Step: 142190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:09:08,790-Speed 5969.08 samples/sec Loss 5.2030 LearningRate 0.0488 Epoch: 13 Global Step: 142200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:09:15,651-Speed 5971.02 samples/sec Loss 5.1982 LearningRate 0.0488 Epoch: 13 Global Step: 142210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 
00:09:22,541-Speed 5947.75 samples/sec Loss 5.1844 LearningRate 0.0488 Epoch: 13 Global Step: 142220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:09:29,393-Speed 5980.22 samples/sec Loss 5.2162 LearningRate 0.0487 Epoch: 13 Global Step: 142230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:09:36,248-Speed 5976.18 samples/sec Loss 5.1206 LearningRate 0.0487 Epoch: 13 Global Step: 142240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:09:43,111-Speed 5969.52 samples/sec Loss 5.2159 LearningRate 0.0487 Epoch: 13 Global Step: 142250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:09:49,971-Speed 5971.85 samples/sec Loss 5.1812 LearningRate 0.0487 Epoch: 13 Global Step: 142260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:09:56,825-Speed 5976.37 samples/sec Loss 5.2055 LearningRate 0.0487 Epoch: 13 Global Step: 142270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:10:03,686-Speed 5971.65 samples/sec Loss 5.2197 LearningRate 0.0487 Epoch: 13 Global Step: 142280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:10:10,542-Speed 5975.32 samples/sec Loss 5.1388 LearningRate 0.0486 Epoch: 13 Global Step: 142290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:10:17,422-Speed 5957.53 samples/sec Loss 5.1633 LearningRate 0.0486 Epoch: 13 Global Step: 142300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:10:24,307-Speed 5950.77 samples/sec Loss 5.2419 LearningRate 0.0486 Epoch: 13 Global Step: 142310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:10:31,156-Speed 5981.41 samples/sec Loss 5.1573 LearningRate 0.0486 Epoch: 13 Global Step: 142320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:10:38,009-Speed 5978.28 samples/sec Loss 5.1763 LearningRate 0.0486 Epoch: 13 Global Step: 142330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:10:44,900-Speed 5944.39 
samples/sec Loss 5.1490 LearningRate 0.0486 Epoch: 13 Global Step: 142340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:10:51,768-Speed 5965.75 samples/sec Loss 5.1823 LearningRate 0.0486 Epoch: 13 Global Step: 142350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:10:58,623-Speed 5975.51 samples/sec Loss 5.1730 LearningRate 0.0485 Epoch: 13 Global Step: 142360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:11:05,474-Speed 5980.15 samples/sec Loss 5.1581 LearningRate 0.0485 Epoch: 13 Global Step: 142370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:11:12,335-Speed 5970.42 samples/sec Loss 5.1604 LearningRate 0.0485 Epoch: 13 Global Step: 142380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:11:19,186-Speed 5980.46 samples/sec Loss 5.1722 LearningRate 0.0485 Epoch: 13 Global Step: 142390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:11:26,043-Speed 5973.96 samples/sec Loss 5.2063 LearningRate 0.0485 Epoch: 13 Global Step: 142400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:11:32,919-Speed 5959.03 samples/sec Loss 5.0939 LearningRate 0.0485 Epoch: 13 Global Step: 142410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:11:39,787-Speed 5964.75 samples/sec Loss 5.1841 LearningRate 0.0485 Epoch: 13 Global Step: 142420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:11:46,650-Speed 5969.64 samples/sec Loss 5.1828 LearningRate 0.0484 Epoch: 13 Global Step: 142430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:11:53,508-Speed 5973.16 samples/sec Loss 5.1870 LearningRate 0.0484 Epoch: 13 Global Step: 142440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:12:00,364-Speed 5975.28 samples/sec Loss 5.2308 LearningRate 0.0484 Epoch: 13 Global Step: 142450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:12:07,212-Speed 5982.69 samples/sec Loss 5.1779 
LearningRate 0.0484 Epoch: 13 Global Step: 142460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:12:14,055-Speed 5987.05 samples/sec Loss 5.1158 LearningRate 0.0484 Epoch: 13 Global Step: 142470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:12:20,903-Speed 5981.59 samples/sec Loss 5.1842 LearningRate 0.0484 Epoch: 13 Global Step: 142480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:12:27,771-Speed 5965.21 samples/sec Loss 5.1392 LearningRate 0.0484 Epoch: 13 Global Step: 142490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:12:34,626-Speed 5976.88 samples/sec Loss 5.1781 LearningRate 0.0483 Epoch: 13 Global Step: 142500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:12:41,484-Speed 5976.92 samples/sec Loss 5.1388 LearningRate 0.0483 Epoch: 13 Global Step: 142510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:12:48,353-Speed 5964.17 samples/sec Loss 5.1182 LearningRate 0.0483 Epoch: 13 Global Step: 142520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:12:55,232-Speed 5956.28 samples/sec Loss 5.1756 LearningRate 0.0483 Epoch: 13 Global Step: 142530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:13:02,083-Speed 5978.60 samples/sec Loss 5.1738 LearningRate 0.0483 Epoch: 13 Global Step: 142540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:13:08,939-Speed 5976.24 samples/sec Loss 5.1742 LearningRate 0.0483 Epoch: 13 Global Step: 142550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:13:15,779-Speed 5989.76 samples/sec Loss 5.1076 LearningRate 0.0482 Epoch: 13 Global Step: 142560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:13:22,626-Speed 5982.86 samples/sec Loss 5.2084 LearningRate 0.0482 Epoch: 13 Global Step: 142570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:13:29,480-Speed 5976.99 samples/sec Loss 5.1321 LearningRate 0.0482 Epoch: 13 
Global Step: 142580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:13:36,319-Speed 5990.22 samples/sec Loss 5.0968 LearningRate 0.0482 Epoch: 13 Global Step: 142590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:13:43,170-Speed 5980.44 samples/sec Loss 5.1884 LearningRate 0.0482 Epoch: 13 Global Step: 142600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:13:50,021-Speed 5979.72 samples/sec Loss 5.1394 LearningRate 0.0482 Epoch: 13 Global Step: 142610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:13:56,889-Speed 5965.13 samples/sec Loss 5.1875 LearningRate 0.0482 Epoch: 13 Global Step: 142620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:14:03,777-Speed 5947.51 samples/sec Loss 5.1681 LearningRate 0.0481 Epoch: 13 Global Step: 142630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:14:10,646-Speed 5964.59 samples/sec Loss 5.2056 LearningRate 0.0481 Epoch: 13 Global Step: 142640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 00:14:17,501-Speed 5976.68 samples/sec Loss 5.1441 LearningRate 0.0481 Epoch: 13 Global Step: 142650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:14:24,352-Speed 5979.61 samples/sec Loss 5.1800 LearningRate 0.0481 Epoch: 13 Global Step: 142660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:14:31,211-Speed 5972.91 samples/sec Loss 5.1740 LearningRate 0.0481 Epoch: 13 Global Step: 142670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:14:38,117-Speed 5932.29 samples/sec Loss 5.1460 LearningRate 0.0481 Epoch: 13 Global Step: 142680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:14:44,982-Speed 5967.41 samples/sec Loss 5.1295 LearningRate 0.0481 Epoch: 13 Global Step: 142690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:14:51,845-Speed 5969.60 samples/sec Loss 5.1482 LearningRate 0.0480 Epoch: 13 Global Step: 142700 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-01-09 00:14:58,692-Speed 5983.77 samples/sec Loss 5.1089 LearningRate 0.0480 Epoch: 13 Global Step: 142710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:15:05,550-Speed 5973.15 samples/sec Loss 5.0913 LearningRate 0.0480 Epoch: 13 Global Step: 142720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:15:12,415-Speed 5968.28 samples/sec Loss 5.1711 LearningRate 0.0480 Epoch: 13 Global Step: 142730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:15:19,291-Speed 5958.29 samples/sec Loss 5.1623 LearningRate 0.0480 Epoch: 13 Global Step: 142740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:15:26,175-Speed 5951.27 samples/sec Loss 5.1782 LearningRate 0.0480 Epoch: 13 Global Step: 142750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:15:33,063-Speed 5947.89 samples/sec Loss 5.1978 LearningRate 0.0479 Epoch: 13 Global Step: 142760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:15:39,911-Speed 5982.33 samples/sec Loss 5.2056 LearningRate 0.0479 Epoch: 13 Global Step: 142770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:15:46,762-Speed 5979.54 samples/sec Loss 5.1838 LearningRate 0.0479 Epoch: 13 Global Step: 142780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:15:53,605-Speed 5987.16 samples/sec Loss 5.1109 LearningRate 0.0479 Epoch: 13 Global Step: 142790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:16:00,470-Speed 5967.76 samples/sec Loss 5.1121 LearningRate 0.0479 Epoch: 13 Global Step: 142800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:16:07,315-Speed 5984.23 samples/sec Loss 5.1189 LearningRate 0.0479 Epoch: 13 Global Step: 142810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:16:14,155-Speed 5989.39 samples/sec Loss 5.2065 LearningRate 0.0479 Epoch: 13 Global Step: 142820 Fp16 Grad Scale: 131072 Required: 
13 hours Training: 2022-01-09 00:16:21,020-Speed 5967.93 samples/sec Loss 5.1691 LearningRate 0.0478 Epoch: 13 Global Step: 142830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:16:27,915-Speed 5942.28 samples/sec Loss 5.1724 LearningRate 0.0478 Epoch: 13 Global Step: 142840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:16:34,777-Speed 5970.37 samples/sec Loss 5.1517 LearningRate 0.0478 Epoch: 13 Global Step: 142850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-09 00:16:41,625-Speed 5981.82 samples/sec Loss 5.1748 LearningRate 0.0478 Epoch: 13 Global Step: 142860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:16:48,476-Speed 5979.88 samples/sec Loss 5.1639 LearningRate 0.0478 Epoch: 13 Global Step: 142870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:16:55,329-Speed 5978.32 samples/sec Loss 5.1548 LearningRate 0.0478 Epoch: 13 Global Step: 142880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:17:02,186-Speed 5977.01 samples/sec Loss 5.1228 LearningRate 0.0478 Epoch: 13 Global Step: 142890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:17:09,040-Speed 5977.51 samples/sec Loss 5.1076 LearningRate 0.0477 Epoch: 13 Global Step: 142900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:17:15,897-Speed 5974.78 samples/sec Loss 5.1497 LearningRate 0.0477 Epoch: 13 Global Step: 142910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:17:22,756-Speed 5972.87 samples/sec Loss 5.1413 LearningRate 0.0477 Epoch: 13 Global Step: 142920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:17:29,612-Speed 5975.14 samples/sec Loss 5.1643 LearningRate 0.0477 Epoch: 13 Global Step: 142930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:17:36,480-Speed 5965.14 samples/sec Loss 5.1497 LearningRate 0.0477 Epoch: 13 Global Step: 142940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-01-09 00:17:43,349-Speed 5964.58 samples/sec Loss 5.1286 LearningRate 0.0477 Epoch: 13 Global Step: 142950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:17:50,217-Speed 5964.51 samples/sec Loss 5.1652 LearningRate 0.0477 Epoch: 13 Global Step: 142960 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-09 00:17:57,086-Speed 5964.67 samples/sec Loss 5.1626 LearningRate 0.0476 Epoch: 13 Global Step: 142970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:03,942-Speed 5975.19 samples/sec Loss 5.1327 LearningRate 0.0476 Epoch: 13 Global Step: 142980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:10,807-Speed 5967.78 samples/sec Loss 5.1719 LearningRate 0.0476 Epoch: 13 Global Step: 142990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:17,677-Speed 5963.35 samples/sec Loss 5.0994 LearningRate 0.0476 Epoch: 13 Global Step: 143000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:24,611-Speed 5909.21 samples/sec Loss 5.1266 LearningRate 0.0476 Epoch: 13 Global Step: 143010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:31,477-Speed 5966.32 samples/sec Loss 5.1343 LearningRate 0.0476 Epoch: 13 Global Step: 143020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:38,340-Speed 5969.81 samples/sec Loss 5.1057 LearningRate 0.0475 Epoch: 13 Global Step: 143030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:45,220-Speed 5954.44 samples/sec Loss 5.1197 LearningRate 0.0475 Epoch: 13 Global Step: 143040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:52,085-Speed 5967.86 samples/sec Loss 5.0874 LearningRate 0.0475 Epoch: 13 Global Step: 143050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:18:58,963-Speed 5958.47 samples/sec Loss 5.1138 LearningRate 0.0475 Epoch: 13 Global Step: 143060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 
00:19:05,813-Speed 5980.76 samples/sec Loss 5.1740 LearningRate 0.0475 Epoch: 13 Global Step: 143070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:19:12,663-Speed 5980.39 samples/sec Loss 5.1257 LearningRate 0.0475 Epoch: 13 Global Step: 143080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:19:19,532-Speed 5964.57 samples/sec Loss 5.1001 LearningRate 0.0475 Epoch: 13 Global Step: 143090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:19:26,395-Speed 5969.79 samples/sec Loss 5.1148 LearningRate 0.0474 Epoch: 13 Global Step: 143100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:19:33,237-Speed 5987.81 samples/sec Loss 5.0989 LearningRate 0.0474 Epoch: 13 Global Step: 143110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:19:40,089-Speed 5978.99 samples/sec Loss 5.1261 LearningRate 0.0474 Epoch: 13 Global Step: 143120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:19:46,966-Speed 5957.91 samples/sec Loss 5.1512 LearningRate 0.0474 Epoch: 13 Global Step: 143130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:19:53,815-Speed 5980.44 samples/sec Loss 5.0908 LearningRate 0.0474 Epoch: 13 Global Step: 143140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:20:00,677-Speed 5970.79 samples/sec Loss 5.1768 LearningRate 0.0474 Epoch: 13 Global Step: 143150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:20:07,529-Speed 5979.42 samples/sec Loss 5.1068 LearningRate 0.0474 Epoch: 13 Global Step: 143160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:20:14,465-Speed 5905.83 samples/sec Loss 5.1043 LearningRate 0.0473 Epoch: 13 Global Step: 143170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:20:21,323-Speed 5974.69 samples/sec Loss 5.1075 LearningRate 0.0473 Epoch: 13 Global Step: 143180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:20:28,178-Speed 5975.90 
samples/sec Loss 5.1272 LearningRate 0.0473 Epoch: 13 Global Step: 143190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:20:35,019-Speed 5988.15 samples/sec Loss 5.1110 LearningRate 0.0473 Epoch: 13 Global Step: 143200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:20:41,876-Speed 5975.51 samples/sec Loss 5.1020 LearningRate 0.0473 Epoch: 13 Global Step: 143210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:20:48,732-Speed 5975.53 samples/sec Loss 5.1223 LearningRate 0.0473 Epoch: 13 Global Step: 143220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:20:55,635-Speed 5934.53 samples/sec Loss 5.1181 LearningRate 0.0473 Epoch: 13 Global Step: 143230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:02,500-Speed 5967.87 samples/sec Loss 5.1100 LearningRate 0.0472 Epoch: 13 Global Step: 143240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:09,352-Speed 5978.70 samples/sec Loss 5.0680 LearningRate 0.0472 Epoch: 13 Global Step: 143250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:16,213-Speed 5970.98 samples/sec Loss 5.1328 LearningRate 0.0472 Epoch: 13 Global Step: 143260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:23,059-Speed 5984.14 samples/sec Loss 5.0852 LearningRate 0.0472 Epoch: 13 Global Step: 143270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:29,917-Speed 5974.31 samples/sec Loss 5.0873 LearningRate 0.0472 Epoch: 13 Global Step: 143280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:36,767-Speed 5979.52 samples/sec Loss 5.1010 LearningRate 0.0472 Epoch: 13 Global Step: 143290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:43,644-Speed 5957.89 samples/sec Loss 5.1977 LearningRate 0.0472 Epoch: 13 Global Step: 143300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:50,512-Speed 5965.79 samples/sec Loss 
5.0846 LearningRate 0.0471 Epoch: 13 Global Step: 143310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:21:57,381-Speed 5964.06 samples/sec Loss 5.0713 LearningRate 0.0471 Epoch: 13 Global Step: 143320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:22:04,243-Speed 5970.55 samples/sec Loss 5.0892 LearningRate 0.0471 Epoch: 13 Global Step: 143330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:22:11,085-Speed 5987.33 samples/sec Loss 5.0694 LearningRate 0.0471 Epoch: 13 Global Step: 143340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:22:17,937-Speed 5979.25 samples/sec Loss 5.1207 LearningRate 0.0471 Epoch: 13 Global Step: 143350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:22:24,777-Speed 5989.16 samples/sec Loss 5.0831 LearningRate 0.0471 Epoch: 13 Global Step: 143360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:22:31,630-Speed 5977.94 samples/sec Loss 5.1051 LearningRate 0.0470 Epoch: 13 Global Step: 143370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:22:38,485-Speed 5976.51 samples/sec Loss 5.1561 LearningRate 0.0470 Epoch: 13 Global Step: 143380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:22:45,349-Speed 5968.61 samples/sec Loss 5.0759 LearningRate 0.0470 Epoch: 13 Global Step: 143390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:22:52,240-Speed 5946.62 samples/sec Loss 5.1527 LearningRate 0.0470 Epoch: 13 Global Step: 143400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:22:59,091-Speed 5979.15 samples/sec Loss 5.1274 LearningRate 0.0470 Epoch: 13 Global Step: 143410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:23:05,960-Speed 5963.95 samples/sec Loss 5.1643 LearningRate 0.0470 Epoch: 13 Global Step: 143420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:23:12,816-Speed 5975.66 samples/sec Loss 5.0789 LearningRate 0.0470 
Epoch: 13 Global Step: 143430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:23:19,678-Speed 5970.13 samples/sec Loss 5.1087 LearningRate 0.0469 Epoch: 13 Global Step: 143440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:23:26,533-Speed 5977.27 samples/sec Loss 5.0902 LearningRate 0.0469 Epoch: 13 Global Step: 143450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:23:33,391-Speed 5973.32 samples/sec Loss 5.1453 LearningRate 0.0469 Epoch: 13 Global Step: 143460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:23:40,246-Speed 5976.20 samples/sec Loss 5.0868 LearningRate 0.0469 Epoch: 13 Global Step: 143470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:23:47,103-Speed 5974.60 samples/sec Loss 5.1397 LearningRate 0.0469 Epoch: 13 Global Step: 143480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:23:53,985-Speed 5952.82 samples/sec Loss 5.1168 LearningRate 0.0469 Epoch: 13 Global Step: 143490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:24:00,839-Speed 5977.33 samples/sec Loss 5.0587 LearningRate 0.0469 Epoch: 13 Global Step: 143500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:24:07,707-Speed 5965.05 samples/sec Loss 5.0714 LearningRate 0.0468 Epoch: 13 Global Step: 143510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:24:14,563-Speed 5977.03 samples/sec Loss 5.0916 LearningRate 0.0468 Epoch: 13 Global Step: 143520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:24:21,420-Speed 5973.45 samples/sec Loss 5.0558 LearningRate 0.0468 Epoch: 13 Global Step: 143530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:24:28,296-Speed 5958.43 samples/sec Loss 5.1004 LearningRate 0.0468 Epoch: 13 Global Step: 143540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:24:35,145-Speed 5981.77 samples/sec Loss 5.0779 LearningRate 0.0468 Epoch: 13 Global Step: 143550 
Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 00:24:41,995-Speed 5982.31 samples/sec Loss 5.0706 LearningRate 0.0468 Epoch: 13 Global Step: 143560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:24:48,858-Speed 5972.17 samples/sec Loss 5.0798 LearningRate 0.0468 Epoch: 13 Global Step: 143570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:24:55,740-Speed 5952.63 samples/sec Loss 5.0933 LearningRate 0.0467 Epoch: 13 Global Step: 143580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:25:02,611-Speed 5962.83 samples/sec Loss 5.0944 LearningRate 0.0467 Epoch: 13 Global Step: 143590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:25:09,476-Speed 5967.86 samples/sec Loss 5.0749 LearningRate 0.0467 Epoch: 13 Global Step: 143600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:25:16,367-Speed 5945.05 samples/sec Loss 5.1333 LearningRate 0.0467 Epoch: 13 Global Step: 143610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 00:25:23,258-Speed 5945.33 samples/sec Loss 5.0475 LearningRate 0.0467 Epoch: 13 Global Step: 143620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:25:30,111-Speed 5978.23 samples/sec Loss 5.0785 LearningRate 0.0467 Epoch: 13 Global Step: 143630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:25:36,992-Speed 5953.28 samples/sec Loss 5.0842 LearningRate 0.0467 Epoch: 13 Global Step: 143640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:25:43,838-Speed 5984.42 samples/sec Loss 5.0637 LearningRate 0.0466 Epoch: 13 Global Step: 143650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:25:50,706-Speed 5964.90 samples/sec Loss 5.0691 LearningRate 0.0466 Epoch: 13 Global Step: 143660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:25:57,577-Speed 5962.51 samples/sec Loss 5.1216 LearningRate 0.0466 Epoch: 13 Global Step: 143670 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-01-09 00:26:04,427-Speed 5981.17 samples/sec Loss 5.0756 LearningRate 0.0466 Epoch: 13 Global Step: 143680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:26:11,279-Speed 5978.87 samples/sec Loss 5.0604 LearningRate 0.0466 Epoch: 13 Global Step: 143690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:26:18,129-Speed 5980.82 samples/sec Loss 5.0679 LearningRate 0.0466 Epoch: 13 Global Step: 143700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:26:24,993-Speed 5968.13 samples/sec Loss 5.0887 LearningRate 0.0465 Epoch: 13 Global Step: 143710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:26:31,855-Speed 5970.42 samples/sec Loss 5.0929 LearningRate 0.0465 Epoch: 13 Global Step: 143720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:26:38,730-Speed 5961.41 samples/sec Loss 5.0645 LearningRate 0.0465 Epoch: 13 Global Step: 143730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:26:45,580-Speed 5980.67 samples/sec Loss 5.0862 LearningRate 0.0465 Epoch: 13 Global Step: 143740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:26:52,433-Speed 5978.39 samples/sec Loss 5.0749 LearningRate 0.0465 Epoch: 13 Global Step: 143750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:26:59,293-Speed 5972.13 samples/sec Loss 5.1014 LearningRate 0.0465 Epoch: 13 Global Step: 143760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:27:06,166-Speed 5960.28 samples/sec Loss 5.1009 LearningRate 0.0465 Epoch: 13 Global Step: 143770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:27:13,044-Speed 5956.35 samples/sec Loss 5.0499 LearningRate 0.0464 Epoch: 13 Global Step: 143780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:27:19,898-Speed 5977.46 samples/sec Loss 5.0544 LearningRate 0.0464 Epoch: 13 Global Step: 143790 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-01-09 00:27:26,773-Speed 5958.81 samples/sec Loss 5.0450 LearningRate 0.0464 Epoch: 13 Global Step: 143800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:27:33,646-Speed 5960.64 samples/sec Loss 5.0598 LearningRate 0.0464 Epoch: 13 Global Step: 143810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:27:40,514-Speed 5965.73 samples/sec Loss 5.0375 LearningRate 0.0464 Epoch: 13 Global Step: 143820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:27:47,364-Speed 5980.25 samples/sec Loss 5.0828 LearningRate 0.0464 Epoch: 13 Global Step: 143830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:27:54,238-Speed 5959.28 samples/sec Loss 5.0504 LearningRate 0.0464 Epoch: 13 Global Step: 143840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:28:01,106-Speed 5965.95 samples/sec Loss 5.0272 LearningRate 0.0463 Epoch: 13 Global Step: 143850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-09 00:28:07,957-Speed 5979.13 samples/sec Loss 5.1123 LearningRate 0.0463 Epoch: 13 Global Step: 143860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:28:14,813-Speed 5976.14 samples/sec Loss 4.9970 LearningRate 0.0463 Epoch: 13 Global Step: 143870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:28:21,664-Speed 5980.14 samples/sec Loss 5.1186 LearningRate 0.0463 Epoch: 13 Global Step: 143880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:28:28,523-Speed 5972.56 samples/sec Loss 5.0488 LearningRate 0.0463 Epoch: 13 Global Step: 143890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:28:35,409-Speed 5950.33 samples/sec Loss 5.0838 LearningRate 0.0463 Epoch: 13 Global Step: 143900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:28:42,274-Speed 5968.24 samples/sec Loss 5.0167 LearningRate 0.0463 Epoch: 13 Global Step: 143910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-01-09 00:28:49,134-Speed 5971.58 samples/sec Loss 5.0800 LearningRate 0.0462 Epoch: 13 Global Step: 143920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:28:55,986-Speed 5979.48 samples/sec Loss 5.0552 LearningRate 0.0462 Epoch: 13 Global Step: 143930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:29:02,837-Speed 5979.07 samples/sec Loss 5.0618 LearningRate 0.0462 Epoch: 13 Global Step: 143940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:29:09,690-Speed 5978.59 samples/sec Loss 5.0723 LearningRate 0.0462 Epoch: 13 Global Step: 143950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:29:16,548-Speed 5974.01 samples/sec Loss 5.0215 LearningRate 0.0462 Epoch: 13 Global Step: 143960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:29:23,418-Speed 5963.19 samples/sec Loss 5.0540 LearningRate 0.0462 Epoch: 13 Global Step: 143970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:29:30,277-Speed 5972.17 samples/sec Loss 5.1210 LearningRate 0.0462 Epoch: 13 Global Step: 143980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:29:37,182-Speed 5935.40 samples/sec Loss 5.0617 LearningRate 0.0461 Epoch: 13 Global Step: 143990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:29:44,070-Speed 5947.03 samples/sec Loss 5.0952 LearningRate 0.0461 Epoch: 13 Global Step: 144000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:29:50,978-Speed 5929.86 samples/sec Loss 5.0171 LearningRate 0.0461 Epoch: 13 Global Step: 144010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:29:57,864-Speed 5950.42 samples/sec Loss 5.1673 LearningRate 0.0461 Epoch: 13 Global Step: 144020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:30:04,721-Speed 5974.57 samples/sec Loss 5.0658 LearningRate 0.0461 Epoch: 13 Global Step: 144030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
00:30:11,575-Speed 5977.32 samples/sec Loss 5.0733 LearningRate 0.0461 Epoch: 13 Global Step: 144040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:30:18,474-Speed 5938.38 samples/sec Loss 5.1121 LearningRate 0.0461 Epoch: 13 Global Step: 144050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:30:25,332-Speed 5973.68 samples/sec Loss 5.0318 LearningRate 0.0460 Epoch: 13 Global Step: 144060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:30:32,206-Speed 5960.30 samples/sec Loss 5.0753 LearningRate 0.0460 Epoch: 13 Global Step: 144070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:30:39,063-Speed 5974.54 samples/sec Loss 5.0557 LearningRate 0.0460 Epoch: 13 Global Step: 144080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:30:45,936-Speed 5960.51 samples/sec Loss 5.0475 LearningRate 0.0460 Epoch: 13 Global Step: 144090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:30:52,804-Speed 5964.42 samples/sec Loss 5.0650 LearningRate 0.0460 Epoch: 13 Global Step: 144100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:30:59,693-Speed 5948.17 samples/sec Loss 5.0595 LearningRate 0.0460 Epoch: 13 Global Step: 144110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:31:06,543-Speed 5981.83 samples/sec Loss 5.0345 LearningRate 0.0460 Epoch: 13 Global Step: 144120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:31:13,409-Speed 5965.80 samples/sec Loss 5.0524 LearningRate 0.0459 Epoch: 13 Global Step: 144130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:31:20,256-Speed 5983.63 samples/sec Loss 5.0759 LearningRate 0.0459 Epoch: 13 Global Step: 144140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:31:27,114-Speed 5973.47 samples/sec Loss 5.0225 LearningRate 0.0459 Epoch: 13 Global Step: 144150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:31:33,998-Speed 5953.62 
samples/sec Loss 5.0692 LearningRate 0.0459 Epoch: 13 Global Step: 144160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:31:40,882-Speed 5950.88 samples/sec Loss 5.0080 LearningRate 0.0459 Epoch: 13 Global Step: 144170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:31:47,741-Speed 5973.44 samples/sec Loss 5.0686 LearningRate 0.0459 Epoch: 13 Global Step: 144180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:31:54,608-Speed 5966.03 samples/sec Loss 5.0187 LearningRate 0.0458 Epoch: 13 Global Step: 144190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:01,489-Speed 5953.47 samples/sec Loss 5.1009 LearningRate 0.0458 Epoch: 13 Global Step: 144200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:08,354-Speed 5970.89 samples/sec Loss 5.0916 LearningRate 0.0458 Epoch: 13 Global Step: 144210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:15,251-Speed 5939.36 samples/sec Loss 5.0056 LearningRate 0.0458 Epoch: 13 Global Step: 144220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:22,139-Speed 5947.94 samples/sec Loss 4.9850 LearningRate 0.0458 Epoch: 13 Global Step: 144230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:29,062-Speed 5921.28 samples/sec Loss 5.0395 LearningRate 0.0458 Epoch: 13 Global Step: 144240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:35,944-Speed 5952.93 samples/sec Loss 5.0796 LearningRate 0.0458 Epoch: 13 Global Step: 144250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:42,803-Speed 5975.05 samples/sec Loss 5.0337 LearningRate 0.0457 Epoch: 13 Global Step: 144260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:49,673-Speed 5964.29 samples/sec Loss 5.0299 LearningRate 0.0457 Epoch: 13 Global Step: 144270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:32:56,537-Speed 5968.36 samples/sec Loss 5.0359 
LearningRate 0.0457 Epoch: 13 Global Step: 144280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:33:03,389-Speed 5978.58 samples/sec Loss 5.0107 LearningRate 0.0457 Epoch: 13 Global Step: 144290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:33:10,304-Speed 5924.87 samples/sec Loss 5.0389 LearningRate 0.0457 Epoch: 13 Global Step: 144300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:33:17,194-Speed 5945.53 samples/sec Loss 5.0806 LearningRate 0.0457 Epoch: 13 Global Step: 144310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:33:24,062-Speed 5965.05 samples/sec Loss 5.0770 LearningRate 0.0457 Epoch: 13 Global Step: 144320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:33:30,940-Speed 5956.95 samples/sec Loss 5.0422 LearningRate 0.0456 Epoch: 13 Global Step: 144330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:33:37,820-Speed 5954.71 samples/sec Loss 4.9995 LearningRate 0.0456 Epoch: 13 Global Step: 144340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:33:44,742-Speed 5918.62 samples/sec Loss 5.0233 LearningRate 0.0456 Epoch: 13 Global Step: 144350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:33:51,653-Speed 5928.21 samples/sec Loss 5.0290 LearningRate 0.0456 Epoch: 13 Global Step: 144360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:33:58,548-Speed 5941.40 samples/sec Loss 5.0596 LearningRate 0.0456 Epoch: 13 Global Step: 144370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:34:05,414-Speed 5967.86 samples/sec Loss 5.0296 LearningRate 0.0456 Epoch: 13 Global Step: 144380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:34:12,274-Speed 5972.33 samples/sec Loss 5.0659 LearningRate 0.0456 Epoch: 13 Global Step: 144390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:34:19,136-Speed 5969.92 samples/sec Loss 4.9963 LearningRate 0.0455 
Epoch: 13 Global Step: 144400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:34:25,996-Speed 5976.36 samples/sec Loss 5.0375 LearningRate 0.0455 Epoch: 13 Global Step: 144410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:34:32,861-Speed 5968.03 samples/sec Loss 5.0184 LearningRate 0.0455 Epoch: 13 Global Step: 144420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:34:39,714-Speed 5977.84 samples/sec Loss 5.0365 LearningRate 0.0455 Epoch: 13 Global Step: 144430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:34:46,560-Speed 5983.99 samples/sec Loss 5.0722 LearningRate 0.0455 Epoch: 13 Global Step: 144440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:34:53,401-Speed 5988.67 samples/sec Loss 5.0516 LearningRate 0.0455 Epoch: 13 Global Step: 144450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:00,257-Speed 5975.30 samples/sec Loss 5.0433 LearningRate 0.0455 Epoch: 13 Global Step: 144460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:07,114-Speed 5974.13 samples/sec Loss 5.0525 LearningRate 0.0454 Epoch: 13 Global Step: 144470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:13,986-Speed 5962.02 samples/sec Loss 5.0519 LearningRate 0.0454 Epoch: 13 Global Step: 144480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:20,844-Speed 5972.57 samples/sec Loss 5.0352 LearningRate 0.0454 Epoch: 13 Global Step: 144490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:27,717-Speed 5961.13 samples/sec Loss 5.0352 LearningRate 0.0454 Epoch: 13 Global Step: 144500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:34,576-Speed 5975.24 samples/sec Loss 4.9680 LearningRate 0.0454 Epoch: 13 Global Step: 144510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:41,435-Speed 5971.53 samples/sec Loss 5.0763 LearningRate 0.0454 Epoch: 13 Global Step: 
144520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:48,405-Speed 5878.38 samples/sec Loss 5.0063 LearningRate 0.0454 Epoch: 13 Global Step: 144530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:35:55,259-Speed 5977.34 samples/sec Loss 4.9791 LearningRate 0.0453 Epoch: 13 Global Step: 144540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:36:02,130-Speed 5963.01 samples/sec Loss 5.0134 LearningRate 0.0453 Epoch: 13 Global Step: 144550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:36:09,014-Speed 5951.44 samples/sec Loss 4.9710 LearningRate 0.0453 Epoch: 13 Global Step: 144560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:36:15,878-Speed 5969.33 samples/sec Loss 5.0623 LearningRate 0.0453 Epoch: 13 Global Step: 144570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:36:22,754-Speed 5957.74 samples/sec Loss 5.0623 LearningRate 0.0453 Epoch: 13 Global Step: 144580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:36:29,614-Speed 5972.49 samples/sec Loss 5.0470 LearningRate 0.0453 Epoch: 13 Global Step: 144590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:36:36,454-Speed 5989.11 samples/sec Loss 5.0487 LearningRate 0.0453 Epoch: 13 Global Step: 144600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:36:43,311-Speed 5974.49 samples/sec Loss 5.0019 LearningRate 0.0452 Epoch: 13 Global Step: 144610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:36:50,157-Speed 5984.10 samples/sec Loss 5.0195 LearningRate 0.0452 Epoch: 13 Global Step: 144620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:36:57,007-Speed 5983.40 samples/sec Loss 5.0451 LearningRate 0.0452 Epoch: 13 Global Step: 144630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:37:03,907-Speed 5937.47 samples/sec Loss 5.0437 LearningRate 0.0452 Epoch: 13 Global Step: 144640 Fp16 Grad Scale: 
65536 Required: 12 hours Training: 2022-01-09 00:37:10,781-Speed 5960.23 samples/sec Loss 5.0599 LearningRate 0.0452 Epoch: 13 Global Step: 144650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:37:17,680-Speed 5938.64 samples/sec Loss 5.0148 LearningRate 0.0452 Epoch: 13 Global Step: 144660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:37:24,527-Speed 5982.84 samples/sec Loss 5.0456 LearningRate 0.0452 Epoch: 13 Global Step: 144670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:37:31,390-Speed 5971.39 samples/sec Loss 4.9822 LearningRate 0.0451 Epoch: 13 Global Step: 144680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:37:38,249-Speed 5973.18 samples/sec Loss 5.0040 LearningRate 0.0451 Epoch: 13 Global Step: 144690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:37:45,110-Speed 5970.39 samples/sec Loss 4.9982 LearningRate 0.0451 Epoch: 13 Global Step: 144700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:37:51,968-Speed 5974.19 samples/sec Loss 5.0046 LearningRate 0.0451 Epoch: 13 Global Step: 144710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:37:58,833-Speed 5967.92 samples/sec Loss 5.0377 LearningRate 0.0451 Epoch: 13 Global Step: 144720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:38:05,687-Speed 5977.33 samples/sec Loss 5.0074 LearningRate 0.0451 Epoch: 13 Global Step: 144730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:38:12,542-Speed 5975.87 samples/sec Loss 5.0064 LearningRate 0.0451 Epoch: 13 Global Step: 144740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:38:19,404-Speed 5971.12 samples/sec Loss 4.9829 LearningRate 0.0450 Epoch: 13 Global Step: 144750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:38:26,284-Speed 5954.59 samples/sec Loss 5.0221 LearningRate 0.0450 Epoch: 13 Global Step: 144760 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-01-09 00:38:33,134-Speed 5981.35 samples/sec Loss 5.0066 LearningRate 0.0450 Epoch: 13 Global Step: 144770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:38:40,017-Speed 5951.56 samples/sec Loss 4.9995 LearningRate 0.0450 Epoch: 13 Global Step: 144780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:38:46,872-Speed 5976.23 samples/sec Loss 4.9948 LearningRate 0.0450 Epoch: 13 Global Step: 144790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:38:53,745-Speed 5963.28 samples/sec Loss 4.9654 LearningRate 0.0450 Epoch: 13 Global Step: 144800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:39:00,610-Speed 5973.62 samples/sec Loss 5.0009 LearningRate 0.0450 Epoch: 13 Global Step: 144810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:39:07,457-Speed 5983.04 samples/sec Loss 5.0486 LearningRate 0.0449 Epoch: 13 Global Step: 144820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:39:14,330-Speed 5960.91 samples/sec Loss 5.0678 LearningRate 0.0449 Epoch: 13 Global Step: 144830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:39:21,220-Speed 5946.45 samples/sec Loss 4.9928 LearningRate 0.0449 Epoch: 13 Global Step: 144840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:39:28,116-Speed 5940.73 samples/sec Loss 4.9822 LearningRate 0.0449 Epoch: 13 Global Step: 144850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:39:34,983-Speed 5965.29 samples/sec Loss 4.9908 LearningRate 0.0449 Epoch: 13 Global Step: 144860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:39:41,847-Speed 5968.91 samples/sec Loss 4.9651 LearningRate 0.0449 Epoch: 13 Global Step: 144870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:39:48,704-Speed 5974.36 samples/sec Loss 4.9845 LearningRate 0.0449 Epoch: 13 Global Step: 144880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 
00:39:55,599-Speed 5942.01 samples/sec Loss 4.9820 LearningRate 0.0448 Epoch: 13 Global Step: 144890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:40:02,454-Speed 5976.63 samples/sec Loss 4.9604 LearningRate 0.0448 Epoch: 13 Global Step: 144900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:40:09,301-Speed 5982.94 samples/sec Loss 5.0151 LearningRate 0.0448 Epoch: 13 Global Step: 144910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:40:16,177-Speed 5958.37 samples/sec Loss 4.9628 LearningRate 0.0448 Epoch: 13 Global Step: 144920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:40:23,025-Speed 5982.47 samples/sec Loss 5.0177 LearningRate 0.0448 Epoch: 13 Global Step: 144930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:40:29,877-Speed 5978.12 samples/sec Loss 4.9713 LearningRate 0.0448 Epoch: 13 Global Step: 144940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:40:36,725-Speed 5982.95 samples/sec Loss 4.9759 LearningRate 0.0448 Epoch: 13 Global Step: 144950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:40:43,583-Speed 5974.18 samples/sec Loss 4.9644 LearningRate 0.0447 Epoch: 13 Global Step: 144960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:40:50,441-Speed 5972.93 samples/sec Loss 5.0201 LearningRate 0.0447 Epoch: 13 Global Step: 144970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:40:57,292-Speed 5979.94 samples/sec Loss 4.9771 LearningRate 0.0447 Epoch: 13 Global Step: 144980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:41:04,162-Speed 5962.83 samples/sec Loss 4.9433 LearningRate 0.0447 Epoch: 13 Global Step: 144990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:41:11,013-Speed 5979.82 samples/sec Loss 5.0421 LearningRate 0.0447 Epoch: 13 Global Step: 145000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
00:41:38,083-[lfw][145000]XNorm: 24.053263 Training: 2022-01-09 00:41:38,084-[lfw][145000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-01-09 00:41:38,084-[lfw][145000]Accuracy-Highest: 0.99800 Training: 2022-01-09 00:42:09,428-[cfp_fp][145000]XNorm: 21.189103 Training: 2022-01-09 00:42:09,429-[cfp_fp][145000]Accuracy-Flip: 0.98700+-0.00375 Training: 2022-01-09 00:42:09,429-[cfp_fp][145000]Accuracy-Highest: 0.98714 Training: 2022-01-09 00:42:36,229-[agedb_30][145000]XNorm: 23.486906 Training: 2022-01-09 00:42:36,230-[agedb_30][145000]Accuracy-Flip: 0.97800+-0.00694 Training: 2022-01-09 00:42:36,231-[agedb_30][145000]Accuracy-Highest: 0.97800 Training: 2022-01-09 00:42:43,073-Speed 444.94 samples/sec Loss 5.0248 LearningRate 0.0447 Epoch: 13 Global Step: 145010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:42:49,907-Speed 5994.10 samples/sec Loss 4.9600 LearningRate 0.0447 Epoch: 13 Global Step: 145020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:42:57,222-Speed 5600.66 samples/sec Loss 5.0203 LearningRate 0.0446 Epoch: 13 Global Step: 145030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:43:04,060-Speed 5991.06 samples/sec Loss 5.0115 LearningRate 0.0446 Epoch: 13 Global Step: 145040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:43:10,922-Speed 5970.15 samples/sec Loss 5.0270 LearningRate 0.0446 Epoch: 13 Global Step: 145050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:43:17,774-Speed 5980.34 samples/sec Loss 4.9879 LearningRate 0.0446 Epoch: 13 Global Step: 145060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:43:24,632-Speed 5973.08 samples/sec Loss 5.0493 LearningRate 0.0446 Epoch: 13 Global Step: 145070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:43:31,483-Speed 5979.76 samples/sec Loss 5.0128 LearningRate 0.0446 Epoch: 13 Global Step: 145080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
00:43:38,332-Speed 5982.47 samples/sec Loss 4.9777 LearningRate 0.0446 Epoch: 13 Global Step: 145090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:43:45,183-Speed 5979.58 samples/sec Loss 5.0321 LearningRate 0.0445 Epoch: 13 Global Step: 145100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:43:52,039-Speed 5974.96 samples/sec Loss 5.0043 LearningRate 0.0445 Epoch: 13 Global Step: 145110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:43:58,910-Speed 5963.02 samples/sec Loss 4.9710 LearningRate 0.0445 Epoch: 13 Global Step: 145120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:44:05,815-Speed 5932.68 samples/sec Loss 5.0164 LearningRate 0.0445 Epoch: 13 Global Step: 145130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:44:12,684-Speed 5963.71 samples/sec Loss 5.0057 LearningRate 0.0445 Epoch: 13 Global Step: 145140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:44:19,556-Speed 5962.13 samples/sec Loss 5.0190 LearningRate 0.0445 Epoch: 13 Global Step: 145150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:44:26,415-Speed 5973.13 samples/sec Loss 5.0112 LearningRate 0.0445 Epoch: 13 Global Step: 145160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:44:33,278-Speed 5968.89 samples/sec Loss 4.9820 LearningRate 0.0444 Epoch: 13 Global Step: 145170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:44:56,506-Speed 1763.51 samples/sec Loss 4.9649 LearningRate 0.0444 Epoch: 14 Global Step: 145180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:03,349-Speed 5987.05 samples/sec Loss 4.9908 LearningRate 0.0444 Epoch: 14 Global Step: 145190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:10,183-Speed 5994.46 samples/sec Loss 4.9693 LearningRate 0.0444 Epoch: 14 Global Step: 145200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:17,038-Speed 5976.31 
samples/sec Loss 4.9938 LearningRate 0.0444 Epoch: 14 Global Step: 145210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:23,900-Speed 5970.61 samples/sec Loss 4.9825 LearningRate 0.0444 Epoch: 14 Global Step: 145220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:30,773-Speed 5959.94 samples/sec Loss 5.0015 LearningRate 0.0444 Epoch: 14 Global Step: 145230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:37,626-Speed 5981.68 samples/sec Loss 4.9230 LearningRate 0.0443 Epoch: 14 Global Step: 145240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:44,523-Speed 5939.24 samples/sec Loss 4.9897 LearningRate 0.0443 Epoch: 14 Global Step: 145250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:51,428-Speed 5934.78 samples/sec Loss 4.9355 LearningRate 0.0443 Epoch: 14 Global Step: 145260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:45:58,351-Speed 5917.21 samples/sec Loss 4.9148 LearningRate 0.0443 Epoch: 14 Global Step: 145270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:46:05,228-Speed 5957.25 samples/sec Loss 4.9068 LearningRate 0.0443 Epoch: 14 Global Step: 145280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:46:12,124-Speed 5943.20 samples/sec Loss 4.9823 LearningRate 0.0443 Epoch: 14 Global Step: 145290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:46:19,059-Speed 5907.47 samples/sec Loss 4.9418 LearningRate 0.0443 Epoch: 14 Global Step: 145300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:46:25,970-Speed 5927.83 samples/sec Loss 4.9901 LearningRate 0.0442 Epoch: 14 Global Step: 145310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:46:32,907-Speed 5905.90 samples/sec Loss 4.9694 LearningRate 0.0442 Epoch: 14 Global Step: 145320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:46:39,791-Speed 5951.87 samples/sec Loss 4.9450 
LearningRate 0.0442 Epoch: 14 Global Step: 145330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:46:46,647-Speed 5975.24 samples/sec Loss 4.9397 LearningRate 0.0442 Epoch: 14 Global Step: 145340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:46:53,521-Speed 5963.23 samples/sec Loss 4.9626 LearningRate 0.0442 Epoch: 14 Global Step: 145350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:47:00,407-Speed 5949.20 samples/sec Loss 4.9649 LearningRate 0.0442 Epoch: 14 Global Step: 145360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:47:07,267-Speed 5972.39 samples/sec Loss 4.8966 LearningRate 0.0442 Epoch: 14 Global Step: 145370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:47:14,138-Speed 5962.36 samples/sec Loss 4.9541 LearningRate 0.0441 Epoch: 14 Global Step: 145380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:47:20,996-Speed 5973.34 samples/sec Loss 5.0132 LearningRate 0.0441 Epoch: 14 Global Step: 145390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:47:27,862-Speed 5968.28 samples/sec Loss 4.9942 LearningRate 0.0441 Epoch: 14 Global Step: 145400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:47:34,724-Speed 5970.36 samples/sec Loss 4.9497 LearningRate 0.0441 Epoch: 14 Global Step: 145410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:47:41,883-Speed 5722.14 samples/sec Loss 4.9789 LearningRate 0.0441 Epoch: 14 Global Step: 145420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:47:48,781-Speed 5939.10 samples/sec Loss 4.9335 LearningRate 0.0441 Epoch: 14 Global Step: 145430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:47:55,644-Speed 5971.49 samples/sec Loss 4.9500 LearningRate 0.0441 Epoch: 14 Global Step: 145440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:02,515-Speed 5962.50 samples/sec Loss 4.9816 LearningRate 0.0440 
Epoch: 14 Global Step: 145450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:09,362-Speed 5983.37 samples/sec Loss 4.9588 LearningRate 0.0440 Epoch: 14 Global Step: 145460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:16,214-Speed 5980.53 samples/sec Loss 4.9760 LearningRate 0.0440 Epoch: 14 Global Step: 145470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:23,079-Speed 5967.60 samples/sec Loss 4.9213 LearningRate 0.0440 Epoch: 14 Global Step: 145480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:29,949-Speed 5963.66 samples/sec Loss 4.9334 LearningRate 0.0440 Epoch: 14 Global Step: 145490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:36,802-Speed 5978.06 samples/sec Loss 4.9700 LearningRate 0.0440 Epoch: 14 Global Step: 145500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:43,658-Speed 5975.23 samples/sec Loss 4.9485 LearningRate 0.0440 Epoch: 14 Global Step: 145510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:50,551-Speed 5943.13 samples/sec Loss 4.9661 LearningRate 0.0439 Epoch: 14 Global Step: 145520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:48:57,411-Speed 5972.17 samples/sec Loss 4.9294 LearningRate 0.0439 Epoch: 14 Global Step: 145530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 00:49:04,297-Speed 5948.90 samples/sec Loss 4.9141 LearningRate 0.0439 Epoch: 14 Global Step: 145540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:49:11,158-Speed 5971.27 samples/sec Loss 4.9354 LearningRate 0.0439 Epoch: 14 Global Step: 145550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:49:18,021-Speed 5968.89 samples/sec Loss 4.9966 LearningRate 0.0439 Epoch: 14 Global Step: 145560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:49:24,878-Speed 5974.79 samples/sec Loss 4.9618 LearningRate 0.0439 Epoch: 14 Global Step: 145570 
Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:49:31,736-Speed 5973.35 samples/sec Loss 4.9416 LearningRate 0.0439 Epoch: 14 Global Step: 145580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:49:38,627-Speed 5945.46 samples/sec Loss 4.9950 LearningRate 0.0438 Epoch: 14 Global Step: 145590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:49:45,498-Speed 5962.21 samples/sec Loss 4.9435 LearningRate 0.0438 Epoch: 14 Global Step: 145600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:49:52,378-Speed 5954.50 samples/sec Loss 5.0335 LearningRate 0.0438 Epoch: 14 Global Step: 145610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:49:59,270-Speed 5944.45 samples/sec Loss 4.9456 LearningRate 0.0438 Epoch: 14 Global Step: 145620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:50:06,141-Speed 5962.09 samples/sec Loss 4.9656 LearningRate 0.0438 Epoch: 14 Global Step: 145630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:50:13,002-Speed 5970.31 samples/sec Loss 4.8846 LearningRate 0.0438 Epoch: 14 Global Step: 145640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:50:19,866-Speed 5969.05 samples/sec Loss 4.9608 LearningRate 0.0438 Epoch: 14 Global Step: 145650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:50:26,721-Speed 5977.21 samples/sec Loss 4.9943 LearningRate 0.0437 Epoch: 14 Global Step: 145660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:50:33,610-Speed 5946.78 samples/sec Loss 4.9423 LearningRate 0.0437 Epoch: 14 Global Step: 145670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:50:40,485-Speed 5959.13 samples/sec Loss 4.9645 LearningRate 0.0437 Epoch: 14 Global Step: 145680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:50:47,359-Speed 5962.39 samples/sec Loss 4.9666 LearningRate 0.0437 Epoch: 14 Global Step: 145690 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-01-09 00:50:54,210-Speed 5979.57 samples/sec Loss 4.9483 LearningRate 0.0437 Epoch: 14 Global Step: 145700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:51:01,070-Speed 5972.38 samples/sec Loss 4.9072 LearningRate 0.0437 Epoch: 14 Global Step: 145710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:51:07,933-Speed 5972.46 samples/sec Loss 4.9034 LearningRate 0.0437 Epoch: 14 Global Step: 145720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:51:14,824-Speed 5945.20 samples/sec Loss 4.9216 LearningRate 0.0436 Epoch: 14 Global Step: 145730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:51:21,689-Speed 5967.53 samples/sec Loss 4.9437 LearningRate 0.0436 Epoch: 14 Global Step: 145740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:51:28,552-Speed 5968.94 samples/sec Loss 4.9018 LearningRate 0.0436 Epoch: 14 Global Step: 145750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:51:35,416-Speed 5968.57 samples/sec Loss 4.9306 LearningRate 0.0436 Epoch: 14 Global Step: 145760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:51:42,288-Speed 5961.48 samples/sec Loss 4.9573 LearningRate 0.0436 Epoch: 14 Global Step: 145770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:51:49,158-Speed 5964.44 samples/sec Loss 4.9127 LearningRate 0.0436 Epoch: 14 Global Step: 145780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:51:56,032-Speed 5959.67 samples/sec Loss 4.9689 LearningRate 0.0436 Epoch: 14 Global Step: 145790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:52:02,889-Speed 5974.21 samples/sec Loss 4.9234 LearningRate 0.0435 Epoch: 14 Global Step: 145800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:52:09,770-Speed 5954.12 samples/sec Loss 4.9175 LearningRate 0.0435 Epoch: 14 Global Step: 145810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-01-09 00:52:16,645-Speed 5958.28 samples/sec Loss 4.9553 LearningRate 0.0435 Epoch: 14 Global Step: 145820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:52:23,511-Speed 5966.86 samples/sec Loss 4.9362 LearningRate 0.0435 Epoch: 14 Global Step: 145830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:52:30,367-Speed 5976.13 samples/sec Loss 4.9382 LearningRate 0.0435 Epoch: 14 Global Step: 145840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:52:37,216-Speed 5981.16 samples/sec Loss 4.9514 LearningRate 0.0435 Epoch: 14 Global Step: 145850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:52:44,078-Speed 5970.41 samples/sec Loss 4.9163 LearningRate 0.0435 Epoch: 14 Global Step: 145860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:52:50,937-Speed 5972.57 samples/sec Loss 4.9411 LearningRate 0.0434 Epoch: 14 Global Step: 145870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:52:57,809-Speed 5961.76 samples/sec Loss 4.9036 LearningRate 0.0434 Epoch: 14 Global Step: 145880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:53:04,661-Speed 5980.62 samples/sec Loss 4.9750 LearningRate 0.0434 Epoch: 14 Global Step: 145890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:53:11,533-Speed 5961.68 samples/sec Loss 4.9256 LearningRate 0.0434 Epoch: 14 Global Step: 145900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:53:18,421-Speed 5947.44 samples/sec Loss 4.9192 LearningRate 0.0434 Epoch: 14 Global Step: 145910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:53:25,264-Speed 5987.17 samples/sec Loss 4.8719 LearningRate 0.0434 Epoch: 14 Global Step: 145920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:53:32,128-Speed 5968.65 samples/sec Loss 4.9108 LearningRate 0.0434 Epoch: 14 Global Step: 145930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
00:53:38,979-Speed 5979.26 samples/sec Loss 4.9117 LearningRate 0.0433 Epoch: 14 Global Step: 145940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:53:45,843-Speed 5968.09 samples/sec Loss 4.9025 LearningRate 0.0433 Epoch: 14 Global Step: 145950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:53:52,707-Speed 5967.85 samples/sec Loss 4.9098 LearningRate 0.0433 Epoch: 14 Global Step: 145960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:53:59,567-Speed 5972.43 samples/sec Loss 4.9211 LearningRate 0.0433 Epoch: 14 Global Step: 145970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:54:06,453-Speed 5949.47 samples/sec Loss 4.9158 LearningRate 0.0433 Epoch: 14 Global Step: 145980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:54:13,327-Speed 5960.22 samples/sec Loss 4.9059 LearningRate 0.0433 Epoch: 14 Global Step: 145990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:54:20,200-Speed 5960.00 samples/sec Loss 4.9459 LearningRate 0.0433 Epoch: 14 Global Step: 146000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:54:27,078-Speed 5957.49 samples/sec Loss 4.9269 LearningRate 0.0432 Epoch: 14 Global Step: 146010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:54:33,956-Speed 5955.64 samples/sec Loss 4.9174 LearningRate 0.0432 Epoch: 14 Global Step: 146020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:54:40,849-Speed 5943.67 samples/sec Loss 4.9208 LearningRate 0.0432 Epoch: 14 Global Step: 146030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:54:47,711-Speed 5971.25 samples/sec Loss 4.9193 LearningRate 0.0432 Epoch: 14 Global Step: 146040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:54:54,591-Speed 5955.23 samples/sec Loss 4.9302 LearningRate 0.0432 Epoch: 14 Global Step: 146050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:55:01,459-Speed 5964.88 
samples/sec Loss 4.9046 LearningRate 0.0432 Epoch: 14 Global Step: 146060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:55:08,341-Speed 5952.98 samples/sec Loss 4.8897 LearningRate 0.0432 Epoch: 14 Global Step: 146070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:55:15,220-Speed 5955.44 samples/sec Loss 4.9337 LearningRate 0.0431 Epoch: 14 Global Step: 146080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:55:22,071-Speed 5979.95 samples/sec Loss 4.9624 LearningRate 0.0431 Epoch: 14 Global Step: 146090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:55:28,937-Speed 5966.75 samples/sec Loss 4.9175 LearningRate 0.0431 Epoch: 14 Global Step: 146100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:55:35,808-Speed 5964.28 samples/sec Loss 4.8889 LearningRate 0.0431 Epoch: 14 Global Step: 146110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:55:42,686-Speed 5955.42 samples/sec Loss 4.9763 LearningRate 0.0431 Epoch: 14 Global Step: 146120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:55:49,546-Speed 5972.76 samples/sec Loss 4.8613 LearningRate 0.0431 Epoch: 14 Global Step: 146130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:55:56,421-Speed 5959.05 samples/sec Loss 4.9315 LearningRate 0.0431 Epoch: 14 Global Step: 146140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:03,304-Speed 5951.96 samples/sec Loss 4.9062 LearningRate 0.0430 Epoch: 14 Global Step: 146150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:10,202-Speed 5940.88 samples/sec Loss 4.8953 LearningRate 0.0430 Epoch: 14 Global Step: 146160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:17,064-Speed 5969.58 samples/sec Loss 4.9149 LearningRate 0.0430 Epoch: 14 Global Step: 146170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:23,946-Speed 5952.95 samples/sec Loss 4.8769 
LearningRate 0.0430 Epoch: 14 Global Step: 146180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:30,808-Speed 5972.61 samples/sec Loss 4.9467 LearningRate 0.0430 Epoch: 14 Global Step: 146190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:37,676-Speed 5964.98 samples/sec Loss 4.9418 LearningRate 0.0430 Epoch: 14 Global Step: 146200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:44,541-Speed 5967.75 samples/sec Loss 4.8946 LearningRate 0.0430 Epoch: 14 Global Step: 146210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:51,437-Speed 5940.91 samples/sec Loss 4.8920 LearningRate 0.0430 Epoch: 14 Global Step: 146220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:56:58,324-Speed 5948.55 samples/sec Loss 4.9129 LearningRate 0.0429 Epoch: 14 Global Step: 146230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:57:05,190-Speed 5966.73 samples/sec Loss 4.9298 LearningRate 0.0429 Epoch: 14 Global Step: 146240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:57:12,035-Speed 5985.63 samples/sec Loss 4.9096 LearningRate 0.0429 Epoch: 14 Global Step: 146250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:57:18,913-Speed 5958.14 samples/sec Loss 4.9140 LearningRate 0.0429 Epoch: 14 Global Step: 146260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:57:25,814-Speed 5936.21 samples/sec Loss 4.8870 LearningRate 0.0429 Epoch: 14 Global Step: 146270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:57:32,676-Speed 5972.45 samples/sec Loss 4.8573 LearningRate 0.0429 Epoch: 14 Global Step: 146280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:57:39,561-Speed 5952.90 samples/sec Loss 4.9064 LearningRate 0.0429 Epoch: 14 Global Step: 146290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:57:46,455-Speed 5942.67 samples/sec Loss 4.8816 LearningRate 0.0428 Epoch: 14 
Global Step: 146300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:57:53,342-Speed 5948.96 samples/sec Loss 4.8491 LearningRate 0.0428 Epoch: 14 Global Step: 146310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:00,341-Speed 5854.27 samples/sec Loss 4.9135 LearningRate 0.0428 Epoch: 14 Global Step: 146320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:07,198-Speed 5974.16 samples/sec Loss 4.8675 LearningRate 0.0428 Epoch: 14 Global Step: 146330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:14,069-Speed 5962.89 samples/sec Loss 4.9147 LearningRate 0.0428 Epoch: 14 Global Step: 146340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:21,043-Speed 5874.78 samples/sec Loss 4.8797 LearningRate 0.0428 Epoch: 14 Global Step: 146350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:28,027-Speed 5866.38 samples/sec Loss 4.8909 LearningRate 0.0428 Epoch: 14 Global Step: 146360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:34,997-Speed 5877.61 samples/sec Loss 4.9296 LearningRate 0.0427 Epoch: 14 Global Step: 146370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:41,925-Speed 5914.38 samples/sec Loss 4.9039 LearningRate 0.0427 Epoch: 14 Global Step: 146380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:48,854-Speed 5911.60 samples/sec Loss 4.9055 LearningRate 0.0427 Epoch: 14 Global Step: 146390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:58:55,777-Speed 5920.49 samples/sec Loss 4.9236 LearningRate 0.0427 Epoch: 14 Global Step: 146400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:59:02,702-Speed 5916.74 samples/sec Loss 4.9332 LearningRate 0.0427 Epoch: 14 Global Step: 146410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:59:09,608-Speed 5931.04 samples/sec Loss 4.8962 LearningRate 0.0427 Epoch: 14 Global Step: 146420 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-01-09 00:59:16,526-Speed 5923.48 samples/sec Loss 4.8999 LearningRate 0.0427 Epoch: 14 Global Step: 146430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:59:23,390-Speed 5968.69 samples/sec Loss 4.8360 LearningRate 0.0426 Epoch: 14 Global Step: 146440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:59:30,253-Speed 5968.68 samples/sec Loss 4.8652 LearningRate 0.0426 Epoch: 14 Global Step: 146450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 00:59:37,102-Speed 5984.36 samples/sec Loss 4.9092 LearningRate 0.0426 Epoch: 14 Global Step: 146460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:59:43,991-Speed 5946.69 samples/sec Loss 4.8604 LearningRate 0.0426 Epoch: 14 Global Step: 146470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:59:50,926-Speed 5907.33 samples/sec Loss 4.8464 LearningRate 0.0426 Epoch: 14 Global Step: 146480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 00:59:57,866-Speed 5906.14 samples/sec Loss 4.8901 LearningRate 0.0426 Epoch: 14 Global Step: 146490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:00:04,802-Speed 5909.37 samples/sec Loss 4.8759 LearningRate 0.0426 Epoch: 14 Global Step: 146500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:00:11,702-Speed 5937.59 samples/sec Loss 4.9150 LearningRate 0.0425 Epoch: 14 Global Step: 146510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:00:18,561-Speed 5973.18 samples/sec Loss 4.8660 LearningRate 0.0425 Epoch: 14 Global Step: 146520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:00:25,438-Speed 5956.83 samples/sec Loss 4.8767 LearningRate 0.0425 Epoch: 14 Global Step: 146530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:00:32,281-Speed 5986.79 samples/sec Loss 4.9177 LearningRate 0.0425 Epoch: 14 Global Step: 146540 Fp16 Grad Scale: 32768 Required: 12 
hours Training: 2022-01-09 01:00:39,143-Speed 5969.75 samples/sec Loss 4.9008 LearningRate 0.0425 Epoch: 14 Global Step: 146550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:00:46,010-Speed 5966.67 samples/sec Loss 4.8330 LearningRate 0.0425 Epoch: 14 Global Step: 146560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:00:52,918-Speed 5930.41 samples/sec Loss 4.9106 LearningRate 0.0425 Epoch: 14 Global Step: 146570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:00:59,819-Speed 5936.48 samples/sec Loss 4.9100 LearningRate 0.0424 Epoch: 14 Global Step: 146580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:01:06,740-Speed 5920.02 samples/sec Loss 4.9143 LearningRate 0.0424 Epoch: 14 Global Step: 146590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:01:13,614-Speed 5959.60 samples/sec Loss 4.9041 LearningRate 0.0424 Epoch: 14 Global Step: 146600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:01:20,490-Speed 5957.97 samples/sec Loss 4.8550 LearningRate 0.0424 Epoch: 14 Global Step: 146610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:01:27,384-Speed 5943.33 samples/sec Loss 4.8394 LearningRate 0.0424 Epoch: 14 Global Step: 146620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:01:34,253-Speed 5964.08 samples/sec Loss 4.8737 LearningRate 0.0424 Epoch: 14 Global Step: 146630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:01:41,109-Speed 5975.17 samples/sec Loss 4.8375 LearningRate 0.0424 Epoch: 14 Global Step: 146640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:01:47,983-Speed 5960.73 samples/sec Loss 4.8843 LearningRate 0.0423 Epoch: 14 Global Step: 146650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:01:54,831-Speed 5982.24 samples/sec Loss 4.8868 LearningRate 0.0423 Epoch: 14 Global Step: 146660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
01:02:01,697-Speed 5966.30 samples/sec Loss 4.8899 LearningRate 0.0423 Epoch: 14 Global Step: 146670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:02:08,564-Speed 5966.69 samples/sec Loss 4.8469 LearningRate 0.0423 Epoch: 14 Global Step: 146680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:02:15,433-Speed 5963.73 samples/sec Loss 4.8751 LearningRate 0.0423 Epoch: 14 Global Step: 146690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:02:22,310-Speed 5959.68 samples/sec Loss 4.8913 LearningRate 0.0423 Epoch: 14 Global Step: 146700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:02:29,167-Speed 5974.72 samples/sec Loss 4.8357 LearningRate 0.0423 Epoch: 14 Global Step: 146710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:02:36,049-Speed 5952.66 samples/sec Loss 4.9117 LearningRate 0.0423 Epoch: 14 Global Step: 146720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:02:42,919-Speed 5966.45 samples/sec Loss 4.8321 LearningRate 0.0422 Epoch: 14 Global Step: 146730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:02:49,802-Speed 5952.49 samples/sec Loss 4.8959 LearningRate 0.0422 Epoch: 14 Global Step: 146740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:02:56,669-Speed 5965.68 samples/sec Loss 4.9113 LearningRate 0.0422 Epoch: 14 Global Step: 146750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:03:03,532-Speed 5968.84 samples/sec Loss 4.8742 LearningRate 0.0422 Epoch: 14 Global Step: 146760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:03:10,395-Speed 5970.45 samples/sec Loss 4.8275 LearningRate 0.0422 Epoch: 14 Global Step: 146770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:03:17,260-Speed 5970.73 samples/sec Loss 4.8980 LearningRate 0.0422 Epoch: 14 Global Step: 146780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:03:24,135-Speed 5959.44 
samples/sec Loss 4.8475 LearningRate 0.0422 Epoch: 14 Global Step: 146790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:03:31,003-Speed 5965.40 samples/sec Loss 4.8847 LearningRate 0.0421 Epoch: 14 Global Step: 146800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:03:37,908-Speed 5932.85 samples/sec Loss 4.8370 LearningRate 0.0421 Epoch: 14 Global Step: 146810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:03:44,822-Speed 5925.79 samples/sec Loss 4.8173 LearningRate 0.0421 Epoch: 14 Global Step: 146820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:03:51,703-Speed 5954.02 samples/sec Loss 4.8893 LearningRate 0.0421 Epoch: 14 Global Step: 146830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:03:58,577-Speed 5959.08 samples/sec Loss 4.8383 LearningRate 0.0421 Epoch: 14 Global Step: 146840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:04:05,450-Speed 5963.90 samples/sec Loss 4.8478 LearningRate 0.0421 Epoch: 14 Global Step: 146850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:04:12,305-Speed 5975.75 samples/sec Loss 4.8095 LearningRate 0.0421 Epoch: 14 Global Step: 146860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:04:19,160-Speed 5976.07 samples/sec Loss 4.8500 LearningRate 0.0420 Epoch: 14 Global Step: 146870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:04:26,036-Speed 5958.83 samples/sec Loss 4.8413 LearningRate 0.0420 Epoch: 14 Global Step: 146880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:04:32,907-Speed 5963.76 samples/sec Loss 4.8816 LearningRate 0.0420 Epoch: 14 Global Step: 146890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:04:39,758-Speed 5979.77 samples/sec Loss 4.8156 LearningRate 0.0420 Epoch: 14 Global Step: 146900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:04:46,631-Speed 5960.71 samples/sec Loss 4.8413 
LearningRate 0.0420 Epoch: 14 Global Step: 146910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:04:53,508-Speed 5957.37 samples/sec Loss 4.8170 LearningRate 0.0420 Epoch: 14 Global Step: 146920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:05:00,384-Speed 5958.50 samples/sec Loss 4.8531 LearningRate 0.0420 Epoch: 14 Global Step: 146930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:05:07,272-Speed 5947.59 samples/sec Loss 4.8667 LearningRate 0.0419 Epoch: 14 Global Step: 146940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:05:14,127-Speed 5976.35 samples/sec Loss 4.8498 LearningRate 0.0419 Epoch: 14 Global Step: 146950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:05:20,983-Speed 5975.17 samples/sec Loss 4.9177 LearningRate 0.0419 Epoch: 14 Global Step: 146960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:05:27,833-Speed 5981.10 samples/sec Loss 4.8594 LearningRate 0.0419 Epoch: 14 Global Step: 146970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:05:34,827-Speed 5857.50 samples/sec Loss 4.8635 LearningRate 0.0419 Epoch: 14 Global Step: 146980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:05:41,685-Speed 5975.01 samples/sec Loss 4.9038 LearningRate 0.0419 Epoch: 14 Global Step: 146990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:05:48,546-Speed 5971.47 samples/sec Loss 4.8351 LearningRate 0.0419 Epoch: 14 Global Step: 147000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:05:55,415-Speed 5964.00 samples/sec Loss 4.8377 LearningRate 0.0418 Epoch: 14 Global Step: 147010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:06:02,288-Speed 5961.21 samples/sec Loss 4.8430 LearningRate 0.0418 Epoch: 14 Global Step: 147020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:06:09,158-Speed 5962.82 samples/sec Loss 4.8174 LearningRate 0.0418 Epoch: 
14 Global Step: 147030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:06:16,041-Speed 5953.00 samples/sec Loss 4.8688 LearningRate 0.0418 Epoch: 14 Global Step: 147040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:06:22,915-Speed 5959.41 samples/sec Loss 4.8721 LearningRate 0.0418 Epoch: 14 Global Step: 147050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:06:29,774-Speed 5972.63 samples/sec Loss 4.8482 LearningRate 0.0418 Epoch: 14 Global Step: 147060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:06:36,619-Speed 5986.08 samples/sec Loss 4.8070 LearningRate 0.0418 Epoch: 14 Global Step: 147070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:06:43,483-Speed 5967.35 samples/sec Loss 4.8744 LearningRate 0.0418 Epoch: 14 Global Step: 147080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:06:50,347-Speed 5968.95 samples/sec Loss 4.8098 LearningRate 0.0417 Epoch: 14 Global Step: 147090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:06:57,204-Speed 5974.89 samples/sec Loss 4.7702 LearningRate 0.0417 Epoch: 14 Global Step: 147100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:07:04,045-Speed 5987.65 samples/sec Loss 4.8260 LearningRate 0.0417 Epoch: 14 Global Step: 147110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:07:10,918-Speed 5961.10 samples/sec Loss 4.8192 LearningRate 0.0417 Epoch: 14 Global Step: 147120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:07:17,779-Speed 5971.00 samples/sec Loss 4.8612 LearningRate 0.0417 Epoch: 14 Global Step: 147130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:07:24,644-Speed 5968.09 samples/sec Loss 4.9038 LearningRate 0.0417 Epoch: 14 Global Step: 147140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:07:31,531-Speed 5948.80 samples/sec Loss 4.8376 LearningRate 0.0417 Epoch: 14 Global Step: 147150 Fp16 
Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:07:38,405-Speed 5960.07 samples/sec Loss 4.8250 LearningRate 0.0416 Epoch: 14 Global Step: 147160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:07:45,254-Speed 5980.52 samples/sec Loss 4.8133 LearningRate 0.0416 Epoch: 14 Global Step: 147170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:07:52,122-Speed 5964.77 samples/sec Loss 4.8442 LearningRate 0.0416 Epoch: 14 Global Step: 147180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:07:58,984-Speed 5976.67 samples/sec Loss 4.8494 LearningRate 0.0416 Epoch: 14 Global Step: 147190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:08:05,838-Speed 5977.49 samples/sec Loss 4.8436 LearningRate 0.0416 Epoch: 14 Global Step: 147200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:08:12,706-Speed 5964.94 samples/sec Loss 4.8725 LearningRate 0.0416 Epoch: 14 Global Step: 147210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:08:19,648-Speed 5901.66 samples/sec Loss 4.8622 LearningRate 0.0416 Epoch: 14 Global Step: 147220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:08:26,504-Speed 5974.88 samples/sec Loss 4.8132 LearningRate 0.0415 Epoch: 14 Global Step: 147230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:08:33,371-Speed 5966.89 samples/sec Loss 4.8039 LearningRate 0.0415 Epoch: 14 Global Step: 147240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:08:40,272-Speed 5936.54 samples/sec Loss 4.8405 LearningRate 0.0415 Epoch: 14 Global Step: 147250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:08:47,139-Speed 5965.93 samples/sec Loss 4.7996 LearningRate 0.0415 Epoch: 14 Global Step: 147260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:08:54,004-Speed 5967.59 samples/sec Loss 4.8592 LearningRate 0.0415 Epoch: 14 Global Step: 147270 Fp16 Grad Scale: 65536 Required: 12 
hours Training: 2022-01-09 01:09:00,868-Speed 5968.46 samples/sec Loss 4.8320 LearningRate 0.0415 Epoch: 14 Global Step: 147280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:09:07,721-Speed 5978.20 samples/sec Loss 4.7429 LearningRate 0.0415 Epoch: 14 Global Step: 147290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:09:14,595-Speed 5959.62 samples/sec Loss 4.8202 LearningRate 0.0414 Epoch: 14 Global Step: 147300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:09:21,484-Speed 5947.45 samples/sec Loss 4.8327 LearningRate 0.0414 Epoch: 14 Global Step: 147310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:09:28,360-Speed 5958.02 samples/sec Loss 4.8413 LearningRate 0.0414 Epoch: 14 Global Step: 147320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:09:35,231-Speed 5962.40 samples/sec Loss 4.8459 LearningRate 0.0414 Epoch: 14 Global Step: 147330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:09:42,085-Speed 5978.06 samples/sec Loss 4.8747 LearningRate 0.0414 Epoch: 14 Global Step: 147340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:09:48,942-Speed 5974.46 samples/sec Loss 4.8048 LearningRate 0.0414 Epoch: 14 Global Step: 147350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:09:55,794-Speed 5979.47 samples/sec Loss 4.8309 LearningRate 0.0414 Epoch: 14 Global Step: 147360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:10:02,652-Speed 5973.95 samples/sec Loss 4.8191 LearningRate 0.0414 Epoch: 14 Global Step: 147370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:10:09,528-Speed 5958.08 samples/sec Loss 4.8366 LearningRate 0.0413 Epoch: 14 Global Step: 147380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:10:16,401-Speed 5961.52 samples/sec Loss 4.8323 LearningRate 0.0413 Epoch: 14 Global Step: 147390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
01:10:23,277-Speed 5960.63 samples/sec Loss 4.7764 LearningRate 0.0413 Epoch: 14 Global Step: 147400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:10:30,144-Speed 5965.55 samples/sec Loss 4.7774 LearningRate 0.0413 Epoch: 14 Global Step: 147410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:10:37,016-Speed 5962.74 samples/sec Loss 4.7749 LearningRate 0.0413 Epoch: 14 Global Step: 147420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:10:43,908-Speed 5944.64 samples/sec Loss 4.8186 LearningRate 0.0413 Epoch: 14 Global Step: 147430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:10:50,763-Speed 5976.34 samples/sec Loss 4.8163 LearningRate 0.0413 Epoch: 14 Global Step: 147440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:10:57,636-Speed 5962.11 samples/sec Loss 4.7792 LearningRate 0.0412 Epoch: 14 Global Step: 147450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:11:04,503-Speed 5968.52 samples/sec Loss 4.7924 LearningRate 0.0412 Epoch: 14 Global Step: 147460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:11:11,386-Speed 5953.26 samples/sec Loss 4.8083 LearningRate 0.0412 Epoch: 14 Global Step: 147470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:11:18,269-Speed 5952.60 samples/sec Loss 4.8412 LearningRate 0.0412 Epoch: 14 Global Step: 147480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:11:25,130-Speed 5971.27 samples/sec Loss 4.8166 LearningRate 0.0412 Epoch: 14 Global Step: 147490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:11:32,017-Speed 5948.53 samples/sec Loss 4.8019 LearningRate 0.0412 Epoch: 14 Global Step: 147500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:11:38,872-Speed 5977.10 samples/sec Loss 4.8075 LearningRate 0.0412 Epoch: 14 Global Step: 147510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:11:45,720-Speed 5981.83 
samples/sec Loss 4.8049 LearningRate 0.0411 Epoch: 14 Global Step: 147520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:11:52,592-Speed 5962.04 samples/sec Loss 4.8350 LearningRate 0.0411 Epoch: 14 Global Step: 147530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:11:59,447-Speed 5976.74 samples/sec Loss 4.8247 LearningRate 0.0411 Epoch: 14 Global Step: 147540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:12:06,300-Speed 5977.56 samples/sec Loss 4.8156 LearningRate 0.0411 Epoch: 14 Global Step: 147550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:12:13,164-Speed 5968.61 samples/sec Loss 4.8320 LearningRate 0.0411 Epoch: 14 Global Step: 147560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:12:20,036-Speed 5964.04 samples/sec Loss 4.8239 LearningRate 0.0411 Epoch: 14 Global Step: 147570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:12:26,899-Speed 5969.23 samples/sec Loss 4.7861 LearningRate 0.0411 Epoch: 14 Global Step: 147580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:12:33,760-Speed 5970.36 samples/sec Loss 4.8164 LearningRate 0.0410 Epoch: 14 Global Step: 147590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:12:40,625-Speed 5969.70 samples/sec Loss 4.7774 LearningRate 0.0410 Epoch: 14 Global Step: 147600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:12:47,478-Speed 5977.95 samples/sec Loss 4.8220 LearningRate 0.0410 Epoch: 14 Global Step: 147610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:12:54,338-Speed 5972.35 samples/sec Loss 4.7795 LearningRate 0.0410 Epoch: 14 Global Step: 147620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:13:01,228-Speed 5945.81 samples/sec Loss 4.8160 LearningRate 0.0410 Epoch: 14 Global Step: 147630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:13:08,135-Speed 5934.48 samples/sec Loss 4.8323 
LearningRate 0.0410 Epoch: 14 Global Step: 147640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:13:14,997-Speed 5973.65 samples/sec Loss 4.7803 LearningRate 0.0410 Epoch: 14 Global Step: 147650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:13:21,871-Speed 5960.25 samples/sec Loss 4.8390 LearningRate 0.0410 Epoch: 14 Global Step: 147660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:13:28,752-Speed 5953.53 samples/sec Loss 4.8035 LearningRate 0.0409 Epoch: 14 Global Step: 147670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:13:35,600-Speed 5981.65 samples/sec Loss 4.8191 LearningRate 0.0409 Epoch: 14 Global Step: 147680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:13:42,463-Speed 5969.67 samples/sec Loss 4.7431 LearningRate 0.0409 Epoch: 14 Global Step: 147690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:13:49,325-Speed 5970.30 samples/sec Loss 4.8236 LearningRate 0.0409 Epoch: 14 Global Step: 147700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:13:56,199-Speed 5959.72 samples/sec Loss 4.7835 LearningRate 0.0409 Epoch: 14 Global Step: 147710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:14:03,068-Speed 5964.50 samples/sec Loss 4.7386 LearningRate 0.0409 Epoch: 14 Global Step: 147720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:14:09,930-Speed 5970.49 samples/sec Loss 4.8074 LearningRate 0.0409 Epoch: 14 Global Step: 147730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:14:16,789-Speed 5972.86 samples/sec Loss 4.8677 LearningRate 0.0408 Epoch: 14 Global Step: 147740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:14:23,649-Speed 5971.78 samples/sec Loss 4.7850 LearningRate 0.0408 Epoch: 14 Global Step: 147750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:14:30,523-Speed 5960.11 samples/sec Loss 4.7614 LearningRate 0.0408 Epoch: 14 
Global Step: 147760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:14:37,373-Speed 5980.21 samples/sec Loss 4.8189 LearningRate 0.0408 Epoch: 14 Global Step: 147770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:14:44,234-Speed 5970.96 samples/sec Loss 4.8309 LearningRate 0.0408 Epoch: 14 Global Step: 147780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:14:51,104-Speed 5966.87 samples/sec Loss 4.8356 LearningRate 0.0408 Epoch: 14 Global Step: 147790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:14:57,960-Speed 5975.27 samples/sec Loss 4.8047 LearningRate 0.0408 Epoch: 14 Global Step: 147800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:15:04,812-Speed 5979.66 samples/sec Loss 4.7671 LearningRate 0.0407 Epoch: 14 Global Step: 147810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:15:11,671-Speed 5973.01 samples/sec Loss 4.7617 LearningRate 0.0407 Epoch: 14 Global Step: 147820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:15:18,534-Speed 5969.30 samples/sec Loss 4.7907 LearningRate 0.0407 Epoch: 14 Global Step: 147830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:15:25,399-Speed 5967.48 samples/sec Loss 4.8483 LearningRate 0.0407 Epoch: 14 Global Step: 147840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:15:32,270-Speed 5962.62 samples/sec Loss 4.7995 LearningRate 0.0407 Epoch: 14 Global Step: 147850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:15:39,140-Speed 5963.00 samples/sec Loss 4.7868 LearningRate 0.0407 Epoch: 14 Global Step: 147860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:15:45,995-Speed 5976.32 samples/sec Loss 4.7692 LearningRate 0.0407 Epoch: 14 Global Step: 147870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:15:52,856-Speed 5970.88 samples/sec Loss 4.8038 LearningRate 0.0407 Epoch: 14 Global Step: 147880 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-01-09 01:15:59,725-Speed 5964.03 samples/sec Loss 4.7749 LearningRate 0.0406 Epoch: 14 Global Step: 147890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:16:06,586-Speed 5971.62 samples/sec Loss 4.7410 LearningRate 0.0406 Epoch: 14 Global Step: 147900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:16:13,451-Speed 5970.01 samples/sec Loss 4.7923 LearningRate 0.0406 Epoch: 14 Global Step: 147910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:16:20,306-Speed 5975.77 samples/sec Loss 4.7842 LearningRate 0.0406 Epoch: 14 Global Step: 147920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:16:27,186-Speed 5957.90 samples/sec Loss 4.7679 LearningRate 0.0406 Epoch: 14 Global Step: 147930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:16:34,052-Speed 5967.37 samples/sec Loss 4.7683 LearningRate 0.0406 Epoch: 14 Global Step: 147940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:16:40,914-Speed 5969.39 samples/sec Loss 4.8096 LearningRate 0.0406 Epoch: 14 Global Step: 147950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:16:47,797-Speed 5952.96 samples/sec Loss 4.7681 LearningRate 0.0405 Epoch: 14 Global Step: 147960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:16:54,665-Speed 5965.14 samples/sec Loss 4.7933 LearningRate 0.0405 Epoch: 14 Global Step: 147970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:17:01,520-Speed 5976.18 samples/sec Loss 4.7502 LearningRate 0.0405 Epoch: 14 Global Step: 147980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:17:08,389-Speed 5964.05 samples/sec Loss 4.7488 LearningRate 0.0405 Epoch: 14 Global Step: 147990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:17:15,258-Speed 5964.58 samples/sec Loss 4.7789 LearningRate 0.0405 Epoch: 14 Global Step: 148000 Fp16 Grad Scale: 65536 Required: 12 
hours Training: 2022-01-09 01:17:22,115-Speed 5974.64 samples/sec Loss 4.7844 LearningRate 0.0405 Epoch: 14 Global Step: 148010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:17:28,980-Speed 5968.15 samples/sec Loss 4.7901 LearningRate 0.0405 Epoch: 14 Global Step: 148020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:17:35,833-Speed 5978.82 samples/sec Loss 4.8609 LearningRate 0.0404 Epoch: 14 Global Step: 148030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:17:42,695-Speed 5969.55 samples/sec Loss 4.7580 LearningRate 0.0404 Epoch: 14 Global Step: 148040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:17:49,568-Speed 5961.03 samples/sec Loss 4.7639 LearningRate 0.0404 Epoch: 14 Global Step: 148050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:17:56,461-Speed 5944.10 samples/sec Loss 4.7558 LearningRate 0.0404 Epoch: 14 Global Step: 148060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:18:03,333-Speed 5961.57 samples/sec Loss 4.7903 LearningRate 0.0404 Epoch: 14 Global Step: 148070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:18:10,193-Speed 5972.51 samples/sec Loss 4.7885 LearningRate 0.0404 Epoch: 14 Global Step: 148080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:18:17,072-Speed 5955.21 samples/sec Loss 4.8009 LearningRate 0.0404 Epoch: 14 Global Step: 148090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:18:23,939-Speed 5966.01 samples/sec Loss 4.8083 LearningRate 0.0404 Epoch: 14 Global Step: 148100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:18:30,822-Speed 5952.91 samples/sec Loss 4.7722 LearningRate 0.0403 Epoch: 14 Global Step: 148110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:18:37,687-Speed 5968.86 samples/sec Loss 4.7510 LearningRate 0.0403 Epoch: 14 Global Step: 148120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
01:18:44,549-Speed 5970.03 samples/sec Loss 4.7543 LearningRate 0.0403 Epoch: 14 Global Step: 148130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:18:51,422-Speed 5961.41 samples/sec Loss 4.7813 LearningRate 0.0403 Epoch: 14 Global Step: 148140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:18:58,313-Speed 5945.46 samples/sec Loss 4.7696 LearningRate 0.0403 Epoch: 14 Global Step: 148150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:19:05,173-Speed 5971.74 samples/sec Loss 4.7816 LearningRate 0.0403 Epoch: 14 Global Step: 148160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:19:12,026-Speed 5978.32 samples/sec Loss 4.7893 LearningRate 0.0403 Epoch: 14 Global Step: 148170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:19:18,884-Speed 5973.79 samples/sec Loss 4.7816 LearningRate 0.0402 Epoch: 14 Global Step: 148180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:19:25,757-Speed 5960.62 samples/sec Loss 4.7739 LearningRate 0.0402 Epoch: 14 Global Step: 148190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:19:32,611-Speed 5977.05 samples/sec Loss 4.7967 LearningRate 0.0402 Epoch: 14 Global Step: 148200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:19:39,480-Speed 5964.06 samples/sec Loss 4.7985 LearningRate 0.0402 Epoch: 14 Global Step: 148210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:19:46,334-Speed 5976.80 samples/sec Loss 4.7899 LearningRate 0.0402 Epoch: 14 Global Step: 148220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:19:53,177-Speed 5986.71 samples/sec Loss 4.7795 LearningRate 0.0402 Epoch: 14 Global Step: 148230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:00,035-Speed 5973.20 samples/sec Loss 4.7265 LearningRate 0.0402 Epoch: 14 Global Step: 148240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:06,942-Speed 
5931.86 samples/sec Loss 4.7496 LearningRate 0.0401 Epoch: 14 Global Step: 148250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:13,801-Speed 5973.40 samples/sec Loss 4.7801 LearningRate 0.0401 Epoch: 14 Global Step: 148260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:20,664-Speed 5970.09 samples/sec Loss 4.7537 LearningRate 0.0401 Epoch: 14 Global Step: 148270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:27,535-Speed 5961.94 samples/sec Loss 4.7516 LearningRate 0.0401 Epoch: 14 Global Step: 148280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:34,405-Speed 5963.26 samples/sec Loss 4.7555 LearningRate 0.0401 Epoch: 14 Global Step: 148290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:41,269-Speed 5969.29 samples/sec Loss 4.7631 LearningRate 0.0401 Epoch: 14 Global Step: 148300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:48,138-Speed 5964.34 samples/sec Loss 4.7299 LearningRate 0.0401 Epoch: 14 Global Step: 148310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:20:55,005-Speed 5965.78 samples/sec Loss 4.7576 LearningRate 0.0401 Epoch: 14 Global Step: 148320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:21:01,869-Speed 5968.98 samples/sec Loss 4.7478 LearningRate 0.0400 Epoch: 14 Global Step: 148330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:21:08,743-Speed 5959.48 samples/sec Loss 4.7484 LearningRate 0.0400 Epoch: 14 Global Step: 148340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:21:15,622-Speed 5955.42 samples/sec Loss 4.7332 LearningRate 0.0400 Epoch: 14 Global Step: 148350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:21:22,482-Speed 5972.63 samples/sec Loss 4.7646 LearningRate 0.0400 Epoch: 14 Global Step: 148360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:21:29,342-Speed 5971.69 samples/sec Loss 
4.7808 LearningRate 0.0400 Epoch: 14 Global Step: 148370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:21:36,215-Speed 5962.06 samples/sec Loss 4.7345 LearningRate 0.0400 Epoch: 14 Global Step: 148380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:21:43,089-Speed 5959.70 samples/sec Loss 4.7595 LearningRate 0.0400 Epoch: 14 Global Step: 148390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:21:49,946-Speed 5974.86 samples/sec Loss 4.7154 LearningRate 0.0399 Epoch: 14 Global Step: 148400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:21:56,800-Speed 5978.89 samples/sec Loss 4.7370 LearningRate 0.0399 Epoch: 14 Global Step: 148410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:22:03,664-Speed 5968.58 samples/sec Loss 4.7511 LearningRate 0.0399 Epoch: 14 Global Step: 148420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:22:10,527-Speed 5969.99 samples/sec Loss 4.7408 LearningRate 0.0399 Epoch: 14 Global Step: 148430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:22:17,399-Speed 5961.27 samples/sec Loss 4.7477 LearningRate 0.0399 Epoch: 14 Global Step: 148440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:22:24,260-Speed 5971.71 samples/sec Loss 4.7388 LearningRate 0.0399 Epoch: 14 Global Step: 148450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:22:31,115-Speed 5975.95 samples/sec Loss 4.7226 LearningRate 0.0399 Epoch: 14 Global Step: 148460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:22:37,958-Speed 5986.20 samples/sec Loss 4.7936 LearningRate 0.0398 Epoch: 14 Global Step: 148470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:22:44,823-Speed 5967.79 samples/sec Loss 4.7573 LearningRate 0.0398 Epoch: 14 Global Step: 148480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:22:51,693-Speed 5962.66 samples/sec Loss 4.7317 LearningRate 0.0398 
Epoch: 14 Global Step: 148490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:22:58,553-Speed 5973.52 samples/sec Loss 4.7040 LearningRate 0.0398 Epoch: 14 Global Step: 148500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:23:05,410-Speed 5977.50 samples/sec Loss 4.7527 LearningRate 0.0398 Epoch: 14 Global Step: 148510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:23:12,275-Speed 5966.75 samples/sec Loss 4.7847 LearningRate 0.0398 Epoch: 14 Global Step: 148520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:23:19,142-Speed 5966.85 samples/sec Loss 4.7476 LearningRate 0.0398 Epoch: 14 Global Step: 148530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:23:26,012-Speed 5963.14 samples/sec Loss 4.7310 LearningRate 0.0398 Epoch: 14 Global Step: 148540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:23:32,892-Speed 5954.13 samples/sec Loss 4.7278 LearningRate 0.0397 Epoch: 14 Global Step: 148550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:23:39,757-Speed 5967.49 samples/sec Loss 4.7167 LearningRate 0.0397 Epoch: 14 Global Step: 148560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 01:23:46,643-Speed 5951.46 samples/sec Loss 4.7660 LearningRate 0.0397 Epoch: 14 Global Step: 148570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:23:53,502-Speed 5971.98 samples/sec Loss 4.7511 LearningRate 0.0397 Epoch: 14 Global Step: 148580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:00,353-Speed 5980.57 samples/sec Loss 4.7016 LearningRate 0.0397 Epoch: 14 Global Step: 148590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:07,224-Speed 5962.13 samples/sec Loss 4.7403 LearningRate 0.0397 Epoch: 14 Global Step: 148600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:14,110-Speed 5949.58 samples/sec Loss 4.7164 LearningRate 0.0397 Epoch: 14 Global Step: 148610 
Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:20,992-Speed 5968.45 samples/sec Loss 4.7168 LearningRate 0.0396 Epoch: 14 Global Step: 148620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:29,037-Speed 5955.02 samples/sec Loss 4.7427 LearningRate 0.0396 Epoch: 14 Global Step: 148630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:35,962-Speed 5915.79 samples/sec Loss 4.7100 LearningRate 0.0396 Epoch: 14 Global Step: 148640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:42,843-Speed 5953.45 samples/sec Loss 4.7464 LearningRate 0.0396 Epoch: 14 Global Step: 148650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:49,697-Speed 5977.97 samples/sec Loss 4.7392 LearningRate 0.0396 Epoch: 14 Global Step: 148660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:24:56,556-Speed 5972.88 samples/sec Loss 4.7471 LearningRate 0.0396 Epoch: 14 Global Step: 148670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:25:03,497-Speed 5902.43 samples/sec Loss 4.7617 LearningRate 0.0396 Epoch: 14 Global Step: 148680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:25:10,487-Speed 5861.22 samples/sec Loss 4.7647 LearningRate 0.0396 Epoch: 14 Global Step: 148690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 01:25:17,333-Speed 5984.26 samples/sec Loss 4.7091 LearningRate 0.0395 Epoch: 14 Global Step: 148700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:25:24,215-Speed 5953.28 samples/sec Loss 4.6971 LearningRate 0.0395 Epoch: 14 Global Step: 148710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:25:31,061-Speed 5983.93 samples/sec Loss 4.7276 LearningRate 0.0395 Epoch: 14 Global Step: 148720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 01:25:37,938-Speed 5956.88 samples/sec Loss 4.7490 LearningRate 0.0395 Epoch: 14 Global Step: 148730 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-01-09 01:25:44,848-Speed 5929.09 samples/sec Loss 4.7289 LearningRate 0.0395 Epoch: 14 Global Step: 148740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:25:51,712-Speed 5969.20 samples/sec Loss 4.7001 LearningRate 0.0395 Epoch: 14 Global Step: 148750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:25:58,571-Speed 5972.77 samples/sec Loss 4.7360 LearningRate 0.0395 Epoch: 14 Global Step: 148760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:26:05,434-Speed 5969.22 samples/sec Loss 4.7006 LearningRate 0.0394 Epoch: 14 Global Step: 148770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:26:12,314-Speed 5954.66 samples/sec Loss 4.6988 LearningRate 0.0394 Epoch: 14 Global Step: 148780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:26:19,181-Speed 5966.41 samples/sec Loss 4.7941 LearningRate 0.0394 Epoch: 14 Global Step: 148790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:26:26,036-Speed 5976.67 samples/sec Loss 4.7477 LearningRate 0.0394 Epoch: 14 Global Step: 148800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:26:32,893-Speed 5976.41 samples/sec Loss 4.7549 LearningRate 0.0394 Epoch: 14 Global Step: 148810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:26:39,757-Speed 5968.50 samples/sec Loss 4.7317 LearningRate 0.0394 Epoch: 14 Global Step: 148820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:26:46,629-Speed 5961.29 samples/sec Loss 4.6833 LearningRate 0.0394 Epoch: 14 Global Step: 148830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:26:53,479-Speed 5980.31 samples/sec Loss 4.7723 LearningRate 0.0394 Epoch: 14 Global Step: 148840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:27:00,332-Speed 5977.72 samples/sec Loss 4.7345 LearningRate 0.0393 Epoch: 14 Global Step: 148850 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-09 01:27:07,187-Speed 5976.78 samples/sec Loss 4.7522 LearningRate 0.0393 Epoch: 14 Global Step: 148860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:27:14,051-Speed 5969.97 samples/sec Loss 4.7472 LearningRate 0.0393 Epoch: 14 Global Step: 148870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:27:20,912-Speed 5971.15 samples/sec Loss 4.6954 LearningRate 0.0393 Epoch: 14 Global Step: 148880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:27:27,775-Speed 5969.12 samples/sec Loss 4.7016 LearningRate 0.0393 Epoch: 14 Global Step: 148890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:27:34,653-Speed 5956.98 samples/sec Loss 4.7084 LearningRate 0.0393 Epoch: 14 Global Step: 148900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:27:41,539-Speed 5949.18 samples/sec Loss 4.6813 LearningRate 0.0393 Epoch: 14 Global Step: 148910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:27:48,403-Speed 5970.81 samples/sec Loss 4.7512 LearningRate 0.0392 Epoch: 14 Global Step: 148920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:27:55,307-Speed 5933.59 samples/sec Loss 4.7498 LearningRate 0.0392 Epoch: 14 Global Step: 148930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:28:02,167-Speed 5971.50 samples/sec Loss 4.6929 LearningRate 0.0392 Epoch: 14 Global Step: 148940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:28:09,038-Speed 5964.67 samples/sec Loss 4.6966 LearningRate 0.0392 Epoch: 14 Global Step: 148950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:28:15,902-Speed 5968.70 samples/sec Loss 4.7157 LearningRate 0.0392 Epoch: 14 Global Step: 148960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:28:22,747-Speed 5985.03 samples/sec Loss 4.7168 LearningRate 0.0392 Epoch: 14 Global Step: 148970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 
01:28:29,635-Speed 5947.95 samples/sec Loss 4.7377 LearningRate 0.0392 Epoch: 14 Global Step: 148980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:28:36,507-Speed 5961.78 samples/sec Loss 4.6846 LearningRate 0.0391 Epoch: 14 Global Step: 148990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:28:43,358-Speed 5979.63 samples/sec Loss 4.6983 LearningRate 0.0391 Epoch: 14 Global Step: 149000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:28:50,232-Speed 5959.50 samples/sec Loss 4.7289 LearningRate 0.0391 Epoch: 14 Global Step: 149010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:28:57,105-Speed 5960.82 samples/sec Loss 4.6933 LearningRate 0.0391 Epoch: 14 Global Step: 149020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:29:03,997-Speed 5944.83 samples/sec Loss 4.6879 LearningRate 0.0391 Epoch: 14 Global Step: 149030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:29:10,878-Speed 5953.41 samples/sec Loss 4.7212 LearningRate 0.0391 Epoch: 14 Global Step: 149040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:29:17,746-Speed 5965.06 samples/sec Loss 4.7042 LearningRate 0.0391 Epoch: 14 Global Step: 149050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:29:24,606-Speed 5971.88 samples/sec Loss 4.6860 LearningRate 0.0391 Epoch: 14 Global Step: 149060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:29:31,465-Speed 5974.65 samples/sec Loss 4.7251 LearningRate 0.0390 Epoch: 14 Global Step: 149070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:29:38,321-Speed 5975.19 samples/sec Loss 4.7119 LearningRate 0.0390 Epoch: 14 Global Step: 149080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:29:45,252-Speed 5910.44 samples/sec Loss 4.7100 LearningRate 0.0390 Epoch: 14 Global Step: 149090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:29:52,133-Speed 5954.14 
samples/sec Loss 4.6738 LearningRate 0.0390 Epoch: 14 Global Step: 149100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:29:58,989-Speed 5977.75 samples/sec Loss 4.7112 LearningRate 0.0390 Epoch: 14 Global Step: 149110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:30:05,859-Speed 5962.71 samples/sec Loss 4.6983 LearningRate 0.0390 Epoch: 14 Global Step: 149120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:30:12,727-Speed 5966.88 samples/sec Loss 4.6756 LearningRate 0.0390 Epoch: 14 Global Step: 149130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:30:19,583-Speed 5975.97 samples/sec Loss 4.7130 LearningRate 0.0389 Epoch: 14 Global Step: 149140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:30:26,470-Speed 5948.39 samples/sec Loss 4.6889 LearningRate 0.0389 Epoch: 14 Global Step: 149150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:30:33,359-Speed 5947.40 samples/sec Loss 4.7325 LearningRate 0.0389 Epoch: 14 Global Step: 149160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:30:40,237-Speed 5956.57 samples/sec Loss 4.6701 LearningRate 0.0389 Epoch: 14 Global Step: 149170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:30:47,099-Speed 5969.99 samples/sec Loss 4.6668 LearningRate 0.0389 Epoch: 14 Global Step: 149180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:30:54,804-Speed 5317.72 samples/sec Loss 4.6643 LearningRate 0.0389 Epoch: 14 Global Step: 149190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:31:01,685-Speed 5953.91 samples/sec Loss 4.6845 LearningRate 0.0389 Epoch: 14 Global Step: 149200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:31:08,544-Speed 5972.69 samples/sec Loss 4.6976 LearningRate 0.0389 Epoch: 14 Global Step: 149210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:31:15,417-Speed 5961.09 samples/sec Loss 4.6565 
LearningRate 0.0388 Epoch: 14 Global Step: 149220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:31:22,274-Speed 5974.51 samples/sec Loss 4.6704 LearningRate 0.0388 Epoch: 14 Global Step: 149230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:31:29,135-Speed 5971.70 samples/sec Loss 4.7259 LearningRate 0.0388 Epoch: 14 Global Step: 149240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:31:35,996-Speed 5971.08 samples/sec Loss 4.6777 LearningRate 0.0388 Epoch: 14 Global Step: 149250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:31:42,868-Speed 5962.20 samples/sec Loss 4.7076 LearningRate 0.0388 Epoch: 14 Global Step: 149260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:31:49,714-Speed 5983.94 samples/sec Loss 4.6757 LearningRate 0.0388 Epoch: 14 Global Step: 149270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:31:56,585-Speed 5964.28 samples/sec Loss 4.7429 LearningRate 0.0388 Epoch: 14 Global Step: 149280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:03,453-Speed 5965.69 samples/sec Loss 4.7084 LearningRate 0.0387 Epoch: 14 Global Step: 149290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:10,314-Speed 5970.55 samples/sec Loss 4.7252 LearningRate 0.0387 Epoch: 14 Global Step: 149300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:17,167-Speed 5980.31 samples/sec Loss 4.7513 LearningRate 0.0387 Epoch: 14 Global Step: 149310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:24,047-Speed 5955.20 samples/sec Loss 4.6750 LearningRate 0.0387 Epoch: 14 Global Step: 149320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:30,912-Speed 5967.01 samples/sec Loss 4.6492 LearningRate 0.0387 Epoch: 14 Global Step: 149330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:37,771-Speed 5972.76 samples/sec Loss 4.6436 LearningRate 0.0387 Epoch: 14 
Global Step: 149340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:44,654-Speed 5952.84 samples/sec Loss 4.6629 LearningRate 0.0387 Epoch: 14 Global Step: 149350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:51,514-Speed 5971.45 samples/sec Loss 4.6939 LearningRate 0.0387 Epoch: 14 Global Step: 149360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:32:58,371-Speed 5977.40 samples/sec Loss 4.6432 LearningRate 0.0386 Epoch: 14 Global Step: 149370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:33:05,230-Speed 5975.77 samples/sec Loss 4.6688 LearningRate 0.0386 Epoch: 14 Global Step: 149380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:33:12,105-Speed 5959.16 samples/sec Loss 4.7061 LearningRate 0.0386 Epoch: 14 Global Step: 149390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:33:18,979-Speed 5960.40 samples/sec Loss 4.6691 LearningRate 0.0386 Epoch: 14 Global Step: 149400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:33:25,840-Speed 5971.07 samples/sec Loss 4.6297 LearningRate 0.0386 Epoch: 14 Global Step: 149410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:33:32,703-Speed 5968.68 samples/sec Loss 4.6461 LearningRate 0.0386 Epoch: 14 Global Step: 149420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:33:39,559-Speed 5974.89 samples/sec Loss 4.6804 LearningRate 0.0386 Epoch: 14 Global Step: 149430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:33:46,419-Speed 5972.82 samples/sec Loss 4.6763 LearningRate 0.0385 Epoch: 14 Global Step: 149440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:33:53,302-Speed 5952.16 samples/sec Loss 4.6948 LearningRate 0.0385 Epoch: 14 Global Step: 149450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:34:00,168-Speed 5966.62 samples/sec Loss 4.6613 LearningRate 0.0385 Epoch: 14 Global Step: 149460 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-01-09 01:34:07,037-Speed 5964.09 samples/sec Loss 4.6828 LearningRate 0.0385 Epoch: 14 Global Step: 149470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:34:13,896-Speed 5973.18 samples/sec Loss 4.6578 LearningRate 0.0385 Epoch: 14 Global Step: 149480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:34:20,757-Speed 5973.21 samples/sec Loss 4.6698 LearningRate 0.0385 Epoch: 14 Global Step: 149490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:34:27,636-Speed 5955.76 samples/sec Loss 4.6492 LearningRate 0.0385 Epoch: 14 Global Step: 149500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:34:34,504-Speed 5964.87 samples/sec Loss 4.6662 LearningRate 0.0385 Epoch: 14 Global Step: 149510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:34:41,366-Speed 5970.18 samples/sec Loss 4.6685 LearningRate 0.0384 Epoch: 14 Global Step: 149520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:34:48,237-Speed 5964.73 samples/sec Loss 4.6848 LearningRate 0.0384 Epoch: 14 Global Step: 149530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:34:55,097-Speed 5970.98 samples/sec Loss 4.6879 LearningRate 0.0384 Epoch: 14 Global Step: 149540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:35:01,951-Speed 5979.13 samples/sec Loss 4.6471 LearningRate 0.0384 Epoch: 14 Global Step: 149550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:35:08,812-Speed 5973.80 samples/sec Loss 4.7073 LearningRate 0.0384 Epoch: 14 Global Step: 149560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:35:15,667-Speed 5976.12 samples/sec Loss 4.6622 LearningRate 0.0384 Epoch: 14 Global Step: 149570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:35:22,537-Speed 5963.46 samples/sec Loss 4.6977 LearningRate 0.0384 Epoch: 14 Global Step: 149580 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-01-09 01:35:29,417-Speed 5955.39 samples/sec Loss 4.6591 LearningRate 0.0383 Epoch: 14 Global Step: 149590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:35:36,275-Speed 5972.99 samples/sec Loss 4.6844 LearningRate 0.0383 Epoch: 14 Global Step: 149600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:35:43,139-Speed 5968.65 samples/sec Loss 4.6502 LearningRate 0.0383 Epoch: 14 Global Step: 149610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:35:49,999-Speed 5971.87 samples/sec Loss 4.6717 LearningRate 0.0383 Epoch: 14 Global Step: 149620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:35:56,852-Speed 5977.87 samples/sec Loss 4.6458 LearningRate 0.0383 Epoch: 14 Global Step: 149630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:36:03,708-Speed 5976.16 samples/sec Loss 4.6397 LearningRate 0.0383 Epoch: 14 Global Step: 149640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:36:10,563-Speed 5978.13 samples/sec Loss 4.7016 LearningRate 0.0383 Epoch: 14 Global Step: 149650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:36:17,437-Speed 5959.56 samples/sec Loss 4.6603 LearningRate 0.0383 Epoch: 14 Global Step: 149660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:36:24,311-Speed 5960.20 samples/sec Loss 4.6689 LearningRate 0.0382 Epoch: 14 Global Step: 149670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:36:31,187-Speed 5957.84 samples/sec Loss 4.6564 LearningRate 0.0382 Epoch: 14 Global Step: 149680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:36:38,055-Speed 5965.03 samples/sec Loss 4.6526 LearningRate 0.0382 Epoch: 14 Global Step: 149690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:36:44,926-Speed 5962.69 samples/sec Loss 4.6705 LearningRate 0.0382 Epoch: 14 Global Step: 149700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 
01:36:51,788-Speed 5970.27 samples/sec Loss 4.6328 LearningRate 0.0382 Epoch: 14 Global Step: 149710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:36:58,653-Speed 5967.68 samples/sec Loss 4.7089 LearningRate 0.0382 Epoch: 14 Global Step: 149720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:37:05,524-Speed 5961.91 samples/sec Loss 4.6346 LearningRate 0.0382 Epoch: 14 Global Step: 149730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:37:12,387-Speed 5969.90 samples/sec Loss 4.6370 LearningRate 0.0381 Epoch: 14 Global Step: 149740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:37:19,243-Speed 5975.22 samples/sec Loss 4.6780 LearningRate 0.0381 Epoch: 14 Global Step: 149750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:37:26,096-Speed 5977.85 samples/sec Loss 4.5460 LearningRate 0.0381 Epoch: 14 Global Step: 149760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:37:32,951-Speed 5976.95 samples/sec Loss 4.6744 LearningRate 0.0381 Epoch: 14 Global Step: 149770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:37:39,831-Speed 5953.60 samples/sec Loss 4.6439 LearningRate 0.0381 Epoch: 14 Global Step: 149780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:37:46,696-Speed 5967.50 samples/sec Loss 4.6758 LearningRate 0.0381 Epoch: 14 Global Step: 149790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:37:53,573-Speed 5970.95 samples/sec Loss 4.6412 LearningRate 0.0381 Epoch: 14 Global Step: 149800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:38:00,443-Speed 5963.27 samples/sec Loss 4.6614 LearningRate 0.0381 Epoch: 14 Global Step: 149810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:38:07,309-Speed 5967.08 samples/sec Loss 4.6430 LearningRate 0.0380 Epoch: 14 Global Step: 149820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:38:14,241-Speed 5910.00 
samples/sec Loss 4.6367 LearningRate 0.0380 Epoch: 14 Global Step: 149830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:38:21,198-Speed 5889.00 samples/sec Loss 4.6410 LearningRate 0.0380 Epoch: 14 Global Step: 149840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:38:28,042-Speed 5985.89 samples/sec Loss 4.7062 LearningRate 0.0380 Epoch: 14 Global Step: 149850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:38:34,917-Speed 5959.48 samples/sec Loss 4.6755 LearningRate 0.0380 Epoch: 14 Global Step: 149860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:38:41,790-Speed 5961.27 samples/sec Loss 4.6297 LearningRate 0.0380 Epoch: 14 Global Step: 149870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:38:48,666-Speed 5957.63 samples/sec Loss 4.6269 LearningRate 0.0380 Epoch: 14 Global Step: 149880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:38:55,520-Speed 5977.26 samples/sec Loss 4.6441 LearningRate 0.0380 Epoch: 14 Global Step: 149890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:02,405-Speed 5950.06 samples/sec Loss 4.6433 LearningRate 0.0379 Epoch: 14 Global Step: 149900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:09,258-Speed 5978.87 samples/sec Loss 4.6391 LearningRate 0.0379 Epoch: 14 Global Step: 149910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:16,131-Speed 5961.17 samples/sec Loss 4.6348 LearningRate 0.0379 Epoch: 14 Global Step: 149920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:23,028-Speed 5939.22 samples/sec Loss 4.6206 LearningRate 0.0379 Epoch: 14 Global Step: 149930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:29,909-Speed 5954.37 samples/sec Loss 4.6298 LearningRate 0.0379 Epoch: 14 Global Step: 149940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:36,755-Speed 5985.79 samples/sec Loss 4.6019 
LearningRate 0.0379 Epoch: 14 Global Step: 149950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:43,633-Speed 5956.24 samples/sec Loss 4.6188 LearningRate 0.0379 Epoch: 14 Global Step: 149960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:50,483-Speed 5981.36 samples/sec Loss 4.6226 LearningRate 0.0378 Epoch: 14 Global Step: 149970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:39:57,356-Speed 5960.65 samples/sec Loss 4.6479 LearningRate 0.0378 Epoch: 14 Global Step: 149980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:40:04,211-Speed 5975.69 samples/sec Loss 4.6473 LearningRate 0.0378 Epoch: 14 Global Step: 149990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:40:11,127-Speed 5924.55 samples/sec Loss 4.6106 LearningRate 0.0378 Epoch: 14 Global Step: 150000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:40:38,124-[lfw][150000]XNorm: 22.411340 Training: 2022-01-09 01:40:38,125-[lfw][150000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-01-09 01:40:38,125-[lfw][150000]Accuracy-Highest: 0.99800 Training: 2022-01-09 01:41:09,219-[cfp_fp][150000]XNorm: 20.023255 Training: 2022-01-09 01:41:09,220-[cfp_fp][150000]Accuracy-Flip: 0.98771+-0.00500 Training: 2022-01-09 01:41:09,221-[cfp_fp][150000]Accuracy-Highest: 0.98771 Training: 2022-01-09 01:41:36,021-[agedb_30][150000]XNorm: 22.346659 Training: 2022-01-09 01:41:36,022-[agedb_30][150000]Accuracy-Flip: 0.97833+-0.00671 Training: 2022-01-09 01:41:36,022-[agedb_30][150000]Accuracy-Highest: 0.97833 Training: 2022-01-09 01:41:42,864-Speed 446.50 samples/sec Loss 4.6262 LearningRate 0.0378 Epoch: 14 Global Step: 150010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:41:49,708-Speed 5986.40 samples/sec Loss 4.6483 LearningRate 0.0378 Epoch: 14 Global Step: 150020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:41:56,566-Speed 5974.21 samples/sec Loss 4.6091 LearningRate 
0.0378 Epoch: 14 Global Step: 150030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:42:03,426-Speed 5975.70 samples/sec Loss 4.6700 LearningRate 0.0378 Epoch: 14 Global Step: 150040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:42:10,277-Speed 5979.96 samples/sec Loss 4.6420 LearningRate 0.0377 Epoch: 14 Global Step: 150050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:42:17,139-Speed 5971.00 samples/sec Loss 4.6055 LearningRate 0.0377 Epoch: 14 Global Step: 150060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:42:23,996-Speed 5974.61 samples/sec Loss 4.6891 LearningRate 0.0377 Epoch: 14 Global Step: 150070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:42:30,870-Speed 5962.13 samples/sec Loss 4.6371 LearningRate 0.0377 Epoch: 14 Global Step: 150080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:42:37,717-Speed 5983.71 samples/sec Loss 4.6737 LearningRate 0.0377 Epoch: 14 Global Step: 150090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:42:44,645-Speed 5912.80 samples/sec Loss 4.6222 LearningRate 0.0377 Epoch: 14 Global Step: 150100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:42:51,486-Speed 5988.20 samples/sec Loss 4.6311 LearningRate 0.0377 Epoch: 14 Global Step: 150110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:42:58,351-Speed 5968.41 samples/sec Loss 4.6717 LearningRate 0.0376 Epoch: 14 Global Step: 150120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:43:05,246-Speed 5941.50 samples/sec Loss 4.6073 LearningRate 0.0376 Epoch: 14 Global Step: 150130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:43:12,157-Speed 5927.65 samples/sec Loss 4.5871 LearningRate 0.0376 Epoch: 14 Global Step: 150140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:43:19,010-Speed 5978.40 samples/sec Loss 4.6501 LearningRate 0.0376 Epoch: 14 Global Step: 
150150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:43:25,860-Speed 5980.61 samples/sec Loss 4.6682 LearningRate 0.0376 Epoch: 14 Global Step: 150160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:43:32,713-Speed 5977.18 samples/sec Loss 4.6703 LearningRate 0.0376 Epoch: 14 Global Step: 150170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:43:39,574-Speed 5971.45 samples/sec Loss 4.6400 LearningRate 0.0376 Epoch: 14 Global Step: 150180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:43:46,419-Speed 5985.22 samples/sec Loss 4.6266 LearningRate 0.0376 Epoch: 14 Global Step: 150190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:43:53,262-Speed 5986.97 samples/sec Loss 4.6396 LearningRate 0.0375 Epoch: 14 Global Step: 150200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:44:00,132-Speed 5965.16 samples/sec Loss 4.6324 LearningRate 0.0375 Epoch: 14 Global Step: 150210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:44:07,002-Speed 5964.19 samples/sec Loss 4.6348 LearningRate 0.0375 Epoch: 14 Global Step: 150220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:44:13,852-Speed 5980.20 samples/sec Loss 4.6346 LearningRate 0.0375 Epoch: 14 Global Step: 150230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:44:20,691-Speed 5990.47 samples/sec Loss 4.6109 LearningRate 0.0375 Epoch: 14 Global Step: 150240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:44:27,542-Speed 5980.10 samples/sec Loss 4.6675 LearningRate 0.0375 Epoch: 14 Global Step: 150250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:44:34,394-Speed 5978.48 samples/sec Loss 4.6119 LearningRate 0.0375 Epoch: 14 Global Step: 150260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:44:41,249-Speed 5976.74 samples/sec Loss 4.6164 LearningRate 0.0375 Epoch: 14 Global Step: 150270 Fp16 Grad Scale: 
65536 Required: 11 hours Training: 2022-01-09 01:44:48,126-Speed 5957.64 samples/sec Loss 4.6287 LearningRate 0.0374 Epoch: 14 Global Step: 150280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:44:54,979-Speed 5977.35 samples/sec Loss 4.5875 LearningRate 0.0374 Epoch: 14 Global Step: 150290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:45:01,828-Speed 5982.29 samples/sec Loss 4.5784 LearningRate 0.0374 Epoch: 14 Global Step: 150300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:45:08,725-Speed 5940.30 samples/sec Loss 4.6331 LearningRate 0.0374 Epoch: 14 Global Step: 150310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:45:15,571-Speed 5984.36 samples/sec Loss 4.6045 LearningRate 0.0374 Epoch: 14 Global Step: 150320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:45:22,423-Speed 5978.80 samples/sec Loss 4.6129 LearningRate 0.0374 Epoch: 14 Global Step: 150330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:45:29,283-Speed 5972.12 samples/sec Loss 4.6547 LearningRate 0.0374 Epoch: 14 Global Step: 150340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:45:36,119-Speed 5993.18 samples/sec Loss 4.6446 LearningRate 0.0373 Epoch: 14 Global Step: 150350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:45:42,971-Speed 5979.32 samples/sec Loss 4.6367 LearningRate 0.0373 Epoch: 14 Global Step: 150360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:45:49,841-Speed 5963.79 samples/sec Loss 4.5764 LearningRate 0.0373 Epoch: 14 Global Step: 150370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:45:56,696-Speed 5976.01 samples/sec Loss 4.5664 LearningRate 0.0373 Epoch: 14 Global Step: 150380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:46:03,547-Speed 5979.90 samples/sec Loss 4.6063 LearningRate 0.0373 Epoch: 14 Global Step: 150390 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-09 01:46:10,399-Speed 5980.67 samples/sec Loss 4.6234 LearningRate 0.0373 Epoch: 14 Global Step: 150400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:46:17,247-Speed 5983.01 samples/sec Loss 4.5833 LearningRate 0.0373 Epoch: 14 Global Step: 150410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:46:24,099-Speed 5978.80 samples/sec Loss 4.5839 LearningRate 0.0373 Epoch: 14 Global Step: 150420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:46:30,963-Speed 5969.03 samples/sec Loss 4.5923 LearningRate 0.0372 Epoch: 14 Global Step: 150430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:46:37,825-Speed 5970.18 samples/sec Loss 4.6228 LearningRate 0.0372 Epoch: 14 Global Step: 150440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:46:44,683-Speed 5973.65 samples/sec Loss 4.5576 LearningRate 0.0372 Epoch: 14 Global Step: 150450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:46:51,578-Speed 5941.57 samples/sec Loss 4.6135 LearningRate 0.0372 Epoch: 14 Global Step: 150460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:46:58,426-Speed 5982.16 samples/sec Loss 4.5591 LearningRate 0.0372 Epoch: 14 Global Step: 150470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:47:05,265-Speed 5990.35 samples/sec Loss 4.5971 LearningRate 0.0372 Epoch: 14 Global Step: 150480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:47:12,114-Speed 5985.37 samples/sec Loss 4.6151 LearningRate 0.0372 Epoch: 14 Global Step: 150490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:47:18,969-Speed 5976.48 samples/sec Loss 4.6530 LearningRate 0.0372 Epoch: 14 Global Step: 150500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:47:25,820-Speed 5981.47 samples/sec Loss 4.6167 LearningRate 0.0371 Epoch: 14 Global Step: 150510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 
01:47:32,682-Speed 5970.01 samples/sec Loss 4.5762 LearningRate 0.0371 Epoch: 14 Global Step: 150520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:47:39,593-Speed 5928.23 samples/sec Loss 4.5519 LearningRate 0.0371 Epoch: 14 Global Step: 150530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:47:46,453-Speed 5972.21 samples/sec Loss 4.6160 LearningRate 0.0371 Epoch: 14 Global Step: 150540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:47:53,307-Speed 5980.62 samples/sec Loss 4.5710 LearningRate 0.0371 Epoch: 14 Global Step: 150550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:00,160-Speed 5978.05 samples/sec Loss 4.6083 LearningRate 0.0371 Epoch: 14 Global Step: 150560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:07,029-Speed 5964.26 samples/sec Loss 4.6127 LearningRate 0.0371 Epoch: 14 Global Step: 150570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:13,904-Speed 5959.18 samples/sec Loss 4.6096 LearningRate 0.0370 Epoch: 14 Global Step: 150580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:20,771-Speed 5965.44 samples/sec Loss 4.5987 LearningRate 0.0370 Epoch: 14 Global Step: 150590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:27,640-Speed 5965.08 samples/sec Loss 4.5722 LearningRate 0.0370 Epoch: 14 Global Step: 150600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:34,553-Speed 5926.17 samples/sec Loss 4.6109 LearningRate 0.0370 Epoch: 14 Global Step: 150610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:41,481-Speed 5912.88 samples/sec Loss 4.5694 LearningRate 0.0370 Epoch: 14 Global Step: 150620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:48,332-Speed 5979.88 samples/sec Loss 4.6388 LearningRate 0.0370 Epoch: 14 Global Step: 150630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:48:55,181-Speed 5982.02 
samples/sec Loss 4.5919 LearningRate 0.0370 Epoch: 14 Global Step: 150640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:02,041-Speed 5972.75 samples/sec Loss 4.5906 LearningRate 0.0370 Epoch: 14 Global Step: 150650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:08,904-Speed 5971.72 samples/sec Loss 4.6038 LearningRate 0.0369 Epoch: 14 Global Step: 150660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:15,777-Speed 5960.56 samples/sec Loss 4.5863 LearningRate 0.0369 Epoch: 14 Global Step: 150670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:22,660-Speed 5952.21 samples/sec Loss 4.5869 LearningRate 0.0369 Epoch: 14 Global Step: 150680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:29,522-Speed 5970.34 samples/sec Loss 4.5698 LearningRate 0.0369 Epoch: 14 Global Step: 150690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:36,369-Speed 5983.90 samples/sec Loss 4.6550 LearningRate 0.0369 Epoch: 14 Global Step: 150700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:43,220-Speed 5979.05 samples/sec Loss 4.6447 LearningRate 0.0369 Epoch: 14 Global Step: 150710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:50,074-Speed 5978.92 samples/sec Loss 4.5802 LearningRate 0.0369 Epoch: 14 Global Step: 150720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:49:56,935-Speed 5971.65 samples/sec Loss 4.5411 LearningRate 0.0369 Epoch: 14 Global Step: 150730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:50:03,783-Speed 5981.69 samples/sec Loss 4.6041 LearningRate 0.0368 Epoch: 14 Global Step: 150740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:50:10,649-Speed 5966.99 samples/sec Loss 4.5894 LearningRate 0.0368 Epoch: 14 Global Step: 150750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:50:17,511-Speed 5970.41 samples/sec Loss 4.5615 
LearningRate 0.0368 Epoch: 14 Global Step: 150760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:50:24,370-Speed 5972.66 samples/sec Loss 4.6175 LearningRate 0.0368 Epoch: 14 Global Step: 150770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:50:31,243-Speed 5960.75 samples/sec Loss 4.5635 LearningRate 0.0368 Epoch: 14 Global Step: 150780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:50:38,114-Speed 5963.07 samples/sec Loss 4.6030 LearningRate 0.0368 Epoch: 14 Global Step: 150790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:50:44,980-Speed 5966.43 samples/sec Loss 4.5421 LearningRate 0.0368 Epoch: 14 Global Step: 150800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:50:51,840-Speed 5972.22 samples/sec Loss 4.5256 LearningRate 0.0367 Epoch: 14 Global Step: 150810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:50:58,712-Speed 5961.66 samples/sec Loss 4.5901 LearningRate 0.0367 Epoch: 14 Global Step: 150820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:51:05,573-Speed 5971.63 samples/sec Loss 4.5961 LearningRate 0.0367 Epoch: 14 Global Step: 150830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:51:12,423-Speed 5980.48 samples/sec Loss 4.5632 LearningRate 0.0367 Epoch: 14 Global Step: 150840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:51:19,277-Speed 5977.94 samples/sec Loss 4.6064 LearningRate 0.0367 Epoch: 14 Global Step: 150850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:51:26,132-Speed 5976.37 samples/sec Loss 4.5664 LearningRate 0.0367 Epoch: 14 Global Step: 150860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:51:32,983-Speed 5980.05 samples/sec Loss 4.5564 LearningRate 0.0367 Epoch: 14 Global Step: 150870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:51:39,840-Speed 5974.24 samples/sec Loss 4.5856 LearningRate 0.0367 Epoch: 14 
Global Step: 150880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:51:46,704-Speed 5968.52 samples/sec Loss 4.6019 LearningRate 0.0366 Epoch: 14 Global Step: 150890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:51:53,580-Speed 5958.75 samples/sec Loss 4.5759 LearningRate 0.0366 Epoch: 14 Global Step: 150900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:00,459-Speed 5955.98 samples/sec Loss 4.5785 LearningRate 0.0366 Epoch: 14 Global Step: 150910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:07,342-Speed 5952.44 samples/sec Loss 4.6215 LearningRate 0.0366 Epoch: 14 Global Step: 150920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:14,205-Speed 5969.01 samples/sec Loss 4.5532 LearningRate 0.0366 Epoch: 14 Global Step: 150930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:21,077-Speed 5961.90 samples/sec Loss 4.6048 LearningRate 0.0366 Epoch: 14 Global Step: 150940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:27,967-Speed 5945.88 samples/sec Loss 4.5413 LearningRate 0.0366 Epoch: 14 Global Step: 150950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:34,830-Speed 5969.94 samples/sec Loss 4.5895 LearningRate 0.0366 Epoch: 14 Global Step: 150960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:41,690-Speed 5972.39 samples/sec Loss 4.5683 LearningRate 0.0365 Epoch: 14 Global Step: 150970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:48,562-Speed 5961.17 samples/sec Loss 4.5444 LearningRate 0.0365 Epoch: 14 Global Step: 150980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:52:55,425-Speed 5969.73 samples/sec Loss 4.5723 LearningRate 0.0365 Epoch: 14 Global Step: 150990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:53:02,281-Speed 5975.58 samples/sec Loss 4.5984 LearningRate 0.0365 Epoch: 14 Global Step: 151000 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-01-09 01:53:09,151-Speed 5963.65 samples/sec Loss 4.5613 LearningRate 0.0365 Epoch: 14 Global Step: 151010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:53:16,001-Speed 5980.14 samples/sec Loss 4.5556 LearningRate 0.0365 Epoch: 14 Global Step: 151020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:53:22,870-Speed 5964.71 samples/sec Loss 4.5902 LearningRate 0.0365 Epoch: 14 Global Step: 151030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:53:29,721-Speed 5979.96 samples/sec Loss 4.5399 LearningRate 0.0364 Epoch: 14 Global Step: 151040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:53:36,588-Speed 5968.42 samples/sec Loss 4.5780 LearningRate 0.0364 Epoch: 14 Global Step: 151050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:53:43,472-Speed 5950.64 samples/sec Loss 4.5374 LearningRate 0.0364 Epoch: 14 Global Step: 151060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:53:50,327-Speed 5976.87 samples/sec Loss 4.5556 LearningRate 0.0364 Epoch: 14 Global Step: 151070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:53:57,241-Speed 5925.21 samples/sec Loss 4.5300 LearningRate 0.0364 Epoch: 14 Global Step: 151080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:54:04,110-Speed 5964.25 samples/sec Loss 4.5916 LearningRate 0.0364 Epoch: 14 Global Step: 151090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:54:10,961-Speed 5979.34 samples/sec Loss 4.5902 LearningRate 0.0364 Epoch: 14 Global Step: 151100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:54:17,821-Speed 5972.00 samples/sec Loss 4.5677 LearningRate 0.0364 Epoch: 14 Global Step: 151110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:54:24,681-Speed 5972.09 samples/sec Loss 4.5317 LearningRate 0.0363 Epoch: 14 Global Step: 151120 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-01-09 01:54:31,541-Speed 5972.08 samples/sec Loss 4.5619 LearningRate 0.0363 Epoch: 14 Global Step: 151130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:54:38,398-Speed 5975.19 samples/sec Loss 4.5301 LearningRate 0.0363 Epoch: 14 Global Step: 151140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:54:45,291-Speed 5943.10 samples/sec Loss 4.5767 LearningRate 0.0363 Epoch: 14 Global Step: 151150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:54:52,137-Speed 5983.66 samples/sec Loss 4.5143 LearningRate 0.0363 Epoch: 14 Global Step: 151160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:54:59,005-Speed 5964.86 samples/sec Loss 4.5324 LearningRate 0.0363 Epoch: 14 Global Step: 151170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:55:05,872-Speed 5966.69 samples/sec Loss 4.5843 LearningRate 0.0363 Epoch: 14 Global Step: 151180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:55:12,724-Speed 5978.78 samples/sec Loss 4.5230 LearningRate 0.0363 Epoch: 14 Global Step: 151190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:55:19,594-Speed 5962.94 samples/sec Loss 4.4888 LearningRate 0.0362 Epoch: 14 Global Step: 151200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:55:26,464-Speed 5963.06 samples/sec Loss 4.5608 LearningRate 0.0362 Epoch: 14 Global Step: 151210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:55:33,317-Speed 5977.18 samples/sec Loss 4.5664 LearningRate 0.0362 Epoch: 14 Global Step: 151220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:55:40,178-Speed 5971.57 samples/sec Loss 4.5168 LearningRate 0.0362 Epoch: 14 Global Step: 151230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:55:47,055-Speed 5957.98 samples/sec Loss 4.5977 LearningRate 0.0362 Epoch: 14 Global Step: 151240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 
01:55:53,929-Speed 5958.93 samples/sec Loss 4.5509 LearningRate 0.0362 Epoch: 14 Global Step: 151250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 01:56:00,800-Speed 5963.37 samples/sec Loss 4.5708 LearningRate 0.0362 Epoch: 14 Global Step: 151260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:56:07,659-Speed 5972.20 samples/sec Loss 4.5274 LearningRate 0.0362 Epoch: 14 Global Step: 151270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:56:14,515-Speed 5975.24 samples/sec Loss 4.5724 LearningRate 0.0361 Epoch: 14 Global Step: 151280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:56:21,403-Speed 5948.40 samples/sec Loss 4.5279 LearningRate 0.0361 Epoch: 14 Global Step: 151290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:56:28,257-Speed 5976.86 samples/sec Loss 4.5476 LearningRate 0.0361 Epoch: 14 Global Step: 151300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:56:35,125-Speed 5965.27 samples/sec Loss 4.5596 LearningRate 0.0361 Epoch: 14 Global Step: 151310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:56:41,974-Speed 5981.55 samples/sec Loss 4.4914 LearningRate 0.0361 Epoch: 14 Global Step: 151320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:56:48,837-Speed 5969.46 samples/sec Loss 4.5089 LearningRate 0.0361 Epoch: 14 Global Step: 151330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:56:55,687-Speed 5980.05 samples/sec Loss 4.5134 LearningRate 0.0361 Epoch: 14 Global Step: 151340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:57:02,558-Speed 5963.16 samples/sec Loss 4.5045 LearningRate 0.0360 Epoch: 14 Global Step: 151350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:57:09,433-Speed 5959.41 samples/sec Loss 4.5445 LearningRate 0.0360 Epoch: 14 Global Step: 151360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:57:16,298-Speed 5969.61 
samples/sec Loss 4.4978 LearningRate 0.0360 Epoch: 14 Global Step: 151370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:57:23,176-Speed 5956.66 samples/sec Loss 4.5319 LearningRate 0.0360 Epoch: 14 Global Step: 151380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:57:30,088-Speed 5927.94 samples/sec Loss 4.5319 LearningRate 0.0360 Epoch: 14 Global Step: 151390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:57:37,007-Speed 5920.74 samples/sec Loss 4.5352 LearningRate 0.0360 Epoch: 14 Global Step: 151400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:57:43,873-Speed 5966.90 samples/sec Loss 4.5996 LearningRate 0.0360 Epoch: 14 Global Step: 151410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:57:50,727-Speed 5978.09 samples/sec Loss 4.5388 LearningRate 0.0360 Epoch: 14 Global Step: 151420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:57:57,580-Speed 5977.84 samples/sec Loss 4.5403 LearningRate 0.0359 Epoch: 14 Global Step: 151430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:58:04,436-Speed 5975.42 samples/sec Loss 4.5447 LearningRate 0.0359 Epoch: 14 Global Step: 151440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:58:11,297-Speed 5970.61 samples/sec Loss 4.5558 LearningRate 0.0359 Epoch: 14 Global Step: 151450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:58:18,182-Speed 5950.72 samples/sec Loss 4.5004 LearningRate 0.0359 Epoch: 14 Global Step: 151460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:58:25,044-Speed 5970.00 samples/sec Loss 4.5883 LearningRate 0.0359 Epoch: 14 Global Step: 151470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:58:31,898-Speed 5977.87 samples/sec Loss 4.5238 LearningRate 0.0359 Epoch: 14 Global Step: 151480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:58:38,750-Speed 5978.59 samples/sec Loss 4.5773 
LearningRate 0.0359 Epoch: 14 Global Step: 151490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:58:45,602-Speed 5978.75 samples/sec Loss 4.5484 LearningRate 0.0359 Epoch: 14 Global Step: 151500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 01:58:52,451-Speed 5983.83 samples/sec Loss 4.5452 LearningRate 0.0358 Epoch: 14 Global Step: 151510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:58:59,325-Speed 5959.75 samples/sec Loss 4.5727 LearningRate 0.0358 Epoch: 14 Global Step: 151520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:59:06,193-Speed 5965.87 samples/sec Loss 4.5459 LearningRate 0.0358 Epoch: 14 Global Step: 151530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:59:13,074-Speed 5954.17 samples/sec Loss 4.5640 LearningRate 0.0358 Epoch: 14 Global Step: 151540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:59:19,930-Speed 5974.59 samples/sec Loss 4.5713 LearningRate 0.0358 Epoch: 14 Global Step: 151550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:59:26,820-Speed 5946.98 samples/sec Loss 4.5415 LearningRate 0.0358 Epoch: 14 Global Step: 151560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:59:33,692-Speed 5961.28 samples/sec Loss 4.5851 LearningRate 0.0358 Epoch: 14 Global Step: 151570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:59:40,539-Speed 5983.29 samples/sec Loss 4.5191 LearningRate 0.0358 Epoch: 14 Global Step: 151580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:59:47,394-Speed 5977.10 samples/sec Loss 4.5023 LearningRate 0.0357 Epoch: 14 Global Step: 151590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 01:59:54,259-Speed 5970.42 samples/sec Loss 4.6023 LearningRate 0.0357 Epoch: 14 Global Step: 151600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:00:01,116-Speed 5974.10 samples/sec Loss 4.4988 LearningRate 0.0357 Epoch: 14 
Global Step: 151610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:00:07,989-Speed 5961.18 samples/sec Loss 4.5525 LearningRate 0.0357 Epoch: 14 Global Step: 151620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:00:14,844-Speed 5976.35 samples/sec Loss 4.5104 LearningRate 0.0357 Epoch: 14 Global Step: 151630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:00:21,705-Speed 5971.22 samples/sec Loss 4.5048 LearningRate 0.0357 Epoch: 14 Global Step: 151640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:00:28,563-Speed 5976.50 samples/sec Loss 4.5576 LearningRate 0.0357 Epoch: 14 Global Step: 151650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:00:35,424-Speed 5971.38 samples/sec Loss 4.5594 LearningRate 0.0357 Epoch: 14 Global Step: 151660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:00:42,274-Speed 5980.45 samples/sec Loss 4.5427 LearningRate 0.0356 Epoch: 14 Global Step: 151670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:00:49,119-Speed 5984.71 samples/sec Loss 4.4747 LearningRate 0.0356 Epoch: 14 Global Step: 151680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:00:55,984-Speed 5968.03 samples/sec Loss 4.5557 LearningRate 0.0356 Epoch: 14 Global Step: 151690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:02,861-Speed 5957.01 samples/sec Loss 4.5452 LearningRate 0.0356 Epoch: 14 Global Step: 151700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:09,728-Speed 5966.89 samples/sec Loss 4.4597 LearningRate 0.0356 Epoch: 14 Global Step: 151710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:16,583-Speed 5976.71 samples/sec Loss 4.5099 LearningRate 0.0356 Epoch: 14 Global Step: 151720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:23,434-Speed 5979.32 samples/sec Loss 4.5234 LearningRate 0.0356 Epoch: 14 Global Step: 151730 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:30,301-Speed 5966.34 samples/sec Loss 4.5109 LearningRate 0.0355 Epoch: 14 Global Step: 151740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:37,167-Speed 5967.61 samples/sec Loss 4.4772 LearningRate 0.0355 Epoch: 14 Global Step: 151750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:44,053-Speed 5949.12 samples/sec Loss 4.5668 LearningRate 0.0355 Epoch: 14 Global Step: 151760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:50,900-Speed 5982.78 samples/sec Loss 4.5524 LearningRate 0.0355 Epoch: 14 Global Step: 151770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:01:57,798-Speed 5939.06 samples/sec Loss 4.4688 LearningRate 0.0355 Epoch: 14 Global Step: 151780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:02:04,650-Speed 5979.13 samples/sec Loss 4.4479 LearningRate 0.0355 Epoch: 14 Global Step: 151790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:02:11,508-Speed 5974.11 samples/sec Loss 4.5066 LearningRate 0.0355 Epoch: 14 Global Step: 151800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:02:18,361-Speed 5977.97 samples/sec Loss 4.4896 LearningRate 0.0355 Epoch: 14 Global Step: 151810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:02:25,234-Speed 5961.12 samples/sec Loss 4.4632 LearningRate 0.0354 Epoch: 14 Global Step: 151820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:02:32,107-Speed 5960.78 samples/sec Loss 4.5597 LearningRate 0.0354 Epoch: 14 Global Step: 151830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:02:38,984-Speed 5956.91 samples/sec Loss 4.5356 LearningRate 0.0354 Epoch: 14 Global Step: 151840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:02:45,884-Speed 5937.06 samples/sec Loss 4.4816 LearningRate 0.0354 Epoch: 14 Global Step: 151850 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-01-09 02:02:52,737-Speed 5978.62 samples/sec Loss 4.5331 LearningRate 0.0354 Epoch: 14 Global Step: 151860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:02:59,585-Speed 5985.16 samples/sec Loss 4.5305 LearningRate 0.0354 Epoch: 14 Global Step: 151870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:03:06,441-Speed 5975.18 samples/sec Loss 4.4717 LearningRate 0.0354 Epoch: 14 Global Step: 151880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:03:13,290-Speed 5981.85 samples/sec Loss 4.4550 LearningRate 0.0354 Epoch: 14 Global Step: 151890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:03:20,157-Speed 5966.49 samples/sec Loss 4.5251 LearningRate 0.0353 Epoch: 14 Global Step: 151900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:03:27,029-Speed 5961.91 samples/sec Loss 4.5146 LearningRate 0.0353 Epoch: 14 Global Step: 151910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:03:33,887-Speed 5973.38 samples/sec Loss 4.5295 LearningRate 0.0353 Epoch: 14 Global Step: 151920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:03:40,782-Speed 5942.24 samples/sec Loss 4.5180 LearningRate 0.0353 Epoch: 14 Global Step: 151930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:03:47,681-Speed 5937.82 samples/sec Loss 4.4853 LearningRate 0.0353 Epoch: 14 Global Step: 151940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:03:54,550-Speed 5964.61 samples/sec Loss 4.4802 LearningRate 0.0353 Epoch: 14 Global Step: 151950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:04:01,433-Speed 5952.62 samples/sec Loss 4.5026 LearningRate 0.0353 Epoch: 14 Global Step: 151960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:04:08,290-Speed 5974.50 samples/sec Loss 4.5155 LearningRate 0.0353 Epoch: 14 Global Step: 151970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 
02:04:15,138-Speed 5982.25 samples/sec Loss 4.5060 LearningRate 0.0352 Epoch: 14 Global Step: 151980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:04:22,011-Speed 5961.17 samples/sec Loss 4.4967 LearningRate 0.0352 Epoch: 14 Global Step: 151990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:04:28,868-Speed 5974.34 samples/sec Loss 4.5442 LearningRate 0.0352 Epoch: 14 Global Step: 152000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:04:35,739-Speed 5962.78 samples/sec Loss 4.4925 LearningRate 0.0352 Epoch: 14 Global Step: 152010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:04:42,615-Speed 5961.16 samples/sec Loss 4.5089 LearningRate 0.0352 Epoch: 14 Global Step: 152020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:04:49,539-Speed 5916.62 samples/sec Loss 4.5664 LearningRate 0.0352 Epoch: 14 Global Step: 152030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:04:56,394-Speed 5977.35 samples/sec Loss 4.5030 LearningRate 0.0352 Epoch: 14 Global Step: 152040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:05:03,241-Speed 5983.33 samples/sec Loss 4.4862 LearningRate 0.0352 Epoch: 14 Global Step: 152050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:05:10,104-Speed 5969.54 samples/sec Loss 4.4801 LearningRate 0.0351 Epoch: 14 Global Step: 152060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:05:16,967-Speed 5969.75 samples/sec Loss 4.4560 LearningRate 0.0351 Epoch: 14 Global Step: 152070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:05:23,811-Speed 5985.42 samples/sec Loss 4.4754 LearningRate 0.0351 Epoch: 14 Global Step: 152080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:05:30,676-Speed 5967.78 samples/sec Loss 4.4912 LearningRate 0.0351 Epoch: 14 Global Step: 152090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:05:37,538-Speed 5969.72 
samples/sec Loss 4.4838 LearningRate 0.0351 Epoch: 14 Global Step: 152100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:05:44,390-Speed 5980.42 samples/sec Loss 4.4570 LearningRate 0.0351 Epoch: 14 Global Step: 152110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:05:51,279-Speed 5946.71 samples/sec Loss 4.4438 LearningRate 0.0351 Epoch: 14 Global Step: 152120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:05:58,132-Speed 5978.09 samples/sec Loss 4.5009 LearningRate 0.0351 Epoch: 14 Global Step: 152130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:06:05,031-Speed 5938.32 samples/sec Loss 4.4393 LearningRate 0.0350 Epoch: 14 Global Step: 152140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:06:11,884-Speed 5977.93 samples/sec Loss 4.4571 LearningRate 0.0350 Epoch: 14 Global Step: 152150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:06:18,740-Speed 5975.30 samples/sec Loss 4.5104 LearningRate 0.0350 Epoch: 14 Global Step: 152160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:06:25,591-Speed 5980.07 samples/sec Loss 4.5255 LearningRate 0.0350 Epoch: 14 Global Step: 152170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:06:32,488-Speed 5940.11 samples/sec Loss 4.5029 LearningRate 0.0350 Epoch: 14 Global Step: 152180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:06:39,375-Speed 5949.41 samples/sec Loss 4.5043 LearningRate 0.0350 Epoch: 14 Global Step: 152190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:06:46,272-Speed 5939.72 samples/sec Loss 4.4971 LearningRate 0.0350 Epoch: 14 Global Step: 152200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:06:53,135-Speed 5969.34 samples/sec Loss 4.4718 LearningRate 0.0350 Epoch: 14 Global Step: 152210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:07:00,009-Speed 5959.97 samples/sec Loss 4.4868 
LearningRate 0.0349 Epoch: 14 Global Step: 152220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:07:06,871-Speed 5970.74 samples/sec Loss 4.4816 LearningRate 0.0349 Epoch: 14 Global Step: 152230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:07:13,756-Speed 5950.01 samples/sec Loss 4.5061 LearningRate 0.0349 Epoch: 14 Global Step: 152240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:07:20,632-Speed 5957.76 samples/sec Loss 4.4364 LearningRate 0.0349 Epoch: 14 Global Step: 152250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:07:27,551-Speed 5921.40 samples/sec Loss 4.5053 LearningRate 0.0349 Epoch: 14 Global Step: 152260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:07:34,424-Speed 5962.88 samples/sec Loss 4.4798 LearningRate 0.0349 Epoch: 14 Global Step: 152270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:07:41,284-Speed 5972.74 samples/sec Loss 4.4929 LearningRate 0.0349 Epoch: 14 Global Step: 152280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:07:48,130-Speed 5984.11 samples/sec Loss 4.4911 LearningRate 0.0348 Epoch: 14 Global Step: 152290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:07:54,984-Speed 5976.05 samples/sec Loss 4.4527 LearningRate 0.0348 Epoch: 14 Global Step: 152300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:08:01,853-Speed 5964.13 samples/sec Loss 4.4952 LearningRate 0.0348 Epoch: 14 Global Step: 152310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:08:08,733-Speed 5954.98 samples/sec Loss 4.4648 LearningRate 0.0348 Epoch: 14 Global Step: 152320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:08:15,599-Speed 5967.04 samples/sec Loss 4.4335 LearningRate 0.0348 Epoch: 14 Global Step: 152330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:08:22,445-Speed 5984.38 samples/sec Loss 4.4044 LearningRate 0.0348 Epoch: 14 
Global Step: 152340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:08:29,290-Speed 5985.30 samples/sec Loss 4.4257 LearningRate 0.0348 Epoch: 14 Global Step: 152350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:08:36,138-Speed 5981.99 samples/sec Loss 4.4734 LearningRate 0.0348 Epoch: 14 Global Step: 152360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:08:42,991-Speed 5981.28 samples/sec Loss 4.4671 LearningRate 0.0347 Epoch: 14 Global Step: 152370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:08:49,855-Speed 5968.26 samples/sec Loss 4.4866 LearningRate 0.0347 Epoch: 14 Global Step: 152380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:08:56,714-Speed 5973.20 samples/sec Loss 4.4465 LearningRate 0.0347 Epoch: 14 Global Step: 152390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:03,622-Speed 5933.30 samples/sec Loss 4.4644 LearningRate 0.0347 Epoch: 14 Global Step: 152400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:10,485-Speed 5969.77 samples/sec Loss 4.4456 LearningRate 0.0347 Epoch: 14 Global Step: 152410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:17,360-Speed 5958.51 samples/sec Loss 4.4727 LearningRate 0.0347 Epoch: 14 Global Step: 152420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:24,237-Speed 5957.05 samples/sec Loss 4.4719 LearningRate 0.0347 Epoch: 14 Global Step: 152430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:31,101-Speed 5969.07 samples/sec Loss 4.4512 LearningRate 0.0347 Epoch: 14 Global Step: 152440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:37,975-Speed 5959.62 samples/sec Loss 4.5051 LearningRate 0.0346 Epoch: 14 Global Step: 152450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:44,840-Speed 5968.00 samples/sec Loss 4.3993 LearningRate 0.0346 Epoch: 14 Global Step: 152460 Fp16 
Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:51,689-Speed 5981.19 samples/sec Loss 4.4299 LearningRate 0.0346 Epoch: 14 Global Step: 152470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:09:58,543-Speed 5977.44 samples/sec Loss 4.4991 LearningRate 0.0346 Epoch: 14 Global Step: 152480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:10:05,405-Speed 5972.09 samples/sec Loss 4.4750 LearningRate 0.0346 Epoch: 14 Global Step: 152490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:10:12,254-Speed 5983.48 samples/sec Loss 4.4616 LearningRate 0.0346 Epoch: 14 Global Step: 152500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:10:19,108-Speed 5977.42 samples/sec Loss 4.4466 LearningRate 0.0346 Epoch: 14 Global Step: 152510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:10:25,967-Speed 5972.71 samples/sec Loss 4.4594 LearningRate 0.0346 Epoch: 14 Global Step: 152520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:10:32,838-Speed 5963.00 samples/sec Loss 4.3987 LearningRate 0.0345 Epoch: 14 Global Step: 152530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:10:39,684-Speed 5983.45 samples/sec Loss 4.5079 LearningRate 0.0345 Epoch: 14 Global Step: 152540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:10:46,558-Speed 5960.64 samples/sec Loss 4.4105 LearningRate 0.0345 Epoch: 14 Global Step: 152550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:10:53,423-Speed 5967.76 samples/sec Loss 4.4592 LearningRate 0.0345 Epoch: 14 Global Step: 152560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:11:00,273-Speed 5980.42 samples/sec Loss 4.4459 LearningRate 0.0345 Epoch: 14 Global Step: 152570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:11:07,139-Speed 5967.03 samples/sec Loss 4.4342 LearningRate 0.0345 Epoch: 14 Global Step: 152580 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-01-09 02:11:14,017-Speed 5956.60 samples/sec Loss 4.4878 LearningRate 0.0345 Epoch: 14 Global Step: 152590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:11:20,895-Speed 5956.27 samples/sec Loss 4.4463 LearningRate 0.0345 Epoch: 14 Global Step: 152600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:11:27,755-Speed 5971.92 samples/sec Loss 4.4586 LearningRate 0.0344 Epoch: 14 Global Step: 152610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:11:34,642-Speed 5949.16 samples/sec Loss 4.4846 LearningRate 0.0344 Epoch: 14 Global Step: 152620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:11:41,489-Speed 5983.53 samples/sec Loss 4.4986 LearningRate 0.0344 Epoch: 14 Global Step: 152630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:11:48,379-Speed 5945.80 samples/sec Loss 4.4246 LearningRate 0.0344 Epoch: 14 Global Step: 152640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:11:55,254-Speed 5959.58 samples/sec Loss 4.4571 LearningRate 0.0344 Epoch: 14 Global Step: 152650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:12:02,099-Speed 5984.54 samples/sec Loss 4.4663 LearningRate 0.0344 Epoch: 14 Global Step: 152660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:12:08,958-Speed 5975.40 samples/sec Loss 4.4030 LearningRate 0.0344 Epoch: 14 Global Step: 152670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:12:15,853-Speed 5943.81 samples/sec Loss 4.4576 LearningRate 0.0344 Epoch: 14 Global Step: 152680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:12:22,699-Speed 5983.95 samples/sec Loss 4.4713 LearningRate 0.0343 Epoch: 14 Global Step: 152690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:12:29,547-Speed 5982.56 samples/sec Loss 4.4431 LearningRate 0.0343 Epoch: 14 Global Step: 152700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 
02:12:36,416-Speed 5964.32 samples/sec Loss 4.4597 LearningRate 0.0343 Epoch: 14 Global Step: 152710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:12:43,270-Speed 5977.08 samples/sec Loss 4.4502 LearningRate 0.0343 Epoch: 14 Global Step: 152720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:12:50,134-Speed 5968.57 samples/sec Loss 4.4245 LearningRate 0.0343 Epoch: 14 Global Step: 152730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:12:57,006-Speed 5961.12 samples/sec Loss 4.4862 LearningRate 0.0343 Epoch: 14 Global Step: 152740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:13:03,852-Speed 5984.21 samples/sec Loss 4.4791 LearningRate 0.0343 Epoch: 14 Global Step: 152750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:13:10,712-Speed 5972.12 samples/sec Loss 4.4392 LearningRate 0.0343 Epoch: 14 Global Step: 152760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:13:17,561-Speed 5981.79 samples/sec Loss 4.4602 LearningRate 0.0342 Epoch: 14 Global Step: 152770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:13:24,409-Speed 5981.33 samples/sec Loss 4.4357 LearningRate 0.0342 Epoch: 14 Global Step: 152780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:13:31,285-Speed 5960.92 samples/sec Loss 4.4396 LearningRate 0.0342 Epoch: 14 Global Step: 152790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:13:38,149-Speed 5969.26 samples/sec Loss 4.3763 LearningRate 0.0342 Epoch: 14 Global Step: 152800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:13:45,018-Speed 5963.21 samples/sec Loss 4.4031 LearningRate 0.0342 Epoch: 14 Global Step: 152810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:13:51,894-Speed 5958.59 samples/sec Loss 4.4183 LearningRate 0.0342 Epoch: 14 Global Step: 152820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:13:58,750-Speed 5977.42 
samples/sec Loss 4.4773 LearningRate 0.0342 Epoch: 14 Global Step: 152830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:14:05,595-Speed 5985.23 samples/sec Loss 4.4124 LearningRate 0.0342 Epoch: 14 Global Step: 152840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:14:12,447-Speed 5978.66 samples/sec Loss 4.3907 LearningRate 0.0341 Epoch: 14 Global Step: 152850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:14:19,305-Speed 5974.19 samples/sec Loss 4.4106 LearningRate 0.0341 Epoch: 14 Global Step: 152860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:14:26,158-Speed 5978.31 samples/sec Loss 4.4263 LearningRate 0.0341 Epoch: 14 Global Step: 152870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:14:33,001-Speed 5987.04 samples/sec Loss 4.4376 LearningRate 0.0341 Epoch: 14 Global Step: 152880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:14:39,851-Speed 5980.30 samples/sec Loss 4.4025 LearningRate 0.0341 Epoch: 14 Global Step: 152890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:14:46,716-Speed 5968.06 samples/sec Loss 4.4311 LearningRate 0.0341 Epoch: 14 Global Step: 152900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:14:53,578-Speed 5970.13 samples/sec Loss 4.5032 LearningRate 0.0341 Epoch: 14 Global Step: 152910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:15:00,426-Speed 5982.02 samples/sec Loss 4.4393 LearningRate 0.0341 Epoch: 14 Global Step: 152920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:15:07,277-Speed 5979.66 samples/sec Loss 4.4195 LearningRate 0.0340 Epoch: 14 Global Step: 152930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:15:14,137-Speed 5972.88 samples/sec Loss 4.4349 LearningRate 0.0340 Epoch: 14 Global Step: 152940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:15:20,986-Speed 5981.71 samples/sec Loss 4.4294 
LearningRate 0.0340 Epoch: 14 Global Step: 152950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:15:27,836-Speed 5980.35 samples/sec Loss 4.4312 LearningRate 0.0340 Epoch: 14 Global Step: 152960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:15:34,691-Speed 5978.97 samples/sec Loss 4.4477 LearningRate 0.0340 Epoch: 14 Global Step: 152970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:15:41,554-Speed 5969.49 samples/sec Loss 4.4534 LearningRate 0.0340 Epoch: 14 Global Step: 152980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:15:48,439-Speed 5950.01 samples/sec Loss 4.4532 LearningRate 0.0340 Epoch: 14 Global Step: 152990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:15:55,302-Speed 5969.46 samples/sec Loss 4.3838 LearningRate 0.0340 Epoch: 14 Global Step: 153000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:16:02,195-Speed 5943.32 samples/sec Loss 4.4419 LearningRate 0.0339 Epoch: 14 Global Step: 153010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:16:09,067-Speed 5961.99 samples/sec Loss 4.4408 LearningRate 0.0339 Epoch: 14 Global Step: 153020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:16:15,919-Speed 5978.56 samples/sec Loss 4.4180 LearningRate 0.0339 Epoch: 14 Global Step: 153030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:16:22,769-Speed 5981.13 samples/sec Loss 4.3926 LearningRate 0.0339 Epoch: 14 Global Step: 153040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:16:29,621-Speed 5977.90 samples/sec Loss 4.4193 LearningRate 0.0339 Epoch: 14 Global Step: 153050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:16:36,471-Speed 5980.88 samples/sec Loss 4.4385 LearningRate 0.0339 Epoch: 14 Global Step: 153060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:16:43,321-Speed 5982.61 samples/sec Loss 4.4282 LearningRate 0.0339 
Epoch: 14 Global Step: 153070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:16:50,180-Speed 5972.96 samples/sec Loss 4.4195 LearningRate 0.0339 Epoch: 14 Global Step: 153080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:16:57,034-Speed 5977.11 samples/sec Loss 4.4242 LearningRate 0.0338 Epoch: 14 Global Step: 153090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:17:03,928-Speed 5943.34 samples/sec Loss 4.4614 LearningRate 0.0338 Epoch: 14 Global Step: 153100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:17:10,782-Speed 5976.29 samples/sec Loss 4.4383 LearningRate 0.0338 Epoch: 14 Global Step: 153110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:17:17,661-Speed 5956.25 samples/sec Loss 4.4572 LearningRate 0.0338 Epoch: 14 Global Step: 153120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:17:24,509-Speed 5982.21 samples/sec Loss 4.4766 LearningRate 0.0338 Epoch: 14 Global Step: 153130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:17:31,354-Speed 5984.52 samples/sec Loss 4.4448 LearningRate 0.0338 Epoch: 14 Global Step: 153140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:17:38,207-Speed 5978.24 samples/sec Loss 4.4330 LearningRate 0.0338 Epoch: 14 Global Step: 153150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:17:45,064-Speed 5974.30 samples/sec Loss 4.3925 LearningRate 0.0338 Epoch: 14 Global Step: 153160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:17:51,923-Speed 5972.81 samples/sec Loss 4.3761 LearningRate 0.0337 Epoch: 14 Global Step: 153170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:17:58,774-Speed 5980.55 samples/sec Loss 4.4256 LearningRate 0.0337 Epoch: 14 Global Step: 153180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:18:05,640-Speed 5967.62 samples/sec Loss 4.3838 LearningRate 0.0337 Epoch: 14 Global Step: 153190 
Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:18:12,530-Speed 5945.29 samples/sec Loss 4.3909 LearningRate 0.0337 Epoch: 14 Global Step: 153200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:18:19,381-Speed 5980.05 samples/sec Loss 4.4148 LearningRate 0.0337 Epoch: 14 Global Step: 153210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:18:26,248-Speed 5967.57 samples/sec Loss 4.4065 LearningRate 0.0337 Epoch: 14 Global Step: 153220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:18:33,135-Speed 5948.48 samples/sec Loss 4.4414 LearningRate 0.0337 Epoch: 14 Global Step: 153230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:18:40,017-Speed 5952.93 samples/sec Loss 4.4632 LearningRate 0.0337 Epoch: 14 Global Step: 153240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:18:46,866-Speed 5981.39 samples/sec Loss 4.4389 LearningRate 0.0336 Epoch: 14 Global Step: 153250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:18:53,738-Speed 5961.55 samples/sec Loss 4.4362 LearningRate 0.0336 Epoch: 14 Global Step: 153260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:19:00,611-Speed 5960.62 samples/sec Loss 4.4131 LearningRate 0.0336 Epoch: 14 Global Step: 153270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:19:07,473-Speed 5970.73 samples/sec Loss 4.3817 LearningRate 0.0336 Epoch: 14 Global Step: 153280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:19:14,343-Speed 5962.50 samples/sec Loss 4.4190 LearningRate 0.0336 Epoch: 14 Global Step: 153290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:19:21,211-Speed 5965.32 samples/sec Loss 4.4000 LearningRate 0.0336 Epoch: 14 Global Step: 153300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:19:28,080-Speed 5964.50 samples/sec Loss 4.4071 LearningRate 0.0336 Epoch: 14 Global Step: 153310 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-01-09 02:19:34,934-Speed 5976.88 samples/sec Loss 4.3644 LearningRate 0.0336 Epoch: 14 Global Step: 153320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:19:41,792-Speed 5974.77 samples/sec Loss 4.3771 LearningRate 0.0335 Epoch: 14 Global Step: 153330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:19:48,679-Speed 5948.32 samples/sec Loss 4.4314 LearningRate 0.0335 Epoch: 14 Global Step: 153340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:19:55,526-Speed 5983.71 samples/sec Loss 4.4169 LearningRate 0.0335 Epoch: 14 Global Step: 153350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:20:02,385-Speed 5972.90 samples/sec Loss 4.3976 LearningRate 0.0335 Epoch: 14 Global Step: 153360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:20:09,225-Speed 5989.44 samples/sec Loss 4.3710 LearningRate 0.0335 Epoch: 14 Global Step: 153370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:20:16,078-Speed 5977.30 samples/sec Loss 4.4461 LearningRate 0.0335 Epoch: 14 Global Step: 153380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:20:22,967-Speed 5947.22 samples/sec Loss 4.4292 LearningRate 0.0335 Epoch: 14 Global Step: 153390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:20:29,843-Speed 5958.45 samples/sec Loss 4.4269 LearningRate 0.0335 Epoch: 14 Global Step: 153400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:20:36,733-Speed 5946.46 samples/sec Loss 4.4481 LearningRate 0.0334 Epoch: 14 Global Step: 153410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:20:43,608-Speed 5958.60 samples/sec Loss 4.4240 LearningRate 0.0334 Epoch: 14 Global Step: 153420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:20:50,473-Speed 5967.98 samples/sec Loss 4.3740 LearningRate 0.0334 Epoch: 14 Global Step: 153430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-01-09 02:20:57,334-Speed 5971.09 samples/sec Loss 4.3999 LearningRate 0.0334 Epoch: 14 Global Step: 153440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:21:04,189-Speed 5976.59 samples/sec Loss 4.3838 LearningRate 0.0334 Epoch: 14 Global Step: 153450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:21:11,042-Speed 5978.59 samples/sec Loss 4.3871 LearningRate 0.0334 Epoch: 14 Global Step: 153460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:21:17,930-Speed 5947.14 samples/sec Loss 4.4179 LearningRate 0.0334 Epoch: 14 Global Step: 153470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:21:24,784-Speed 5977.29 samples/sec Loss 4.3988 LearningRate 0.0334 Epoch: 14 Global Step: 153480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:21:31,675-Speed 5944.99 samples/sec Loss 4.3958 LearningRate 0.0333 Epoch: 14 Global Step: 153490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:21:38,538-Speed 5969.48 samples/sec Loss 4.3758 LearningRate 0.0333 Epoch: 14 Global Step: 153500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:21:45,425-Speed 5950.54 samples/sec Loss 4.3978 LearningRate 0.0333 Epoch: 14 Global Step: 153510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:21:52,274-Speed 5983.91 samples/sec Loss 4.4140 LearningRate 0.0333 Epoch: 14 Global Step: 153520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:21:59,158-Speed 5950.75 samples/sec Loss 4.3729 LearningRate 0.0333 Epoch: 14 Global Step: 153530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:22:06,047-Speed 5947.56 samples/sec Loss 4.3699 LearningRate 0.0333 Epoch: 14 Global Step: 153540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:22:12,918-Speed 5962.41 samples/sec Loss 4.4156 LearningRate 0.0333 Epoch: 14 Global Step: 153550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 
02:22:19,791-Speed 5960.33 samples/sec Loss 4.3523 LearningRate 0.0333 Epoch: 14 Global Step: 153560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:22:26,660-Speed 5964.67 samples/sec Loss 4.4175 LearningRate 0.0332 Epoch: 14 Global Step: 153570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:22:33,528-Speed 5964.77 samples/sec Loss 4.3961 LearningRate 0.0332 Epoch: 14 Global Step: 153580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:22:40,405-Speed 5956.83 samples/sec Loss 4.3890 LearningRate 0.0332 Epoch: 14 Global Step: 153590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:22:47,279-Speed 5959.68 samples/sec Loss 4.3583 LearningRate 0.0332 Epoch: 14 Global Step: 153600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:22:54,149-Speed 5963.93 samples/sec Loss 4.4129 LearningRate 0.0332 Epoch: 14 Global Step: 153610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:23:01,022-Speed 5960.77 samples/sec Loss 4.3640 LearningRate 0.0332 Epoch: 14 Global Step: 153620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:23:07,887-Speed 5967.14 samples/sec Loss 4.3509 LearningRate 0.0332 Epoch: 14 Global Step: 153630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:23:14,750-Speed 5970.06 samples/sec Loss 4.3686 LearningRate 0.0332 Epoch: 14 Global Step: 153640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 02:23:21,607-Speed 5974.28 samples/sec Loss 4.3676 LearningRate 0.0331 Epoch: 14 Global Step: 153650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:23:28,465-Speed 5974.00 samples/sec Loss 4.4004 LearningRate 0.0331 Epoch: 14 Global Step: 153660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:23:35,352-Speed 5949.41 samples/sec Loss 4.3079 LearningRate 0.0331 Epoch: 14 Global Step: 153670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:23:42,196-Speed 5984.78 
samples/sec Loss 4.3550 LearningRate 0.0331 Epoch: 14 Global Step: 153680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:23:49,060-Speed 5969.02 samples/sec Loss 4.3913 LearningRate 0.0331 Epoch: 14 Global Step: 153690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:23:55,916-Speed 5976.46 samples/sec Loss 4.3740 LearningRate 0.0331 Epoch: 14 Global Step: 153700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:24:02,778-Speed 5969.98 samples/sec Loss 4.3729 LearningRate 0.0331 Epoch: 14 Global Step: 153710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:24:09,657-Speed 5955.23 samples/sec Loss 4.3921 LearningRate 0.0331 Epoch: 14 Global Step: 153720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:24:16,512-Speed 5976.58 samples/sec Loss 4.4667 LearningRate 0.0331 Epoch: 14 Global Step: 153730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:24:23,397-Speed 5950.31 samples/sec Loss 4.3595 LearningRate 0.0330 Epoch: 14 Global Step: 153740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:24:30,252-Speed 5975.37 samples/sec Loss 4.4136 LearningRate 0.0330 Epoch: 14 Global Step: 153750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:24:37,131-Speed 5956.34 samples/sec Loss 4.3789 LearningRate 0.0330 Epoch: 14 Global Step: 153760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:24:43,975-Speed 5985.54 samples/sec Loss 4.3391 LearningRate 0.0330 Epoch: 14 Global Step: 153770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:24:50,824-Speed 5981.31 samples/sec Loss 4.3856 LearningRate 0.0330 Epoch: 14 Global Step: 153780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:24:57,688-Speed 5968.48 samples/sec Loss 4.3584 LearningRate 0.0330 Epoch: 14 Global Step: 153790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:25:04,540-Speed 5978.65 samples/sec Loss 4.3982 
LearningRate 0.0330 Epoch: 14 Global Step: 153800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 02:25:11,395-Speed 5976.87 samples/sec Loss 4.3910 LearningRate 0.0330 Epoch: 14 Global Step: 153810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 02:25:18,260-Speed 5970.29 samples/sec Loss 4.3329 LearningRate 0.0329 Epoch: 14 Global Step: 153820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:25:25,129-Speed 5963.41 samples/sec Loss 4.4185 LearningRate 0.0329 Epoch: 14 Global Step: 153830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:25:31,971-Speed 5988.39 samples/sec Loss 4.3357 LearningRate 0.0329 Epoch: 14 Global Step: 153840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:25:38,827-Speed 5976.74 samples/sec Loss 4.3635 LearningRate 0.0329 Epoch: 14 Global Step: 153850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:25:45,695-Speed 5965.27 samples/sec Loss 4.3297 LearningRate 0.0329 Epoch: 14 Global Step: 153860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:25:52,581-Speed 5949.60 samples/sec Loss 4.3198 LearningRate 0.0329 Epoch: 14 Global Step: 153870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:25:59,460-Speed 5958.30 samples/sec Loss 4.3760 LearningRate 0.0329 Epoch: 14 Global Step: 153880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:26:06,313-Speed 5977.59 samples/sec Loss 4.3775 LearningRate 0.0329 Epoch: 14 Global Step: 153890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:26:13,164-Speed 5982.37 samples/sec Loss 4.3870 LearningRate 0.0328 Epoch: 14 Global Step: 153900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:26:20,038-Speed 5960.31 samples/sec Loss 4.3694 LearningRate 0.0328 Epoch: 14 Global Step: 153910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:26:26,897-Speed 5973.06 samples/sec Loss 4.3923 LearningRate 0.0328 Epoch: 14 
Global Step: 153920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:26:33,742-Speed 5984.40 samples/sec Loss 4.3073 LearningRate 0.0328 Epoch: 14 Global Step: 153930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:26:40,599-Speed 5974.69 samples/sec Loss 4.3530 LearningRate 0.0328 Epoch: 14 Global Step: 153940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:26:47,474-Speed 5959.53 samples/sec Loss 4.3461 LearningRate 0.0328 Epoch: 14 Global Step: 153950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:26:54,332-Speed 5973.21 samples/sec Loss 4.3117 LearningRate 0.0328 Epoch: 14 Global Step: 153960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:27:01,191-Speed 5973.03 samples/sec Loss 4.3394 LearningRate 0.0328 Epoch: 14 Global Step: 153970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:27:08,053-Speed 5972.04 samples/sec Loss 4.3009 LearningRate 0.0327 Epoch: 14 Global Step: 153980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:27:14,903-Speed 5983.74 samples/sec Loss 4.3437 LearningRate 0.0327 Epoch: 14 Global Step: 153990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:27:21,751-Speed 5984.59 samples/sec Loss 4.3477 LearningRate 0.0327 Epoch: 14 Global Step: 154000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:27:28,597-Speed 5983.80 samples/sec Loss 4.3598 LearningRate 0.0327 Epoch: 14 Global Step: 154010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:27:35,449-Speed 5979.43 samples/sec Loss 4.2959 LearningRate 0.0327 Epoch: 14 Global Step: 154020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:27:42,312-Speed 5969.64 samples/sec Loss 4.3325 LearningRate 0.0327 Epoch: 14 Global Step: 154030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:27:49,170-Speed 5973.62 samples/sec Loss 4.3269 LearningRate 0.0327 Epoch: 14 Global Step: 154040 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-01-09 02:27:56,016-Speed 5983.79 samples/sec Loss 4.3438 LearningRate 0.0327 Epoch: 14 Global Step: 154050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:28:02,860-Speed 5986.07 samples/sec Loss 4.2606 LearningRate 0.0326 Epoch: 14 Global Step: 154060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:28:09,726-Speed 5966.50 samples/sec Loss 4.3075 LearningRate 0.0326 Epoch: 14 Global Step: 154070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:28:16,608-Speed 5954.21 samples/sec Loss 4.3701 LearningRate 0.0326 Epoch: 14 Global Step: 154080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:28:23,493-Speed 5950.40 samples/sec Loss 4.3802 LearningRate 0.0326 Epoch: 14 Global Step: 154090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:28:30,374-Speed 5954.48 samples/sec Loss 4.3356 LearningRate 0.0326 Epoch: 14 Global Step: 154100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:28:37,239-Speed 5967.90 samples/sec Loss 4.3479 LearningRate 0.0326 Epoch: 14 Global Step: 154110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:28:44,113-Speed 5959.53 samples/sec Loss 4.3100 LearningRate 0.0326 Epoch: 14 Global Step: 154120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:28:50,967-Speed 5977.00 samples/sec Loss 4.3643 LearningRate 0.0326 Epoch: 14 Global Step: 154130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:28:57,818-Speed 5980.57 samples/sec Loss 4.3359 LearningRate 0.0325 Epoch: 14 Global Step: 154140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:29:04,687-Speed 5964.31 samples/sec Loss 4.3688 LearningRate 0.0325 Epoch: 14 Global Step: 154150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:29:11,592-Speed 5932.19 samples/sec Loss 4.2785 LearningRate 0.0325 Epoch: 14 Global Step: 154160 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-01-09 02:29:18,456-Speed 5969.19 samples/sec Loss 4.3002 LearningRate 0.0325 Epoch: 14 Global Step: 154170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:29:25,344-Speed 5947.45 samples/sec Loss 4.3647 LearningRate 0.0325 Epoch: 14 Global Step: 154180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:29:32,229-Speed 5950.17 samples/sec Loss 4.3959 LearningRate 0.0325 Epoch: 14 Global Step: 154190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:29:39,093-Speed 5968.91 samples/sec Loss 4.3358 LearningRate 0.0325 Epoch: 14 Global Step: 154200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:29:45,962-Speed 5966.61 samples/sec Loss 4.3337 LearningRate 0.0325 Epoch: 14 Global Step: 154210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:29:52,801-Speed 5990.34 samples/sec Loss 4.3422 LearningRate 0.0324 Epoch: 14 Global Step: 154220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:29:59,652-Speed 5978.95 samples/sec Loss 4.3260 LearningRate 0.0324 Epoch: 14 Global Step: 154230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:30:06,510-Speed 5974.44 samples/sec Loss 4.3545 LearningRate 0.0324 Epoch: 14 Global Step: 154240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:30:13,370-Speed 5972.22 samples/sec Loss 4.3366 LearningRate 0.0324 Epoch: 14 Global Step: 154250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:30:20,231-Speed 5971.08 samples/sec Loss 4.3707 LearningRate 0.0324 Epoch: 14 Global Step: 154260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:30:27,109-Speed 5956.57 samples/sec Loss 4.3251 LearningRate 0.0324 Epoch: 14 Global Step: 154270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:30:33,963-Speed 5976.81 samples/sec Loss 4.3369 LearningRate 0.0324 Epoch: 14 Global Step: 154280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 
02:30:40,820-Speed 5974.83 samples/sec Loss 4.3435 LearningRate 0.0324 Epoch: 14 Global Step: 154290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:30:47,689-Speed 5964.00 samples/sec Loss 4.3121 LearningRate 0.0324 Epoch: 14 Global Step: 154300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:30:54,536-Speed 5982.70 samples/sec Loss 4.3165 LearningRate 0.0323 Epoch: 14 Global Step: 154310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:31:01,391-Speed 5978.77 samples/sec Loss 4.3448 LearningRate 0.0323 Epoch: 14 Global Step: 154320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:31:08,236-Speed 5984.79 samples/sec Loss 4.3447 LearningRate 0.0323 Epoch: 14 Global Step: 154330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:31:15,090-Speed 5976.64 samples/sec Loss 4.3036 LearningRate 0.0323 Epoch: 14 Global Step: 154340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:31:22,044-Speed 5892.14 samples/sec Loss 4.3177 LearningRate 0.0323 Epoch: 14 Global Step: 154350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:31:28,999-Speed 5890.30 samples/sec Loss 4.2825 LearningRate 0.0323 Epoch: 14 Global Step: 154360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:31:35,892-Speed 5942.89 samples/sec Loss 4.3775 LearningRate 0.0323 Epoch: 14 Global Step: 154370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:31:42,742-Speed 5980.71 samples/sec Loss 4.3843 LearningRate 0.0323 Epoch: 14 Global Step: 154380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:31:49,587-Speed 5984.72 samples/sec Loss 4.2710 LearningRate 0.0322 Epoch: 14 Global Step: 154390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:31:56,558-Speed 5877.26 samples/sec Loss 4.3160 LearningRate 0.0322 Epoch: 14 Global Step: 154400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:03,552-Speed 5859.71 
samples/sec Loss 4.3482 LearningRate 0.0322 Epoch: 14 Global Step: 154410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:10,508-Speed 5889.22 samples/sec Loss 4.3482 LearningRate 0.0322 Epoch: 14 Global Step: 154420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:17,400-Speed 5944.09 samples/sec Loss 4.3594 LearningRate 0.0322 Epoch: 14 Global Step: 154430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:24,267-Speed 5966.04 samples/sec Loss 4.2837 LearningRate 0.0322 Epoch: 14 Global Step: 154440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:31,141-Speed 5962.00 samples/sec Loss 4.3107 LearningRate 0.0322 Epoch: 14 Global Step: 154450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:38,062-Speed 5919.87 samples/sec Loss 4.3282 LearningRate 0.0322 Epoch: 14 Global Step: 154460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:44,938-Speed 5957.70 samples/sec Loss 4.3186 LearningRate 0.0321 Epoch: 14 Global Step: 154470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:51,793-Speed 5976.68 samples/sec Loss 4.2945 LearningRate 0.0321 Epoch: 14 Global Step: 154480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:32:58,644-Speed 5978.90 samples/sec Loss 4.3459 LearningRate 0.0321 Epoch: 14 Global Step: 154490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:33:05,497-Speed 5978.14 samples/sec Loss 4.3422 LearningRate 0.0321 Epoch: 14 Global Step: 154500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:33:12,382-Speed 5951.06 samples/sec Loss 4.3283 LearningRate 0.0321 Epoch: 14 Global Step: 154510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:33:19,236-Speed 5976.18 samples/sec Loss 4.3736 LearningRate 0.0321 Epoch: 14 Global Step: 154520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:33:26,122-Speed 5950.22 samples/sec Loss 4.2433 
LearningRate 0.0321 Epoch: 14 Global Step: 154530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:33:32,975-Speed 5977.82 samples/sec Loss 4.3333 LearningRate 0.0321 Epoch: 14 Global Step: 154540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:33:39,833-Speed 5973.59 samples/sec Loss 4.3002 LearningRate 0.0320 Epoch: 14 Global Step: 154550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:33:46,684-Speed 5979.61 samples/sec Loss 4.3006 LearningRate 0.0320 Epoch: 14 Global Step: 154560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:33:53,542-Speed 5974.19 samples/sec Loss 4.3257 LearningRate 0.0320 Epoch: 14 Global Step: 154570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:34:00,407-Speed 5967.62 samples/sec Loss 4.3193 LearningRate 0.0320 Epoch: 14 Global Step: 154580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:34:07,257-Speed 5981.69 samples/sec Loss 4.3271 LearningRate 0.0320 Epoch: 14 Global Step: 154590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:34:14,107-Speed 5980.76 samples/sec Loss 4.2992 LearningRate 0.0320 Epoch: 14 Global Step: 154600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:34:20,988-Speed 5953.13 samples/sec Loss 4.3044 LearningRate 0.0320 Epoch: 14 Global Step: 154610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:34:27,859-Speed 5963.46 samples/sec Loss 4.3517 LearningRate 0.0320 Epoch: 14 Global Step: 154620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:34:34,702-Speed 5987.02 samples/sec Loss 4.3066 LearningRate 0.0320 Epoch: 14 Global Step: 154630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:34:41,569-Speed 5965.58 samples/sec Loss 4.3068 LearningRate 0.0319 Epoch: 14 Global Step: 154640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:34:48,419-Speed 5980.84 samples/sec Loss 4.3052 LearningRate 0.0319 Epoch: 
14 Global Step: 154650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:34:55,403-Speed 5865.41 samples/sec Loss 4.3227 LearningRate 0.0319 Epoch: 14 Global Step: 154660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:35:02,306-Speed 5935.07 samples/sec Loss 4.3084 LearningRate 0.0319 Epoch: 14 Global Step: 154670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:35:09,160-Speed 5977.38 samples/sec Loss 4.2935 LearningRate 0.0319 Epoch: 14 Global Step: 154680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:35:16,016-Speed 5975.56 samples/sec Loss 4.2995 LearningRate 0.0319 Epoch: 14 Global Step: 154690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:35:22,869-Speed 5978.58 samples/sec Loss 4.3284 LearningRate 0.0319 Epoch: 14 Global Step: 154700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:35:29,718-Speed 5981.44 samples/sec Loss 4.3083 LearningRate 0.0319 Epoch: 14 Global Step: 154710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:35:36,603-Speed 5950.45 samples/sec Loss 4.3050 LearningRate 0.0318 Epoch: 14 Global Step: 154720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:35:43,483-Speed 5955.33 samples/sec Loss 4.3229 LearningRate 0.0318 Epoch: 14 Global Step: 154730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:35:50,325-Speed 5987.81 samples/sec Loss 4.3528 LearningRate 0.0318 Epoch: 14 Global Step: 154740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:35:57,195-Speed 5963.52 samples/sec Loss 4.3113 LearningRate 0.0318 Epoch: 14 Global Step: 154750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:04,071-Speed 5957.64 samples/sec Loss 4.3181 LearningRate 0.0318 Epoch: 14 Global Step: 154760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:10,941-Speed 5963.80 samples/sec Loss 4.2912 LearningRate 0.0318 Epoch: 14 Global Step: 154770 Fp16 
Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:17,834-Speed 5943.50 samples/sec Loss 4.3067 LearningRate 0.0318 Epoch: 14 Global Step: 154780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:24,702-Speed 5964.98 samples/sec Loss 4.3033 LearningRate 0.0318 Epoch: 14 Global Step: 154790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:31,557-Speed 5976.09 samples/sec Loss 4.2853 LearningRate 0.0317 Epoch: 14 Global Step: 154800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:38,406-Speed 5981.29 samples/sec Loss 4.3034 LearningRate 0.0317 Epoch: 14 Global Step: 154810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:45,255-Speed 5980.82 samples/sec Loss 4.3056 LearningRate 0.0317 Epoch: 14 Global Step: 154820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:52,130-Speed 5959.49 samples/sec Loss 4.3474 LearningRate 0.0317 Epoch: 14 Global Step: 154830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:36:58,979-Speed 5981.61 samples/sec Loss 4.3084 LearningRate 0.0317 Epoch: 14 Global Step: 154840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:37:05,830-Speed 5979.74 samples/sec Loss 4.2795 LearningRate 0.0317 Epoch: 14 Global Step: 154850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:37:12,704-Speed 5959.81 samples/sec Loss 4.3561 LearningRate 0.0317 Epoch: 14 Global Step: 154860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:37:19,559-Speed 5976.71 samples/sec Loss 4.2758 LearningRate 0.0317 Epoch: 14 Global Step: 154870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:37:26,427-Speed 5964.90 samples/sec Loss 4.2877 LearningRate 0.0316 Epoch: 14 Global Step: 154880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:37:33,305-Speed 5955.88 samples/sec Loss 4.2773 LearningRate 0.0316 Epoch: 14 Global Step: 154890 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-01-09 02:37:40,182-Speed 5958.45 samples/sec Loss 4.2535 LearningRate 0.0316 Epoch: 14 Global Step: 154900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:37:47,044-Speed 5969.87 samples/sec Loss 4.2880 LearningRate 0.0316 Epoch: 14 Global Step: 154910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:37:53,946-Speed 5935.22 samples/sec Loss 4.2694 LearningRate 0.0316 Epoch: 14 Global Step: 154920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:38:00,806-Speed 5971.77 samples/sec Loss 4.2957 LearningRate 0.0316 Epoch: 14 Global Step: 154930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:38:07,647-Speed 5988.37 samples/sec Loss 4.2588 LearningRate 0.0316 Epoch: 14 Global Step: 154940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:38:14,507-Speed 5972.93 samples/sec Loss 4.2510 LearningRate 0.0316 Epoch: 14 Global Step: 154950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:38:21,366-Speed 5972.75 samples/sec Loss 4.2805 LearningRate 0.0316 Epoch: 14 Global Step: 154960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:38:28,228-Speed 5970.18 samples/sec Loss 4.2933 LearningRate 0.0315 Epoch: 14 Global Step: 154970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:38:35,082-Speed 5977.50 samples/sec Loss 4.3160 LearningRate 0.0315 Epoch: 14 Global Step: 154980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:38:41,934-Speed 5979.58 samples/sec Loss 4.2585 LearningRate 0.0315 Epoch: 14 Global Step: 154990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:38:48,781-Speed 5983.34 samples/sec Loss 4.2504 LearningRate 0.0315 Epoch: 14 Global Step: 155000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:39:15,476-[lfw][155000]XNorm: 23.453467 Training: 2022-01-09 02:39:15,477-[lfw][155000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-01-09 
02:39:15,477-[lfw][155000]Accuracy-Highest: 0.99800 Training: 2022-01-09 02:39:46,296-[cfp_fp][155000]XNorm: 20.854993 Training: 2022-01-09 02:39:46,297-[cfp_fp][155000]Accuracy-Flip: 0.98786+-0.00539 Training: 2022-01-09 02:39:46,298-[cfp_fp][155000]Accuracy-Highest: 0.98786 Training: 2022-01-09 02:40:13,112-[agedb_30][155000]XNorm: 22.783735 Training: 2022-01-09 02:40:13,113-[agedb_30][155000]Accuracy-Flip: 0.97783+-0.00687 Training: 2022-01-09 02:40:13,113-[agedb_30][155000]Accuracy-Highest: 0.97833 Training: 2022-01-09 02:40:19,956-Speed 449.25 samples/sec Loss 4.3124 LearningRate 0.0315 Epoch: 14 Global Step: 155010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:40:26,788-Speed 5997.44 samples/sec Loss 4.2883 LearningRate 0.0315 Epoch: 14 Global Step: 155020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:40:33,629-Speed 5988.02 samples/sec Loss 4.2443 LearningRate 0.0315 Epoch: 14 Global Step: 155030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:40:40,476-Speed 5983.60 samples/sec Loss 4.2971 LearningRate 0.0315 Epoch: 14 Global Step: 155040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:40:47,347-Speed 5962.33 samples/sec Loss 4.2781 LearningRate 0.0314 Epoch: 14 Global Step: 155050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:40:54,251-Speed 5935.17 samples/sec Loss 4.2747 LearningRate 0.0314 Epoch: 14 Global Step: 155060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:41:01,127-Speed 5958.40 samples/sec Loss 4.3241 LearningRate 0.0314 Epoch: 14 Global Step: 155070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:41:08,018-Speed 5944.57 samples/sec Loss 4.2736 LearningRate 0.0314 Epoch: 14 Global Step: 155080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:41:14,906-Speed 5948.02 samples/sec Loss 4.2889 LearningRate 0.0314 Epoch: 14 Global Step: 155090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-09 02:41:21,780-Speed 5960.59 samples/sec Loss 4.2550 LearningRate 0.0314 Epoch: 14 Global Step: 155100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:41:28,646-Speed 5966.30 samples/sec Loss 4.3161 LearningRate 0.0314 Epoch: 14 Global Step: 155110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:41:35,518-Speed 5963.78 samples/sec Loss 4.2555 LearningRate 0.0314 Epoch: 14 Global Step: 155120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:41:42,391-Speed 5961.08 samples/sec Loss 4.2909 LearningRate 0.0313 Epoch: 14 Global Step: 155130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:41:49,252-Speed 5970.84 samples/sec Loss 4.2530 LearningRate 0.0313 Epoch: 14 Global Step: 155140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:41:56,115-Speed 5969.87 samples/sec Loss 4.2766 LearningRate 0.0313 Epoch: 14 Global Step: 155150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:02,985-Speed 5963.39 samples/sec Loss 4.2944 LearningRate 0.0313 Epoch: 14 Global Step: 155160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:09,854-Speed 5964.50 samples/sec Loss 4.2960 LearningRate 0.0313 Epoch: 14 Global Step: 155170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:16,712-Speed 5973.70 samples/sec Loss 4.2706 LearningRate 0.0313 Epoch: 14 Global Step: 155180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:23,566-Speed 5977.70 samples/sec Loss 4.2576 LearningRate 0.0313 Epoch: 14 Global Step: 155190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:30,433-Speed 5965.39 samples/sec Loss 4.2696 LearningRate 0.0313 Epoch: 14 Global Step: 155200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:37,319-Speed 5949.65 samples/sec Loss 4.2506 LearningRate 0.0313 Epoch: 14 Global Step: 155210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:44,204-Speed 
5951.21 samples/sec Loss 4.2887 LearningRate 0.0312 Epoch: 14 Global Step: 155220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:51,090-Speed 5948.82 samples/sec Loss 4.1982 LearningRate 0.0312 Epoch: 14 Global Step: 155230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:42:57,957-Speed 5966.34 samples/sec Loss 4.2777 LearningRate 0.0312 Epoch: 14 Global Step: 155240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:43:04,815-Speed 5974.00 samples/sec Loss 4.2970 LearningRate 0.0312 Epoch: 14 Global Step: 155250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:43:11,674-Speed 5972.58 samples/sec Loss 4.2564 LearningRate 0.0312 Epoch: 14 Global Step: 155260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:43:18,535-Speed 5970.76 samples/sec Loss 4.3019 LearningRate 0.0312 Epoch: 14 Global Step: 155270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:43:25,401-Speed 5967.67 samples/sec Loss 4.3289 LearningRate 0.0312 Epoch: 14 Global Step: 155280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:43:32,263-Speed 5970.11 samples/sec Loss 4.2115 LearningRate 0.0312 Epoch: 14 Global Step: 155290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:43:39,129-Speed 5966.40 samples/sec Loss 4.2839 LearningRate 0.0311 Epoch: 14 Global Step: 155300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:43:45,998-Speed 5965.04 samples/sec Loss 4.2396 LearningRate 0.0311 Epoch: 14 Global Step: 155310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:43:52,856-Speed 5973.57 samples/sec Loss 4.2511 LearningRate 0.0311 Epoch: 14 Global Step: 155320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:43:59,715-Speed 5973.97 samples/sec Loss 4.3098 LearningRate 0.0311 Epoch: 14 Global Step: 155330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:44:06,568-Speed 5978.14 samples/sec Loss 
4.2232 LearningRate 0.0311 Epoch: 14 Global Step: 155340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:44:13,428-Speed 5972.41 samples/sec Loss 4.2717 LearningRate 0.0311 Epoch: 14 Global Step: 155350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:44:20,307-Speed 5955.94 samples/sec Loss 4.2730 LearningRate 0.0311 Epoch: 14 Global Step: 155360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:44:27,165-Speed 5976.65 samples/sec Loss 4.2622 LearningRate 0.0311 Epoch: 14 Global Step: 155370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:44:34,022-Speed 5974.04 samples/sec Loss 4.2605 LearningRate 0.0310 Epoch: 14 Global Step: 155380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:44:40,888-Speed 5969.84 samples/sec Loss 4.2259 LearningRate 0.0310 Epoch: 14 Global Step: 155390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:44:47,735-Speed 5983.90 samples/sec Loss 4.3006 LearningRate 0.0310 Epoch: 14 Global Step: 155400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:44:54,596-Speed 5970.24 samples/sec Loss 4.2343 LearningRate 0.0310 Epoch: 14 Global Step: 155410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:01,461-Speed 5968.81 samples/sec Loss 4.3273 LearningRate 0.0310 Epoch: 14 Global Step: 155420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:08,338-Speed 5957.54 samples/sec Loss 4.2436 LearningRate 0.0310 Epoch: 14 Global Step: 155430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:15,187-Speed 5981.37 samples/sec Loss 4.2263 LearningRate 0.0310 Epoch: 14 Global Step: 155440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:22,056-Speed 5964.15 samples/sec Loss 4.2613 LearningRate 0.0310 Epoch: 14 Global Step: 155450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:28,949-Speed 5943.63 samples/sec Loss 4.3042 LearningRate 0.0310 
Epoch: 14 Global Step: 155460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:35,813-Speed 5968.59 samples/sec Loss 4.2813 LearningRate 0.0309 Epoch: 14 Global Step: 155470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:42,692-Speed 5955.30 samples/sec Loss 4.2546 LearningRate 0.0309 Epoch: 14 Global Step: 155480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:49,563-Speed 5962.00 samples/sec Loss 4.2613 LearningRate 0.0309 Epoch: 14 Global Step: 155490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:45:56,430-Speed 5966.26 samples/sec Loss 4.2554 LearningRate 0.0309 Epoch: 14 Global Step: 155500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:46:03,286-Speed 5977.93 samples/sec Loss 4.2785 LearningRate 0.0309 Epoch: 14 Global Step: 155510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:46:10,138-Speed 5978.69 samples/sec Loss 4.2971 LearningRate 0.0309 Epoch: 14 Global Step: 155520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:46:17,008-Speed 5962.91 samples/sec Loss 4.2538 LearningRate 0.0309 Epoch: 14 Global Step: 155530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:46:23,867-Speed 5973.54 samples/sec Loss 4.2244 LearningRate 0.0309 Epoch: 14 Global Step: 155540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:46:48,072-Speed 1692.33 samples/sec Loss 4.2801 LearningRate 0.0308 Epoch: 15 Global Step: 155550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:46:54,898-Speed 6001.69 samples/sec Loss 4.1888 LearningRate 0.0308 Epoch: 15 Global Step: 155560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:01,742-Speed 5985.96 samples/sec Loss 4.2823 LearningRate 0.0308 Epoch: 15 Global Step: 155570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:08,569-Speed 6000.65 samples/sec Loss 4.2387 LearningRate 0.0308 Epoch: 15 Global Step: 155580 
Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:15,431-Speed 5970.06 samples/sec Loss 4.2739 LearningRate 0.0308 Epoch: 15 Global Step: 155590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:22,291-Speed 5971.85 samples/sec Loss 4.2980 LearningRate 0.0308 Epoch: 15 Global Step: 155600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:29,165-Speed 5960.08 samples/sec Loss 4.2299 LearningRate 0.0308 Epoch: 15 Global Step: 155610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:36,007-Speed 5987.91 samples/sec Loss 4.2514 LearningRate 0.0308 Epoch: 15 Global Step: 155620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:42,851-Speed 5985.94 samples/sec Loss 4.1900 LearningRate 0.0308 Epoch: 15 Global Step: 155630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:49,703-Speed 5979.08 samples/sec Loss 4.1991 LearningRate 0.0307 Epoch: 15 Global Step: 155640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:47:56,573-Speed 5963.82 samples/sec Loss 4.2476 LearningRate 0.0307 Epoch: 15 Global Step: 155650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:48:03,438-Speed 5967.40 samples/sec Loss 4.2030 LearningRate 0.0307 Epoch: 15 Global Step: 155660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:48:10,313-Speed 5959.49 samples/sec Loss 4.2293 LearningRate 0.0307 Epoch: 15 Global Step: 155670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:48:17,179-Speed 5967.33 samples/sec Loss 4.2290 LearningRate 0.0307 Epoch: 15 Global Step: 155680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:48:24,037-Speed 5972.96 samples/sec Loss 4.1781 LearningRate 0.0307 Epoch: 15 Global Step: 155690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:48:30,911-Speed 5960.10 samples/sec Loss 4.2442 LearningRate 0.0307 Epoch: 15 Global Step: 155700 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-01-09 02:48:37,772-Speed 5971.54 samples/sec Loss 4.2495 LearningRate 0.0307 Epoch: 15 Global Step: 155710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:48:44,660-Speed 5947.51 samples/sec Loss 4.2086 LearningRate 0.0306 Epoch: 15 Global Step: 155720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:48:51,541-Speed 5956.55 samples/sec Loss 4.2615 LearningRate 0.0306 Epoch: 15 Global Step: 155730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:48:58,400-Speed 5973.34 samples/sec Loss 4.2139 LearningRate 0.0306 Epoch: 15 Global Step: 155740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:49:05,250-Speed 5980.26 samples/sec Loss 4.2612 LearningRate 0.0306 Epoch: 15 Global Step: 155750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:49:12,115-Speed 5967.70 samples/sec Loss 4.2376 LearningRate 0.0306 Epoch: 15 Global Step: 155760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:49:18,968-Speed 5977.47 samples/sec Loss 4.2118 LearningRate 0.0306 Epoch: 15 Global Step: 155770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:49:25,828-Speed 5972.19 samples/sec Loss 4.1898 LearningRate 0.0306 Epoch: 15 Global Step: 155780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:49:32,717-Speed 5947.17 samples/sec Loss 4.2490 LearningRate 0.0306 Epoch: 15 Global Step: 155790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:49:39,582-Speed 5968.04 samples/sec Loss 4.1872 LearningRate 0.0305 Epoch: 15 Global Step: 155800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:49:46,467-Speed 5950.09 samples/sec Loss 4.1895 LearningRate 0.0305 Epoch: 15 Global Step: 155810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:49:53,350-Speed 5952.94 samples/sec Loss 4.2206 LearningRate 0.0305 Epoch: 15 Global Step: 155820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-09 02:50:00,219-Speed 5964.72 samples/sec Loss 4.1772 LearningRate 0.0305 Epoch: 15 Global Step: 155830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:50:07,111-Speed 5944.85 samples/sec Loss 4.2335 LearningRate 0.0305 Epoch: 15 Global Step: 155840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:50:13,989-Speed 5957.64 samples/sec Loss 4.2865 LearningRate 0.0305 Epoch: 15 Global Step: 155850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:50:20,862-Speed 5961.50 samples/sec Loss 4.1800 LearningRate 0.0305 Epoch: 15 Global Step: 155860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:50:27,732-Speed 5963.12 samples/sec Loss 4.2565 LearningRate 0.0305 Epoch: 15 Global Step: 155870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:50:34,589-Speed 5974.67 samples/sec Loss 4.2556 LearningRate 0.0305 Epoch: 15 Global Step: 155880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:50:41,467-Speed 5956.22 samples/sec Loss 4.2267 LearningRate 0.0304 Epoch: 15 Global Step: 155890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:50:48,317-Speed 5980.82 samples/sec Loss 4.2292 LearningRate 0.0304 Epoch: 15 Global Step: 155900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:50:55,274-Speed 5889.16 samples/sec Loss 4.2030 LearningRate 0.0304 Epoch: 15 Global Step: 155910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:51:02,136-Speed 5970.13 samples/sec Loss 4.2335 LearningRate 0.0304 Epoch: 15 Global Step: 155920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:51:09,036-Speed 5937.71 samples/sec Loss 4.2125 LearningRate 0.0304 Epoch: 15 Global Step: 155930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:51:15,894-Speed 5974.24 samples/sec Loss 4.2619 LearningRate 0.0304 Epoch: 15 Global Step: 155940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:51:22,769-Speed 
5959.76 samples/sec Loss 4.2235 LearningRate 0.0304 Epoch: 15 Global Step: 155950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:51:29,628-Speed 5972.49 samples/sec Loss 4.2252 LearningRate 0.0304 Epoch: 15 Global Step: 155960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:51:36,493-Speed 5967.72 samples/sec Loss 4.2104 LearningRate 0.0303 Epoch: 15 Global Step: 155970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:51:43,369-Speed 5958.66 samples/sec Loss 4.1925 LearningRate 0.0303 Epoch: 15 Global Step: 155980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:51:50,248-Speed 5954.91 samples/sec Loss 4.2011 LearningRate 0.0303 Epoch: 15 Global Step: 155990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:51:57,108-Speed 5972.75 samples/sec Loss 4.1925 LearningRate 0.0303 Epoch: 15 Global Step: 156000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:52:03,948-Speed 5989.44 samples/sec Loss 4.2293 LearningRate 0.0303 Epoch: 15 Global Step: 156010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:52:10,820-Speed 5961.66 samples/sec Loss 4.2168 LearningRate 0.0303 Epoch: 15 Global Step: 156020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:52:17,702-Speed 5954.27 samples/sec Loss 4.2856 LearningRate 0.0303 Epoch: 15 Global Step: 156030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:52:24,576-Speed 5960.56 samples/sec Loss 4.2277 LearningRate 0.0303 Epoch: 15 Global Step: 156040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:52:31,436-Speed 5972.06 samples/sec Loss 4.2201 LearningRate 0.0303 Epoch: 15 Global Step: 156050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:52:38,401-Speed 5881.91 samples/sec Loss 4.2405 LearningRate 0.0302 Epoch: 15 Global Step: 156060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:52:45,266-Speed 5969.37 samples/sec Loss 
4.2271 LearningRate 0.0302 Epoch: 15 Global Step: 156070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:52:52,168-Speed 5935.79 samples/sec Loss 4.1581 LearningRate 0.0302 Epoch: 15 Global Step: 156080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:52:59,052-Speed 5951.08 samples/sec Loss 4.1912 LearningRate 0.0302 Epoch: 15 Global Step: 156090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:53:05,908-Speed 5975.80 samples/sec Loss 4.1686 LearningRate 0.0302 Epoch: 15 Global Step: 156100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:53:12,787-Speed 5955.36 samples/sec Loss 4.1739 LearningRate 0.0302 Epoch: 15 Global Step: 156110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:53:19,679-Speed 5944.87 samples/sec Loss 4.2002 LearningRate 0.0302 Epoch: 15 Global Step: 156120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:53:26,558-Speed 5955.63 samples/sec Loss 4.1871 LearningRate 0.0302 Epoch: 15 Global Step: 156130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:53:33,408-Speed 5981.10 samples/sec Loss 4.2306 LearningRate 0.0301 Epoch: 15 Global Step: 156140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:53:40,277-Speed 5964.02 samples/sec Loss 4.2017 LearningRate 0.0301 Epoch: 15 Global Step: 156150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:53:47,156-Speed 5955.59 samples/sec Loss 4.2193 LearningRate 0.0301 Epoch: 15 Global Step: 156160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:53:54,012-Speed 5975.33 samples/sec Loss 4.2393 LearningRate 0.0301 Epoch: 15 Global Step: 156170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:00,898-Speed 5949.10 samples/sec Loss 4.2036 LearningRate 0.0301 Epoch: 15 Global Step: 156180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:07,753-Speed 5976.54 samples/sec Loss 4.2491 LearningRate 0.0301 
Epoch: 15 Global Step: 156190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:14,625-Speed 5961.55 samples/sec Loss 4.2117 LearningRate 0.0301 Epoch: 15 Global Step: 156200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:21,486-Speed 5970.88 samples/sec Loss 4.2042 LearningRate 0.0301 Epoch: 15 Global Step: 156210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:28,343-Speed 5975.19 samples/sec Loss 4.2044 LearningRate 0.0301 Epoch: 15 Global Step: 156220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:35,203-Speed 5971.55 samples/sec Loss 4.1924 LearningRate 0.0300 Epoch: 15 Global Step: 156230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:42,062-Speed 5973.37 samples/sec Loss 4.1767 LearningRate 0.0300 Epoch: 15 Global Step: 156240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:48,943-Speed 5953.85 samples/sec Loss 4.2053 LearningRate 0.0300 Epoch: 15 Global Step: 156250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 02:54:55,814-Speed 5962.89 samples/sec Loss 4.1770 LearningRate 0.0300 Epoch: 15 Global Step: 156260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:02,700-Speed 5949.36 samples/sec Loss 4.1805 LearningRate 0.0300 Epoch: 15 Global Step: 156270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:09,588-Speed 5947.89 samples/sec Loss 4.2158 LearningRate 0.0300 Epoch: 15 Global Step: 156280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:16,450-Speed 5969.95 samples/sec Loss 4.1778 LearningRate 0.0300 Epoch: 15 Global Step: 156290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:23,402-Speed 5893.41 samples/sec Loss 4.2068 LearningRate 0.0300 Epoch: 15 Global Step: 156300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:30,276-Speed 5962.43 samples/sec Loss 4.1792 LearningRate 0.0299 Epoch: 15 Global Step: 156310 
Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:37,127-Speed 5979.81 samples/sec Loss 4.1443 LearningRate 0.0299 Epoch: 15 Global Step: 156320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:43,976-Speed 5981.55 samples/sec Loss 4.1932 LearningRate 0.0299 Epoch: 15 Global Step: 156330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:50,826-Speed 5980.25 samples/sec Loss 4.2016 LearningRate 0.0299 Epoch: 15 Global Step: 156340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:55:57,681-Speed 5976.27 samples/sec Loss 4.1744 LearningRate 0.0299 Epoch: 15 Global Step: 156350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:56:04,537-Speed 5975.42 samples/sec Loss 4.2295 LearningRate 0.0299 Epoch: 15 Global Step: 156360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:56:11,404-Speed 5966.61 samples/sec Loss 4.1946 LearningRate 0.0299 Epoch: 15 Global Step: 156370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:56:18,278-Speed 5959.16 samples/sec Loss 4.2344 LearningRate 0.0299 Epoch: 15 Global Step: 156380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:56:25,149-Speed 5963.58 samples/sec Loss 4.2169 LearningRate 0.0299 Epoch: 15 Global Step: 156390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:56:32,019-Speed 5963.92 samples/sec Loss 4.1918 LearningRate 0.0298 Epoch: 15 Global Step: 156400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:56:38,868-Speed 5981.68 samples/sec Loss 4.2108 LearningRate 0.0298 Epoch: 15 Global Step: 156410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:56:45,749-Speed 5953.57 samples/sec Loss 4.1969 LearningRate 0.0298 Epoch: 15 Global Step: 156420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:56:52,598-Speed 5981.86 samples/sec Loss 4.1638 LearningRate 0.0298 Epoch: 15 Global Step: 156430 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-01-09 02:56:59,469-Speed 5962.60 samples/sec Loss 4.1562 LearningRate 0.0298 Epoch: 15 Global Step: 156440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:57:06,319-Speed 5981.13 samples/sec Loss 4.1777 LearningRate 0.0298 Epoch: 15 Global Step: 156450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:57:13,197-Speed 5956.24 samples/sec Loss 4.1241 LearningRate 0.0298 Epoch: 15 Global Step: 156460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:57:20,076-Speed 5955.80 samples/sec Loss 4.1633 LearningRate 0.0298 Epoch: 15 Global Step: 156470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:57:26,940-Speed 5968.91 samples/sec Loss 4.1631 LearningRate 0.0297 Epoch: 15 Global Step: 156480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:57:33,797-Speed 5974.80 samples/sec Loss 4.1743 LearningRate 0.0297 Epoch: 15 Global Step: 156490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:57:40,672-Speed 5958.23 samples/sec Loss 4.1752 LearningRate 0.0297 Epoch: 15 Global Step: 156500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:57:47,535-Speed 5969.79 samples/sec Loss 4.1755 LearningRate 0.0297 Epoch: 15 Global Step: 156510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:57:54,399-Speed 5970.11 samples/sec Loss 4.1447 LearningRate 0.0297 Epoch: 15 Global Step: 156520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:58:01,273-Speed 5959.97 samples/sec Loss 4.1756 LearningRate 0.0297 Epoch: 15 Global Step: 156530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:58:08,132-Speed 5974.19 samples/sec Loss 4.2218 LearningRate 0.0297 Epoch: 15 Global Step: 156540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:58:14,973-Speed 5988.83 samples/sec Loss 4.1610 LearningRate 0.0297 Epoch: 15 Global Step: 156550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-09 02:58:21,844-Speed 5961.62 samples/sec Loss 4.1687 LearningRate 0.0297 Epoch: 15 Global Step: 156560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 02:58:28,701-Speed 5974.56 samples/sec Loss 4.2183 LearningRate 0.0296 Epoch: 15 Global Step: 156570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:58:35,550-Speed 5982.06 samples/sec Loss 4.1760 LearningRate 0.0296 Epoch: 15 Global Step: 156580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:58:42,399-Speed 5981.00 samples/sec Loss 4.1938 LearningRate 0.0296 Epoch: 15 Global Step: 156590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:58:49,253-Speed 5977.52 samples/sec Loss 4.1692 LearningRate 0.0296 Epoch: 15 Global Step: 156600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:58:56,102-Speed 5982.39 samples/sec Loss 4.1513 LearningRate 0.0296 Epoch: 15 Global Step: 156610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:59:02,952-Speed 5980.29 samples/sec Loss 4.1658 LearningRate 0.0296 Epoch: 15 Global Step: 156620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:59:09,842-Speed 5950.50 samples/sec Loss 4.1464 LearningRate 0.0296 Epoch: 15 Global Step: 156630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:59:16,693-Speed 5980.48 samples/sec Loss 4.1519 LearningRate 0.0296 Epoch: 15 Global Step: 156640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:59:23,541-Speed 5981.87 samples/sec Loss 4.1231 LearningRate 0.0296 Epoch: 15 Global Step: 156650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:59:30,392-Speed 5980.33 samples/sec Loss 4.1937 LearningRate 0.0295 Epoch: 15 Global Step: 156660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:59:37,245-Speed 5977.39 samples/sec Loss 4.1689 LearningRate 0.0295 Epoch: 15 Global Step: 156670 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-09 
02:59:44,101-Speed 5975.54 samples/sec Loss 4.1749 LearningRate 0.0295 Epoch: 15 Global Step: 156680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:59:50,964-Speed 5972.65 samples/sec Loss 4.1754 LearningRate 0.0295 Epoch: 15 Global Step: 156690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 02:59:57,830-Speed 5966.97 samples/sec Loss 4.1568 LearningRate 0.0295 Epoch: 15 Global Step: 156700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:00:04,694-Speed 5968.47 samples/sec Loss 4.1461 LearningRate 0.0295 Epoch: 15 Global Step: 156710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:00:11,546-Speed 5980.19 samples/sec Loss 4.1746 LearningRate 0.0295 Epoch: 15 Global Step: 156720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:00:18,435-Speed 5947.38 samples/sec Loss 4.1771 LearningRate 0.0295 Epoch: 15 Global Step: 156730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:00:25,308-Speed 5960.94 samples/sec Loss 4.1022 LearningRate 0.0294 Epoch: 15 Global Step: 156740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:00:32,164-Speed 5975.32 samples/sec Loss 4.1516 LearningRate 0.0294 Epoch: 15 Global Step: 156750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:00:39,025-Speed 5972.62 samples/sec Loss 4.1577 LearningRate 0.0294 Epoch: 15 Global Step: 156760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:00:45,899-Speed 5959.59 samples/sec Loss 4.2068 LearningRate 0.0294 Epoch: 15 Global Step: 156770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:00:52,758-Speed 5973.40 samples/sec Loss 4.1418 LearningRate 0.0294 Epoch: 15 Global Step: 156780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:00:59,614-Speed 5976.06 samples/sec Loss 4.2003 LearningRate 0.0294 Epoch: 15 Global Step: 156790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:01:06,511-Speed 
5940.19 samples/sec Loss 4.1481 LearningRate 0.0294 Epoch: 15 Global Step: 156800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:01:13,375-Speed 5969.59 samples/sec Loss 4.1521 LearningRate 0.0294 Epoch: 15 Global Step: 156810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:01:20,229-Speed 5977.88 samples/sec Loss 4.1691 LearningRate 0.0294 Epoch: 15 Global Step: 156820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:01:27,090-Speed 5969.73 samples/sec Loss 4.1878 LearningRate 0.0293 Epoch: 15 Global Step: 156830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:01:33,955-Speed 5968.03 samples/sec Loss 4.2120 LearningRate 0.0293 Epoch: 15 Global Step: 156840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:01:40,838-Speed 5952.06 samples/sec Loss 4.1504 LearningRate 0.0293 Epoch: 15 Global Step: 156850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:01:47,704-Speed 5966.85 samples/sec Loss 4.0900 LearningRate 0.0293 Epoch: 15 Global Step: 156860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:01:54,612-Speed 5930.69 samples/sec Loss 4.1585 LearningRate 0.0293 Epoch: 15 Global Step: 156870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:02:01,493-Speed 5953.82 samples/sec Loss 4.1356 LearningRate 0.0293 Epoch: 15 Global Step: 156880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:02:08,347-Speed 5977.38 samples/sec Loss 4.1211 LearningRate 0.0293 Epoch: 15 Global Step: 156890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:02:15,198-Speed 5980.17 samples/sec Loss 4.1373 LearningRate 0.0293 Epoch: 15 Global Step: 156900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:02:22,047-Speed 5981.71 samples/sec Loss 4.1141 LearningRate 0.0292 Epoch: 15 Global Step: 156910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:02:28,908-Speed 5970.72 samples/sec Loss 4.1448 
LearningRate 0.0292 Epoch: 15 Global Step: 156920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:02:35,754-Speed 5984.85 samples/sec Loss 4.1626 LearningRate 0.0292 Epoch: 15 Global Step: 156930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:02:42,663-Speed 5929.31 samples/sec Loss 4.1596 LearningRate 0.0292 Epoch: 15 Global Step: 156940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:02:49,547-Speed 5951.43 samples/sec Loss 4.1415 LearningRate 0.0292 Epoch: 15 Global Step: 156950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:02:56,420-Speed 5961.37 samples/sec Loss 4.1384 LearningRate 0.0292 Epoch: 15 Global Step: 156960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:03:03,263-Speed 5986.32 samples/sec Loss 4.1235 LearningRate 0.0292 Epoch: 15 Global Step: 156970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:03:10,128-Speed 5967.96 samples/sec Loss 4.1331 LearningRate 0.0292 Epoch: 15 Global Step: 156980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:03:16,974-Speed 5984.06 samples/sec Loss 4.1235 LearningRate 0.0292 Epoch: 15 Global Step: 156990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:03:23,836-Speed 5970.80 samples/sec Loss 4.1226 LearningRate 0.0291 Epoch: 15 Global Step: 157000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:03:30,724-Speed 5947.90 samples/sec Loss 4.1457 LearningRate 0.0291 Epoch: 15 Global Step: 157010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:03:37,609-Speed 5951.78 samples/sec Loss 4.1498 LearningRate 0.0291 Epoch: 15 Global Step: 157020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:03:44,471-Speed 5970.45 samples/sec Loss 4.1506 LearningRate 0.0291 Epoch: 15 Global Step: 157030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:03:51,331-Speed 5971.62 samples/sec Loss 4.1670 LearningRate 0.0291 Epoch: 15 
Global Step: 157040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:03:58,198-Speed 5966.92 samples/sec Loss 4.1241 LearningRate 0.0291 Epoch: 15 Global Step: 157050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:05,057-Speed 5972.73 samples/sec Loss 4.1754 LearningRate 0.0291 Epoch: 15 Global Step: 157060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:11,909-Speed 5979.54 samples/sec Loss 4.1145 LearningRate 0.0291 Epoch: 15 Global Step: 157070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:18,782-Speed 5960.90 samples/sec Loss 4.1807 LearningRate 0.0291 Epoch: 15 Global Step: 157080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:25,645-Speed 5969.56 samples/sec Loss 4.1774 LearningRate 0.0290 Epoch: 15 Global Step: 157090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:32,497-Speed 5978.53 samples/sec Loss 4.1043 LearningRate 0.0290 Epoch: 15 Global Step: 157100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:39,358-Speed 5971.30 samples/sec Loss 4.1587 LearningRate 0.0290 Epoch: 15 Global Step: 157110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:46,224-Speed 5972.73 samples/sec Loss 4.1221 LearningRate 0.0290 Epoch: 15 Global Step: 157120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:53,073-Speed 5981.39 samples/sec Loss 4.1582 LearningRate 0.0290 Epoch: 15 Global Step: 157130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:04:59,948-Speed 5961.59 samples/sec Loss 4.1386 LearningRate 0.0290 Epoch: 15 Global Step: 157140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:05:06,799-Speed 5979.37 samples/sec Loss 4.1764 LearningRate 0.0290 Epoch: 15 Global Step: 157150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:05:13,656-Speed 5974.43 samples/sec Loss 4.1335 LearningRate 0.0290 Epoch: 15 Global Step: 157160 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-01-09 03:05:20,537-Speed 5954.22 samples/sec Loss 4.1002 LearningRate 0.0289 Epoch: 15 Global Step: 157170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:05:27,389-Speed 5979.52 samples/sec Loss 4.1463 LearningRate 0.0289 Epoch: 15 Global Step: 157180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:05:34,233-Speed 5985.21 samples/sec Loss 4.1443 LearningRate 0.0289 Epoch: 15 Global Step: 157190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:05:41,076-Speed 5987.52 samples/sec Loss 4.1549 LearningRate 0.0289 Epoch: 15 Global Step: 157200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:05:47,925-Speed 5981.97 samples/sec Loss 4.1182 LearningRate 0.0289 Epoch: 15 Global Step: 157210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:05:54,766-Speed 5987.88 samples/sec Loss 4.0860 LearningRate 0.0289 Epoch: 15 Global Step: 157220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:06:01,616-Speed 5980.39 samples/sec Loss 4.0975 LearningRate 0.0289 Epoch: 15 Global Step: 157230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:06:08,469-Speed 5980.46 samples/sec Loss 4.1216 LearningRate 0.0289 Epoch: 15 Global Step: 157240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:06:15,340-Speed 5962.05 samples/sec Loss 4.1086 LearningRate 0.0289 Epoch: 15 Global Step: 157250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:06:22,181-Speed 5988.95 samples/sec Loss 4.0972 LearningRate 0.0288 Epoch: 15 Global Step: 157260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:06:29,018-Speed 5992.01 samples/sec Loss 4.1149 LearningRate 0.0288 Epoch: 15 Global Step: 157270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:06:35,876-Speed 5973.42 samples/sec Loss 4.0988 LearningRate 0.0288 Epoch: 15 Global Step: 157280 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-01-09 03:06:42,716-Speed 5991.05 samples/sec Loss 4.1578 LearningRate 0.0288 Epoch: 15 Global Step: 157290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:06:49,563-Speed 5984.66 samples/sec Loss 4.1290 LearningRate 0.0288 Epoch: 15 Global Step: 157300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:06:56,415-Speed 5977.98 samples/sec Loss 4.0650 LearningRate 0.0288 Epoch: 15 Global Step: 157310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:07:03,277-Speed 5971.25 samples/sec Loss 4.1333 LearningRate 0.0288 Epoch: 15 Global Step: 157320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:07:10,128-Speed 5980.36 samples/sec Loss 4.0867 LearningRate 0.0288 Epoch: 15 Global Step: 157330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:07:16,977-Speed 5981.34 samples/sec Loss 4.1083 LearningRate 0.0288 Epoch: 15 Global Step: 157340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:07:23,832-Speed 5978.46 samples/sec Loss 4.1164 LearningRate 0.0287 Epoch: 15 Global Step: 157350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:07:30,678-Speed 5984.39 samples/sec Loss 4.1165 LearningRate 0.0287 Epoch: 15 Global Step: 157360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:07:37,534-Speed 5976.68 samples/sec Loss 4.1019 LearningRate 0.0287 Epoch: 15 Global Step: 157370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:07:44,414-Speed 5954.35 samples/sec Loss 4.1153 LearningRate 0.0287 Epoch: 15 Global Step: 157380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:07:51,268-Speed 5977.55 samples/sec Loss 4.1249 LearningRate 0.0287 Epoch: 15 Global Step: 157390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:07:58,113-Speed 5984.43 samples/sec Loss 4.0980 LearningRate 0.0287 Epoch: 15 Global Step: 157400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 
03:08:04,979-Speed 5967.26 samples/sec Loss 4.0585 LearningRate 0.0287 Epoch: 15 Global Step: 157410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:08:11,832-Speed 5978.44 samples/sec Loss 4.1441 LearningRate 0.0287 Epoch: 15 Global Step: 157420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:08:18,706-Speed 5961.58 samples/sec Loss 4.1191 LearningRate 0.0286 Epoch: 15 Global Step: 157430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:08:25,578-Speed 5961.95 samples/sec Loss 4.0886 LearningRate 0.0286 Epoch: 15 Global Step: 157440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:08:32,463-Speed 5950.62 samples/sec Loss 4.0765 LearningRate 0.0286 Epoch: 15 Global Step: 157450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:08:39,322-Speed 5972.80 samples/sec Loss 4.1047 LearningRate 0.0286 Epoch: 15 Global Step: 157460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:08:46,167-Speed 5985.07 samples/sec Loss 4.1302 LearningRate 0.0286 Epoch: 15 Global Step: 157470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:08:53,020-Speed 5980.64 samples/sec Loss 4.1277 LearningRate 0.0286 Epoch: 15 Global Step: 157480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:08:59,896-Speed 5957.96 samples/sec Loss 4.0997 LearningRate 0.0286 Epoch: 15 Global Step: 157490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:09:06,736-Speed 5989.21 samples/sec Loss 4.0890 LearningRate 0.0286 Epoch: 15 Global Step: 157500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:09:13,591-Speed 5977.05 samples/sec Loss 4.0679 LearningRate 0.0286 Epoch: 15 Global Step: 157510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:09:20,449-Speed 5973.49 samples/sec Loss 4.1639 LearningRate 0.0285 Epoch: 15 Global Step: 157520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:09:27,328-Speed 5956.09 
samples/sec Loss 4.1010 LearningRate 0.0285 Epoch: 15 Global Step: 157530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:09:34,197-Speed 5964.22 samples/sec Loss 4.1327 LearningRate 0.0285 Epoch: 15 Global Step: 157540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:09:41,052-Speed 5976.00 samples/sec Loss 4.0896 LearningRate 0.0285 Epoch: 15 Global Step: 157550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:09:47,909-Speed 5974.91 samples/sec Loss 4.1016 LearningRate 0.0285 Epoch: 15 Global Step: 157560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:09:54,766-Speed 5974.15 samples/sec Loss 4.1213 LearningRate 0.0285 Epoch: 15 Global Step: 157570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:10:01,641-Speed 5959.24 samples/sec Loss 4.0776 LearningRate 0.0285 Epoch: 15 Global Step: 157580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:10:08,531-Speed 5946.29 samples/sec Loss 4.0950 LearningRate 0.0285 Epoch: 15 Global Step: 157590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:10:15,381-Speed 5980.55 samples/sec Loss 4.1173 LearningRate 0.0285 Epoch: 15 Global Step: 157600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:10:22,247-Speed 5967.00 samples/sec Loss 4.1222 LearningRate 0.0284 Epoch: 15 Global Step: 157610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:10:29,126-Speed 5956.39 samples/sec Loss 4.0914 LearningRate 0.0284 Epoch: 15 Global Step: 157620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:10:35,989-Speed 5969.91 samples/sec Loss 4.0634 LearningRate 0.0284 Epoch: 15 Global Step: 157630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:10:42,848-Speed 5972.09 samples/sec Loss 4.0967 LearningRate 0.0284 Epoch: 15 Global Step: 157640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:10:49,757-Speed 5929.99 samples/sec Loss 4.0964 
LearningRate 0.0284 Epoch: 15 Global Step: 157650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:10:56,607-Speed 5980.57 samples/sec Loss 4.0664 LearningRate 0.0284 Epoch: 15 Global Step: 157660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:11:03,467-Speed 5972.29 samples/sec Loss 4.0863 LearningRate 0.0284 Epoch: 15 Global Step: 157670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:11:10,325-Speed 5973.71 samples/sec Loss 4.0356 LearningRate 0.0284 Epoch: 15 Global Step: 157680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:11:17,193-Speed 5965.35 samples/sec Loss 4.1007 LearningRate 0.0284 Epoch: 15 Global Step: 157690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:11:24,047-Speed 5976.65 samples/sec Loss 4.1201 LearningRate 0.0283 Epoch: 15 Global Step: 157700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:11:30,925-Speed 5956.39 samples/sec Loss 4.0702 LearningRate 0.0283 Epoch: 15 Global Step: 157710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:11:37,850-Speed 5916.88 samples/sec Loss 4.1187 LearningRate 0.0283 Epoch: 15 Global Step: 157720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:11:44,701-Speed 5979.15 samples/sec Loss 4.0497 LearningRate 0.0283 Epoch: 15 Global Step: 157730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:11:51,566-Speed 5968.26 samples/sec Loss 4.1071 LearningRate 0.0283 Epoch: 15 Global Step: 157740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:11:58,416-Speed 5981.43 samples/sec Loss 4.1067 LearningRate 0.0283 Epoch: 15 Global Step: 157750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:12:05,266-Speed 5979.78 samples/sec Loss 4.0950 LearningRate 0.0283 Epoch: 15 Global Step: 157760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:12:12,110-Speed 5986.47 samples/sec Loss 4.1156 LearningRate 0.0283 Epoch: 15 
Global Step: 157770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:12:18,967-Speed 5974.40 samples/sec Loss 4.0842 LearningRate 0.0282 Epoch: 15 Global Step: 157780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:12:25,848-Speed 5954.09 samples/sec Loss 4.1205 LearningRate 0.0282 Epoch: 15 Global Step: 157790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:12:32,717-Speed 5964.00 samples/sec Loss 4.1121 LearningRate 0.0282 Epoch: 15 Global Step: 157800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:12:39,594-Speed 5957.63 samples/sec Loss 4.0867 LearningRate 0.0282 Epoch: 15 Global Step: 157810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:12:46,466-Speed 5961.32 samples/sec Loss 4.0473 LearningRate 0.0282 Epoch: 15 Global Step: 157820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:12:53,359-Speed 5944.82 samples/sec Loss 4.1340 LearningRate 0.0282 Epoch: 15 Global Step: 157830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:00,234-Speed 5959.09 samples/sec Loss 4.1038 LearningRate 0.0282 Epoch: 15 Global Step: 157840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:07,094-Speed 5974.54 samples/sec Loss 4.1040 LearningRate 0.0282 Epoch: 15 Global Step: 157850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:13,941-Speed 5983.89 samples/sec Loss 4.0800 LearningRate 0.0282 Epoch: 15 Global Step: 157860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:20,790-Speed 5981.14 samples/sec Loss 4.0240 LearningRate 0.0281 Epoch: 15 Global Step: 157870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:27,663-Speed 5960.70 samples/sec Loss 4.0477 LearningRate 0.0281 Epoch: 15 Global Step: 157880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:34,535-Speed 5961.06 samples/sec Loss 4.0608 LearningRate 0.0281 Epoch: 15 Global Step: 157890 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:41,397-Speed 5971.13 samples/sec Loss 4.1083 LearningRate 0.0281 Epoch: 15 Global Step: 157900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:48,269-Speed 5960.83 samples/sec Loss 4.0791 LearningRate 0.0281 Epoch: 15 Global Step: 157910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:13:55,133-Speed 5969.05 samples/sec Loss 4.0862 LearningRate 0.0281 Epoch: 15 Global Step: 157920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:14:01,994-Speed 5971.24 samples/sec Loss 4.0541 LearningRate 0.0281 Epoch: 15 Global Step: 157930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:14:08,920-Speed 5914.71 samples/sec Loss 4.0555 LearningRate 0.0281 Epoch: 15 Global Step: 157940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:14:15,775-Speed 5976.68 samples/sec Loss 4.0383 LearningRate 0.0281 Epoch: 15 Global Step: 157950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:14:22,628-Speed 5978.82 samples/sec Loss 4.1058 LearningRate 0.0280 Epoch: 15 Global Step: 157960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:14:29,489-Speed 5970.84 samples/sec Loss 4.0599 LearningRate 0.0280 Epoch: 15 Global Step: 157970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:14:36,336-Speed 5984.91 samples/sec Loss 4.0647 LearningRate 0.0280 Epoch: 15 Global Step: 157980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:14:43,184-Speed 5982.24 samples/sec Loss 4.0538 LearningRate 0.0280 Epoch: 15 Global Step: 157990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:14:50,027-Speed 5986.45 samples/sec Loss 4.0651 LearningRate 0.0280 Epoch: 15 Global Step: 158000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:14:56,931-Speed 5934.21 samples/sec Loss 4.1176 LearningRate 0.0280 Epoch: 15 Global Step: 158010 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-01-09 03:15:03,790-Speed 5973.45 samples/sec Loss 4.0626 LearningRate 0.0280 Epoch: 15 Global Step: 158020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:15:10,668-Speed 5955.73 samples/sec Loss 4.0581 LearningRate 0.0280 Epoch: 15 Global Step: 158030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:15:17,528-Speed 5973.36 samples/sec Loss 4.0801 LearningRate 0.0280 Epoch: 15 Global Step: 158040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:15:24,432-Speed 5934.53 samples/sec Loss 4.0741 LearningRate 0.0279 Epoch: 15 Global Step: 158050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:15:31,311-Speed 5954.49 samples/sec Loss 4.0493 LearningRate 0.0279 Epoch: 15 Global Step: 158060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:15:38,197-Speed 5949.71 samples/sec Loss 4.0634 LearningRate 0.0279 Epoch: 15 Global Step: 158070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:15:45,057-Speed 5972.82 samples/sec Loss 4.0060 LearningRate 0.0279 Epoch: 15 Global Step: 158080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:15:51,919-Speed 5970.22 samples/sec Loss 4.0307 LearningRate 0.0279 Epoch: 15 Global Step: 158090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:15:58,770-Speed 5981.75 samples/sec Loss 4.0587 LearningRate 0.0279 Epoch: 15 Global Step: 158100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:16:05,627-Speed 5977.15 samples/sec Loss 4.0398 LearningRate 0.0279 Epoch: 15 Global Step: 158110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:16:12,480-Speed 5977.50 samples/sec Loss 4.0764 LearningRate 0.0279 Epoch: 15 Global Step: 158120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:16:19,361-Speed 5953.58 samples/sec Loss 4.0278 LearningRate 0.0279 Epoch: 15 Global Step: 158130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 
03:16:26,223-Speed 5970.48 samples/sec Loss 4.0540 LearningRate 0.0278 Epoch: 15 Global Step: 158140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:16:33,080-Speed 5974.68 samples/sec Loss 4.0649 LearningRate 0.0278 Epoch: 15 Global Step: 158150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:16:39,930-Speed 5980.48 samples/sec Loss 4.1051 LearningRate 0.0278 Epoch: 15 Global Step: 158160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:16:46,801-Speed 5962.53 samples/sec Loss 4.0303 LearningRate 0.0278 Epoch: 15 Global Step: 158170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:16:53,651-Speed 5980.62 samples/sec Loss 4.0544 LearningRate 0.0278 Epoch: 15 Global Step: 158180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:00,499-Speed 5983.08 samples/sec Loss 4.0711 LearningRate 0.0278 Epoch: 15 Global Step: 158190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:07,347-Speed 5982.10 samples/sec Loss 4.0404 LearningRate 0.0278 Epoch: 15 Global Step: 158200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:14,193-Speed 5983.81 samples/sec Loss 4.0777 LearningRate 0.0278 Epoch: 15 Global Step: 158210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:21,051-Speed 5972.97 samples/sec Loss 4.0835 LearningRate 0.0278 Epoch: 15 Global Step: 158220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:27,923-Speed 5961.74 samples/sec Loss 4.0317 LearningRate 0.0277 Epoch: 15 Global Step: 158230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:34,776-Speed 5977.69 samples/sec Loss 4.0261 LearningRate 0.0277 Epoch: 15 Global Step: 158240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:41,650-Speed 5959.13 samples/sec Loss 4.0772 LearningRate 0.0277 Epoch: 15 Global Step: 158250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:48,498-Speed 5983.06 
samples/sec Loss 4.0628 LearningRate 0.0277 Epoch: 15 Global Step: 158260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:17:55,353-Speed 5975.72 samples/sec Loss 4.0708 LearningRate 0.0277 Epoch: 15 Global Step: 158270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 03:18:02,216-Speed 5970.15 samples/sec Loss 4.0536 LearningRate 0.0277 Epoch: 15 Global Step: 158280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:18:09,093-Speed 5957.80 samples/sec Loss 4.0706 LearningRate 0.0277 Epoch: 15 Global Step: 158290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:18:15,946-Speed 5978.26 samples/sec Loss 4.0080 LearningRate 0.0277 Epoch: 15 Global Step: 158300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:18:22,814-Speed 5964.59 samples/sec Loss 4.0314 LearningRate 0.0276 Epoch: 15 Global Step: 158310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:18:29,686-Speed 5962.54 samples/sec Loss 4.0395 LearningRate 0.0276 Epoch: 15 Global Step: 158320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:18:36,559-Speed 5961.54 samples/sec Loss 4.0371 LearningRate 0.0276 Epoch: 15 Global Step: 158330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:18:43,429-Speed 5963.67 samples/sec Loss 4.0750 LearningRate 0.0276 Epoch: 15 Global Step: 158340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:18:50,287-Speed 5973.08 samples/sec Loss 4.0483 LearningRate 0.0276 Epoch: 15 Global Step: 158350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:18:57,138-Speed 5979.98 samples/sec Loss 4.0717 LearningRate 0.0276 Epoch: 15 Global Step: 158360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:19:04,014-Speed 5961.11 samples/sec Loss 4.0648 LearningRate 0.0276 Epoch: 15 Global Step: 158370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:19:10,885-Speed 5962.07 samples/sec Loss 4.0490 
LearningRate 0.0276 Epoch: 15 Global Step: 158380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:19:17,765-Speed 5955.24 samples/sec Loss 4.0380 LearningRate 0.0276 Epoch: 15 Global Step: 158390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:19:24,625-Speed 5974.81 samples/sec Loss 4.0561 LearningRate 0.0275 Epoch: 15 Global Step: 158400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:19:31,506-Speed 5953.61 samples/sec Loss 4.0572 LearningRate 0.0275 Epoch: 15 Global Step: 158410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:19:38,369-Speed 5968.92 samples/sec Loss 4.0087 LearningRate 0.0275 Epoch: 15 Global Step: 158420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:19:45,218-Speed 5981.63 samples/sec Loss 4.0634 LearningRate 0.0275 Epoch: 15 Global Step: 158430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:19:52,065-Speed 5983.56 samples/sec Loss 4.0442 LearningRate 0.0275 Epoch: 15 Global Step: 158440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:19:58,915-Speed 5980.40 samples/sec Loss 4.0522 LearningRate 0.0275 Epoch: 15 Global Step: 158450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:20:05,769-Speed 5976.69 samples/sec Loss 4.0747 LearningRate 0.0275 Epoch: 15 Global Step: 158460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:20:12,614-Speed 5985.52 samples/sec Loss 4.0749 LearningRate 0.0275 Epoch: 15 Global Step: 158470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:20:19,471-Speed 5974.18 samples/sec Loss 4.0281 LearningRate 0.0275 Epoch: 15 Global Step: 158480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:20:26,337-Speed 5967.16 samples/sec Loss 4.0193 LearningRate 0.0274 Epoch: 15 Global Step: 158490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:20:33,186-Speed 5981.98 samples/sec Loss 4.0431 LearningRate 0.0274 Epoch: 
15 Global Step: 158500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:20:40,046-Speed 5972.21 samples/sec Loss 4.0445 LearningRate 0.0274 Epoch: 15 Global Step: 158510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:20:46,915-Speed 5963.87 samples/sec Loss 4.0484 LearningRate 0.0274 Epoch: 15 Global Step: 158520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:20:53,767-Speed 5979.13 samples/sec Loss 4.0336 LearningRate 0.0274 Epoch: 15 Global Step: 158530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:21:00,623-Speed 5975.04 samples/sec Loss 4.0266 LearningRate 0.0274 Epoch: 15 Global Step: 158540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:21:07,473-Speed 5983.33 samples/sec Loss 4.0695 LearningRate 0.0274 Epoch: 15 Global Step: 158550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:21:14,337-Speed 5967.78 samples/sec Loss 4.0612 LearningRate 0.0274 Epoch: 15 Global Step: 158560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:21:21,189-Speed 5979.30 samples/sec Loss 4.0786 LearningRate 0.0274 Epoch: 15 Global Step: 158570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:21:28,035-Speed 5984.98 samples/sec Loss 4.0473 LearningRate 0.0273 Epoch: 15 Global Step: 158580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:21:34,888-Speed 5978.68 samples/sec Loss 3.9614 LearningRate 0.0273 Epoch: 15 Global Step: 158590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:21:41,737-Speed 5980.81 samples/sec Loss 4.0150 LearningRate 0.0273 Epoch: 15 Global Step: 158600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:21:48,590-Speed 5978.35 samples/sec Loss 4.0109 LearningRate 0.0273 Epoch: 15 Global Step: 158610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:21:55,460-Speed 5963.21 samples/sec Loss 4.0254 LearningRate 0.0273 Epoch: 15 Global Step: 158620 Fp16 
Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:22:02,314-Speed 5977.31 samples/sec Loss 4.0399 LearningRate 0.0273 Epoch: 15 Global Step: 158630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:22:09,188-Speed 5959.58 samples/sec Loss 3.9822 LearningRate 0.0273 Epoch: 15 Global Step: 158640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:22:16,070-Speed 5953.32 samples/sec Loss 4.0373 LearningRate 0.0273 Epoch: 15 Global Step: 158650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:22:22,924-Speed 5976.12 samples/sec Loss 4.0529 LearningRate 0.0273 Epoch: 15 Global Step: 158660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:22:29,800-Speed 5960.15 samples/sec Loss 4.0520 LearningRate 0.0272 Epoch: 15 Global Step: 158670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:22:36,682-Speed 5954.66 samples/sec Loss 4.0126 LearningRate 0.0272 Epoch: 15 Global Step: 158680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:22:43,536-Speed 5976.56 samples/sec Loss 4.0135 LearningRate 0.0272 Epoch: 15 Global Step: 158690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:22:50,390-Speed 5977.19 samples/sec Loss 4.0147 LearningRate 0.0272 Epoch: 15 Global Step: 158700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:22:57,244-Speed 5977.69 samples/sec Loss 4.0329 LearningRate 0.0272 Epoch: 15 Global Step: 158710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:23:04,112-Speed 5964.30 samples/sec Loss 4.0186 LearningRate 0.0272 Epoch: 15 Global Step: 158720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:23:10,984-Speed 5962.24 samples/sec Loss 3.9832 LearningRate 0.0272 Epoch: 15 Global Step: 158730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:23:17,836-Speed 5978.32 samples/sec Loss 4.0162 LearningRate 0.0272 Epoch: 15 Global Step: 158740 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-01-09 03:23:24,697-Speed 5971.26 samples/sec Loss 3.9923 LearningRate 0.0272 Epoch: 15 Global Step: 158750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:23:31,562-Speed 5967.26 samples/sec Loss 4.0776 LearningRate 0.0271 Epoch: 15 Global Step: 158760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 03:23:38,467-Speed 5933.15 samples/sec Loss 4.0045 LearningRate 0.0271 Epoch: 15 Global Step: 158770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:23:45,333-Speed 5967.44 samples/sec Loss 4.0368 LearningRate 0.0271 Epoch: 15 Global Step: 158780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:23:52,199-Speed 5966.49 samples/sec Loss 4.0553 LearningRate 0.0271 Epoch: 15 Global Step: 158790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:23:59,051-Speed 5979.48 samples/sec Loss 3.9756 LearningRate 0.0271 Epoch: 15 Global Step: 158800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:24:05,928-Speed 5956.40 samples/sec Loss 4.0120 LearningRate 0.0271 Epoch: 15 Global Step: 158810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:24:12,774-Speed 5985.00 samples/sec Loss 4.0121 LearningRate 0.0271 Epoch: 15 Global Step: 158820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:24:19,632-Speed 5973.11 samples/sec Loss 3.9834 LearningRate 0.0271 Epoch: 15 Global Step: 158830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:24:26,479-Speed 5982.85 samples/sec Loss 4.0482 LearningRate 0.0271 Epoch: 15 Global Step: 158840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:24:33,333-Speed 5977.88 samples/sec Loss 4.0279 LearningRate 0.0270 Epoch: 15 Global Step: 158850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:24:40,180-Speed 5982.83 samples/sec Loss 3.9974 LearningRate 0.0270 Epoch: 15 Global Step: 158860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 
03:24:47,027-Speed 5983.12 samples/sec Loss 4.0448 LearningRate 0.0270 Epoch: 15 Global Step: 158870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:24:53,879-Speed 5981.58 samples/sec Loss 4.0143 LearningRate 0.0270 Epoch: 15 Global Step: 158880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:25:00,741-Speed 5970.45 samples/sec Loss 4.0614 LearningRate 0.0270 Epoch: 15 Global Step: 158890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:25:07,601-Speed 5971.95 samples/sec Loss 4.0524 LearningRate 0.0270 Epoch: 15 Global Step: 158900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:25:14,480-Speed 5955.83 samples/sec Loss 4.0226 LearningRate 0.0270 Epoch: 15 Global Step: 158910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 03:25:21,364-Speed 5952.62 samples/sec Loss 3.9965 LearningRate 0.0270 Epoch: 15 Global Step: 158920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:25:28,226-Speed 5970.46 samples/sec Loss 3.9703 LearningRate 0.0270 Epoch: 15 Global Step: 158930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:25:35,079-Speed 5977.91 samples/sec Loss 3.9650 LearningRate 0.0269 Epoch: 15 Global Step: 158940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:25:41,933-Speed 5976.89 samples/sec Loss 3.9650 LearningRate 0.0269 Epoch: 15 Global Step: 158950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:25:48,808-Speed 5959.46 samples/sec Loss 4.0116 LearningRate 0.0269 Epoch: 15 Global Step: 158960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:25:55,666-Speed 5974.45 samples/sec Loss 4.0183 LearningRate 0.0269 Epoch: 15 Global Step: 158970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:26:02,522-Speed 5975.95 samples/sec Loss 4.0095 LearningRate 0.0269 Epoch: 15 Global Step: 158980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:26:09,372-Speed 5980.61 
samples/sec Loss 3.9790 LearningRate 0.0269 Epoch: 15 Global Step: 158990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:26:16,244-Speed 5961.16 samples/sec Loss 4.0058 LearningRate 0.0269 Epoch: 15 Global Step: 159000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:26:23,101-Speed 5974.41 samples/sec Loss 3.9971 LearningRate 0.0269 Epoch: 15 Global Step: 159010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:26:29,943-Speed 5987.20 samples/sec Loss 3.9795 LearningRate 0.0269 Epoch: 15 Global Step: 159020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:26:36,827-Speed 5954.80 samples/sec Loss 4.0004 LearningRate 0.0268 Epoch: 15 Global Step: 159030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:26:43,691-Speed 5970.10 samples/sec Loss 3.9778 LearningRate 0.0268 Epoch: 15 Global Step: 159040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:26:50,554-Speed 5968.28 samples/sec Loss 4.0246 LearningRate 0.0268 Epoch: 15 Global Step: 159050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:26:57,398-Speed 5986.65 samples/sec Loss 3.9643 LearningRate 0.0268 Epoch: 15 Global Step: 159060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:27:04,271-Speed 5960.87 samples/sec Loss 3.9714 LearningRate 0.0268 Epoch: 15 Global Step: 159070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:27:11,141-Speed 5962.09 samples/sec Loss 3.9817 LearningRate 0.0268 Epoch: 15 Global Step: 159080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:27:18,003-Speed 5970.58 samples/sec Loss 4.0135 LearningRate 0.0268 Epoch: 15 Global Step: 159090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:27:24,856-Speed 5978.58 samples/sec Loss 3.9765 LearningRate 0.0268 Epoch: 15 Global Step: 159100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:27:31,722-Speed 5966.56 samples/sec Loss 3.9831 LearningRate 
0.0268 Epoch: 15 Global Step: 159110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:27:38,565-Speed 5986.68 samples/sec Loss 3.9788 LearningRate 0.0267 Epoch: 15 Global Step: 159120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:27:45,417-Speed 5978.51 samples/sec Loss 4.0640 LearningRate 0.0267 Epoch: 15 Global Step: 159130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:27:52,271-Speed 5976.93 samples/sec Loss 3.9851 LearningRate 0.0267 Epoch: 15 Global Step: 159140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:27:59,132-Speed 5970.94 samples/sec Loss 4.0737 LearningRate 0.0267 Epoch: 15 Global Step: 159150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:28:05,997-Speed 5967.88 samples/sec Loss 3.9984 LearningRate 0.0267 Epoch: 15 Global Step: 159160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:28:12,846-Speed 5980.96 samples/sec Loss 3.9792 LearningRate 0.0267 Epoch: 15 Global Step: 159170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:28:19,718-Speed 5962.33 samples/sec Loss 3.9901 LearningRate 0.0267 Epoch: 15 Global Step: 159180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:28:26,576-Speed 5973.19 samples/sec Loss 4.0144 LearningRate 0.0267 Epoch: 15 Global Step: 159190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:28:33,434-Speed 5973.80 samples/sec Loss 3.9947 LearningRate 0.0267 Epoch: 15 Global Step: 159200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:28:40,281-Speed 5983.42 samples/sec Loss 4.0455 LearningRate 0.0266 Epoch: 15 Global Step: 159210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:28:47,135-Speed 5980.40 samples/sec Loss 3.9656 LearningRate 0.0266 Epoch: 15 Global Step: 159220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:28:54,014-Speed 5955.54 samples/sec Loss 3.9672 LearningRate 0.0266 Epoch: 15 Global Step: 159230 Fp16 
Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:29:00,887-Speed 5960.62 samples/sec Loss 3.9826 LearningRate 0.0266 Epoch: 15 Global Step: 159240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:29:07,747-Speed 5973.92 samples/sec Loss 3.9929 LearningRate 0.0266 Epoch: 15 Global Step: 159250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:29:14,605-Speed 5973.79 samples/sec Loss 3.9518 LearningRate 0.0266 Epoch: 15 Global Step: 159260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:29:21,465-Speed 5974.12 samples/sec Loss 4.0110 LearningRate 0.0266 Epoch: 15 Global Step: 159270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:29:28,318-Speed 5977.98 samples/sec Loss 3.9783 LearningRate 0.0266 Epoch: 15 Global Step: 159280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:29:35,192-Speed 5959.67 samples/sec Loss 4.0029 LearningRate 0.0266 Epoch: 15 Global Step: 159290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:29:42,059-Speed 5965.28 samples/sec Loss 3.9803 LearningRate 0.0265 Epoch: 15 Global Step: 159300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:29:48,916-Speed 5976.57 samples/sec Loss 3.9865 LearningRate 0.0265 Epoch: 15 Global Step: 159310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:29:55,765-Speed 5981.45 samples/sec Loss 3.9577 LearningRate 0.0265 Epoch: 15 Global Step: 159320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:30:02,625-Speed 5973.24 samples/sec Loss 3.9605 LearningRate 0.0265 Epoch: 15 Global Step: 159330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:30:09,472-Speed 5983.31 samples/sec Loss 3.9610 LearningRate 0.0265 Epoch: 15 Global Step: 159340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:30:16,336-Speed 5968.16 samples/sec Loss 3.9929 LearningRate 0.0265 Epoch: 15 Global Step: 159350 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-09 03:30:23,187-Speed 5980.78 samples/sec Loss 3.9526 LearningRate 0.0265 Epoch: 15 Global Step: 159360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:30:30,033-Speed 5984.41 samples/sec Loss 3.9764 LearningRate 0.0265 Epoch: 15 Global Step: 159370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:30:36,874-Speed 5987.97 samples/sec Loss 3.9191 LearningRate 0.0265 Epoch: 15 Global Step: 159380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:30:43,725-Speed 5980.57 samples/sec Loss 4.0044 LearningRate 0.0264 Epoch: 15 Global Step: 159390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:30:50,596-Speed 5962.20 samples/sec Loss 3.9631 LearningRate 0.0264 Epoch: 15 Global Step: 159400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:30:57,454-Speed 5974.08 samples/sec Loss 3.9893 LearningRate 0.0264 Epoch: 15 Global Step: 159410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:04,321-Speed 5969.74 samples/sec Loss 3.9580 LearningRate 0.0264 Epoch: 15 Global Step: 159420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:11,187-Speed 5967.23 samples/sec Loss 3.9341 LearningRate 0.0264 Epoch: 15 Global Step: 159430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:18,048-Speed 5970.77 samples/sec Loss 3.9882 LearningRate 0.0264 Epoch: 15 Global Step: 159440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:24,889-Speed 5988.74 samples/sec Loss 4.0016 LearningRate 0.0264 Epoch: 15 Global Step: 159450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:31,763-Speed 5959.99 samples/sec Loss 3.9541 LearningRate 0.0264 Epoch: 15 Global Step: 159460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:38,617-Speed 5976.77 samples/sec Loss 4.0100 LearningRate 0.0264 Epoch: 15 Global Step: 159470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:45,495-Speed 
5956.79 samples/sec Loss 3.9513 LearningRate 0.0263 Epoch: 15 Global Step: 159480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:52,357-Speed 5969.76 samples/sec Loss 4.0214 LearningRate 0.0263 Epoch: 15 Global Step: 159490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:31:59,216-Speed 5973.19 samples/sec Loss 3.9519 LearningRate 0.0263 Epoch: 15 Global Step: 159500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:32:06,066-Speed 5983.30 samples/sec Loss 3.9560 LearningRate 0.0263 Epoch: 15 Global Step: 159510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:32:12,929-Speed 5972.31 samples/sec Loss 3.9548 LearningRate 0.0263 Epoch: 15 Global Step: 159520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:32:19,794-Speed 5970.05 samples/sec Loss 3.9696 LearningRate 0.0263 Epoch: 15 Global Step: 159530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:32:26,660-Speed 5967.21 samples/sec Loss 3.9091 LearningRate 0.0263 Epoch: 15 Global Step: 159540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:32:33,515-Speed 5977.58 samples/sec Loss 3.9495 LearningRate 0.0263 Epoch: 15 Global Step: 159550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:32:40,363-Speed 5982.35 samples/sec Loss 3.9484 LearningRate 0.0263 Epoch: 15 Global Step: 159560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:32:47,253-Speed 5946.55 samples/sec Loss 4.0056 LearningRate 0.0262 Epoch: 15 Global Step: 159570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:32:54,095-Speed 5987.72 samples/sec Loss 3.9813 LearningRate 0.0262 Epoch: 15 Global Step: 159580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:33:00,958-Speed 5969.33 samples/sec Loss 4.0015 LearningRate 0.0262 Epoch: 15 Global Step: 159590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:33:07,918-Speed 5886.75 samples/sec Loss 3.9830 
LearningRate 0.0262 Epoch: 15 Global Step: 159600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:33:14,774-Speed 5975.86 samples/sec Loss 3.9651 LearningRate 0.0262 Epoch: 15 Global Step: 159610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:33:21,776-Speed 5851.05 samples/sec Loss 3.9301 LearningRate 0.0262 Epoch: 15 Global Step: 159620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:33:28,747-Speed 5876.76 samples/sec Loss 3.9529 LearningRate 0.0262 Epoch: 15 Global Step: 159630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:33:35,600-Speed 5978.60 samples/sec Loss 3.9526 LearningRate 0.0262 Epoch: 15 Global Step: 159640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:33:42,442-Speed 5987.31 samples/sec Loss 3.9365 LearningRate 0.0262 Epoch: 15 Global Step: 159650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:33:49,327-Speed 5952.83 samples/sec Loss 3.9363 LearningRate 0.0261 Epoch: 15 Global Step: 159660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:33:56,208-Speed 5954.49 samples/sec Loss 3.9466 LearningRate 0.0261 Epoch: 15 Global Step: 159670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:34:03,056-Speed 5982.08 samples/sec Loss 3.9281 LearningRate 0.0261 Epoch: 15 Global Step: 159680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:34:09,951-Speed 5941.49 samples/sec Loss 3.9406 LearningRate 0.0261 Epoch: 15 Global Step: 159690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:34:16,809-Speed 5973.95 samples/sec Loss 3.9435 LearningRate 0.0261 Epoch: 15 Global Step: 159700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:34:23,668-Speed 5972.19 samples/sec Loss 3.9340 LearningRate 0.0261 Epoch: 15 Global Step: 159710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:34:30,536-Speed 5965.28 samples/sec Loss 3.9763 LearningRate 0.0261 Epoch: 15 Global 
Step: 159720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:34:37,383-Speed 5983.17 samples/sec Loss 3.9227 LearningRate 0.0261 Epoch: 15 Global Step: 159730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:34:44,237-Speed 5976.94 samples/sec Loss 3.9324 LearningRate 0.0261 Epoch: 15 Global Step: 159740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:34:51,116-Speed 5955.94 samples/sec Loss 3.9265 LearningRate 0.0260 Epoch: 15 Global Step: 159750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:34:58,012-Speed 5941.25 samples/sec Loss 3.9342 LearningRate 0.0260 Epoch: 15 Global Step: 159760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:35:04,853-Speed 5989.21 samples/sec Loss 3.9258 LearningRate 0.0260 Epoch: 15 Global Step: 159770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:35:11,708-Speed 5976.22 samples/sec Loss 3.9302 LearningRate 0.0260 Epoch: 15 Global Step: 159780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:35:18,565-Speed 5975.07 samples/sec Loss 3.9347 LearningRate 0.0260 Epoch: 15 Global Step: 159790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:35:25,414-Speed 5981.48 samples/sec Loss 3.9470 LearningRate 0.0260 Epoch: 15 Global Step: 159800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:35:32,265-Speed 5981.55 samples/sec Loss 3.9406 LearningRate 0.0260 Epoch: 15 Global Step: 159810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:35:39,129-Speed 5969.33 samples/sec Loss 3.9215 LearningRate 0.0260 Epoch: 15 Global Step: 159820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:35:45,989-Speed 5971.26 samples/sec Loss 3.9739 LearningRate 0.0260 Epoch: 15 Global Step: 159830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:35:52,941-Speed 5895.64 samples/sec Loss 3.9507 LearningRate 0.0260 Epoch: 15 Global Step: 159840 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-01-09 03:35:59,809-Speed 5964.99 samples/sec Loss 3.9700 LearningRate 0.0259 Epoch: 15 Global Step: 159850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:36:06,669-Speed 5971.82 samples/sec Loss 3.9207 LearningRate 0.0259 Epoch: 15 Global Step: 159860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:36:13,522-Speed 5978.14 samples/sec Loss 3.9635 LearningRate 0.0259 Epoch: 15 Global Step: 159870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:36:20,386-Speed 5968.83 samples/sec Loss 3.9203 LearningRate 0.0259 Epoch: 15 Global Step: 159880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:36:27,240-Speed 5976.24 samples/sec Loss 3.9301 LearningRate 0.0259 Epoch: 15 Global Step: 159890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:36:34,111-Speed 5965.72 samples/sec Loss 3.8890 LearningRate 0.0259 Epoch: 15 Global Step: 159900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:36:40,959-Speed 5983.10 samples/sec Loss 3.9654 LearningRate 0.0259 Epoch: 15 Global Step: 159910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:36:47,799-Speed 5988.77 samples/sec Loss 3.9176 LearningRate 0.0259 Epoch: 15 Global Step: 159920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:36:54,644-Speed 5985.55 samples/sec Loss 3.9193 LearningRate 0.0259 Epoch: 15 Global Step: 159930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:37:01,494-Speed 5980.48 samples/sec Loss 3.9256 LearningRate 0.0258 Epoch: 15 Global Step: 159940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:37:08,360-Speed 5966.29 samples/sec Loss 3.9340 LearningRate 0.0258 Epoch: 15 Global Step: 159950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:37:15,221-Speed 5971.57 samples/sec Loss 3.9361 LearningRate 0.0258 Epoch: 15 Global Step: 159960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 
03:37:22,082-Speed 5971.14 samples/sec Loss 3.9718 LearningRate 0.0258 Epoch: 15 Global Step: 159970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:37:28,936-Speed 5976.63 samples/sec Loss 3.9438 LearningRate 0.0258 Epoch: 15 Global Step: 159980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:37:35,778-Speed 5987.67 samples/sec Loss 3.9422 LearningRate 0.0258 Epoch: 15 Global Step: 159990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:37:42,632-Speed 5977.34 samples/sec Loss 3.9010 LearningRate 0.0258 Epoch: 15 Global Step: 160000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:38:09,290-[lfw][160000]XNorm: 23.752233 Training: 2022-01-09 03:38:09,291-[lfw][160000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-09 03:38:09,292-[lfw][160000]Accuracy-Highest: 0.99817 Training: 2022-01-09 03:38:40,143-[cfp_fp][160000]XNorm: 21.176481 Training: 2022-01-09 03:38:40,144-[cfp_fp][160000]Accuracy-Flip: 0.98929+-0.00492 Training: 2022-01-09 03:38:40,145-[cfp_fp][160000]Accuracy-Highest: 0.98929 Training: 2022-01-09 03:39:06,843-[agedb_30][160000]XNorm: 23.048697 Training: 2022-01-09 03:39:06,844-[agedb_30][160000]Accuracy-Flip: 0.97817+-0.00608 Training: 2022-01-09 03:39:06,845-[agedb_30][160000]Accuracy-Highest: 0.97833 Training: 2022-01-09 03:39:13,708-Speed 449.74 samples/sec Loss 3.9775 LearningRate 0.0258 Epoch: 15 Global Step: 160010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:39:20,563-Speed 5976.01 samples/sec Loss 3.8734 LearningRate 0.0258 Epoch: 15 Global Step: 160020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:39:27,412-Speed 5981.98 samples/sec Loss 3.9750 LearningRate 0.0257 Epoch: 15 Global Step: 160030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:39:34,267-Speed 5976.47 samples/sec Loss 3.9275 LearningRate 0.0257 Epoch: 15 Global Step: 160040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:39:41,136-Speed 
5964.33 samples/sec Loss 3.9181 LearningRate 0.0257 Epoch: 15 Global Step: 160050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:39:47,990-Speed 5977.44 samples/sec Loss 3.8779 LearningRate 0.0257 Epoch: 15 Global Step: 160060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:39:54,846-Speed 5975.83 samples/sec Loss 3.9558 LearningRate 0.0257 Epoch: 15 Global Step: 160070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:40:01,723-Speed 5958.15 samples/sec Loss 3.8993 LearningRate 0.0257 Epoch: 15 Global Step: 160080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:40:08,625-Speed 5935.67 samples/sec Loss 3.9297 LearningRate 0.0257 Epoch: 15 Global Step: 160090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:40:15,528-Speed 5934.96 samples/sec Loss 3.9346 LearningRate 0.0257 Epoch: 15 Global Step: 160100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:40:22,418-Speed 5945.50 samples/sec Loss 3.9103 LearningRate 0.0257 Epoch: 15 Global Step: 160110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:40:29,279-Speed 5970.73 samples/sec Loss 3.9255 LearningRate 0.0256 Epoch: 15 Global Step: 160120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:40:36,181-Speed 5936.41 samples/sec Loss 3.9273 LearningRate 0.0256 Epoch: 15 Global Step: 160130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:40:43,054-Speed 5959.91 samples/sec Loss 3.8907 LearningRate 0.0256 Epoch: 15 Global Step: 160140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:40:49,927-Speed 5961.11 samples/sec Loss 3.9195 LearningRate 0.0256 Epoch: 15 Global Step: 160150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:40:56,781-Speed 5977.73 samples/sec Loss 3.9425 LearningRate 0.0256 Epoch: 15 Global Step: 160160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:41:03,644-Speed 5968.91 samples/sec Loss 3.9025 
LearningRate 0.0256 Epoch: 15 Global Step: 160170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:41:10,496-Speed 5978.48 samples/sec Loss 3.8854 LearningRate 0.0256 Epoch: 15 Global Step: 160180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:41:17,350-Speed 5977.42 samples/sec Loss 3.8648 LearningRate 0.0256 Epoch: 15 Global Step: 160190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:41:24,196-Speed 5983.65 samples/sec Loss 3.9445 LearningRate 0.0256 Epoch: 15 Global Step: 160200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:41:31,053-Speed 5974.78 samples/sec Loss 3.9311 LearningRate 0.0255 Epoch: 15 Global Step: 160210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:41:37,923-Speed 5963.78 samples/sec Loss 3.9401 LearningRate 0.0255 Epoch: 15 Global Step: 160220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:41:44,785-Speed 5970.35 samples/sec Loss 3.9682 LearningRate 0.0255 Epoch: 15 Global Step: 160230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:41:51,633-Speed 5983.18 samples/sec Loss 3.9203 LearningRate 0.0255 Epoch: 15 Global Step: 160240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:41:58,501-Speed 5965.07 samples/sec Loss 3.9316 LearningRate 0.0255 Epoch: 15 Global Step: 160250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:42:05,358-Speed 5974.40 samples/sec Loss 3.8972 LearningRate 0.0255 Epoch: 15 Global Step: 160260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:42:12,202-Speed 5985.93 samples/sec Loss 3.8916 LearningRate 0.0255 Epoch: 15 Global Step: 160270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:42:19,052-Speed 5980.80 samples/sec Loss 3.8748 LearningRate 0.0255 Epoch: 15 Global Step: 160280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:42:25,913-Speed 5970.96 samples/sec Loss 3.8884 LearningRate 0.0255 Epoch: 15 Global 
Step: 160290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:42:32,779-Speed 5969.15 samples/sec Loss 3.9500 LearningRate 0.0255 Epoch: 15 Global Step: 160300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:42:39,638-Speed 5972.69 samples/sec Loss 3.9232 LearningRate 0.0254 Epoch: 15 Global Step: 160310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:42:46,517-Speed 5955.12 samples/sec Loss 3.8999 LearningRate 0.0254 Epoch: 15 Global Step: 160320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:42:53,380-Speed 5971.92 samples/sec Loss 3.8871 LearningRate 0.0254 Epoch: 15 Global Step: 160330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:43:00,259-Speed 5955.91 samples/sec Loss 3.9009 LearningRate 0.0254 Epoch: 15 Global Step: 160340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:43:07,117-Speed 5973.63 samples/sec Loss 3.9306 LearningRate 0.0254 Epoch: 15 Global Step: 160350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:43:13,979-Speed 5973.68 samples/sec Loss 3.8635 LearningRate 0.0254 Epoch: 15 Global Step: 160360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:43:20,837-Speed 5973.49 samples/sec Loss 3.9235 LearningRate 0.0254 Epoch: 15 Global Step: 160370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:43:27,710-Speed 5960.23 samples/sec Loss 3.9401 LearningRate 0.0254 Epoch: 15 Global Step: 160380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:43:34,587-Speed 5957.96 samples/sec Loss 3.8987 LearningRate 0.0254 Epoch: 15 Global Step: 160390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:43:41,445-Speed 5973.50 samples/sec Loss 3.9046 LearningRate 0.0253 Epoch: 15 Global Step: 160400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:43:48,313-Speed 5964.71 samples/sec Loss 3.8742 LearningRate 0.0253 Epoch: 15 Global Step: 160410 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-01-09 03:43:55,176-Speed 5969.11 samples/sec Loss 3.8891 LearningRate 0.0253 Epoch: 15 Global Step: 160420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:44:02,113-Speed 5905.94 samples/sec Loss 3.9252 LearningRate 0.0253 Epoch: 15 Global Step: 160430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:44:08,992-Speed 5955.26 samples/sec Loss 3.8977 LearningRate 0.0253 Epoch: 15 Global Step: 160440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:44:15,926-Speed 5909.12 samples/sec Loss 3.9182 LearningRate 0.0253 Epoch: 15 Global Step: 160450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:44:22,795-Speed 5964.39 samples/sec Loss 3.9057 LearningRate 0.0253 Epoch: 15 Global Step: 160460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:44:29,649-Speed 5976.74 samples/sec Loss 3.9033 LearningRate 0.0253 Epoch: 15 Global Step: 160470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:44:36,512-Speed 5972.55 samples/sec Loss 3.8720 LearningRate 0.0253 Epoch: 15 Global Step: 160480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:44:43,378-Speed 5966.59 samples/sec Loss 3.8944 LearningRate 0.0252 Epoch: 15 Global Step: 160490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:44:50,247-Speed 5964.36 samples/sec Loss 3.9439 LearningRate 0.0252 Epoch: 15 Global Step: 160500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:44:57,125-Speed 5956.83 samples/sec Loss 3.8709 LearningRate 0.0252 Epoch: 15 Global Step: 160510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:45:03,980-Speed 5976.53 samples/sec Loss 3.8792 LearningRate 0.0252 Epoch: 15 Global Step: 160520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:45:10,835-Speed 5975.90 samples/sec Loss 3.9087 LearningRate 0.0252 Epoch: 15 Global Step: 160530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 
03:45:17,742-Speed 5931.79 samples/sec Loss 3.8599 LearningRate 0.0252 Epoch: 15 Global Step: 160540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:45:24,653-Speed 5928.05 samples/sec Loss 3.8920 LearningRate 0.0252 Epoch: 15 Global Step: 160550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:45:31,572-Speed 5921.21 samples/sec Loss 3.8832 LearningRate 0.0252 Epoch: 15 Global Step: 160560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:45:38,434-Speed 5970.29 samples/sec Loss 3.8521 LearningRate 0.0252 Epoch: 15 Global Step: 160570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:45:45,287-Speed 5977.91 samples/sec Loss 3.9201 LearningRate 0.0251 Epoch: 15 Global Step: 160580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:45:52,137-Speed 5980.58 samples/sec Loss 3.8693 LearningRate 0.0251 Epoch: 15 Global Step: 160590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:45:59,007-Speed 5963.30 samples/sec Loss 3.8452 LearningRate 0.0251 Epoch: 15 Global Step: 160600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:46:05,869-Speed 5970.28 samples/sec Loss 3.8779 LearningRate 0.0251 Epoch: 15 Global Step: 160610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:46:12,733-Speed 5968.26 samples/sec Loss 3.9316 LearningRate 0.0251 Epoch: 15 Global Step: 160620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:46:19,599-Speed 5966.80 samples/sec Loss 3.8480 LearningRate 0.0251 Epoch: 15 Global Step: 160630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:46:26,474-Speed 5959.24 samples/sec Loss 3.9102 LearningRate 0.0251 Epoch: 15 Global Step: 160640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:46:33,323-Speed 5981.72 samples/sec Loss 3.8838 LearningRate 0.0251 Epoch: 15 Global Step: 160650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:46:40,234-Speed 5981.20 samples/sec 
Loss 3.9297 LearningRate 0.0251 Epoch: 15 Global Step: 160660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:46:47,123-Speed 5946.33 samples/sec Loss 3.8548 LearningRate 0.0251 Epoch: 15 Global Step: 160670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:46:54,007-Speed 5951.13 samples/sec Loss 3.8948 LearningRate 0.0250 Epoch: 15 Global Step: 160680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:00,860-Speed 5978.17 samples/sec Loss 3.8923 LearningRate 0.0250 Epoch: 15 Global Step: 160690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:07,752-Speed 5944.33 samples/sec Loss 3.8675 LearningRate 0.0250 Epoch: 15 Global Step: 160700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:14,602-Speed 5981.48 samples/sec Loss 3.8327 LearningRate 0.0250 Epoch: 15 Global Step: 160710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:21,449-Speed 5982.74 samples/sec Loss 3.8947 LearningRate 0.0250 Epoch: 15 Global Step: 160720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:28,311-Speed 5970.97 samples/sec Loss 3.8833 LearningRate 0.0250 Epoch: 15 Global Step: 160730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:35,181-Speed 5963.22 samples/sec Loss 3.8828 LearningRate 0.0250 Epoch: 15 Global Step: 160740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:42,053-Speed 5961.74 samples/sec Loss 3.8897 LearningRate 0.0250 Epoch: 15 Global Step: 160750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:48,924-Speed 5962.88 samples/sec Loss 3.8654 LearningRate 0.0250 Epoch: 15 Global Step: 160760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:47:55,800-Speed 5957.29 samples/sec Loss 3.8716 LearningRate 0.0249 Epoch: 15 Global Step: 160770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:48:02,642-Speed 5987.59 samples/sec Loss 3.8586 LearningRate 0.0249 Epoch: 15 
Global Step: 160780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:48:09,489-Speed 5983.63 samples/sec Loss 3.8902 LearningRate 0.0249 Epoch: 15 Global Step: 160790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:48:16,355-Speed 5967.14 samples/sec Loss 3.8683 LearningRate 0.0249 Epoch: 15 Global Step: 160800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:48:23,256-Speed 5936.05 samples/sec Loss 3.8559 LearningRate 0.0249 Epoch: 15 Global Step: 160810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:48:30,120-Speed 5968.75 samples/sec Loss 3.8047 LearningRate 0.0249 Epoch: 15 Global Step: 160820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:48:37,016-Speed 5941.25 samples/sec Loss 3.8719 LearningRate 0.0249 Epoch: 15 Global Step: 160830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:48:43,881-Speed 5968.04 samples/sec Loss 3.8712 LearningRate 0.0249 Epoch: 15 Global Step: 160840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:48:50,738-Speed 5976.89 samples/sec Loss 3.8853 LearningRate 0.0249 Epoch: 15 Global Step: 160850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:48:57,603-Speed 5967.94 samples/sec Loss 3.8674 LearningRate 0.0248 Epoch: 15 Global Step: 160860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:49:04,450-Speed 5982.84 samples/sec Loss 3.8773 LearningRate 0.0248 Epoch: 15 Global Step: 160870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:49:11,333-Speed 5952.01 samples/sec Loss 3.8655 LearningRate 0.0248 Epoch: 15 Global Step: 160880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:49:18,203-Speed 5963.92 samples/sec Loss 3.8616 LearningRate 0.0248 Epoch: 15 Global Step: 160890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:49:25,082-Speed 5955.70 samples/sec Loss 3.8878 LearningRate 0.0248 Epoch: 15 Global Step: 160900 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-01-09 03:49:31,939-Speed 5974.13 samples/sec Loss 3.8246 LearningRate 0.0248 Epoch: 15 Global Step: 160910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:49:38,792-Speed 5979.16 samples/sec Loss 3.8410 LearningRate 0.0248 Epoch: 15 Global Step: 160920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:49:45,662-Speed 5963.44 samples/sec Loss 3.8304 LearningRate 0.0248 Epoch: 15 Global Step: 160930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:49:52,524-Speed 5970.81 samples/sec Loss 3.8655 LearningRate 0.0248 Epoch: 15 Global Step: 160940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:49:59,381-Speed 5974.23 samples/sec Loss 3.8304 LearningRate 0.0248 Epoch: 15 Global Step: 160950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:50:06,232-Speed 5981.29 samples/sec Loss 3.9205 LearningRate 0.0247 Epoch: 15 Global Step: 160960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:50:13,124-Speed 5944.77 samples/sec Loss 3.8802 LearningRate 0.0247 Epoch: 15 Global Step: 160970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:50:19,993-Speed 5964.40 samples/sec Loss 3.8925 LearningRate 0.0247 Epoch: 15 Global Step: 160980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:50:26,870-Speed 5957.00 samples/sec Loss 3.9054 LearningRate 0.0247 Epoch: 15 Global Step: 160990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:50:33,734-Speed 5969.26 samples/sec Loss 3.8931 LearningRate 0.0247 Epoch: 15 Global Step: 161000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:50:40,587-Speed 5978.52 samples/sec Loss 3.8777 LearningRate 0.0247 Epoch: 15 Global Step: 161010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:50:47,455-Speed 5964.50 samples/sec Loss 3.8469 LearningRate 0.0247 Epoch: 15 Global Step: 161020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-09 03:50:54,330-Speed 5959.22 samples/sec Loss 3.8822 LearningRate 0.0247 Epoch: 15 Global Step: 161030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:51:01,175-Speed 5985.59 samples/sec Loss 3.8475 LearningRate 0.0247 Epoch: 15 Global Step: 161040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:51:08,075-Speed 5937.69 samples/sec Loss 3.8439 LearningRate 0.0246 Epoch: 15 Global Step: 161050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:51:14,933-Speed 5973.39 samples/sec Loss 3.8750 LearningRate 0.0246 Epoch: 15 Global Step: 161060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:51:21,803-Speed 5974.75 samples/sec Loss 3.8493 LearningRate 0.0246 Epoch: 15 Global Step: 161070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:51:28,664-Speed 5971.55 samples/sec Loss 3.8354 LearningRate 0.0246 Epoch: 15 Global Step: 161080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:51:35,540-Speed 5958.27 samples/sec Loss 3.8548 LearningRate 0.0246 Epoch: 15 Global Step: 161090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:51:42,404-Speed 5968.46 samples/sec Loss 3.8336 LearningRate 0.0246 Epoch: 15 Global Step: 161100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:51:49,286-Speed 5952.13 samples/sec Loss 3.9043 LearningRate 0.0246 Epoch: 15 Global Step: 161110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:51:56,141-Speed 5976.18 samples/sec Loss 3.8318 LearningRate 0.0246 Epoch: 15 Global Step: 161120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:52:02,997-Speed 5976.18 samples/sec Loss 3.8756 LearningRate 0.0246 Epoch: 15 Global Step: 161130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:52:09,848-Speed 5979.44 samples/sec Loss 3.8341 LearningRate 0.0246 Epoch: 15 Global Step: 161140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:52:16,687-Speed 5990.54 
samples/sec Loss 3.8575 LearningRate 0.0245 Epoch: 15 Global Step: 161150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:52:23,556-Speed 5964.29 samples/sec Loss 3.8679 LearningRate 0.0245 Epoch: 15 Global Step: 161160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:52:30,403-Speed 5982.82 samples/sec Loss 3.8330 LearningRate 0.0245 Epoch: 15 Global Step: 161170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:52:37,303-Speed 5937.08 samples/sec Loss 3.8917 LearningRate 0.0245 Epoch: 15 Global Step: 161180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:52:44,186-Speed 5952.94 samples/sec Loss 3.8415 LearningRate 0.0245 Epoch: 15 Global Step: 161190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:52:51,040-Speed 5977.00 samples/sec Loss 3.8887 LearningRate 0.0245 Epoch: 15 Global Step: 161200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:52:57,919-Speed 5956.11 samples/sec Loss 3.8462 LearningRate 0.0245 Epoch: 15 Global Step: 161210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:53:04,775-Speed 5975.24 samples/sec Loss 3.8378 LearningRate 0.0245 Epoch: 15 Global Step: 161220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:53:11,648-Speed 5960.48 samples/sec Loss 3.8309 LearningRate 0.0245 Epoch: 15 Global Step: 161230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:53:18,503-Speed 5976.78 samples/sec Loss 3.8481 LearningRate 0.0244 Epoch: 15 Global Step: 161240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 03:53:25,362-Speed 5972.77 samples/sec Loss 3.8836 LearningRate 0.0244 Epoch: 15 Global Step: 161250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:53:32,244-Speed 5953.43 samples/sec Loss 3.8306 LearningRate 0.0244 Epoch: 15 Global Step: 161260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:53:39,171-Speed 5914.36 samples/sec Loss 3.8154 LearningRate 0.0244 
Epoch: 15 Global Step: 161270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:53:46,072-Speed 5937.35 samples/sec Loss 3.7909 LearningRate 0.0244 Epoch: 15 Global Step: 161280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:53:53,070-Speed 5853.70 samples/sec Loss 3.8438 LearningRate 0.0244 Epoch: 15 Global Step: 161290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:53:59,924-Speed 5977.84 samples/sec Loss 3.8381 LearningRate 0.0244 Epoch: 15 Global Step: 161300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:54:06,795-Speed 5962.81 samples/sec Loss 3.8298 LearningRate 0.0244 Epoch: 15 Global Step: 161310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:54:13,654-Speed 5972.71 samples/sec Loss 3.8685 LearningRate 0.0244 Epoch: 15 Global Step: 161320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:54:20,518-Speed 5968.59 samples/sec Loss 3.8274 LearningRate 0.0244 Epoch: 15 Global Step: 161330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:54:27,392-Speed 5959.82 samples/sec Loss 3.8739 LearningRate 0.0243 Epoch: 15 Global Step: 161340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:54:34,250-Speed 5973.95 samples/sec Loss 3.8120 LearningRate 0.0243 Epoch: 15 Global Step: 161350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:54:41,094-Speed 5986.23 samples/sec Loss 3.8563 LearningRate 0.0243 Epoch: 15 Global Step: 161360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:54:47,952-Speed 5973.90 samples/sec Loss 3.8206 LearningRate 0.0243 Epoch: 15 Global Step: 161370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:54:54,801-Speed 5981.42 samples/sec Loss 3.8632 LearningRate 0.0243 Epoch: 15 Global Step: 161380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:55:01,656-Speed 5976.25 samples/sec Loss 3.7954 LearningRate 0.0243 Epoch: 15 Global Step: 161390 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-01-09 03:55:08,504-Speed 5982.55 samples/sec Loss 3.8017 LearningRate 0.0243 Epoch: 15 Global Step: 161400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:55:15,352-Speed 5982.50 samples/sec Loss 3.7989 LearningRate 0.0243 Epoch: 15 Global Step: 161410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:55:22,229-Speed 5956.98 samples/sec Loss 3.8013 LearningRate 0.0243 Epoch: 15 Global Step: 161420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:55:29,117-Speed 5950.27 samples/sec Loss 3.8023 LearningRate 0.0242 Epoch: 15 Global Step: 161430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:55:35,980-Speed 5969.07 samples/sec Loss 3.7937 LearningRate 0.0242 Epoch: 15 Global Step: 161440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:55:42,833-Speed 5977.95 samples/sec Loss 3.8277 LearningRate 0.0242 Epoch: 15 Global Step: 161450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:55:49,696-Speed 5969.16 samples/sec Loss 3.8085 LearningRate 0.0242 Epoch: 15 Global Step: 161460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:55:56,542-Speed 5984.49 samples/sec Loss 3.8205 LearningRate 0.0242 Epoch: 15 Global Step: 161470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:56:03,408-Speed 5968.00 samples/sec Loss 3.8608 LearningRate 0.0242 Epoch: 15 Global Step: 161480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:56:10,272-Speed 5968.40 samples/sec Loss 3.8165 LearningRate 0.0242 Epoch: 15 Global Step: 161490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:56:17,119-Speed 5982.69 samples/sec Loss 3.8429 LearningRate 0.0242 Epoch: 15 Global Step: 161500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:56:23,985-Speed 5967.39 samples/sec Loss 3.8467 LearningRate 0.0242 Epoch: 15 Global Step: 161510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-01-09 03:56:30,844-Speed 5974.05 samples/sec Loss 3.8275 LearningRate 0.0241 Epoch: 15 Global Step: 161520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:56:37,692-Speed 5981.53 samples/sec Loss 3.8060 LearningRate 0.0241 Epoch: 15 Global Step: 161530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:56:44,555-Speed 5969.80 samples/sec Loss 3.7985 LearningRate 0.0241 Epoch: 15 Global Step: 161540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:56:51,412-Speed 5974.55 samples/sec Loss 3.7725 LearningRate 0.0241 Epoch: 15 Global Step: 161550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:56:58,267-Speed 5976.50 samples/sec Loss 3.8091 LearningRate 0.0241 Epoch: 15 Global Step: 161560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:57:05,135-Speed 5965.10 samples/sec Loss 3.8652 LearningRate 0.0241 Epoch: 15 Global Step: 161570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:57:12,014-Speed 5956.48 samples/sec Loss 3.7934 LearningRate 0.0241 Epoch: 15 Global Step: 161580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:57:18,912-Speed 5938.76 samples/sec Loss 3.7746 LearningRate 0.0241 Epoch: 15 Global Step: 161590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:57:25,763-Speed 5981.59 samples/sec Loss 3.8078 LearningRate 0.0241 Epoch: 15 Global Step: 161600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:57:32,624-Speed 5971.52 samples/sec Loss 3.7965 LearningRate 0.0241 Epoch: 15 Global Step: 161610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:57:39,499-Speed 5958.29 samples/sec Loss 3.8135 LearningRate 0.0240 Epoch: 15 Global Step: 161620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:57:46,360-Speed 5972.15 samples/sec Loss 3.7889 LearningRate 0.0240 Epoch: 15 Global Step: 161630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:57:53,203-Speed 5986.75 
samples/sec Loss 3.7863 LearningRate 0.0240 Epoch: 15 Global Step: 161640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:58:00,053-Speed 5980.28 samples/sec Loss 3.8420 LearningRate 0.0240 Epoch: 15 Global Step: 161650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:58:06,900-Speed 5983.17 samples/sec Loss 3.8161 LearningRate 0.0240 Epoch: 15 Global Step: 161660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:58:13,763-Speed 5973.32 samples/sec Loss 3.7994 LearningRate 0.0240 Epoch: 15 Global Step: 161670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:58:20,632-Speed 5964.48 samples/sec Loss 3.7963 LearningRate 0.0240 Epoch: 15 Global Step: 161680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 03:58:27,489-Speed 5974.34 samples/sec Loss 3.7710 LearningRate 0.0240 Epoch: 15 Global Step: 161690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:58:34,381-Speed 5945.85 samples/sec Loss 3.8054 LearningRate 0.0240 Epoch: 15 Global Step: 161700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:58:41,247-Speed 5967.21 samples/sec Loss 3.8476 LearningRate 0.0239 Epoch: 15 Global Step: 161710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:58:48,105-Speed 5973.57 samples/sec Loss 3.8201 LearningRate 0.0239 Epoch: 15 Global Step: 161720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:58:54,967-Speed 5970.12 samples/sec Loss 3.7801 LearningRate 0.0239 Epoch: 15 Global Step: 161730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:01,820-Speed 5977.45 samples/sec Loss 3.8043 LearningRate 0.0239 Epoch: 15 Global Step: 161740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:08,688-Speed 5965.57 samples/sec Loss 3.8512 LearningRate 0.0239 Epoch: 15 Global Step: 161750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:15,550-Speed 5971.65 samples/sec Loss 3.7757 LearningRate 
0.0239 Epoch: 15 Global Step: 161760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:22,416-Speed 5966.38 samples/sec Loss 3.7757 LearningRate 0.0239 Epoch: 15 Global Step: 161770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:29,286-Speed 5963.85 samples/sec Loss 3.7362 LearningRate 0.0239 Epoch: 15 Global Step: 161780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:36,134-Speed 5983.94 samples/sec Loss 3.7731 LearningRate 0.0239 Epoch: 15 Global Step: 161790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:43,014-Speed 5955.29 samples/sec Loss 3.7982 LearningRate 0.0239 Epoch: 15 Global Step: 161800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:49,873-Speed 5972.54 samples/sec Loss 3.8229 LearningRate 0.0238 Epoch: 15 Global Step: 161810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 03:59:56,732-Speed 5972.58 samples/sec Loss 3.7989 LearningRate 0.0238 Epoch: 15 Global Step: 161820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:00:03,580-Speed 5982.24 samples/sec Loss 3.7659 LearningRate 0.0238 Epoch: 15 Global Step: 161830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:00:10,432-Speed 5979.83 samples/sec Loss 3.8666 LearningRate 0.0238 Epoch: 15 Global Step: 161840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:00:17,306-Speed 5959.50 samples/sec Loss 3.7937 LearningRate 0.0238 Epoch: 15 Global Step: 161850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:00:24,156-Speed 5980.90 samples/sec Loss 3.7896 LearningRate 0.0238 Epoch: 15 Global Step: 161860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:00:31,006-Speed 5981.11 samples/sec Loss 3.8608 LearningRate 0.0238 Epoch: 15 Global Step: 161870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:00:37,864-Speed 5973.21 samples/sec Loss 3.8060 LearningRate 0.0238 Epoch: 15 Global Step: 161880 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:00:44,721-Speed 5974.51 samples/sec Loss 3.7719 LearningRate 0.0238 Epoch: 15 Global Step: 161890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:00:51,575-Speed 5977.30 samples/sec Loss 3.8236 LearningRate 0.0238 Epoch: 15 Global Step: 161900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:00:58,432-Speed 5974.61 samples/sec Loss 3.8335 LearningRate 0.0237 Epoch: 15 Global Step: 161910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:01:05,278-Speed 5984.16 samples/sec Loss 3.7907 LearningRate 0.0237 Epoch: 15 Global Step: 161920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:01:12,122-Speed 5985.35 samples/sec Loss 3.8050 LearningRate 0.0237 Epoch: 15 Global Step: 161930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:01:18,988-Speed 5967.53 samples/sec Loss 3.8173 LearningRate 0.0237 Epoch: 15 Global Step: 161940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:01:25,860-Speed 5960.60 samples/sec Loss 3.8109 LearningRate 0.0237 Epoch: 15 Global Step: 161950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:01:32,711-Speed 5980.52 samples/sec Loss 3.8101 LearningRate 0.0237 Epoch: 15 Global Step: 161960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:01:39,561-Speed 5981.47 samples/sec Loss 3.7712 LearningRate 0.0237 Epoch: 15 Global Step: 161970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:01:46,420-Speed 5971.81 samples/sec Loss 3.7682 LearningRate 0.0237 Epoch: 15 Global Step: 161980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:01:53,282-Speed 5970.78 samples/sec Loss 3.7413 LearningRate 0.0237 Epoch: 15 Global Step: 161990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:02:00,142-Speed 5972.16 samples/sec Loss 3.8140 LearningRate 0.0236 Epoch: 15 Global Step: 162000 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-09 04:02:07,002-Speed 5971.69 samples/sec Loss 3.8019 LearningRate 0.0236 Epoch: 15 Global Step: 162010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:02:13,859-Speed 5974.84 samples/sec Loss 3.8307 LearningRate 0.0236 Epoch: 15 Global Step: 162020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:02:20,731-Speed 5962.09 samples/sec Loss 3.7990 LearningRate 0.0236 Epoch: 15 Global Step: 162030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:02:27,580-Speed 5981.54 samples/sec Loss 3.7305 LearningRate 0.0236 Epoch: 15 Global Step: 162040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:02:35,125-Speed 5429.48 samples/sec Loss 3.7962 LearningRate 0.0236 Epoch: 15 Global Step: 162050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:02:42,006-Speed 5954.75 samples/sec Loss 3.8318 LearningRate 0.0236 Epoch: 15 Global Step: 162060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:02:48,878-Speed 5960.82 samples/sec Loss 3.7764 LearningRate 0.0236 Epoch: 15 Global Step: 162070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:02:55,740-Speed 5971.05 samples/sec Loss 3.8000 LearningRate 0.0236 Epoch: 15 Global Step: 162080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:03:02,597-Speed 5974.08 samples/sec Loss 3.7649 LearningRate 0.0236 Epoch: 15 Global Step: 162090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:03:09,461-Speed 5968.51 samples/sec Loss 3.7704 LearningRate 0.0235 Epoch: 15 Global Step: 162100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:03:16,308-Speed 5983.04 samples/sec Loss 3.8143 LearningRate 0.0235 Epoch: 15 Global Step: 162110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:03:23,171-Speed 5970.58 samples/sec Loss 3.7852 LearningRate 0.0235 Epoch: 15 Global Step: 162120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:03:30,031-Speed 
5971.24 samples/sec Loss 3.7782 LearningRate 0.0235 Epoch: 15 Global Step: 162130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:03:36,893-Speed 5970.98 samples/sec Loss 3.8286 LearningRate 0.0235 Epoch: 15 Global Step: 162140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:03:43,768-Speed 5958.90 samples/sec Loss 3.8138 LearningRate 0.0235 Epoch: 15 Global Step: 162150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:03:50,632-Speed 5968.25 samples/sec Loss 3.7813 LearningRate 0.0235 Epoch: 15 Global Step: 162160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:03:57,507-Speed 5959.23 samples/sec Loss 3.8111 LearningRate 0.0235 Epoch: 15 Global Step: 162170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:04,363-Speed 5975.75 samples/sec Loss 3.7501 LearningRate 0.0235 Epoch: 15 Global Step: 162180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:11,243-Speed 5954.64 samples/sec Loss 3.7411 LearningRate 0.0234 Epoch: 15 Global Step: 162190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:18,096-Speed 5978.30 samples/sec Loss 3.7663 LearningRate 0.0234 Epoch: 15 Global Step: 162200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:24,950-Speed 5977.18 samples/sec Loss 3.7889 LearningRate 0.0234 Epoch: 15 Global Step: 162210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:31,815-Speed 5967.17 samples/sec Loss 3.8013 LearningRate 0.0234 Epoch: 15 Global Step: 162220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:38,668-Speed 5978.44 samples/sec Loss 3.7840 LearningRate 0.0234 Epoch: 15 Global Step: 162230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:45,563-Speed 5943.14 samples/sec Loss 3.7998 LearningRate 0.0234 Epoch: 15 Global Step: 162240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:52,425-Speed 5970.21 samples/sec Loss 3.8083 
LearningRate 0.0234 Epoch: 15 Global Step: 162250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:04:59,287-Speed 5970.02 samples/sec Loss 3.7349 LearningRate 0.0234 Epoch: 15 Global Step: 162260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:05:06,138-Speed 5982.15 samples/sec Loss 3.7786 LearningRate 0.0234 Epoch: 15 Global Step: 162270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:05:13,006-Speed 5965.38 samples/sec Loss 3.8179 LearningRate 0.0234 Epoch: 15 Global Step: 162280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:05:19,858-Speed 5978.93 samples/sec Loss 3.7523 LearningRate 0.0233 Epoch: 15 Global Step: 162290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:05:26,719-Speed 5971.65 samples/sec Loss 3.7988 LearningRate 0.0233 Epoch: 15 Global Step: 162300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:05:33,580-Speed 5970.39 samples/sec Loss 3.7215 LearningRate 0.0233 Epoch: 15 Global Step: 162310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:05:40,441-Speed 5971.89 samples/sec Loss 3.7502 LearningRate 0.0233 Epoch: 15 Global Step: 162320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:05:47,294-Speed 5977.81 samples/sec Loss 3.7896 LearningRate 0.0233 Epoch: 15 Global Step: 162330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:05:54,146-Speed 5979.18 samples/sec Loss 3.7578 LearningRate 0.0233 Epoch: 15 Global Step: 162340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:06:01,008-Speed 5970.55 samples/sec Loss 3.7476 LearningRate 0.0233 Epoch: 15 Global Step: 162350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:06:07,858-Speed 5980.99 samples/sec Loss 3.7854 LearningRate 0.0233 Epoch: 15 Global Step: 162360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:06:14,721-Speed 5971.53 samples/sec Loss 3.7791 LearningRate 0.0233 Epoch: 15 Global Step: 
162370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:06:21,594-Speed 5960.72 samples/sec Loss 3.7679 LearningRate 0.0233 Epoch: 15 Global Step: 162380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:06:28,468-Speed 5960.48 samples/sec Loss 3.7580 LearningRate 0.0232 Epoch: 15 Global Step: 162390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:06:35,358-Speed 5945.30 samples/sec Loss 3.7506 LearningRate 0.0232 Epoch: 15 Global Step: 162400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:06:42,227-Speed 5966.10 samples/sec Loss 3.7728 LearningRate 0.0232 Epoch: 15 Global Step: 162410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:06:49,080-Speed 5978.57 samples/sec Loss 3.7359 LearningRate 0.0232 Epoch: 15 Global Step: 162420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:06:55,934-Speed 5976.78 samples/sec Loss 3.8314 LearningRate 0.0232 Epoch: 15 Global Step: 162430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:07:02,780-Speed 5983.64 samples/sec Loss 3.7746 LearningRate 0.0232 Epoch: 15 Global Step: 162440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:07:09,648-Speed 5966.05 samples/sec Loss 3.7056 LearningRate 0.0232 Epoch: 15 Global Step: 162450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:07:16,507-Speed 5972.22 samples/sec Loss 3.6958 LearningRate 0.0232 Epoch: 15 Global Step: 162460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:07:23,380-Speed 5961.35 samples/sec Loss 3.7511 LearningRate 0.0232 Epoch: 15 Global Step: 162470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:07:30,246-Speed 5968.21 samples/sec Loss 3.7524 LearningRate 0.0231 Epoch: 15 Global Step: 162480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:07:37,116-Speed 5962.83 samples/sec Loss 3.7733 LearningRate 0.0231 Epoch: 15 Global Step: 162490 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-01-09 04:07:43,970-Speed 5977.13 samples/sec Loss 3.7773 LearningRate 0.0231 Epoch: 15 Global Step: 162500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:07:50,830-Speed 5972.49 samples/sec Loss 3.6699 LearningRate 0.0231 Epoch: 15 Global Step: 162510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:07:57,680-Speed 5983.21 samples/sec Loss 3.7600 LearningRate 0.0231 Epoch: 15 Global Step: 162520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:08:04,529-Speed 5980.97 samples/sec Loss 3.7531 LearningRate 0.0231 Epoch: 15 Global Step: 162530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:08:11,397-Speed 5965.67 samples/sec Loss 3.7425 LearningRate 0.0231 Epoch: 15 Global Step: 162540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:08:18,265-Speed 5964.46 samples/sec Loss 3.7922 LearningRate 0.0231 Epoch: 15 Global Step: 162550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:08:25,112-Speed 5983.60 samples/sec Loss 3.7219 LearningRate 0.0231 Epoch: 15 Global Step: 162560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:08:31,972-Speed 5972.46 samples/sec Loss 3.7321 LearningRate 0.0231 Epoch: 15 Global Step: 162570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:08:38,823-Speed 5979.19 samples/sec Loss 3.7492 LearningRate 0.0230 Epoch: 15 Global Step: 162580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:08:45,717-Speed 5942.65 samples/sec Loss 3.7348 LearningRate 0.0230 Epoch: 15 Global Step: 162590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:08:52,580-Speed 5969.16 samples/sec Loss 3.7224 LearningRate 0.0230 Epoch: 15 Global Step: 162600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:08:59,458-Speed 5957.03 samples/sec Loss 3.7311 LearningRate 0.0230 Epoch: 15 Global Step: 162610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 
04:09:06,336-Speed 5956.04 samples/sec Loss 3.7297 LearningRate 0.0230 Epoch: 15 Global Step: 162620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:09:13,231-Speed 5942.10 samples/sec Loss 3.7321 LearningRate 0.0230 Epoch: 15 Global Step: 162630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:09:20,189-Speed 5888.30 samples/sec Loss 3.7516 LearningRate 0.0230 Epoch: 15 Global Step: 162640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:09:27,034-Speed 5984.93 samples/sec Loss 3.7576 LearningRate 0.0230 Epoch: 15 Global Step: 162650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:09:33,926-Speed 5944.93 samples/sec Loss 3.7620 LearningRate 0.0230 Epoch: 15 Global Step: 162660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:09:40,808-Speed 5952.68 samples/sec Loss 3.7090 LearningRate 0.0230 Epoch: 15 Global Step: 162670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:09:47,689-Speed 5953.88 samples/sec Loss 3.7553 LearningRate 0.0229 Epoch: 15 Global Step: 162680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:09:54,588-Speed 5938.26 samples/sec Loss 3.7280 LearningRate 0.0229 Epoch: 15 Global Step: 162690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:10:01,450-Speed 5970.59 samples/sec Loss 3.7295 LearningRate 0.0229 Epoch: 15 Global Step: 162700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:10:08,324-Speed 5959.79 samples/sec Loss 3.7655 LearningRate 0.0229 Epoch: 15 Global Step: 162710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:10:15,178-Speed 5978.75 samples/sec Loss 3.7398 LearningRate 0.0229 Epoch: 15 Global Step: 162720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:10:22,042-Speed 5967.62 samples/sec Loss 3.7195 LearningRate 0.0229 Epoch: 15 Global Step: 162730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:10:28,915-Speed 5960.80 samples/sec Loss 
3.7708 LearningRate 0.0229 Epoch: 15 Global Step: 162740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:10:35,764-Speed 5982.40 samples/sec Loss 3.7147 LearningRate 0.0229 Epoch: 15 Global Step: 162750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:10:42,616-Speed 5978.59 samples/sec Loss 3.7315 LearningRate 0.0229 Epoch: 15 Global Step: 162760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:10:49,465-Speed 5981.83 samples/sec Loss 3.7939 LearningRate 0.0229 Epoch: 15 Global Step: 162770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:10:56,319-Speed 5977.09 samples/sec Loss 3.7461 LearningRate 0.0228 Epoch: 15 Global Step: 162780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:11:03,174-Speed 5976.23 samples/sec Loss 3.7012 LearningRate 0.0228 Epoch: 15 Global Step: 162790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:11:10,073-Speed 5939.09 samples/sec Loss 3.7227 LearningRate 0.0228 Epoch: 15 Global Step: 162800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:11:16,948-Speed 5958.66 samples/sec Loss 3.7279 LearningRate 0.0228 Epoch: 15 Global Step: 162810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:11:23,798-Speed 5982.18 samples/sec Loss 3.7738 LearningRate 0.0228 Epoch: 15 Global Step: 162820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:11:30,678-Speed 5954.66 samples/sec Loss 3.6783 LearningRate 0.0228 Epoch: 15 Global Step: 162830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:11:37,571-Speed 5944.37 samples/sec Loss 3.6806 LearningRate 0.0228 Epoch: 15 Global Step: 162840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:11:44,439-Speed 5964.52 samples/sec Loss 3.7133 LearningRate 0.0228 Epoch: 15 Global Step: 162850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:11:51,303-Speed 5969.13 samples/sec Loss 3.7471 LearningRate 0.0228 Epoch: 15 Global 
Step: 162860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:11:58,160-Speed 5975.00 samples/sec Loss 3.6959 LearningRate 0.0227 Epoch: 15 Global Step: 162870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:05,026-Speed 5966.00 samples/sec Loss 3.7349 LearningRate 0.0227 Epoch: 15 Global Step: 162880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:11,889-Speed 5969.94 samples/sec Loss 3.7547 LearningRate 0.0227 Epoch: 15 Global Step: 162890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:18,772-Speed 5951.73 samples/sec Loss 3.7508 LearningRate 0.0227 Epoch: 15 Global Step: 162900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:25,649-Speed 5956.73 samples/sec Loss 3.7005 LearningRate 0.0227 Epoch: 15 Global Step: 162910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:32,556-Speed 5931.73 samples/sec Loss 3.7185 LearningRate 0.0227 Epoch: 15 Global Step: 162920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:39,417-Speed 5971.57 samples/sec Loss 3.7148 LearningRate 0.0227 Epoch: 15 Global Step: 162930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:46,269-Speed 5979.53 samples/sec Loss 3.6983 LearningRate 0.0227 Epoch: 15 Global Step: 162940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:53,123-Speed 5976.38 samples/sec Loss 3.6911 LearningRate 0.0227 Epoch: 15 Global Step: 162950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:12:59,974-Speed 5980.33 samples/sec Loss 3.6983 LearningRate 0.0227 Epoch: 15 Global Step: 162960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:13:06,821-Speed 5982.62 samples/sec Loss 3.7280 LearningRate 0.0226 Epoch: 15 Global Step: 162970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:13:13,695-Speed 5959.99 samples/sec Loss 3.6848 LearningRate 0.0226 Epoch: 15 Global Step: 162980 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-01-09 04:13:20,568-Speed 5961.12 samples/sec Loss 3.7323 LearningRate 0.0226 Epoch: 15 Global Step: 162990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:13:27,459-Speed 5945.25 samples/sec Loss 3.7557 LearningRate 0.0226 Epoch: 15 Global Step: 163000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:13:34,344-Speed 5950.36 samples/sec Loss 3.7241 LearningRate 0.0226 Epoch: 15 Global Step: 163010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:13:41,212-Speed 5965.68 samples/sec Loss 3.6835 LearningRate 0.0226 Epoch: 15 Global Step: 163020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:13:48,072-Speed 5971.47 samples/sec Loss 3.6765 LearningRate 0.0226 Epoch: 15 Global Step: 163030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:13:54,923-Speed 5980.54 samples/sec Loss 3.7507 LearningRate 0.0226 Epoch: 15 Global Step: 163040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:14:01,796-Speed 5960.39 samples/sec Loss 3.6840 LearningRate 0.0226 Epoch: 15 Global Step: 163050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:14:08,662-Speed 5966.85 samples/sec Loss 3.7067 LearningRate 0.0226 Epoch: 15 Global Step: 163060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:14:15,515-Speed 5978.52 samples/sec Loss 3.7190 LearningRate 0.0225 Epoch: 15 Global Step: 163070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:14:22,382-Speed 5965.78 samples/sec Loss 3.7497 LearningRate 0.0225 Epoch: 15 Global Step: 163080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:14:29,271-Speed 5946.62 samples/sec Loss 3.7232 LearningRate 0.0225 Epoch: 15 Global Step: 163090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:14:36,142-Speed 5963.41 samples/sec Loss 3.7629 LearningRate 0.0225 Epoch: 15 Global Step: 163100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 
04:14:43,030-Speed 5949.38 samples/sec Loss 3.7304 LearningRate 0.0225 Epoch: 15 Global Step: 163110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:14:49,884-Speed 5977.08 samples/sec Loss 3.6703 LearningRate 0.0225 Epoch: 15 Global Step: 163120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:14:56,738-Speed 5976.97 samples/sec Loss 3.7488 LearningRate 0.0225 Epoch: 15 Global Step: 163130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:15:03,589-Speed 5979.90 samples/sec Loss 3.7324 LearningRate 0.0225 Epoch: 15 Global Step: 163140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:15:10,435-Speed 5983.71 samples/sec Loss 3.6812 LearningRate 0.0225 Epoch: 15 Global Step: 163150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:15:17,274-Speed 5991.02 samples/sec Loss 3.7158 LearningRate 0.0225 Epoch: 15 Global Step: 163160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:15:24,165-Speed 5947.25 samples/sec Loss 3.6945 LearningRate 0.0224 Epoch: 15 Global Step: 163170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:15:31,033-Speed 5964.70 samples/sec Loss 3.6837 LearningRate 0.0224 Epoch: 15 Global Step: 163180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:15:37,899-Speed 5967.06 samples/sec Loss 3.6917 LearningRate 0.0224 Epoch: 15 Global Step: 163190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:15:44,745-Speed 5983.88 samples/sec Loss 3.7098 LearningRate 0.0224 Epoch: 15 Global Step: 163200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:15:51,590-Speed 5985.16 samples/sec Loss 3.7244 LearningRate 0.0224 Epoch: 15 Global Step: 163210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:15:58,438-Speed 5983.18 samples/sec Loss 3.7073 LearningRate 0.0224 Epoch: 15 Global Step: 163220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:16:05,330-Speed 5944.57 samples/sec Loss 
3.6841 LearningRate 0.0224 Epoch: 15 Global Step: 163230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:16:12,212-Speed 5952.83 samples/sec Loss 3.6911 LearningRate 0.0224 Epoch: 15 Global Step: 163240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:16:19,135-Speed 5917.10 samples/sec Loss 3.7151 LearningRate 0.0224 Epoch: 15 Global Step: 163250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:16:26,002-Speed 5966.06 samples/sec Loss 3.6821 LearningRate 0.0224 Epoch: 15 Global Step: 163260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:16:32,857-Speed 5977.05 samples/sec Loss 3.6845 LearningRate 0.0223 Epoch: 15 Global Step: 163270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:16:39,727-Speed 5963.25 samples/sec Loss 3.7027 LearningRate 0.0223 Epoch: 15 Global Step: 163280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:16:46,593-Speed 5967.23 samples/sec Loss 3.7439 LearningRate 0.0223 Epoch: 15 Global Step: 163290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:16:53,485-Speed 5943.78 samples/sec Loss 3.6947 LearningRate 0.0223 Epoch: 15 Global Step: 163300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:17:00,330-Speed 5985.83 samples/sec Loss 3.6863 LearningRate 0.0223 Epoch: 15 Global Step: 163310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:17:07,172-Speed 5988.64 samples/sec Loss 3.6987 LearningRate 0.0223 Epoch: 15 Global Step: 163320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:17:14,062-Speed 5946.15 samples/sec Loss 3.7061 LearningRate 0.0223 Epoch: 15 Global Step: 163330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:17:20,952-Speed 5946.55 samples/sec Loss 3.7412 LearningRate 0.0223 Epoch: 15 Global Step: 163340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:17:27,811-Speed 5972.64 samples/sec Loss 3.6593 LearningRate 0.0223 Epoch: 15 Global 
Step: 163350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:17:34,666-Speed 5976.43 samples/sec Loss 3.6663 LearningRate 0.0223 Epoch: 15 Global Step: 163360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:17:41,518-Speed 5980.37 samples/sec Loss 3.6744 LearningRate 0.0222 Epoch: 15 Global Step: 163370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:17:48,405-Speed 5949.24 samples/sec Loss 3.6393 LearningRate 0.0222 Epoch: 15 Global Step: 163380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:17:55,272-Speed 5965.04 samples/sec Loss 3.6952 LearningRate 0.0222 Epoch: 15 Global Step: 163390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:18:02,130-Speed 5974.29 samples/sec Loss 3.6690 LearningRate 0.0222 Epoch: 15 Global Step: 163400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:18:08,992-Speed 5971.80 samples/sec Loss 3.6507 LearningRate 0.0222 Epoch: 15 Global Step: 163410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:18:15,839-Speed 5983.43 samples/sec Loss 3.6775 LearningRate 0.0222 Epoch: 15 Global Step: 163420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:18:22,692-Speed 5978.57 samples/sec Loss 3.6860 LearningRate 0.0222 Epoch: 15 Global Step: 163430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:18:29,553-Speed 5971.79 samples/sec Loss 3.6412 LearningRate 0.0222 Epoch: 15 Global Step: 163440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:18:36,403-Speed 5980.20 samples/sec Loss 3.6981 LearningRate 0.0222 Epoch: 15 Global Step: 163450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:18:43,286-Speed 5954.22 samples/sec Loss 3.7058 LearningRate 0.0221 Epoch: 15 Global Step: 163460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:18:50,151-Speed 5967.80 samples/sec Loss 3.6506 LearningRate 0.0221 Epoch: 15 Global Step: 163470 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-01-09 04:18:56,995-Speed 5985.99 samples/sec Loss 3.6942 LearningRate 0.0221 Epoch: 15 Global Step: 163480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:19:03,843-Speed 5984.25 samples/sec Loss 3.6626 LearningRate 0.0221 Epoch: 15 Global Step: 163490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:19:10,707-Speed 5969.07 samples/sec Loss 3.6379 LearningRate 0.0221 Epoch: 15 Global Step: 163500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:19:17,568-Speed 5970.79 samples/sec Loss 3.7120 LearningRate 0.0221 Epoch: 15 Global Step: 163510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:19:24,443-Speed 5958.97 samples/sec Loss 3.7026 LearningRate 0.0221 Epoch: 15 Global Step: 163520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:19:31,344-Speed 5937.26 samples/sec Loss 3.7152 LearningRate 0.0221 Epoch: 15 Global Step: 163530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:19:38,201-Speed 5974.02 samples/sec Loss 3.6631 LearningRate 0.0221 Epoch: 15 Global Step: 163540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:19:45,055-Speed 5979.52 samples/sec Loss 3.6909 LearningRate 0.0221 Epoch: 15 Global Step: 163550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:19:51,903-Speed 5982.54 samples/sec Loss 3.6830 LearningRate 0.0220 Epoch: 15 Global Step: 163560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:19:58,767-Speed 5968.12 samples/sec Loss 3.7119 LearningRate 0.0220 Epoch: 15 Global Step: 163570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:20:05,636-Speed 5965.33 samples/sec Loss 3.6688 LearningRate 0.0220 Epoch: 15 Global Step: 163580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:20:12,499-Speed 5971.50 samples/sec Loss 3.6704 LearningRate 0.0220 Epoch: 15 Global Step: 163590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 
04:20:19,352-Speed 5977.94 samples/sec Loss 3.6506 LearningRate 0.0220 Epoch: 15 Global Step: 163600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:20:26,216-Speed 5969.04 samples/sec Loss 3.6754 LearningRate 0.0220 Epoch: 15 Global Step: 163610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:20:33,081-Speed 5972.43 samples/sec Loss 3.6653 LearningRate 0.0220 Epoch: 15 Global Step: 163620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:20:39,936-Speed 5975.88 samples/sec Loss 3.7002 LearningRate 0.0220 Epoch: 15 Global Step: 163630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:20:46,818-Speed 5953.77 samples/sec Loss 3.7166 LearningRate 0.0220 Epoch: 15 Global Step: 163640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:20:53,666-Speed 5982.87 samples/sec Loss 3.6932 LearningRate 0.0220 Epoch: 15 Global Step: 163650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:00,519-Speed 5978.04 samples/sec Loss 3.6684 LearningRate 0.0219 Epoch: 15 Global Step: 163660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:07,381-Speed 5970.50 samples/sec Loss 3.6843 LearningRate 0.0219 Epoch: 15 Global Step: 163670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:14,255-Speed 5960.71 samples/sec Loss 3.7423 LearningRate 0.0219 Epoch: 15 Global Step: 163680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:21,124-Speed 5963.71 samples/sec Loss 3.6520 LearningRate 0.0219 Epoch: 15 Global Step: 163690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:27,987-Speed 5969.08 samples/sec Loss 3.6724 LearningRate 0.0219 Epoch: 15 Global Step: 163700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:34,839-Speed 5981.48 samples/sec Loss 3.6677 LearningRate 0.0219 Epoch: 15 Global Step: 163710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:41,683-Speed 5986.72 samples/sec Loss 
3.6841 LearningRate 0.0219 Epoch: 15 Global Step: 163720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:48,538-Speed 5976.40 samples/sec Loss 3.6459 LearningRate 0.0219 Epoch: 15 Global Step: 163730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:21:55,422-Speed 5951.86 samples/sec Loss 3.6350 LearningRate 0.0219 Epoch: 15 Global Step: 163740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:22:02,268-Speed 5983.39 samples/sec Loss 3.6680 LearningRate 0.0219 Epoch: 15 Global Step: 163750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:22:09,118-Speed 5982.40 samples/sec Loss 3.6594 LearningRate 0.0218 Epoch: 15 Global Step: 163760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:22:15,999-Speed 5954.37 samples/sec Loss 3.6554 LearningRate 0.0218 Epoch: 15 Global Step: 163770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:22:22,872-Speed 5960.24 samples/sec Loss 3.6391 LearningRate 0.0218 Epoch: 15 Global Step: 163780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:22:29,738-Speed 5967.48 samples/sec Loss 3.7036 LearningRate 0.0218 Epoch: 15 Global Step: 163790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:22:36,616-Speed 5957.11 samples/sec Loss 3.6790 LearningRate 0.0218 Epoch: 15 Global Step: 163800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:22:43,463-Speed 5982.90 samples/sec Loss 3.6728 LearningRate 0.0218 Epoch: 15 Global Step: 163810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:22:50,337-Speed 5959.97 samples/sec Loss 3.6607 LearningRate 0.0218 Epoch: 15 Global Step: 163820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:22:57,199-Speed 5973.50 samples/sec Loss 3.6237 LearningRate 0.0218 Epoch: 15 Global Step: 163830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:23:04,053-Speed 5977.17 samples/sec Loss 3.6728 LearningRate 0.0218 Epoch: 15 Global 
Step: 163840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:23:10,912-Speed 5973.15 samples/sec Loss 3.6583 LearningRate 0.0218 Epoch: 15 Global Step: 163850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 04:23:17,775-Speed 5969.99 samples/sec Loss 3.6701 LearningRate 0.0217 Epoch: 15 Global Step: 163860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:23:24,636-Speed 5971.41 samples/sec Loss 3.6142 LearningRate 0.0217 Epoch: 15 Global Step: 163870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:23:31,520-Speed 5951.60 samples/sec Loss 3.6464 LearningRate 0.0217 Epoch: 15 Global Step: 163880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:23:38,371-Speed 5979.93 samples/sec Loss 3.6658 LearningRate 0.0217 Epoch: 15 Global Step: 163890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:23:45,257-Speed 5949.00 samples/sec Loss 3.6059 LearningRate 0.0217 Epoch: 15 Global Step: 163900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:23:52,109-Speed 5978.35 samples/sec Loss 3.6300 LearningRate 0.0217 Epoch: 15 Global Step: 163910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:23:59,010-Speed 5936.72 samples/sec Loss 3.6338 LearningRate 0.0217 Epoch: 15 Global Step: 163920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:24:05,912-Speed 5935.40 samples/sec Loss 3.6621 LearningRate 0.0217 Epoch: 15 Global Step: 163930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:24:12,816-Speed 5933.94 samples/sec Loss 3.6660 LearningRate 0.0217 Epoch: 15 Global Step: 163940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:24:19,714-Speed 5939.16 samples/sec Loss 3.6392 LearningRate 0.0217 Epoch: 15 Global Step: 163950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:24:26,597-Speed 5951.90 samples/sec Loss 3.6574 LearningRate 0.0216 Epoch: 15 Global Step: 163960 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-01-09 04:24:33,446-Speed 5981.47 samples/sec Loss 3.6379 LearningRate 0.0216 Epoch: 15 Global Step: 163970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 04:24:40,292-Speed 5984.14 samples/sec Loss 3.6745 LearningRate 0.0216 Epoch: 15 Global Step: 163980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:24:47,133-Speed 5988.59 samples/sec Loss 3.6209 LearningRate 0.0216 Epoch: 15 Global Step: 163990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:24:54,008-Speed 5961.62 samples/sec Loss 3.6341 LearningRate 0.0216 Epoch: 15 Global Step: 164000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:25:00,860-Speed 5979.68 samples/sec Loss 3.6509 LearningRate 0.0216 Epoch: 15 Global Step: 164010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 04:25:07,745-Speed 5950.02 samples/sec Loss 3.6108 LearningRate 0.0216 Epoch: 15 Global Step: 164020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:25:14,605-Speed 5972.30 samples/sec Loss 3.6993 LearningRate 0.0216 Epoch: 15 Global Step: 164030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:25:21,471-Speed 5966.71 samples/sec Loss 3.6680 LearningRate 0.0216 Epoch: 15 Global Step: 164040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:25:28,393-Speed 5918.62 samples/sec Loss 3.6197 LearningRate 0.0216 Epoch: 15 Global Step: 164050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:25:35,277-Speed 5951.31 samples/sec Loss 3.6257 LearningRate 0.0215 Epoch: 15 Global Step: 164060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:25:42,142-Speed 5966.95 samples/sec Loss 3.6241 LearningRate 0.0215 Epoch: 15 Global Step: 164070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:25:50,186-Speed 5092.64 samples/sec Loss 3.6349 LearningRate 0.0215 Epoch: 15 Global Step: 164080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
04:25:57,035-Speed 5982.69 samples/sec Loss 3.6529 LearningRate 0.0215 Epoch: 15 Global Step: 164090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:03,891-Speed 5975.45 samples/sec Loss 3.6726 LearningRate 0.0215 Epoch: 15 Global Step: 164100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:10,747-Speed 5975.46 samples/sec Loss 3.6355 LearningRate 0.0215 Epoch: 15 Global Step: 164110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:17,609-Speed 5971.02 samples/sec Loss 3.6620 LearningRate 0.0215 Epoch: 15 Global Step: 164120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:24,469-Speed 5972.38 samples/sec Loss 3.6458 LearningRate 0.0215 Epoch: 15 Global Step: 164130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:31,326-Speed 5974.26 samples/sec Loss 3.6395 LearningRate 0.0215 Epoch: 15 Global Step: 164140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:38,176-Speed 5981.42 samples/sec Loss 3.6322 LearningRate 0.0215 Epoch: 15 Global Step: 164150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:45,038-Speed 5970.27 samples/sec Loss 3.6661 LearningRate 0.0214 Epoch: 15 Global Step: 164160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:51,906-Speed 5964.71 samples/sec Loss 3.6504 LearningRate 0.0214 Epoch: 15 Global Step: 164170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:26:58,769-Speed 5969.86 samples/sec Loss 3.6271 LearningRate 0.0214 Epoch: 15 Global Step: 164180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:27:05,624-Speed 5976.62 samples/sec Loss 3.6507 LearningRate 0.0214 Epoch: 15 Global Step: 164190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:27:12,486-Speed 5969.76 samples/sec Loss 3.6032 LearningRate 0.0214 Epoch: 15 Global Step: 164200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:27:19,368-Speed 5953.13 samples/sec 
Loss 3.5989 LearningRate 0.0214 Epoch: 15 Global Step: 164210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:27:26,248-Speed 5957.77 samples/sec Loss 3.6144 LearningRate 0.0214 Epoch: 15 Global Step: 164220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:27:33,123-Speed 5959.35 samples/sec Loss 3.6409 LearningRate 0.0214 Epoch: 15 Global Step: 164230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:27:39,989-Speed 5966.60 samples/sec Loss 3.6374 LearningRate 0.0214 Epoch: 15 Global Step: 164240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:27:46,887-Speed 5939.43 samples/sec Loss 3.5910 LearningRate 0.0214 Epoch: 15 Global Step: 164250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:27:53,763-Speed 5957.19 samples/sec Loss 3.6342 LearningRate 0.0214 Epoch: 15 Global Step: 164260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:00,619-Speed 5976.33 samples/sec Loss 3.6187 LearningRate 0.0213 Epoch: 15 Global Step: 164270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:07,495-Speed 5958.47 samples/sec Loss 3.6340 LearningRate 0.0213 Epoch: 15 Global Step: 164280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:14,344-Speed 5981.54 samples/sec Loss 3.6085 LearningRate 0.0213 Epoch: 15 Global Step: 164290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:21,229-Speed 5950.96 samples/sec Loss 3.6243 LearningRate 0.0213 Epoch: 15 Global Step: 164300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:28,158-Speed 5913.20 samples/sec Loss 3.6041 LearningRate 0.0213 Epoch: 15 Global Step: 164310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:35,009-Speed 5979.32 samples/sec Loss 3.6196 LearningRate 0.0213 Epoch: 15 Global Step: 164320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:41,869-Speed 5972.22 samples/sec Loss 3.6013 LearningRate 0.0213 Epoch: 15 
Global Step: 164330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:48,722-Speed 5978.14 samples/sec Loss 3.6236 LearningRate 0.0213 Epoch: 15 Global Step: 164340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:28:55,612-Speed 5946.25 samples/sec Loss 3.5861 LearningRate 0.0213 Epoch: 15 Global Step: 164350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:29:02,467-Speed 5976.33 samples/sec Loss 3.5687 LearningRate 0.0213 Epoch: 15 Global Step: 164360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:29:09,375-Speed 5932.79 samples/sec Loss 3.5994 LearningRate 0.0212 Epoch: 15 Global Step: 164370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:29:16,308-Speed 5908.72 samples/sec Loss 3.6320 LearningRate 0.0212 Epoch: 15 Global Step: 164380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:29:23,204-Speed 5941.23 samples/sec Loss 3.6268 LearningRate 0.0212 Epoch: 15 Global Step: 164390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:29:30,047-Speed 5987.14 samples/sec Loss 3.5675 LearningRate 0.0212 Epoch: 15 Global Step: 164400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:29:36,902-Speed 5976.03 samples/sec Loss 3.6167 LearningRate 0.0212 Epoch: 15 Global Step: 164410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:29:43,757-Speed 5977.89 samples/sec Loss 3.6123 LearningRate 0.0212 Epoch: 15 Global Step: 164420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:29:50,642-Speed 5950.60 samples/sec Loss 3.6270 LearningRate 0.0212 Epoch: 15 Global Step: 164430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:29:57,547-Speed 5932.98 samples/sec Loss 3.6064 LearningRate 0.0212 Epoch: 15 Global Step: 164440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:30:04,406-Speed 5972.57 samples/sec Loss 3.6009 LearningRate 0.0212 Epoch: 15 Global Step: 164450 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-09 04:30:11,258-Speed 5978.98 samples/sec Loss 3.5931 LearningRate 0.0212 Epoch: 15 Global Step: 164460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:30:18,131-Speed 5960.28 samples/sec Loss 3.6304 LearningRate 0.0211 Epoch: 15 Global Step: 164470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:30:24,994-Speed 5970.33 samples/sec Loss 3.6192 LearningRate 0.0211 Epoch: 15 Global Step: 164480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:30:31,841-Speed 5982.88 samples/sec Loss 3.5977 LearningRate 0.0211 Epoch: 15 Global Step: 164490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:30:38,684-Speed 5987.35 samples/sec Loss 3.5949 LearningRate 0.0211 Epoch: 15 Global Step: 164500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:30:45,554-Speed 5963.49 samples/sec Loss 3.5901 LearningRate 0.0211 Epoch: 15 Global Step: 164510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:30:52,427-Speed 5961.61 samples/sec Loss 3.6138 LearningRate 0.0211 Epoch: 15 Global Step: 164520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:30:59,272-Speed 5985.16 samples/sec Loss 3.5933 LearningRate 0.0211 Epoch: 15 Global Step: 164530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:31:06,135-Speed 5969.59 samples/sec Loss 3.5921 LearningRate 0.0211 Epoch: 15 Global Step: 164540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:31:13,009-Speed 5961.62 samples/sec Loss 3.5969 LearningRate 0.0211 Epoch: 15 Global Step: 164550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:31:19,873-Speed 5968.91 samples/sec Loss 3.6439 LearningRate 0.0211 Epoch: 15 Global Step: 164560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:31:26,748-Speed 5958.53 samples/sec Loss 3.5765 LearningRate 0.0210 Epoch: 15 Global Step: 164570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
04:31:33,614-Speed 5967.74 samples/sec Loss 3.5927 LearningRate 0.0210 Epoch: 15 Global Step: 164580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:31:40,471-Speed 5974.33 samples/sec Loss 3.6103 LearningRate 0.0210 Epoch: 15 Global Step: 164590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:31:47,367-Speed 5942.02 samples/sec Loss 3.5934 LearningRate 0.0210 Epoch: 15 Global Step: 164600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:31:54,207-Speed 5989.37 samples/sec Loss 3.6143 LearningRate 0.0210 Epoch: 15 Global Step: 164610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:01,057-Speed 5980.88 samples/sec Loss 3.6195 LearningRate 0.0210 Epoch: 15 Global Step: 164620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:07,921-Speed 5968.56 samples/sec Loss 3.5932 LearningRate 0.0210 Epoch: 15 Global Step: 164630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:14,799-Speed 5956.74 samples/sec Loss 3.6103 LearningRate 0.0210 Epoch: 15 Global Step: 164640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:21,644-Speed 5984.67 samples/sec Loss 3.5812 LearningRate 0.0210 Epoch: 15 Global Step: 164650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:28,492-Speed 5982.20 samples/sec Loss 3.6012 LearningRate 0.0210 Epoch: 15 Global Step: 164660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:35,357-Speed 5967.86 samples/sec Loss 3.5742 LearningRate 0.0209 Epoch: 15 Global Step: 164670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:42,215-Speed 5973.86 samples/sec Loss 3.5578 LearningRate 0.0209 Epoch: 15 Global Step: 164680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:49,068-Speed 5978.13 samples/sec Loss 3.5848 LearningRate 0.0209 Epoch: 15 Global Step: 164690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:32:55,943-Speed 5959.48 samples/sec 
Loss 3.6032 LearningRate 0.0209 Epoch: 15 Global Step: 164700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:02,801-Speed 5974.32 samples/sec Loss 3.6312 LearningRate 0.0209 Epoch: 15 Global Step: 164710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:09,672-Speed 5962.47 samples/sec Loss 3.6188 LearningRate 0.0209 Epoch: 15 Global Step: 164720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:16,518-Speed 5984.13 samples/sec Loss 3.5658 LearningRate 0.0209 Epoch: 15 Global Step: 164730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:23,394-Speed 5958.00 samples/sec Loss 3.5682 LearningRate 0.0209 Epoch: 15 Global Step: 164740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:30,261-Speed 5966.34 samples/sec Loss 3.5869 LearningRate 0.0209 Epoch: 15 Global Step: 164750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:37,104-Speed 5986.47 samples/sec Loss 3.5700 LearningRate 0.0209 Epoch: 15 Global Step: 164760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:43,950-Speed 5984.03 samples/sec Loss 3.6182 LearningRate 0.0208 Epoch: 15 Global Step: 164770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:50,809-Speed 5973.45 samples/sec Loss 3.5647 LearningRate 0.0208 Epoch: 15 Global Step: 164780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:33:57,663-Speed 5976.38 samples/sec Loss 3.5739 LearningRate 0.0208 Epoch: 15 Global Step: 164790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:34:04,583-Speed 5920.54 samples/sec Loss 3.5784 LearningRate 0.0208 Epoch: 15 Global Step: 164800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:34:11,515-Speed 5911.77 samples/sec Loss 3.5702 LearningRate 0.0208 Epoch: 15 Global Step: 164810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:34:18,426-Speed 5928.61 samples/sec Loss 3.5348 LearningRate 0.0208 Epoch: 15 
Global Step: 164820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:34:25,295-Speed 5963.82 samples/sec Loss 3.5490 LearningRate 0.0208 Epoch: 15 Global Step: 164830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:34:32,155-Speed 5972.37 samples/sec Loss 3.5473 LearningRate 0.0208 Epoch: 15 Global Step: 164840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:34:39,033-Speed 5956.22 samples/sec Loss 3.6038 LearningRate 0.0208 Epoch: 15 Global Step: 164850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:34:45,887-Speed 5977.01 samples/sec Loss 3.6348 LearningRate 0.0208 Epoch: 15 Global Step: 164860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:34:52,749-Speed 5970.66 samples/sec Loss 3.5740 LearningRate 0.0208 Epoch: 15 Global Step: 164870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:34:59,602-Speed 5977.79 samples/sec Loss 3.6094 LearningRate 0.0207 Epoch: 15 Global Step: 164880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:35:06,453-Speed 5979.49 samples/sec Loss 3.5607 LearningRate 0.0207 Epoch: 15 Global Step: 164890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:35:13,303-Speed 5980.99 samples/sec Loss 3.6373 LearningRate 0.0207 Epoch: 15 Global Step: 164900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:35:21,485-Speed 5007.12 samples/sec Loss 3.5996 LearningRate 0.0207 Epoch: 15 Global Step: 164910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:35:28,327-Speed 5987.76 samples/sec Loss 3.6167 LearningRate 0.0207 Epoch: 15 Global Step: 164920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:35:35,197-Speed 5963.24 samples/sec Loss 3.5188 LearningRate 0.0207 Epoch: 15 Global Step: 164930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:35:42,049-Speed 5979.83 samples/sec Loss 3.5551 LearningRate 0.0207 Epoch: 15 Global Step: 164940 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-01-09 04:35:48,910-Speed 5970.72 samples/sec Loss 3.5811 LearningRate 0.0207 Epoch: 15 Global Step: 164950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:35:55,768-Speed 5974.59 samples/sec Loss 3.5488 LearningRate 0.0207 Epoch: 15 Global Step: 164960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:36:02,668-Speed 5937.22 samples/sec Loss 3.6046 LearningRate 0.0207 Epoch: 15 Global Step: 164970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:36:09,554-Speed 5949.76 samples/sec Loss 3.6071 LearningRate 0.0206 Epoch: 15 Global Step: 164980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:36:16,442-Speed 5950.99 samples/sec Loss 3.5620 LearningRate 0.0206 Epoch: 15 Global Step: 164990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:36:23,293-Speed 5979.89 samples/sec Loss 3.5998 LearningRate 0.0206 Epoch: 15 Global Step: 165000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:36:50,204-[lfw][165000]XNorm: 23.243098 Training: 2022-01-09 04:36:50,204-[lfw][165000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-01-09 04:36:50,205-[lfw][165000]Accuracy-Highest: 0.99817 Training: 2022-01-09 04:37:21,341-[cfp_fp][165000]XNorm: 20.781484 Training: 2022-01-09 04:37:21,342-[cfp_fp][165000]Accuracy-Flip: 0.98814+-0.00438 Training: 2022-01-09 04:37:21,343-[cfp_fp][165000]Accuracy-Highest: 0.98929 Training: 2022-01-09 04:37:47,972-[agedb_30][165000]XNorm: 22.635031 Training: 2022-01-09 04:37:47,973-[agedb_30][165000]Accuracy-Flip: 0.98067+-0.00633 Training: 2022-01-09 04:37:47,973-[agedb_30][165000]Accuracy-Highest: 0.98067 Training: 2022-01-09 04:37:54,823-Speed 447.51 samples/sec Loss 3.5845 LearningRate 0.0206 Epoch: 15 Global Step: 165010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:38:01,653-Speed 5998.64 samples/sec Loss 3.5696 LearningRate 0.0206 Epoch: 15 Global Step: 165020 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-01-09 04:38:08,488-Speed 5993.94 samples/sec Loss 3.5748 LearningRate 0.0206 Epoch: 15 Global Step: 165030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:38:15,329-Speed 5989.68 samples/sec Loss 3.5559 LearningRate 0.0206 Epoch: 15 Global Step: 165040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:38:22,201-Speed 5960.81 samples/sec Loss 3.5500 LearningRate 0.0206 Epoch: 15 Global Step: 165050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:38:29,075-Speed 5960.04 samples/sec Loss 3.6008 LearningRate 0.0206 Epoch: 15 Global Step: 165060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:38:35,948-Speed 5961.92 samples/sec Loss 3.6060 LearningRate 0.0206 Epoch: 15 Global Step: 165070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:38:42,827-Speed 5956.01 samples/sec Loss 3.5432 LearningRate 0.0205 Epoch: 15 Global Step: 165080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:38:49,749-Speed 5918.96 samples/sec Loss 3.5809 LearningRate 0.0205 Epoch: 15 Global Step: 165090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:38:56,655-Speed 5934.29 samples/sec Loss 3.5925 LearningRate 0.0205 Epoch: 15 Global Step: 165100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:39:03,528-Speed 5960.96 samples/sec Loss 3.5392 LearningRate 0.0205 Epoch: 15 Global Step: 165110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:39:10,406-Speed 5955.65 samples/sec Loss 3.5879 LearningRate 0.0205 Epoch: 15 Global Step: 165120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:39:17,272-Speed 5966.97 samples/sec Loss 3.5372 LearningRate 0.0205 Epoch: 15 Global Step: 165130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:39:24,164-Speed 5944.82 samples/sec Loss 3.5826 LearningRate 0.0205 Epoch: 15 Global Step: 165140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 
04:39:31,020-Speed 5975.56 samples/sec Loss 3.5842 LearningRate 0.0205 Epoch: 15 Global Step: 165150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:39:37,881-Speed 5971.62 samples/sec Loss 3.5669 LearningRate 0.0205 Epoch: 15 Global Step: 165160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:39:44,733-Speed 5978.78 samples/sec Loss 3.5781 LearningRate 0.0205 Epoch: 15 Global Step: 165170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:39:51,565-Speed 5997.09 samples/sec Loss 3.5359 LearningRate 0.0204 Epoch: 15 Global Step: 165180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:39:58,427-Speed 5969.85 samples/sec Loss 3.5923 LearningRate 0.0204 Epoch: 15 Global Step: 165190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:40:05,286-Speed 5972.79 samples/sec Loss 3.5525 LearningRate 0.0204 Epoch: 15 Global Step: 165200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:40:12,166-Speed 5955.33 samples/sec Loss 3.5863 LearningRate 0.0204 Epoch: 15 Global Step: 165210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:40:19,039-Speed 5961.11 samples/sec Loss 3.5908 LearningRate 0.0204 Epoch: 15 Global Step: 165220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:40:25,889-Speed 5979.81 samples/sec Loss 3.5676 LearningRate 0.0204 Epoch: 15 Global Step: 165230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:40:32,739-Speed 5981.14 samples/sec Loss 3.5590 LearningRate 0.0204 Epoch: 15 Global Step: 165240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:40:39,585-Speed 5984.76 samples/sec Loss 3.5215 LearningRate 0.0204 Epoch: 15 Global Step: 165250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:40:46,432-Speed 5982.85 samples/sec Loss 3.6024 LearningRate 0.0204 Epoch: 15 Global Step: 165260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:40:53,301-Speed 5963.84 samples/sec Loss 
3.5677 LearningRate 0.0204 Epoch: 15 Global Step: 165270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:41:00,147-Speed 5984.58 samples/sec Loss 3.5420 LearningRate 0.0204 Epoch: 15 Global Step: 165280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:41:06,983-Speed 5992.70 samples/sec Loss 3.4933 LearningRate 0.0203 Epoch: 15 Global Step: 165290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:41:13,840-Speed 5973.86 samples/sec Loss 3.5449 LearningRate 0.0203 Epoch: 15 Global Step: 165300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:41:20,694-Speed 5979.07 samples/sec Loss 3.5514 LearningRate 0.0203 Epoch: 15 Global Step: 165310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:41:27,568-Speed 5959.34 samples/sec Loss 3.4881 LearningRate 0.0203 Epoch: 15 Global Step: 165320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:41:34,455-Speed 5949.40 samples/sec Loss 3.5767 LearningRate 0.0203 Epoch: 15 Global Step: 165330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:41:41,310-Speed 5975.94 samples/sec Loss 3.5540 LearningRate 0.0203 Epoch: 15 Global Step: 165340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:41:48,164-Speed 5977.64 samples/sec Loss 3.5344 LearningRate 0.0203 Epoch: 15 Global Step: 165350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:41:55,021-Speed 5980.60 samples/sec Loss 3.5952 LearningRate 0.0203 Epoch: 15 Global Step: 165360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:42:01,875-Speed 5977.36 samples/sec Loss 3.5400 LearningRate 0.0203 Epoch: 15 Global Step: 165370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:42:08,758-Speed 5952.48 samples/sec Loss 3.5378 LearningRate 0.0203 Epoch: 15 Global Step: 165380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:42:15,641-Speed 5952.14 samples/sec Loss 3.5642 LearningRate 0.0202 Epoch: 15 Global 
Step: 165390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:42:22,511-Speed 5963.77 samples/sec Loss 3.5674 LearningRate 0.0202 Epoch: 15 Global Step: 165400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:42:29,396-Speed 5949.52 samples/sec Loss 3.5204 LearningRate 0.0202 Epoch: 15 Global Step: 165410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:42:36,256-Speed 5972.42 samples/sec Loss 3.5031 LearningRate 0.0202 Epoch: 15 Global Step: 165420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:42:43,108-Speed 5980.34 samples/sec Loss 3.5663 LearningRate 0.0202 Epoch: 15 Global Step: 165430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:42:49,973-Speed 5967.23 samples/sec Loss 3.5433 LearningRate 0.0202 Epoch: 15 Global Step: 165440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:42:56,844-Speed 5962.44 samples/sec Loss 3.5559 LearningRate 0.0202 Epoch: 15 Global Step: 165450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:43:03,700-Speed 5977.21 samples/sec Loss 3.5083 LearningRate 0.0202 Epoch: 15 Global Step: 165460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:43:10,559-Speed 5972.77 samples/sec Loss 3.5872 LearningRate 0.0202 Epoch: 15 Global Step: 165470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:43:17,421-Speed 5969.99 samples/sec Loss 3.5319 LearningRate 0.0202 Epoch: 15 Global Step: 165480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:43:24,344-Speed 5918.52 samples/sec Loss 3.5010 LearningRate 0.0201 Epoch: 15 Global Step: 165490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:43:31,218-Speed 5959.62 samples/sec Loss 3.5500 LearningRate 0.0201 Epoch: 15 Global Step: 165500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:43:38,082-Speed 5968.70 samples/sec Loss 3.5366 LearningRate 0.0201 Epoch: 15 Global Step: 165510 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-09 04:43:44,961-Speed 5955.78 samples/sec Loss 3.5532 LearningRate 0.0201 Epoch: 15 Global Step: 165520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:43:51,800-Speed 5990.46 samples/sec Loss 3.5329 LearningRate 0.0201 Epoch: 15 Global Step: 165530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:43:58,664-Speed 5967.89 samples/sec Loss 3.5174 LearningRate 0.0201 Epoch: 15 Global Step: 165540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:44:05,534-Speed 5963.69 samples/sec Loss 3.5272 LearningRate 0.0201 Epoch: 15 Global Step: 165550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:44:12,441-Speed 5931.84 samples/sec Loss 3.5131 LearningRate 0.0201 Epoch: 15 Global Step: 165560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:44:19,314-Speed 5960.75 samples/sec Loss 3.5901 LearningRate 0.0201 Epoch: 15 Global Step: 165570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:44:26,173-Speed 5974.81 samples/sec Loss 3.4886 LearningRate 0.0201 Epoch: 15 Global Step: 165580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:44:33,027-Speed 5976.99 samples/sec Loss 3.4889 LearningRate 0.0201 Epoch: 15 Global Step: 165590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:44:39,877-Speed 5980.42 samples/sec Loss 3.5189 LearningRate 0.0200 Epoch: 15 Global Step: 165600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:44:46,734-Speed 5976.06 samples/sec Loss 3.5160 LearningRate 0.0200 Epoch: 15 Global Step: 165610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:44:53,608-Speed 5959.16 samples/sec Loss 3.5084 LearningRate 0.0200 Epoch: 15 Global Step: 165620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:45:00,516-Speed 5932.84 samples/sec Loss 3.5455 LearningRate 0.0200 Epoch: 15 Global Step: 165630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 
04:45:10,672-Speed 4033.60 samples/sec Loss 3.5440 LearningRate 0.0200 Epoch: 15 Global Step: 165640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:45:17,537-Speed 5967.97 samples/sec Loss 3.5654 LearningRate 0.0200 Epoch: 15 Global Step: 165650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:45:24,385-Speed 5982.49 samples/sec Loss 3.5040 LearningRate 0.0200 Epoch: 15 Global Step: 165660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:45:31,235-Speed 5980.76 samples/sec Loss 3.5355 LearningRate 0.0200 Epoch: 15 Global Step: 165670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:45:38,129-Speed 5944.73 samples/sec Loss 3.5228 LearningRate 0.0200 Epoch: 15 Global Step: 165680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:45:44,985-Speed 5974.82 samples/sec Loss 3.5370 LearningRate 0.0200 Epoch: 15 Global Step: 165690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:45:51,837-Speed 5979.17 samples/sec Loss 3.5448 LearningRate 0.0199 Epoch: 15 Global Step: 165700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:45:58,700-Speed 5969.59 samples/sec Loss 3.5172 LearningRate 0.0199 Epoch: 15 Global Step: 165710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:46:05,556-Speed 5976.14 samples/sec Loss 3.4689 LearningRate 0.0199 Epoch: 15 Global Step: 165720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:46:12,402-Speed 5986.90 samples/sec Loss 3.5044 LearningRate 0.0199 Epoch: 15 Global Step: 165730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:46:19,272-Speed 5962.90 samples/sec Loss 3.5167 LearningRate 0.0199 Epoch: 15 Global Step: 165740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:46:26,156-Speed 5951.88 samples/sec Loss 3.5482 LearningRate 0.0199 Epoch: 15 Global Step: 165750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:46:33,067-Speed 5928.09 samples/sec Loss 
3.5407 LearningRate 0.0199 Epoch: 15 Global Step: 165760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:46:39,943-Speed 5959.95 samples/sec Loss 3.4881 LearningRate 0.0199 Epoch: 15 Global Step: 165770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:46:46,800-Speed 5974.29 samples/sec Loss 3.5077 LearningRate 0.0199 Epoch: 15 Global Step: 165780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:46:53,657-Speed 5974.86 samples/sec Loss 3.5897 LearningRate 0.0199 Epoch: 15 Global Step: 165790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:47:00,509-Speed 5979.44 samples/sec Loss 3.5295 LearningRate 0.0199 Epoch: 15 Global Step: 165800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:47:07,369-Speed 5971.95 samples/sec Loss 3.5433 LearningRate 0.0198 Epoch: 15 Global Step: 165810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:47:14,300-Speed 5911.22 samples/sec Loss 3.5124 LearningRate 0.0198 Epoch: 15 Global Step: 165820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:47:21,254-Speed 5891.52 samples/sec Loss 3.5613 LearningRate 0.0198 Epoch: 15 Global Step: 165830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:47:28,230-Speed 5871.92 samples/sec Loss 3.5365 LearningRate 0.0198 Epoch: 15 Global Step: 165840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:47:35,189-Speed 5889.68 samples/sec Loss 3.4921 LearningRate 0.0198 Epoch: 15 Global Step: 165850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:47:42,074-Speed 5949.84 samples/sec Loss 3.5365 LearningRate 0.0198 Epoch: 15 Global Step: 165860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:47:48,949-Speed 5958.81 samples/sec Loss 3.5262 LearningRate 0.0198 Epoch: 15 Global Step: 165870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:47:55,809-Speed 5972.44 samples/sec Loss 3.5626 LearningRate 0.0198 Epoch: 15 Global 
Step: 165880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:48:02,679-Speed 5963.04 samples/sec Loss 3.5363 LearningRate 0.0198 Epoch: 15 Global Step: 165890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:48:09,559-Speed 5954.44 samples/sec Loss 3.5374 LearningRate 0.0198 Epoch: 15 Global Step: 165900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:48:16,416-Speed 5974.96 samples/sec Loss 3.5355 LearningRate 0.0197 Epoch: 15 Global Step: 165910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:48:44,501-Speed 1458.61 samples/sec Loss 3.5368 LearningRate 0.0197 Epoch: 16 Global Step: 165920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:48:51,356-Speed 5976.53 samples/sec Loss 3.5072 LearningRate 0.0197 Epoch: 16 Global Step: 165930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:48:58,176-Speed 6007.37 samples/sec Loss 3.5349 LearningRate 0.0197 Epoch: 16 Global Step: 165940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:49:05,009-Speed 5995.56 samples/sec Loss 3.5315 LearningRate 0.0197 Epoch: 16 Global Step: 165950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:49:11,909-Speed 5937.56 samples/sec Loss 3.5290 LearningRate 0.0197 Epoch: 16 Global Step: 165960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:49:18,847-Speed 5905.59 samples/sec Loss 3.4958 LearningRate 0.0197 Epoch: 16 Global Step: 165970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:49:25,713-Speed 5966.62 samples/sec Loss 3.5159 LearningRate 0.0197 Epoch: 16 Global Step: 165980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:49:32,578-Speed 5967.86 samples/sec Loss 3.5275 LearningRate 0.0197 Epoch: 16 Global Step: 165990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:49:39,448-Speed 5962.99 samples/sec Loss 3.4770 LearningRate 0.0197 Epoch: 16 Global Step: 166000 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-01-09 04:49:46,331-Speed 5952.32 samples/sec Loss 3.5326 LearningRate 0.0197 Epoch: 16 Global Step: 166010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:49:53,204-Speed 5960.35 samples/sec Loss 3.5053 LearningRate 0.0196 Epoch: 16 Global Step: 166020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:50:00,096-Speed 5944.70 samples/sec Loss 3.4435 LearningRate 0.0196 Epoch: 16 Global Step: 166030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:50:06,991-Speed 5940.56 samples/sec Loss 3.4806 LearningRate 0.0196 Epoch: 16 Global Step: 166040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:50:13,890-Speed 5938.69 samples/sec Loss 3.4773 LearningRate 0.0196 Epoch: 16 Global Step: 166050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:50:20,776-Speed 5949.72 samples/sec Loss 3.4763 LearningRate 0.0196 Epoch: 16 Global Step: 166060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:50:27,642-Speed 5966.16 samples/sec Loss 3.5037 LearningRate 0.0196 Epoch: 16 Global Step: 166070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:50:34,527-Speed 5950.59 samples/sec Loss 3.4470 LearningRate 0.0196 Epoch: 16 Global Step: 166080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:50:41,402-Speed 5959.51 samples/sec Loss 3.4985 LearningRate 0.0196 Epoch: 16 Global Step: 166090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:50:48,305-Speed 5934.29 samples/sec Loss 3.4637 LearningRate 0.0196 Epoch: 16 Global Step: 166100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:50:55,174-Speed 5964.44 samples/sec Loss 3.5081 LearningRate 0.0196 Epoch: 16 Global Step: 166110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:51:02,035-Speed 5971.71 samples/sec Loss 3.4840 LearningRate 0.0195 Epoch: 16 Global Step: 166120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
04:51:08,904-Speed 5963.49 samples/sec Loss 3.4840 LearningRate 0.0195 Epoch: 16 Global Step: 166130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:51:15,777-Speed 5962.23 samples/sec Loss 3.4876 LearningRate 0.0195 Epoch: 16 Global Step: 166140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:51:22,644-Speed 5966.09 samples/sec Loss 3.5338 LearningRate 0.0195 Epoch: 16 Global Step: 166150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:51:29,505-Speed 5971.23 samples/sec Loss 3.4706 LearningRate 0.0195 Epoch: 16 Global Step: 166160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:51:36,388-Speed 5951.95 samples/sec Loss 3.4916 LearningRate 0.0195 Epoch: 16 Global Step: 166170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:51:43,273-Speed 5950.48 samples/sec Loss 3.4557 LearningRate 0.0195 Epoch: 16 Global Step: 166180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:51:50,152-Speed 5955.17 samples/sec Loss 3.4983 LearningRate 0.0195 Epoch: 16 Global Step: 166190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:51:57,016-Speed 5969.11 samples/sec Loss 3.5054 LearningRate 0.0195 Epoch: 16 Global Step: 166200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:52:03,875-Speed 5973.12 samples/sec Loss 3.4857 LearningRate 0.0195 Epoch: 16 Global Step: 166210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:52:10,750-Speed 5959.06 samples/sec Loss 3.4445 LearningRate 0.0195 Epoch: 16 Global Step: 166220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:52:17,634-Speed 5951.28 samples/sec Loss 3.4845 LearningRate 0.0194 Epoch: 16 Global Step: 166230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:52:24,515-Speed 5953.62 samples/sec Loss 3.4816 LearningRate 0.0194 Epoch: 16 Global Step: 166240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:52:31,372-Speed 5974.43 samples/sec 
Loss 3.4976 LearningRate 0.0194 Epoch: 16 Global Step: 166250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:52:38,240-Speed 5964.35 samples/sec Loss 3.4651 LearningRate 0.0194 Epoch: 16 Global Step: 166260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:52:45,149-Speed 5929.78 samples/sec Loss 3.4984 LearningRate 0.0194 Epoch: 16 Global Step: 166270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:52:52,057-Speed 5930.33 samples/sec Loss 3.4824 LearningRate 0.0194 Epoch: 16 Global Step: 166280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:52:58,931-Speed 5960.68 samples/sec Loss 3.5114 LearningRate 0.0194 Epoch: 16 Global Step: 166290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:53:05,802-Speed 5962.63 samples/sec Loss 3.4907 LearningRate 0.0194 Epoch: 16 Global Step: 166300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:53:12,669-Speed 5965.71 samples/sec Loss 3.4969 LearningRate 0.0194 Epoch: 16 Global Step: 166310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:53:19,524-Speed 5975.94 samples/sec Loss 3.4818 LearningRate 0.0194 Epoch: 16 Global Step: 166320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:53:26,412-Speed 5948.17 samples/sec Loss 3.4440 LearningRate 0.0193 Epoch: 16 Global Step: 166330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:53:33,289-Speed 5957.20 samples/sec Loss 3.5228 LearningRate 0.0193 Epoch: 16 Global Step: 166340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:53:40,173-Speed 5951.65 samples/sec Loss 3.5358 LearningRate 0.0193 Epoch: 16 Global Step: 166350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:53:47,024-Speed 5980.50 samples/sec Loss 3.5287 LearningRate 0.0193 Epoch: 16 Global Step: 166360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:53:53,883-Speed 5972.17 samples/sec Loss 3.5011 LearningRate 0.0193 Epoch: 16 
Global Step: 166370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:00,749-Speed 5966.95 samples/sec Loss 3.4695 LearningRate 0.0193 Epoch: 16 Global Step: 166380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:07,613-Speed 5969.44 samples/sec Loss 3.4433 LearningRate 0.0193 Epoch: 16 Global Step: 166390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:14,490-Speed 5957.13 samples/sec Loss 3.5144 LearningRate 0.0193 Epoch: 16 Global Step: 166400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:21,379-Speed 5946.37 samples/sec Loss 3.4491 LearningRate 0.0193 Epoch: 16 Global Step: 166410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:28,235-Speed 5975.57 samples/sec Loss 3.4911 LearningRate 0.0193 Epoch: 16 Global Step: 166420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:35,108-Speed 5960.85 samples/sec Loss 3.4815 LearningRate 0.0193 Epoch: 16 Global Step: 166430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:41,963-Speed 5976.65 samples/sec Loss 3.4990 LearningRate 0.0192 Epoch: 16 Global Step: 166440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:48,819-Speed 5975.88 samples/sec Loss 3.4407 LearningRate 0.0192 Epoch: 16 Global Step: 166450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:54:55,678-Speed 5973.22 samples/sec Loss 3.4504 LearningRate 0.0192 Epoch: 16 Global Step: 166460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:55:02,561-Speed 5952.57 samples/sec Loss 3.4949 LearningRate 0.0192 Epoch: 16 Global Step: 166470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:55:09,506-Speed 5898.86 samples/sec Loss 3.4554 LearningRate 0.0192 Epoch: 16 Global Step: 166480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:55:16,379-Speed 5960.17 samples/sec Loss 3.4603 LearningRate 0.0192 Epoch: 16 Global Step: 166490 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-09 04:55:23,239-Speed 5971.93 samples/sec Loss 3.4411 LearningRate 0.0192 Epoch: 16 Global Step: 166500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:55:30,173-Speed 5908.89 samples/sec Loss 3.4776 LearningRate 0.0192 Epoch: 16 Global Step: 166510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:55:37,046-Speed 5959.90 samples/sec Loss 3.4564 LearningRate 0.0192 Epoch: 16 Global Step: 166520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:55:43,968-Speed 5918.71 samples/sec Loss 3.4567 LearningRate 0.0192 Epoch: 16 Global Step: 166530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:55:50,845-Speed 5957.08 samples/sec Loss 3.4423 LearningRate 0.0192 Epoch: 16 Global Step: 166540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:55:57,702-Speed 5975.71 samples/sec Loss 3.5075 LearningRate 0.0191 Epoch: 16 Global Step: 166550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:56:04,574-Speed 5961.55 samples/sec Loss 3.4639 LearningRate 0.0191 Epoch: 16 Global Step: 166560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:56:11,436-Speed 5970.20 samples/sec Loss 3.4934 LearningRate 0.0191 Epoch: 16 Global Step: 166570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:56:18,319-Speed 5951.98 samples/sec Loss 3.4650 LearningRate 0.0191 Epoch: 16 Global Step: 166580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:56:25,229-Speed 5929.81 samples/sec Loss 3.4615 LearningRate 0.0191 Epoch: 16 Global Step: 166590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:56:32,081-Speed 5979.63 samples/sec Loss 3.4876 LearningRate 0.0191 Epoch: 16 Global Step: 166600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:56:38,941-Speed 5971.95 samples/sec Loss 3.4546 LearningRate 0.0191 Epoch: 16 Global Step: 166610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
04:56:45,800-Speed 5974.25 samples/sec Loss 3.4649 LearningRate 0.0191 Epoch: 16 Global Step: 166620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:56:52,680-Speed 5954.90 samples/sec Loss 3.4177 LearningRate 0.0191 Epoch: 16 Global Step: 166630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:56:59,602-Speed 5918.05 samples/sec Loss 3.4897 LearningRate 0.0191 Epoch: 16 Global Step: 166640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:57:06,471-Speed 5964.53 samples/sec Loss 3.4361 LearningRate 0.0190 Epoch: 16 Global Step: 166650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:57:13,343-Speed 5961.72 samples/sec Loss 3.3965 LearningRate 0.0190 Epoch: 16 Global Step: 166660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:57:20,212-Speed 5963.98 samples/sec Loss 3.4748 LearningRate 0.0190 Epoch: 16 Global Step: 166670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:57:27,064-Speed 5978.71 samples/sec Loss 3.4390 LearningRate 0.0190 Epoch: 16 Global Step: 166680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 04:57:34,007-Speed 5900.58 samples/sec Loss 3.4554 LearningRate 0.0190 Epoch: 16 Global Step: 166690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:57:40,871-Speed 5968.28 samples/sec Loss 3.4737 LearningRate 0.0190 Epoch: 16 Global Step: 166700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 04:57:47,721-Speed 5981.42 samples/sec Loss 3.4386 LearningRate 0.0190 Epoch: 16 Global Step: 166710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:57:54,591-Speed 5963.34 samples/sec Loss 3.4264 LearningRate 0.0190 Epoch: 16 Global Step: 166720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:58:01,451-Speed 5971.70 samples/sec Loss 3.4640 LearningRate 0.0190 Epoch: 16 Global Step: 166730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:58:08,307-Speed 5975.85 samples/sec 
Loss 3.4492 LearningRate 0.0190 Epoch: 16 Global Step: 166740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:58:15,164-Speed 5974.15 samples/sec Loss 3.4233 LearningRate 0.0190 Epoch: 16 Global Step: 166750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:58:22,023-Speed 5972.80 samples/sec Loss 3.4697 LearningRate 0.0189 Epoch: 16 Global Step: 166760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:58:28,887-Speed 5970.30 samples/sec Loss 3.4567 LearningRate 0.0189 Epoch: 16 Global Step: 166770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:58:35,778-Speed 5945.63 samples/sec Loss 3.4406 LearningRate 0.0189 Epoch: 16 Global Step: 166780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:58:42,677-Speed 5937.74 samples/sec Loss 3.4203 LearningRate 0.0189 Epoch: 16 Global Step: 166790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:58:49,538-Speed 5972.85 samples/sec Loss 3.4523 LearningRate 0.0189 Epoch: 16 Global Step: 166800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:58:56,403-Speed 5967.97 samples/sec Loss 3.4262 LearningRate 0.0189 Epoch: 16 Global Step: 166810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:59:03,261-Speed 5973.42 samples/sec Loss 3.4765 LearningRate 0.0189 Epoch: 16 Global Step: 166820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:59:10,126-Speed 5968.37 samples/sec Loss 3.4282 LearningRate 0.0189 Epoch: 16 Global Step: 166830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:59:17,014-Speed 5948.06 samples/sec Loss 3.4815 LearningRate 0.0189 Epoch: 16 Global Step: 166840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:59:23,872-Speed 5973.49 samples/sec Loss 3.4524 LearningRate 0.0189 Epoch: 16 Global Step: 166850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:59:30,734-Speed 5970.75 samples/sec Loss 3.4219 LearningRate 0.0189 Epoch: 16 
Global Step: 166860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:59:37,595-Speed 5971.20 samples/sec Loss 3.4362 LearningRate 0.0188 Epoch: 16 Global Step: 166870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 04:59:44,460-Speed 5968.01 samples/sec Loss 3.4188 LearningRate 0.0188 Epoch: 16 Global Step: 166880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:59:51,320-Speed 5971.75 samples/sec Loss 3.4461 LearningRate 0.0188 Epoch: 16 Global Step: 166890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 04:59:58,182-Speed 5970.91 samples/sec Loss 3.4341 LearningRate 0.0188 Epoch: 16 Global Step: 166900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:00:05,057-Speed 5958.48 samples/sec Loss 3.4044 LearningRate 0.0188 Epoch: 16 Global Step: 166910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:00:11,930-Speed 5961.30 samples/sec Loss 3.4269 LearningRate 0.0188 Epoch: 16 Global Step: 166920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:00:18,788-Speed 5973.91 samples/sec Loss 3.3778 LearningRate 0.0188 Epoch: 16 Global Step: 166930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:00:25,640-Speed 5979.20 samples/sec Loss 3.4798 LearningRate 0.0188 Epoch: 16 Global Step: 166940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:00:32,522-Speed 5952.68 samples/sec Loss 3.4219 LearningRate 0.0188 Epoch: 16 Global Step: 166950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:00:39,470-Speed 5896.30 samples/sec Loss 3.4763 LearningRate 0.0188 Epoch: 16 Global Step: 166960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:00:46,356-Speed 5949.47 samples/sec Loss 3.4463 LearningRate 0.0188 Epoch: 16 Global Step: 166970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:00:53,211-Speed 5976.82 samples/sec Loss 3.4302 LearningRate 0.0187 Epoch: 16 Global Step: 166980 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-09 05:01:00,092-Speed 5954.23 samples/sec Loss 3.3958 LearningRate 0.0187 Epoch: 16 Global Step: 166990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:01:06,954-Speed 5970.28 samples/sec Loss 3.4348 LearningRate 0.0187 Epoch: 16 Global Step: 167000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:01:13,822-Speed 5964.57 samples/sec Loss 3.4605 LearningRate 0.0187 Epoch: 16 Global Step: 167010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:01:20,696-Speed 5967.71 samples/sec Loss 3.4249 LearningRate 0.0187 Epoch: 16 Global Step: 167020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:01:27,574-Speed 5956.66 samples/sec Loss 3.4043 LearningRate 0.0187 Epoch: 16 Global Step: 167030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:01:34,436-Speed 5970.24 samples/sec Loss 3.4195 LearningRate 0.0187 Epoch: 16 Global Step: 167040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:01:41,289-Speed 5978.20 samples/sec Loss 3.4066 LearningRate 0.0187 Epoch: 16 Global Step: 167050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:01:48,166-Speed 5957.01 samples/sec Loss 3.4771 LearningRate 0.0187 Epoch: 16 Global Step: 167060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:01:55,020-Speed 5977.78 samples/sec Loss 3.4226 LearningRate 0.0187 Epoch: 16 Global Step: 167070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:02:01,890-Speed 5964.55 samples/sec Loss 3.4314 LearningRate 0.0186 Epoch: 16 Global Step: 167080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:02:08,763-Speed 5960.11 samples/sec Loss 3.4559 LearningRate 0.0186 Epoch: 16 Global Step: 167090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:02:15,610-Speed 5983.23 samples/sec Loss 3.4168 LearningRate 0.0186 Epoch: 16 Global Step: 167100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
05:02:22,470-Speed 5972.29 samples/sec Loss 3.4099 LearningRate 0.0186 Epoch: 16 Global Step: 167110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:02:29,343-Speed 5959.50 samples/sec Loss 3.4199 LearningRate 0.0186 Epoch: 16 Global Step: 167120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:02:36,207-Speed 5968.91 samples/sec Loss 3.4188 LearningRate 0.0186 Epoch: 16 Global Step: 167130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:02:43,075-Speed 5965.65 samples/sec Loss 3.3593 LearningRate 0.0186 Epoch: 16 Global Step: 167140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:02:49,942-Speed 5965.61 samples/sec Loss 3.4443 LearningRate 0.0186 Epoch: 16 Global Step: 167150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:02:56,907-Speed 5885.31 samples/sec Loss 3.4487 LearningRate 0.0186 Epoch: 16 Global Step: 167160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:03,761-Speed 5977.19 samples/sec Loss 3.3726 LearningRate 0.0186 Epoch: 16 Global Step: 167170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:10,626-Speed 5967.82 samples/sec Loss 3.4249 LearningRate 0.0186 Epoch: 16 Global Step: 167180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:17,488-Speed 5969.76 samples/sec Loss 3.3976 LearningRate 0.0185 Epoch: 16 Global Step: 167190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:24,337-Speed 5982.36 samples/sec Loss 3.3986 LearningRate 0.0185 Epoch: 16 Global Step: 167200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:31,212-Speed 5958.27 samples/sec Loss 3.3942 LearningRate 0.0185 Epoch: 16 Global Step: 167210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:38,082-Speed 5963.32 samples/sec Loss 3.3762 LearningRate 0.0185 Epoch: 16 Global Step: 167220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:44,935-Speed 5977.98 samples/sec Loss 
3.4467 LearningRate 0.0185 Epoch: 16 Global Step: 167230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:51,810-Speed 5959.11 samples/sec Loss 3.4380 LearningRate 0.0185 Epoch: 16 Global Step: 167240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:03:58,677-Speed 5965.57 samples/sec Loss 3.3996 LearningRate 0.0185 Epoch: 16 Global Step: 167250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:04:05,543-Speed 5966.91 samples/sec Loss 3.3957 LearningRate 0.0185 Epoch: 16 Global Step: 167260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:04:12,433-Speed 5946.45 samples/sec Loss 3.4381 LearningRate 0.0185 Epoch: 16 Global Step: 167270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:04:19,293-Speed 5972.00 samples/sec Loss 3.4152 LearningRate 0.0185 Epoch: 16 Global Step: 167280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:04:26,159-Speed 5967.14 samples/sec Loss 3.4550 LearningRate 0.0185 Epoch: 16 Global Step: 167290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:04:33,035-Speed 5957.98 samples/sec Loss 3.3921 LearningRate 0.0184 Epoch: 16 Global Step: 167300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:04:39,913-Speed 5956.55 samples/sec Loss 3.4015 LearningRate 0.0184 Epoch: 16 Global Step: 167310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:04:46,762-Speed 5981.57 samples/sec Loss 3.3913 LearningRate 0.0184 Epoch: 16 Global Step: 167320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:04:53,608-Speed 5984.34 samples/sec Loss 3.3896 LearningRate 0.0184 Epoch: 16 Global Step: 167330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:05:00,457-Speed 5981.72 samples/sec Loss 3.4412 LearningRate 0.0184 Epoch: 16 Global Step: 167340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:05:07,360-Speed 5934.52 samples/sec Loss 3.4459 LearningRate 0.0184 Epoch: 16 Global 
Step: 167350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:05:14,202-Speed 5987.85 samples/sec Loss 3.3592 LearningRate 0.0184 Epoch: 16 Global Step: 167360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:05:21,053-Speed 5979.32 samples/sec Loss 3.3993 LearningRate 0.0184 Epoch: 16 Global Step: 167370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:05:27,907-Speed 5977.84 samples/sec Loss 3.3972 LearningRate 0.0184 Epoch: 16 Global Step: 167380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:05:34,775-Speed 5964.89 samples/sec Loss 3.3952 LearningRate 0.0184 Epoch: 16 Global Step: 167390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:05:41,636-Speed 5970.94 samples/sec Loss 3.4426 LearningRate 0.0184 Epoch: 16 Global Step: 167400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:05:48,520-Speed 5950.69 samples/sec Loss 3.4170 LearningRate 0.0183 Epoch: 16 Global Step: 167410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:05:55,366-Speed 5984.94 samples/sec Loss 3.4170 LearningRate 0.0183 Epoch: 16 Global Step: 167420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:06:02,265-Speed 5938.02 samples/sec Loss 3.4000 LearningRate 0.0183 Epoch: 16 Global Step: 167430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:06:09,165-Speed 5938.66 samples/sec Loss 3.3901 LearningRate 0.0183 Epoch: 16 Global Step: 167440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:06:16,023-Speed 5974.36 samples/sec Loss 3.3913 LearningRate 0.0183 Epoch: 16 Global Step: 167450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:06:22,902-Speed 5955.50 samples/sec Loss 3.3762 LearningRate 0.0183 Epoch: 16 Global Step: 167460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:06:29,779-Speed 5957.04 samples/sec Loss 3.4101 LearningRate 0.0183 Epoch: 16 Global Step: 167470 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-09 05:06:36,643-Speed 5968.05 samples/sec Loss 3.4307 LearningRate 0.0183 Epoch: 16 Global Step: 167480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:06:43,495-Speed 5980.37 samples/sec Loss 3.4272 LearningRate 0.0183 Epoch: 16 Global Step: 167490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:06:50,366-Speed 5962.58 samples/sec Loss 3.4275 LearningRate 0.0183 Epoch: 16 Global Step: 167500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:06:57,227-Speed 5970.50 samples/sec Loss 3.3935 LearningRate 0.0183 Epoch: 16 Global Step: 167510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:07:04,083-Speed 5976.17 samples/sec Loss 3.4010 LearningRate 0.0182 Epoch: 16 Global Step: 167520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:07:10,935-Speed 5978.65 samples/sec Loss 3.3423 LearningRate 0.0182 Epoch: 16 Global Step: 167530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:07:17,789-Speed 5977.11 samples/sec Loss 3.3960 LearningRate 0.0182 Epoch: 16 Global Step: 167540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:07:24,652-Speed 5970.05 samples/sec Loss 3.3925 LearningRate 0.0182 Epoch: 16 Global Step: 167550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:07:31,503-Speed 5980.48 samples/sec Loss 3.4217 LearningRate 0.0182 Epoch: 16 Global Step: 167560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:07:38,362-Speed 5972.64 samples/sec Loss 3.3929 LearningRate 0.0182 Epoch: 16 Global Step: 167570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:07:45,213-Speed 5980.06 samples/sec Loss 3.3999 LearningRate 0.0182 Epoch: 16 Global Step: 167580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:07:52,067-Speed 5977.01 samples/sec Loss 3.3786 LearningRate 0.0182 Epoch: 16 Global Step: 167590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 
05:07:58,918-Speed 5979.77 samples/sec Loss 3.3556 LearningRate 0.0182 Epoch: 16 Global Step: 167600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:08:05,780-Speed 5970.89 samples/sec Loss 3.3431 LearningRate 0.0182 Epoch: 16 Global Step: 167610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:08:12,739-Speed 5887.31 samples/sec Loss 3.3793 LearningRate 0.0182 Epoch: 16 Global Step: 167620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:08:19,639-Speed 5936.60 samples/sec Loss 3.3636 LearningRate 0.0181 Epoch: 16 Global Step: 167630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:08:26,556-Speed 5923.27 samples/sec Loss 3.3768 LearningRate 0.0181 Epoch: 16 Global Step: 167640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:08:33,512-Speed 5889.54 samples/sec Loss 3.3522 LearningRate 0.0181 Epoch: 16 Global Step: 167650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:08:40,432-Speed 5920.40 samples/sec Loss 3.3839 LearningRate 0.0181 Epoch: 16 Global Step: 167660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:08:47,439-Speed 5846.92 samples/sec Loss 3.3939 LearningRate 0.0181 Epoch: 16 Global Step: 167670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:08:54,305-Speed 5970.03 samples/sec Loss 3.3764 LearningRate 0.0181 Epoch: 16 Global Step: 167680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:09:01,211-Speed 5931.82 samples/sec Loss 3.4037 LearningRate 0.0181 Epoch: 16 Global Step: 167690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:09:08,110-Speed 5937.54 samples/sec Loss 3.3836 LearningRate 0.0181 Epoch: 16 Global Step: 167700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:09:14,963-Speed 5978.65 samples/sec Loss 3.3985 LearningRate 0.0181 Epoch: 16 Global Step: 167710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:09:21,813-Speed 5980.23 samples/sec Loss 
3.3539 LearningRate 0.0181 Epoch: 16 Global Step: 167720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:09:28,678-Speed 5968.16 samples/sec Loss 3.4336 LearningRate 0.0181 Epoch: 16 Global Step: 167730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:09:35,537-Speed 5972.38 samples/sec Loss 3.3489 LearningRate 0.0180 Epoch: 16 Global Step: 167740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:09:42,382-Speed 5985.21 samples/sec Loss 3.4011 LearningRate 0.0180 Epoch: 16 Global Step: 167750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:09:49,236-Speed 5978.98 samples/sec Loss 3.3530 LearningRate 0.0180 Epoch: 16 Global Step: 167760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:09:56,099-Speed 5969.13 samples/sec Loss 3.4045 LearningRate 0.0180 Epoch: 16 Global Step: 167770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:10:02,965-Speed 5967.03 samples/sec Loss 3.3730 LearningRate 0.0180 Epoch: 16 Global Step: 167780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:10:09,848-Speed 5952.02 samples/sec Loss 3.3866 LearningRate 0.0180 Epoch: 16 Global Step: 167790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:10:16,751-Speed 5934.43 samples/sec Loss 3.3564 LearningRate 0.0180 Epoch: 16 Global Step: 167800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:10:23,607-Speed 5975.86 samples/sec Loss 3.3512 LearningRate 0.0180 Epoch: 16 Global Step: 167810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:10:30,455-Speed 5982.02 samples/sec Loss 3.3632 LearningRate 0.0180 Epoch: 16 Global Step: 167820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:10:37,298-Speed 5986.96 samples/sec Loss 3.3705 LearningRate 0.0180 Epoch: 16 Global Step: 167830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:10:44,170-Speed 5961.62 samples/sec Loss 3.3908 LearningRate 0.0180 Epoch: 16 
Global Step: 167840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:10:51,022-Speed 5979.06 samples/sec Loss 3.3398 LearningRate 0.0179 Epoch: 16 Global Step: 167850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:10:57,867-Speed 5985.17 samples/sec Loss 3.3991 LearningRate 0.0179 Epoch: 16 Global Step: 167860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:11:04,729-Speed 5970.29 samples/sec Loss 3.3892 LearningRate 0.0179 Epoch: 16 Global Step: 167870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:11:11,620-Speed 5945.60 samples/sec Loss 3.3823 LearningRate 0.0179 Epoch: 16 Global Step: 167880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:11:18,496-Speed 5958.40 samples/sec Loss 3.3780 LearningRate 0.0179 Epoch: 16 Global Step: 167890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:11:25,350-Speed 5976.87 samples/sec Loss 3.3497 LearningRate 0.0179 Epoch: 16 Global Step: 167900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:11:32,228-Speed 5956.29 samples/sec Loss 3.3690 LearningRate 0.0179 Epoch: 16 Global Step: 167910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:11:39,118-Speed 5946.58 samples/sec Loss 3.3967 LearningRate 0.0179 Epoch: 16 Global Step: 167920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:11:45,976-Speed 5973.63 samples/sec Loss 3.3685 LearningRate 0.0179 Epoch: 16 Global Step: 167930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:11:52,842-Speed 5966.93 samples/sec Loss 3.3491 LearningRate 0.0179 Epoch: 16 Global Step: 167940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:11:59,682-Speed 5989.23 samples/sec Loss 3.3258 LearningRate 0.0179 Epoch: 16 Global Step: 167950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:12:06,554-Speed 5961.58 samples/sec Loss 3.3640 LearningRate 0.0178 Epoch: 16 Global Step: 167960 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-01-09 05:12:13,402-Speed 5982.35 samples/sec Loss 3.3517 LearningRate 0.0178 Epoch: 16 Global Step: 167970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:12:20,257-Speed 5977.08 samples/sec Loss 3.3309 LearningRate 0.0178 Epoch: 16 Global Step: 167980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:12:27,121-Speed 5968.64 samples/sec Loss 3.3382 LearningRate 0.0178 Epoch: 16 Global Step: 167990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:12:33,982-Speed 5971.37 samples/sec Loss 3.3461 LearningRate 0.0178 Epoch: 16 Global Step: 168000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:12:40,830-Speed 5982.06 samples/sec Loss 3.3611 LearningRate 0.0178 Epoch: 16 Global Step: 168010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:12:47,686-Speed 5975.09 samples/sec Loss 3.3931 LearningRate 0.0178 Epoch: 16 Global Step: 168020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:12:54,534-Speed 5984.13 samples/sec Loss 3.3156 LearningRate 0.0178 Epoch: 16 Global Step: 168030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:13:01,398-Speed 5968.99 samples/sec Loss 3.3848 LearningRate 0.0178 Epoch: 16 Global Step: 168040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:13:08,267-Speed 5964.06 samples/sec Loss 3.3221 LearningRate 0.0178 Epoch: 16 Global Step: 168050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:13:15,142-Speed 5959.65 samples/sec Loss 3.3302 LearningRate 0.0178 Epoch: 16 Global Step: 168060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:13:22,002-Speed 5971.64 samples/sec Loss 3.3504 LearningRate 0.0177 Epoch: 16 Global Step: 168070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:13:28,861-Speed 5972.33 samples/sec Loss 3.3996 LearningRate 0.0177 Epoch: 16 Global Step: 168080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
05:13:35,742-Speed 5955.70 samples/sec Loss 3.3290 LearningRate 0.0177 Epoch: 16 Global Step: 168090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:13:42,615-Speed 5960.72 samples/sec Loss 3.3021 LearningRate 0.0177 Epoch: 16 Global Step: 168100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:13:49,468-Speed 5977.98 samples/sec Loss 3.3578 LearningRate 0.0177 Epoch: 16 Global Step: 168110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:13:56,328-Speed 5972.31 samples/sec Loss 3.3658 LearningRate 0.0177 Epoch: 16 Global Step: 168120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:14:03,186-Speed 5974.04 samples/sec Loss 3.3491 LearningRate 0.0177 Epoch: 16 Global Step: 168130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:14:10,061-Speed 5959.06 samples/sec Loss 3.3717 LearningRate 0.0177 Epoch: 16 Global Step: 168140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:14:16,914-Speed 5977.67 samples/sec Loss 3.3215 LearningRate 0.0177 Epoch: 16 Global Step: 168150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:14:23,800-Speed 5950.68 samples/sec Loss 3.3160 LearningRate 0.0177 Epoch: 16 Global Step: 168160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:14:30,648-Speed 5981.78 samples/sec Loss 3.3660 LearningRate 0.0177 Epoch: 16 Global Step: 168170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:14:37,521-Speed 5961.24 samples/sec Loss 3.3258 LearningRate 0.0176 Epoch: 16 Global Step: 168180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:14:44,429-Speed 5931.15 samples/sec Loss 3.3595 LearningRate 0.0176 Epoch: 16 Global Step: 168190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:14:51,294-Speed 5966.82 samples/sec Loss 3.3512 LearningRate 0.0176 Epoch: 16 Global Step: 168200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:14:58,158-Speed 5968.54 samples/sec Loss 
3.3589 LearningRate 0.0176 Epoch: 16 Global Step: 168210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:15:05,051-Speed 5944.43 samples/sec Loss 3.3613 LearningRate 0.0176 Epoch: 16 Global Step: 168220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:15:11,999-Speed 5896.25 samples/sec Loss 3.3341 LearningRate 0.0176 Epoch: 16 Global Step: 168230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:15:18,863-Speed 5969.01 samples/sec Loss 3.3246 LearningRate 0.0176 Epoch: 16 Global Step: 168240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:15:25,735-Speed 5962.21 samples/sec Loss 3.3425 LearningRate 0.0176 Epoch: 16 Global Step: 168250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:15:32,618-Speed 5951.87 samples/sec Loss 3.3147 LearningRate 0.0176 Epoch: 16 Global Step: 168260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:15:39,488-Speed 5964.29 samples/sec Loss 3.3423 LearningRate 0.0176 Epoch: 16 Global Step: 168270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:15:46,366-Speed 5956.30 samples/sec Loss 3.3648 LearningRate 0.0176 Epoch: 16 Global Step: 168280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:15:53,245-Speed 5954.98 samples/sec Loss 3.3453 LearningRate 0.0175 Epoch: 16 Global Step: 168290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:16:00,110-Speed 5971.67 samples/sec Loss 3.3315 LearningRate 0.0175 Epoch: 16 Global Step: 168300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:16:06,988-Speed 5957.37 samples/sec Loss 3.3312 LearningRate 0.0175 Epoch: 16 Global Step: 168310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:16:13,861-Speed 5960.34 samples/sec Loss 3.3687 LearningRate 0.0175 Epoch: 16 Global Step: 168320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:16:20,708-Speed 5984.04 samples/sec Loss 3.3438 LearningRate 0.0175 Epoch: 16 Global 
Step: 168330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:16:27,582-Speed 5959.05 samples/sec Loss 3.3105 LearningRate 0.0175 Epoch: 16 Global Step: 168340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:16:34,428-Speed 5984.78 samples/sec Loss 3.3243 LearningRate 0.0175 Epoch: 16 Global Step: 168350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:16:41,296-Speed 5965.20 samples/sec Loss 3.3698 LearningRate 0.0175 Epoch: 16 Global Step: 168360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:16:48,143-Speed 5984.26 samples/sec Loss 3.3658 LearningRate 0.0175 Epoch: 16 Global Step: 168370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:16:55,011-Speed 5965.12 samples/sec Loss 3.3483 LearningRate 0.0175 Epoch: 16 Global Step: 168380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:17:01,873-Speed 5972.07 samples/sec Loss 3.3383 LearningRate 0.0175 Epoch: 16 Global Step: 168390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:17:08,717-Speed 5985.64 samples/sec Loss 3.3797 LearningRate 0.0174 Epoch: 16 Global Step: 168400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:17:15,563-Speed 5983.95 samples/sec Loss 3.3543 LearningRate 0.0174 Epoch: 16 Global Step: 168410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:17:22,435-Speed 5961.87 samples/sec Loss 3.2987 LearningRate 0.0174 Epoch: 16 Global Step: 168420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:17:29,299-Speed 5968.85 samples/sec Loss 3.3149 LearningRate 0.0174 Epoch: 16 Global Step: 168430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:17:36,147-Speed 5982.00 samples/sec Loss 3.3281 LearningRate 0.0174 Epoch: 16 Global Step: 168440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:17:42,992-Speed 5985.11 samples/sec Loss 3.3027 LearningRate 0.0174 Epoch: 16 Global Step: 168450 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-09 05:17:49,862-Speed 5963.84 samples/sec Loss 3.3070 LearningRate 0.0174 Epoch: 16 Global Step: 168460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:17:56,708-Speed 5984.17 samples/sec Loss 3.3134 LearningRate 0.0174 Epoch: 16 Global Step: 168470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:18:03,646-Speed 5904.78 samples/sec Loss 3.3066 LearningRate 0.0174 Epoch: 16 Global Step: 168480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:18:10,502-Speed 5974.81 samples/sec Loss 3.3576 LearningRate 0.0174 Epoch: 16 Global Step: 168490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:18:17,345-Speed 5986.88 samples/sec Loss 3.3806 LearningRate 0.0174 Epoch: 16 Global Step: 168500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:18:24,195-Speed 5980.52 samples/sec Loss 3.3587 LearningRate 0.0173 Epoch: 16 Global Step: 168510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:18:31,040-Speed 5984.84 samples/sec Loss 3.3022 LearningRate 0.0173 Epoch: 16 Global Step: 168520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:18:37,903-Speed 5969.91 samples/sec Loss 3.2899 LearningRate 0.0173 Epoch: 16 Global Step: 168530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:18:44,890-Speed 5863.18 samples/sec Loss 3.3750 LearningRate 0.0173 Epoch: 16 Global Step: 168540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:18:51,741-Speed 5979.94 samples/sec Loss 3.3147 LearningRate 0.0173 Epoch: 16 Global Step: 168550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:18:58,578-Speed 5991.95 samples/sec Loss 3.3295 LearningRate 0.0173 Epoch: 16 Global Step: 168560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:19:05,455-Speed 5957.79 samples/sec Loss 3.2963 LearningRate 0.0173 Epoch: 16 Global Step: 168570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
05:19:12,302-Speed 5983.27 samples/sec Loss 3.2869 LearningRate 0.0173 Epoch: 16 Global Step: 168580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:19:19,151-Speed 5981.76 samples/sec Loss 3.2720 LearningRate 0.0173 Epoch: 16 Global Step: 168590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:19:26,012-Speed 5971.19 samples/sec Loss 3.3528 LearningRate 0.0173 Epoch: 16 Global Step: 168600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:19:32,871-Speed 5972.67 samples/sec Loss 3.3331 LearningRate 0.0173 Epoch: 16 Global Step: 168610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:19:39,729-Speed 5974.09 samples/sec Loss 3.3298 LearningRate 0.0173 Epoch: 16 Global Step: 168620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:19:46,595-Speed 5967.10 samples/sec Loss 3.3209 LearningRate 0.0172 Epoch: 16 Global Step: 168630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:19:53,456-Speed 5970.91 samples/sec Loss 3.3105 LearningRate 0.0172 Epoch: 16 Global Step: 168640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:00,315-Speed 5972.68 samples/sec Loss 3.2893 LearningRate 0.0172 Epoch: 16 Global Step: 168650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:07,160-Speed 5985.67 samples/sec Loss 3.2986 LearningRate 0.0172 Epoch: 16 Global Step: 168660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:14,016-Speed 5975.22 samples/sec Loss 3.3038 LearningRate 0.0172 Epoch: 16 Global Step: 168670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:20,892-Speed 5958.23 samples/sec Loss 3.3114 LearningRate 0.0172 Epoch: 16 Global Step: 168680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:27,734-Speed 5987.13 samples/sec Loss 3.3261 LearningRate 0.0172 Epoch: 16 Global Step: 168690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:34,596-Speed 5970.36 samples/sec Loss 
3.3019 LearningRate 0.0172 Epoch: 16 Global Step: 168700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:41,455-Speed 5972.81 samples/sec Loss 3.2894 LearningRate 0.0172 Epoch: 16 Global Step: 168710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:48,314-Speed 5972.73 samples/sec Loss 3.3035 LearningRate 0.0172 Epoch: 16 Global Step: 168720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:20:55,188-Speed 5959.48 samples/sec Loss 3.2959 LearningRate 0.0172 Epoch: 16 Global Step: 168730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:21:02,041-Speed 5978.14 samples/sec Loss 3.3372 LearningRate 0.0171 Epoch: 16 Global Step: 168740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:21:08,905-Speed 5968.21 samples/sec Loss 3.3358 LearningRate 0.0171 Epoch: 16 Global Step: 168750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:21:15,759-Speed 5977.74 samples/sec Loss 3.3318 LearningRate 0.0171 Epoch: 16 Global Step: 168760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:21:22,607-Speed 5981.99 samples/sec Loss 3.3287 LearningRate 0.0171 Epoch: 16 Global Step: 168770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:21:29,487-Speed 5954.97 samples/sec Loss 3.3485 LearningRate 0.0171 Epoch: 16 Global Step: 168780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:21:36,356-Speed 5964.51 samples/sec Loss 3.2984 LearningRate 0.0171 Epoch: 16 Global Step: 168790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:21:43,216-Speed 5972.27 samples/sec Loss 3.2926 LearningRate 0.0171 Epoch: 16 Global Step: 168800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:21:50,184-Speed 5879.16 samples/sec Loss 3.2740 LearningRate 0.0171 Epoch: 16 Global Step: 168810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:21:57,029-Speed 5985.38 samples/sec Loss 3.3012 LearningRate 0.0171 Epoch: 16 
Global Step: 168820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:22:03,888-Speed 5972.07 samples/sec Loss 3.3105 LearningRate 0.0171 Epoch: 16 Global Step: 168830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:22:10,733-Speed 5984.79 samples/sec Loss 3.3167 LearningRate 0.0171 Epoch: 16 Global Step: 168840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:22:17,604-Speed 5962.65 samples/sec Loss 3.3113 LearningRate 0.0170 Epoch: 16 Global Step: 168850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:22:24,469-Speed 5967.89 samples/sec Loss 3.3086 LearningRate 0.0170 Epoch: 16 Global Step: 168860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:22:31,327-Speed 5974.03 samples/sec Loss 3.2941 LearningRate 0.0170 Epoch: 16 Global Step: 168870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:22:38,182-Speed 5976.12 samples/sec Loss 3.2829 LearningRate 0.0170 Epoch: 16 Global Step: 168880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:22:45,066-Speed 5951.36 samples/sec Loss 3.2843 LearningRate 0.0170 Epoch: 16 Global Step: 168890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:22:51,930-Speed 5968.56 samples/sec Loss 3.3165 LearningRate 0.0170 Epoch: 16 Global Step: 168900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:22:58,796-Speed 5966.39 samples/sec Loss 3.2983 LearningRate 0.0170 Epoch: 16 Global Step: 168910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:23:05,687-Speed 5945.05 samples/sec Loss 3.3470 LearningRate 0.0170 Epoch: 16 Global Step: 168920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:23:12,554-Speed 5966.01 samples/sec Loss 3.3219 LearningRate 0.0170 Epoch: 16 Global Step: 168930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 05:23:19,399-Speed 5986.29 samples/sec Loss 3.3152 LearningRate 0.0170 Epoch: 16 Global Step: 168940 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-09 05:23:26,252-Speed 5977.22 samples/sec Loss 3.3094 LearningRate 0.0170 Epoch: 16 Global Step: 168950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:23:33,122-Speed 5965.16 samples/sec Loss 3.2982 LearningRate 0.0169 Epoch: 16 Global Step: 168960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:23:39,980-Speed 5974.47 samples/sec Loss 3.3018 LearningRate 0.0169 Epoch: 16 Global Step: 168970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:23:46,831-Speed 5979.95 samples/sec Loss 3.2591 LearningRate 0.0169 Epoch: 16 Global Step: 168980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:23:53,699-Speed 5965.62 samples/sec Loss 3.2740 LearningRate 0.0169 Epoch: 16 Global Step: 168990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:24:00,555-Speed 5976.13 samples/sec Loss 3.2701 LearningRate 0.0169 Epoch: 16 Global Step: 169000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:24:07,428-Speed 5959.85 samples/sec Loss 3.3040 LearningRate 0.0169 Epoch: 16 Global Step: 169010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:24:14,292-Speed 5968.89 samples/sec Loss 3.2644 LearningRate 0.0169 Epoch: 16 Global Step: 169020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:24:21,160-Speed 5965.14 samples/sec Loss 3.3070 LearningRate 0.0169 Epoch: 16 Global Step: 169030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:24:28,017-Speed 5974.02 samples/sec Loss 3.2762 LearningRate 0.0169 Epoch: 16 Global Step: 169040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 05:24:34,894-Speed 5957.19 samples/sec Loss 3.2770 LearningRate 0.0169 Epoch: 16 Global Step: 169050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:24:41,781-Speed 5948.39 samples/sec Loss 3.2544 LearningRate 0.0169 Epoch: 16 Global Step: 169060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
05:24:48,635-Speed 5977.15 samples/sec Loss 3.2630 LearningRate 0.0169 Epoch: 16 Global Step: 169070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:24:55,507-Speed 5962.33 samples/sec Loss 3.2767 LearningRate 0.0168 Epoch: 16 Global Step: 169080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:25:02,384-Speed 5957.57 samples/sec Loss 3.2769 LearningRate 0.0168 Epoch: 16 Global Step: 169090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:25:09,284-Speed 5936.58 samples/sec Loss 3.2649 LearningRate 0.0168 Epoch: 16 Global Step: 169100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:25:16,192-Speed 5930.82 samples/sec Loss 3.2433 LearningRate 0.0168 Epoch: 16 Global Step: 169110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:25:23,125-Speed 5909.34 samples/sec Loss 3.2807 LearningRate 0.0168 Epoch: 16 Global Step: 169120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 05:25:30,024-Speed 5937.77 samples/sec Loss 3.2812 LearningRate 0.0168 Epoch: 16 Global Step: 169130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:25:36,916-Speed 5944.01 samples/sec Loss 3.3044 LearningRate 0.0168 Epoch: 16 Global Step: 169140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:25:43,817-Speed 5936.76 samples/sec Loss 3.2666 LearningRate 0.0168 Epoch: 16 Global Step: 169150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:25:50,689-Speed 5961.60 samples/sec Loss 3.3305 LearningRate 0.0168 Epoch: 16 Global Step: 169160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:25:57,537-Speed 5982.70 samples/sec Loss 3.3176 LearningRate 0.0168 Epoch: 16 Global Step: 169170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:26:04,387-Speed 5980.56 samples/sec Loss 3.2633 LearningRate 0.0168 Epoch: 16 Global Step: 169180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:26:11,242-Speed 5976.02 samples/sec Loss 
3.2772 LearningRate 0.0167 Epoch: 16 Global Step: 169190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:26:18,100-Speed 5973.93 samples/sec Loss 3.2711 LearningRate 0.0167 Epoch: 16 Global Step: 169200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:26:24,999-Speed 5939.16 samples/sec Loss 3.2392 LearningRate 0.0167 Epoch: 16 Global Step: 169210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:26:31,899-Speed 5936.90 samples/sec Loss 3.2626 LearningRate 0.0167 Epoch: 16 Global Step: 169220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:26:38,751-Speed 5979.74 samples/sec Loss 3.3152 LearningRate 0.0167 Epoch: 16 Global Step: 169230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:26:45,597-Speed 5983.77 samples/sec Loss 3.2477 LearningRate 0.0167 Epoch: 16 Global Step: 169240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:26:52,456-Speed 5973.24 samples/sec Loss 3.2524 LearningRate 0.0167 Epoch: 16 Global Step: 169250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:26:59,312-Speed 5975.73 samples/sec Loss 3.2982 LearningRate 0.0167 Epoch: 16 Global Step: 169260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:27:06,175-Speed 5969.40 samples/sec Loss 3.2639 LearningRate 0.0167 Epoch: 16 Global Step: 169270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:27:13,042-Speed 5966.19 samples/sec Loss 3.2747 LearningRate 0.0167 Epoch: 16 Global Step: 169280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:27:19,908-Speed 5966.52 samples/sec Loss 3.2840 LearningRate 0.0167 Epoch: 16 Global Step: 169290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:27:26,761-Speed 5980.98 samples/sec Loss 3.2318 LearningRate 0.0167 Epoch: 16 Global Step: 169300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:27:33,633-Speed 5961.50 samples/sec Loss 3.2740 LearningRate 0.0166 Epoch: 16 Global 
Step: 169310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:27:40,517-Speed 5951.38 samples/sec Loss 3.2762 LearningRate 0.0166 Epoch: 16 Global Step: 169320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:27:47,385-Speed 5964.99 samples/sec Loss 3.2514 LearningRate 0.0166 Epoch: 16 Global Step: 169330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:27:54,225-Speed 5989.32 samples/sec Loss 3.2623 LearningRate 0.0166 Epoch: 16 Global Step: 169340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:01,077-Speed 5979.03 samples/sec Loss 3.2208 LearningRate 0.0166 Epoch: 16 Global Step: 169350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:07,951-Speed 5959.77 samples/sec Loss 3.2483 LearningRate 0.0166 Epoch: 16 Global Step: 169360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:14,808-Speed 5974.26 samples/sec Loss 3.2640 LearningRate 0.0166 Epoch: 16 Global Step: 169370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:21,659-Speed 5980.70 samples/sec Loss 3.2457 LearningRate 0.0166 Epoch: 16 Global Step: 169380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:28,515-Speed 5975.43 samples/sec Loss 3.2558 LearningRate 0.0166 Epoch: 16 Global Step: 169390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:35,400-Speed 5950.04 samples/sec Loss 3.2612 LearningRate 0.0166 Epoch: 16 Global Step: 169400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:42,253-Speed 5979.14 samples/sec Loss 3.2852 LearningRate 0.0166 Epoch: 16 Global Step: 169410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:49,114-Speed 5971.14 samples/sec Loss 3.2572 LearningRate 0.0165 Epoch: 16 Global Step: 169420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:28:55,986-Speed 5961.42 samples/sec Loss 3.2397 LearningRate 0.0165 Epoch: 16 Global Step: 169430 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-09 05:29:02,847-Speed 5970.81 samples/sec Loss 3.2831 LearningRate 0.0165 Epoch: 16 Global Step: 169440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 05:29:09,690-Speed 5988.94 samples/sec Loss 3.2911 LearningRate 0.0165 Epoch: 16 Global Step: 169450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:29:16,560-Speed 5963.12 samples/sec Loss 3.2610 LearningRate 0.0165 Epoch: 16 Global Step: 169460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:29:23,420-Speed 5972.44 samples/sec Loss 3.2463 LearningRate 0.0165 Epoch: 16 Global Step: 169470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:29:30,268-Speed 5982.25 samples/sec Loss 3.2565 LearningRate 0.0165 Epoch: 16 Global Step: 169480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:29:37,164-Speed 5940.39 samples/sec Loss 3.2343 LearningRate 0.0165 Epoch: 16 Global Step: 169490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:29:44,016-Speed 5978.76 samples/sec Loss 3.2448 LearningRate 0.0165 Epoch: 16 Global Step: 169500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:29:50,882-Speed 5967.17 samples/sec Loss 3.3016 LearningRate 0.0165 Epoch: 16 Global Step: 169510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:29:57,753-Speed 5962.19 samples/sec Loss 3.2844 LearningRate 0.0165 Epoch: 16 Global Step: 169520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:30:04,603-Speed 5980.65 samples/sec Loss 3.2646 LearningRate 0.0165 Epoch: 16 Global Step: 169530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:30:11,446-Speed 5987.34 samples/sec Loss 3.2422 LearningRate 0.0164 Epoch: 16 Global Step: 169540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:30:18,310-Speed 5967.82 samples/sec Loss 3.2522 LearningRate 0.0164 Epoch: 16 Global Step: 169550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 
05:30:25,175-Speed 5969.32 samples/sec Loss 3.2700 LearningRate 0.0164 Epoch: 16 Global Step: 169560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:30:32,032-Speed 5974.75 samples/sec Loss 3.2472 LearningRate 0.0164 Epoch: 16 Global Step: 169570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:30:38,877-Speed 5985.27 samples/sec Loss 3.2974 LearningRate 0.0164 Epoch: 16 Global Step: 169580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:30:45,734-Speed 5974.37 samples/sec Loss 3.2444 LearningRate 0.0164 Epoch: 16 Global Step: 169590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:30:52,590-Speed 5975.31 samples/sec Loss 3.2393 LearningRate 0.0164 Epoch: 16 Global Step: 169600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:30:59,465-Speed 5959.14 samples/sec Loss 3.3322 LearningRate 0.0164 Epoch: 16 Global Step: 169610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:31:06,312-Speed 5982.85 samples/sec Loss 3.2412 LearningRate 0.0164 Epoch: 16 Global Step: 169620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:31:13,162-Speed 5981.26 samples/sec Loss 3.2240 LearningRate 0.0164 Epoch: 16 Global Step: 169630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:31:20,020-Speed 5973.99 samples/sec Loss 3.2370 LearningRate 0.0164 Epoch: 16 Global Step: 169640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:31:26,885-Speed 5967.36 samples/sec Loss 3.2254 LearningRate 0.0163 Epoch: 16 Global Step: 169650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:31:33,734-Speed 5981.89 samples/sec Loss 3.2200 LearningRate 0.0163 Epoch: 16 Global Step: 169660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:31:40,585-Speed 5979.35 samples/sec Loss 3.2544 LearningRate 0.0163 Epoch: 16 Global Step: 169670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:31:47,464-Speed 5957.70 samples/sec Loss 
3.2473 LearningRate 0.0163 Epoch: 16 Global Step: 169680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:31:54,396-Speed 5909.85 samples/sec Loss 3.2436 LearningRate 0.0163 Epoch: 16 Global Step: 169690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:32:01,330-Speed 5908.32 samples/sec Loss 3.2502 LearningRate 0.0163 Epoch: 16 Global Step: 169700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:32:08,251-Speed 5919.55 samples/sec Loss 3.2031 LearningRate 0.0163 Epoch: 16 Global Step: 169710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:32:15,195-Speed 5899.31 samples/sec Loss 3.2334 LearningRate 0.0163 Epoch: 16 Global Step: 169720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:32:22,128-Speed 5909.03 samples/sec Loss 3.2088 LearningRate 0.0163 Epoch: 16 Global Step: 169730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:32:29,061-Speed 5910.00 samples/sec Loss 3.2519 LearningRate 0.0163 Epoch: 16 Global Step: 169740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:32:36,001-Speed 5903.05 samples/sec Loss 3.2193 LearningRate 0.0163 Epoch: 16 Global Step: 169750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:32:42,863-Speed 5969.52 samples/sec Loss 3.2149 LearningRate 0.0163 Epoch: 16 Global Step: 169760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:32:49,710-Speed 5983.40 samples/sec Loss 3.2302 LearningRate 0.0162 Epoch: 16 Global Step: 169770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:32:56,563-Speed 5978.25 samples/sec Loss 3.2444 LearningRate 0.0162 Epoch: 16 Global Step: 169780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:33:03,443-Speed 5954.61 samples/sec Loss 3.2446 LearningRate 0.0162 Epoch: 16 Global Step: 169790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:33:10,329-Speed 5949.46 samples/sec Loss 3.1972 LearningRate 0.0162 Epoch: 16 Global 
Step: 169800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:33:17,204-Speed 5959.56 samples/sec Loss 3.2564 LearningRate 0.0162 Epoch: 16 Global Step: 169810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:33:24,098-Speed 5941.68 samples/sec Loss 3.2152 LearningRate 0.0162 Epoch: 16 Global Step: 169820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:33:30,962-Speed 5968.99 samples/sec Loss 3.1643 LearningRate 0.0162 Epoch: 16 Global Step: 169830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:33:37,810-Speed 5983.08 samples/sec Loss 3.2632 LearningRate 0.0162 Epoch: 16 Global Step: 169840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:33:44,675-Speed 5966.60 samples/sec Loss 3.2216 LearningRate 0.0162 Epoch: 16 Global Step: 169850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:33:51,524-Speed 5981.70 samples/sec Loss 3.2372 LearningRate 0.0162 Epoch: 16 Global Step: 169860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:33:58,384-Speed 5972.28 samples/sec Loss 3.2117 LearningRate 0.0162 Epoch: 16 Global Step: 169870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:34:05,265-Speed 5954.00 samples/sec Loss 3.2107 LearningRate 0.0161 Epoch: 16 Global Step: 169880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:34:12,111-Speed 5984.05 samples/sec Loss 3.2214 LearningRate 0.0161 Epoch: 16 Global Step: 169890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:34:18,996-Speed 5950.93 samples/sec Loss 3.2092 LearningRate 0.0161 Epoch: 16 Global Step: 169900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:34:25,844-Speed 5982.80 samples/sec Loss 3.1975 LearningRate 0.0161 Epoch: 16 Global Step: 169910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:34:32,700-Speed 5974.74 samples/sec Loss 3.2490 LearningRate 0.0161 Epoch: 16 Global Step: 169920 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-09 05:34:39,572-Speed 5962.20 samples/sec Loss 3.2522 LearningRate 0.0161 Epoch: 16 Global Step: 169930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:34:46,445-Speed 5960.29 samples/sec Loss 3.2375 LearningRate 0.0161 Epoch: 16 Global Step: 169940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:34:53,343-Speed 5939.63 samples/sec Loss 3.2385 LearningRate 0.0161 Epoch: 16 Global Step: 169950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:35:00,239-Speed 5941.50 samples/sec Loss 3.2423 LearningRate 0.0161 Epoch: 16 Global Step: 169960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:35:07,099-Speed 5971.69 samples/sec Loss 3.2481 LearningRate 0.0161 Epoch: 16 Global Step: 169970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:35:13,953-Speed 5977.66 samples/sec Loss 3.2133 LearningRate 0.0161 Epoch: 16 Global Step: 169980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:35:20,815-Speed 5970.49 samples/sec Loss 3.2062 LearningRate 0.0161 Epoch: 16 Global Step: 169990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:35:27,709-Speed 5942.85 samples/sec Loss 3.2472 LearningRate 0.0160 Epoch: 16 Global Step: 170000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:35:54,537-[lfw][170000]XNorm: 23.536367 Training: 2022-01-09 05:35:54,538-[lfw][170000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-01-09 05:35:54,539-[lfw][170000]Accuracy-Highest: 0.99817 Training: 2022-01-09 05:36:25,456-[cfp_fp][170000]XNorm: 21.273448 Training: 2022-01-09 05:36:25,457-[cfp_fp][170000]Accuracy-Flip: 0.99071+-0.00363 Training: 2022-01-09 05:36:25,458-[cfp_fp][170000]Accuracy-Highest: 0.99071 Training: 2022-01-09 05:36:52,343-[agedb_30][170000]XNorm: 22.943632 Training: 2022-01-09 05:36:52,344-[agedb_30][170000]Accuracy-Flip: 0.97750+-0.00523 Training: 2022-01-09 05:36:52,344-[agedb_30][170000]Accuracy-Highest: 0.98067 
Training: 2022-01-09 05:36:59,223-Speed 447.60 samples/sec Loss 3.2569 LearningRate 0.0160 Epoch: 16 Global Step: 170010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:37:06,060-Speed 5992.13 samples/sec Loss 3.2460 LearningRate 0.0160 Epoch: 16 Global Step: 170020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:37:12,916-Speed 5974.96 samples/sec Loss 3.2212 LearningRate 0.0160 Epoch: 16 Global Step: 170030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:37:19,772-Speed 5975.81 samples/sec Loss 3.2334 LearningRate 0.0160 Epoch: 16 Global Step: 170040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:37:26,641-Speed 5964.28 samples/sec Loss 3.1970 LearningRate 0.0160 Epoch: 16 Global Step: 170050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:37:33,533-Speed 5944.60 samples/sec Loss 3.2156 LearningRate 0.0160 Epoch: 16 Global Step: 170060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:37:40,411-Speed 5956.03 samples/sec Loss 3.2145 LearningRate 0.0160 Epoch: 16 Global Step: 170070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:37:47,294-Speed 5952.10 samples/sec Loss 3.2137 LearningRate 0.0160 Epoch: 16 Global Step: 170080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:37:54,176-Speed 5954.03 samples/sec Loss 3.1833 LearningRate 0.0160 Epoch: 16 Global Step: 170090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:01,055-Speed 5955.59 samples/sec Loss 3.1531 LearningRate 0.0160 Epoch: 16 Global Step: 170100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:07,944-Speed 5946.55 samples/sec Loss 3.2260 LearningRate 0.0159 Epoch: 16 Global Step: 170110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:14,830-Speed 5949.86 samples/sec Loss 3.1974 LearningRate 0.0159 Epoch: 16 Global Step: 170120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:21,710-Speed 
5955.02 samples/sec Loss 3.2017 LearningRate 0.0159 Epoch: 16 Global Step: 170130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:28,571-Speed 5970.83 samples/sec Loss 3.2278 LearningRate 0.0159 Epoch: 16 Global Step: 170140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:35,423-Speed 5979.28 samples/sec Loss 3.1771 LearningRate 0.0159 Epoch: 16 Global Step: 170150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:42,286-Speed 5970.05 samples/sec Loss 3.2233 LearningRate 0.0159 Epoch: 16 Global Step: 170160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:49,170-Speed 5951.68 samples/sec Loss 3.1730 LearningRate 0.0159 Epoch: 16 Global Step: 170170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:38:56,048-Speed 5956.33 samples/sec Loss 3.2394 LearningRate 0.0159 Epoch: 16 Global Step: 170180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:39:02,915-Speed 5966.44 samples/sec Loss 3.1952 LearningRate 0.0159 Epoch: 16 Global Step: 170190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:39:09,778-Speed 5968.64 samples/sec Loss 3.2629 LearningRate 0.0159 Epoch: 16 Global Step: 170200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:39:16,635-Speed 5975.11 samples/sec Loss 3.2175 LearningRate 0.0159 Epoch: 16 Global Step: 170210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:39:23,495-Speed 5974.63 samples/sec Loss 3.2159 LearningRate 0.0159 Epoch: 16 Global Step: 170220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:39:30,370-Speed 5958.63 samples/sec Loss 3.1773 LearningRate 0.0158 Epoch: 16 Global Step: 170230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:39:37,233-Speed 5969.33 samples/sec Loss 3.1832 LearningRate 0.0158 Epoch: 16 Global Step: 170240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:39:44,113-Speed 5955.00 samples/sec Loss 3.1725 
LearningRate 0.0158 Epoch: 16 Global Step: 170250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:39:50,970-Speed 5974.22 samples/sec Loss 3.2130 LearningRate 0.0158 Epoch: 16 Global Step: 170260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:39:57,847-Speed 5958.13 samples/sec Loss 3.2137 LearningRate 0.0158 Epoch: 16 Global Step: 170270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:04,718-Speed 5962.69 samples/sec Loss 3.1930 LearningRate 0.0158 Epoch: 16 Global Step: 170280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:11,560-Speed 5987.06 samples/sec Loss 3.2495 LearningRate 0.0158 Epoch: 16 Global Step: 170290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:18,425-Speed 5968.16 samples/sec Loss 3.2020 LearningRate 0.0158 Epoch: 16 Global Step: 170300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:25,292-Speed 5966.16 samples/sec Loss 3.1722 LearningRate 0.0158 Epoch: 16 Global Step: 170310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:32,149-Speed 5974.38 samples/sec Loss 3.2063 LearningRate 0.0158 Epoch: 16 Global Step: 170320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:39,021-Speed 5962.13 samples/sec Loss 3.2013 LearningRate 0.0158 Epoch: 16 Global Step: 170330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:45,884-Speed 5969.75 samples/sec Loss 3.1918 LearningRate 0.0158 Epoch: 16 Global Step: 170340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:52,843-Speed 5886.57 samples/sec Loss 3.1629 LearningRate 0.0157 Epoch: 16 Global Step: 170350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:40:59,734-Speed 5945.80 samples/sec Loss 3.1839 LearningRate 0.0157 Epoch: 16 Global Step: 170360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:41:06,589-Speed 5975.62 samples/sec Loss 3.1789 LearningRate 0.0157 Epoch: 16 Global Step: 
170370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:41:13,448-Speed 5972.58 samples/sec Loss 3.1893 LearningRate 0.0157 Epoch: 16 Global Step: 170380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:41:20,326-Speed 5956.76 samples/sec Loss 3.2327 LearningRate 0.0157 Epoch: 16 Global Step: 170390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 05:41:27,196-Speed 5963.35 samples/sec Loss 3.1547 LearningRate 0.0157 Epoch: 16 Global Step: 170400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:41:34,061-Speed 5967.41 samples/sec Loss 3.1712 LearningRate 0.0157 Epoch: 16 Global Step: 170410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:41:40,994-Speed 5909.75 samples/sec Loss 3.1590 LearningRate 0.0157 Epoch: 16 Global Step: 170420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:41:47,848-Speed 5977.24 samples/sec Loss 3.2272 LearningRate 0.0157 Epoch: 16 Global Step: 170430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:41:54,701-Speed 5977.24 samples/sec Loss 3.1639 LearningRate 0.0157 Epoch: 16 Global Step: 170440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:42:01,552-Speed 5979.75 samples/sec Loss 3.1822 LearningRate 0.0157 Epoch: 16 Global Step: 170450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:42:08,432-Speed 5955.17 samples/sec Loss 3.1594 LearningRate 0.0157 Epoch: 16 Global Step: 170460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:42:15,284-Speed 5978.66 samples/sec Loss 3.1606 LearningRate 0.0156 Epoch: 16 Global Step: 170470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:42:22,156-Speed 5961.27 samples/sec Loss 3.1924 LearningRate 0.0156 Epoch: 16 Global Step: 170480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:42:28,990-Speed 5994.59 samples/sec Loss 3.1537 LearningRate 0.0156 Epoch: 16 Global Step: 170490 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-09 05:42:35,852-Speed 5970.38 samples/sec Loss 3.2003 LearningRate 0.0156 Epoch: 16 Global Step: 170500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:42:42,735-Speed 5952.69 samples/sec Loss 3.1853 LearningRate 0.0156 Epoch: 16 Global Step: 170510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:42:49,602-Speed 5966.41 samples/sec Loss 3.1860 LearningRate 0.0156 Epoch: 16 Global Step: 170520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:42:56,447-Speed 5984.73 samples/sec Loss 3.1912 LearningRate 0.0156 Epoch: 16 Global Step: 170530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:43:03,299-Speed 5978.67 samples/sec Loss 3.1697 LearningRate 0.0156 Epoch: 16 Global Step: 170540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:43:10,164-Speed 5968.35 samples/sec Loss 3.1725 LearningRate 0.0156 Epoch: 16 Global Step: 170550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:43:17,054-Speed 5945.11 samples/sec Loss 3.1843 LearningRate 0.0156 Epoch: 16 Global Step: 170560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:43:23,941-Speed 5949.29 samples/sec Loss 3.1245 LearningRate 0.0156 Epoch: 16 Global Step: 170570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:43:30,814-Speed 5961.11 samples/sec Loss 3.2197 LearningRate 0.0156 Epoch: 16 Global Step: 170580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:43:37,672-Speed 5973.71 samples/sec Loss 3.2125 LearningRate 0.0155 Epoch: 16 Global Step: 170590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:43:44,551-Speed 5957.70 samples/sec Loss 3.1812 LearningRate 0.0155 Epoch: 16 Global Step: 170600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:43:51,407-Speed 5975.84 samples/sec Loss 3.1479 LearningRate 0.0155 Epoch: 16 Global Step: 170610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 
05:43:58,286-Speed 5955.39 samples/sec Loss 3.1582 LearningRate 0.0155 Epoch: 16 Global Step: 170620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:44:05,142-Speed 5975.35 samples/sec Loss 3.1600 LearningRate 0.0155 Epoch: 16 Global Step: 170630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:44:12,002-Speed 5971.81 samples/sec Loss 3.1720 LearningRate 0.0155 Epoch: 16 Global Step: 170640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:44:18,891-Speed 5946.65 samples/sec Loss 3.1831 LearningRate 0.0155 Epoch: 16 Global Step: 170650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:44:25,754-Speed 5970.32 samples/sec Loss 3.1648 LearningRate 0.0155 Epoch: 16 Global Step: 170660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:44:32,621-Speed 5965.24 samples/sec Loss 3.2343 LearningRate 0.0155 Epoch: 16 Global Step: 170670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:44:39,477-Speed 5975.64 samples/sec Loss 3.1820 LearningRate 0.0155 Epoch: 16 Global Step: 170680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:44:46,326-Speed 5981.30 samples/sec Loss 3.1617 LearningRate 0.0155 Epoch: 16 Global Step: 170690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:44:53,197-Speed 5963.30 samples/sec Loss 3.1646 LearningRate 0.0154 Epoch: 16 Global Step: 170700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:45:00,051-Speed 5976.94 samples/sec Loss 3.1697 LearningRate 0.0154 Epoch: 16 Global Step: 170710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:45:06,934-Speed 5954.59 samples/sec Loss 3.1625 LearningRate 0.0154 Epoch: 16 Global Step: 170720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:45:13,795-Speed 5971.40 samples/sec Loss 3.1655 LearningRate 0.0154 Epoch: 16 Global Step: 170730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:45:20,652-Speed 5974.32 samples/sec Loss 
3.1445 LearningRate 0.0154 Epoch: 16 Global Step: 170740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:45:27,511-Speed 5976.11 samples/sec Loss 3.1567 LearningRate 0.0154 Epoch: 16 Global Step: 170750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:45:34,378-Speed 5965.64 samples/sec Loss 3.1468 LearningRate 0.0154 Epoch: 16 Global Step: 170760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:45:41,230-Speed 5978.65 samples/sec Loss 3.1480 LearningRate 0.0154 Epoch: 16 Global Step: 170770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:45:48,083-Speed 5978.26 samples/sec Loss 3.1646 LearningRate 0.0154 Epoch: 16 Global Step: 170780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:45:54,955-Speed 5962.47 samples/sec Loss 3.1680 LearningRate 0.0154 Epoch: 16 Global Step: 170790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:01,820-Speed 5966.92 samples/sec Loss 3.1391 LearningRate 0.0154 Epoch: 16 Global Step: 170800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:08,684-Speed 5968.69 samples/sec Loss 3.1615 LearningRate 0.0154 Epoch: 16 Global Step: 170810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:15,553-Speed 5967.23 samples/sec Loss 3.1456 LearningRate 0.0153 Epoch: 16 Global Step: 170820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:22,410-Speed 5975.22 samples/sec Loss 3.2030 LearningRate 0.0153 Epoch: 16 Global Step: 170830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:29,295-Speed 5950.62 samples/sec Loss 3.1425 LearningRate 0.0153 Epoch: 16 Global Step: 170840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:36,149-Speed 5977.09 samples/sec Loss 3.1554 LearningRate 0.0153 Epoch: 16 Global Step: 170850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:43,009-Speed 5971.77 samples/sec Loss 3.1906 LearningRate 0.0153 Epoch: 16 Global 
Step: 170860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:49,863-Speed 5976.94 samples/sec Loss 3.1539 LearningRate 0.0153 Epoch: 16 Global Step: 170870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:46:56,708-Speed 5985.44 samples/sec Loss 3.1504 LearningRate 0.0153 Epoch: 16 Global Step: 170880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:03,566-Speed 5972.70 samples/sec Loss 3.1967 LearningRate 0.0153 Epoch: 16 Global Step: 170890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:10,456-Speed 5946.72 samples/sec Loss 3.1629 LearningRate 0.0153 Epoch: 16 Global Step: 170900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:17,306-Speed 5980.79 samples/sec Loss 3.1466 LearningRate 0.0153 Epoch: 16 Global Step: 170910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:24,174-Speed 5965.41 samples/sec Loss 3.2054 LearningRate 0.0153 Epoch: 16 Global Step: 170920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:31,025-Speed 5979.41 samples/sec Loss 3.1313 LearningRate 0.0153 Epoch: 16 Global Step: 170930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:37,873-Speed 5983.36 samples/sec Loss 3.1900 LearningRate 0.0152 Epoch: 16 Global Step: 170940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:44,734-Speed 5970.48 samples/sec Loss 3.1932 LearningRate 0.0152 Epoch: 16 Global Step: 170950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:51,594-Speed 5971.79 samples/sec Loss 3.1314 LearningRate 0.0152 Epoch: 16 Global Step: 170960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:47:58,439-Speed 5985.65 samples/sec Loss 3.1839 LearningRate 0.0152 Epoch: 16 Global Step: 170970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:48:05,299-Speed 5971.92 samples/sec Loss 3.1190 LearningRate 0.0152 Epoch: 16 Global Step: 170980 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-09 05:48:12,182-Speed 5951.81 samples/sec Loss 3.1601 LearningRate 0.0152 Epoch: 16 Global Step: 170990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 05:48:19,046-Speed 5968.05 samples/sec Loss 3.1495 LearningRate 0.0152 Epoch: 16 Global Step: 171000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 05:48:25,894-Speed 5982.41 samples/sec Loss 3.1449 LearningRate 0.0152 Epoch: 16 Global Step: 171010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 05:48:32,743-Speed 5984.90 samples/sec Loss 3.1678 LearningRate 0.0152 Epoch: 16 Global Step: 171020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:48:39,616-Speed 5960.56 samples/sec Loss 3.1622 LearningRate 0.0152 Epoch: 16 Global Step: 171030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:48:46,468-Speed 5978.18 samples/sec Loss 3.1138 LearningRate 0.0152 Epoch: 16 Global Step: 171040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:48:53,332-Speed 5968.89 samples/sec Loss 3.1314 LearningRate 0.0152 Epoch: 16 Global Step: 171050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:49:00,182-Speed 5981.13 samples/sec Loss 3.1192 LearningRate 0.0151 Epoch: 16 Global Step: 171060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:49:07,039-Speed 5974.91 samples/sec Loss 3.1416 LearningRate 0.0151 Epoch: 16 Global Step: 171070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:49:13,897-Speed 5973.54 samples/sec Loss 3.0987 LearningRate 0.0151 Epoch: 16 Global Step: 171080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:49:20,774-Speed 5957.21 samples/sec Loss 3.1234 LearningRate 0.0151 Epoch: 16 Global Step: 171090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:49:27,634-Speed 5971.79 samples/sec Loss 3.1770 LearningRate 0.0151 Epoch: 16 Global Step: 171100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 
05:49:34,485-Speed 5979.33 samples/sec Loss 3.1571 LearningRate 0.0151 Epoch: 16 Global Step: 171110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:49:41,336-Speed 5983.23 samples/sec Loss 3.1523 LearningRate 0.0151 Epoch: 16 Global Step: 171120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:49:48,206-Speed 5963.42 samples/sec Loss 3.1222 LearningRate 0.0151 Epoch: 16 Global Step: 171130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:49:55,060-Speed 5977.68 samples/sec Loss 3.0931 LearningRate 0.0151 Epoch: 16 Global Step: 171140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:01,930-Speed 5963.59 samples/sec Loss 3.1470 LearningRate 0.0151 Epoch: 16 Global Step: 171150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:08,793-Speed 5969.00 samples/sec Loss 3.1410 LearningRate 0.0151 Epoch: 16 Global Step: 171160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:15,681-Speed 5947.75 samples/sec Loss 3.1574 LearningRate 0.0151 Epoch: 16 Global Step: 171170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:22,547-Speed 5966.61 samples/sec Loss 3.1300 LearningRate 0.0150 Epoch: 16 Global Step: 171180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:29,408-Speed 5971.60 samples/sec Loss 3.1386 LearningRate 0.0150 Epoch: 16 Global Step: 171190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:36,275-Speed 5965.79 samples/sec Loss 3.1090 LearningRate 0.0150 Epoch: 16 Global Step: 171200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:43,199-Speed 5917.86 samples/sec Loss 3.1041 LearningRate 0.0150 Epoch: 16 Global Step: 171210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:50,119-Speed 5920.64 samples/sec Loss 3.0902 LearningRate 0.0150 Epoch: 16 Global Step: 171220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:50:56,980-Speed 5971.24 samples/sec Loss 
3.1126 LearningRate 0.0150 Epoch: 16 Global Step: 171230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:51:03,830-Speed 5980.42 samples/sec Loss 3.1167 LearningRate 0.0150 Epoch: 16 Global Step: 171240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:51:10,685-Speed 5976.64 samples/sec Loss 3.1361 LearningRate 0.0150 Epoch: 16 Global Step: 171250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:51:17,558-Speed 5961.16 samples/sec Loss 3.1471 LearningRate 0.0150 Epoch: 16 Global Step: 171260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:51:24,439-Speed 5955.87 samples/sec Loss 3.1138 LearningRate 0.0150 Epoch: 16 Global Step: 171270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:51:31,315-Speed 5958.40 samples/sec Loss 3.1567 LearningRate 0.0150 Epoch: 16 Global Step: 171280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:51:38,167-Speed 5978.63 samples/sec Loss 3.1042 LearningRate 0.0150 Epoch: 16 Global Step: 171290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:51:45,024-Speed 5974.58 samples/sec Loss 3.1435 LearningRate 0.0149 Epoch: 16 Global Step: 171300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:51:51,905-Speed 5953.55 samples/sec Loss 3.1099 LearningRate 0.0149 Epoch: 16 Global Step: 171310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:51:58,762-Speed 5975.53 samples/sec Loss 3.1305 LearningRate 0.0149 Epoch: 16 Global Step: 171320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:52:05,614-Speed 5978.50 samples/sec Loss 3.1728 LearningRate 0.0149 Epoch: 16 Global Step: 171330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:52:12,475-Speed 5971.05 samples/sec Loss 3.1309 LearningRate 0.0149 Epoch: 16 Global Step: 171340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:52:19,346-Speed 5964.26 samples/sec Loss 3.1171 LearningRate 0.0149 Epoch: 16 Global 
Step: 171350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:52:26,212-Speed 5967.56 samples/sec Loss 3.1760 LearningRate 0.0149 Epoch: 16 Global Step: 171360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:52:33,091-Speed 5955.06 samples/sec Loss 3.1532 LearningRate 0.0149 Epoch: 16 Global Step: 171370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:52:39,952-Speed 5973.29 samples/sec Loss 3.1441 LearningRate 0.0149 Epoch: 16 Global Step: 171380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:52:46,819-Speed 5965.69 samples/sec Loss 3.1154 LearningRate 0.0149 Epoch: 16 Global Step: 171390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:52:53,673-Speed 5977.09 samples/sec Loss 3.1527 LearningRate 0.0149 Epoch: 16 Global Step: 171400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:53:00,554-Speed 5953.88 samples/sec Loss 3.1416 LearningRate 0.0149 Epoch: 16 Global Step: 171410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:53:07,402-Speed 5982.71 samples/sec Loss 3.1147 LearningRate 0.0148 Epoch: 16 Global Step: 171420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:53:14,267-Speed 5967.59 samples/sec Loss 3.1330 LearningRate 0.0148 Epoch: 16 Global Step: 171430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:53:21,155-Speed 5947.35 samples/sec Loss 3.1314 LearningRate 0.0148 Epoch: 16 Global Step: 171440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 05:53:28,004-Speed 5982.67 samples/sec Loss 3.0862 LearningRate 0.0148 Epoch: 16 Global Step: 171450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:53:34,881-Speed 5956.81 samples/sec Loss 3.1221 LearningRate 0.0148 Epoch: 16 Global Step: 171460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:53:41,732-Speed 5979.94 samples/sec Loss 3.0859 LearningRate 0.0148 Epoch: 16 Global Step: 171470 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-09 05:53:48,600-Speed 5965.76 samples/sec Loss 3.1292 LearningRate 0.0148 Epoch: 16 Global Step: 171480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:53:55,452-Speed 5978.89 samples/sec Loss 3.1213 LearningRate 0.0148 Epoch: 16 Global Step: 171490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:54:02,299-Speed 5983.61 samples/sec Loss 3.1054 LearningRate 0.0148 Epoch: 16 Global Step: 171500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:54:09,159-Speed 5972.48 samples/sec Loss 3.1294 LearningRate 0.0148 Epoch: 16 Global Step: 171510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:54:16,045-Speed 5949.20 samples/sec Loss 3.1389 LearningRate 0.0148 Epoch: 16 Global Step: 171520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:54:22,899-Speed 5977.95 samples/sec Loss 3.0738 LearningRate 0.0148 Epoch: 16 Global Step: 171530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:54:29,795-Speed 5940.87 samples/sec Loss 3.1472 LearningRate 0.0147 Epoch: 16 Global Step: 171540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:54:36,675-Speed 5954.68 samples/sec Loss 3.1317 LearningRate 0.0147 Epoch: 16 Global Step: 171550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 05:54:43,517-Speed 5987.24 samples/sec Loss 3.1201 LearningRate 0.0147 Epoch: 16 Global Step: 171560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:54:50,404-Speed 5948.93 samples/sec Loss 3.1427 LearningRate 0.0147 Epoch: 16 Global Step: 171570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:54:57,278-Speed 5959.57 samples/sec Loss 3.1464 LearningRate 0.0147 Epoch: 16 Global Step: 171580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:55:04,149-Speed 5963.03 samples/sec Loss 3.1009 LearningRate 0.0147 Epoch: 16 Global Step: 171590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 
05:55:11,007-Speed 5973.62 samples/sec Loss 3.1422 LearningRate 0.0147 Epoch: 16 Global Step: 171600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:55:17,878-Speed 5962.85 samples/sec Loss 3.0954 LearningRate 0.0147 Epoch: 16 Global Step: 171610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:55:24,741-Speed 5970.54 samples/sec Loss 3.1053 LearningRate 0.0147 Epoch: 16 Global Step: 171620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:55:31,592-Speed 5980.54 samples/sec Loss 3.1316 LearningRate 0.0147 Epoch: 16 Global Step: 171630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:55:38,442-Speed 5980.10 samples/sec Loss 3.1094 LearningRate 0.0147 Epoch: 16 Global Step: 171640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:55:45,296-Speed 5976.91 samples/sec Loss 3.0856 LearningRate 0.0147 Epoch: 16 Global Step: 171650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:55:52,140-Speed 5986.03 samples/sec Loss 3.0795 LearningRate 0.0147 Epoch: 16 Global Step: 171660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:55:58,997-Speed 5973.76 samples/sec Loss 3.0967 LearningRate 0.0146 Epoch: 16 Global Step: 171670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:56:05,879-Speed 5953.07 samples/sec Loss 3.0977 LearningRate 0.0146 Epoch: 16 Global Step: 171680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:56:12,744-Speed 5967.62 samples/sec Loss 3.0835 LearningRate 0.0146 Epoch: 16 Global Step: 171690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:56:19,601-Speed 5974.02 samples/sec Loss 3.0872 LearningRate 0.0146 Epoch: 16 Global Step: 171700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:56:26,485-Speed 5952.08 samples/sec Loss 3.1300 LearningRate 0.0146 Epoch: 16 Global Step: 171710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:56:33,340-Speed 5976.24 samples/sec Loss 
3.0936 LearningRate 0.0146 Epoch: 16 Global Step: 171720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:56:40,184-Speed 5985.25 samples/sec Loss 3.1218 LearningRate 0.0146 Epoch: 16 Global Step: 171730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:56:47,040-Speed 5975.99 samples/sec Loss 3.1144 LearningRate 0.0146 Epoch: 16 Global Step: 171740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:56:53,879-Speed 5990.58 samples/sec Loss 3.0608 LearningRate 0.0146 Epoch: 16 Global Step: 171750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:00,730-Speed 5979.01 samples/sec Loss 3.0445 LearningRate 0.0146 Epoch: 16 Global Step: 171760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:07,595-Speed 5967.56 samples/sec Loss 3.1164 LearningRate 0.0146 Epoch: 16 Global Step: 171770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:14,456-Speed 5971.25 samples/sec Loss 3.1045 LearningRate 0.0146 Epoch: 16 Global Step: 171780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:21,307-Speed 5979.31 samples/sec Loss 3.0505 LearningRate 0.0145 Epoch: 16 Global Step: 171790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:28,168-Speed 5971.46 samples/sec Loss 3.0355 LearningRate 0.0145 Epoch: 16 Global Step: 171800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:35,028-Speed 5972.58 samples/sec Loss 3.0660 LearningRate 0.0145 Epoch: 16 Global Step: 171810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:41,881-Speed 5977.09 samples/sec Loss 3.1170 LearningRate 0.0145 Epoch: 16 Global Step: 171820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:48,741-Speed 5975.20 samples/sec Loss 3.0664 LearningRate 0.0145 Epoch: 16 Global Step: 171830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:57:55,596-Speed 5976.61 samples/sec Loss 3.1034 LearningRate 0.0145 Epoch: 16 Global 
Step: 171840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 05:58:02,453-Speed 5974.22 samples/sec Loss 3.1071 LearningRate 0.0145 Epoch: 16 Global Step: 171850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:58:09,301-Speed 5982.43 samples/sec Loss 3.0675 LearningRate 0.0145 Epoch: 16 Global Step: 171860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:58:16,170-Speed 5963.49 samples/sec Loss 3.0349 LearningRate 0.0145 Epoch: 16 Global Step: 171870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:58:23,015-Speed 5985.26 samples/sec Loss 3.0911 LearningRate 0.0145 Epoch: 16 Global Step: 171880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:58:29,893-Speed 5956.41 samples/sec Loss 3.0902 LearningRate 0.0145 Epoch: 16 Global Step: 171890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:58:36,759-Speed 5966.94 samples/sec Loss 3.1037 LearningRate 0.0145 Epoch: 16 Global Step: 171900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:58:43,621-Speed 5969.58 samples/sec Loss 3.1095 LearningRate 0.0144 Epoch: 16 Global Step: 171910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:58:50,478-Speed 5978.10 samples/sec Loss 3.0839 LearningRate 0.0144 Epoch: 16 Global Step: 171920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:58:57,335-Speed 5977.21 samples/sec Loss 3.0668 LearningRate 0.0144 Epoch: 16 Global Step: 171930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:59:04,231-Speed 5940.79 samples/sec Loss 3.1194 LearningRate 0.0144 Epoch: 16 Global Step: 171940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:59:11,170-Speed 5904.12 samples/sec Loss 3.0689 LearningRate 0.0144 Epoch: 16 Global Step: 171950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 05:59:18,027-Speed 5974.47 samples/sec Loss 3.0858 LearningRate 0.0144 Epoch: 16 Global Step: 171960 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-09 05:59:24,902-Speed 5958.97 samples/sec Loss 3.0718 LearningRate 0.0144 Epoch: 16 Global Step: 171970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:59:31,779-Speed 5957.48 samples/sec Loss 3.0450 LearningRate 0.0144 Epoch: 16 Global Step: 171980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:59:38,661-Speed 5953.34 samples/sec Loss 3.0821 LearningRate 0.0144 Epoch: 16 Global Step: 171990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:59:45,570-Speed 5929.61 samples/sec Loss 3.0375 LearningRate 0.0144 Epoch: 16 Global Step: 172000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:59:52,421-Speed 5980.06 samples/sec Loss 3.0766 LearningRate 0.0144 Epoch: 16 Global Step: 172010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 05:59:59,312-Speed 5945.28 samples/sec Loss 3.1133 LearningRate 0.0144 Epoch: 16 Global Step: 172020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:00:06,184-Speed 5961.63 samples/sec Loss 3.0966 LearningRate 0.0143 Epoch: 16 Global Step: 172030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:00:13,061-Speed 5957.04 samples/sec Loss 3.0444 LearningRate 0.0143 Epoch: 16 Global Step: 172040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:00:19,935-Speed 5960.60 samples/sec Loss 3.0874 LearningRate 0.0143 Epoch: 16 Global Step: 172050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:00:26,785-Speed 5979.96 samples/sec Loss 3.1126 LearningRate 0.0143 Epoch: 16 Global Step: 172060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:00:33,773-Speed 5862.75 samples/sec Loss 3.0576 LearningRate 0.0143 Epoch: 16 Global Step: 172070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:00:40,649-Speed 5958.14 samples/sec Loss 3.0798 LearningRate 0.0143 Epoch: 16 Global Step: 172080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 
06:00:47,526-Speed 5957.73 samples/sec Loss 3.0198 LearningRate 0.0143 Epoch: 16 Global Step: 172090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:00:54,423-Speed 5939.63 samples/sec Loss 3.0009 LearningRate 0.0143 Epoch: 16 Global Step: 172100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:01:01,290-Speed 5966.12 samples/sec Loss 3.0432 LearningRate 0.0143 Epoch: 16 Global Step: 172110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:01:08,158-Speed 5965.21 samples/sec Loss 3.0482 LearningRate 0.0143 Epoch: 16 Global Step: 172120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:01:15,055-Speed 5940.31 samples/sec Loss 3.1049 LearningRate 0.0143 Epoch: 16 Global Step: 172130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:01:21,913-Speed 5973.82 samples/sec Loss 3.0425 LearningRate 0.0143 Epoch: 16 Global Step: 172140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:01:28,759-Speed 5983.86 samples/sec Loss 3.0428 LearningRate 0.0143 Epoch: 16 Global Step: 172150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:01:35,620-Speed 5971.55 samples/sec Loss 3.0668 LearningRate 0.0142 Epoch: 16 Global Step: 172160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:01:42,491-Speed 5963.06 samples/sec Loss 3.0323 LearningRate 0.0142 Epoch: 16 Global Step: 172170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:01:49,330-Speed 5989.65 samples/sec Loss 3.0486 LearningRate 0.0142 Epoch: 16 Global Step: 172180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:01:56,202-Speed 5964.41 samples/sec Loss 3.0469 LearningRate 0.0142 Epoch: 16 Global Step: 172190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:02:03,088-Speed 5949.79 samples/sec Loss 3.0898 LearningRate 0.0142 Epoch: 16 Global Step: 172200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:02:09,929-Speed 5988.40 samples/sec 
Loss 3.0830 LearningRate 0.0142 Epoch: 16 Global Step: 172210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:02:16,781-Speed 5978.87 samples/sec Loss 3.1137 LearningRate 0.0142 Epoch: 16 Global Step: 172220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:02:23,686-Speed 5933.81 samples/sec Loss 3.0401 LearningRate 0.0142 Epoch: 16 Global Step: 172230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:02:30,564-Speed 5955.69 samples/sec Loss 3.0767 LearningRate 0.0142 Epoch: 16 Global Step: 172240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:02:37,422-Speed 5974.40 samples/sec Loss 3.0775 LearningRate 0.0142 Epoch: 16 Global Step: 172250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:02:44,303-Speed 5953.26 samples/sec Loss 3.0616 LearningRate 0.0142 Epoch: 16 Global Step: 172260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:02:51,180-Speed 5957.70 samples/sec Loss 3.1239 LearningRate 0.0142 Epoch: 16 Global Step: 172270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:02:58,062-Speed 5952.95 samples/sec Loss 3.0832 LearningRate 0.0141 Epoch: 16 Global Step: 172280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:03:04,934-Speed 5962.14 samples/sec Loss 3.0518 LearningRate 0.0141 Epoch: 16 Global Step: 172290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:03:11,788-Speed 5977.50 samples/sec Loss 3.0443 LearningRate 0.0141 Epoch: 16 Global Step: 172300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:03:18,648-Speed 5972.56 samples/sec Loss 3.0522 LearningRate 0.0141 Epoch: 16 Global Step: 172310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:03:25,493-Speed 5984.89 samples/sec Loss 3.0359 LearningRate 0.0141 Epoch: 16 Global Step: 172320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:03:32,355-Speed 5969.61 samples/sec Loss 3.0446 LearningRate 0.0141 Epoch: 16 
Global Step: 172330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:03:39,203-Speed 5982.81 samples/sec Loss 3.0487 LearningRate 0.0141 Epoch: 16 Global Step: 172340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:03:46,054-Speed 5979.91 samples/sec Loss 3.0504 LearningRate 0.0141 Epoch: 16 Global Step: 172350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:03:52,929-Speed 5959.07 samples/sec Loss 3.0590 LearningRate 0.0141 Epoch: 16 Global Step: 172360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:03:59,787-Speed 5974.33 samples/sec Loss 3.0746 LearningRate 0.0141 Epoch: 16 Global Step: 172370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:04:06,633-Speed 5984.59 samples/sec Loss 3.0688 LearningRate 0.0141 Epoch: 16 Global Step: 172380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:04:13,501-Speed 5964.92 samples/sec Loss 3.0721 LearningRate 0.0141 Epoch: 16 Global Step: 172390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:04:20,366-Speed 5968.99 samples/sec Loss 3.0505 LearningRate 0.0141 Epoch: 16 Global Step: 172400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:04:27,220-Speed 5977.63 samples/sec Loss 3.0264 LearningRate 0.0140 Epoch: 16 Global Step: 172410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:04:34,063-Speed 5986.50 samples/sec Loss 3.0794 LearningRate 0.0140 Epoch: 16 Global Step: 172420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:04:40,918-Speed 5975.94 samples/sec Loss 3.0458 LearningRate 0.0140 Epoch: 16 Global Step: 172430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:04:47,793-Speed 5961.36 samples/sec Loss 3.0432 LearningRate 0.0140 Epoch: 16 Global Step: 172440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:04:54,645-Speed 5978.68 samples/sec Loss 3.0299 LearningRate 0.0140 Epoch: 16 Global Step: 172450 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-01-09 06:05:01,519-Speed 5960.17 samples/sec Loss 3.0427 LearningRate 0.0140 Epoch: 16 Global Step: 172460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:05:08,372-Speed 5977.45 samples/sec Loss 3.0499 LearningRate 0.0140 Epoch: 16 Global Step: 172470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:05:15,233-Speed 5971.37 samples/sec Loss 3.0166 LearningRate 0.0140 Epoch: 16 Global Step: 172480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:05:22,090-Speed 5976.97 samples/sec Loss 2.9922 LearningRate 0.0140 Epoch: 16 Global Step: 172490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:05:28,963-Speed 5960.49 samples/sec Loss 3.0725 LearningRate 0.0140 Epoch: 16 Global Step: 172500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:05:35,825-Speed 5969.74 samples/sec Loss 3.0574 LearningRate 0.0140 Epoch: 16 Global Step: 172510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:05:42,686-Speed 5971.29 samples/sec Loss 3.0386 LearningRate 0.0140 Epoch: 16 Global Step: 172520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:05:49,558-Speed 5962.38 samples/sec Loss 3.0304 LearningRate 0.0139 Epoch: 16 Global Step: 172530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:05:56,411-Speed 5977.81 samples/sec Loss 3.0800 LearningRate 0.0139 Epoch: 16 Global Step: 172540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:06:03,279-Speed 5965.02 samples/sec Loss 3.0542 LearningRate 0.0139 Epoch: 16 Global Step: 172550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:06:10,145-Speed 5967.34 samples/sec Loss 3.0595 LearningRate 0.0139 Epoch: 16 Global Step: 172560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:06:16,998-Speed 5978.56 samples/sec Loss 2.9959 LearningRate 0.0139 Epoch: 16 Global Step: 172570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 
2022-01-09 06:06:23,848-Speed 5980.00 samples/sec Loss 3.0632 LearningRate 0.0139 Epoch: 16 Global Step: 172580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:06:30,715-Speed 5966.31 samples/sec Loss 3.0672 LearningRate 0.0139 Epoch: 16 Global Step: 172590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:06:37,586-Speed 5962.61 samples/sec Loss 3.0436 LearningRate 0.0139 Epoch: 16 Global Step: 172600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:06:44,442-Speed 5975.58 samples/sec Loss 3.0123 LearningRate 0.0139 Epoch: 16 Global Step: 172610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:06:51,304-Speed 5970.12 samples/sec Loss 3.0948 LearningRate 0.0139 Epoch: 16 Global Step: 172620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:06:58,143-Speed 5990.65 samples/sec Loss 3.0241 LearningRate 0.0139 Epoch: 16 Global Step: 172630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:05,022-Speed 5955.61 samples/sec Loss 3.0152 LearningRate 0.0139 Epoch: 16 Global Step: 172640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:11,883-Speed 5972.78 samples/sec Loss 3.0538 LearningRate 0.0139 Epoch: 16 Global Step: 172650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:18,741-Speed 5973.53 samples/sec Loss 3.0379 LearningRate 0.0138 Epoch: 16 Global Step: 172660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:25,610-Speed 5964.44 samples/sec Loss 3.0384 LearningRate 0.0138 Epoch: 16 Global Step: 172670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:32,462-Speed 5979.53 samples/sec Loss 3.0303 LearningRate 0.0138 Epoch: 16 Global Step: 172680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:39,325-Speed 5968.82 samples/sec Loss 3.0207 LearningRate 0.0138 Epoch: 16 Global Step: 172690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:46,199-Speed 5960.35 
samples/sec Loss 3.0202 LearningRate 0.0138 Epoch: 16 Global Step: 172700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:53,063-Speed 5969.15 samples/sec Loss 2.9961 LearningRate 0.0138 Epoch: 16 Global Step: 172710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:07:59,906-Speed 5985.91 samples/sec Loss 3.0405 LearningRate 0.0138 Epoch: 16 Global Step: 172720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:08:06,759-Speed 5978.33 samples/sec Loss 3.0372 LearningRate 0.0138 Epoch: 16 Global Step: 172730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:08:13,615-Speed 5976.53 samples/sec Loss 3.0579 LearningRate 0.0138 Epoch: 16 Global Step: 172740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:08:20,486-Speed 5962.77 samples/sec Loss 3.0615 LearningRate 0.0138 Epoch: 16 Global Step: 172750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:08:27,354-Speed 5964.70 samples/sec Loss 3.0527 LearningRate 0.0138 Epoch: 16 Global Step: 172760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:08:34,205-Speed 5981.95 samples/sec Loss 3.0395 LearningRate 0.0138 Epoch: 16 Global Step: 172770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:08:41,057-Speed 5978.92 samples/sec Loss 3.0037 LearningRate 0.0137 Epoch: 16 Global Step: 172780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:08:47,908-Speed 5979.27 samples/sec Loss 3.0483 LearningRate 0.0137 Epoch: 16 Global Step: 172790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:08:54,762-Speed 5977.41 samples/sec Loss 3.0054 LearningRate 0.0137 Epoch: 16 Global Step: 172800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:01,610-Speed 5982.38 samples/sec Loss 3.0748 LearningRate 0.0137 Epoch: 16 Global Step: 172810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:08,468-Speed 5974.17 samples/sec Loss 3.0042 LearningRate 0.0137 
Epoch: 16 Global Step: 172820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:15,315-Speed 5983.68 samples/sec Loss 3.0369 LearningRate 0.0137 Epoch: 16 Global Step: 172830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:22,171-Speed 5975.02 samples/sec Loss 3.0296 LearningRate 0.0137 Epoch: 16 Global Step: 172840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:29,035-Speed 5968.83 samples/sec Loss 3.0054 LearningRate 0.0137 Epoch: 16 Global Step: 172850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:35,900-Speed 5967.74 samples/sec Loss 3.0382 LearningRate 0.0137 Epoch: 16 Global Step: 172860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:42,765-Speed 5967.45 samples/sec Loss 3.0407 LearningRate 0.0137 Epoch: 16 Global Step: 172870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:49,620-Speed 5976.41 samples/sec Loss 3.0349 LearningRate 0.0137 Epoch: 16 Global Step: 172880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:09:56,483-Speed 5970.87 samples/sec Loss 3.0194 LearningRate 0.0137 Epoch: 16 Global Step: 172890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:10:03,348-Speed 5967.84 samples/sec Loss 3.0122 LearningRate 0.0137 Epoch: 16 Global Step: 172900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:10:10,207-Speed 5972.93 samples/sec Loss 2.9937 LearningRate 0.0136 Epoch: 16 Global Step: 172910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:10:17,072-Speed 5968.55 samples/sec Loss 3.0397 LearningRate 0.0136 Epoch: 16 Global Step: 172920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:10:23,918-Speed 5983.20 samples/sec Loss 3.0216 LearningRate 0.0136 Epoch: 16 Global Step: 172930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:10:30,791-Speed 5960.64 samples/sec Loss 3.0508 LearningRate 0.0136 Epoch: 16 Global Step: 172940 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-01-09 06:10:37,656-Speed 5968.13 samples/sec Loss 3.0026 LearningRate 0.0136 Epoch: 16 Global Step: 172950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:10:44,507-Speed 5979.30 samples/sec Loss 3.0146 LearningRate 0.0136 Epoch: 16 Global Step: 172960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:10:51,371-Speed 5969.17 samples/sec Loss 2.9998 LearningRate 0.0136 Epoch: 16 Global Step: 172970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:10:58,215-Speed 5985.89 samples/sec Loss 3.0497 LearningRate 0.0136 Epoch: 16 Global Step: 172980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:11:05,073-Speed 5973.24 samples/sec Loss 3.0028 LearningRate 0.0136 Epoch: 16 Global Step: 172990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:11:11,942-Speed 5963.95 samples/sec Loss 3.0135 LearningRate 0.0136 Epoch: 16 Global Step: 173000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:11:18,801-Speed 5972.88 samples/sec Loss 3.0161 LearningRate 0.0136 Epoch: 16 Global Step: 173010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:11:25,664-Speed 5968.84 samples/sec Loss 3.0213 LearningRate 0.0136 Epoch: 16 Global Step: 173020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:11:32,541-Speed 5960.03 samples/sec Loss 2.9543 LearningRate 0.0135 Epoch: 16 Global Step: 173030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:11:39,403-Speed 5972.22 samples/sec Loss 3.0195 LearningRate 0.0135 Epoch: 16 Global Step: 173040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:11:46,245-Speed 5986.80 samples/sec Loss 2.9766 LearningRate 0.0135 Epoch: 16 Global Step: 173050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:11:53,096-Speed 5979.85 samples/sec Loss 3.0068 LearningRate 0.0135 Epoch: 16 Global Step: 173060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-09 06:11:59,966-Speed 5963.88 samples/sec Loss 2.9893 LearningRate 0.0135 Epoch: 16 Global Step: 173070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:12:06,823-Speed 5974.44 samples/sec Loss 2.9978 LearningRate 0.0135 Epoch: 16 Global Step: 173080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:12:13,682-Speed 5972.63 samples/sec Loss 2.9798 LearningRate 0.0135 Epoch: 16 Global Step: 173090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:12:20,530-Speed 5985.68 samples/sec Loss 3.0467 LearningRate 0.0135 Epoch: 16 Global Step: 173100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:12:27,373-Speed 5986.60 samples/sec Loss 3.0011 LearningRate 0.0135 Epoch: 16 Global Step: 173110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:12:34,214-Speed 5990.59 samples/sec Loss 2.9646 LearningRate 0.0135 Epoch: 16 Global Step: 173120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:12:41,081-Speed 5965.96 samples/sec Loss 2.9866 LearningRate 0.0135 Epoch: 16 Global Step: 173130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:12:47,946-Speed 5967.81 samples/sec Loss 2.9787 LearningRate 0.0135 Epoch: 16 Global Step: 173140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:12:54,794-Speed 5983.84 samples/sec Loss 2.9661 LearningRate 0.0135 Epoch: 16 Global Step: 173150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:13:01,657-Speed 5969.90 samples/sec Loss 3.0282 LearningRate 0.0134 Epoch: 16 Global Step: 173160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:13:08,540-Speed 5951.33 samples/sec Loss 2.9763 LearningRate 0.0134 Epoch: 16 Global Step: 173170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:13:15,392-Speed 5979.17 samples/sec Loss 2.9972 LearningRate 0.0134 Epoch: 16 Global Step: 173180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:13:22,250-Speed 5978.34 
samples/sec Loss 3.0127 LearningRate 0.0134 Epoch: 16 Global Step: 173190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:13:29,101-Speed 5980.17 samples/sec Loss 3.0009 LearningRate 0.0134 Epoch: 16 Global Step: 173200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:13:36,005-Speed 5933.70 samples/sec Loss 2.9938 LearningRate 0.0134 Epoch: 16 Global Step: 173210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:13:42,900-Speed 5942.19 samples/sec Loss 3.0107 LearningRate 0.0134 Epoch: 16 Global Step: 173220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:13:49,755-Speed 5975.86 samples/sec Loss 2.9842 LearningRate 0.0134 Epoch: 16 Global Step: 173230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:13:56,618-Speed 5969.31 samples/sec Loss 2.9909 LearningRate 0.0134 Epoch: 16 Global Step: 173240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:14:03,484-Speed 5967.20 samples/sec Loss 2.9524 LearningRate 0.0134 Epoch: 16 Global Step: 173250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:14:10,370-Speed 5949.49 samples/sec Loss 2.9878 LearningRate 0.0134 Epoch: 16 Global Step: 173260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:14:17,282-Speed 5927.24 samples/sec Loss 2.9697 LearningRate 0.0134 Epoch: 16 Global Step: 173270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:14:24,228-Speed 5898.16 samples/sec Loss 2.9817 LearningRate 0.0134 Epoch: 16 Global Step: 173280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:14:31,154-Speed 5915.08 samples/sec Loss 3.0025 LearningRate 0.0133 Epoch: 16 Global Step: 173290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:14:38,070-Speed 5924.10 samples/sec Loss 2.9885 LearningRate 0.0133 Epoch: 16 Global Step: 173300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:14:44,923-Speed 5977.87 samples/sec Loss 3.0049 LearningRate 0.0133 
Epoch: 16 Global Step: 173310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:14:51,814-Speed 5945.08 samples/sec Loss 2.9455 LearningRate 0.0133 Epoch: 16 Global Step: 173320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:14:58,667-Speed 5978.87 samples/sec Loss 2.9792 LearningRate 0.0133 Epoch: 16 Global Step: 173330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:15:05,522-Speed 5976.29 samples/sec Loss 2.9908 LearningRate 0.0133 Epoch: 16 Global Step: 173340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:15:12,383-Speed 5971.22 samples/sec Loss 2.9755 LearningRate 0.0133 Epoch: 16 Global Step: 173350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:15:19,239-Speed 5975.73 samples/sec Loss 3.0114 LearningRate 0.0133 Epoch: 16 Global Step: 173360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:15:26,122-Speed 5951.99 samples/sec Loss 3.0357 LearningRate 0.0133 Epoch: 16 Global Step: 173370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:15:32,992-Speed 5963.31 samples/sec Loss 2.9722 LearningRate 0.0133 Epoch: 16 Global Step: 173380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:15:39,874-Speed 5953.38 samples/sec Loss 2.9875 LearningRate 0.0133 Epoch: 16 Global Step: 173390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:15:46,798-Speed 5916.80 samples/sec Loss 2.9525 LearningRate 0.0133 Epoch: 16 Global Step: 173400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:15:53,659-Speed 5971.13 samples/sec Loss 2.9765 LearningRate 0.0133 Epoch: 16 Global Step: 173410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:00,518-Speed 5972.79 samples/sec Loss 3.0022 LearningRate 0.0132 Epoch: 16 Global Step: 173420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:07,487-Speed 5878.84 samples/sec Loss 2.9936 LearningRate 0.0132 Epoch: 16 Global Step: 173430 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:14,390-Speed 5934.40 samples/sec Loss 3.0057 LearningRate 0.0132 Epoch: 16 Global Step: 173440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:21,251-Speed 5971.96 samples/sec Loss 2.9871 LearningRate 0.0132 Epoch: 16 Global Step: 173450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:28,112-Speed 5971.40 samples/sec Loss 2.9769 LearningRate 0.0132 Epoch: 16 Global Step: 173460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:34,974-Speed 5970.13 samples/sec Loss 2.9869 LearningRate 0.0132 Epoch: 16 Global Step: 173470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:41,843-Speed 5964.66 samples/sec Loss 2.9606 LearningRate 0.0132 Epoch: 16 Global Step: 173480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:48,705-Speed 5969.34 samples/sec Loss 2.9653 LearningRate 0.0132 Epoch: 16 Global Step: 173490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:16:55,580-Speed 5959.36 samples/sec Loss 2.9609 LearningRate 0.0132 Epoch: 16 Global Step: 173500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:17:02,412-Speed 5996.71 samples/sec Loss 2.9531 LearningRate 0.0132 Epoch: 16 Global Step: 173510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:17:09,291-Speed 5955.51 samples/sec Loss 2.9782 LearningRate 0.0132 Epoch: 16 Global Step: 173520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:17:16,155-Speed 5968.11 samples/sec Loss 2.9841 LearningRate 0.0132 Epoch: 16 Global Step: 173530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:17:23,008-Speed 5978.36 samples/sec Loss 2.9961 LearningRate 0.0131 Epoch: 16 Global Step: 173540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:17:29,913-Speed 5933.22 samples/sec Loss 2.9649 LearningRate 0.0131 Epoch: 16 Global Step: 173550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-09 06:17:36,800-Speed 5948.87 samples/sec Loss 2.9728 LearningRate 0.0131 Epoch: 16 Global Step: 173560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:17:43,658-Speed 5973.52 samples/sec Loss 2.9734 LearningRate 0.0131 Epoch: 16 Global Step: 173570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:17:50,527-Speed 5964.38 samples/sec Loss 2.9165 LearningRate 0.0131 Epoch: 16 Global Step: 173580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:17:57,411-Speed 5951.24 samples/sec Loss 2.9770 LearningRate 0.0131 Epoch: 16 Global Step: 173590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:04,267-Speed 5975.88 samples/sec Loss 2.9290 LearningRate 0.0131 Epoch: 16 Global Step: 173600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:11,135-Speed 5964.66 samples/sec Loss 2.9950 LearningRate 0.0131 Epoch: 16 Global Step: 173610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:18,017-Speed 5953.15 samples/sec Loss 2.9752 LearningRate 0.0131 Epoch: 16 Global Step: 173620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:24,886-Speed 5963.95 samples/sec Loss 2.9579 LearningRate 0.0131 Epoch: 16 Global Step: 173630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:31,743-Speed 5974.73 samples/sec Loss 2.9705 LearningRate 0.0131 Epoch: 16 Global Step: 173640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:38,600-Speed 5974.29 samples/sec Loss 2.9597 LearningRate 0.0131 Epoch: 16 Global Step: 173650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:45,476-Speed 5957.79 samples/sec Loss 2.9542 LearningRate 0.0131 Epoch: 16 Global Step: 173660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:52,335-Speed 5972.34 samples/sec Loss 2.9867 LearningRate 0.0130 Epoch: 16 Global Step: 173670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:18:59,217-Speed 5953.42 
samples/sec Loss 2.9790 LearningRate 0.0130 Epoch: 16 Global Step: 173680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:19:06,081-Speed 5968.53 samples/sec Loss 2.9645 LearningRate 0.0130 Epoch: 16 Global Step: 173690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:19:12,975-Speed 5942.48 samples/sec Loss 2.9369 LearningRate 0.0130 Epoch: 16 Global Step: 173700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:19:19,829-Speed 5977.11 samples/sec Loss 2.9569 LearningRate 0.0130 Epoch: 16 Global Step: 173710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:19:26,692-Speed 5969.96 samples/sec Loss 2.9373 LearningRate 0.0130 Epoch: 16 Global Step: 173720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 06:19:33,539-Speed 5983.81 samples/sec Loss 2.9642 LearningRate 0.0130 Epoch: 16 Global Step: 173730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:19:40,406-Speed 5965.67 samples/sec Loss 2.9898 LearningRate 0.0130 Epoch: 16 Global Step: 173740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:19:47,266-Speed 5971.88 samples/sec Loss 2.9776 LearningRate 0.0130 Epoch: 16 Global Step: 173750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:19:54,097-Speed 5997.87 samples/sec Loss 2.9030 LearningRate 0.0130 Epoch: 16 Global Step: 173760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:00,957-Speed 5971.54 samples/sec Loss 2.9478 LearningRate 0.0130 Epoch: 16 Global Step: 173770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:07,797-Speed 5989.62 samples/sec Loss 2.9482 LearningRate 0.0130 Epoch: 16 Global Step: 173780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:14,655-Speed 5973.35 samples/sec Loss 2.9437 LearningRate 0.0130 Epoch: 16 Global Step: 173790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:21,504-Speed 5981.19 samples/sec Loss 2.9326 LearningRate 
0.0129 Epoch: 16 Global Step: 173800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:28,356-Speed 5979.62 samples/sec Loss 2.9394 LearningRate 0.0129 Epoch: 16 Global Step: 173810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:35,249-Speed 5943.42 samples/sec Loss 2.9528 LearningRate 0.0129 Epoch: 16 Global Step: 173820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:42,151-Speed 5935.61 samples/sec Loss 2.9165 LearningRate 0.0129 Epoch: 16 Global Step: 173830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:48,996-Speed 5984.95 samples/sec Loss 2.9428 LearningRate 0.0129 Epoch: 16 Global Step: 173840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:20:55,927-Speed 5910.86 samples/sec Loss 2.9415 LearningRate 0.0129 Epoch: 16 Global Step: 173850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:21:02,774-Speed 5983.71 samples/sec Loss 2.9574 LearningRate 0.0129 Epoch: 16 Global Step: 173860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:21:09,619-Speed 5984.54 samples/sec Loss 2.9745 LearningRate 0.0129 Epoch: 16 Global Step: 173870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:21:16,481-Speed 5970.67 samples/sec Loss 2.9543 LearningRate 0.0129 Epoch: 16 Global Step: 173880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:21:23,350-Speed 5963.66 samples/sec Loss 2.9473 LearningRate 0.0129 Epoch: 16 Global Step: 173890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:21:30,200-Speed 5980.98 samples/sec Loss 2.9487 LearningRate 0.0129 Epoch: 16 Global Step: 173900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:21:37,065-Speed 5968.45 samples/sec Loss 2.9495 LearningRate 0.0129 Epoch: 16 Global Step: 173910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:21:43,932-Speed 5966.11 samples/sec Loss 2.9320 LearningRate 0.0129 Epoch: 16 Global Step: 173920 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:21:50,798-Speed 5967.14 samples/sec Loss 2.9352 LearningRate 0.0128 Epoch: 16 Global Step: 173930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:21:57,642-Speed 5987.93 samples/sec Loss 2.9588 LearningRate 0.0128 Epoch: 16 Global Step: 173940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:04,491-Speed 5981.16 samples/sec Loss 2.9437 LearningRate 0.0128 Epoch: 16 Global Step: 173950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:11,343-Speed 5978.50 samples/sec Loss 2.9241 LearningRate 0.0128 Epoch: 16 Global Step: 173960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:18,198-Speed 5976.64 samples/sec Loss 2.9367 LearningRate 0.0128 Epoch: 16 Global Step: 173970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:25,081-Speed 5951.93 samples/sec Loss 2.9676 LearningRate 0.0128 Epoch: 16 Global Step: 173980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:31,946-Speed 5968.14 samples/sec Loss 2.9707 LearningRate 0.0128 Epoch: 16 Global Step: 173990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:38,811-Speed 5968.22 samples/sec Loss 2.9613 LearningRate 0.0128 Epoch: 16 Global Step: 174000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:45,680-Speed 5963.74 samples/sec Loss 2.9460 LearningRate 0.0128 Epoch: 16 Global Step: 174010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:52,567-Speed 5948.70 samples/sec Loss 2.9505 LearningRate 0.0128 Epoch: 16 Global Step: 174020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:22:59,458-Speed 5945.43 samples/sec Loss 2.9112 LearningRate 0.0128 Epoch: 16 Global Step: 174030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:23:06,311-Speed 5977.84 samples/sec Loss 2.9117 LearningRate 0.0128 Epoch: 16 Global Step: 174040 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-01-09 06:23:13,163-Speed 5979.18 samples/sec Loss 2.9227 LearningRate 0.0128 Epoch: 16 Global Step: 174050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:23:20,066-Speed 5937.62 samples/sec Loss 2.9402 LearningRate 0.0127 Epoch: 16 Global Step: 174060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:23:26,925-Speed 5972.27 samples/sec Loss 2.9325 LearningRate 0.0127 Epoch: 16 Global Step: 174070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:23:33,775-Speed 5980.99 samples/sec Loss 2.9618 LearningRate 0.0127 Epoch: 16 Global Step: 174080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:23:40,627-Speed 5981.53 samples/sec Loss 2.9634 LearningRate 0.0127 Epoch: 16 Global Step: 174090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:23:47,506-Speed 5955.00 samples/sec Loss 2.9623 LearningRate 0.0127 Epoch: 16 Global Step: 174100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:23:54,369-Speed 5969.04 samples/sec Loss 2.9441 LearningRate 0.0127 Epoch: 16 Global Step: 174110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:01,224-Speed 5977.16 samples/sec Loss 2.9182 LearningRate 0.0127 Epoch: 16 Global Step: 174120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:08,078-Speed 5976.39 samples/sec Loss 2.9132 LearningRate 0.0127 Epoch: 16 Global Step: 174130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:14,965-Speed 5948.88 samples/sec Loss 2.9614 LearningRate 0.0127 Epoch: 16 Global Step: 174140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:21,822-Speed 5975.15 samples/sec Loss 2.9562 LearningRate 0.0127 Epoch: 16 Global Step: 174150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:28,712-Speed 5945.52 samples/sec Loss 2.9463 LearningRate 0.0127 Epoch: 16 Global Step: 174160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:35,565-Speed 
5978.37 samples/sec Loss 2.9671 LearningRate 0.0127 Epoch: 16 Global Step: 174170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:42,418-Speed 5977.75 samples/sec Loss 2.9142 LearningRate 0.0127 Epoch: 16 Global Step: 174180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:49,275-Speed 5973.41 samples/sec Loss 2.9184 LearningRate 0.0126 Epoch: 16 Global Step: 174190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 06:24:56,129-Speed 5977.06 samples/sec Loss 2.9438 LearningRate 0.0126 Epoch: 16 Global Step: 174200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:25:02,988-Speed 5973.85 samples/sec Loss 2.9367 LearningRate 0.0126 Epoch: 16 Global Step: 174210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 06:25:09,837-Speed 5980.74 samples/sec Loss 2.9417 LearningRate 0.0126 Epoch: 16 Global Step: 174220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:25:16,710-Speed 5960.67 samples/sec Loss 2.9555 LearningRate 0.0126 Epoch: 16 Global Step: 174230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:25:23,606-Speed 5941.05 samples/sec Loss 2.9275 LearningRate 0.0126 Epoch: 16 Global Step: 174240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:25:30,486-Speed 5954.06 samples/sec Loss 2.9382 LearningRate 0.0126 Epoch: 16 Global Step: 174250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:25:37,352-Speed 5967.17 samples/sec Loss 2.9634 LearningRate 0.0126 Epoch: 16 Global Step: 174260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:25:44,205-Speed 5978.92 samples/sec Loss 2.9624 LearningRate 0.0126 Epoch: 16 Global Step: 174270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:25:51,046-Speed 5987.84 samples/sec Loss 2.9287 LearningRate 0.0126 Epoch: 16 Global Step: 174280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:25:57,910-Speed 5971.33 samples/sec Loss 2.8770 
LearningRate 0.0126 Epoch: 16 Global Step: 174290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:26:04,754-Speed 5985.87 samples/sec Loss 2.9293 LearningRate 0.0126 Epoch: 16 Global Step: 174300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:26:11,609-Speed 5978.66 samples/sec Loss 2.8951 LearningRate 0.0126 Epoch: 16 Global Step: 174310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:26:18,457-Speed 5982.68 samples/sec Loss 2.8887 LearningRate 0.0126 Epoch: 16 Global Step: 174320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:26:25,309-Speed 5979.43 samples/sec Loss 2.9112 LearningRate 0.0125 Epoch: 16 Global Step: 174330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:26:32,163-Speed 5976.24 samples/sec Loss 2.9291 LearningRate 0.0125 Epoch: 16 Global Step: 174340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:26:39,023-Speed 5972.58 samples/sec Loss 2.9287 LearningRate 0.0125 Epoch: 16 Global Step: 174350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:26:45,877-Speed 5978.09 samples/sec Loss 2.9098 LearningRate 0.0125 Epoch: 16 Global Step: 174360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:26:52,724-Speed 5982.68 samples/sec Loss 2.9030 LearningRate 0.0125 Epoch: 16 Global Step: 174370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:26:59,570-Speed 5984.25 samples/sec Loss 2.8606 LearningRate 0.0125 Epoch: 16 Global Step: 174380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:27:06,450-Speed 5955.73 samples/sec Loss 2.8844 LearningRate 0.0125 Epoch: 16 Global Step: 174390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 06:27:13,329-Speed 5955.18 samples/sec Loss 2.8900 LearningRate 0.0125 Epoch: 16 Global Step: 174400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:27:20,183-Speed 5976.90 samples/sec Loss 2.8885 LearningRate 0.0125 Epoch: 16 Global Step: 
174410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:27:27,036-Speed 5978.44 samples/sec Loss 2.9312 LearningRate 0.0125 Epoch: 16 Global Step: 174420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:27:33,885-Speed 5981.38 samples/sec Loss 2.9080 LearningRate 0.0125 Epoch: 16 Global Step: 174430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:27:40,736-Speed 5979.64 samples/sec Loss 2.9423 LearningRate 0.0125 Epoch: 16 Global Step: 174440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:27:47,604-Speed 5967.69 samples/sec Loss 2.8566 LearningRate 0.0125 Epoch: 16 Global Step: 174450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:27:54,454-Speed 5980.74 samples/sec Loss 2.9389 LearningRate 0.0124 Epoch: 16 Global Step: 174460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:28:01,295-Speed 5988.57 samples/sec Loss 2.8984 LearningRate 0.0124 Epoch: 16 Global Step: 174470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:28:08,154-Speed 5973.25 samples/sec Loss 2.9132 LearningRate 0.0124 Epoch: 16 Global Step: 174480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:28:14,995-Speed 5988.23 samples/sec Loss 2.9137 LearningRate 0.0124 Epoch: 16 Global Step: 174490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:28:21,863-Speed 5965.30 samples/sec Loss 2.9384 LearningRate 0.0124 Epoch: 16 Global Step: 174500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:28:28,753-Speed 5946.69 samples/sec Loss 2.9233 LearningRate 0.0124 Epoch: 16 Global Step: 174510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:28:35,603-Speed 5980.51 samples/sec Loss 2.9472 LearningRate 0.0124 Epoch: 16 Global Step: 174520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:28:42,450-Speed 5983.46 samples/sec Loss 2.8921 LearningRate 0.0124 Epoch: 16 Global Step: 174530 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-01-09 06:28:49,308-Speed 5975.00 samples/sec Loss 2.9192 LearningRate 0.0124 Epoch: 16 Global Step: 174540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:28:56,184-Speed 5958.00 samples/sec Loss 2.8750 LearningRate 0.0124 Epoch: 16 Global Step: 174550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:29:03,053-Speed 5964.28 samples/sec Loss 2.8989 LearningRate 0.0124 Epoch: 16 Global Step: 174560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:29:09,893-Speed 5989.97 samples/sec Loss 2.9070 LearningRate 0.0124 Epoch: 16 Global Step: 174570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:29:16,762-Speed 5963.99 samples/sec Loss 2.9270 LearningRate 0.0124 Epoch: 16 Global Step: 174580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:29:23,615-Speed 5977.78 samples/sec Loss 2.9031 LearningRate 0.0123 Epoch: 16 Global Step: 174590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:29:30,503-Speed 5948.18 samples/sec Loss 2.8917 LearningRate 0.0123 Epoch: 16 Global Step: 174600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:29:37,358-Speed 5975.53 samples/sec Loss 2.9313 LearningRate 0.0123 Epoch: 16 Global Step: 174610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:29:44,233-Speed 5959.48 samples/sec Loss 2.8837 LearningRate 0.0123 Epoch: 16 Global Step: 174620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:29:51,097-Speed 5970.05 samples/sec Loss 2.9085 LearningRate 0.0123 Epoch: 16 Global Step: 174630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:29:57,962-Speed 5968.20 samples/sec Loss 2.9047 LearningRate 0.0123 Epoch: 16 Global Step: 174640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:30:04,817-Speed 5978.04 samples/sec Loss 2.8943 LearningRate 0.0123 Epoch: 16 Global Step: 174650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 
06:30:11,662-Speed 5985.06 samples/sec Loss 2.8750 LearningRate 0.0123 Epoch: 16 Global Step: 174660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:30:18,519-Speed 5974.94 samples/sec Loss 2.8970 LearningRate 0.0123 Epoch: 16 Global Step: 174670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:30:25,415-Speed 5941.26 samples/sec Loss 2.8614 LearningRate 0.0123 Epoch: 16 Global Step: 174680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:30:32,315-Speed 5937.44 samples/sec Loss 2.8866 LearningRate 0.0123 Epoch: 16 Global Step: 174690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:30:39,184-Speed 5963.98 samples/sec Loss 2.8951 LearningRate 0.0123 Epoch: 16 Global Step: 174700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:30:46,057-Speed 5960.72 samples/sec Loss 2.9034 LearningRate 0.0123 Epoch: 16 Global Step: 174710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:30:52,946-Speed 5949.11 samples/sec Loss 2.9068 LearningRate 0.0122 Epoch: 16 Global Step: 174720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:30:59,804-Speed 5974.21 samples/sec Loss 2.8774 LearningRate 0.0122 Epoch: 16 Global Step: 174730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:31:06,654-Speed 5980.38 samples/sec Loss 2.8730 LearningRate 0.0122 Epoch: 16 Global Step: 174740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:31:13,512-Speed 5973.80 samples/sec Loss 2.8488 LearningRate 0.0122 Epoch: 16 Global Step: 174750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:31:20,360-Speed 5982.04 samples/sec Loss 2.8897 LearningRate 0.0122 Epoch: 16 Global Step: 174760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:31:27,231-Speed 5964.56 samples/sec Loss 2.8834 LearningRate 0.0122 Epoch: 16 Global Step: 174770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:31:34,091-Speed 5971.42 samples/sec Loss 
2.8373 LearningRate 0.0122 Epoch: 16 Global Step: 174780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:31:40,947-Speed 5975.52 samples/sec Loss 2.8551 LearningRate 0.0122 Epoch: 16 Global Step: 174790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:31:47,805-Speed 5974.13 samples/sec Loss 2.8790 LearningRate 0.0122 Epoch: 16 Global Step: 174800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:31:54,674-Speed 5964.43 samples/sec Loss 2.8662 LearningRate 0.0122 Epoch: 16 Global Step: 174810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:32:01,528-Speed 5976.84 samples/sec Loss 2.8818 LearningRate 0.0122 Epoch: 16 Global Step: 174820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:32:08,395-Speed 5967.57 samples/sec Loss 2.8925 LearningRate 0.0122 Epoch: 16 Global Step: 174830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:32:15,384-Speed 5862.45 samples/sec Loss 2.9285 LearningRate 0.0122 Epoch: 16 Global Step: 174840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:32:22,242-Speed 5973.67 samples/sec Loss 2.8669 LearningRate 0.0122 Epoch: 16 Global Step: 174850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:32:29,134-Speed 5946.92 samples/sec Loss 2.8926 LearningRate 0.0121 Epoch: 16 Global Step: 174860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:32:36,007-Speed 5960.81 samples/sec Loss 2.8729 LearningRate 0.0121 Epoch: 16 Global Step: 174870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:32:42,885-Speed 5956.49 samples/sec Loss 2.8224 LearningRate 0.0121 Epoch: 16 Global Step: 174880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:32:49,747-Speed 5970.16 samples/sec Loss 2.8828 LearningRate 0.0121 Epoch: 16 Global Step: 174890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:32:56,619-Speed 5961.84 samples/sec Loss 2.8896 LearningRate 0.0121 Epoch: 16 Global 
Step: 174900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:33:03,490-Speed 5962.78 samples/sec Loss 2.8523 LearningRate 0.0121 Epoch: 16 Global Step: 174910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:33:10,350-Speed 5972.27 samples/sec Loss 2.8895 LearningRate 0.0121 Epoch: 16 Global Step: 174920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:33:17,183-Speed 5995.37 samples/sec Loss 2.8366 LearningRate 0.0121 Epoch: 16 Global Step: 174930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:33:24,042-Speed 5972.65 samples/sec Loss 2.8765 LearningRate 0.0121 Epoch: 16 Global Step: 174940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:33:30,898-Speed 5975.83 samples/sec Loss 2.8561 LearningRate 0.0121 Epoch: 16 Global Step: 174950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:33:37,750-Speed 5978.86 samples/sec Loss 2.8677 LearningRate 0.0121 Epoch: 16 Global Step: 174960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:33:44,592-Speed 5986.92 samples/sec Loss 2.9110 LearningRate 0.0121 Epoch: 16 Global Step: 174970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:33:51,438-Speed 5983.97 samples/sec Loss 2.8573 LearningRate 0.0121 Epoch: 16 Global Step: 174980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:33:58,296-Speed 5974.67 samples/sec Loss 2.8670 LearningRate 0.0120 Epoch: 16 Global Step: 174990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:34:05,222-Speed 5914.87 samples/sec Loss 2.8607 LearningRate 0.0120 Epoch: 16 Global Step: 175000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:34:31,893-[lfw][175000]XNorm: 23.882786 Training: 2022-01-09 06:34:31,893-[lfw][175000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-01-09 06:34:31,894-[lfw][175000]Accuracy-Highest: 0.99817 Training: 2022-01-09 06:35:05,381-[cfp_fp][175000]XNorm: 21.642738 Training: 2022-01-09 
06:35:05,382-[cfp_fp][175000]Accuracy-Flip: 0.99229+-0.00448 Training: 2022-01-09 06:35:05,383-[cfp_fp][175000]Accuracy-Highest: 0.99229 Training: 2022-01-09 06:35:32,122-[agedb_30][175000]XNorm: 23.278589 Training: 2022-01-09 06:35:32,123-[agedb_30][175000]Accuracy-Flip: 0.97833+-0.00654 Training: 2022-01-09 06:35:32,123-[agedb_30][175000]Accuracy-Highest: 0.98067 Training: 2022-01-09 06:35:38,974-Speed 436.91 samples/sec Loss 2.8500 LearningRate 0.0120 Epoch: 16 Global Step: 175010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:35:45,809-Speed 5994.52 samples/sec Loss 2.8315 LearningRate 0.0120 Epoch: 16 Global Step: 175020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:35:52,677-Speed 5965.87 samples/sec Loss 2.8599 LearningRate 0.0120 Epoch: 16 Global Step: 175030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:35:59,562-Speed 5954.18 samples/sec Loss 2.8450 LearningRate 0.0120 Epoch: 16 Global Step: 175040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:36:06,414-Speed 5978.39 samples/sec Loss 2.8548 LearningRate 0.0120 Epoch: 16 Global Step: 175050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:36:13,271-Speed 5974.20 samples/sec Loss 2.8598 LearningRate 0.0120 Epoch: 16 Global Step: 175060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:36:20,127-Speed 5975.70 samples/sec Loss 2.8733 LearningRate 0.0120 Epoch: 16 Global Step: 175070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:36:26,993-Speed 5966.23 samples/sec Loss 2.8930 LearningRate 0.0120 Epoch: 16 Global Step: 175080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:36:33,849-Speed 5976.15 samples/sec Loss 2.8703 LearningRate 0.0120 Epoch: 16 Global Step: 175090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:36:40,698-Speed 5982.36 samples/sec Loss 2.8938 LearningRate 0.0120 Epoch: 16 Global Step: 175100 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-09 06:36:47,567-Speed 5963.72 samples/sec Loss 2.8735 LearningRate 0.0120 Epoch: 16 Global Step: 175110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:36:54,416-Speed 5982.29 samples/sec Loss 2.8739 LearningRate 0.0120 Epoch: 16 Global Step: 175120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:37:01,265-Speed 5980.96 samples/sec Loss 2.8596 LearningRate 0.0119 Epoch: 16 Global Step: 175130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 06:37:08,108-Speed 5987.02 samples/sec Loss 2.8371 LearningRate 0.0119 Epoch: 16 Global Step: 175140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:37:14,959-Speed 5980.04 samples/sec Loss 2.8572 LearningRate 0.0119 Epoch: 16 Global Step: 175150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:37:21,828-Speed 5964.09 samples/sec Loss 2.8730 LearningRate 0.0119 Epoch: 16 Global Step: 175160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:37:28,671-Speed 5986.52 samples/sec Loss 2.8736 LearningRate 0.0119 Epoch: 16 Global Step: 175170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:37:35,525-Speed 5977.97 samples/sec Loss 2.8544 LearningRate 0.0119 Epoch: 16 Global Step: 175180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:37:42,396-Speed 5962.67 samples/sec Loss 2.8796 LearningRate 0.0119 Epoch: 16 Global Step: 175190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:37:49,297-Speed 5936.61 samples/sec Loss 2.8295 LearningRate 0.0119 Epoch: 16 Global Step: 175200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:37:56,148-Speed 5979.95 samples/sec Loss 2.8082 LearningRate 0.0119 Epoch: 16 Global Step: 175210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:03,001-Speed 5981.24 samples/sec Loss 2.8603 LearningRate 0.0119 Epoch: 16 Global Step: 175220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:09,857-Speed 
5975.10 samples/sec Loss 2.8497 LearningRate 0.0119 Epoch: 16 Global Step: 175230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:16,702-Speed 5985.90 samples/sec Loss 2.8614 LearningRate 0.0119 Epoch: 16 Global Step: 175240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:23,571-Speed 5964.19 samples/sec Loss 2.8490 LearningRate 0.0119 Epoch: 16 Global Step: 175250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:30,435-Speed 5968.44 samples/sec Loss 2.8499 LearningRate 0.0118 Epoch: 16 Global Step: 175260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:37,310-Speed 5959.28 samples/sec Loss 2.8715 LearningRate 0.0118 Epoch: 16 Global Step: 175270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:44,149-Speed 5990.89 samples/sec Loss 2.8733 LearningRate 0.0118 Epoch: 16 Global Step: 175280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:50,992-Speed 5986.21 samples/sec Loss 2.8313 LearningRate 0.0118 Epoch: 16 Global Step: 175290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:38:57,846-Speed 5980.17 samples/sec Loss 2.8469 LearningRate 0.0118 Epoch: 16 Global Step: 175300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:39:07,394-Speed 4290.80 samples/sec Loss 2.8283 LearningRate 0.0118 Epoch: 16 Global Step: 175310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:39:14,250-Speed 5975.61 samples/sec Loss 2.8736 LearningRate 0.0118 Epoch: 16 Global Step: 175320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:39:21,096-Speed 5984.37 samples/sec Loss 2.8608 LearningRate 0.0118 Epoch: 16 Global Step: 175330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:39:27,933-Speed 5991.97 samples/sec Loss 2.8461 LearningRate 0.0118 Epoch: 16 Global Step: 175340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:39:34,802-Speed 5963.91 samples/sec Loss 2.8554 
LearningRate 0.0118 Epoch: 16 Global Step: 175350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:39:41,652-Speed 5981.08 samples/sec Loss 2.8524 LearningRate 0.0118 Epoch: 16 Global Step: 175360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:39:48,481-Speed 5998.45 samples/sec Loss 2.8535 LearningRate 0.0118 Epoch: 16 Global Step: 175370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:39:55,323-Speed 5988.43 samples/sec Loss 2.8439 LearningRate 0.0118 Epoch: 16 Global Step: 175380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:02,166-Speed 5989.60 samples/sec Loss 2.8340 LearningRate 0.0118 Epoch: 16 Global Step: 175390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:09,020-Speed 5977.37 samples/sec Loss 2.8379 LearningRate 0.0117 Epoch: 16 Global Step: 175400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:15,854-Speed 5994.27 samples/sec Loss 2.8381 LearningRate 0.0117 Epoch: 16 Global Step: 175410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:22,720-Speed 5969.80 samples/sec Loss 2.8722 LearningRate 0.0117 Epoch: 16 Global Step: 175420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:29,571-Speed 5979.34 samples/sec Loss 2.8539 LearningRate 0.0117 Epoch: 16 Global Step: 175430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:36,407-Speed 5993.22 samples/sec Loss 2.8002 LearningRate 0.0117 Epoch: 16 Global Step: 175440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:43,252-Speed 5984.87 samples/sec Loss 2.8568 LearningRate 0.0117 Epoch: 16 Global Step: 175450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:50,094-Speed 5988.06 samples/sec Loss 2.8538 LearningRate 0.0117 Epoch: 16 Global Step: 175460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:40:56,931-Speed 5991.85 samples/sec Loss 2.8908 LearningRate 0.0117 Epoch: 16 Global Step: 
175470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:41:03,764-Speed 5995.18 samples/sec Loss 2.8241 LearningRate 0.0117 Epoch: 16 Global Step: 175480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:41:10,616-Speed 5980.55 samples/sec Loss 2.8208 LearningRate 0.0117 Epoch: 16 Global Step: 175490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:41:17,457-Speed 5988.49 samples/sec Loss 2.8478 LearningRate 0.0117 Epoch: 16 Global Step: 175500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:41:24,310-Speed 5978.66 samples/sec Loss 2.8595 LearningRate 0.0117 Epoch: 16 Global Step: 175510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:41:31,163-Speed 5977.20 samples/sec Loss 2.8865 LearningRate 0.0117 Epoch: 16 Global Step: 175520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:41:38,010-Speed 5983.61 samples/sec Loss 2.8337 LearningRate 0.0116 Epoch: 16 Global Step: 175530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:41:44,861-Speed 5980.18 samples/sec Loss 2.8562 LearningRate 0.0116 Epoch: 16 Global Step: 175540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:41:51,742-Speed 5953.69 samples/sec Loss 2.8485 LearningRate 0.0116 Epoch: 16 Global Step: 175550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:41:58,623-Speed 5954.13 samples/sec Loss 2.8440 LearningRate 0.0116 Epoch: 16 Global Step: 175560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:42:05,474-Speed 5979.85 samples/sec Loss 2.8476 LearningRate 0.0116 Epoch: 16 Global Step: 175570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:42:12,330-Speed 5975.43 samples/sec Loss 2.8307 LearningRate 0.0116 Epoch: 16 Global Step: 175580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:42:19,184-Speed 5977.23 samples/sec Loss 2.8578 LearningRate 0.0116 Epoch: 16 Global Step: 175590 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-01-09 06:42:26,037-Speed 5978.53 samples/sec Loss 2.8261 LearningRate 0.0116 Epoch: 16 Global Step: 175600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:42:32,886-Speed 5981.30 samples/sec Loss 2.8557 LearningRate 0.0116 Epoch: 16 Global Step: 175610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:42:39,793-Speed 5931.54 samples/sec Loss 2.8178 LearningRate 0.0116 Epoch: 16 Global Step: 175620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:42:46,652-Speed 5973.49 samples/sec Loss 2.8036 LearningRate 0.0116 Epoch: 16 Global Step: 175630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:42:53,504-Speed 5978.46 samples/sec Loss 2.8100 LearningRate 0.0116 Epoch: 16 Global Step: 175640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:43:00,368-Speed 5968.51 samples/sec Loss 2.8340 LearningRate 0.0116 Epoch: 16 Global Step: 175650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:43:07,209-Speed 5988.10 samples/sec Loss 2.8261 LearningRate 0.0116 Epoch: 16 Global Step: 175660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:43:14,073-Speed 5968.96 samples/sec Loss 2.9055 LearningRate 0.0115 Epoch: 16 Global Step: 175670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:43:20,924-Speed 5979.54 samples/sec Loss 2.8538 LearningRate 0.0115 Epoch: 16 Global Step: 175680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 06:43:27,784-Speed 5972.42 samples/sec Loss 2.8059 LearningRate 0.0115 Epoch: 16 Global Step: 175690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 06:43:34,633-Speed 5981.33 samples/sec Loss 2.8264 LearningRate 0.0115 Epoch: 16 Global Step: 175700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:43:41,498-Speed 5967.25 samples/sec Loss 2.7862 LearningRate 0.0115 Epoch: 16 Global Step: 175710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 
06:43:48,358-Speed 5972.05 samples/sec Loss 2.8200 LearningRate 0.0115 Epoch: 16 Global Step: 175720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:43:55,233-Speed 5958.60 samples/sec Loss 2.8409 LearningRate 0.0115 Epoch: 16 Global Step: 175730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:02,097-Speed 5968.99 samples/sec Loss 2.7827 LearningRate 0.0115 Epoch: 16 Global Step: 175740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:08,961-Speed 5969.33 samples/sec Loss 2.7925 LearningRate 0.0115 Epoch: 16 Global Step: 175750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:15,826-Speed 5968.52 samples/sec Loss 2.8436 LearningRate 0.0115 Epoch: 16 Global Step: 175760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:22,689-Speed 5969.24 samples/sec Loss 2.8386 LearningRate 0.0115 Epoch: 16 Global Step: 175770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:29,554-Speed 5967.99 samples/sec Loss 2.8416 LearningRate 0.0115 Epoch: 16 Global Step: 175780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:36,402-Speed 5981.92 samples/sec Loss 2.7980 LearningRate 0.0115 Epoch: 16 Global Step: 175790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:43,247-Speed 5987.05 samples/sec Loss 2.8688 LearningRate 0.0115 Epoch: 16 Global Step: 175800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:50,099-Speed 5978.67 samples/sec Loss 2.8263 LearningRate 0.0114 Epoch: 16 Global Step: 175810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:44:56,952-Speed 5978.38 samples/sec Loss 2.8442 LearningRate 0.0114 Epoch: 16 Global Step: 175820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:45:03,817-Speed 5967.42 samples/sec Loss 2.8212 LearningRate 0.0114 Epoch: 16 Global Step: 175830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:45:10,688-Speed 5962.39 samples/sec Loss 
2.7901 LearningRate 0.0114 Epoch: 16 Global Step: 175840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:45:17,571-Speed 5951.78 samples/sec Loss 2.8100 LearningRate 0.0114 Epoch: 16 Global Step: 175850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:45:24,447-Speed 5958.89 samples/sec Loss 2.8421 LearningRate 0.0114 Epoch: 16 Global Step: 175860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:45:31,299-Speed 5978.93 samples/sec Loss 2.8108 LearningRate 0.0114 Epoch: 16 Global Step: 175870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:45:38,180-Speed 5953.62 samples/sec Loss 2.7786 LearningRate 0.0114 Epoch: 16 Global Step: 175880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:45:45,055-Speed 5959.13 samples/sec Loss 2.8000 LearningRate 0.0114 Epoch: 16 Global Step: 175890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:45:51,908-Speed 5978.13 samples/sec Loss 2.8223 LearningRate 0.0114 Epoch: 16 Global Step: 175900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:45:58,788-Speed 5954.29 samples/sec Loss 2.7703 LearningRate 0.0114 Epoch: 16 Global Step: 175910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:46:05,662-Speed 5960.51 samples/sec Loss 2.7733 LearningRate 0.0114 Epoch: 16 Global Step: 175920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:46:12,554-Speed 5944.52 samples/sec Loss 2.7659 LearningRate 0.0114 Epoch: 16 Global Step: 175930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:46:19,521-Speed 5879.61 samples/sec Loss 2.8275 LearningRate 0.0114 Epoch: 16 Global Step: 175940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:46:26,481-Speed 5886.85 samples/sec Loss 2.8177 LearningRate 0.0113 Epoch: 16 Global Step: 175950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:46:33,366-Speed 5950.01 samples/sec Loss 2.7706 LearningRate 0.0113 Epoch: 16 Global 
Step: 175960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:46:40,224-Speed 5973.97 samples/sec Loss 2.8103 LearningRate 0.0113 Epoch: 16 Global Step: 175970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:46:47,078-Speed 5976.80 samples/sec Loss 2.7977 LearningRate 0.0113 Epoch: 16 Global Step: 175980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:46:54,005-Speed 5913.95 samples/sec Loss 2.8141 LearningRate 0.0113 Epoch: 16 Global Step: 175990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:47:00,856-Speed 5980.18 samples/sec Loss 2.8289 LearningRate 0.0113 Epoch: 16 Global Step: 176000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:47:07,751-Speed 5941.74 samples/sec Loss 2.8048 LearningRate 0.0113 Epoch: 16 Global Step: 176010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:47:14,613-Speed 5970.07 samples/sec Loss 2.8305 LearningRate 0.0113 Epoch: 16 Global Step: 176020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:47:21,479-Speed 5967.00 samples/sec Loss 2.7841 LearningRate 0.0113 Epoch: 16 Global Step: 176030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:47:28,346-Speed 5968.53 samples/sec Loss 2.8135 LearningRate 0.0113 Epoch: 16 Global Step: 176040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:47:35,253-Speed 5932.87 samples/sec Loss 2.8589 LearningRate 0.0113 Epoch: 16 Global Step: 176050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 06:47:42,099-Speed 5984.03 samples/sec Loss 2.7736 LearningRate 0.0113 Epoch: 16 Global Step: 176060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:47:48,966-Speed 5965.87 samples/sec Loss 2.8245 LearningRate 0.0113 Epoch: 16 Global Step: 176070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:47:55,832-Speed 5966.68 samples/sec Loss 2.7940 LearningRate 0.0112 Epoch: 16 Global Step: 176080 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-09 06:48:02,719-Speed 5948.91 samples/sec Loss 2.7934 LearningRate 0.0112 Epoch: 16 Global Step: 176090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:48:09,593-Speed 5959.83 samples/sec Loss 2.7814 LearningRate 0.0112 Epoch: 16 Global Step: 176100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:48:16,459-Speed 5966.83 samples/sec Loss 2.8333 LearningRate 0.0112 Epoch: 16 Global Step: 176110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:48:23,320-Speed 5970.88 samples/sec Loss 2.8079 LearningRate 0.0112 Epoch: 16 Global Step: 176120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:48:30,193-Speed 5961.04 samples/sec Loss 2.8043 LearningRate 0.0112 Epoch: 16 Global Step: 176130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:48:37,076-Speed 5952.83 samples/sec Loss 2.7947 LearningRate 0.0112 Epoch: 16 Global Step: 176140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:48:43,947-Speed 5962.37 samples/sec Loss 2.7758 LearningRate 0.0112 Epoch: 16 Global Step: 176150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:48:50,839-Speed 5947.49 samples/sec Loss 2.8168 LearningRate 0.0112 Epoch: 16 Global Step: 176160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:48:57,720-Speed 5954.13 samples/sec Loss 2.7860 LearningRate 0.0112 Epoch: 16 Global Step: 176170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:49:04,591-Speed 5961.74 samples/sec Loss 2.8099 LearningRate 0.0112 Epoch: 16 Global Step: 176180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:49:11,443-Speed 5979.53 samples/sec Loss 2.8403 LearningRate 0.0112 Epoch: 16 Global Step: 176190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:49:18,295-Speed 5978.47 samples/sec Loss 2.7343 LearningRate 0.0112 Epoch: 16 Global Step: 176200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 
06:49:25,162-Speed 5965.52 samples/sec Loss 2.7701 LearningRate 0.0112 Epoch: 16 Global Step: 176210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:49:32,023-Speed 5971.19 samples/sec Loss 2.8083 LearningRate 0.0111 Epoch: 16 Global Step: 176220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:49:38,894-Speed 5962.71 samples/sec Loss 2.7771 LearningRate 0.0111 Epoch: 16 Global Step: 176230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:49:45,758-Speed 5968.28 samples/sec Loss 2.8028 LearningRate 0.0111 Epoch: 16 Global Step: 176240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:49:52,630-Speed 5962.20 samples/sec Loss 2.7874 LearningRate 0.0111 Epoch: 16 Global Step: 176250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:49:59,479-Speed 5981.39 samples/sec Loss 2.8081 LearningRate 0.0111 Epoch: 16 Global Step: 176260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:50:06,338-Speed 5973.11 samples/sec Loss 2.7670 LearningRate 0.0111 Epoch: 16 Global Step: 176270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:50:13,194-Speed 5976.14 samples/sec Loss 2.8266 LearningRate 0.0111 Epoch: 16 Global Step: 176280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:50:36,588-Speed 1751.01 samples/sec Loss 2.8342 LearningRate 0.0111 Epoch: 17 Global Step: 176290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:50:43,436-Speed 5983.23 samples/sec Loss 2.7696 LearningRate 0.0111 Epoch: 17 Global Step: 176300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:50:50,279-Speed 5986.51 samples/sec Loss 2.8336 LearningRate 0.0111 Epoch: 17 Global Step: 176310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:50:57,109-Speed 5998.75 samples/sec Loss 2.8112 LearningRate 0.0111 Epoch: 17 Global Step: 176320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:51:03,966-Speed 5976.44 samples/sec Loss 
2.8130 LearningRate 0.0111 Epoch: 17 Global Step: 176330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:51:10,808-Speed 5987.19 samples/sec Loss 2.7831 LearningRate 0.0111 Epoch: 17 Global Step: 176340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:51:17,637-Speed 5998.96 samples/sec Loss 2.7881 LearningRate 0.0111 Epoch: 17 Global Step: 176350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:51:24,492-Speed 5976.69 samples/sec Loss 2.7898 LearningRate 0.0110 Epoch: 17 Global Step: 176360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:51:31,332-Speed 5990.14 samples/sec Loss 2.7399 LearningRate 0.0110 Epoch: 17 Global Step: 176370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:51:38,204-Speed 5961.03 samples/sec Loss 2.7776 LearningRate 0.0110 Epoch: 17 Global Step: 176380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:51:45,072-Speed 5964.72 samples/sec Loss 2.7911 LearningRate 0.0110 Epoch: 17 Global Step: 176390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:51:51,916-Speed 5986.23 samples/sec Loss 2.7523 LearningRate 0.0110 Epoch: 17 Global Step: 176400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:51:58,796-Speed 5954.21 samples/sec Loss 2.7906 LearningRate 0.0110 Epoch: 17 Global Step: 176410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:52:05,656-Speed 5972.76 samples/sec Loss 2.7823 LearningRate 0.0110 Epoch: 17 Global Step: 176420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:52:12,509-Speed 5977.45 samples/sec Loss 2.7766 LearningRate 0.0110 Epoch: 17 Global Step: 176430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:52:19,384-Speed 5959.60 samples/sec Loss 2.7870 LearningRate 0.0110 Epoch: 17 Global Step: 176440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:52:26,249-Speed 5967.62 samples/sec Loss 2.7499 LearningRate 0.0110 Epoch: 17 Global 
Step: 176450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:52:33,105-Speed 5974.73 samples/sec Loss 2.7716 LearningRate 0.0110 Epoch: 17 Global Step: 176460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:52:39,980-Speed 5959.36 samples/sec Loss 2.7529 LearningRate 0.0110 Epoch: 17 Global Step: 176470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:52:46,830-Speed 5980.82 samples/sec Loss 2.7411 LearningRate 0.0110 Epoch: 17 Global Step: 176480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:52:53,691-Speed 5971.26 samples/sec Loss 2.7373 LearningRate 0.0110 Epoch: 17 Global Step: 176490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:53:00,575-Speed 5951.03 samples/sec Loss 2.7710 LearningRate 0.0109 Epoch: 17 Global Step: 176500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:53:07,447-Speed 5961.31 samples/sec Loss 2.7767 LearningRate 0.0109 Epoch: 17 Global Step: 176510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:53:14,320-Speed 5960.17 samples/sec Loss 2.7517 LearningRate 0.0109 Epoch: 17 Global Step: 176520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:53:21,179-Speed 5973.17 samples/sec Loss 2.7669 LearningRate 0.0109 Epoch: 17 Global Step: 176530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:53:28,047-Speed 5964.77 samples/sec Loss 2.7473 LearningRate 0.0109 Epoch: 17 Global Step: 176540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:53:34,898-Speed 5979.90 samples/sec Loss 2.7134 LearningRate 0.0109 Epoch: 17 Global Step: 176550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:53:41,759-Speed 5971.17 samples/sec Loss 2.7529 LearningRate 0.0109 Epoch: 17 Global Step: 176560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:53:48,613-Speed 5976.83 samples/sec Loss 2.7694 LearningRate 0.0109 Epoch: 17 Global Step: 176570 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-09 06:53:55,473-Speed 5971.51 samples/sec Loss 2.7732 LearningRate 0.0109 Epoch: 17 Global Step: 176580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:54:02,338-Speed 5968.17 samples/sec Loss 2.7705 LearningRate 0.0109 Epoch: 17 Global Step: 176590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:54:09,225-Speed 5948.38 samples/sec Loss 2.7651 LearningRate 0.0109 Epoch: 17 Global Step: 176600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 06:54:16,080-Speed 5976.54 samples/sec Loss 2.7475 LearningRate 0.0109 Epoch: 17 Global Step: 176610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:54:22,927-Speed 5982.67 samples/sec Loss 2.7269 LearningRate 0.0109 Epoch: 17 Global Step: 176620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:54:29,790-Speed 5969.41 samples/sec Loss 2.8017 LearningRate 0.0109 Epoch: 17 Global Step: 176630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:54:36,650-Speed 5973.24 samples/sec Loss 2.7319 LearningRate 0.0109 Epoch: 17 Global Step: 176640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:54:43,506-Speed 5977.92 samples/sec Loss 2.7446 LearningRate 0.0108 Epoch: 17 Global Step: 176650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:54:50,362-Speed 5975.75 samples/sec Loss 2.7489 LearningRate 0.0108 Epoch: 17 Global Step: 176660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:54:57,238-Speed 5958.11 samples/sec Loss 2.7382 LearningRate 0.0108 Epoch: 17 Global Step: 176670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:55:04,091-Speed 5978.32 samples/sec Loss 2.7719 LearningRate 0.0108 Epoch: 17 Global Step: 176680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:55:10,961-Speed 5963.56 samples/sec Loss 2.8172 LearningRate 0.0108 Epoch: 17 Global Step: 176690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 
06:55:17,838-Speed 5957.57 samples/sec Loss 2.7730 LearningRate 0.0108 Epoch: 17 Global Step: 176700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:55:24,708-Speed 5963.80 samples/sec Loss 2.7629 LearningRate 0.0108 Epoch: 17 Global Step: 176710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:55:31,571-Speed 5968.82 samples/sec Loss 2.7439 LearningRate 0.0108 Epoch: 17 Global Step: 176720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:55:38,448-Speed 5958.73 samples/sec Loss 2.7618 LearningRate 0.0108 Epoch: 17 Global Step: 176730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:55:45,304-Speed 5976.90 samples/sec Loss 2.7684 LearningRate 0.0108 Epoch: 17 Global Step: 176740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:55:52,172-Speed 5964.68 samples/sec Loss 2.7666 LearningRate 0.0108 Epoch: 17 Global Step: 176750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:55:59,038-Speed 5966.74 samples/sec Loss 2.7383 LearningRate 0.0108 Epoch: 17 Global Step: 176760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:56:05,913-Speed 5959.44 samples/sec Loss 2.7577 LearningRate 0.0108 Epoch: 17 Global Step: 176770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:56:12,776-Speed 5969.20 samples/sec Loss 2.7443 LearningRate 0.0108 Epoch: 17 Global Step: 176780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:56:19,646-Speed 5963.82 samples/sec Loss 2.7274 LearningRate 0.0107 Epoch: 17 Global Step: 176790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:56:26,546-Speed 5941.24 samples/sec Loss 2.7479 LearningRate 0.0107 Epoch: 17 Global Step: 176800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:56:33,412-Speed 5967.26 samples/sec Loss 2.7575 LearningRate 0.0107 Epoch: 17 Global Step: 176810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:56:40,262-Speed 5980.49 samples/sec Loss 
2.7609 LearningRate 0.0107 Epoch: 17 Global Step: 176820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:56:47,110-Speed 5983.12 samples/sec Loss 2.7605 LearningRate 0.0107 Epoch: 17 Global Step: 176830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:56:53,982-Speed 5964.07 samples/sec Loss 2.7814 LearningRate 0.0107 Epoch: 17 Global Step: 176840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:00,867-Speed 5950.77 samples/sec Loss 2.7255 LearningRate 0.0107 Epoch: 17 Global Step: 176850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:07,720-Speed 5977.98 samples/sec Loss 2.7519 LearningRate 0.0107 Epoch: 17 Global Step: 176860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:14,580-Speed 5972.01 samples/sec Loss 2.7184 LearningRate 0.0107 Epoch: 17 Global Step: 176870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:21,536-Speed 5889.79 samples/sec Loss 2.7707 LearningRate 0.0107 Epoch: 17 Global Step: 176880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:28,476-Speed 5903.30 samples/sec Loss 2.7278 LearningRate 0.0107 Epoch: 17 Global Step: 176890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:35,324-Speed 5982.36 samples/sec Loss 2.7199 LearningRate 0.0107 Epoch: 17 Global Step: 176900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:42,186-Speed 5970.14 samples/sec Loss 2.7534 LearningRate 0.0107 Epoch: 17 Global Step: 176910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:49,072-Speed 5948.63 samples/sec Loss 2.7676 LearningRate 0.0107 Epoch: 17 Global Step: 176920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:57:55,953-Speed 5954.50 samples/sec Loss 2.7259 LearningRate 0.0106 Epoch: 17 Global Step: 176930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:58:02,819-Speed 5966.91 samples/sec Loss 2.7272 LearningRate 0.0106 Epoch: 17 Global 
Step: 176940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:58:09,667-Speed 5982.34 samples/sec Loss 2.7554 LearningRate 0.0106 Epoch: 17 Global Step: 176950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:58:16,520-Speed 5978.31 samples/sec Loss 2.7691 LearningRate 0.0106 Epoch: 17 Global Step: 176960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:58:23,396-Speed 5958.31 samples/sec Loss 2.7543 LearningRate 0.0106 Epoch: 17 Global Step: 176970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:58:30,244-Speed 5982.96 samples/sec Loss 2.7015 LearningRate 0.0106 Epoch: 17 Global Step: 176980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:58:37,109-Speed 5967.40 samples/sec Loss 2.7299 LearningRate 0.0106 Epoch: 17 Global Step: 176990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:58:43,969-Speed 5974.12 samples/sec Loss 2.7385 LearningRate 0.0106 Epoch: 17 Global Step: 177000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:58:50,821-Speed 5978.82 samples/sec Loss 2.7307 LearningRate 0.0106 Epoch: 17 Global Step: 177010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:58:57,682-Speed 5971.76 samples/sec Loss 2.7537 LearningRate 0.0106 Epoch: 17 Global Step: 177020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:59:04,530-Speed 5982.64 samples/sec Loss 2.7442 LearningRate 0.0106 Epoch: 17 Global Step: 177030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:59:11,384-Speed 5976.85 samples/sec Loss 2.7154 LearningRate 0.0106 Epoch: 17 Global Step: 177040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:59:18,277-Speed 5943.35 samples/sec Loss 2.7429 LearningRate 0.0106 Epoch: 17 Global Step: 177050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 06:59:25,188-Speed 5927.71 samples/sec Loss 2.7060 LearningRate 0.0106 Epoch: 17 Global Step: 177060 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-09 06:59:32,086-Speed 5939.62 samples/sec Loss 2.7132 LearningRate 0.0105 Epoch: 17 Global Step: 177070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:59:38,950-Speed 5968.92 samples/sec Loss 2.7581 LearningRate 0.0105 Epoch: 17 Global Step: 177080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:59:45,812-Speed 5969.55 samples/sec Loss 2.7497 LearningRate 0.0105 Epoch: 17 Global Step: 177090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:59:52,715-Speed 5934.56 samples/sec Loss 2.7152 LearningRate 0.0105 Epoch: 17 Global Step: 177100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 06:59:59,566-Speed 5980.01 samples/sec Loss 2.7541 LearningRate 0.0105 Epoch: 17 Global Step: 177110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:00:06,445-Speed 5955.61 samples/sec Loss 2.7858 LearningRate 0.0105 Epoch: 17 Global Step: 177120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:00:13,319-Speed 5960.12 samples/sec Loss 2.7116 LearningRate 0.0105 Epoch: 17 Global Step: 177130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:00:20,174-Speed 5976.57 samples/sec Loss 2.7126 LearningRate 0.0105 Epoch: 17 Global Step: 177140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:00:27,032-Speed 5973.47 samples/sec Loss 2.7134 LearningRate 0.0105 Epoch: 17 Global Step: 177150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:00:33,881-Speed 5981.36 samples/sec Loss 2.7265 LearningRate 0.0105 Epoch: 17 Global Step: 177160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:00:40,722-Speed 5988.37 samples/sec Loss 2.7383 LearningRate 0.0105 Epoch: 17 Global Step: 177170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:00:47,596-Speed 5960.03 samples/sec Loss 2.7375 LearningRate 0.0105 Epoch: 17 Global Step: 177180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 
07:00:54,450-Speed 5976.92 samples/sec Loss 2.7458 LearningRate 0.0105 Epoch: 17 Global Step: 177190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:01:01,328-Speed 5956.77 samples/sec Loss 2.7218 LearningRate 0.0105 Epoch: 17 Global Step: 177200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:01:08,204-Speed 5958.05 samples/sec Loss 2.7450 LearningRate 0.0105 Epoch: 17 Global Step: 177210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:01:15,045-Speed 5987.55 samples/sec Loss 2.7078 LearningRate 0.0104 Epoch: 17 Global Step: 177220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:01:21,918-Speed 5963.41 samples/sec Loss 2.6950 LearningRate 0.0104 Epoch: 17 Global Step: 177230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:01:28,854-Speed 5906.47 samples/sec Loss 2.6957 LearningRate 0.0104 Epoch: 17 Global Step: 177240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:01:35,778-Speed 5917.42 samples/sec Loss 2.6837 LearningRate 0.0104 Epoch: 17 Global Step: 177250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:01:42,645-Speed 5965.00 samples/sec Loss 2.6934 LearningRate 0.0104 Epoch: 17 Global Step: 177260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:01:49,492-Speed 5984.90 samples/sec Loss 2.7672 LearningRate 0.0104 Epoch: 17 Global Step: 177270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:01:56,369-Speed 5957.11 samples/sec Loss 2.7568 LearningRate 0.0104 Epoch: 17 Global Step: 177280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:02:03,222-Speed 5980.59 samples/sec Loss 2.7230 LearningRate 0.0104 Epoch: 17 Global Step: 177290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:02:10,081-Speed 5972.49 samples/sec Loss 2.6781 LearningRate 0.0104 Epoch: 17 Global Step: 177300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:02:16,965-Speed 5953.57 samples/sec Loss 
2.7079 LearningRate 0.0104 Epoch: 17 Global Step: 177310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:02:23,830-Speed 5968.01 samples/sec Loss 2.7463 LearningRate 0.0104 Epoch: 17 Global Step: 177320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:02:30,702-Speed 5961.99 samples/sec Loss 2.6582 LearningRate 0.0104 Epoch: 17 Global Step: 177330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:02:37,559-Speed 5974.77 samples/sec Loss 2.7323 LearningRate 0.0104 Epoch: 17 Global Step: 177340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:02:44,427-Speed 5965.21 samples/sec Loss 2.6858 LearningRate 0.0104 Epoch: 17 Global Step: 177350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:02:51,286-Speed 5972.54 samples/sec Loss 2.7400 LearningRate 0.0103 Epoch: 17 Global Step: 177360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:02:58,167-Speed 5953.46 samples/sec Loss 2.6829 LearningRate 0.0103 Epoch: 17 Global Step: 177370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:03:05,071-Speed 5933.99 samples/sec Loss 2.7211 LearningRate 0.0103 Epoch: 17 Global Step: 177380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:03:11,929-Speed 5973.49 samples/sec Loss 2.6947 LearningRate 0.0103 Epoch: 17 Global Step: 177390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:03:18,796-Speed 5966.06 samples/sec Loss 2.6992 LearningRate 0.0103 Epoch: 17 Global Step: 177400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:03:25,647-Speed 5979.77 samples/sec Loss 2.7105 LearningRate 0.0103 Epoch: 17 Global Step: 177410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:03:32,499-Speed 5979.19 samples/sec Loss 2.6804 LearningRate 0.0103 Epoch: 17 Global Step: 177420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:03:39,361-Speed 5970.33 samples/sec Loss 2.7121 LearningRate 0.0103 Epoch: 17 Global 
Step: 177430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:03:46,253-Speed 5946.69 samples/sec Loss 2.6996 LearningRate 0.0103 Epoch: 17 Global Step: 177440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:03:53,113-Speed 5971.83 samples/sec Loss 2.6657 LearningRate 0.0103 Epoch: 17 Global Step: 177450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:03:59,980-Speed 5966.34 samples/sec Loss 2.7297 LearningRate 0.0103 Epoch: 17 Global Step: 177460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:04:06,836-Speed 5974.70 samples/sec Loss 2.6997 LearningRate 0.0103 Epoch: 17 Global Step: 177470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:04:13,698-Speed 5970.83 samples/sec Loss 2.7366 LearningRate 0.0103 Epoch: 17 Global Step: 177480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:04:20,546-Speed 5983.18 samples/sec Loss 2.6768 LearningRate 0.0103 Epoch: 17 Global Step: 177490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:04:27,398-Speed 5978.73 samples/sec Loss 2.7018 LearningRate 0.0103 Epoch: 17 Global Step: 177500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:04:34,259-Speed 5971.13 samples/sec Loss 2.6961 LearningRate 0.0102 Epoch: 17 Global Step: 177510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:04:41,117-Speed 5973.58 samples/sec Loss 2.7023 LearningRate 0.0102 Epoch: 17 Global Step: 177520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:04:48,025-Speed 5930.36 samples/sec Loss 2.7178 LearningRate 0.0102 Epoch: 17 Global Step: 177530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:04:54,907-Speed 5954.44 samples/sec Loss 2.6888 LearningRate 0.0102 Epoch: 17 Global Step: 177540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:05:01,756-Speed 5980.81 samples/sec Loss 2.7121 LearningRate 0.0102 Epoch: 17 Global Step: 177550 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-09 07:05:08,625-Speed 5966.65 samples/sec Loss 2.7158 LearningRate 0.0102 Epoch: 17 Global Step: 177560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:05:15,497-Speed 5961.78 samples/sec Loss 2.6914 LearningRate 0.0102 Epoch: 17 Global Step: 177570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:05:22,376-Speed 5957.01 samples/sec Loss 2.6658 LearningRate 0.0102 Epoch: 17 Global Step: 177580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:05:29,248-Speed 5961.67 samples/sec Loss 2.7230 LearningRate 0.0102 Epoch: 17 Global Step: 177590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:05:36,180-Speed 5910.33 samples/sec Loss 2.6782 LearningRate 0.0102 Epoch: 17 Global Step: 177600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:05:43,022-Speed 5988.28 samples/sec Loss 2.7075 LearningRate 0.0102 Epoch: 17 Global Step: 177610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:05:49,987-Speed 5882.02 samples/sec Loss 2.6942 LearningRate 0.0102 Epoch: 17 Global Step: 177620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:05:56,959-Speed 5876.81 samples/sec Loss 2.7280 LearningRate 0.0102 Epoch: 17 Global Step: 177630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:06:03,920-Speed 5885.07 samples/sec Loss 2.6758 LearningRate 0.0102 Epoch: 17 Global Step: 177640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:06:10,868-Speed 5896.23 samples/sec Loss 2.7089 LearningRate 0.0101 Epoch: 17 Global Step: 177650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:06:17,712-Speed 5986.31 samples/sec Loss 2.7021 LearningRate 0.0101 Epoch: 17 Global Step: 177660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:06:24,576-Speed 5968.69 samples/sec Loss 2.6602 LearningRate 0.0101 Epoch: 17 Global Step: 177670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 
07:06:31,432-Speed 5976.11 samples/sec Loss 2.6651 LearningRate 0.0101 Epoch: 17 Global Step: 177680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:06:38,279-Speed 5983.96 samples/sec Loss 2.6964 LearningRate 0.0101 Epoch: 17 Global Step: 177690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:06:45,136-Speed 5974.95 samples/sec Loss 2.6735 LearningRate 0.0101 Epoch: 17 Global Step: 177700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:06:52,117-Speed 5871.43 samples/sec Loss 2.6883 LearningRate 0.0101 Epoch: 17 Global Step: 177710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:06:59,060-Speed 5900.75 samples/sec Loss 2.6599 LearningRate 0.0101 Epoch: 17 Global Step: 177720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 07:07:05,924-Speed 5968.90 samples/sec Loss 2.7087 LearningRate 0.0101 Epoch: 17 Global Step: 177730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 07:07:12,779-Speed 5977.01 samples/sec Loss 2.6558 LearningRate 0.0101 Epoch: 17 Global Step: 177740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:07:19,649-Speed 5966.28 samples/sec Loss 2.6780 LearningRate 0.0101 Epoch: 17 Global Step: 177750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:07:26,508-Speed 5974.04 samples/sec Loss 2.7003 LearningRate 0.0101 Epoch: 17 Global Step: 177760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:07:33,377-Speed 5964.51 samples/sec Loss 2.6609 LearningRate 0.0101 Epoch: 17 Global Step: 177770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:07:40,231-Speed 5977.18 samples/sec Loss 2.6851 LearningRate 0.0101 Epoch: 17 Global Step: 177780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:07:47,098-Speed 5965.83 samples/sec Loss 2.6684 LearningRate 0.0101 Epoch: 17 Global Step: 177790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:07:53,951-Speed 5978.18 samples/sec 
Loss 2.7002 LearningRate 0.0100 Epoch: 17 Global Step: 177800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:08:00,819-Speed 5965.15 samples/sec Loss 2.6883 LearningRate 0.0100 Epoch: 17 Global Step: 177810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:08:07,697-Speed 5958.08 samples/sec Loss 2.6874 LearningRate 0.0100 Epoch: 17 Global Step: 177820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:08:14,573-Speed 5959.15 samples/sec Loss 2.6956 LearningRate 0.0100 Epoch: 17 Global Step: 177830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:08:21,431-Speed 5974.19 samples/sec Loss 2.7028 LearningRate 0.0100 Epoch: 17 Global Step: 177840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:08:28,289-Speed 5973.32 samples/sec Loss 2.7089 LearningRate 0.0100 Epoch: 17 Global Step: 177850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:08:35,141-Speed 5981.31 samples/sec Loss 2.6851 LearningRate 0.0100 Epoch: 17 Global Step: 177860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:08:42,023-Speed 5952.47 samples/sec Loss 2.7273 LearningRate 0.0100 Epoch: 17 Global Step: 177870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:08:48,868-Speed 5985.53 samples/sec Loss 2.6737 LearningRate 0.0100 Epoch: 17 Global Step: 177880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:08:55,726-Speed 5973.53 samples/sec Loss 2.6815 LearningRate 0.0100 Epoch: 17 Global Step: 177890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:09:02,576-Speed 5980.76 samples/sec Loss 2.6612 LearningRate 0.0100 Epoch: 17 Global Step: 177900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:09:09,428-Speed 5978.46 samples/sec Loss 2.6966 LearningRate 0.0100 Epoch: 17 Global Step: 177910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:09:16,306-Speed 5957.29 samples/sec Loss 2.6776 LearningRate 0.0100 Epoch: 17 
Global Step: 177920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:09:23,179-Speed 5960.68 samples/sec Loss 2.6534 LearningRate 0.0100 Epoch: 17 Global Step: 177930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:09:30,041-Speed 5969.78 samples/sec Loss 2.6866 LearningRate 0.0100 Epoch: 17 Global Step: 177940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:09:36,932-Speed 5945.30 samples/sec Loss 2.6658 LearningRate 0.0099 Epoch: 17 Global Step: 177950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:09:43,795-Speed 5969.67 samples/sec Loss 2.6722 LearningRate 0.0099 Epoch: 17 Global Step: 177960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:09:50,666-Speed 5962.15 samples/sec Loss 2.6726 LearningRate 0.0099 Epoch: 17 Global Step: 177970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 07:09:57,514-Speed 5982.66 samples/sec Loss 2.6688 LearningRate 0.0099 Epoch: 17 Global Step: 177980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:10:04,364-Speed 5980.13 samples/sec Loss 2.6644 LearningRate 0.0099 Epoch: 17 Global Step: 177990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:10:11,237-Speed 5960.46 samples/sec Loss 2.6361 LearningRate 0.0099 Epoch: 17 Global Step: 178000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:10:18,100-Speed 5969.99 samples/sec Loss 2.6482 LearningRate 0.0099 Epoch: 17 Global Step: 178010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:10:24,943-Speed 5986.78 samples/sec Loss 2.6211 LearningRate 0.0099 Epoch: 17 Global Step: 178020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:10:31,805-Speed 5970.14 samples/sec Loss 2.6497 LearningRate 0.0099 Epoch: 17 Global Step: 178030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:10:38,661-Speed 5974.87 samples/sec Loss 2.6481 LearningRate 0.0099 Epoch: 17 Global Step: 178040 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-09 07:10:45,512-Speed 5979.71 samples/sec Loss 2.6432 LearningRate 0.0099 Epoch: 17 Global Step: 178050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:10:52,404-Speed 5943.52 samples/sec Loss 2.6572 LearningRate 0.0099 Epoch: 17 Global Step: 178060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:10:59,259-Speed 5978.97 samples/sec Loss 2.7166 LearningRate 0.0099 Epoch: 17 Global Step: 178070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:11:06,113-Speed 5980.79 samples/sec Loss 2.6622 LearningRate 0.0099 Epoch: 17 Global Step: 178080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 07:11:12,963-Speed 5980.16 samples/sec Loss 2.6457 LearningRate 0.0099 Epoch: 17 Global Step: 178090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 07:11:19,808-Speed 5985.48 samples/sec Loss 2.6497 LearningRate 0.0098 Epoch: 17 Global Step: 178100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:11:26,662-Speed 5977.84 samples/sec Loss 2.6722 LearningRate 0.0098 Epoch: 17 Global Step: 178110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:11:33,522-Speed 5975.22 samples/sec Loss 2.6126 LearningRate 0.0098 Epoch: 17 Global Step: 178120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:11:40,381-Speed 5972.05 samples/sec Loss 2.6506 LearningRate 0.0098 Epoch: 17 Global Step: 178130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:11:47,242-Speed 5971.21 samples/sec Loss 2.6819 LearningRate 0.0098 Epoch: 17 Global Step: 178140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:11:54,093-Speed 5979.09 samples/sec Loss 2.6359 LearningRate 0.0098 Epoch: 17 Global Step: 178150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:12:00,989-Speed 5943.32 samples/sec Loss 2.6808 LearningRate 0.0098 Epoch: 17 Global Step: 178160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 
07:12:07,861-Speed 5961.62 samples/sec Loss 2.6683 LearningRate 0.0098 Epoch: 17 Global Step: 178170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:12:14,716-Speed 5975.57 samples/sec Loss 2.6793 LearningRate 0.0098 Epoch: 17 Global Step: 178180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:12:21,573-Speed 5974.96 samples/sec Loss 2.6265 LearningRate 0.0098 Epoch: 17 Global Step: 178190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:12:28,456-Speed 5951.64 samples/sec Loss 2.6476 LearningRate 0.0098 Epoch: 17 Global Step: 178200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:12:35,308-Speed 5979.34 samples/sec Loss 2.6298 LearningRate 0.0098 Epoch: 17 Global Step: 178210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:12:42,150-Speed 5988.53 samples/sec Loss 2.6537 LearningRate 0.0098 Epoch: 17 Global Step: 178220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:12:49,019-Speed 5963.88 samples/sec Loss 2.6294 LearningRate 0.0098 Epoch: 17 Global Step: 178230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:12:55,882-Speed 5969.38 samples/sec Loss 2.6423 LearningRate 0.0098 Epoch: 17 Global Step: 178240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:13:02,756-Speed 5960.86 samples/sec Loss 2.6870 LearningRate 0.0097 Epoch: 17 Global Step: 178250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:13:09,608-Speed 5979.25 samples/sec Loss 2.6799 LearningRate 0.0097 Epoch: 17 Global Step: 178260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:13:16,458-Speed 5980.59 samples/sec Loss 2.6866 LearningRate 0.0097 Epoch: 17 Global Step: 178270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:13:23,324-Speed 5966.81 samples/sec Loss 2.6436 LearningRate 0.0097 Epoch: 17 Global Step: 178280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:13:30,190-Speed 5966.38 samples/sec Loss 
2.6343 LearningRate 0.0097 Epoch: 17 Global Step: 178290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:13:37,067-Speed 5958.27 samples/sec Loss 2.6714 LearningRate 0.0097 Epoch: 17 Global Step: 178300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:13:43,926-Speed 5973.19 samples/sec Loss 2.6466 LearningRate 0.0097 Epoch: 17 Global Step: 178310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:13:50,782-Speed 5975.19 samples/sec Loss 2.6617 LearningRate 0.0097 Epoch: 17 Global Step: 178320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:13:57,643-Speed 5971.05 samples/sec Loss 2.6392 LearningRate 0.0097 Epoch: 17 Global Step: 178330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:04,493-Speed 5981.60 samples/sec Loss 2.6609 LearningRate 0.0097 Epoch: 17 Global Step: 178340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:11,339-Speed 5984.53 samples/sec Loss 2.6645 LearningRate 0.0097 Epoch: 17 Global Step: 178350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:18,184-Speed 5985.13 samples/sec Loss 2.6379 LearningRate 0.0097 Epoch: 17 Global Step: 178360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:25,038-Speed 5976.60 samples/sec Loss 2.6393 LearningRate 0.0097 Epoch: 17 Global Step: 178370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:31,889-Speed 5982.03 samples/sec Loss 2.6425 LearningRate 0.0097 Epoch: 17 Global Step: 178380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:38,730-Speed 5989.50 samples/sec Loss 2.6166 LearningRate 0.0097 Epoch: 17 Global Step: 178390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:45,607-Speed 5956.59 samples/sec Loss 2.6411 LearningRate 0.0096 Epoch: 17 Global Step: 178400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:52,451-Speed 5986.48 samples/sec Loss 2.6095 LearningRate 0.0096 Epoch: 17 Global 
Step: 178410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:14:59,289-Speed 5990.43 samples/sec Loss 2.6281 LearningRate 0.0096 Epoch: 17 Global Step: 178420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:15:06,173-Speed 5951.20 samples/sec Loss 2.6619 LearningRate 0.0096 Epoch: 17 Global Step: 178430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:15:13,028-Speed 5980.28 samples/sec Loss 2.6304 LearningRate 0.0096 Epoch: 17 Global Step: 178440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:15:19,879-Speed 5979.59 samples/sec Loss 2.6375 LearningRate 0.0096 Epoch: 17 Global Step: 178450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:15:26,729-Speed 5983.10 samples/sec Loss 2.6484 LearningRate 0.0096 Epoch: 17 Global Step: 178460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:15:33,579-Speed 5982.01 samples/sec Loss 2.6184 LearningRate 0.0096 Epoch: 17 Global Step: 178470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:15:40,436-Speed 5974.89 samples/sec Loss 2.6722 LearningRate 0.0096 Epoch: 17 Global Step: 178480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:15:47,296-Speed 5971.74 samples/sec Loss 2.5991 LearningRate 0.0096 Epoch: 17 Global Step: 178490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:15:54,154-Speed 5973.85 samples/sec Loss 2.6534 LearningRate 0.0096 Epoch: 17 Global Step: 178500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:16:01,013-Speed 5973.15 samples/sec Loss 2.6710 LearningRate 0.0096 Epoch: 17 Global Step: 178510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:16:07,877-Speed 5969.14 samples/sec Loss 2.6307 LearningRate 0.0096 Epoch: 17 Global Step: 178520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:16:14,751-Speed 5959.49 samples/sec Loss 2.6446 LearningRate 0.0096 Epoch: 17 Global Step: 178530 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-09 07:16:21,611-Speed 5972.39 samples/sec Loss 2.6561 LearningRate 0.0096 Epoch: 17 Global Step: 178540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:16:28,475-Speed 5967.90 samples/sec Loss 2.6146 LearningRate 0.0095 Epoch: 17 Global Step: 178550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:16:35,331-Speed 5975.82 samples/sec Loss 2.6425 LearningRate 0.0095 Epoch: 17 Global Step: 178560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:16:42,220-Speed 5947.00 samples/sec Loss 2.6218 LearningRate 0.0095 Epoch: 17 Global Step: 178570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:16:49,070-Speed 5980.70 samples/sec Loss 2.6481 LearningRate 0.0095 Epoch: 17 Global Step: 178580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:16:55,921-Speed 5980.00 samples/sec Loss 2.5916 LearningRate 0.0095 Epoch: 17 Global Step: 178590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:17:02,780-Speed 5973.32 samples/sec Loss 2.6171 LearningRate 0.0095 Epoch: 17 Global Step: 178600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:17:09,627-Speed 5983.06 samples/sec Loss 2.6408 LearningRate 0.0095 Epoch: 17 Global Step: 178610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:17:16,485-Speed 5973.86 samples/sec Loss 2.6043 LearningRate 0.0095 Epoch: 17 Global Step: 178620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:17:23,336-Speed 5980.14 samples/sec Loss 2.6296 LearningRate 0.0095 Epoch: 17 Global Step: 178630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:17:30,208-Speed 5960.93 samples/sec Loss 2.6595 LearningRate 0.0095 Epoch: 17 Global Step: 178640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:17:37,060-Speed 5979.15 samples/sec Loss 2.6186 LearningRate 0.0095 Epoch: 17 Global Step: 178650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 
07:17:43,932-Speed 5963.60 samples/sec Loss 2.5984 LearningRate 0.0095 Epoch: 17 Global Step: 178660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:17:50,810-Speed 5955.64 samples/sec Loss 2.6270 LearningRate 0.0095 Epoch: 17 Global Step: 178670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:17:57,670-Speed 5972.30 samples/sec Loss 2.6185 LearningRate 0.0095 Epoch: 17 Global Step: 178680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:18:04,540-Speed 5962.84 samples/sec Loss 2.6059 LearningRate 0.0095 Epoch: 17 Global Step: 178690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:18:11,395-Speed 5976.34 samples/sec Loss 2.6157 LearningRate 0.0094 Epoch: 17 Global Step: 178700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:18:18,263-Speed 5965.19 samples/sec Loss 2.6176 LearningRate 0.0094 Epoch: 17 Global Step: 178710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:18:25,173-Speed 5931.36 samples/sec Loss 2.6295 LearningRate 0.0094 Epoch: 17 Global Step: 178720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:18:32,028-Speed 5977.09 samples/sec Loss 2.6237 LearningRate 0.0094 Epoch: 17 Global Step: 178730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:18:38,879-Speed 5980.10 samples/sec Loss 2.6324 LearningRate 0.0094 Epoch: 17 Global Step: 178740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:18:45,768-Speed 5946.86 samples/sec Loss 2.6280 LearningRate 0.0094 Epoch: 17 Global Step: 178750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:18:52,623-Speed 5975.90 samples/sec Loss 2.6300 LearningRate 0.0094 Epoch: 17 Global Step: 178760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:18:59,486-Speed 5969.79 samples/sec Loss 2.6069 LearningRate 0.0094 Epoch: 17 Global Step: 178770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:19:06,365-Speed 5955.02 samples/sec Loss 
2.5742 LearningRate 0.0094 Epoch: 17 Global Step: 178780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:19:13,217-Speed 5981.55 samples/sec Loss 2.6415 LearningRate 0.0094 Epoch: 17 Global Step: 178790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:19:20,069-Speed 5979.02 samples/sec Loss 2.5962 LearningRate 0.0094 Epoch: 17 Global Step: 178800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:19:26,943-Speed 5959.41 samples/sec Loss 2.6357 LearningRate 0.0094 Epoch: 17 Global Step: 178810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:19:33,792-Speed 5981.30 samples/sec Loss 2.6165 LearningRate 0.0094 Epoch: 17 Global Step: 178820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:19:40,638-Speed 5986.01 samples/sec Loss 2.6040 LearningRate 0.0094 Epoch: 17 Global Step: 178830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:19:47,487-Speed 5981.52 samples/sec Loss 2.6124 LearningRate 0.0094 Epoch: 17 Global Step: 178840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:19:54,354-Speed 5965.78 samples/sec Loss 2.5931 LearningRate 0.0093 Epoch: 17 Global Step: 178850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:20:01,217-Speed 5969.48 samples/sec Loss 2.6297 LearningRate 0.0093 Epoch: 17 Global Step: 178860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:20:08,065-Speed 5982.84 samples/sec Loss 2.6144 LearningRate 0.0093 Epoch: 17 Global Step: 178870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:20:14,913-Speed 5982.38 samples/sec Loss 2.6354 LearningRate 0.0093 Epoch: 17 Global Step: 178880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:20:21,791-Speed 5956.68 samples/sec Loss 2.6155 LearningRate 0.0093 Epoch: 17 Global Step: 178890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:20:28,655-Speed 5968.12 samples/sec Loss 2.6109 LearningRate 0.0093 Epoch: 17 Global 
Step: 178900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:20:35,583-Speed 5913.55 samples/sec Loss 2.5674 LearningRate 0.0093 Epoch: 17 Global Step: 178910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:20:42,565-Speed 5868.38 samples/sec Loss 2.6347 LearningRate 0.0093 Epoch: 17 Global Step: 178920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:20:49,426-Speed 5970.87 samples/sec Loss 2.6119 LearningRate 0.0093 Epoch: 17 Global Step: 178930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:20:56,296-Speed 5965.42 samples/sec Loss 2.6041 LearningRate 0.0093 Epoch: 17 Global Step: 178940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:21:03,206-Speed 5928.10 samples/sec Loss 2.5958 LearningRate 0.0093 Epoch: 17 Global Step: 178950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:21:10,065-Speed 5975.47 samples/sec Loss 2.5955 LearningRate 0.0093 Epoch: 17 Global Step: 178960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:21:16,905-Speed 5989.80 samples/sec Loss 2.6161 LearningRate 0.0093 Epoch: 17 Global Step: 178970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:21:23,766-Speed 5970.86 samples/sec Loss 2.5918 LearningRate 0.0093 Epoch: 17 Global Step: 178980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:21:30,612-Speed 5983.96 samples/sec Loss 2.5943 LearningRate 0.0093 Epoch: 17 Global Step: 178990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:21:37,464-Speed 5979.19 samples/sec Loss 2.6100 LearningRate 0.0092 Epoch: 17 Global Step: 179000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:21:44,326-Speed 5969.93 samples/sec Loss 2.6033 LearningRate 0.0092 Epoch: 17 Global Step: 179010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:21:51,182-Speed 5975.48 samples/sec Loss 2.5989 LearningRate 0.0092 Epoch: 17 Global Step: 179020 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-09 07:21:58,037-Speed 5976.31 samples/sec Loss 2.6074 LearningRate 0.0092 Epoch: 17 Global Step: 179030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:22:04,908-Speed 5961.99 samples/sec Loss 2.6147 LearningRate 0.0092 Epoch: 17 Global Step: 179040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:22:11,781-Speed 5960.65 samples/sec Loss 2.6352 LearningRate 0.0092 Epoch: 17 Global Step: 179050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:22:18,658-Speed 5958.50 samples/sec Loss 2.5793 LearningRate 0.0092 Epoch: 17 Global Step: 179060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:22:25,517-Speed 5973.99 samples/sec Loss 2.5815 LearningRate 0.0092 Epoch: 17 Global Step: 179070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:22:32,364-Speed 5983.12 samples/sec Loss 2.6226 LearningRate 0.0092 Epoch: 17 Global Step: 179080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 07:22:39,225-Speed 5972.48 samples/sec Loss 2.5740 LearningRate 0.0092 Epoch: 17 Global Step: 179090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:22:46,070-Speed 5984.95 samples/sec Loss 2.5865 LearningRate 0.0092 Epoch: 17 Global Step: 179100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:22:52,939-Speed 5964.68 samples/sec Loss 2.6167 LearningRate 0.0092 Epoch: 17 Global Step: 179110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:22:59,789-Speed 5980.86 samples/sec Loss 2.6150 LearningRate 0.0092 Epoch: 17 Global Step: 179120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:23:06,648-Speed 5975.31 samples/sec Loss 2.5833 LearningRate 0.0092 Epoch: 17 Global Step: 179130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:23:13,521-Speed 5961.14 samples/sec Loss 2.5988 LearningRate 0.0092 Epoch: 17 Global Step: 179140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 
07:23:20,406-Speed 5950.20 samples/sec Loss 2.5836 LearningRate 0.0092 Epoch: 17 Global Step: 179150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:23:27,249-Speed 5987.14 samples/sec Loss 2.6243 LearningRate 0.0091 Epoch: 17 Global Step: 179160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:23:34,120-Speed 5962.24 samples/sec Loss 2.5752 LearningRate 0.0091 Epoch: 17 Global Step: 179170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:23:41,005-Speed 5952.03 samples/sec Loss 2.6094 LearningRate 0.0091 Epoch: 17 Global Step: 179180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:23:47,871-Speed 5966.97 samples/sec Loss 2.6053 LearningRate 0.0091 Epoch: 17 Global Step: 179190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-09 07:23:54,722-Speed 5979.56 samples/sec Loss 2.5970 LearningRate 0.0091 Epoch: 17 Global Step: 179200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:01,595-Speed 5961.23 samples/sec Loss 2.6088 LearningRate 0.0091 Epoch: 17 Global Step: 179210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:08,469-Speed 5959.73 samples/sec Loss 2.6042 LearningRate 0.0091 Epoch: 17 Global Step: 179220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:15,322-Speed 5978.20 samples/sec Loss 2.6043 LearningRate 0.0091 Epoch: 17 Global Step: 179230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:22,166-Speed 5985.68 samples/sec Loss 2.5635 LearningRate 0.0091 Epoch: 17 Global Step: 179240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:29,021-Speed 5976.50 samples/sec Loss 2.5980 LearningRate 0.0091 Epoch: 17 Global Step: 179250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:35,874-Speed 5977.92 samples/sec Loss 2.6034 LearningRate 0.0091 Epoch: 17 Global Step: 179260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:42,718-Speed 5986.31 samples/sec 
Loss 2.6087 LearningRate 0.0091 Epoch: 17 Global Step: 179270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:49,570-Speed 5979.06 samples/sec Loss 2.6133 LearningRate 0.0091 Epoch: 17 Global Step: 179280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:24:56,413-Speed 5986.57 samples/sec Loss 2.5973 LearningRate 0.0091 Epoch: 17 Global Step: 179290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:25:03,265-Speed 5978.83 samples/sec Loss 2.5676 LearningRate 0.0091 Epoch: 17 Global Step: 179300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:25:10,124-Speed 5973.05 samples/sec Loss 2.6063 LearningRate 0.0090 Epoch: 17 Global Step: 179310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:25:17,013-Speed 5946.89 samples/sec Loss 2.5935 LearningRate 0.0090 Epoch: 17 Global Step: 179320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 07:25:23,873-Speed 5972.74 samples/sec Loss 2.6066 LearningRate 0.0090 Epoch: 17 Global Step: 179330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:25:30,830-Speed 5888.65 samples/sec Loss 2.5397 LearningRate 0.0090 Epoch: 17 Global Step: 179340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:25:37,662-Speed 5998.04 samples/sec Loss 2.5235 LearningRate 0.0090 Epoch: 17 Global Step: 179350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:25:44,515-Speed 5977.87 samples/sec Loss 2.5768 LearningRate 0.0090 Epoch: 17 Global Step: 179360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:25:51,369-Speed 5977.16 samples/sec Loss 2.5803 LearningRate 0.0090 Epoch: 17 Global Step: 179370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:25:58,233-Speed 5971.12 samples/sec Loss 2.5933 LearningRate 0.0090 Epoch: 17 Global Step: 179380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:26:05,097-Speed 5968.22 samples/sec Loss 2.6006 LearningRate 0.0090 Epoch: 17 
Global Step: 179390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:26:11,951-Speed 5977.22 samples/sec Loss 2.5742 LearningRate 0.0090 Epoch: 17 Global Step: 179400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:26:18,813-Speed 5969.92 samples/sec Loss 2.5846 LearningRate 0.0090 Epoch: 17 Global Step: 179410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:26:25,715-Speed 5936.22 samples/sec Loss 2.5388 LearningRate 0.0090 Epoch: 17 Global Step: 179420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:26:32,607-Speed 5943.70 samples/sec Loss 2.5679 LearningRate 0.0090 Epoch: 17 Global Step: 179430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:26:39,469-Speed 5970.52 samples/sec Loss 2.5438 LearningRate 0.0090 Epoch: 17 Global Step: 179440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:26:46,320-Speed 5979.76 samples/sec Loss 2.5877 LearningRate 0.0090 Epoch: 17 Global Step: 179450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:26:53,184-Speed 5969.30 samples/sec Loss 2.5763 LearningRate 0.0090 Epoch: 17 Global Step: 179460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:27:00,039-Speed 5976.03 samples/sec Loss 2.5713 LearningRate 0.0089 Epoch: 17 Global Step: 179470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:27:06,920-Speed 5953.96 samples/sec Loss 2.5806 LearningRate 0.0089 Epoch: 17 Global Step: 179480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:27:13,821-Speed 5936.86 samples/sec Loss 2.5736 LearningRate 0.0089 Epoch: 17 Global Step: 179490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:27:20,729-Speed 5930.73 samples/sec Loss 2.5702 LearningRate 0.0089 Epoch: 17 Global Step: 179500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:27:27,633-Speed 5933.94 samples/sec Loss 2.5851 LearningRate 0.0089 Epoch: 17 Global Step: 179510 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 07:27:34,508-Speed 5959.20 samples/sec Loss 2.5598 LearningRate 0.0089 Epoch: 17 Global Step: 179520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:27:41,395-Speed 5948.52 samples/sec Loss 2.5385 LearningRate 0.0089 Epoch: 17 Global Step: 179530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:27:48,243-Speed 5982.73 samples/sec Loss 2.5851 LearningRate 0.0089 Epoch: 17 Global Step: 179540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:27:55,110-Speed 5965.84 samples/sec Loss 2.5521 LearningRate 0.0089 Epoch: 17 Global Step: 179550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:28:02,010-Speed 5938.71 samples/sec Loss 2.5691 LearningRate 0.0089 Epoch: 17 Global Step: 179560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:28:08,851-Speed 5989.75 samples/sec Loss 2.5682 LearningRate 0.0089 Epoch: 17 Global Step: 179570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:28:15,699-Speed 5982.35 samples/sec Loss 2.5525 LearningRate 0.0089 Epoch: 17 Global Step: 179580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:28:22,556-Speed 5974.81 samples/sec Loss 2.5869 LearningRate 0.0089 Epoch: 17 Global Step: 179590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:28:29,398-Speed 5987.22 samples/sec Loss 2.5235 LearningRate 0.0089 Epoch: 17 Global Step: 179600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:28:36,263-Speed 5967.83 samples/sec Loss 2.5353 LearningRate 0.0089 Epoch: 17 Global Step: 179610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:28:43,134-Speed 5962.82 samples/sec Loss 2.5734 LearningRate 0.0088 Epoch: 17 Global Step: 179620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:28:50,029-Speed 5941.99 samples/sec Loss 2.5497 LearningRate 0.0088 Epoch: 17 Global Step: 179630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 
07:28:56,884-Speed 5978.32 samples/sec Loss 2.5637 LearningRate 0.0088 Epoch: 17 Global Step: 179640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:03,773-Speed 5946.83 samples/sec Loss 2.5682 LearningRate 0.0088 Epoch: 17 Global Step: 179650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:10,615-Speed 5987.87 samples/sec Loss 2.5683 LearningRate 0.0088 Epoch: 17 Global Step: 179660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:17,469-Speed 5979.34 samples/sec Loss 2.5364 LearningRate 0.0088 Epoch: 17 Global Step: 179670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:24,320-Speed 5980.00 samples/sec Loss 2.5609 LearningRate 0.0088 Epoch: 17 Global Step: 179680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:31,174-Speed 5976.92 samples/sec Loss 2.5157 LearningRate 0.0088 Epoch: 17 Global Step: 179690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:38,042-Speed 5965.42 samples/sec Loss 2.5703 LearningRate 0.0088 Epoch: 17 Global Step: 179700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:44,892-Speed 5980.40 samples/sec Loss 2.5655 LearningRate 0.0088 Epoch: 17 Global Step: 179710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:51,740-Speed 5982.85 samples/sec Loss 2.5432 LearningRate 0.0088 Epoch: 17 Global Step: 179720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:29:58,580-Speed 5989.76 samples/sec Loss 2.5464 LearningRate 0.0088 Epoch: 17 Global Step: 179730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:30:05,437-Speed 5974.37 samples/sec Loss 2.5593 LearningRate 0.0088 Epoch: 17 Global Step: 179740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:30:12,284-Speed 5985.65 samples/sec Loss 2.5678 LearningRate 0.0088 Epoch: 17 Global Step: 179750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:30:19,148-Speed 5968.94 samples/sec Loss 
2.5566 LearningRate 0.0088 Epoch: 17 Global Step: 179760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:30:26,022-Speed 5960.38 samples/sec Loss 2.5450 LearningRate 0.0088 Epoch: 17 Global Step: 179770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:30:32,888-Speed 5966.77 samples/sec Loss 2.5444 LearningRate 0.0087 Epoch: 17 Global Step: 179780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:30:39,757-Speed 5964.29 samples/sec Loss 2.5677 LearningRate 0.0087 Epoch: 17 Global Step: 179790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:30:46,598-Speed 5988.92 samples/sec Loss 2.5828 LearningRate 0.0087 Epoch: 17 Global Step: 179800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:30:53,439-Speed 5988.61 samples/sec Loss 2.5641 LearningRate 0.0087 Epoch: 17 Global Step: 179810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:31:00,319-Speed 5955.27 samples/sec Loss 2.5646 LearningRate 0.0087 Epoch: 17 Global Step: 179820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:31:07,185-Speed 5966.56 samples/sec Loss 2.5205 LearningRate 0.0087 Epoch: 17 Global Step: 179830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:31:14,034-Speed 5981.41 samples/sec Loss 2.5379 LearningRate 0.0087 Epoch: 17 Global Step: 179840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:31:20,904-Speed 5962.55 samples/sec Loss 2.5869 LearningRate 0.0087 Epoch: 17 Global Step: 179850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:31:27,749-Speed 5985.12 samples/sec Loss 2.5442 LearningRate 0.0087 Epoch: 17 Global Step: 179860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:31:34,598-Speed 5981.75 samples/sec Loss 2.5870 LearningRate 0.0087 Epoch: 17 Global Step: 179870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:31:41,452-Speed 5976.90 samples/sec Loss 2.5528 LearningRate 0.0087 Epoch: 17 Global 
Step: 179880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:31:48,849-Speed 5540.77 samples/sec Loss 2.5337 LearningRate 0.0087 Epoch: 17 Global Step: 179890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:31:55,744-Speed 5960.51 samples/sec Loss 2.5422 LearningRate 0.0087 Epoch: 17 Global Step: 179900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:32:02,599-Speed 5977.04 samples/sec Loss 2.5382 LearningRate 0.0087 Epoch: 17 Global Step: 179910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:32:09,455-Speed 5975.03 samples/sec Loss 2.5595 LearningRate 0.0087 Epoch: 17 Global Step: 179920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:32:16,307-Speed 5978.59 samples/sec Loss 2.5447 LearningRate 0.0087 Epoch: 17 Global Step: 179930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:32:23,180-Speed 5961.24 samples/sec Loss 2.5516 LearningRate 0.0086 Epoch: 17 Global Step: 179940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:32:30,034-Speed 5977.34 samples/sec Loss 2.5531 LearningRate 0.0086 Epoch: 17 Global Step: 179950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:32:36,880-Speed 5984.00 samples/sec Loss 2.5434 LearningRate 0.0086 Epoch: 17 Global Step: 179960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:32:43,746-Speed 5967.04 samples/sec Loss 2.5319 LearningRate 0.0086 Epoch: 17 Global Step: 179970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:32:50,602-Speed 5975.25 samples/sec Loss 2.4881 LearningRate 0.0086 Epoch: 17 Global Step: 179980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:32:57,446-Speed 5986.37 samples/sec Loss 2.5487 LearningRate 0.0086 Epoch: 17 Global Step: 179990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:33:04,302-Speed 5977.41 samples/sec Loss 2.5412 LearningRate 0.0086 Epoch: 17 Global Step: 180000 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 07:33:31,318-[lfw][180000]XNorm: 23.782969 Training: 2022-01-09 07:33:31,319-[lfw][180000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 07:33:31,320-[lfw][180000]Accuracy-Highest: 0.99833 Training: 2022-01-09 07:34:02,720-[cfp_fp][180000]XNorm: 21.364016 Training: 2022-01-09 07:34:02,721-[cfp_fp][180000]Accuracy-Flip: 0.99086+-0.00400 Training: 2022-01-09 07:34:02,722-[cfp_fp][180000]Accuracy-Highest: 0.99229 Training: 2022-01-09 07:34:29,288-[agedb_30][180000]XNorm: 23.191659 Training: 2022-01-09 07:34:29,289-[agedb_30][180000]Accuracy-Flip: 0.98200+-0.00591 Training: 2022-01-09 07:34:29,289-[agedb_30][180000]Accuracy-Highest: 0.98200 Training: 2022-01-09 07:34:36,114-Speed 446.13 samples/sec Loss 2.5488 LearningRate 0.0086 Epoch: 17 Global Step: 180010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:34:42,974-Speed 5972.64 samples/sec Loss 2.5654 LearningRate 0.0086 Epoch: 17 Global Step: 180020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:34:49,820-Speed 5984.54 samples/sec Loss 2.5724 LearningRate 0.0086 Epoch: 17 Global Step: 180030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:34:56,683-Speed 5971.08 samples/sec Loss 2.5476 LearningRate 0.0086 Epoch: 17 Global Step: 180040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:35:03,547-Speed 5968.62 samples/sec Loss 2.5303 LearningRate 0.0086 Epoch: 17 Global Step: 180050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:35:10,403-Speed 5975.68 samples/sec Loss 2.5235 LearningRate 0.0086 Epoch: 17 Global Step: 180060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:35:17,250-Speed 5983.42 samples/sec Loss 2.5218 LearningRate 0.0086 Epoch: 17 Global Step: 180070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:35:24,110-Speed 5972.16 samples/sec Loss 2.5479 LearningRate 0.0086 Epoch: 17 Global Step: 180080 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-01-09 07:35:30,987-Speed 5957.33 samples/sec Loss 2.5110 LearningRate 0.0086 Epoch: 17 Global Step: 180090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:35:37,858-Speed 5963.04 samples/sec Loss 2.5242 LearningRate 0.0085 Epoch: 17 Global Step: 180100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:35:44,695-Speed 5992.61 samples/sec Loss 2.5311 LearningRate 0.0085 Epoch: 17 Global Step: 180110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:35:51,552-Speed 5974.28 samples/sec Loss 2.4757 LearningRate 0.0085 Epoch: 17 Global Step: 180120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:35:58,401-Speed 5982.03 samples/sec Loss 2.5228 LearningRate 0.0085 Epoch: 17 Global Step: 180130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:36:05,272-Speed 5962.94 samples/sec Loss 2.5186 LearningRate 0.0085 Epoch: 17 Global Step: 180140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:36:12,122-Speed 5983.13 samples/sec Loss 2.5538 LearningRate 0.0085 Epoch: 17 Global Step: 180150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:36:18,984-Speed 5970.38 samples/sec Loss 2.5708 LearningRate 0.0085 Epoch: 17 Global Step: 180160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:36:25,834-Speed 5980.43 samples/sec Loss 2.5308 LearningRate 0.0085 Epoch: 17 Global Step: 180170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:36:32,689-Speed 5976.58 samples/sec Loss 2.4817 LearningRate 0.0085 Epoch: 17 Global Step: 180180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:36:39,547-Speed 5973.88 samples/sec Loss 2.4986 LearningRate 0.0085 Epoch: 17 Global Step: 180190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:36:46,387-Speed 5989.85 samples/sec Loss 2.5272 LearningRate 0.0085 Epoch: 17 Global Step: 180200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:36:53,284-Speed 
5940.25 samples/sec Loss 2.5318 LearningRate 0.0085 Epoch: 17 Global Step: 180210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:37:00,174-Speed 5946.06 samples/sec Loss 2.5337 LearningRate 0.0085 Epoch: 17 Global Step: 180220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:37:07,033-Speed 5973.10 samples/sec Loss 2.4955 LearningRate 0.0085 Epoch: 17 Global Step: 180230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:37:13,923-Speed 5946.53 samples/sec Loss 2.5104 LearningRate 0.0085 Epoch: 17 Global Step: 180240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:37:20,759-Speed 5992.62 samples/sec Loss 2.5289 LearningRate 0.0085 Epoch: 17 Global Step: 180250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:37:27,599-Speed 5989.43 samples/sec Loss 2.5460 LearningRate 0.0084 Epoch: 17 Global Step: 180260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:37:34,472-Speed 5960.57 samples/sec Loss 2.5245 LearningRate 0.0084 Epoch: 17 Global Step: 180270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:37:41,333-Speed 5971.15 samples/sec Loss 2.5399 LearningRate 0.0084 Epoch: 17 Global Step: 180280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:37:48,175-Speed 5990.43 samples/sec Loss 2.4967 LearningRate 0.0084 Epoch: 17 Global Step: 180290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:37:55,027-Speed 5978.98 samples/sec Loss 2.5441 LearningRate 0.0084 Epoch: 17 Global Step: 180300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:38:01,872-Speed 5985.04 samples/sec Loss 2.5133 LearningRate 0.0084 Epoch: 17 Global Step: 180310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:38:08,723-Speed 5980.19 samples/sec Loss 2.5050 LearningRate 0.0084 Epoch: 17 Global Step: 180320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:38:15,568-Speed 5985.02 samples/sec Loss 2.5173 
LearningRate 0.0084 Epoch: 17 Global Step: 180330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:38:22,435-Speed 5965.37 samples/sec Loss 2.4786 LearningRate 0.0084 Epoch: 17 Global Step: 180340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:38:29,349-Speed 5926.11 samples/sec Loss 2.5234 LearningRate 0.0084 Epoch: 17 Global Step: 180350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:38:36,218-Speed 5964.55 samples/sec Loss 2.5091 LearningRate 0.0084 Epoch: 17 Global Step: 180360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:38:43,062-Speed 5985.80 samples/sec Loss 2.4930 LearningRate 0.0084 Epoch: 17 Global Step: 180370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:38:49,917-Speed 5977.03 samples/sec Loss 2.5021 LearningRate 0.0084 Epoch: 17 Global Step: 180380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:38:56,768-Speed 5979.65 samples/sec Loss 2.5424 LearningRate 0.0084 Epoch: 17 Global Step: 180390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:39:03,649-Speed 5956.91 samples/sec Loss 2.5327 LearningRate 0.0084 Epoch: 17 Global Step: 180400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:39:10,506-Speed 5975.69 samples/sec Loss 2.5749 LearningRate 0.0084 Epoch: 17 Global Step: 180410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:39:17,380-Speed 5959.92 samples/sec Loss 2.5383 LearningRate 0.0083 Epoch: 17 Global Step: 180420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:39:24,246-Speed 5967.15 samples/sec Loss 2.4971 LearningRate 0.0083 Epoch: 17 Global Step: 180430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:39:31,088-Speed 5987.22 samples/sec Loss 2.5086 LearningRate 0.0083 Epoch: 17 Global Step: 180440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:39:37,941-Speed 5979.14 samples/sec Loss 2.5255 LearningRate 0.0083 Epoch: 17 Global Step: 
180450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:39:44,795-Speed 5977.00 samples/sec Loss 2.4816 LearningRate 0.0083 Epoch: 17 Global Step: 180460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:39:51,638-Speed 5986.74 samples/sec Loss 2.4887 LearningRate 0.0083 Epoch: 17 Global Step: 180470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:39:58,497-Speed 5972.89 samples/sec Loss 2.5200 LearningRate 0.0083 Epoch: 17 Global Step: 180480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:40:05,361-Speed 5968.69 samples/sec Loss 2.5337 LearningRate 0.0083 Epoch: 17 Global Step: 180490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:40:12,209-Speed 5982.29 samples/sec Loss 2.4890 LearningRate 0.0083 Epoch: 17 Global Step: 180500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:40:19,072-Speed 5969.41 samples/sec Loss 2.5120 LearningRate 0.0083 Epoch: 17 Global Step: 180510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:40:25,920-Speed 5982.72 samples/sec Loss 2.4712 LearningRate 0.0083 Epoch: 17 Global Step: 180520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:40:32,777-Speed 5974.38 samples/sec Loss 2.4527 LearningRate 0.0083 Epoch: 17 Global Step: 180530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:40:39,638-Speed 5971.51 samples/sec Loss 2.5007 LearningRate 0.0083 Epoch: 17 Global Step: 180540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:40:46,499-Speed 5971.15 samples/sec Loss 2.4702 LearningRate 0.0083 Epoch: 17 Global Step: 180550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:40:53,367-Speed 5965.48 samples/sec Loss 2.5145 LearningRate 0.0083 Epoch: 17 Global Step: 180560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:41:00,222-Speed 5976.53 samples/sec Loss 2.5173 LearningRate 0.0083 Epoch: 17 Global Step: 180570 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-01-09 07:41:07,069-Speed 5982.89 samples/sec Loss 2.5355 LearningRate 0.0082 Epoch: 17 Global Step: 180580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:41:13,916-Speed 5983.10 samples/sec Loss 2.4946 LearningRate 0.0082 Epoch: 17 Global Step: 180590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:41:20,760-Speed 5987.04 samples/sec Loss 2.4974 LearningRate 0.0082 Epoch: 17 Global Step: 180600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:41:27,602-Speed 5988.29 samples/sec Loss 2.5073 LearningRate 0.0082 Epoch: 17 Global Step: 180610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:41:34,475-Speed 5960.14 samples/sec Loss 2.4802 LearningRate 0.0082 Epoch: 17 Global Step: 180620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:41:41,327-Speed 5978.73 samples/sec Loss 2.5181 LearningRate 0.0082 Epoch: 17 Global Step: 180630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:41:48,281-Speed 5891.70 samples/sec Loss 2.5239 LearningRate 0.0082 Epoch: 17 Global Step: 180640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:41:55,130-Speed 5981.56 samples/sec Loss 2.4907 LearningRate 0.0082 Epoch: 17 Global Step: 180650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:42:02,006-Speed 5958.09 samples/sec Loss 2.4871 LearningRate 0.0082 Epoch: 17 Global Step: 180660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:42:08,883-Speed 5957.11 samples/sec Loss 2.4913 LearningRate 0.0082 Epoch: 17 Global Step: 180670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 07:42:16,411-Speed 5442.41 samples/sec Loss 2.4654 LearningRate 0.0082 Epoch: 17 Global Step: 180680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:42:23,300-Speed 5946.84 samples/sec Loss 2.5029 LearningRate 0.0082 Epoch: 17 Global Step: 180690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
07:42:30,170-Speed 5963.65 samples/sec Loss 2.4833 LearningRate 0.0082 Epoch: 17 Global Step: 180700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:42:37,027-Speed 5974.69 samples/sec Loss 2.4865 LearningRate 0.0082 Epoch: 17 Global Step: 180710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:42:43,890-Speed 5971.01 samples/sec Loss 2.4861 LearningRate 0.0082 Epoch: 17 Global Step: 180720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:42:50,743-Speed 5978.40 samples/sec Loss 2.4456 LearningRate 0.0082 Epoch: 17 Global Step: 180730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:42:57,635-Speed 5943.12 samples/sec Loss 2.4720 LearningRate 0.0081 Epoch: 17 Global Step: 180740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:43:04,531-Speed 5941.42 samples/sec Loss 2.4896 LearningRate 0.0081 Epoch: 17 Global Step: 180750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:43:11,429-Speed 5938.96 samples/sec Loss 2.5145 LearningRate 0.0081 Epoch: 17 Global Step: 180760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:43:18,290-Speed 5970.52 samples/sec Loss 2.4903 LearningRate 0.0081 Epoch: 17 Global Step: 180770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:43:25,162-Speed 5960.87 samples/sec Loss 2.4979 LearningRate 0.0081 Epoch: 17 Global Step: 180780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:43:32,015-Speed 5978.66 samples/sec Loss 2.4909 LearningRate 0.0081 Epoch: 17 Global Step: 180790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:43:38,862-Speed 5982.65 samples/sec Loss 2.4916 LearningRate 0.0081 Epoch: 17 Global Step: 180800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:43:45,717-Speed 5976.87 samples/sec Loss 2.4740 LearningRate 0.0081 Epoch: 17 Global Step: 180810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:43:52,567-Speed 5980.34 samples/sec Loss 
2.4903 LearningRate 0.0081 Epoch: 17 Global Step: 180820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:43:59,429-Speed 5970.41 samples/sec Loss 2.4667 LearningRate 0.0081 Epoch: 17 Global Step: 180830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:44:06,276-Speed 5983.58 samples/sec Loss 2.4756 LearningRate 0.0081 Epoch: 17 Global Step: 180840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:44:13,138-Speed 5970.25 samples/sec Loss 2.4847 LearningRate 0.0081 Epoch: 17 Global Step: 180850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:44:20,010-Speed 5961.38 samples/sec Loss 2.5092 LearningRate 0.0081 Epoch: 17 Global Step: 180860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:44:26,881-Speed 5962.26 samples/sec Loss 2.4558 LearningRate 0.0081 Epoch: 17 Global Step: 180870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:44:33,740-Speed 5972.61 samples/sec Loss 2.4941 LearningRate 0.0081 Epoch: 17 Global Step: 180880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-09 07:44:40,589-Speed 5981.81 samples/sec Loss 2.5028 LearningRate 0.0081 Epoch: 17 Global Step: 180890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:44:47,458-Speed 5964.07 samples/sec Loss 2.4573 LearningRate 0.0081 Epoch: 17 Global Step: 180900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:44:54,313-Speed 5976.87 samples/sec Loss 2.4644 LearningRate 0.0080 Epoch: 17 Global Step: 180910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:45:01,167-Speed 5976.76 samples/sec Loss 2.4658 LearningRate 0.0080 Epoch: 17 Global Step: 180920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:45:08,027-Speed 5972.12 samples/sec Loss 2.4522 LearningRate 0.0080 Epoch: 17 Global Step: 180930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:45:14,901-Speed 5960.61 samples/sec Loss 2.4948 LearningRate 0.0080 Epoch: 17 
Global Step: 180940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:45:21,796-Speed 5941.47 samples/sec Loss 2.4681 LearningRate 0.0080 Epoch: 17 Global Step: 180950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:45:28,710-Speed 5925.17 samples/sec Loss 2.4704 LearningRate 0.0080 Epoch: 17 Global Step: 180960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:45:35,587-Speed 5958.66 samples/sec Loss 2.4574 LearningRate 0.0080 Epoch: 17 Global Step: 180970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:45:42,450-Speed 5968.73 samples/sec Loss 2.4789 LearningRate 0.0080 Epoch: 17 Global Step: 180980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:45:49,312-Speed 5975.78 samples/sec Loss 2.4677 LearningRate 0.0080 Epoch: 17 Global Step: 180990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:45:56,300-Speed 5862.39 samples/sec Loss 2.4898 LearningRate 0.0080 Epoch: 17 Global Step: 181000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:46:03,255-Speed 5890.68 samples/sec Loss 2.4528 LearningRate 0.0080 Epoch: 17 Global Step: 181010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:46:10,113-Speed 5973.26 samples/sec Loss 2.4587 LearningRate 0.0080 Epoch: 17 Global Step: 181020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:46:16,973-Speed 5972.63 samples/sec Loss 2.4633 LearningRate 0.0080 Epoch: 17 Global Step: 181030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:46:23,841-Speed 5967.48 samples/sec Loss 2.4624 LearningRate 0.0080 Epoch: 17 Global Step: 181040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:46:30,692-Speed 5978.79 samples/sec Loss 2.4972 LearningRate 0.0080 Epoch: 17 Global Step: 181050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:46:37,565-Speed 5960.65 samples/sec Loss 2.4621 LearningRate 0.0080 Epoch: 17 Global Step: 181060 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 07:46:44,427-Speed 5970.53 samples/sec Loss 2.4350 LearningRate 0.0079 Epoch: 17 Global Step: 181070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:46:51,293-Speed 5967.34 samples/sec Loss 2.4778 LearningRate 0.0079 Epoch: 17 Global Step: 181080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:46:58,163-Speed 5963.18 samples/sec Loss 2.4735 LearningRate 0.0079 Epoch: 17 Global Step: 181090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:47:05,020-Speed 5974.54 samples/sec Loss 2.4592 LearningRate 0.0079 Epoch: 17 Global Step: 181100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:47:11,879-Speed 5972.33 samples/sec Loss 2.4867 LearningRate 0.0079 Epoch: 17 Global Step: 181110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:47:18,742-Speed 5969.37 samples/sec Loss 2.4977 LearningRate 0.0079 Epoch: 17 Global Step: 181120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:47:25,614-Speed 5961.67 samples/sec Loss 2.4512 LearningRate 0.0079 Epoch: 17 Global Step: 181130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:47:32,485-Speed 5962.19 samples/sec Loss 2.4410 LearningRate 0.0079 Epoch: 17 Global Step: 181140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:47:39,344-Speed 5972.62 samples/sec Loss 2.4926 LearningRate 0.0079 Epoch: 17 Global Step: 181150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:47:46,241-Speed 5940.56 samples/sec Loss 2.4557 LearningRate 0.0079 Epoch: 17 Global Step: 181160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:47:53,129-Speed 5948.00 samples/sec Loss 2.4430 LearningRate 0.0079 Epoch: 17 Global Step: 181170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:47:59,988-Speed 5972.57 samples/sec Loss 2.4964 LearningRate 0.0079 Epoch: 17 Global Step: 181180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
07:48:06,863-Speed 5959.12 samples/sec Loss 2.4493 LearningRate 0.0079 Epoch: 17 Global Step: 181190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:48:13,726-Speed 5969.09 samples/sec Loss 2.4830 LearningRate 0.0079 Epoch: 17 Global Step: 181200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:48:20,599-Speed 5960.72 samples/sec Loss 2.4436 LearningRate 0.0079 Epoch: 17 Global Step: 181210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:48:27,465-Speed 5966.63 samples/sec Loss 2.4679 LearningRate 0.0079 Epoch: 17 Global Step: 181220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:48:34,331-Speed 5966.75 samples/sec Loss 2.4481 LearningRate 0.0079 Epoch: 17 Global Step: 181230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:48:41,211-Speed 5954.80 samples/sec Loss 2.4510 LearningRate 0.0078 Epoch: 17 Global Step: 181240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:48:48,081-Speed 5969.12 samples/sec Loss 2.4668 LearningRate 0.0078 Epoch: 17 Global Step: 181250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:48:54,952-Speed 5962.57 samples/sec Loss 2.4447 LearningRate 0.0078 Epoch: 17 Global Step: 181260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:49:01,835-Speed 5952.24 samples/sec Loss 2.4645 LearningRate 0.0078 Epoch: 17 Global Step: 181270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:49:08,730-Speed 5943.63 samples/sec Loss 2.4285 LearningRate 0.0078 Epoch: 17 Global Step: 181280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:49:15,590-Speed 5971.93 samples/sec Loss 2.4528 LearningRate 0.0078 Epoch: 17 Global Step: 181290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:49:22,512-Speed 5919.44 samples/sec Loss 2.4630 LearningRate 0.0078 Epoch: 17 Global Step: 181300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:49:29,392-Speed 5954.74 samples/sec Loss 
2.4634 LearningRate 0.0078 Epoch: 17 Global Step: 181310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:49:36,257-Speed 5967.43 samples/sec Loss 2.4523 LearningRate 0.0078 Epoch: 17 Global Step: 181320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:49:43,136-Speed 5955.57 samples/sec Loss 2.4546 LearningRate 0.0078 Epoch: 17 Global Step: 181330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:49:50,015-Speed 5954.89 samples/sec Loss 2.4429 LearningRate 0.0078 Epoch: 17 Global Step: 181340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:49:56,875-Speed 5972.41 samples/sec Loss 2.4560 LearningRate 0.0078 Epoch: 17 Global Step: 181350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:50:03,746-Speed 5961.77 samples/sec Loss 2.4369 LearningRate 0.0078 Epoch: 17 Global Step: 181360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:50:10,606-Speed 5971.34 samples/sec Loss 2.4921 LearningRate 0.0078 Epoch: 17 Global Step: 181370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:50:17,474-Speed 5965.36 samples/sec Loss 2.4588 LearningRate 0.0078 Epoch: 17 Global Step: 181380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:50:24,342-Speed 5964.87 samples/sec Loss 2.4561 LearningRate 0.0078 Epoch: 17 Global Step: 181390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:50:31,229-Speed 5948.92 samples/sec Loss 2.4565 LearningRate 0.0078 Epoch: 17 Global Step: 181400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:50:38,083-Speed 5977.56 samples/sec Loss 2.4458 LearningRate 0.0077 Epoch: 17 Global Step: 181410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:50:44,945-Speed 5970.00 samples/sec Loss 2.4580 LearningRate 0.0077 Epoch: 17 Global Step: 181420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:50:51,808-Speed 5968.89 samples/sec Loss 2.4618 LearningRate 0.0077 Epoch: 17 Global 
Step: 181430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:50:58,715-Speed 5932.73 samples/sec Loss 2.4140 LearningRate 0.0077 Epoch: 17 Global Step: 181440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:51:05,580-Speed 5968.21 samples/sec Loss 2.4285 LearningRate 0.0077 Epoch: 17 Global Step: 181450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:51:12,445-Speed 5967.46 samples/sec Loss 2.4331 LearningRate 0.0077 Epoch: 17 Global Step: 181460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:51:19,307-Speed 5970.18 samples/sec Loss 2.4529 LearningRate 0.0077 Epoch: 17 Global Step: 181470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:51:26,162-Speed 5975.63 samples/sec Loss 2.4240 LearningRate 0.0077 Epoch: 17 Global Step: 181480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:51:33,035-Speed 5961.96 samples/sec Loss 2.4427 LearningRate 0.0077 Epoch: 17 Global Step: 181490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:51:39,899-Speed 5969.54 samples/sec Loss 2.4349 LearningRate 0.0077 Epoch: 17 Global Step: 181500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:51:46,756-Speed 5973.82 samples/sec Loss 2.4260 LearningRate 0.0077 Epoch: 17 Global Step: 181510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:51:53,638-Speed 5953.12 samples/sec Loss 2.4222 LearningRate 0.0077 Epoch: 17 Global Step: 181520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:52:00,496-Speed 5974.79 samples/sec Loss 2.4625 LearningRate 0.0077 Epoch: 17 Global Step: 181530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:52:07,354-Speed 5973.94 samples/sec Loss 2.4364 LearningRate 0.0077 Epoch: 17 Global Step: 181540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:52:14,246-Speed 5944.53 samples/sec Loss 2.4530 LearningRate 0.0077 Epoch: 17 Global Step: 181550 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-09 07:52:21,099-Speed 5978.42 samples/sec Loss 2.4337 LearningRate 0.0077 Epoch: 17 Global Step: 181560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:52:27,948-Speed 5981.91 samples/sec Loss 2.4158 LearningRate 0.0076 Epoch: 17 Global Step: 181570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:52:34,795-Speed 5983.37 samples/sec Loss 2.4068 LearningRate 0.0076 Epoch: 17 Global Step: 181580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:52:41,645-Speed 5980.32 samples/sec Loss 2.4027 LearningRate 0.0076 Epoch: 17 Global Step: 181590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:52:48,494-Speed 5981.60 samples/sec Loss 2.4200 LearningRate 0.0076 Epoch: 17 Global Step: 181600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:52:55,366-Speed 5961.51 samples/sec Loss 2.4061 LearningRate 0.0076 Epoch: 17 Global Step: 181610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:53:02,215-Speed 5982.25 samples/sec Loss 2.4399 LearningRate 0.0076 Epoch: 17 Global Step: 181620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:53:09,086-Speed 5962.30 samples/sec Loss 2.4265 LearningRate 0.0076 Epoch: 17 Global Step: 181630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:53:15,940-Speed 5977.04 samples/sec Loss 2.4159 LearningRate 0.0076 Epoch: 17 Global Step: 181640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:53:22,799-Speed 5972.57 samples/sec Loss 2.4702 LearningRate 0.0076 Epoch: 17 Global Step: 181650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:53:29,656-Speed 5975.03 samples/sec Loss 2.4265 LearningRate 0.0076 Epoch: 17 Global Step: 181660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:53:36,505-Speed 5981.42 samples/sec Loss 2.4495 LearningRate 0.0076 Epoch: 17 Global Step: 181670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 
07:53:43,370-Speed 5967.89 samples/sec Loss 2.4133 LearningRate 0.0076 Epoch: 17 Global Step: 181680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:53:50,214-Speed 5985.90 samples/sec Loss 2.4294 LearningRate 0.0076 Epoch: 17 Global Step: 181690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:53:57,063-Speed 5981.20 samples/sec Loss 2.4311 LearningRate 0.0076 Epoch: 17 Global Step: 181700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:03,921-Speed 5974.28 samples/sec Loss 2.4512 LearningRate 0.0076 Epoch: 17 Global Step: 181710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:10,784-Speed 5969.54 samples/sec Loss 2.4511 LearningRate 0.0076 Epoch: 17 Global Step: 181720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:17,661-Speed 5957.95 samples/sec Loss 2.4912 LearningRate 0.0076 Epoch: 17 Global Step: 181730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:24,536-Speed 5958.35 samples/sec Loss 2.4500 LearningRate 0.0075 Epoch: 17 Global Step: 181740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:31,399-Speed 5969.19 samples/sec Loss 2.4021 LearningRate 0.0075 Epoch: 17 Global Step: 181750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:38,262-Speed 5969.24 samples/sec Loss 2.4609 LearningRate 0.0075 Epoch: 17 Global Step: 181760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:45,129-Speed 5966.35 samples/sec Loss 2.4037 LearningRate 0.0075 Epoch: 17 Global Step: 181770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:51,990-Speed 5970.92 samples/sec Loss 2.4665 LearningRate 0.0075 Epoch: 17 Global Step: 181780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:54:58,847-Speed 5973.79 samples/sec Loss 2.4485 LearningRate 0.0075 Epoch: 17 Global Step: 181790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:55:05,699-Speed 5979.40 samples/sec Loss 
2.4502 LearningRate 0.0075 Epoch: 17 Global Step: 181800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:55:12,545-Speed 5983.78 samples/sec Loss 2.4283 LearningRate 0.0075 Epoch: 17 Global Step: 181810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:55:19,405-Speed 5972.53 samples/sec Loss 2.4568 LearningRate 0.0075 Epoch: 17 Global Step: 181820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:55:26,266-Speed 5970.48 samples/sec Loss 2.4014 LearningRate 0.0075 Epoch: 17 Global Step: 181830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:55:33,125-Speed 5973.18 samples/sec Loss 2.4355 LearningRate 0.0075 Epoch: 17 Global Step: 181840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:55:40,006-Speed 5955.90 samples/sec Loss 2.4123 LearningRate 0.0075 Epoch: 17 Global Step: 181850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:55:46,874-Speed 5964.81 samples/sec Loss 2.4176 LearningRate 0.0075 Epoch: 17 Global Step: 181860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:55:53,730-Speed 5975.76 samples/sec Loss 2.4399 LearningRate 0.0075 Epoch: 17 Global Step: 181870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:56:00,601-Speed 5962.21 samples/sec Loss 2.4220 LearningRate 0.0075 Epoch: 17 Global Step: 181880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:56:07,444-Speed 5986.97 samples/sec Loss 2.3658 LearningRate 0.0075 Epoch: 17 Global Step: 181890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:56:14,301-Speed 5974.36 samples/sec Loss 2.3940 LearningRate 0.0075 Epoch: 17 Global Step: 181900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:56:21,150-Speed 5981.81 samples/sec Loss 2.4006 LearningRate 0.0074 Epoch: 17 Global Step: 181910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:56:28,016-Speed 5967.21 samples/sec Loss 2.4511 LearningRate 0.0074 Epoch: 17 Global 
Step: 181920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:56:34,879-Speed 5969.11 samples/sec Loss 2.4168 LearningRate 0.0074 Epoch: 17 Global Step: 181930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:56:41,776-Speed 5939.80 samples/sec Loss 2.3872 LearningRate 0.0074 Epoch: 17 Global Step: 181940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:56:48,665-Speed 5947.71 samples/sec Loss 2.4208 LearningRate 0.0074 Epoch: 17 Global Step: 181950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:56:55,521-Speed 5979.45 samples/sec Loss 2.4037 LearningRate 0.0074 Epoch: 17 Global Step: 181960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:57:02,378-Speed 5974.53 samples/sec Loss 2.4317 LearningRate 0.0074 Epoch: 17 Global Step: 181970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:57:09,234-Speed 5975.84 samples/sec Loss 2.4104 LearningRate 0.0074 Epoch: 17 Global Step: 181980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:57:16,110-Speed 5957.55 samples/sec Loss 2.4059 LearningRate 0.0074 Epoch: 17 Global Step: 181990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:57:22,958-Speed 5984.06 samples/sec Loss 2.4360 LearningRate 0.0074 Epoch: 17 Global Step: 182000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:57:29,826-Speed 5964.82 samples/sec Loss 2.4262 LearningRate 0.0074 Epoch: 17 Global Step: 182010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:57:36,670-Speed 5985.59 samples/sec Loss 2.4179 LearningRate 0.0074 Epoch: 17 Global Step: 182020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:57:43,548-Speed 5957.00 samples/sec Loss 2.3923 LearningRate 0.0074 Epoch: 17 Global Step: 182030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:57:50,552-Speed 5849.49 samples/sec Loss 2.4115 LearningRate 0.0074 Epoch: 17 Global Step: 182040 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 07:57:57,504-Speed 5893.39 samples/sec Loss 2.3960 LearningRate 0.0074 Epoch: 17 Global Step: 182050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:58:04,411-Speed 5931.94 samples/sec Loss 2.3850 LearningRate 0.0074 Epoch: 17 Global Step: 182060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:58:11,262-Speed 5978.94 samples/sec Loss 2.4188 LearningRate 0.0074 Epoch: 17 Global Step: 182070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:58:18,127-Speed 5968.38 samples/sec Loss 2.3757 LearningRate 0.0073 Epoch: 17 Global Step: 182080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:58:24,973-Speed 5984.03 samples/sec Loss 2.4052 LearningRate 0.0073 Epoch: 17 Global Step: 182090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:58:31,884-Speed 5928.09 samples/sec Loss 2.4706 LearningRate 0.0073 Epoch: 17 Global Step: 182100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:58:38,740-Speed 5975.81 samples/sec Loss 2.3990 LearningRate 0.0073 Epoch: 17 Global Step: 182110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 07:58:45,593-Speed 5977.95 samples/sec Loss 2.4267 LearningRate 0.0073 Epoch: 17 Global Step: 182120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:58:52,476-Speed 5952.15 samples/sec Loss 2.4029 LearningRate 0.0073 Epoch: 17 Global Step: 182130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:58:59,401-Speed 5915.67 samples/sec Loss 2.4031 LearningRate 0.0073 Epoch: 17 Global Step: 182140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:59:06,265-Speed 5968.62 samples/sec Loss 2.3707 LearningRate 0.0073 Epoch: 17 Global Step: 182150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:59:13,140-Speed 5958.37 samples/sec Loss 2.3830 LearningRate 0.0073 Epoch: 17 Global Step: 182160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 
07:59:19,998-Speed 5973.55 samples/sec Loss 2.4386 LearningRate 0.0073 Epoch: 17 Global Step: 182170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:59:26,847-Speed 5981.29 samples/sec Loss 2.3621 LearningRate 0.0073 Epoch: 17 Global Step: 182180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:59:33,701-Speed 5977.06 samples/sec Loss 2.4133 LearningRate 0.0073 Epoch: 17 Global Step: 182190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:59:40,565-Speed 5968.57 samples/sec Loss 2.4207 LearningRate 0.0073 Epoch: 17 Global Step: 182200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:59:47,427-Speed 5972.07 samples/sec Loss 2.3792 LearningRate 0.0073 Epoch: 17 Global Step: 182210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 07:59:54,279-Speed 5978.77 samples/sec Loss 2.3715 LearningRate 0.0073 Epoch: 17 Global Step: 182220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:01,145-Speed 5967.12 samples/sec Loss 2.4069 LearningRate 0.0073 Epoch: 17 Global Step: 182230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:08,007-Speed 5970.18 samples/sec Loss 2.3732 LearningRate 0.0073 Epoch: 17 Global Step: 182240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:14,855-Speed 5982.45 samples/sec Loss 2.3816 LearningRate 0.0073 Epoch: 17 Global Step: 182250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:21,709-Speed 5977.46 samples/sec Loss 2.3908 LearningRate 0.0072 Epoch: 17 Global Step: 182260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:28,595-Speed 5949.37 samples/sec Loss 2.3951 LearningRate 0.0072 Epoch: 17 Global Step: 182270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:35,525-Speed 5911.43 samples/sec Loss 2.4082 LearningRate 0.0072 Epoch: 17 Global Step: 182280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:42,402-Speed 5956.99 samples/sec Loss 
2.3759 LearningRate 0.0072 Epoch: 17 Global Step: 182290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:49,245-Speed 5986.87 samples/sec Loss 2.3782 LearningRate 0.0072 Epoch: 17 Global Step: 182300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:00:56,096-Speed 5980.54 samples/sec Loss 2.3656 LearningRate 0.0072 Epoch: 17 Global Step: 182310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:01:02,960-Speed 5968.38 samples/sec Loss 2.4231 LearningRate 0.0072 Epoch: 17 Global Step: 182320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:01:09,842-Speed 5954.21 samples/sec Loss 2.3672 LearningRate 0.0072 Epoch: 17 Global Step: 182330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:01:16,716-Speed 5961.62 samples/sec Loss 2.4350 LearningRate 0.0072 Epoch: 17 Global Step: 182340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:01:23,575-Speed 5972.46 samples/sec Loss 2.3892 LearningRate 0.0072 Epoch: 17 Global Step: 182350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:01:30,448-Speed 5960.64 samples/sec Loss 2.3663 LearningRate 0.0072 Epoch: 17 Global Step: 182360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:01:37,311-Speed 5972.30 samples/sec Loss 2.4000 LearningRate 0.0072 Epoch: 17 Global Step: 182370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:01:44,173-Speed 5970.07 samples/sec Loss 2.4069 LearningRate 0.0072 Epoch: 17 Global Step: 182380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:01:51,022-Speed 5981.95 samples/sec Loss 2.4214 LearningRate 0.0072 Epoch: 17 Global Step: 182390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:01:57,876-Speed 5977.22 samples/sec Loss 2.3503 LearningRate 0.0072 Epoch: 17 Global Step: 182400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:02:04,729-Speed 5978.50 samples/sec Loss 2.3704 LearningRate 0.0072 Epoch: 17 Global 
Step: 182410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:02:11,597-Speed 5964.61 samples/sec Loss 2.3674 LearningRate 0.0072 Epoch: 17 Global Step: 182420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:02:18,459-Speed 5969.82 samples/sec Loss 2.3514 LearningRate 0.0071 Epoch: 17 Global Step: 182430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:02:25,320-Speed 5973.86 samples/sec Loss 2.3945 LearningRate 0.0071 Epoch: 17 Global Step: 182440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:02:32,181-Speed 5971.86 samples/sec Loss 2.3951 LearningRate 0.0071 Epoch: 17 Global Step: 182450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:02:39,077-Speed 5940.24 samples/sec Loss 2.3539 LearningRate 0.0071 Epoch: 17 Global Step: 182460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:02:45,950-Speed 5960.43 samples/sec Loss 2.4117 LearningRate 0.0071 Epoch: 17 Global Step: 182470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:02:52,806-Speed 5975.24 samples/sec Loss 2.3656 LearningRate 0.0071 Epoch: 17 Global Step: 182480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:02:59,663-Speed 5975.10 samples/sec Loss 2.3425 LearningRate 0.0071 Epoch: 17 Global Step: 182490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:03:06,535-Speed 5962.00 samples/sec Loss 2.3937 LearningRate 0.0071 Epoch: 17 Global Step: 182500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:03:13,393-Speed 5973.95 samples/sec Loss 2.3211 LearningRate 0.0071 Epoch: 17 Global Step: 182510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:03:20,252-Speed 5973.12 samples/sec Loss 2.4037 LearningRate 0.0071 Epoch: 17 Global Step: 182520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 08:03:27,155-Speed 5934.69 samples/sec Loss 2.3478 LearningRate 0.0071 Epoch: 17 Global Step: 182530 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-01-09 08:03:34,013-Speed 5974.05 samples/sec Loss 2.3949 LearningRate 0.0071 Epoch: 17 Global Step: 182540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:03:40,871-Speed 5973.14 samples/sec Loss 2.3311 LearningRate 0.0071 Epoch: 17 Global Step: 182550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:03:47,716-Speed 5985.03 samples/sec Loss 2.3782 LearningRate 0.0071 Epoch: 17 Global Step: 182560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:03:54,564-Speed 5982.58 samples/sec Loss 2.3672 LearningRate 0.0071 Epoch: 17 Global Step: 182570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:04:01,428-Speed 5968.23 samples/sec Loss 2.3426 LearningRate 0.0071 Epoch: 17 Global Step: 182580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:04:08,286-Speed 5974.16 samples/sec Loss 2.3675 LearningRate 0.0071 Epoch: 17 Global Step: 182590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:04:15,136-Speed 5979.88 samples/sec Loss 2.3847 LearningRate 0.0071 Epoch: 17 Global Step: 182600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:04:21,994-Speed 5975.88 samples/sec Loss 2.4151 LearningRate 0.0070 Epoch: 17 Global Step: 182610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:04:28,861-Speed 5965.57 samples/sec Loss 2.3470 LearningRate 0.0070 Epoch: 17 Global Step: 182620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:04:35,712-Speed 5979.97 samples/sec Loss 2.4114 LearningRate 0.0070 Epoch: 17 Global Step: 182630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:04:42,607-Speed 5944.74 samples/sec Loss 2.3777 LearningRate 0.0070 Epoch: 17 Global Step: 182640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:04:49,478-Speed 5962.93 samples/sec Loss 2.3676 LearningRate 0.0070 Epoch: 17 Global Step: 182650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 
08:04:56,321-Speed 5986.20 samples/sec Loss 2.3630 LearningRate 0.0070 Epoch: 17 Global Step: 182660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:03,177-Speed 5975.70 samples/sec Loss 2.3927 LearningRate 0.0070 Epoch: 17 Global Step: 182670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:10,045-Speed 5967.84 samples/sec Loss 2.3825 LearningRate 0.0070 Epoch: 17 Global Step: 182680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:16,916-Speed 5962.29 samples/sec Loss 2.3604 LearningRate 0.0070 Epoch: 17 Global Step: 182690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:23,771-Speed 5977.69 samples/sec Loss 2.3800 LearningRate 0.0070 Epoch: 17 Global Step: 182700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:30,625-Speed 5976.98 samples/sec Loss 2.3838 LearningRate 0.0070 Epoch: 17 Global Step: 182710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:37,517-Speed 5947.09 samples/sec Loss 2.3949 LearningRate 0.0070 Epoch: 17 Global Step: 182720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:44,436-Speed 5921.28 samples/sec Loss 2.3702 LearningRate 0.0070 Epoch: 17 Global Step: 182730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:51,302-Speed 5967.09 samples/sec Loss 2.3576 LearningRate 0.0070 Epoch: 17 Global Step: 182740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:05:58,149-Speed 5982.54 samples/sec Loss 2.3692 LearningRate 0.0070 Epoch: 17 Global Step: 182750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:05,004-Speed 5975.90 samples/sec Loss 2.3525 LearningRate 0.0070 Epoch: 17 Global Step: 182760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:11,855-Speed 5982.77 samples/sec Loss 2.3472 LearningRate 0.0070 Epoch: 17 Global Step: 182770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:18,723-Speed 5965.10 samples/sec Loss 
2.3754 LearningRate 0.0069 Epoch: 17 Global Step: 182780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:25,566-Speed 5987.12 samples/sec Loss 2.3589 LearningRate 0.0069 Epoch: 17 Global Step: 182790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:32,415-Speed 5981.68 samples/sec Loss 2.3618 LearningRate 0.0069 Epoch: 17 Global Step: 182800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:39,261-Speed 5984.19 samples/sec Loss 2.3712 LearningRate 0.0069 Epoch: 17 Global Step: 182810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:46,124-Speed 5969.99 samples/sec Loss 2.3489 LearningRate 0.0069 Epoch: 17 Global Step: 182820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:52,968-Speed 5985.73 samples/sec Loss 2.3544 LearningRate 0.0069 Epoch: 17 Global Step: 182830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:06:59,831-Speed 5969.36 samples/sec Loss 2.3591 LearningRate 0.0069 Epoch: 17 Global Step: 182840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:07:06,680-Speed 5981.57 samples/sec Loss 2.3788 LearningRate 0.0069 Epoch: 17 Global Step: 182850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:07:13,531-Speed 5982.07 samples/sec Loss 2.3731 LearningRate 0.0069 Epoch: 17 Global Step: 182860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:07:20,374-Speed 5986.45 samples/sec Loss 2.3841 LearningRate 0.0069 Epoch: 17 Global Step: 182870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:07:27,249-Speed 5961.61 samples/sec Loss 2.3518 LearningRate 0.0069 Epoch: 17 Global Step: 182880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:07:34,097-Speed 5983.28 samples/sec Loss 2.3590 LearningRate 0.0069 Epoch: 17 Global Step: 182890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:07:40,982-Speed 5951.56 samples/sec Loss 2.3302 LearningRate 0.0069 Epoch: 17 Global 
Step: 182900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:07:47,829-Speed 5982.83 samples/sec Loss 2.3628 LearningRate 0.0069 Epoch: 17 Global Step: 182910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:07:54,702-Speed 5963.70 samples/sec Loss 2.3773 LearningRate 0.0069 Epoch: 17 Global Step: 182920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:01,570-Speed 5965.12 samples/sec Loss 2.3350 LearningRate 0.0069 Epoch: 17 Global Step: 182930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:08,414-Speed 5986.45 samples/sec Loss 2.3358 LearningRate 0.0069 Epoch: 17 Global Step: 182940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:15,272-Speed 5973.89 samples/sec Loss 2.3568 LearningRate 0.0069 Epoch: 17 Global Step: 182950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:22,129-Speed 5975.34 samples/sec Loss 2.3725 LearningRate 0.0068 Epoch: 17 Global Step: 182960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:28,992-Speed 5968.51 samples/sec Loss 2.3476 LearningRate 0.0068 Epoch: 17 Global Step: 182970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:35,844-Speed 5979.18 samples/sec Loss 2.3666 LearningRate 0.0068 Epoch: 17 Global Step: 182980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:42,698-Speed 5977.98 samples/sec Loss 2.3665 LearningRate 0.0068 Epoch: 17 Global Step: 182990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:49,571-Speed 5960.56 samples/sec Loss 2.3450 LearningRate 0.0068 Epoch: 17 Global Step: 183000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:08:56,446-Speed 5959.73 samples/sec Loss 2.3308 LearningRate 0.0068 Epoch: 17 Global Step: 183010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:09:03,311-Speed 5967.34 samples/sec Loss 2.3634 LearningRate 0.0068 Epoch: 17 Global Step: 183020 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-09 08:09:10,162-Speed 5979.45 samples/sec Loss 2.3399 LearningRate 0.0068 Epoch: 17 Global Step: 183030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:09:17,018-Speed 5975.31 samples/sec Loss 2.3680 LearningRate 0.0068 Epoch: 17 Global Step: 183040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:09:23,889-Speed 5962.61 samples/sec Loss 2.3336 LearningRate 0.0068 Epoch: 17 Global Step: 183050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:09:30,757-Speed 5964.86 samples/sec Loss 2.3662 LearningRate 0.0068 Epoch: 17 Global Step: 183060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:09:37,622-Speed 5968.62 samples/sec Loss 2.3754 LearningRate 0.0068 Epoch: 17 Global Step: 183070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:09:44,480-Speed 5973.31 samples/sec Loss 2.3289 LearningRate 0.0068 Epoch: 17 Global Step: 183080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:09:51,341-Speed 5971.30 samples/sec Loss 2.3224 LearningRate 0.0068 Epoch: 17 Global Step: 183090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:09:58,235-Speed 5942.68 samples/sec Loss 2.3721 LearningRate 0.0068 Epoch: 17 Global Step: 183100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:10:05,113-Speed 5956.19 samples/sec Loss 2.3694 LearningRate 0.0068 Epoch: 17 Global Step: 183110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:10:12,001-Speed 5947.62 samples/sec Loss 2.3324 LearningRate 0.0068 Epoch: 17 Global Step: 183120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:10:18,900-Speed 5938.88 samples/sec Loss 2.3273 LearningRate 0.0068 Epoch: 17 Global Step: 183130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:10:25,778-Speed 5955.69 samples/sec Loss 2.2948 LearningRate 0.0067 Epoch: 17 Global Step: 183140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 
08:10:32,709-Speed 5911.39 samples/sec Loss 2.3331 LearningRate 0.0067 Epoch: 17 Global Step: 183150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:10:39,581-Speed 5961.43 samples/sec Loss 2.3319 LearningRate 0.0067 Epoch: 17 Global Step: 183160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:10:46,444-Speed 5971.09 samples/sec Loss 2.3674 LearningRate 0.0067 Epoch: 17 Global Step: 183170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:10:53,320-Speed 5957.58 samples/sec Loss 2.3140 LearningRate 0.0067 Epoch: 17 Global Step: 183180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:11:00,173-Speed 5978.53 samples/sec Loss 2.3473 LearningRate 0.0067 Epoch: 17 Global Step: 183190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:11:07,050-Speed 5957.06 samples/sec Loss 2.3254 LearningRate 0.0067 Epoch: 17 Global Step: 183200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:11:13,926-Speed 5957.96 samples/sec Loss 2.3425 LearningRate 0.0067 Epoch: 17 Global Step: 183210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:11:20,786-Speed 5972.49 samples/sec Loss 2.3655 LearningRate 0.0067 Epoch: 17 Global Step: 183220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:11:27,650-Speed 5968.60 samples/sec Loss 2.3302 LearningRate 0.0067 Epoch: 17 Global Step: 183230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:11:34,510-Speed 5971.99 samples/sec Loss 2.3418 LearningRate 0.0067 Epoch: 17 Global Step: 183240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:11:41,369-Speed 5973.06 samples/sec Loss 2.3123 LearningRate 0.0067 Epoch: 17 Global Step: 183250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:11:48,236-Speed 5966.12 samples/sec Loss 2.3260 LearningRate 0.0067 Epoch: 17 Global Step: 183260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:11:55,081-Speed 5984.92 samples/sec Loss 
2.3514 LearningRate 0.0067 Epoch: 17 Global Step: 183270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:12:01,987-Speed 5932.44 samples/sec Loss 2.3385 LearningRate 0.0067 Epoch: 17 Global Step: 183280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:12:08,859-Speed 5961.69 samples/sec Loss 2.3230 LearningRate 0.0067 Epoch: 17 Global Step: 183290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:12:15,716-Speed 5974.14 samples/sec Loss 2.3304 LearningRate 0.0067 Epoch: 17 Global Step: 183300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:12:22,653-Speed 5906.07 samples/sec Loss 2.3380 LearningRate 0.0067 Epoch: 17 Global Step: 183310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:12:29,630-Speed 5871.60 samples/sec Loss 2.3411 LearningRate 0.0066 Epoch: 17 Global Step: 183320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:12:36,486-Speed 5975.45 samples/sec Loss 2.3487 LearningRate 0.0066 Epoch: 17 Global Step: 183330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:12:43,334-Speed 5983.37 samples/sec Loss 2.3516 LearningRate 0.0066 Epoch: 17 Global Step: 183340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:12:50,189-Speed 5976.37 samples/sec Loss 2.3684 LearningRate 0.0066 Epoch: 17 Global Step: 183350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:12:57,048-Speed 5972.79 samples/sec Loss 2.3417 LearningRate 0.0066 Epoch: 17 Global Step: 183360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:13:03,906-Speed 5974.29 samples/sec Loss 2.3013 LearningRate 0.0066 Epoch: 17 Global Step: 183370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:13:10,778-Speed 5960.82 samples/sec Loss 2.3226 LearningRate 0.0066 Epoch: 17 Global Step: 183380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:13:17,632-Speed 5978.04 samples/sec Loss 2.3102 LearningRate 0.0066 Epoch: 17 Global 
Step: 183390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:13:24,489-Speed 5974.71 samples/sec Loss 2.2961 LearningRate 0.0066 Epoch: 17 Global Step: 183400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:13:31,364-Speed 5959.11 samples/sec Loss 2.3090 LearningRate 0.0066 Epoch: 17 Global Step: 183410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:13:38,218-Speed 5977.01 samples/sec Loss 2.3510 LearningRate 0.0066 Epoch: 17 Global Step: 183420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:13:45,069-Speed 5979.89 samples/sec Loss 2.3141 LearningRate 0.0066 Epoch: 17 Global Step: 183430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:13:51,931-Speed 5969.97 samples/sec Loss 2.3466 LearningRate 0.0066 Epoch: 17 Global Step: 183440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:13:58,826-Speed 5941.68 samples/sec Loss 2.3440 LearningRate 0.0066 Epoch: 17 Global Step: 183450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:14:05,705-Speed 5955.30 samples/sec Loss 2.3029 LearningRate 0.0066 Epoch: 17 Global Step: 183460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:14:12,562-Speed 5974.29 samples/sec Loss 2.2943 LearningRate 0.0066 Epoch: 17 Global Step: 183470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:14:19,422-Speed 5972.26 samples/sec Loss 2.3005 LearningRate 0.0066 Epoch: 17 Global Step: 183480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:14:26,268-Speed 5984.07 samples/sec Loss 2.2990 LearningRate 0.0066 Epoch: 17 Global Step: 183490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:14:33,118-Speed 5980.27 samples/sec Loss 2.3218 LearningRate 0.0065 Epoch: 17 Global Step: 183500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:14:39,953-Speed 5993.80 samples/sec Loss 2.3130 LearningRate 0.0065 Epoch: 17 Global Step: 183510 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 08:14:46,795-Speed 5987.32 samples/sec Loss 2.3281 LearningRate 0.0065 Epoch: 17 Global Step: 183520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:14:53,664-Speed 5963.83 samples/sec Loss 2.3100 LearningRate 0.0065 Epoch: 17 Global Step: 183530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:15:00,525-Speed 5971.69 samples/sec Loss 2.3170 LearningRate 0.0065 Epoch: 17 Global Step: 183540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:15:07,395-Speed 5963.16 samples/sec Loss 2.2818 LearningRate 0.0065 Epoch: 17 Global Step: 183550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:15:14,288-Speed 5943.28 samples/sec Loss 2.3521 LearningRate 0.0065 Epoch: 17 Global Step: 183560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:15:21,150-Speed 5970.68 samples/sec Loss 2.2993 LearningRate 0.0065 Epoch: 17 Global Step: 183570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:15:28,016-Speed 5967.32 samples/sec Loss 2.3262 LearningRate 0.0065 Epoch: 17 Global Step: 183580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:15:34,854-Speed 5990.88 samples/sec Loss 2.3306 LearningRate 0.0065 Epoch: 17 Global Step: 183590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:15:41,705-Speed 5980.08 samples/sec Loss 2.3078 LearningRate 0.0065 Epoch: 17 Global Step: 183600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:15:48,589-Speed 5950.66 samples/sec Loss 2.2866 LearningRate 0.0065 Epoch: 17 Global Step: 183610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:15:55,433-Speed 5986.29 samples/sec Loss 2.3228 LearningRate 0.0065 Epoch: 17 Global Step: 183620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:16:02,271-Speed 5991.03 samples/sec Loss 2.3049 LearningRate 0.0065 Epoch: 17 Global Step: 183630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 
08:16:09,117-Speed 5984.29 samples/sec Loss 2.3169 LearningRate 0.0065 Epoch: 17 Global Step: 183640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:16:15,962-Speed 5984.35 samples/sec Loss 2.3148 LearningRate 0.0065 Epoch: 17 Global Step: 183650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:16:22,795-Speed 5995.48 samples/sec Loss 2.3127 LearningRate 0.0065 Epoch: 17 Global Step: 183660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:16:29,642-Speed 5983.12 samples/sec Loss 2.3056 LearningRate 0.0065 Epoch: 17 Global Step: 183670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:16:36,488-Speed 5983.88 samples/sec Loss 2.2772 LearningRate 0.0064 Epoch: 17 Global Step: 183680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:16:43,333-Speed 5984.70 samples/sec Loss 2.2730 LearningRate 0.0064 Epoch: 17 Global Step: 183690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:16:50,183-Speed 5980.96 samples/sec Loss 2.3135 LearningRate 0.0064 Epoch: 17 Global Step: 183700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:16:57,041-Speed 5974.06 samples/sec Loss 2.3291 LearningRate 0.0064 Epoch: 17 Global Step: 183710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:17:03,902-Speed 5970.76 samples/sec Loss 2.2969 LearningRate 0.0064 Epoch: 17 Global Step: 183720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:17:10,772-Speed 5962.91 samples/sec Loss 2.3176 LearningRate 0.0064 Epoch: 17 Global Step: 183730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:17:17,623-Speed 5979.88 samples/sec Loss 2.3291 LearningRate 0.0064 Epoch: 17 Global Step: 183740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:17:24,478-Speed 5976.20 samples/sec Loss 2.3223 LearningRate 0.0064 Epoch: 17 Global Step: 183750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:17:31,320-Speed 5987.75 samples/sec Loss 
2.2693 LearningRate 0.0064 Epoch: 17 Global Step: 183760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:17:38,167-Speed 5983.06 samples/sec Loss 2.3001 LearningRate 0.0064 Epoch: 17 Global Step: 183770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:17:45,035-Speed 5965.28 samples/sec Loss 2.2865 LearningRate 0.0064 Epoch: 17 Global Step: 183780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:17:51,927-Speed 5944.06 samples/sec Loss 2.3074 LearningRate 0.0064 Epoch: 17 Global Step: 183790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:17:58,791-Speed 5968.83 samples/sec Loss 2.2846 LearningRate 0.0064 Epoch: 17 Global Step: 183800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:18:05,635-Speed 5985.85 samples/sec Loss 2.2844 LearningRate 0.0064 Epoch: 17 Global Step: 183810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:18:12,492-Speed 5974.81 samples/sec Loss 2.3149 LearningRate 0.0064 Epoch: 17 Global Step: 183820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:18:19,353-Speed 5970.90 samples/sec Loss 2.2992 LearningRate 0.0064 Epoch: 17 Global Step: 183830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:18:26,280-Speed 5915.10 samples/sec Loss 2.3256 LearningRate 0.0064 Epoch: 17 Global Step: 183840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:18:33,133-Speed 5978.05 samples/sec Loss 2.2986 LearningRate 0.0064 Epoch: 17 Global Step: 183850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:18:39,992-Speed 5973.47 samples/sec Loss 2.3065 LearningRate 0.0064 Epoch: 17 Global Step: 183860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-09 08:18:46,872-Speed 5953.78 samples/sec Loss 2.3015 LearningRate 0.0063 Epoch: 17 Global Step: 183870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:18:53,730-Speed 5974.13 samples/sec Loss 2.3156 LearningRate 0.0063 Epoch: 17 
Global Step: 183880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:00,601-Speed 5962.60 samples/sec Loss 2.3064 LearningRate 0.0063 Epoch: 17 Global Step: 183890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:07,458-Speed 5974.68 samples/sec Loss 2.3110 LearningRate 0.0063 Epoch: 17 Global Step: 183900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:14,311-Speed 5978.05 samples/sec Loss 2.3062 LearningRate 0.0063 Epoch: 17 Global Step: 183910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:21,230-Speed 5920.82 samples/sec Loss 2.2641 LearningRate 0.0063 Epoch: 17 Global Step: 183920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:28,081-Speed 5980.24 samples/sec Loss 2.2616 LearningRate 0.0063 Epoch: 17 Global Step: 183930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:34,942-Speed 5970.79 samples/sec Loss 2.3041 LearningRate 0.0063 Epoch: 17 Global Step: 183940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:41,784-Speed 5988.19 samples/sec Loss 2.3026 LearningRate 0.0063 Epoch: 17 Global Step: 183950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:48,675-Speed 5944.81 samples/sec Loss 2.3064 LearningRate 0.0063 Epoch: 17 Global Step: 183960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:19:55,533-Speed 5973.36 samples/sec Loss 2.2970 LearningRate 0.0063 Epoch: 17 Global Step: 183970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:20:02,391-Speed 5973.81 samples/sec Loss 2.2726 LearningRate 0.0063 Epoch: 17 Global Step: 183980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:20:09,253-Speed 5970.45 samples/sec Loss 2.2628 LearningRate 0.0063 Epoch: 17 Global Step: 183990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:20:16,097-Speed 5985.85 samples/sec Loss 2.2959 LearningRate 0.0063 Epoch: 17 Global Step: 184000 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 08:20:22,940-Speed 5988.05 samples/sec Loss 2.2863 LearningRate 0.0063 Epoch: 17 Global Step: 184010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:20:29,783-Speed 5987.29 samples/sec Loss 2.3138 LearningRate 0.0063 Epoch: 17 Global Step: 184020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:20:36,632-Speed 5981.35 samples/sec Loss 2.2418 LearningRate 0.0063 Epoch: 17 Global Step: 184030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:20:43,495-Speed 5969.89 samples/sec Loss 2.2898 LearningRate 0.0063 Epoch: 17 Global Step: 184040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:20:50,421-Speed 5915.11 samples/sec Loss 2.2944 LearningRate 0.0062 Epoch: 17 Global Step: 184050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:20:57,263-Speed 5987.62 samples/sec Loss 2.3140 LearningRate 0.0062 Epoch: 17 Global Step: 184060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:21:04,114-Speed 5980.05 samples/sec Loss 2.2936 LearningRate 0.0062 Epoch: 17 Global Step: 184070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:21:10,959-Speed 5985.84 samples/sec Loss 2.2601 LearningRate 0.0062 Epoch: 17 Global Step: 184080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:21:17,849-Speed 5946.77 samples/sec Loss 2.2877 LearningRate 0.0062 Epoch: 17 Global Step: 184090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:21:24,720-Speed 5961.82 samples/sec Loss 2.3320 LearningRate 0.0062 Epoch: 17 Global Step: 184100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:21:31,665-Speed 5900.00 samples/sec Loss 2.2826 LearningRate 0.0062 Epoch: 17 Global Step: 184110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:21:38,544-Speed 5955.33 samples/sec Loss 2.3253 LearningRate 0.0062 Epoch: 17 Global Step: 184120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
08:21:45,400-Speed 5975.12 samples/sec Loss 2.2471 LearningRate 0.0062 Epoch: 17 Global Step: 184130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:21:52,252-Speed 5979.83 samples/sec Loss 2.2378 LearningRate 0.0062 Epoch: 17 Global Step: 184140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:21:59,114-Speed 5970.00 samples/sec Loss 2.3296 LearningRate 0.0062 Epoch: 17 Global Step: 184150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:22:05,986-Speed 5960.98 samples/sec Loss 2.2677 LearningRate 0.0062 Epoch: 17 Global Step: 184160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:22:12,876-Speed 5946.15 samples/sec Loss 2.2378 LearningRate 0.0062 Epoch: 17 Global Step: 184170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:22:19,755-Speed 5955.36 samples/sec Loss 2.2743 LearningRate 0.0062 Epoch: 17 Global Step: 184180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:22:26,632-Speed 5956.76 samples/sec Loss 2.2916 LearningRate 0.0062 Epoch: 17 Global Step: 184190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:22:33,474-Speed 5987.79 samples/sec Loss 2.2768 LearningRate 0.0062 Epoch: 17 Global Step: 184200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:22:40,358-Speed 5951.71 samples/sec Loss 2.2782 LearningRate 0.0062 Epoch: 17 Global Step: 184210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:22:47,224-Speed 5968.81 samples/sec Loss 2.2574 LearningRate 0.0062 Epoch: 17 Global Step: 184220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:22:54,069-Speed 5985.50 samples/sec Loss 2.2847 LearningRate 0.0062 Epoch: 17 Global Step: 184230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:00,944-Speed 5958.89 samples/sec Loss 2.2947 LearningRate 0.0061 Epoch: 17 Global Step: 184240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:07,811-Speed 5965.19 samples/sec Loss 
2.2743 LearningRate 0.0061 Epoch: 17 Global Step: 184250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:14,655-Speed 5985.73 samples/sec Loss 2.2810 LearningRate 0.0061 Epoch: 17 Global Step: 184260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:21,507-Speed 5979.30 samples/sec Loss 2.2832 LearningRate 0.0061 Epoch: 17 Global Step: 184270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:28,365-Speed 5973.31 samples/sec Loss 2.2779 LearningRate 0.0061 Epoch: 17 Global Step: 184280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:35,244-Speed 5955.13 samples/sec Loss 2.2598 LearningRate 0.0061 Epoch: 17 Global Step: 184290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:42,094-Speed 5981.40 samples/sec Loss 2.2817 LearningRate 0.0061 Epoch: 17 Global Step: 184300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:48,950-Speed 5976.31 samples/sec Loss 2.2967 LearningRate 0.0061 Epoch: 17 Global Step: 184310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:23:55,799-Speed 5981.39 samples/sec Loss 2.2765 LearningRate 0.0061 Epoch: 17 Global Step: 184320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:24:02,646-Speed 5983.78 samples/sec Loss 2.2870 LearningRate 0.0061 Epoch: 17 Global Step: 184330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 08:24:09,574-Speed 5912.80 samples/sec Loss 2.2577 LearningRate 0.0061 Epoch: 17 Global Step: 184340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:24:16,508-Speed 5908.43 samples/sec Loss 2.2645 LearningRate 0.0061 Epoch: 17 Global Step: 184350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:24:23,379-Speed 5962.09 samples/sec Loss 2.2540 LearningRate 0.0061 Epoch: 17 Global Step: 184360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:24:30,232-Speed 5977.62 samples/sec Loss 2.2617 LearningRate 0.0061 Epoch: 17 Global 
Step: 184370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:24:37,090-Speed 5973.49 samples/sec Loss 2.2819 LearningRate 0.0061 Epoch: 17 Global Step: 184380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:24:43,944-Speed 5976.47 samples/sec Loss 2.2694 LearningRate 0.0061 Epoch: 17 Global Step: 184390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:24:50,793-Speed 5981.65 samples/sec Loss 2.2567 LearningRate 0.0061 Epoch: 17 Global Step: 184400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:24:57,638-Speed 5984.66 samples/sec Loss 2.2452 LearningRate 0.0061 Epoch: 17 Global Step: 184410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:25:04,493-Speed 5976.28 samples/sec Loss 2.2516 LearningRate 0.0061 Epoch: 17 Global Step: 184420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 08:25:11,344-Speed 5979.68 samples/sec Loss 2.2692 LearningRate 0.0060 Epoch: 17 Global Step: 184430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:25:18,217-Speed 5960.87 samples/sec Loss 2.2653 LearningRate 0.0060 Epoch: 17 Global Step: 184440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:25:25,083-Speed 5966.43 samples/sec Loss 2.2629 LearningRate 0.0060 Epoch: 17 Global Step: 184450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:25:31,942-Speed 5972.69 samples/sec Loss 2.2525 LearningRate 0.0060 Epoch: 17 Global Step: 184460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:25:38,791-Speed 5981.48 samples/sec Loss 2.2878 LearningRate 0.0060 Epoch: 17 Global Step: 184470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:25:45,650-Speed 5972.61 samples/sec Loss 2.2473 LearningRate 0.0060 Epoch: 17 Global Step: 184480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:25:52,494-Speed 5987.61 samples/sec Loss 2.2547 LearningRate 0.0060 Epoch: 17 Global Step: 184490 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-09 08:25:59,360-Speed 5967.83 samples/sec Loss 2.2477 LearningRate 0.0060 Epoch: 17 Global Step: 184500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:26:06,213-Speed 5977.80 samples/sec Loss 2.2471 LearningRate 0.0060 Epoch: 17 Global Step: 184510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:26:13,057-Speed 5985.57 samples/sec Loss 2.2686 LearningRate 0.0060 Epoch: 17 Global Step: 184520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:26:19,936-Speed 5955.57 samples/sec Loss 2.3005 LearningRate 0.0060 Epoch: 17 Global Step: 184530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:26:26,797-Speed 5972.02 samples/sec Loss 2.2600 LearningRate 0.0060 Epoch: 17 Global Step: 184540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:26:33,646-Speed 5981.06 samples/sec Loss 2.2444 LearningRate 0.0060 Epoch: 17 Global Step: 184550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:26:40,515-Speed 5963.75 samples/sec Loss 2.2531 LearningRate 0.0060 Epoch: 17 Global Step: 184560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:26:47,381-Speed 5967.39 samples/sec Loss 2.2412 LearningRate 0.0060 Epoch: 17 Global Step: 184570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:26:54,239-Speed 5974.27 samples/sec Loss 2.2518 LearningRate 0.0060 Epoch: 17 Global Step: 184580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:27:01,096-Speed 5975.29 samples/sec Loss 2.2544 LearningRate 0.0060 Epoch: 17 Global Step: 184590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:27:07,972-Speed 5957.78 samples/sec Loss 2.2513 LearningRate 0.0060 Epoch: 17 Global Step: 184600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:27:14,842-Speed 5962.92 samples/sec Loss 2.2685 LearningRate 0.0060 Epoch: 17 Global Step: 184610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 
08:27:21,708-Speed 5967.41 samples/sec Loss 2.2763 LearningRate 0.0059 Epoch: 17 Global Step: 184620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:27:28,568-Speed 5974.02 samples/sec Loss 2.2589 LearningRate 0.0059 Epoch: 17 Global Step: 184630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:27:35,421-Speed 5980.68 samples/sec Loss 2.2200 LearningRate 0.0059 Epoch: 17 Global Step: 184640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:27:42,271-Speed 5980.22 samples/sec Loss 2.2227 LearningRate 0.0059 Epoch: 17 Global Step: 184650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:27:49,136-Speed 5967.49 samples/sec Loss 2.2466 LearningRate 0.0059 Epoch: 17 Global Step: 184660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:27:55,991-Speed 5976.91 samples/sec Loss 2.2297 LearningRate 0.0059 Epoch: 17 Global Step: 184670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:28:02,867-Speed 5957.84 samples/sec Loss 2.2526 LearningRate 0.0059 Epoch: 17 Global Step: 184680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:28:09,728-Speed 5971.25 samples/sec Loss 2.2973 LearningRate 0.0059 Epoch: 17 Global Step: 184690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:28:16,584-Speed 5976.01 samples/sec Loss 2.2288 LearningRate 0.0059 Epoch: 17 Global Step: 184700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:28:23,437-Speed 5978.22 samples/sec Loss 2.2865 LearningRate 0.0059 Epoch: 17 Global Step: 184710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:28:30,292-Speed 5977.00 samples/sec Loss 2.2525 LearningRate 0.0059 Epoch: 17 Global Step: 184720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:28:37,147-Speed 5976.16 samples/sec Loss 2.2373 LearningRate 0.0059 Epoch: 17 Global Step: 184730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:28:43,996-Speed 5981.48 samples/sec Loss 
2.2544 LearningRate 0.0059 Epoch: 17 Global Step: 184740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:28:50,861-Speed 5967.41 samples/sec Loss 2.1925 LearningRate 0.0059 Epoch: 17 Global Step: 184750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:28:57,709-Speed 5982.53 samples/sec Loss 2.2427 LearningRate 0.0059 Epoch: 17 Global Step: 184760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:04,563-Speed 5977.41 samples/sec Loss 2.2293 LearningRate 0.0059 Epoch: 17 Global Step: 184770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:11,414-Speed 5979.27 samples/sec Loss 2.2347 LearningRate 0.0059 Epoch: 17 Global Step: 184780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:18,268-Speed 5978.20 samples/sec Loss 2.2342 LearningRate 0.0059 Epoch: 17 Global Step: 184790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:25,139-Speed 5962.02 samples/sec Loss 2.2481 LearningRate 0.0059 Epoch: 17 Global Step: 184800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:31,995-Speed 5976.32 samples/sec Loss 2.2463 LearningRate 0.0058 Epoch: 17 Global Step: 184810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:38,846-Speed 5979.04 samples/sec Loss 2.2251 LearningRate 0.0058 Epoch: 17 Global Step: 184820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:45,701-Speed 5976.29 samples/sec Loss 2.2058 LearningRate 0.0058 Epoch: 17 Global Step: 184830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:52,552-Speed 5980.62 samples/sec Loss 2.2344 LearningRate 0.0058 Epoch: 17 Global Step: 184840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:29:59,401-Speed 5983.27 samples/sec Loss 2.2447 LearningRate 0.0058 Epoch: 17 Global Step: 184850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:30:06,257-Speed 5975.23 samples/sec Loss 2.2522 LearningRate 0.0058 Epoch: 17 Global 
Step: 184860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:30:13,117-Speed 5972.51 samples/sec Loss 2.2596 LearningRate 0.0058 Epoch: 17 Global Step: 184870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:30:19,979-Speed 5969.92 samples/sec Loss 2.2671 LearningRate 0.0058 Epoch: 17 Global Step: 184880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:30:26,833-Speed 5977.67 samples/sec Loss 2.2552 LearningRate 0.0058 Epoch: 17 Global Step: 184890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:30:33,702-Speed 5964.13 samples/sec Loss 2.2752 LearningRate 0.0058 Epoch: 17 Global Step: 184900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:30:40,590-Speed 5947.01 samples/sec Loss 2.2489 LearningRate 0.0058 Epoch: 17 Global Step: 184910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:30:47,467-Speed 5958.10 samples/sec Loss 2.2194 LearningRate 0.0058 Epoch: 17 Global Step: 184920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:30:54,338-Speed 5972.00 samples/sec Loss 2.2116 LearningRate 0.0058 Epoch: 17 Global Step: 184930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:31:01,195-Speed 5974.03 samples/sec Loss 2.2152 LearningRate 0.0058 Epoch: 17 Global Step: 184940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:31:08,059-Speed 5968.96 samples/sec Loss 2.2217 LearningRate 0.0058 Epoch: 17 Global Step: 184950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:31:14,917-Speed 5974.86 samples/sec Loss 2.2438 LearningRate 0.0058 Epoch: 17 Global Step: 184960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:31:21,788-Speed 5962.60 samples/sec Loss 2.2196 LearningRate 0.0058 Epoch: 17 Global Step: 184970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:31:28,641-Speed 5977.96 samples/sec Loss 2.2137 LearningRate 0.0058 Epoch: 17 Global Step: 184980 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-09 08:31:35,484-Speed 5987.59 samples/sec Loss 2.1970 LearningRate 0.0058 Epoch: 17 Global Step: 184990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:31:42,322-Speed 5990.42 samples/sec Loss 2.2412 LearningRate 0.0058 Epoch: 17 Global Step: 185000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:32:09,082-[lfw][185000]XNorm: 23.374321 Training: 2022-01-09 08:32:09,083-[lfw][185000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-01-09 08:32:09,083-[lfw][185000]Accuracy-Highest: 0.99833 Training: 2022-01-09 08:32:40,032-[cfp_fp][185000]XNorm: 21.096349 Training: 2022-01-09 08:32:40,033-[cfp_fp][185000]Accuracy-Flip: 0.99200+-0.00479 Training: 2022-01-09 08:32:40,034-[cfp_fp][185000]Accuracy-Highest: 0.99229 Training: 2022-01-09 08:33:06,746-[agedb_30][185000]XNorm: 22.766516 Training: 2022-01-09 08:33:06,747-[agedb_30][185000]Accuracy-Flip: 0.98033+-0.00618 Training: 2022-01-09 08:33:06,747-[agedb_30][185000]Accuracy-Highest: 0.98200 Training: 2022-01-09 08:33:13,626-Speed 448.62 samples/sec Loss 2.1973 LearningRate 0.0057 Epoch: 17 Global Step: 185010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:33:20,455-Speed 6000.40 samples/sec Loss 2.2451 LearningRate 0.0057 Epoch: 17 Global Step: 185020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:33:27,304-Speed 5981.51 samples/sec Loss 2.2024 LearningRate 0.0057 Epoch: 17 Global Step: 185030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:33:34,168-Speed 5968.51 samples/sec Loss 2.2103 LearningRate 0.0057 Epoch: 17 Global Step: 185040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:33:41,013-Speed 5984.96 samples/sec Loss 2.2262 LearningRate 0.0057 Epoch: 17 Global Step: 185050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:33:47,861-Speed 5982.59 samples/sec Loss 2.2339 LearningRate 0.0057 Epoch: 17 Global Step: 185060 Fp16 Grad Scale: 32768 Required: 4 hours 
Training: 2022-01-09 08:33:54,734-Speed 5961.01 samples/sec Loss 2.2266 LearningRate 0.0057 Epoch: 17 Global Step: 185070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:34:01,610-Speed 5967.70 samples/sec Loss 2.2282 LearningRate 0.0057 Epoch: 17 Global Step: 185080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:34:08,483-Speed 5960.65 samples/sec Loss 2.2121 LearningRate 0.0057 Epoch: 17 Global Step: 185090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:34:15,346-Speed 5969.44 samples/sec Loss 2.2161 LearningRate 0.0057 Epoch: 17 Global Step: 185100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:34:22,194-Speed 5982.84 samples/sec Loss 2.2057 LearningRate 0.0057 Epoch: 17 Global Step: 185110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:34:29,052-Speed 5974.07 samples/sec Loss 2.1872 LearningRate 0.0057 Epoch: 17 Global Step: 185120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:34:35,932-Speed 5953.74 samples/sec Loss 2.1965 LearningRate 0.0057 Epoch: 17 Global Step: 185130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:34:42,794-Speed 5970.72 samples/sec Loss 2.2433 LearningRate 0.0057 Epoch: 17 Global Step: 185140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:34:49,652-Speed 5973.38 samples/sec Loss 2.1967 LearningRate 0.0057 Epoch: 17 Global Step: 185150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:34:56,507-Speed 5976.48 samples/sec Loss 2.2087 LearningRate 0.0057 Epoch: 17 Global Step: 185160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:35:03,385-Speed 5956.89 samples/sec Loss 2.2214 LearningRate 0.0057 Epoch: 17 Global Step: 185170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:35:10,227-Speed 5987.78 samples/sec Loss 2.2437 LearningRate 0.0057 Epoch: 17 Global Step: 185180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:35:17,068-Speed 
5988.61 samples/sec Loss 2.2219 LearningRate 0.0057 Epoch: 17 Global Step: 185190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:35:23,912-Speed 5985.51 samples/sec Loss 2.2200 LearningRate 0.0056 Epoch: 17 Global Step: 185200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:35:30,755-Speed 5987.37 samples/sec Loss 2.2252 LearningRate 0.0056 Epoch: 17 Global Step: 185210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:35:37,617-Speed 5969.81 samples/sec Loss 2.2056 LearningRate 0.0056 Epoch: 17 Global Step: 185220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:35:44,482-Speed 5970.58 samples/sec Loss 2.1873 LearningRate 0.0056 Epoch: 17 Global Step: 185230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:35:51,340-Speed 5973.43 samples/sec Loss 2.1566 LearningRate 0.0056 Epoch: 17 Global Step: 185240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:35:58,203-Speed 5969.11 samples/sec Loss 2.2330 LearningRate 0.0056 Epoch: 17 Global Step: 185250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:36:05,091-Speed 5946.73 samples/sec Loss 2.2220 LearningRate 0.0056 Epoch: 17 Global Step: 185260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:36:11,950-Speed 5973.50 samples/sec Loss 2.1771 LearningRate 0.0056 Epoch: 17 Global Step: 185270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:36:18,821-Speed 5965.60 samples/sec Loss 2.2209 LearningRate 0.0056 Epoch: 17 Global Step: 185280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:36:25,691-Speed 5964.55 samples/sec Loss 2.1954 LearningRate 0.0056 Epoch: 17 Global Step: 185290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:36:32,550-Speed 5972.36 samples/sec Loss 2.2083 LearningRate 0.0056 Epoch: 17 Global Step: 185300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:36:39,402-Speed 5979.36 samples/sec Loss 2.2154 
LearningRate 0.0056 Epoch: 17 Global Step: 185310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:36:46,251-Speed 5980.88 samples/sec Loss 2.2136 LearningRate 0.0056 Epoch: 17 Global Step: 185320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:36:53,118-Speed 5966.06 samples/sec Loss 2.1821 LearningRate 0.0056 Epoch: 17 Global Step: 185330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:36:59,994-Speed 5957.68 samples/sec Loss 2.2044 LearningRate 0.0056 Epoch: 17 Global Step: 185340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:37:06,864-Speed 5963.54 samples/sec Loss 2.1702 LearningRate 0.0056 Epoch: 17 Global Step: 185350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:37:13,713-Speed 5982.39 samples/sec Loss 2.2251 LearningRate 0.0056 Epoch: 17 Global Step: 185360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:37:20,567-Speed 5976.65 samples/sec Loss 2.2209 LearningRate 0.0056 Epoch: 17 Global Step: 185370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:37:27,440-Speed 5960.53 samples/sec Loss 2.1751 LearningRate 0.0056 Epoch: 17 Global Step: 185380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:37:34,284-Speed 5986.00 samples/sec Loss 2.1951 LearningRate 0.0056 Epoch: 17 Global Step: 185390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-09 08:37:41,137-Speed 5980.17 samples/sec Loss 2.2064 LearningRate 0.0055 Epoch: 17 Global Step: 185400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:37:47,989-Speed 5978.95 samples/sec Loss 2.1798 LearningRate 0.0055 Epoch: 17 Global Step: 185410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:37:54,840-Speed 5979.95 samples/sec Loss 2.1919 LearningRate 0.0055 Epoch: 17 Global Step: 185420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:38:01,705-Speed 5969.39 samples/sec Loss 2.2259 LearningRate 0.0055 Epoch: 17 Global Step: 
185430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:38:08,555-Speed 5981.39 samples/sec Loss 2.2273 LearningRate 0.0055 Epoch: 17 Global Step: 185440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:38:15,421-Speed 5968.04 samples/sec Loss 2.2131 LearningRate 0.0055 Epoch: 17 Global Step: 185450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:38:22,288-Speed 5965.82 samples/sec Loss 2.2224 LearningRate 0.0055 Epoch: 17 Global Step: 185460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:38:29,152-Speed 5968.49 samples/sec Loss 2.1942 LearningRate 0.0055 Epoch: 17 Global Step: 185470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:38:36,027-Speed 5958.91 samples/sec Loss 2.2022 LearningRate 0.0055 Epoch: 17 Global Step: 185480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:38:42,904-Speed 5957.31 samples/sec Loss 2.2009 LearningRate 0.0055 Epoch: 17 Global Step: 185490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:38:49,777-Speed 5960.94 samples/sec Loss 2.1824 LearningRate 0.0055 Epoch: 17 Global Step: 185500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:38:56,635-Speed 5973.96 samples/sec Loss 2.2004 LearningRate 0.0055 Epoch: 17 Global Step: 185510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:39:03,500-Speed 5967.31 samples/sec Loss 2.1867 LearningRate 0.0055 Epoch: 17 Global Step: 185520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:39:10,359-Speed 5973.82 samples/sec Loss 2.1801 LearningRate 0.0055 Epoch: 17 Global Step: 185530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:39:17,223-Speed 5968.57 samples/sec Loss 2.1871 LearningRate 0.0055 Epoch: 17 Global Step: 185540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:39:24,097-Speed 5960.44 samples/sec Loss 2.2165 LearningRate 0.0055 Epoch: 17 Global Step: 185550 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-09 08:39:30,970-Speed 5960.65 samples/sec Loss 2.1761 LearningRate 0.0055 Epoch: 17 Global Step: 185560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:39:37,847-Speed 5958.09 samples/sec Loss 2.1926 LearningRate 0.0055 Epoch: 17 Global Step: 185570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:39:44,717-Speed 5964.76 samples/sec Loss 2.1956 LearningRate 0.0055 Epoch: 17 Global Step: 185580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:39:51,573-Speed 5974.76 samples/sec Loss 2.1924 LearningRate 0.0055 Epoch: 17 Global Step: 185590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:39:58,422-Speed 5982.15 samples/sec Loss 2.2134 LearningRate 0.0054 Epoch: 17 Global Step: 185600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:40:05,261-Speed 5989.83 samples/sec Loss 2.1834 LearningRate 0.0054 Epoch: 17 Global Step: 185610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:40:12,128-Speed 5965.69 samples/sec Loss 2.1932 LearningRate 0.0054 Epoch: 17 Global Step: 185620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:40:18,990-Speed 5970.32 samples/sec Loss 2.1876 LearningRate 0.0054 Epoch: 17 Global Step: 185630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:40:25,849-Speed 5972.77 samples/sec Loss 2.1834 LearningRate 0.0054 Epoch: 17 Global Step: 185640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:40:32,714-Speed 5966.51 samples/sec Loss 2.1679 LearningRate 0.0054 Epoch: 17 Global Step: 185650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:40:39,573-Speed 5972.69 samples/sec Loss 2.2173 LearningRate 0.0054 Epoch: 17 Global Step: 185660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:40:46,445-Speed 5962.46 samples/sec Loss 2.1945 LearningRate 0.0054 Epoch: 17 Global Step: 185670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 
08:40:53,319-Speed 5959.72 samples/sec Loss 2.1763 LearningRate 0.0054 Epoch: 17 Global Step: 185680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:41:00,201-Speed 5952.68 samples/sec Loss 2.1705 LearningRate 0.0054 Epoch: 17 Global Step: 185690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:41:07,058-Speed 5977.48 samples/sec Loss 2.2075 LearningRate 0.0054 Epoch: 17 Global Step: 185700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:41:13,932-Speed 5960.97 samples/sec Loss 2.2194 LearningRate 0.0054 Epoch: 17 Global Step: 185710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:41:20,785-Speed 5978.22 samples/sec Loss 2.2135 LearningRate 0.0054 Epoch: 17 Global Step: 185720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:41:27,662-Speed 5956.72 samples/sec Loss 2.2018 LearningRate 0.0054 Epoch: 17 Global Step: 185730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:41:34,513-Speed 5979.87 samples/sec Loss 2.1834 LearningRate 0.0054 Epoch: 17 Global Step: 185740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:41:41,384-Speed 5962.36 samples/sec Loss 2.1782 LearningRate 0.0054 Epoch: 17 Global Step: 185750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:41:48,260-Speed 5958.12 samples/sec Loss 2.1681 LearningRate 0.0054 Epoch: 17 Global Step: 185760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:41:55,127-Speed 5968.71 samples/sec Loss 2.1507 LearningRate 0.0054 Epoch: 17 Global Step: 185770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:42:01,986-Speed 5972.61 samples/sec Loss 2.1730 LearningRate 0.0054 Epoch: 17 Global Step: 185780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:42:08,870-Speed 5951.24 samples/sec Loss 2.1953 LearningRate 0.0054 Epoch: 17 Global Step: 185790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:42:15,720-Speed 5981.32 samples/sec Loss 
2.1967 LearningRate 0.0053 Epoch: 17 Global Step: 185800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:42:22,598-Speed 5956.54 samples/sec Loss 2.2070 LearningRate 0.0053 Epoch: 17 Global Step: 185810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:42:29,482-Speed 5951.48 samples/sec Loss 2.1898 LearningRate 0.0053 Epoch: 17 Global Step: 185820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:42:36,338-Speed 5974.55 samples/sec Loss 2.1845 LearningRate 0.0053 Epoch: 17 Global Step: 185830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:42:43,242-Speed 5937.33 samples/sec Loss 2.1834 LearningRate 0.0053 Epoch: 17 Global Step: 185840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:42:50,127-Speed 5949.78 samples/sec Loss 2.1642 LearningRate 0.0053 Epoch: 17 Global Step: 185850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:42:56,984-Speed 5976.99 samples/sec Loss 2.1888 LearningRate 0.0053 Epoch: 17 Global Step: 185860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:43:03,846-Speed 5970.14 samples/sec Loss 2.1925 LearningRate 0.0053 Epoch: 17 Global Step: 185870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:43:10,728-Speed 5953.59 samples/sec Loss 2.1813 LearningRate 0.0053 Epoch: 17 Global Step: 185880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:43:17,580-Speed 5978.44 samples/sec Loss 2.1554 LearningRate 0.0053 Epoch: 17 Global Step: 185890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:43:24,427-Speed 5983.89 samples/sec Loss 2.1977 LearningRate 0.0053 Epoch: 17 Global Step: 185900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:43:31,279-Speed 5978.83 samples/sec Loss 2.2051 LearningRate 0.0053 Epoch: 17 Global Step: 185910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:43:38,171-Speed 5944.57 samples/sec Loss 2.1850 LearningRate 0.0053 Epoch: 17 Global 
Step: 185920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:43:45,016-Speed 5984.43 samples/sec Loss 2.1933 LearningRate 0.0053 Epoch: 17 Global Step: 185930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:43:51,953-Speed 5906.39 samples/sec Loss 2.1895 LearningRate 0.0053 Epoch: 17 Global Step: 185940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:43:58,923-Speed 5878.03 samples/sec Loss 2.1761 LearningRate 0.0053 Epoch: 17 Global Step: 185950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:44:05,879-Speed 5889.07 samples/sec Loss 2.1577 LearningRate 0.0053 Epoch: 17 Global Step: 185960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:44:12,866-Speed 5864.17 samples/sec Loss 2.1471 LearningRate 0.0053 Epoch: 17 Global Step: 185970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:44:19,720-Speed 5977.61 samples/sec Loss 2.1988 LearningRate 0.0053 Epoch: 17 Global Step: 185980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:44:26,686-Speed 5881.44 samples/sec Loss 2.1855 LearningRate 0.0053 Epoch: 17 Global Step: 185990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:44:33,616-Speed 5911.39 samples/sec Loss 2.1674 LearningRate 0.0052 Epoch: 17 Global Step: 186000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:44:40,490-Speed 5959.56 samples/sec Loss 2.1772 LearningRate 0.0052 Epoch: 17 Global Step: 186010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:44:47,362-Speed 5962.53 samples/sec Loss 2.1503 LearningRate 0.0052 Epoch: 17 Global Step: 186020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:44:54,224-Speed 5970.20 samples/sec Loss 2.1774 LearningRate 0.0052 Epoch: 17 Global Step: 186030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:45:01,097-Speed 5960.12 samples/sec Loss 2.1575 LearningRate 0.0052 Epoch: 17 Global Step: 186040 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-09 08:45:07,962-Speed 5967.76 samples/sec Loss 2.1925 LearningRate 0.0052 Epoch: 17 Global Step: 186050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:45:14,829-Speed 5966.09 samples/sec Loss 2.1866 LearningRate 0.0052 Epoch: 17 Global Step: 186060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:45:21,683-Speed 5976.98 samples/sec Loss 2.1870 LearningRate 0.0052 Epoch: 17 Global Step: 186070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:45:28,551-Speed 5964.42 samples/sec Loss 2.1597 LearningRate 0.0052 Epoch: 17 Global Step: 186080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:45:35,432-Speed 5956.86 samples/sec Loss 2.1666 LearningRate 0.0052 Epoch: 17 Global Step: 186090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:45:42,287-Speed 5976.22 samples/sec Loss 2.1566 LearningRate 0.0052 Epoch: 17 Global Step: 186100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:45:49,142-Speed 5975.98 samples/sec Loss 2.1749 LearningRate 0.0052 Epoch: 17 Global Step: 186110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:45:56,003-Speed 5972.85 samples/sec Loss 2.1550 LearningRate 0.0052 Epoch: 17 Global Step: 186120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:46:02,892-Speed 5948.43 samples/sec Loss 2.1433 LearningRate 0.0052 Epoch: 17 Global Step: 186130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:46:09,758-Speed 5966.95 samples/sec Loss 2.1497 LearningRate 0.0052 Epoch: 17 Global Step: 186140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:46:16,599-Speed 5988.27 samples/sec Loss 2.1543 LearningRate 0.0052 Epoch: 17 Global Step: 186150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:46:23,448-Speed 5984.44 samples/sec Loss 2.1650 LearningRate 0.0052 Epoch: 17 Global Step: 186160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 
08:46:30,301-Speed 5985.92 samples/sec Loss 2.1609 LearningRate 0.0052 Epoch: 17 Global Step: 186170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:46:37,155-Speed 5976.29 samples/sec Loss 2.1399 LearningRate 0.0052 Epoch: 17 Global Step: 186180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:46:44,007-Speed 5981.42 samples/sec Loss 2.1595 LearningRate 0.0052 Epoch: 17 Global Step: 186190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 08:46:50,863-Speed 5974.92 samples/sec Loss 2.1348 LearningRate 0.0052 Epoch: 17 Global Step: 186200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:46:57,735-Speed 5961.98 samples/sec Loss 2.1837 LearningRate 0.0051 Epoch: 17 Global Step: 186210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:04,588-Speed 5978.05 samples/sec Loss 2.1521 LearningRate 0.0051 Epoch: 17 Global Step: 186220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:11,433-Speed 5985.06 samples/sec Loss 2.1561 LearningRate 0.0051 Epoch: 17 Global Step: 186230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:18,289-Speed 5975.19 samples/sec Loss 2.1330 LearningRate 0.0051 Epoch: 17 Global Step: 186240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:25,144-Speed 5976.10 samples/sec Loss 2.1941 LearningRate 0.0051 Epoch: 17 Global Step: 186250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:31,996-Speed 5979.71 samples/sec Loss 2.1624 LearningRate 0.0051 Epoch: 17 Global Step: 186260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:38,839-Speed 5986.87 samples/sec Loss 2.1381 LearningRate 0.0051 Epoch: 17 Global Step: 186270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:45,688-Speed 5983.28 samples/sec Loss 2.1335 LearningRate 0.0051 Epoch: 17 Global Step: 186280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:52,605-Speed 5922.54 samples/sec Loss 
2.1593 LearningRate 0.0051 Epoch: 17 Global Step: 186290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:47:59,489-Speed 5951.72 samples/sec Loss 2.1560 LearningRate 0.0051 Epoch: 17 Global Step: 186300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:48:06,371-Speed 5955.02 samples/sec Loss 2.1073 LearningRate 0.0051 Epoch: 17 Global Step: 186310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:48:13,255-Speed 5950.45 samples/sec Loss 2.1514 LearningRate 0.0051 Epoch: 17 Global Step: 186320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:48:20,143-Speed 5948.19 samples/sec Loss 2.1306 LearningRate 0.0051 Epoch: 17 Global Step: 186330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:48:27,094-Speed 5894.31 samples/sec Loss 2.1508 LearningRate 0.0051 Epoch: 17 Global Step: 186340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:48:33,950-Speed 5975.23 samples/sec Loss 2.1716 LearningRate 0.0051 Epoch: 17 Global Step: 186350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:48:40,810-Speed 5971.97 samples/sec Loss 2.1389 LearningRate 0.0051 Epoch: 17 Global Step: 186360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:48:47,651-Speed 5988.03 samples/sec Loss 2.1677 LearningRate 0.0051 Epoch: 17 Global Step: 186370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:48:54,542-Speed 5944.69 samples/sec Loss 2.1556 LearningRate 0.0051 Epoch: 17 Global Step: 186380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:49:01,388-Speed 5984.34 samples/sec Loss 2.1682 LearningRate 0.0051 Epoch: 17 Global Step: 186390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:49:08,236-Speed 5982.45 samples/sec Loss 2.1619 LearningRate 0.0051 Epoch: 17 Global Step: 186400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-09 08:49:15,086-Speed 5983.61 samples/sec Loss 2.1472 LearningRate 0.0050 Epoch: 17 
Global Step: 186410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:49:21,927-Speed 5987.85 samples/sec Loss 2.1552 LearningRate 0.0050 Epoch: 17 Global Step: 186420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:49:28,774-Speed 5983.62 samples/sec Loss 2.1455 LearningRate 0.0050 Epoch: 17 Global Step: 186430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:49:35,629-Speed 5976.20 samples/sec Loss 2.1252 LearningRate 0.0050 Epoch: 17 Global Step: 186440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:49:42,533-Speed 5934.07 samples/sec Loss 2.1482 LearningRate 0.0050 Epoch: 17 Global Step: 186450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:49:49,401-Speed 5965.91 samples/sec Loss 2.1208 LearningRate 0.0050 Epoch: 17 Global Step: 186460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:49:56,253-Speed 5978.43 samples/sec Loss 2.1616 LearningRate 0.0050 Epoch: 17 Global Step: 186470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:50:03,103-Speed 5981.13 samples/sec Loss 2.1704 LearningRate 0.0050 Epoch: 17 Global Step: 186480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:50:09,955-Speed 5978.24 samples/sec Loss 2.1744 LearningRate 0.0050 Epoch: 17 Global Step: 186490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:50:16,830-Speed 5958.90 samples/sec Loss 2.1463 LearningRate 0.0050 Epoch: 17 Global Step: 186500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:50:23,696-Speed 5966.85 samples/sec Loss 2.1473 LearningRate 0.0050 Epoch: 17 Global Step: 186510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:50:30,543-Speed 5984.79 samples/sec Loss 2.1805 LearningRate 0.0050 Epoch: 17 Global Step: 186520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:50:37,390-Speed 5983.28 samples/sec Loss 2.1623 LearningRate 0.0050 Epoch: 17 Global Step: 186530 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-09 08:50:44,242-Speed 5979.06 samples/sec Loss 2.1749 LearningRate 0.0050 Epoch: 17 Global Step: 186540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:50:51,091-Speed 5980.77 samples/sec Loss 2.1364 LearningRate 0.0050 Epoch: 17 Global Step: 186550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:50:57,962-Speed 5962.40 samples/sec Loss 2.1298 LearningRate 0.0050 Epoch: 17 Global Step: 186560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:51:04,829-Speed 5966.37 samples/sec Loss 2.1671 LearningRate 0.0050 Epoch: 17 Global Step: 186570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:51:11,700-Speed 5961.96 samples/sec Loss 2.1397 LearningRate 0.0050 Epoch: 17 Global Step: 186580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:51:18,591-Speed 5947.91 samples/sec Loss 2.1367 LearningRate 0.0050 Epoch: 17 Global Step: 186590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:51:25,523-Speed 5910.38 samples/sec Loss 2.1591 LearningRate 0.0050 Epoch: 17 Global Step: 186600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:51:32,379-Speed 5975.45 samples/sec Loss 2.1222 LearningRate 0.0050 Epoch: 17 Global Step: 186610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:51:39,254-Speed 5960.06 samples/sec Loss 2.1457 LearningRate 0.0049 Epoch: 17 Global Step: 186620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:51:46,104-Speed 5979.67 samples/sec Loss 2.1366 LearningRate 0.0049 Epoch: 17 Global Step: 186630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:51:52,949-Speed 5985.91 samples/sec Loss 2.1545 LearningRate 0.0049 Epoch: 17 Global Step: 186640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:51:59,793-Speed 5985.80 samples/sec Loss 2.1356 LearningRate 0.0049 Epoch: 17 Global Step: 186650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 
08:52:22,706-Speed 1787.77 samples/sec Loss 2.1843 LearningRate 0.0049 Epoch: 18 Global Step: 186660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:52:29,533-Speed 6004.30 samples/sec Loss 2.1161 LearningRate 0.0049 Epoch: 18 Global Step: 186670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:52:36,386-Speed 5978.24 samples/sec Loss 2.1415 LearningRate 0.0049 Epoch: 18 Global Step: 186680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:52:43,226-Speed 5989.56 samples/sec Loss 2.1449 LearningRate 0.0049 Epoch: 18 Global Step: 186690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:52:50,079-Speed 5977.90 samples/sec Loss 2.1549 LearningRate 0.0049 Epoch: 18 Global Step: 186700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:52:56,918-Speed 5990.56 samples/sec Loss 2.1328 LearningRate 0.0049 Epoch: 18 Global Step: 186710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:53:03,780-Speed 5973.50 samples/sec Loss 2.1084 LearningRate 0.0049 Epoch: 18 Global Step: 186720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:53:10,630-Speed 5980.25 samples/sec Loss 2.1204 LearningRate 0.0049 Epoch: 18 Global Step: 186730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:53:17,490-Speed 5971.57 samples/sec Loss 2.1409 LearningRate 0.0049 Epoch: 18 Global Step: 186740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:53:24,375-Speed 5950.57 samples/sec Loss 2.1152 LearningRate 0.0049 Epoch: 18 Global Step: 186750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:53:31,223-Speed 5984.43 samples/sec Loss 2.0949 LearningRate 0.0049 Epoch: 18 Global Step: 186760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:53:38,101-Speed 5956.64 samples/sec Loss 2.0984 LearningRate 0.0049 Epoch: 18 Global Step: 186770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:53:44,950-Speed 5981.46 samples/sec Loss 
2.1274 LearningRate 0.0049 Epoch: 18 Global Step: 186780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:53:51,820-Speed 5963.21 samples/sec Loss 2.1257 LearningRate 0.0049 Epoch: 18 Global Step: 186790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:53:58,665-Speed 5984.84 samples/sec Loss 2.1213 LearningRate 0.0049 Epoch: 18 Global Step: 186800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:54:05,530-Speed 5968.07 samples/sec Loss 2.1228 LearningRate 0.0049 Epoch: 18 Global Step: 186810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:54:12,402-Speed 5962.14 samples/sec Loss 2.1398 LearningRate 0.0049 Epoch: 18 Global Step: 186820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:54:19,255-Speed 5977.65 samples/sec Loss 2.1033 LearningRate 0.0048 Epoch: 18 Global Step: 186830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:54:26,107-Speed 5983.78 samples/sec Loss 2.1368 LearningRate 0.0048 Epoch: 18 Global Step: 186840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:54:32,966-Speed 5972.66 samples/sec Loss 2.0972 LearningRate 0.0048 Epoch: 18 Global Step: 186850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:54:39,812-Speed 5983.47 samples/sec Loss 2.0914 LearningRate 0.0048 Epoch: 18 Global Step: 186860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:54:46,713-Speed 5936.79 samples/sec Loss 2.0817 LearningRate 0.0048 Epoch: 18 Global Step: 186870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:54:53,569-Speed 5975.76 samples/sec Loss 2.1391 LearningRate 0.0048 Epoch: 18 Global Step: 186880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:55:00,431-Speed 5970.99 samples/sec Loss 2.1108 LearningRate 0.0048 Epoch: 18 Global Step: 186890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:55:07,292-Speed 5971.35 samples/sec Loss 2.0666 LearningRate 0.0048 Epoch: 18 Global 
Step: 186900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:55:14,235-Speed 5900.82 samples/sec Loss 2.1305 LearningRate 0.0048 Epoch: 18 Global Step: 186910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:55:21,076-Speed 5988.45 samples/sec Loss 2.1097 LearningRate 0.0048 Epoch: 18 Global Step: 186920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:55:27,961-Speed 5950.72 samples/sec Loss 2.0906 LearningRate 0.0048 Epoch: 18 Global Step: 186930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:55:34,807-Speed 5983.51 samples/sec Loss 2.0852 LearningRate 0.0048 Epoch: 18 Global Step: 186940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:55:41,674-Speed 5965.98 samples/sec Loss 2.1019 LearningRate 0.0048 Epoch: 18 Global Step: 186950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:55:48,643-Speed 5878.75 samples/sec Loss 2.1116 LearningRate 0.0048 Epoch: 18 Global Step: 186960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:55:55,510-Speed 5966.42 samples/sec Loss 2.1337 LearningRate 0.0048 Epoch: 18 Global Step: 186970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:56:02,358-Speed 5982.63 samples/sec Loss 2.1347 LearningRate 0.0048 Epoch: 18 Global Step: 186980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:56:09,231-Speed 5961.72 samples/sec Loss 2.1267 LearningRate 0.0048 Epoch: 18 Global Step: 186990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:56:16,080-Speed 5981.72 samples/sec Loss 2.1102 LearningRate 0.0048 Epoch: 18 Global Step: 187000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:56:22,925-Speed 5985.05 samples/sec Loss 2.1083 LearningRate 0.0048 Epoch: 18 Global Step: 187010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:56:29,772-Speed 5983.14 samples/sec Loss 2.1345 LearningRate 0.0048 Epoch: 18 Global Step: 187020 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-09 08:56:36,628-Speed 5976.98 samples/sec Loss 2.1445 LearningRate 0.0048 Epoch: 18 Global Step: 187030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:56:43,500-Speed 5960.57 samples/sec Loss 2.1106 LearningRate 0.0048 Epoch: 18 Global Step: 187040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:56:50,374-Speed 5960.16 samples/sec Loss 2.1102 LearningRate 0.0047 Epoch: 18 Global Step: 187050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:56:57,226-Speed 5980.32 samples/sec Loss 2.1019 LearningRate 0.0047 Epoch: 18 Global Step: 187060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:57:04,084-Speed 5973.18 samples/sec Loss 2.1322 LearningRate 0.0047 Epoch: 18 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:57:10,947-Speed 5969.93 samples/sec Loss 2.1011 LearningRate 0.0047 Epoch: 18 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:57:17,817-Speed 5963.87 samples/sec Loss 2.0967 LearningRate 0.0047 Epoch: 18 Global Step: 187090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:57:24,701-Speed 5950.45 samples/sec Loss 2.1250 LearningRate 0.0047 Epoch: 18 Global Step: 187100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:57:31,569-Speed 5965.70 samples/sec Loss 2.1281 LearningRate 0.0047 Epoch: 18 Global Step: 187110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:57:38,453-Speed 5951.71 samples/sec Loss 2.1270 LearningRate 0.0047 Epoch: 18 Global Step: 187120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:57:45,303-Speed 5980.00 samples/sec Loss 2.1158 LearningRate 0.0047 Epoch: 18 Global Step: 187130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:57:52,188-Speed 5950.74 samples/sec Loss 2.0958 LearningRate 0.0047 Epoch: 18 Global Step: 187140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 
08:57:59,041-Speed 5979.25 samples/sec Loss 2.1182 LearningRate 0.0047 Epoch: 18 Global Step: 187150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:58:05,888-Speed 5982.68 samples/sec Loss 2.1100 LearningRate 0.0047 Epoch: 18 Global Step: 187160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:58:12,745-Speed 5974.80 samples/sec Loss 2.1465 LearningRate 0.0047 Epoch: 18 Global Step: 187170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:58:19,598-Speed 5977.95 samples/sec Loss 2.0962 LearningRate 0.0047 Epoch: 18 Global Step: 187180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:58:26,440-Speed 5987.53 samples/sec Loss 2.0693 LearningRate 0.0047 Epoch: 18 Global Step: 187190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:58:33,325-Speed 5950.99 samples/sec Loss 2.1033 LearningRate 0.0047 Epoch: 18 Global Step: 187200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:58:40,206-Speed 5953.83 samples/sec Loss 2.1249 LearningRate 0.0047 Epoch: 18 Global Step: 187210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:58:47,068-Speed 5969.77 samples/sec Loss 2.1010 LearningRate 0.0047 Epoch: 18 Global Step: 187220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:58:53,956-Speed 5948.02 samples/sec Loss 2.0954 LearningRate 0.0047 Epoch: 18 Global Step: 187230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:59:00,836-Speed 5954.27 samples/sec Loss 2.0920 LearningRate 0.0047 Epoch: 18 Global Step: 187240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:59:07,700-Speed 5968.54 samples/sec Loss 2.0805 LearningRate 0.0047 Epoch: 18 Global Step: 187250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 08:59:14,562-Speed 5970.28 samples/sec Loss 2.0723 LearningRate 0.0046 Epoch: 18 Global Step: 187260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:59:21,410-Speed 5982.76 samples/sec Loss 
2.1209 LearningRate 0.0046 Epoch: 18 Global Step: 187270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:59:28,258-Speed 5982.31 samples/sec Loss 2.0879 LearningRate 0.0046 Epoch: 18 Global Step: 187280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:59:35,105-Speed 5982.51 samples/sec Loss 2.1088 LearningRate 0.0046 Epoch: 18 Global Step: 187290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:59:41,970-Speed 5967.87 samples/sec Loss 2.0798 LearningRate 0.0046 Epoch: 18 Global Step: 187300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:59:48,828-Speed 5974.03 samples/sec Loss 2.0930 LearningRate 0.0046 Epoch: 18 Global Step: 187310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 08:59:55,662-Speed 5994.89 samples/sec Loss 2.0962 LearningRate 0.0046 Epoch: 18 Global Step: 187320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:00:02,520-Speed 5973.82 samples/sec Loss 2.0859 LearningRate 0.0046 Epoch: 18 Global Step: 187330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:00:09,365-Speed 5984.70 samples/sec Loss 2.0672 LearningRate 0.0046 Epoch: 18 Global Step: 187340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:00:16,216-Speed 5979.71 samples/sec Loss 2.1086 LearningRate 0.0046 Epoch: 18 Global Step: 187350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:00:23,074-Speed 5974.62 samples/sec Loss 2.0853 LearningRate 0.0046 Epoch: 18 Global Step: 187360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:00:29,930-Speed 5975.00 samples/sec Loss 2.1237 LearningRate 0.0046 Epoch: 18 Global Step: 187370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:00:36,798-Speed 5965.04 samples/sec Loss 2.0891 LearningRate 0.0046 Epoch: 18 Global Step: 187380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:00:43,683-Speed 5950.59 samples/sec Loss 2.0886 LearningRate 0.0046 Epoch: 18 Global 
Step: 187390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:00:50,530-Speed 5983.06 samples/sec Loss 2.0870 LearningRate 0.0046 Epoch: 18 Global Step: 187400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:00:57,382-Speed 5978.99 samples/sec Loss 2.0746 LearningRate 0.0046 Epoch: 18 Global Step: 187410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:04,256-Speed 5960.62 samples/sec Loss 2.0999 LearningRate 0.0046 Epoch: 18 Global Step: 187420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:11,138-Speed 5952.50 samples/sec Loss 2.0847 LearningRate 0.0046 Epoch: 18 Global Step: 187430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:18,014-Speed 5958.53 samples/sec Loss 2.0789 LearningRate 0.0046 Epoch: 18 Global Step: 187440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:24,877-Speed 5969.57 samples/sec Loss 2.0705 LearningRate 0.0046 Epoch: 18 Global Step: 187450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:31,736-Speed 5973.16 samples/sec Loss 2.0757 LearningRate 0.0046 Epoch: 18 Global Step: 187460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:38,584-Speed 5982.74 samples/sec Loss 2.0898 LearningRate 0.0046 Epoch: 18 Global Step: 187470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:45,452-Speed 5965.31 samples/sec Loss 2.1020 LearningRate 0.0045 Epoch: 18 Global Step: 187480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:52,340-Speed 5947.69 samples/sec Loss 2.0809 LearningRate 0.0045 Epoch: 18 Global Step: 187490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 09:01:59,208-Speed 5965.05 samples/sec Loss 2.0843 LearningRate 0.0045 Epoch: 18 Global Step: 187500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:02:06,080-Speed 5962.06 samples/sec Loss 2.1155 LearningRate 0.0045 Epoch: 18 Global Step: 187510 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-09 09:02:12,961-Speed 5953.68 samples/sec Loss 2.0874 LearningRate 0.0045 Epoch: 18 Global Step: 187520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:02:19,817-Speed 5976.02 samples/sec Loss 2.1177 LearningRate 0.0045 Epoch: 18 Global Step: 187530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:02:26,678-Speed 5972.36 samples/sec Loss 2.0964 LearningRate 0.0045 Epoch: 18 Global Step: 187540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:02:33,532-Speed 5977.08 samples/sec Loss 2.0553 LearningRate 0.0045 Epoch: 18 Global Step: 187550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:02:40,387-Speed 5975.82 samples/sec Loss 2.0740 LearningRate 0.0045 Epoch: 18 Global Step: 187560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:02:47,255-Speed 5966.12 samples/sec Loss 2.0848 LearningRate 0.0045 Epoch: 18 Global Step: 187570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:02:54,111-Speed 5974.66 samples/sec Loss 2.1059 LearningRate 0.0045 Epoch: 18 Global Step: 187580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:03:00,974-Speed 5972.70 samples/sec Loss 2.0734 LearningRate 0.0045 Epoch: 18 Global Step: 187590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:03:07,821-Speed 5983.49 samples/sec Loss 2.0857 LearningRate 0.0045 Epoch: 18 Global Step: 187600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:03:14,668-Speed 5982.73 samples/sec Loss 2.0624 LearningRate 0.0045 Epoch: 18 Global Step: 187610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:03:21,528-Speed 5971.85 samples/sec Loss 2.0767 LearningRate 0.0045 Epoch: 18 Global Step: 187620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:03:28,368-Speed 5989.72 samples/sec Loss 2.0652 LearningRate 0.0045 Epoch: 18 Global Step: 187630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 
09:03:35,231-Speed 5968.36 samples/sec Loss 2.1186 LearningRate 0.0045 Epoch: 18 Global Step: 187640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:03:42,106-Speed 5963.22 samples/sec Loss 2.0600 LearningRate 0.0045 Epoch: 18 Global Step: 187650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:03:48,968-Speed 5970.34 samples/sec Loss 2.0595 LearningRate 0.0045 Epoch: 18 Global Step: 187660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:03:55,837-Speed 5964.15 samples/sec Loss 2.0903 LearningRate 0.0045 Epoch: 18 Global Step: 187670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:04:02,689-Speed 5979.31 samples/sec Loss 2.0744 LearningRate 0.0045 Epoch: 18 Global Step: 187680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:04:09,558-Speed 5967.07 samples/sec Loss 2.0497 LearningRate 0.0045 Epoch: 18 Global Step: 187690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:04:16,403-Speed 5985.38 samples/sec Loss 2.0766 LearningRate 0.0044 Epoch: 18 Global Step: 187700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:04:23,263-Speed 5971.93 samples/sec Loss 2.1049 LearningRate 0.0044 Epoch: 18 Global Step: 187710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:04:30,131-Speed 5966.25 samples/sec Loss 2.0892 LearningRate 0.0044 Epoch: 18 Global Step: 187720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:04:36,993-Speed 5970.20 samples/sec Loss 2.0658 LearningRate 0.0044 Epoch: 18 Global Step: 187730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:04:43,835-Speed 5988.23 samples/sec Loss 2.0753 LearningRate 0.0044 Epoch: 18 Global Step: 187740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:04:50,711-Speed 5957.60 samples/sec Loss 2.0684 LearningRate 0.0044 Epoch: 18 Global Step: 187750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:04:57,584-Speed 5960.64 samples/sec Loss 
2.0550 LearningRate 0.0044 Epoch: 18 Global Step: 187760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:05:04,464-Speed 5956.32 samples/sec Loss 2.0631 LearningRate 0.0044 Epoch: 18 Global Step: 187770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:05:11,316-Speed 5979.03 samples/sec Loss 2.0843 LearningRate 0.0044 Epoch: 18 Global Step: 187780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:05:18,171-Speed 5975.63 samples/sec Loss 2.0900 LearningRate 0.0044 Epoch: 18 Global Step: 187790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:05:25,045-Speed 5961.27 samples/sec Loss 2.0718 LearningRate 0.0044 Epoch: 18 Global Step: 187800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:05:31,902-Speed 5974.67 samples/sec Loss 2.0901 LearningRate 0.0044 Epoch: 18 Global Step: 187810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:05:38,767-Speed 5966.73 samples/sec Loss 2.0718 LearningRate 0.0044 Epoch: 18 Global Step: 187820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:05:45,638-Speed 5965.43 samples/sec Loss 2.0402 LearningRate 0.0044 Epoch: 18 Global Step: 187830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:05:52,536-Speed 5939.60 samples/sec Loss 2.0342 LearningRate 0.0044 Epoch: 18 Global Step: 187840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:05:59,455-Speed 5920.43 samples/sec Loss 2.0836 LearningRate 0.0044 Epoch: 18 Global Step: 187850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:06:06,311-Speed 5976.15 samples/sec Loss 2.0648 LearningRate 0.0044 Epoch: 18 Global Step: 187860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:06:13,201-Speed 5948.33 samples/sec Loss 2.0679 LearningRate 0.0044 Epoch: 18 Global Step: 187870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:06:20,053-Speed 5978.86 samples/sec Loss 2.0821 LearningRate 0.0044 Epoch: 18 Global 
Step: 187880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:06:26,910-Speed 5975.30 samples/sec Loss 2.0320 LearningRate 0.0044 Epoch: 18 Global Step: 187890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:06:33,764-Speed 5978.12 samples/sec Loss 2.0495 LearningRate 0.0044 Epoch: 18 Global Step: 187900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:06:40,639-Speed 5959.06 samples/sec Loss 2.0492 LearningRate 0.0044 Epoch: 18 Global Step: 187910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:06:47,528-Speed 5946.39 samples/sec Loss 2.0594 LearningRate 0.0043 Epoch: 18 Global Step: 187920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:06:54,383-Speed 5976.54 samples/sec Loss 2.0624 LearningRate 0.0043 Epoch: 18 Global Step: 187930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:07:01,234-Speed 5979.43 samples/sec Loss 2.0713 LearningRate 0.0043 Epoch: 18 Global Step: 187940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:07:08,098-Speed 5968.57 samples/sec Loss 2.0995 LearningRate 0.0043 Epoch: 18 Global Step: 187950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:07:14,958-Speed 5972.23 samples/sec Loss 2.0675 LearningRate 0.0043 Epoch: 18 Global Step: 187960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:07:21,819-Speed 5970.83 samples/sec Loss 2.0490 LearningRate 0.0043 Epoch: 18 Global Step: 187970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:07:28,683-Speed 5968.58 samples/sec Loss 2.0546 LearningRate 0.0043 Epoch: 18 Global Step: 187980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:07:35,540-Speed 5975.12 samples/sec Loss 2.0482 LearningRate 0.0043 Epoch: 18 Global Step: 187990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:07:42,387-Speed 5982.84 samples/sec Loss 2.0208 LearningRate 0.0043 Epoch: 18 Global Step: 188000 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-09 09:07:49,239-Speed 5979.53 samples/sec Loss 2.0618 LearningRate 0.0043 Epoch: 18 Global Step: 188010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:07:56,098-Speed 5972.24 samples/sec Loss 2.0335 LearningRate 0.0043 Epoch: 18 Global Step: 188020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:08:02,952-Speed 5977.37 samples/sec Loss 2.0621 LearningRate 0.0043 Epoch: 18 Global Step: 188030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:08:09,817-Speed 5967.99 samples/sec Loss 2.0718 LearningRate 0.0043 Epoch: 18 Global Step: 188040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:08:16,673-Speed 5976.15 samples/sec Loss 2.0624 LearningRate 0.0043 Epoch: 18 Global Step: 188050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:08:23,523-Speed 5980.91 samples/sec Loss 2.0262 LearningRate 0.0043 Epoch: 18 Global Step: 188060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:08:30,403-Speed 5954.71 samples/sec Loss 2.0763 LearningRate 0.0043 Epoch: 18 Global Step: 188070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:08:37,269-Speed 5967.02 samples/sec Loss 2.0498 LearningRate 0.0043 Epoch: 18 Global Step: 188080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:08:44,118-Speed 5981.03 samples/sec Loss 2.0422 LearningRate 0.0043 Epoch: 18 Global Step: 188090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:08:50,976-Speed 5975.64 samples/sec Loss 2.0638 LearningRate 0.0043 Epoch: 18 Global Step: 188100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:08:57,827-Speed 5979.49 samples/sec Loss 2.0379 LearningRate 0.0043 Epoch: 18 Global Step: 188110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:09:04,683-Speed 5975.30 samples/sec Loss 2.0390 LearningRate 0.0043 Epoch: 18 Global Step: 188120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 
09:09:11,558-Speed 5958.61 samples/sec Loss 2.0762 LearningRate 0.0043 Epoch: 18 Global Step: 188130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:09:18,419-Speed 5973.38 samples/sec Loss 2.0366 LearningRate 0.0043 Epoch: 18 Global Step: 188140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:09:25,270-Speed 5979.44 samples/sec Loss 2.0837 LearningRate 0.0042 Epoch: 18 Global Step: 188150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:09:32,129-Speed 5972.94 samples/sec Loss 2.0818 LearningRate 0.0042 Epoch: 18 Global Step: 188160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:09:38,983-Speed 5976.66 samples/sec Loss 2.0696 LearningRate 0.0042 Epoch: 18 Global Step: 188170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:09:45,825-Speed 5987.59 samples/sec Loss 2.0388 LearningRate 0.0042 Epoch: 18 Global Step: 188180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:09:52,716-Speed 5945.67 samples/sec Loss 2.0637 LearningRate 0.0042 Epoch: 18 Global Step: 188190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:09:59,588-Speed 5961.03 samples/sec Loss 2.1025 LearningRate 0.0042 Epoch: 18 Global Step: 188200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:10:06,436-Speed 5982.19 samples/sec Loss 2.0354 LearningRate 0.0042 Epoch: 18 Global Step: 188210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:10:13,299-Speed 5971.82 samples/sec Loss 2.0687 LearningRate 0.0042 Epoch: 18 Global Step: 188220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:10:20,157-Speed 5974.09 samples/sec Loss 2.0471 LearningRate 0.0042 Epoch: 18 Global Step: 188230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:10:27,022-Speed 5966.88 samples/sec Loss 2.0128 LearningRate 0.0042 Epoch: 18 Global Step: 188240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:10:33,884-Speed 5970.93 samples/sec Loss 
2.0808 LearningRate 0.0042 Epoch: 18 Global Step: 188250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:10:40,738-Speed 5977.67 samples/sec Loss 2.0143 LearningRate 0.0042 Epoch: 18 Global Step: 188260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:10:47,633-Speed 5943.54 samples/sec Loss 2.0182 LearningRate 0.0042 Epoch: 18 Global Step: 188270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:10:54,501-Speed 5965.55 samples/sec Loss 2.0656 LearningRate 0.0042 Epoch: 18 Global Step: 188280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:01,368-Speed 5968.04 samples/sec Loss 2.0253 LearningRate 0.0042 Epoch: 18 Global Step: 188290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:08,232-Speed 5968.95 samples/sec Loss 2.0583 LearningRate 0.0042 Epoch: 18 Global Step: 188300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:15,086-Speed 5977.43 samples/sec Loss 2.0224 LearningRate 0.0042 Epoch: 18 Global Step: 188310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:21,943-Speed 5974.85 samples/sec Loss 2.0474 LearningRate 0.0042 Epoch: 18 Global Step: 188320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:28,835-Speed 5944.12 samples/sec Loss 2.0567 LearningRate 0.0042 Epoch: 18 Global Step: 188330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:35,691-Speed 5975.58 samples/sec Loss 2.0394 LearningRate 0.0042 Epoch: 18 Global Step: 188340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:42,544-Speed 5977.96 samples/sec Loss 2.0329 LearningRate 0.0042 Epoch: 18 Global Step: 188350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:49,395-Speed 5979.10 samples/sec Loss 2.0263 LearningRate 0.0042 Epoch: 18 Global Step: 188360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:11:56,250-Speed 5977.09 samples/sec Loss 2.0418 LearningRate 0.0041 Epoch: 18 Global 
Step: 188370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:12:03,132-Speed 5952.98 samples/sec Loss 2.0296 LearningRate 0.0041 Epoch: 18 Global Step: 188380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:12:10,017-Speed 5950.18 samples/sec Loss 2.0362 LearningRate 0.0041 Epoch: 18 Global Step: 188390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:12:16,860-Speed 5986.28 samples/sec Loss 2.0307 LearningRate 0.0041 Epoch: 18 Global Step: 188400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:12:23,702-Speed 5988.08 samples/sec Loss 2.0520 LearningRate 0.0041 Epoch: 18 Global Step: 188410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:12:30,564-Speed 5970.40 samples/sec Loss 2.0571 LearningRate 0.0041 Epoch: 18 Global Step: 188420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:12:37,412-Speed 5982.44 samples/sec Loss 2.0312 LearningRate 0.0041 Epoch: 18 Global Step: 188430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:12:44,260-Speed 5982.04 samples/sec Loss 2.0469 LearningRate 0.0041 Epoch: 18 Global Step: 188440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:12:51,098-Speed 5990.81 samples/sec Loss 2.0822 LearningRate 0.0041 Epoch: 18 Global Step: 188450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:12:57,964-Speed 5967.52 samples/sec Loss 2.0137 LearningRate 0.0041 Epoch: 18 Global Step: 188460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:13:04,816-Speed 5978.96 samples/sec Loss 2.0224 LearningRate 0.0041 Epoch: 18 Global Step: 188470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:13:11,689-Speed 5960.27 samples/sec Loss 2.0337 LearningRate 0.0041 Epoch: 18 Global Step: 188480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:13:18,543-Speed 5977.36 samples/sec Loss 2.0497 LearningRate 0.0041 Epoch: 18 Global Step: 188490 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-09 09:13:25,390-Speed 5983.52 samples/sec Loss 2.0327 LearningRate 0.0041 Epoch: 18 Global Step: 188500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:13:32,247-Speed 5975.34 samples/sec Loss 2.0285 LearningRate 0.0041 Epoch: 18 Global Step: 188510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:13:39,107-Speed 5971.20 samples/sec Loss 1.9796 LearningRate 0.0041 Epoch: 18 Global Step: 188520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:13:45,959-Speed 5979.36 samples/sec Loss 2.0538 LearningRate 0.0041 Epoch: 18 Global Step: 188530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:13:52,797-Speed 5990.73 samples/sec Loss 1.9738 LearningRate 0.0041 Epoch: 18 Global Step: 188540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:13:59,639-Speed 5987.61 samples/sec Loss 2.0135 LearningRate 0.0041 Epoch: 18 Global Step: 188550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:14:06,494-Speed 5976.32 samples/sec Loss 2.0449 LearningRate 0.0041 Epoch: 18 Global Step: 188560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:14:13,359-Speed 5968.05 samples/sec Loss 2.0339 LearningRate 0.0041 Epoch: 18 Global Step: 188570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:14:20,211-Speed 5979.22 samples/sec Loss 2.0419 LearningRate 0.0041 Epoch: 18 Global Step: 188580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:14:27,083-Speed 5961.40 samples/sec Loss 2.0381 LearningRate 0.0041 Epoch: 18 Global Step: 188590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:14:34,066-Speed 5867.09 samples/sec Loss 2.0395 LearningRate 0.0040 Epoch: 18 Global Step: 188600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:14:41,061-Speed 5856.82 samples/sec Loss 2.0060 LearningRate 0.0040 Epoch: 18 Global Step: 188610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 
09:14:47,927-Speed 5969.08 samples/sec Loss 2.0062 LearningRate 0.0040 Epoch: 18 Global Step: 188620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:14:54,778-Speed 5979.68 samples/sec Loss 2.0433 LearningRate 0.0040 Epoch: 18 Global Step: 188630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:15:01,635-Speed 5974.82 samples/sec Loss 2.0080 LearningRate 0.0040 Epoch: 18 Global Step: 188640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:15:08,482-Speed 5983.58 samples/sec Loss 2.0128 LearningRate 0.0040 Epoch: 18 Global Step: 188650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:15:15,326-Speed 5986.02 samples/sec Loss 2.0228 LearningRate 0.0040 Epoch: 18 Global Step: 188660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:15:22,168-Speed 5987.15 samples/sec Loss 2.0175 LearningRate 0.0040 Epoch: 18 Global Step: 188670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:15:29,021-Speed 5978.91 samples/sec Loss 2.0495 LearningRate 0.0040 Epoch: 18 Global Step: 188680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:15:35,884-Speed 5972.62 samples/sec Loss 2.0650 LearningRate 0.0040 Epoch: 18 Global Step: 188690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:15:42,720-Speed 5992.09 samples/sec Loss 2.0059 LearningRate 0.0040 Epoch: 18 Global Step: 188700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:15:49,571-Speed 5980.08 samples/sec Loss 2.0371 LearningRate 0.0040 Epoch: 18 Global Step: 188710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:15:56,437-Speed 5967.24 samples/sec Loss 2.0200 LearningRate 0.0040 Epoch: 18 Global Step: 188720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:03,301-Speed 5968.36 samples/sec Loss 2.0332 LearningRate 0.0040 Epoch: 18 Global Step: 188730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:10,156-Speed 5976.93 samples/sec Loss 
2.0357 LearningRate 0.0040 Epoch: 18 Global Step: 188740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:17,050-Speed 5941.78 samples/sec Loss 1.9803 LearningRate 0.0040 Epoch: 18 Global Step: 188750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:23,899-Speed 5982.18 samples/sec Loss 2.0652 LearningRate 0.0040 Epoch: 18 Global Step: 188760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:30,882-Speed 5867.64 samples/sec Loss 1.9857 LearningRate 0.0040 Epoch: 18 Global Step: 188770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:37,741-Speed 5973.02 samples/sec Loss 1.9977 LearningRate 0.0040 Epoch: 18 Global Step: 188780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:44,601-Speed 5972.32 samples/sec Loss 1.9984 LearningRate 0.0040 Epoch: 18 Global Step: 188790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:51,444-Speed 5986.34 samples/sec Loss 2.0131 LearningRate 0.0040 Epoch: 18 Global Step: 188800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:16:58,322-Speed 5956.51 samples/sec Loss 2.0410 LearningRate 0.0040 Epoch: 18 Global Step: 188810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:17:05,185-Speed 5969.90 samples/sec Loss 2.0081 LearningRate 0.0040 Epoch: 18 Global Step: 188820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:17:12,077-Speed 5947.80 samples/sec Loss 2.0199 LearningRate 0.0040 Epoch: 18 Global Step: 188830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:17:18,948-Speed 5962.80 samples/sec Loss 2.0154 LearningRate 0.0039 Epoch: 18 Global Step: 188840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:17:25,796-Speed 5982.12 samples/sec Loss 2.0322 LearningRate 0.0039 Epoch: 18 Global Step: 188850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:17:32,643-Speed 5984.10 samples/sec Loss 2.0438 LearningRate 0.0039 Epoch: 18 Global 
Step: 188860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:17:39,612-Speed 5878.45 samples/sec Loss 1.9911 LearningRate 0.0039 Epoch: 18 Global Step: 188870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:17:46,456-Speed 5986.01 samples/sec Loss 1.9969 LearningRate 0.0039 Epoch: 18 Global Step: 188880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:17:53,307-Speed 5980.58 samples/sec Loss 2.0100 LearningRate 0.0039 Epoch: 18 Global Step: 188890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:00,159-Speed 5978.90 samples/sec Loss 2.0339 LearningRate 0.0039 Epoch: 18 Global Step: 188900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:07,037-Speed 5956.28 samples/sec Loss 2.0016 LearningRate 0.0039 Epoch: 18 Global Step: 188910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:13,905-Speed 5965.79 samples/sec Loss 2.0101 LearningRate 0.0039 Epoch: 18 Global Step: 188920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:20,780-Speed 5958.12 samples/sec Loss 1.9567 LearningRate 0.0039 Epoch: 18 Global Step: 188930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:27,617-Speed 5992.31 samples/sec Loss 2.0113 LearningRate 0.0039 Epoch: 18 Global Step: 188940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:34,462-Speed 5984.95 samples/sec Loss 1.9730 LearningRate 0.0039 Epoch: 18 Global Step: 188950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:41,302-Speed 5989.40 samples/sec Loss 2.0109 LearningRate 0.0039 Epoch: 18 Global Step: 188960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:48,158-Speed 5975.39 samples/sec Loss 2.0014 LearningRate 0.0039 Epoch: 18 Global Step: 188970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:18:55,003-Speed 5984.88 samples/sec Loss 2.0395 LearningRate 0.0039 Epoch: 18 Global Step: 188980 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-09 09:19:01,845-Speed 5987.88 samples/sec Loss 1.9899 LearningRate 0.0039 Epoch: 18 Global Step: 188990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:19:08,703-Speed 5973.54 samples/sec Loss 1.9969 LearningRate 0.0039 Epoch: 18 Global Step: 189000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:19:15,564-Speed 5971.95 samples/sec Loss 2.0235 LearningRate 0.0039 Epoch: 18 Global Step: 189010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:19:22,434-Speed 5963.10 samples/sec Loss 1.9943 LearningRate 0.0039 Epoch: 18 Global Step: 189020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:19:29,296-Speed 5970.67 samples/sec Loss 1.9749 LearningRate 0.0039 Epoch: 18 Global Step: 189030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:19:36,254-Speed 5888.23 samples/sec Loss 2.0145 LearningRate 0.0039 Epoch: 18 Global Step: 189040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:19:43,207-Speed 5891.64 samples/sec Loss 2.0147 LearningRate 0.0039 Epoch: 18 Global Step: 189050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:19:50,060-Speed 5978.98 samples/sec Loss 2.0011 LearningRate 0.0039 Epoch: 18 Global Step: 189060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:19:57,015-Speed 5890.26 samples/sec Loss 2.0055 LearningRate 0.0038 Epoch: 18 Global Step: 189070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:20:03,860-Speed 5985.30 samples/sec Loss 1.9613 LearningRate 0.0038 Epoch: 18 Global Step: 189080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:20:10,779-Speed 5921.97 samples/sec Loss 2.0402 LearningRate 0.0038 Epoch: 18 Global Step: 189090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:20:17,687-Speed 5930.10 samples/sec Loss 2.0130 LearningRate 0.0038 Epoch: 18 Global Step: 189100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 
09:20:24,550-Speed 5969.27 samples/sec Loss 1.9977 LearningRate 0.0038 Epoch: 18 Global Step: 189110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:20:31,425-Speed 5959.08 samples/sec Loss 1.9694 LearningRate 0.0038 Epoch: 18 Global Step: 189120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:20:38,333-Speed 5930.68 samples/sec Loss 1.9787 LearningRate 0.0038 Epoch: 18 Global Step: 189130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:20:45,182-Speed 5981.08 samples/sec Loss 1.9834 LearningRate 0.0038 Epoch: 18 Global Step: 189140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:20:52,032-Speed 5981.06 samples/sec Loss 1.9982 LearningRate 0.0038 Epoch: 18 Global Step: 189150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:20:58,892-Speed 5972.06 samples/sec Loss 1.9710 LearningRate 0.0038 Epoch: 18 Global Step: 189160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:21:05,736-Speed 5985.24 samples/sec Loss 2.0373 LearningRate 0.0038 Epoch: 18 Global Step: 189170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:21:12,580-Speed 5986.36 samples/sec Loss 2.0077 LearningRate 0.0038 Epoch: 18 Global Step: 189180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:21:19,450-Speed 5963.72 samples/sec Loss 1.9863 LearningRate 0.0038 Epoch: 18 Global Step: 189190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:21:26,320-Speed 5963.27 samples/sec Loss 1.9699 LearningRate 0.0038 Epoch: 18 Global Step: 189200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:21:33,186-Speed 5967.13 samples/sec Loss 1.9926 LearningRate 0.0038 Epoch: 18 Global Step: 189210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:21:40,055-Speed 5964.42 samples/sec Loss 1.9975 LearningRate 0.0038 Epoch: 18 Global Step: 189220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:21:46,926-Speed 5962.11 samples/sec Loss 
1.9817 LearningRate 0.0038 Epoch: 18 Global Step: 189230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:21:53,766-Speed 5989.45 samples/sec Loss 2.0023 LearningRate 0.0038 Epoch: 18 Global Step: 189240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:22:00,623-Speed 5974.67 samples/sec Loss 2.0084 LearningRate 0.0038 Epoch: 18 Global Step: 189250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:22:07,492-Speed 5964.04 samples/sec Loss 2.0110 LearningRate 0.0038 Epoch: 18 Global Step: 189260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:22:14,344-Speed 5979.64 samples/sec Loss 1.9733 LearningRate 0.0038 Epoch: 18 Global Step: 189270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:22:21,280-Speed 5906.91 samples/sec Loss 2.0040 LearningRate 0.0038 Epoch: 18 Global Step: 189280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:22:28,210-Speed 5911.53 samples/sec Loss 2.0092 LearningRate 0.0038 Epoch: 18 Global Step: 189290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:22:35,101-Speed 5944.91 samples/sec Loss 2.0304 LearningRate 0.0038 Epoch: 18 Global Step: 189300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:22:41,990-Speed 5947.66 samples/sec Loss 2.0128 LearningRate 0.0037 Epoch: 18 Global Step: 189310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:22:48,848-Speed 5973.71 samples/sec Loss 1.9832 LearningRate 0.0037 Epoch: 18 Global Step: 189320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:22:55,715-Speed 5965.87 samples/sec Loss 1.9713 LearningRate 0.0037 Epoch: 18 Global Step: 189330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:02,586-Speed 5963.39 samples/sec Loss 1.9664 LearningRate 0.0037 Epoch: 18 Global Step: 189340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:09,447-Speed 5970.40 samples/sec Loss 2.0095 LearningRate 0.0037 Epoch: 18 Global 
Step: 189350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:16,324-Speed 5957.45 samples/sec Loss 1.9990 LearningRate 0.0037 Epoch: 18 Global Step: 189360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:23,203-Speed 5955.76 samples/sec Loss 2.0145 LearningRate 0.0037 Epoch: 18 Global Step: 189370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:30,071-Speed 5965.37 samples/sec Loss 1.9672 LearningRate 0.0037 Epoch: 18 Global Step: 189380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:36,933-Speed 5970.70 samples/sec Loss 1.9688 LearningRate 0.0037 Epoch: 18 Global Step: 189390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:43,851-Speed 5921.56 samples/sec Loss 2.0016 LearningRate 0.0037 Epoch: 18 Global Step: 189400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:50,715-Speed 5968.74 samples/sec Loss 1.9655 LearningRate 0.0037 Epoch: 18 Global Step: 189410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:23:57,563-Speed 5982.82 samples/sec Loss 1.9566 LearningRate 0.0037 Epoch: 18 Global Step: 189420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:24:04,442-Speed 5956.50 samples/sec Loss 2.0069 LearningRate 0.0037 Epoch: 18 Global Step: 189430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:24:11,281-Speed 5989.63 samples/sec Loss 1.9968 LearningRate 0.0037 Epoch: 18 Global Step: 189440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:24:18,131-Speed 5981.08 samples/sec Loss 1.9652 LearningRate 0.0037 Epoch: 18 Global Step: 189450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:24:24,985-Speed 5976.78 samples/sec Loss 1.9739 LearningRate 0.0037 Epoch: 18 Global Step: 189460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:24:31,842-Speed 5974.53 samples/sec Loss 1.9902 LearningRate 0.0037 Epoch: 18 Global Step: 189470 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-09 09:24:38,699-Speed 5976.41 samples/sec Loss 1.9794 LearningRate 0.0037 Epoch: 18 Global Step: 189480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 09:24:45,550-Speed 5980.14 samples/sec Loss 1.9993 LearningRate 0.0037 Epoch: 18 Global Step: 189490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:24:52,437-Speed 5947.81 samples/sec Loss 1.9944 LearningRate 0.0037 Epoch: 18 Global Step: 189500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:24:59,329-Speed 5944.72 samples/sec Loss 2.0138 LearningRate 0.0037 Epoch: 18 Global Step: 189510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:25:06,184-Speed 5976.99 samples/sec Loss 1.9568 LearningRate 0.0037 Epoch: 18 Global Step: 189520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 09:25:13,068-Speed 5951.19 samples/sec Loss 1.9737 LearningRate 0.0037 Epoch: 18 Global Step: 189530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:25:19,978-Speed 5928.24 samples/sec Loss 1.9177 LearningRate 0.0037 Epoch: 18 Global Step: 189540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:25:26,825-Speed 5984.67 samples/sec Loss 1.9455 LearningRate 0.0037 Epoch: 18 Global Step: 189550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:25:33,685-Speed 5971.87 samples/sec Loss 1.9859 LearningRate 0.0036 Epoch: 18 Global Step: 189560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:25:40,554-Speed 5964.08 samples/sec Loss 1.9836 LearningRate 0.0036 Epoch: 18 Global Step: 189570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:25:47,426-Speed 5961.97 samples/sec Loss 1.9753 LearningRate 0.0036 Epoch: 18 Global Step: 189580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:25:54,300-Speed 5959.99 samples/sec Loss 1.9811 LearningRate 0.0036 Epoch: 18 Global Step: 189590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 
09:26:01,154-Speed 5979.19 samples/sec Loss 1.9954 LearningRate 0.0036 Epoch: 18 Global Step: 189600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:26:08,116-Speed 5883.96 samples/sec Loss 2.0175 LearningRate 0.0036 Epoch: 18 Global Step: 189610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:26:14,969-Speed 5978.36 samples/sec Loss 1.9725 LearningRate 0.0036 Epoch: 18 Global Step: 189620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:26:21,849-Speed 5956.70 samples/sec Loss 1.9873 LearningRate 0.0036 Epoch: 18 Global Step: 189630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:26:28,714-Speed 5967.60 samples/sec Loss 1.9969 LearningRate 0.0036 Epoch: 18 Global Step: 189640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:26:35,560-Speed 5984.07 samples/sec Loss 1.9701 LearningRate 0.0036 Epoch: 18 Global Step: 189650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:26:42,412-Speed 5978.99 samples/sec Loss 1.9706 LearningRate 0.0036 Epoch: 18 Global Step: 189660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:26:49,253-Speed 5988.65 samples/sec Loss 1.9818 LearningRate 0.0036 Epoch: 18 Global Step: 189670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:26:56,103-Speed 5980.25 samples/sec Loss 1.9897 LearningRate 0.0036 Epoch: 18 Global Step: 189680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:27:03,019-Speed 5923.69 samples/sec Loss 1.9907 LearningRate 0.0036 Epoch: 18 Global Step: 189690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:27:09,873-Speed 5977.74 samples/sec Loss 1.9765 LearningRate 0.0036 Epoch: 18 Global Step: 189700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:27:16,716-Speed 5986.23 samples/sec Loss 1.9443 LearningRate 0.0036 Epoch: 18 Global Step: 189710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:27:23,576-Speed 5971.85 samples/sec Loss 
1.9864 LearningRate 0.0036 Epoch: 18 Global Step: 189720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:27:30,428-Speed 5978.96 samples/sec Loss 2.0171 LearningRate 0.0036 Epoch: 18 Global Step: 189730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:27:37,296-Speed 5964.79 samples/sec Loss 1.9772 LearningRate 0.0036 Epoch: 18 Global Step: 189740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:27:44,160-Speed 5969.45 samples/sec Loss 1.9294 LearningRate 0.0036 Epoch: 18 Global Step: 189750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:27:51,064-Speed 5934.40 samples/sec Loss 1.9586 LearningRate 0.0036 Epoch: 18 Global Step: 189760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:27:57,922-Speed 5973.81 samples/sec Loss 1.9550 LearningRate 0.0036 Epoch: 18 Global Step: 189770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:28:04,785-Speed 5968.57 samples/sec Loss 1.9914 LearningRate 0.0036 Epoch: 18 Global Step: 189780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:28:11,646-Speed 5971.58 samples/sec Loss 1.9438 LearningRate 0.0036 Epoch: 18 Global Step: 189790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:28:18,550-Speed 5933.97 samples/sec Loss 1.9561 LearningRate 0.0035 Epoch: 18 Global Step: 189800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:28:25,398-Speed 5984.66 samples/sec Loss 1.9634 LearningRate 0.0035 Epoch: 18 Global Step: 189810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:28:32,238-Speed 5989.55 samples/sec Loss 1.9988 LearningRate 0.0035 Epoch: 18 Global Step: 189820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:28:39,093-Speed 5975.51 samples/sec Loss 1.9925 LearningRate 0.0035 Epoch: 18 Global Step: 189830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:28:45,946-Speed 5978.17 samples/sec Loss 1.9630 LearningRate 0.0035 Epoch: 18 Global 
Step: 189840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:28:52,802-Speed 5976.42 samples/sec Loss 1.9604 LearningRate 0.0035 Epoch: 18 Global Step: 189850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:28:59,762-Speed 5886.20 samples/sec Loss 1.9227 LearningRate 0.0035 Epoch: 18 Global Step: 189860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:29:06,623-Speed 5971.34 samples/sec Loss 1.9347 LearningRate 0.0035 Epoch: 18 Global Step: 189870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:29:13,473-Speed 5983.17 samples/sec Loss 1.9430 LearningRate 0.0035 Epoch: 18 Global Step: 189880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:29:20,326-Speed 5977.92 samples/sec Loss 1.9406 LearningRate 0.0035 Epoch: 18 Global Step: 189890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:29:27,184-Speed 5973.79 samples/sec Loss 1.9470 LearningRate 0.0035 Epoch: 18 Global Step: 189900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:29:34,061-Speed 5957.45 samples/sec Loss 1.9672 LearningRate 0.0035 Epoch: 18 Global Step: 189910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:29:40,929-Speed 5965.68 samples/sec Loss 1.9428 LearningRate 0.0035 Epoch: 18 Global Step: 189920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:29:47,798-Speed 5964.02 samples/sec Loss 1.9921 LearningRate 0.0035 Epoch: 18 Global Step: 189930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:29:54,682-Speed 5951.55 samples/sec Loss 1.9582 LearningRate 0.0035 Epoch: 18 Global Step: 189940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:30:01,533-Speed 5980.17 samples/sec Loss 1.9427 LearningRate 0.0035 Epoch: 18 Global Step: 189950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:30:08,371-Speed 5990.89 samples/sec Loss 1.9601 LearningRate 0.0035 Epoch: 18 Global Step: 189960 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 09:30:15,240-Speed 5964.30 samples/sec Loss 1.9627 LearningRate 0.0035 Epoch: 18 Global Step: 189970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:30:22,091-Speed 5979.22 samples/sec Loss 1.9134 LearningRate 0.0035 Epoch: 18 Global Step: 189980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:30:28,959-Speed 5968.04 samples/sec Loss 1.9835 LearningRate 0.0035 Epoch: 18 Global Step: 189990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:30:35,841-Speed 5953.42 samples/sec Loss 1.9990 LearningRate 0.0035 Epoch: 18 Global Step: 190000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:31:02,610-[lfw][190000]XNorm: 23.655104 Training: 2022-01-09 09:31:02,611-[lfw][190000]Accuracy-Flip: 0.99767+-0.00271 Training: 2022-01-09 09:31:02,611-[lfw][190000]Accuracy-Highest: 0.99833 Training: 2022-01-09 09:31:33,605-[cfp_fp][190000]XNorm: 21.504125 Training: 2022-01-09 09:31:33,606-[cfp_fp][190000]Accuracy-Flip: 0.99286+-0.00389 Training: 2022-01-09 09:31:33,607-[cfp_fp][190000]Accuracy-Highest: 0.99286 Training: 2022-01-09 09:32:00,336-[agedb_30][190000]XNorm: 23.125094 Training: 2022-01-09 09:32:00,337-[agedb_30][190000]Accuracy-Flip: 0.98117+-0.00606 Training: 2022-01-09 09:32:00,337-[agedb_30][190000]Accuracy-Highest: 0.98200 Training: 2022-01-09 09:32:07,190-Speed 448.40 samples/sec Loss 1.9535 LearningRate 0.0035 Epoch: 18 Global Step: 190010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:32:14,024-Speed 5995.09 samples/sec Loss 1.9457 LearningRate 0.0035 Epoch: 18 Global Step: 190020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:32:20,866-Speed 5987.77 samples/sec Loss 1.9517 LearningRate 0.0035 Epoch: 18 Global Step: 190030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:32:27,710-Speed 5986.20 samples/sec Loss 1.9016 LearningRate 0.0035 Epoch: 18 Global Step: 190040 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-01-09 09:32:34,572-Speed 5969.69 samples/sec Loss 1.9386 LearningRate 0.0034 Epoch: 18 Global Step: 190050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:32:41,423-Speed 5979.55 samples/sec Loss 1.9291 LearningRate 0.0034 Epoch: 18 Global Step: 190060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:32:48,273-Speed 5983.23 samples/sec Loss 1.9685 LearningRate 0.0034 Epoch: 18 Global Step: 190070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:32:55,119-Speed 5984.66 samples/sec Loss 1.9775 LearningRate 0.0034 Epoch: 18 Global Step: 190080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:01,968-Speed 5981.46 samples/sec Loss 1.9759 LearningRate 0.0034 Epoch: 18 Global Step: 190090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:08,874-Speed 5932.62 samples/sec Loss 1.9617 LearningRate 0.0034 Epoch: 18 Global Step: 190100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:15,725-Speed 5979.84 samples/sec Loss 1.9287 LearningRate 0.0034 Epoch: 18 Global Step: 190110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:22,582-Speed 5974.74 samples/sec Loss 1.9386 LearningRate 0.0034 Epoch: 18 Global Step: 190120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:29,427-Speed 5985.77 samples/sec Loss 1.9605 LearningRate 0.0034 Epoch: 18 Global Step: 190130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:36,272-Speed 5984.95 samples/sec Loss 1.9512 LearningRate 0.0034 Epoch: 18 Global Step: 190140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:43,165-Speed 5944.16 samples/sec Loss 1.9365 LearningRate 0.0034 Epoch: 18 Global Step: 190150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:50,069-Speed 5937.06 samples/sec Loss 1.9457 LearningRate 0.0034 Epoch: 18 Global Step: 190160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:33:56,994-Speed 
5915.91 samples/sec Loss 1.9530 LearningRate 0.0034 Epoch: 18 Global Step: 190170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:34:03,887-Speed 5943.69 samples/sec Loss 1.9263 LearningRate 0.0034 Epoch: 18 Global Step: 190180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:34:10,769-Speed 5952.60 samples/sec Loss 1.9444 LearningRate 0.0034 Epoch: 18 Global Step: 190190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:34:17,623-Speed 5977.43 samples/sec Loss 1.9505 LearningRate 0.0034 Epoch: 18 Global Step: 190200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:34:24,480-Speed 5974.39 samples/sec Loss 1.9474 LearningRate 0.0034 Epoch: 18 Global Step: 190210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:34:31,341-Speed 5971.37 samples/sec Loss 1.9513 LearningRate 0.0034 Epoch: 18 Global Step: 190220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:34:38,194-Speed 5977.48 samples/sec Loss 1.9403 LearningRate 0.0034 Epoch: 18 Global Step: 190230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:34:45,047-Speed 5977.74 samples/sec Loss 1.9531 LearningRate 0.0034 Epoch: 18 Global Step: 190240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:34:51,891-Speed 5986.07 samples/sec Loss 1.9578 LearningRate 0.0034 Epoch: 18 Global Step: 190250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:34:58,768-Speed 5957.79 samples/sec Loss 1.9612 LearningRate 0.0034 Epoch: 18 Global Step: 190260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:35:05,617-Speed 5981.76 samples/sec Loss 1.9231 LearningRate 0.0034 Epoch: 18 Global Step: 190270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:35:12,480-Speed 5969.54 samples/sec Loss 1.9556 LearningRate 0.0034 Epoch: 18 Global Step: 190280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:35:19,325-Speed 5984.47 samples/sec Loss 1.9493 
LearningRate 0.0034 Epoch: 18 Global Step: 190290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:35:26,183-Speed 5974.07 samples/sec Loss 1.9546 LearningRate 0.0033 Epoch: 18 Global Step: 190300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:35:33,042-Speed 5973.41 samples/sec Loss 1.9435 LearningRate 0.0033 Epoch: 18 Global Step: 190310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:35:39,903-Speed 5971.14 samples/sec Loss 1.9440 LearningRate 0.0033 Epoch: 18 Global Step: 190320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:35:46,751-Speed 5982.51 samples/sec Loss 1.9309 LearningRate 0.0033 Epoch: 18 Global Step: 190330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:35:53,612-Speed 5971.69 samples/sec Loss 1.9351 LearningRate 0.0033 Epoch: 18 Global Step: 190340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:36:00,471-Speed 5971.80 samples/sec Loss 1.9087 LearningRate 0.0033 Epoch: 18 Global Step: 190350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:36:07,369-Speed 5942.06 samples/sec Loss 1.9295 LearningRate 0.0033 Epoch: 18 Global Step: 190360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:36:14,247-Speed 5956.30 samples/sec Loss 1.9299 LearningRate 0.0033 Epoch: 18 Global Step: 190370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:36:21,093-Speed 5983.82 samples/sec Loss 1.9182 LearningRate 0.0033 Epoch: 18 Global Step: 190380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:36:27,931-Speed 5991.88 samples/sec Loss 1.9402 LearningRate 0.0033 Epoch: 18 Global Step: 190390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:36:34,800-Speed 5964.30 samples/sec Loss 1.9342 LearningRate 0.0033 Epoch: 18 Global Step: 190400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:36:41,682-Speed 5952.42 samples/sec Loss 1.9694 LearningRate 0.0033 Epoch: 18 Global Step: 
190410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:36:54,053-Speed 3311.90 samples/sec Loss 1.9326 LearningRate 0.0033 Epoch: 18 Global Step: 190420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-09 09:37:00,889-Speed 5993.26 samples/sec Loss 1.9396 LearningRate 0.0033 Epoch: 18 Global Step: 190430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:37:07,739-Speed 5980.70 samples/sec Loss 1.8962 LearningRate 0.0033 Epoch: 18 Global Step: 190440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:37:14,613-Speed 5959.58 samples/sec Loss 1.9028 LearningRate 0.0033 Epoch: 18 Global Step: 190450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:37:21,462-Speed 5981.89 samples/sec Loss 1.9093 LearningRate 0.0033 Epoch: 18 Global Step: 190460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:37:28,322-Speed 5971.68 samples/sec Loss 1.9351 LearningRate 0.0033 Epoch: 18 Global Step: 190470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:37:35,178-Speed 5975.88 samples/sec Loss 1.9521 LearningRate 0.0033 Epoch: 18 Global Step: 190480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:37:42,025-Speed 5983.27 samples/sec Loss 1.9738 LearningRate 0.0033 Epoch: 18 Global Step: 190490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:37:48,883-Speed 5973.18 samples/sec Loss 1.9393 LearningRate 0.0033 Epoch: 18 Global Step: 190500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:37:55,766-Speed 5953.06 samples/sec Loss 1.9068 LearningRate 0.0033 Epoch: 18 Global Step: 190510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:38:02,630-Speed 5968.78 samples/sec Loss 1.9454 LearningRate 0.0033 Epoch: 18 Global Step: 190520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:38:09,477-Speed 5983.11 samples/sec Loss 1.9344 LearningRate 0.0033 Epoch: 18 Global Step: 190530 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-01-09 09:38:16,329-Speed 5979.48 samples/sec Loss 1.9478 LearningRate 0.0033 Epoch: 18 Global Step: 190540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:38:23,194-Speed 5967.77 samples/sec Loss 1.9243 LearningRate 0.0033 Epoch: 18 Global Step: 190550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:38:30,049-Speed 5976.33 samples/sec Loss 1.9204 LearningRate 0.0032 Epoch: 18 Global Step: 190560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:38:36,922-Speed 5961.47 samples/sec Loss 1.9399 LearningRate 0.0032 Epoch: 18 Global Step: 190570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:38:43,791-Speed 5964.12 samples/sec Loss 1.9160 LearningRate 0.0032 Epoch: 18 Global Step: 190580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:38:50,655-Speed 5968.85 samples/sec Loss 1.9478 LearningRate 0.0032 Epoch: 18 Global Step: 190590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:38:57,520-Speed 5967.08 samples/sec Loss 1.9186 LearningRate 0.0032 Epoch: 18 Global Step: 190600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:39:04,370-Speed 5981.46 samples/sec Loss 1.9294 LearningRate 0.0032 Epoch: 18 Global Step: 190610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:39:11,249-Speed 5954.83 samples/sec Loss 1.9456 LearningRate 0.0032 Epoch: 18 Global Step: 190620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:39:18,139-Speed 5945.95 samples/sec Loss 1.8763 LearningRate 0.0032 Epoch: 18 Global Step: 190630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:39:25,014-Speed 5959.30 samples/sec Loss 1.9158 LearningRate 0.0032 Epoch: 18 Global Step: 190640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:39:31,871-Speed 5974.39 samples/sec Loss 1.9531 LearningRate 0.0032 Epoch: 18 Global Step: 190650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
09:39:38,777-Speed 5931.74 samples/sec Loss 1.9450 LearningRate 0.0032 Epoch: 18 Global Step: 190660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:39:45,650-Speed 5960.62 samples/sec Loss 1.8935 LearningRate 0.0032 Epoch: 18 Global Step: 190670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:39:52,506-Speed 5976.01 samples/sec Loss 1.8904 LearningRate 0.0032 Epoch: 18 Global Step: 190680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:39:59,360-Speed 5976.88 samples/sec Loss 1.9426 LearningRate 0.0032 Epoch: 18 Global Step: 190690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:40:06,265-Speed 5932.91 samples/sec Loss 1.9212 LearningRate 0.0032 Epoch: 18 Global Step: 190700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:40:13,121-Speed 5975.17 samples/sec Loss 1.9146 LearningRate 0.0032 Epoch: 18 Global Step: 190710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:40:20,019-Speed 5939.34 samples/sec Loss 1.9275 LearningRate 0.0032 Epoch: 18 Global Step: 190720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:40:26,875-Speed 5976.02 samples/sec Loss 1.9572 LearningRate 0.0032 Epoch: 18 Global Step: 190730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:40:33,732-Speed 5974.11 samples/sec Loss 1.9124 LearningRate 0.0032 Epoch: 18 Global Step: 190740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:40:40,609-Speed 5957.65 samples/sec Loss 1.9370 LearningRate 0.0032 Epoch: 18 Global Step: 190750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:40:47,465-Speed 5975.89 samples/sec Loss 1.9308 LearningRate 0.0032 Epoch: 18 Global Step: 190760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:40:54,333-Speed 5964.33 samples/sec Loss 1.9278 LearningRate 0.0032 Epoch: 18 Global Step: 190770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:01,245-Speed 5927.22 samples/sec Loss 
1.9349 LearningRate 0.0032 Epoch: 18 Global Step: 190780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:08,220-Speed 5874.23 samples/sec Loss 1.8965 LearningRate 0.0032 Epoch: 18 Global Step: 190790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:15,182-Speed 5883.64 samples/sec Loss 1.9081 LearningRate 0.0032 Epoch: 18 Global Step: 190800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:22,048-Speed 5967.25 samples/sec Loss 1.9314 LearningRate 0.0032 Epoch: 18 Global Step: 190810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:28,898-Speed 5980.46 samples/sec Loss 1.9230 LearningRate 0.0031 Epoch: 18 Global Step: 190820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:35,756-Speed 5973.63 samples/sec Loss 1.8953 LearningRate 0.0031 Epoch: 18 Global Step: 190830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:42,634-Speed 5956.55 samples/sec Loss 1.9208 LearningRate 0.0031 Epoch: 18 Global Step: 190840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:49,499-Speed 5968.22 samples/sec Loss 1.9086 LearningRate 0.0031 Epoch: 18 Global Step: 190850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:41:56,365-Speed 5966.91 samples/sec Loss 1.9269 LearningRate 0.0031 Epoch: 18 Global Step: 190860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:42:03,256-Speed 5945.60 samples/sec Loss 1.9068 LearningRate 0.0031 Epoch: 18 Global Step: 190870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:42:10,111-Speed 5976.53 samples/sec Loss 1.8977 LearningRate 0.0031 Epoch: 18 Global Step: 190880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:42:16,979-Speed 5964.70 samples/sec Loss 1.8982 LearningRate 0.0031 Epoch: 18 Global Step: 190890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:42:23,834-Speed 5976.60 samples/sec Loss 1.9039 LearningRate 0.0031 Epoch: 18 Global 
Step: 190900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:42:30,688-Speed 5977.54 samples/sec Loss 1.9078 LearningRate 0.0031 Epoch: 18 Global Step: 190910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:42:37,546-Speed 5972.96 samples/sec Loss 1.8950 LearningRate 0.0031 Epoch: 18 Global Step: 190920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:42:44,419-Speed 5961.17 samples/sec Loss 1.9289 LearningRate 0.0031 Epoch: 18 Global Step: 190930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:42:51,377-Speed 5887.78 samples/sec Loss 1.9220 LearningRate 0.0031 Epoch: 18 Global Step: 190940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:42:58,230-Speed 5977.61 samples/sec Loss 1.8811 LearningRate 0.0031 Epoch: 18 Global Step: 190950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:43:05,094-Speed 5969.45 samples/sec Loss 1.9541 LearningRate 0.0031 Epoch: 18 Global Step: 190960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:43:11,977-Speed 5952.47 samples/sec Loss 1.9019 LearningRate 0.0031 Epoch: 18 Global Step: 190970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:43:18,833-Speed 5975.76 samples/sec Loss 1.9063 LearningRate 0.0031 Epoch: 18 Global Step: 190980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:43:25,695-Speed 5970.22 samples/sec Loss 1.9114 LearningRate 0.0031 Epoch: 18 Global Step: 190990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:43:32,576-Speed 5954.51 samples/sec Loss 1.8995 LearningRate 0.0031 Epoch: 18 Global Step: 191000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:43:39,424-Speed 5981.46 samples/sec Loss 1.8966 LearningRate 0.0031 Epoch: 18 Global Step: 191010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:43:46,278-Speed 5977.25 samples/sec Loss 1.9086 LearningRate 0.0031 Epoch: 18 Global Step: 191020 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-09 09:43:53,154-Speed 5958.73 samples/sec Loss 1.9344 LearningRate 0.0031 Epoch: 18 Global Step: 191030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:44:00,014-Speed 5971.35 samples/sec Loss 1.8934 LearningRate 0.0031 Epoch: 18 Global Step: 191040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:44:06,869-Speed 5976.14 samples/sec Loss 1.9068 LearningRate 0.0031 Epoch: 18 Global Step: 191050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:44:13,713-Speed 5985.65 samples/sec Loss 1.8873 LearningRate 0.0031 Epoch: 18 Global Step: 191060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:44:20,560-Speed 5983.08 samples/sec Loss 1.9122 LearningRate 0.0031 Epoch: 18 Global Step: 191070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:44:27,423-Speed 5969.69 samples/sec Loss 1.9066 LearningRate 0.0031 Epoch: 18 Global Step: 191080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-09 09:44:34,286-Speed 5969.29 samples/sec Loss 1.8858 LearningRate 0.0030 Epoch: 18 Global Step: 191090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:44:41,143-Speed 5974.86 samples/sec Loss 1.9034 LearningRate 0.0030 Epoch: 18 Global Step: 191100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:44:48,002-Speed 5972.18 samples/sec Loss 1.8654 LearningRate 0.0030 Epoch: 18 Global Step: 191110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:44:54,864-Speed 5970.54 samples/sec Loss 1.9124 LearningRate 0.0030 Epoch: 18 Global Step: 191120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:45:01,727-Speed 5968.52 samples/sec Loss 1.8936 LearningRate 0.0030 Epoch: 18 Global Step: 191130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:45:08,617-Speed 5948.90 samples/sec Loss 1.9032 LearningRate 0.0030 Epoch: 18 Global Step: 191140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 
09:45:15,469-Speed 5979.33 samples/sec Loss 1.9290 LearningRate 0.0030 Epoch: 18 Global Step: 191150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:45:22,325-Speed 5974.81 samples/sec Loss 1.8901 LearningRate 0.0030 Epoch: 18 Global Step: 191160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:45:29,183-Speed 5973.75 samples/sec Loss 1.8810 LearningRate 0.0030 Epoch: 18 Global Step: 191170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:45:36,046-Speed 5969.82 samples/sec Loss 1.9018 LearningRate 0.0030 Epoch: 18 Global Step: 191180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:45:42,896-Speed 5980.36 samples/sec Loss 1.9079 LearningRate 0.0030 Epoch: 18 Global Step: 191190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:45:49,754-Speed 5973.66 samples/sec Loss 1.8798 LearningRate 0.0030 Epoch: 18 Global Step: 191200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:45:56,598-Speed 5986.22 samples/sec Loss 1.8680 LearningRate 0.0030 Epoch: 18 Global Step: 191210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:03,478-Speed 5954.45 samples/sec Loss 1.9017 LearningRate 0.0030 Epoch: 18 Global Step: 191220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:10,334-Speed 5977.41 samples/sec Loss 1.8968 LearningRate 0.0030 Epoch: 18 Global Step: 191230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:17,201-Speed 5966.33 samples/sec Loss 1.9330 LearningRate 0.0030 Epoch: 18 Global Step: 191240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:24,066-Speed 5967.61 samples/sec Loss 1.8681 LearningRate 0.0030 Epoch: 18 Global Step: 191250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:30,928-Speed 5970.24 samples/sec Loss 1.9019 LearningRate 0.0030 Epoch: 18 Global Step: 191260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:37,780-Speed 5978.94 samples/sec Loss 
1.8861 LearningRate 0.0030 Epoch: 18 Global Step: 191270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:44,625-Speed 5984.98 samples/sec Loss 1.9106 LearningRate 0.0030 Epoch: 18 Global Step: 191280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:51,471-Speed 5984.56 samples/sec Loss 1.9066 LearningRate 0.0030 Epoch: 18 Global Step: 191290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:46:58,333-Speed 5970.56 samples/sec Loss 1.8785 LearningRate 0.0030 Epoch: 18 Global Step: 191300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:47:05,190-Speed 5973.67 samples/sec Loss 1.8939 LearningRate 0.0030 Epoch: 18 Global Step: 191310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:47:12,065-Speed 5959.71 samples/sec Loss 1.8937 LearningRate 0.0030 Epoch: 18 Global Step: 191320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:47:18,917-Speed 5979.20 samples/sec Loss 1.9150 LearningRate 0.0030 Epoch: 18 Global Step: 191330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:47:25,773-Speed 5974.80 samples/sec Loss 1.8692 LearningRate 0.0030 Epoch: 18 Global Step: 191340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:47:32,682-Speed 5929.82 samples/sec Loss 1.9004 LearningRate 0.0030 Epoch: 18 Global Step: 191350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:47:39,564-Speed 5952.75 samples/sec Loss 1.9017 LearningRate 0.0029 Epoch: 18 Global Step: 191360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:47:46,421-Speed 5974.96 samples/sec Loss 1.8677 LearningRate 0.0029 Epoch: 18 Global Step: 191370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:47:53,276-Speed 5975.80 samples/sec Loss 1.8774 LearningRate 0.0029 Epoch: 18 Global Step: 191380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:48:00,157-Speed 5954.50 samples/sec Loss 1.9238 LearningRate 0.0029 Epoch: 18 Global 
Step: 191390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:48:07,038-Speed 5954.03 samples/sec Loss 1.8656 LearningRate 0.0029 Epoch: 18 Global Step: 191400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:48:13,888-Speed 5980.72 samples/sec Loss 1.8945 LearningRate 0.0029 Epoch: 18 Global Step: 191410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:48:20,756-Speed 5964.93 samples/sec Loss 1.9148 LearningRate 0.0029 Epoch: 18 Global Step: 191420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:48:27,663-Speed 5931.23 samples/sec Loss 1.8854 LearningRate 0.0029 Epoch: 18 Global Step: 191430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:48:34,514-Speed 5980.30 samples/sec Loss 1.8443 LearningRate 0.0029 Epoch: 18 Global Step: 191440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:48:41,402-Speed 5947.84 samples/sec Loss 1.8953 LearningRate 0.0029 Epoch: 18 Global Step: 191450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:48:48,265-Speed 5969.14 samples/sec Loss 1.8725 LearningRate 0.0029 Epoch: 18 Global Step: 191460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:48:55,114-Speed 5981.32 samples/sec Loss 1.8769 LearningRate 0.0029 Epoch: 18 Global Step: 191470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:49:01,982-Speed 5965.56 samples/sec Loss 1.8928 LearningRate 0.0029 Epoch: 18 Global Step: 191480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:49:08,836-Speed 5977.39 samples/sec Loss 1.8851 LearningRate 0.0029 Epoch: 18 Global Step: 191490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:49:15,690-Speed 5977.20 samples/sec Loss 1.9004 LearningRate 0.0029 Epoch: 18 Global Step: 191500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:49:22,564-Speed 5959.25 samples/sec Loss 1.8900 LearningRate 0.0029 Epoch: 18 Global Step: 191510 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-09 09:49:29,430-Speed 5966.85 samples/sec Loss 1.8470 LearningRate 0.0029 Epoch: 18 Global Step: 191520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:49:36,277-Speed 5983.23 samples/sec Loss 1.8869 LearningRate 0.0029 Epoch: 18 Global Step: 191530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:49:43,130-Speed 5978.36 samples/sec Loss 1.8139 LearningRate 0.0029 Epoch: 18 Global Step: 191540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:49:50,005-Speed 5960.66 samples/sec Loss 1.9098 LearningRate 0.0029 Epoch: 18 Global Step: 191550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:49:56,862-Speed 5973.99 samples/sec Loss 1.8736 LearningRate 0.0029 Epoch: 18 Global Step: 191560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:50:03,735-Speed 5960.18 samples/sec Loss 1.9075 LearningRate 0.0029 Epoch: 18 Global Step: 191570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:50:10,579-Speed 5986.22 samples/sec Loss 1.8622 LearningRate 0.0029 Epoch: 18 Global Step: 191580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:50:17,421-Speed 5987.60 samples/sec Loss 1.8912 LearningRate 0.0029 Epoch: 18 Global Step: 191590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:50:24,279-Speed 5975.87 samples/sec Loss 1.8730 LearningRate 0.0029 Epoch: 18 Global Step: 191600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:50:31,127-Speed 5982.02 samples/sec Loss 1.8989 LearningRate 0.0029 Epoch: 18 Global Step: 191610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:50:37,993-Speed 5966.84 samples/sec Loss 1.8517 LearningRate 0.0029 Epoch: 18 Global Step: 191620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:50:44,836-Speed 5986.79 samples/sec Loss 1.9210 LearningRate 0.0028 Epoch: 18 Global Step: 191630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
09:50:51,685-Speed 5981.66 samples/sec Loss 1.8960 LearningRate 0.0028 Epoch: 18 Global Step: 191640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:50:58,544-Speed 5973.03 samples/sec Loss 1.8692 LearningRate 0.0028 Epoch: 18 Global Step: 191650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:51:05,429-Speed 5951.73 samples/sec Loss 1.8914 LearningRate 0.0028 Epoch: 18 Global Step: 191660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:51:12,294-Speed 5967.14 samples/sec Loss 1.8415 LearningRate 0.0028 Epoch: 18 Global Step: 191670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:51:19,148-Speed 5977.66 samples/sec Loss 1.8631 LearningRate 0.0028 Epoch: 18 Global Step: 191680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:51:25,992-Speed 5986.21 samples/sec Loss 1.8938 LearningRate 0.0028 Epoch: 18 Global Step: 191690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:51:32,846-Speed 5976.63 samples/sec Loss 1.8632 LearningRate 0.0028 Epoch: 18 Global Step: 191700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:51:39,732-Speed 5950.26 samples/sec Loss 1.8717 LearningRate 0.0028 Epoch: 18 Global Step: 191710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:51:46,599-Speed 5966.35 samples/sec Loss 1.8801 LearningRate 0.0028 Epoch: 18 Global Step: 191720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:51:53,445-Speed 5983.39 samples/sec Loss 1.8743 LearningRate 0.0028 Epoch: 18 Global Step: 191730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:52:00,296-Speed 5980.69 samples/sec Loss 1.9089 LearningRate 0.0028 Epoch: 18 Global Step: 191740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:52:07,155-Speed 5974.70 samples/sec Loss 1.8540 LearningRate 0.0028 Epoch: 18 Global Step: 191750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:52:13,995-Speed 5989.56 samples/sec Loss 
1.8440 LearningRate 0.0028 Epoch: 18 Global Step: 191760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:52:20,860-Speed 5966.97 samples/sec Loss 1.8618 LearningRate 0.0028 Epoch: 18 Global Step: 191770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:52:27,739-Speed 5958.13 samples/sec Loss 1.8792 LearningRate 0.0028 Epoch: 18 Global Step: 191780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:52:34,598-Speed 5972.96 samples/sec Loss 1.8496 LearningRate 0.0028 Epoch: 18 Global Step: 191790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:52:41,459-Speed 5971.37 samples/sec Loss 1.8877 LearningRate 0.0028 Epoch: 18 Global Step: 191800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:52:48,326-Speed 5965.66 samples/sec Loss 1.8763 LearningRate 0.0028 Epoch: 18 Global Step: 191810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:52:55,215-Speed 5946.69 samples/sec Loss 1.8688 LearningRate 0.0028 Epoch: 18 Global Step: 191820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:53:02,075-Speed 5972.34 samples/sec Loss 1.8421 LearningRate 0.0028 Epoch: 18 Global Step: 191830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:53:08,930-Speed 5976.30 samples/sec Loss 1.8611 LearningRate 0.0028 Epoch: 18 Global Step: 191840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:53:15,775-Speed 5984.43 samples/sec Loss 1.8727 LearningRate 0.0028 Epoch: 18 Global Step: 191850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:53:22,632-Speed 5974.85 samples/sec Loss 1.8654 LearningRate 0.0028 Epoch: 18 Global Step: 191860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:53:29,488-Speed 5975.84 samples/sec Loss 1.8663 LearningRate 0.0028 Epoch: 18 Global Step: 191870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:53:36,348-Speed 5972.17 samples/sec Loss 1.8474 LearningRate 0.0028 Epoch: 18 Global 
Step: 191880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:53:43,239-Speed 5953.58 samples/sec Loss 1.8550 LearningRate 0.0028 Epoch: 18 Global Step: 191890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:53:50,109-Speed 5963.15 samples/sec Loss 1.8369 LearningRate 0.0028 Epoch: 18 Global Step: 191900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:53:56,963-Speed 5977.13 samples/sec Loss 1.8787 LearningRate 0.0027 Epoch: 18 Global Step: 191910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:54:03,811-Speed 5982.58 samples/sec Loss 1.8697 LearningRate 0.0027 Epoch: 18 Global Step: 191920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:54:10,710-Speed 5937.67 samples/sec Loss 1.8501 LearningRate 0.0027 Epoch: 18 Global Step: 191930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:54:17,562-Speed 5978.62 samples/sec Loss 1.8591 LearningRate 0.0027 Epoch: 18 Global Step: 191940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:54:24,417-Speed 5976.43 samples/sec Loss 1.8698 LearningRate 0.0027 Epoch: 18 Global Step: 191950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:54:31,304-Speed 5948.98 samples/sec Loss 1.8723 LearningRate 0.0027 Epoch: 18 Global Step: 191960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-09 09:54:38,176-Speed 5961.86 samples/sec Loss 1.8496 LearningRate 0.0027 Epoch: 18 Global Step: 191970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:54:45,053-Speed 5957.36 samples/sec Loss 1.8559 LearningRate 0.0027 Epoch: 18 Global Step: 191980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:54:51,898-Speed 5984.49 samples/sec Loss 1.8578 LearningRate 0.0027 Epoch: 18 Global Step: 191990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:54:58,761-Speed 5969.72 samples/sec Loss 1.8596 LearningRate 0.0027 Epoch: 18 Global Step: 192000 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-09 09:55:05,649-Speed 5947.87 samples/sec Loss 1.8680 LearningRate 0.0027 Epoch: 18 Global Step: 192010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:55:12,512-Speed 5969.21 samples/sec Loss 1.8533 LearningRate 0.0027 Epoch: 18 Global Step: 192020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:55:19,375-Speed 5968.64 samples/sec Loss 1.8795 LearningRate 0.0027 Epoch: 18 Global Step: 192030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:55:26,252-Speed 5958.14 samples/sec Loss 1.8443 LearningRate 0.0027 Epoch: 18 Global Step: 192040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:55:33,105-Speed 5977.73 samples/sec Loss 1.8262 LearningRate 0.0027 Epoch: 18 Global Step: 192050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:55:39,958-Speed 5978.61 samples/sec Loss 1.8627 LearningRate 0.0027 Epoch: 18 Global Step: 192060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:55:46,812-Speed 5976.99 samples/sec Loss 1.8709 LearningRate 0.0027 Epoch: 18 Global Step: 192070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:55:53,681-Speed 5966.02 samples/sec Loss 1.8652 LearningRate 0.0027 Epoch: 18 Global Step: 192080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:56:00,532-Speed 5979.23 samples/sec Loss 1.8725 LearningRate 0.0027 Epoch: 18 Global Step: 192090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:56:07,381-Speed 5982.84 samples/sec Loss 1.8678 LearningRate 0.0027 Epoch: 18 Global Step: 192100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:56:14,253-Speed 5962.24 samples/sec Loss 1.8623 LearningRate 0.0027 Epoch: 18 Global Step: 192110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:56:21,128-Speed 5958.98 samples/sec Loss 1.8657 LearningRate 0.0027 Epoch: 18 Global Step: 192120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
09:56:27,982-Speed 5977.30 samples/sec Loss 1.8704 LearningRate 0.0027 Epoch: 18 Global Step: 192130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:56:34,834-Speed 5979.49 samples/sec Loss 1.8844 LearningRate 0.0027 Epoch: 18 Global Step: 192140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:56:41,687-Speed 5977.59 samples/sec Loss 1.8624 LearningRate 0.0027 Epoch: 18 Global Step: 192150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:56:48,547-Speed 5972.30 samples/sec Loss 1.8695 LearningRate 0.0027 Epoch: 18 Global Step: 192160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:56:55,444-Speed 5940.16 samples/sec Loss 1.8723 LearningRate 0.0027 Epoch: 18 Global Step: 192170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:57:02,292-Speed 5982.40 samples/sec Loss 1.8473 LearningRate 0.0027 Epoch: 18 Global Step: 192180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:57:09,139-Speed 5982.66 samples/sec Loss 1.8506 LearningRate 0.0026 Epoch: 18 Global Step: 192190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:57:16,004-Speed 5968.50 samples/sec Loss 1.8456 LearningRate 0.0026 Epoch: 18 Global Step: 192200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 09:57:22,850-Speed 5983.76 samples/sec Loss 1.8351 LearningRate 0.0026 Epoch: 18 Global Step: 192210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:57:29,698-Speed 5984.19 samples/sec Loss 1.8394 LearningRate 0.0026 Epoch: 18 Global Step: 192220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:57:36,541-Speed 5987.25 samples/sec Loss 1.8752 LearningRate 0.0026 Epoch: 18 Global Step: 192230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:57:43,390-Speed 5980.83 samples/sec Loss 1.8843 LearningRate 0.0026 Epoch: 18 Global Step: 192240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:57:50,244-Speed 5977.55 samples/sec Loss 
1.8757 LearningRate 0.0026 Epoch: 18 Global Step: 192250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:57:57,095-Speed 5980.14 samples/sec Loss 1.8574 LearningRate 0.0026 Epoch: 18 Global Step: 192260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:58:03,944-Speed 5981.05 samples/sec Loss 1.8346 LearningRate 0.0026 Epoch: 18 Global Step: 192270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:58:10,785-Speed 5988.82 samples/sec Loss 1.8453 LearningRate 0.0026 Epoch: 18 Global Step: 192280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:58:17,639-Speed 5978.73 samples/sec Loss 1.8206 LearningRate 0.0026 Epoch: 18 Global Step: 192290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:58:24,488-Speed 5981.15 samples/sec Loss 1.8532 LearningRate 0.0026 Epoch: 18 Global Step: 192300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:58:31,344-Speed 5975.97 samples/sec Loss 1.8571 LearningRate 0.0026 Epoch: 18 Global Step: 192310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:58:38,195-Speed 5980.04 samples/sec Loss 1.8391 LearningRate 0.0026 Epoch: 18 Global Step: 192320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:58:45,044-Speed 5980.58 samples/sec Loss 1.8251 LearningRate 0.0026 Epoch: 18 Global Step: 192330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:59:07,939-Speed 1789.25 samples/sec Loss 1.8344 LearningRate 0.0026 Epoch: 18 Global Step: 192340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 09:59:14,803-Speed 5968.98 samples/sec Loss 1.8601 LearningRate 0.0026 Epoch: 18 Global Step: 192350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:59:21,648-Speed 5984.98 samples/sec Loss 1.8551 LearningRate 0.0026 Epoch: 18 Global Step: 192360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:59:28,493-Speed 5984.89 samples/sec Loss 1.8296 LearningRate 0.0026 Epoch: 18 Global 
Step: 192370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:59:35,354-Speed 5970.96 samples/sec Loss 1.8180 LearningRate 0.0026 Epoch: 18 Global Step: 192380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:59:42,216-Speed 5969.67 samples/sec Loss 1.8486 LearningRate 0.0026 Epoch: 18 Global Step: 192390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:59:49,073-Speed 5975.76 samples/sec Loss 1.8307 LearningRate 0.0026 Epoch: 18 Global Step: 192400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 09:59:55,927-Speed 5977.48 samples/sec Loss 1.8292 LearningRate 0.0026 Epoch: 18 Global Step: 192410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:00:02,801-Speed 5959.09 samples/sec Loss 1.8633 LearningRate 0.0026 Epoch: 18 Global Step: 192420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:00:09,689-Speed 5948.21 samples/sec Loss 1.8238 LearningRate 0.0026 Epoch: 18 Global Step: 192430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:00:16,583-Speed 5943.27 samples/sec Loss 1.8266 LearningRate 0.0026 Epoch: 18 Global Step: 192440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:00:23,468-Speed 5950.16 samples/sec Loss 1.8314 LearningRate 0.0026 Epoch: 18 Global Step: 192450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:00:30,340-Speed 5961.69 samples/sec Loss 1.8394 LearningRate 0.0026 Epoch: 18 Global Step: 192460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:00:37,218-Speed 5956.52 samples/sec Loss 1.8305 LearningRate 0.0026 Epoch: 18 Global Step: 192470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:00:44,091-Speed 5960.60 samples/sec Loss 1.8145 LearningRate 0.0025 Epoch: 18 Global Step: 192480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:00:50,957-Speed 5967.24 samples/sec Loss 1.8299 LearningRate 0.0025 Epoch: 18 Global Step: 192490 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 10:00:57,809-Speed 5980.76 samples/sec Loss 1.8112 LearningRate 0.0025 Epoch: 18 Global Step: 192500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:01:04,664-Speed 5976.26 samples/sec Loss 1.8518 LearningRate 0.0025 Epoch: 18 Global Step: 192510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:01:11,518-Speed 5977.67 samples/sec Loss 1.8061 LearningRate 0.0025 Epoch: 18 Global Step: 192520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:01:18,388-Speed 5963.12 samples/sec Loss 1.8211 LearningRate 0.0025 Epoch: 18 Global Step: 192530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:01:25,248-Speed 5972.06 samples/sec Loss 1.8470 LearningRate 0.0025 Epoch: 18 Global Step: 192540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:01:32,116-Speed 5964.78 samples/sec Loss 1.8426 LearningRate 0.0025 Epoch: 18 Global Step: 192550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:01:38,989-Speed 5961.57 samples/sec Loss 1.8263 LearningRate 0.0025 Epoch: 18 Global Step: 192560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:01:45,860-Speed 5962.16 samples/sec Loss 1.8680 LearningRate 0.0025 Epoch: 18 Global Step: 192570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:01:52,738-Speed 5957.05 samples/sec Loss 1.8417 LearningRate 0.0025 Epoch: 18 Global Step: 192580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:01:59,597-Speed 5974.67 samples/sec Loss 1.8178 LearningRate 0.0025 Epoch: 18 Global Step: 192590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:02:06,482-Speed 5950.24 samples/sec Loss 1.8385 LearningRate 0.0025 Epoch: 18 Global Step: 192600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:02:13,364-Speed 5953.04 samples/sec Loss 1.8157 LearningRate 0.0025 Epoch: 18 Global Step: 192610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
10:02:20,270-Speed 5934.24 samples/sec Loss 1.8349 LearningRate 0.0025 Epoch: 18 Global Step: 192620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:02:27,127-Speed 5974.08 samples/sec Loss 1.8239 LearningRate 0.0025 Epoch: 18 Global Step: 192630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:02:33,984-Speed 5974.91 samples/sec Loss 1.8315 LearningRate 0.0025 Epoch: 18 Global Step: 192640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:02:40,856-Speed 5961.82 samples/sec Loss 1.8086 LearningRate 0.0025 Epoch: 18 Global Step: 192650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:02:47,717-Speed 5971.58 samples/sec Loss 1.8429 LearningRate 0.0025 Epoch: 18 Global Step: 192660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:02:54,590-Speed 5960.70 samples/sec Loss 1.8266 LearningRate 0.0025 Epoch: 18 Global Step: 192670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:03:01,459-Speed 5964.98 samples/sec Loss 1.8127 LearningRate 0.0025 Epoch: 18 Global Step: 192680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:03:08,319-Speed 5971.67 samples/sec Loss 1.8584 LearningRate 0.0025 Epoch: 18 Global Step: 192690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:03:15,156-Speed 5992.28 samples/sec Loss 1.8132 LearningRate 0.0025 Epoch: 18 Global Step: 192700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:03:22,002-Speed 5984.24 samples/sec Loss 1.8155 LearningRate 0.0025 Epoch: 18 Global Step: 192710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:03:28,883-Speed 5953.62 samples/sec Loss 1.8179 LearningRate 0.0025 Epoch: 18 Global Step: 192720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:03:35,743-Speed 5973.80 samples/sec Loss 1.8359 LearningRate 0.0025 Epoch: 18 Global Step: 192730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:03:42,608-Speed 5970.69 samples/sec Loss 
1.8486 LearningRate 0.0025 Epoch: 18 Global Step: 192740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:03:49,483-Speed 5959.01 samples/sec Loss 1.8145 LearningRate 0.0025 Epoch: 18 Global Step: 192750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:03:56,340-Speed 5974.83 samples/sec Loss 1.8373 LearningRate 0.0025 Epoch: 18 Global Step: 192760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:03,207-Speed 5965.66 samples/sec Loss 1.8062 LearningRate 0.0025 Epoch: 18 Global Step: 192770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:10,072-Speed 5967.34 samples/sec Loss 1.8317 LearningRate 0.0024 Epoch: 18 Global Step: 192780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:16,926-Speed 5977.86 samples/sec Loss 1.8019 LearningRate 0.0024 Epoch: 18 Global Step: 192790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:23,774-Speed 5982.88 samples/sec Loss 1.8033 LearningRate 0.0024 Epoch: 18 Global Step: 192800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:30,626-Speed 5978.29 samples/sec Loss 1.8142 LearningRate 0.0024 Epoch: 18 Global Step: 192810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:37,479-Speed 5978.24 samples/sec Loss 1.8427 LearningRate 0.0024 Epoch: 18 Global Step: 192820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:44,332-Speed 5979.28 samples/sec Loss 1.8451 LearningRate 0.0024 Epoch: 18 Global Step: 192830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:51,197-Speed 5966.57 samples/sec Loss 1.8376 LearningRate 0.0024 Epoch: 18 Global Step: 192840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:04:58,054-Speed 5974.89 samples/sec Loss 1.7947 LearningRate 0.0024 Epoch: 18 Global Step: 192850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:05:04,921-Speed 5966.67 samples/sec Loss 1.7947 LearningRate 0.0024 Epoch: 18 Global 
Step: 192860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:05:11,820-Speed 5937.97 samples/sec Loss 1.8383 LearningRate 0.0024 Epoch: 18 Global Step: 192870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:05:18,709-Speed 5946.64 samples/sec Loss 1.8074 LearningRate 0.0024 Epoch: 18 Global Step: 192880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:05:25,587-Speed 5959.53 samples/sec Loss 1.8001 LearningRate 0.0024 Epoch: 18 Global Step: 192890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:05:32,440-Speed 5978.03 samples/sec Loss 1.7838 LearningRate 0.0024 Epoch: 18 Global Step: 192900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:05:39,304-Speed 5968.54 samples/sec Loss 1.7864 LearningRate 0.0024 Epoch: 18 Global Step: 192910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:05:46,163-Speed 5973.94 samples/sec Loss 1.8223 LearningRate 0.0024 Epoch: 18 Global Step: 192920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:05:53,030-Speed 5965.55 samples/sec Loss 1.8218 LearningRate 0.0024 Epoch: 18 Global Step: 192930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:05:59,874-Speed 5986.15 samples/sec Loss 1.8109 LearningRate 0.0024 Epoch: 18 Global Step: 192940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:06:06,736-Speed 5970.54 samples/sec Loss 1.8019 LearningRate 0.0024 Epoch: 18 Global Step: 192950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:06:13,579-Speed 5986.34 samples/sec Loss 1.8140 LearningRate 0.0024 Epoch: 18 Global Step: 192960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:06:20,434-Speed 5976.84 samples/sec Loss 1.8315 LearningRate 0.0024 Epoch: 18 Global Step: 192970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:06:27,294-Speed 5971.38 samples/sec Loss 1.7965 LearningRate 0.0024 Epoch: 18 Global Step: 192980 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 10:06:34,139-Speed 5985.53 samples/sec Loss 1.8586 LearningRate 0.0024 Epoch: 18 Global Step: 192990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:06:40,994-Speed 5976.45 samples/sec Loss 1.8151 LearningRate 0.0024 Epoch: 18 Global Step: 193000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:06:47,837-Speed 5986.86 samples/sec Loss 1.8087 LearningRate 0.0024 Epoch: 18 Global Step: 193010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:06:54,710-Speed 5960.03 samples/sec Loss 1.8252 LearningRate 0.0024 Epoch: 18 Global Step: 193020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:07:01,556-Speed 5984.37 samples/sec Loss 1.8302 LearningRate 0.0024 Epoch: 18 Global Step: 193030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:07:08,409-Speed 5978.62 samples/sec Loss 1.8182 LearningRate 0.0024 Epoch: 18 Global Step: 193040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:07:15,265-Speed 5975.52 samples/sec Loss 1.8355 LearningRate 0.0024 Epoch: 18 Global Step: 193050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:07:22,121-Speed 5975.55 samples/sec Loss 1.8136 LearningRate 0.0024 Epoch: 18 Global Step: 193060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:07:29,036-Speed 5925.35 samples/sec Loss 1.8199 LearningRate 0.0024 Epoch: 18 Global Step: 193070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:07:35,886-Speed 5981.21 samples/sec Loss 1.7841 LearningRate 0.0023 Epoch: 18 Global Step: 193080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:07:42,749-Speed 5969.11 samples/sec Loss 1.8230 LearningRate 0.0023 Epoch: 18 Global Step: 193090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:07:49,599-Speed 5981.01 samples/sec Loss 1.8181 LearningRate 0.0023 Epoch: 18 Global Step: 193100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
10:07:56,449-Speed 5980.51 samples/sec Loss 1.8146 LearningRate 0.0023 Epoch: 18 Global Step: 193110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:03,306-Speed 5974.41 samples/sec Loss 1.8417 LearningRate 0.0023 Epoch: 18 Global Step: 193120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:10,160-Speed 5977.98 samples/sec Loss 1.7874 LearningRate 0.0023 Epoch: 18 Global Step: 193130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:17,007-Speed 5983.11 samples/sec Loss 1.8025 LearningRate 0.0023 Epoch: 18 Global Step: 193140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:23,854-Speed 5983.11 samples/sec Loss 1.8148 LearningRate 0.0023 Epoch: 18 Global Step: 193150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:30,709-Speed 5976.95 samples/sec Loss 1.8075 LearningRate 0.0023 Epoch: 18 Global Step: 193160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:37,570-Speed 5970.40 samples/sec Loss 1.8250 LearningRate 0.0023 Epoch: 18 Global Step: 193170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:44,419-Speed 5981.45 samples/sec Loss 1.8366 LearningRate 0.0023 Epoch: 18 Global Step: 193180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:51,286-Speed 5966.38 samples/sec Loss 1.8329 LearningRate 0.0023 Epoch: 18 Global Step: 193190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:08:58,155-Speed 5964.04 samples/sec Loss 1.7875 LearningRate 0.0023 Epoch: 18 Global Step: 193200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:05,000-Speed 5985.31 samples/sec Loss 1.7961 LearningRate 0.0023 Epoch: 18 Global Step: 193210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:11,865-Speed 5967.29 samples/sec Loss 1.8260 LearningRate 0.0023 Epoch: 18 Global Step: 193220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:18,713-Speed 5982.22 samples/sec Loss 
1.8017 LearningRate 0.0023 Epoch: 18 Global Step: 193230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:25,595-Speed 5954.85 samples/sec Loss 1.8211 LearningRate 0.0023 Epoch: 18 Global Step: 193240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:32,487-Speed 5944.20 samples/sec Loss 1.7938 LearningRate 0.0023 Epoch: 18 Global Step: 193250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:39,375-Speed 5947.86 samples/sec Loss 1.8190 LearningRate 0.0023 Epoch: 18 Global Step: 193260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:46,250-Speed 5958.94 samples/sec Loss 1.8360 LearningRate 0.0023 Epoch: 18 Global Step: 193270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:53,122-Speed 5961.44 samples/sec Loss 1.8128 LearningRate 0.0023 Epoch: 18 Global Step: 193280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:09:59,984-Speed 5970.79 samples/sec Loss 1.8181 LearningRate 0.0023 Epoch: 18 Global Step: 193290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:10:06,823-Speed 5990.26 samples/sec Loss 1.7730 LearningRate 0.0023 Epoch: 18 Global Step: 193300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:10:13,684-Speed 5971.61 samples/sec Loss 1.7763 LearningRate 0.0023 Epoch: 18 Global Step: 193310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:10:20,526-Speed 5986.47 samples/sec Loss 1.7860 LearningRate 0.0023 Epoch: 18 Global Step: 193320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:10:27,380-Speed 5977.88 samples/sec Loss 1.7921 LearningRate 0.0023 Epoch: 18 Global Step: 193330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:10:34,272-Speed 5943.99 samples/sec Loss 1.8132 LearningRate 0.0023 Epoch: 18 Global Step: 193340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:10:41,143-Speed 5962.73 samples/sec Loss 1.8051 LearningRate 0.0023 Epoch: 18 Global 
Step: 193350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:10:48,001-Speed 5973.44 samples/sec Loss 1.7861 LearningRate 0.0023 Epoch: 18 Global Step: 193360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:10:54,857-Speed 5975.61 samples/sec Loss 1.7927 LearningRate 0.0023 Epoch: 18 Global Step: 193370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:11:01,710-Speed 5978.29 samples/sec Loss 1.7867 LearningRate 0.0023 Epoch: 18 Global Step: 193380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:11:08,570-Speed 5972.42 samples/sec Loss 1.8211 LearningRate 0.0022 Epoch: 18 Global Step: 193390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:11:15,429-Speed 5973.19 samples/sec Loss 1.8031 LearningRate 0.0022 Epoch: 18 Global Step: 193400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:11:22,299-Speed 5962.62 samples/sec Loss 1.7806 LearningRate 0.0022 Epoch: 18 Global Step: 193410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:11:29,155-Speed 5976.02 samples/sec Loss 1.7923 LearningRate 0.0022 Epoch: 18 Global Step: 193420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:11:36,015-Speed 5971.86 samples/sec Loss 1.8032 LearningRate 0.0022 Epoch: 18 Global Step: 193430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:11:42,900-Speed 5950.70 samples/sec Loss 1.7724 LearningRate 0.0022 Epoch: 18 Global Step: 193440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:11:49,767-Speed 5965.84 samples/sec Loss 1.8116 LearningRate 0.0022 Epoch: 18 Global Step: 193450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:11:56,617-Speed 5981.03 samples/sec Loss 1.8062 LearningRate 0.0022 Epoch: 18 Global Step: 193460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:12:03,466-Speed 5981.41 samples/sec Loss 1.7804 LearningRate 0.0022 Epoch: 18 Global Step: 193470 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 10:12:10,318-Speed 5979.77 samples/sec Loss 1.7980 LearningRate 0.0022 Epoch: 18 Global Step: 193480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:12:17,182-Speed 5971.17 samples/sec Loss 1.7792 LearningRate 0.0022 Epoch: 18 Global Step: 193490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:12:24,055-Speed 5960.54 samples/sec Loss 1.8008 LearningRate 0.0022 Epoch: 18 Global Step: 193500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:12:30,900-Speed 5984.58 samples/sec Loss 1.7888 LearningRate 0.0022 Epoch: 18 Global Step: 193510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:12:37,746-Speed 5984.37 samples/sec Loss 1.7775 LearningRate 0.0022 Epoch: 18 Global Step: 193520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:12:44,596-Speed 5980.48 samples/sec Loss 1.7808 LearningRate 0.0022 Epoch: 18 Global Step: 193530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:12:51,444-Speed 5983.13 samples/sec Loss 1.8202 LearningRate 0.0022 Epoch: 18 Global Step: 193540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:12:58,334-Speed 5945.64 samples/sec Loss 1.7698 LearningRate 0.0022 Epoch: 18 Global Step: 193550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:13:05,194-Speed 5971.69 samples/sec Loss 1.8218 LearningRate 0.0022 Epoch: 18 Global Step: 193560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:13:12,073-Speed 5955.75 samples/sec Loss 1.7923 LearningRate 0.0022 Epoch: 18 Global Step: 193570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:13:18,931-Speed 5974.73 samples/sec Loss 1.7796 LearningRate 0.0022 Epoch: 18 Global Step: 193580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:13:25,797-Speed 5966.52 samples/sec Loss 1.7688 LearningRate 0.0022 Epoch: 18 Global Step: 193590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
10:13:32,654-Speed 5974.87 samples/sec Loss 1.8086 LearningRate 0.0022 Epoch: 18 Global Step: 193600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:13:45,111-Speed 3288.79 samples/sec Loss 1.7868 LearningRate 0.0022 Epoch: 18 Global Step: 193610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:13:51,957-Speed 5984.28 samples/sec Loss 1.7919 LearningRate 0.0022 Epoch: 18 Global Step: 193620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:13:58,823-Speed 5966.90 samples/sec Loss 1.8039 LearningRate 0.0022 Epoch: 18 Global Step: 193630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:14:05,661-Speed 5990.92 samples/sec Loss 1.7771 LearningRate 0.0022 Epoch: 18 Global Step: 193640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:14:12,494-Speed 5994.91 samples/sec Loss 1.7618 LearningRate 0.0022 Epoch: 18 Global Step: 193650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:14:19,323-Speed 5998.83 samples/sec Loss 1.8232 LearningRate 0.0022 Epoch: 18 Global Step: 193660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:14:26,192-Speed 5964.26 samples/sec Loss 1.8087 LearningRate 0.0022 Epoch: 18 Global Step: 193670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:14:33,072-Speed 5954.95 samples/sec Loss 1.8127 LearningRate 0.0022 Epoch: 18 Global Step: 193680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:14:39,980-Speed 5930.52 samples/sec Loss 1.7775 LearningRate 0.0022 Epoch: 18 Global Step: 193690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:14:46,894-Speed 5926.87 samples/sec Loss 1.7689 LearningRate 0.0021 Epoch: 18 Global Step: 193700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:14:53,739-Speed 5985.61 samples/sec Loss 1.7786 LearningRate 0.0021 Epoch: 18 Global Step: 193710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:15:00,632-Speed 5943.66 samples/sec Loss 
1.7799 LearningRate 0.0021 Epoch: 18 Global Step: 193720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:15:07,563-Speed 5910.74 samples/sec Loss 1.7910 LearningRate 0.0021 Epoch: 18 Global Step: 193730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:15:14,473-Speed 5931.47 samples/sec Loss 1.7779 LearningRate 0.0021 Epoch: 18 Global Step: 193740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:15:21,332-Speed 5972.45 samples/sec Loss 1.7766 LearningRate 0.0021 Epoch: 18 Global Step: 193750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:15:28,189-Speed 5974.53 samples/sec Loss 1.7905 LearningRate 0.0021 Epoch: 18 Global Step: 193760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:15:35,098-Speed 5930.57 samples/sec Loss 1.7758 LearningRate 0.0021 Epoch: 18 Global Step: 193770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:15:42,026-Speed 5913.52 samples/sec Loss 1.7866 LearningRate 0.0021 Epoch: 18 Global Step: 193780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:15:48,952-Speed 5915.22 samples/sec Loss 1.7764 LearningRate 0.0021 Epoch: 18 Global Step: 193790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:15:55,845-Speed 5946.25 samples/sec Loss 1.7882 LearningRate 0.0021 Epoch: 18 Global Step: 193800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:16:02,707-Speed 5970.50 samples/sec Loss 1.7483 LearningRate 0.0021 Epoch: 18 Global Step: 193810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:16:09,574-Speed 5965.76 samples/sec Loss 1.7472 LearningRate 0.0021 Epoch: 18 Global Step: 193820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:16:16,432-Speed 5974.50 samples/sec Loss 1.8054 LearningRate 0.0021 Epoch: 18 Global Step: 193830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:16:23,292-Speed 5971.77 samples/sec Loss 1.7665 LearningRate 0.0021 Epoch: 18 Global 
Step: 193840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:16:30,207-Speed 5926.43 samples/sec Loss 1.7569 LearningRate 0.0021 Epoch: 18 Global Step: 193850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:16:37,074-Speed 5965.93 samples/sec Loss 1.7515 LearningRate 0.0021 Epoch: 18 Global Step: 193860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:16:43,933-Speed 5972.26 samples/sec Loss 1.7581 LearningRate 0.0021 Epoch: 18 Global Step: 193870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:16:50,798-Speed 5968.81 samples/sec Loss 1.7944 LearningRate 0.0021 Epoch: 18 Global Step: 193880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:16:57,655-Speed 5976.35 samples/sec Loss 1.8005 LearningRate 0.0021 Epoch: 18 Global Step: 193890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:17:04,506-Speed 5979.57 samples/sec Loss 1.7861 LearningRate 0.0021 Epoch: 18 Global Step: 193900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:17:11,370-Speed 5968.53 samples/sec Loss 1.8068 LearningRate 0.0021 Epoch: 18 Global Step: 193910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:17:18,241-Speed 5964.31 samples/sec Loss 1.7427 LearningRate 0.0021 Epoch: 18 Global Step: 193920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:17:25,098-Speed 5974.83 samples/sec Loss 1.7938 LearningRate 0.0021 Epoch: 18 Global Step: 193930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:17:32,075-Speed 5871.77 samples/sec Loss 1.7635 LearningRate 0.0021 Epoch: 18 Global Step: 193940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:17:39,023-Speed 5897.30 samples/sec Loss 1.7434 LearningRate 0.0021 Epoch: 18 Global Step: 193950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:17:45,974-Speed 5895.86 samples/sec Loss 1.7826 LearningRate 0.0021 Epoch: 18 Global Step: 193960 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-09 10:17:52,840-Speed 5967.10 samples/sec Loss 1.7823 LearningRate 0.0021 Epoch: 18 Global Step: 193970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:17:59,686-Speed 5983.68 samples/sec Loss 1.7702 LearningRate 0.0021 Epoch: 18 Global Step: 193980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:18:06,538-Speed 5979.45 samples/sec Loss 1.8009 LearningRate 0.0021 Epoch: 18 Global Step: 193990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:18:13,485-Speed 5898.46 samples/sec Loss 1.7487 LearningRate 0.0021 Epoch: 18 Global Step: 194000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:18:20,354-Speed 5964.02 samples/sec Loss 1.7654 LearningRate 0.0021 Epoch: 18 Global Step: 194010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:18:27,224-Speed 5963.34 samples/sec Loss 1.7757 LearningRate 0.0020 Epoch: 18 Global Step: 194020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:18:34,096-Speed 5961.42 samples/sec Loss 1.7995 LearningRate 0.0020 Epoch: 18 Global Step: 194030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:18:40,972-Speed 5960.49 samples/sec Loss 1.7589 LearningRate 0.0020 Epoch: 18 Global Step: 194040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:18:47,873-Speed 5936.25 samples/sec Loss 1.7469 LearningRate 0.0020 Epoch: 18 Global Step: 194050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:18:54,746-Speed 5961.36 samples/sec Loss 1.7954 LearningRate 0.0020 Epoch: 18 Global Step: 194060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-09 10:19:01,584-Speed 5991.36 samples/sec Loss 1.7577 LearningRate 0.0020 Epoch: 18 Global Step: 194070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:19:08,467-Speed 5952.28 samples/sec Loss 1.7673 LearningRate 0.0020 Epoch: 18 Global Step: 194080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
10:19:15,407-Speed 5903.21 samples/sec Loss 1.7684 LearningRate 0.0020 Epoch: 18 Global Step: 194090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:19:22,278-Speed 5962.41 samples/sec Loss 1.7890 LearningRate 0.0020 Epoch: 18 Global Step: 194100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:19:29,143-Speed 5968.27 samples/sec Loss 1.7647 LearningRate 0.0020 Epoch: 18 Global Step: 194110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:19:36,043-Speed 5937.28 samples/sec Loss 1.7828 LearningRate 0.0020 Epoch: 18 Global Step: 194120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:19:42,911-Speed 5967.12 samples/sec Loss 1.7733 LearningRate 0.0020 Epoch: 18 Global Step: 194130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:19:49,773-Speed 5969.89 samples/sec Loss 1.7412 LearningRate 0.0020 Epoch: 18 Global Step: 194140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:19:56,640-Speed 5966.59 samples/sec Loss 1.7601 LearningRate 0.0020 Epoch: 18 Global Step: 194150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:20:03,493-Speed 5978.17 samples/sec Loss 1.7621 LearningRate 0.0020 Epoch: 18 Global Step: 194160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:20:10,347-Speed 5976.79 samples/sec Loss 1.7674 LearningRate 0.0020 Epoch: 18 Global Step: 194170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:20:17,214-Speed 5965.81 samples/sec Loss 1.7763 LearningRate 0.0020 Epoch: 18 Global Step: 194180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:20:24,064-Speed 5980.79 samples/sec Loss 1.7542 LearningRate 0.0020 Epoch: 18 Global Step: 194190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:20:30,907-Speed 5986.62 samples/sec Loss 1.7567 LearningRate 0.0020 Epoch: 18 Global Step: 194200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:20:37,822-Speed 5924.62 samples/sec Loss 
1.7646 LearningRate 0.0020 Epoch: 18 Global Step: 194210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:20:44,696-Speed 5960.32 samples/sec Loss 1.7626 LearningRate 0.0020 Epoch: 18 Global Step: 194220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:20:51,602-Speed 5931.88 samples/sec Loss 1.7835 LearningRate 0.0020 Epoch: 18 Global Step: 194230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:20:58,500-Speed 5939.35 samples/sec Loss 1.7559 LearningRate 0.0020 Epoch: 18 Global Step: 194240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:21:05,398-Speed 5939.05 samples/sec Loss 1.7578 LearningRate 0.0020 Epoch: 18 Global Step: 194250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:21:12,304-Speed 5931.95 samples/sec Loss 1.7630 LearningRate 0.0020 Epoch: 18 Global Step: 194260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:21:19,151-Speed 5983.83 samples/sec Loss 1.7817 LearningRate 0.0020 Epoch: 18 Global Step: 194270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:21:26,036-Speed 5950.43 samples/sec Loss 1.7793 LearningRate 0.0020 Epoch: 18 Global Step: 194280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:21:32,889-Speed 5977.49 samples/sec Loss 1.7593 LearningRate 0.0020 Epoch: 18 Global Step: 194290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:21:39,750-Speed 5971.62 samples/sec Loss 1.7594 LearningRate 0.0020 Epoch: 18 Global Step: 194300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:21:46,618-Speed 5967.42 samples/sec Loss 1.7581 LearningRate 0.0020 Epoch: 18 Global Step: 194310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:21:53,478-Speed 5971.34 samples/sec Loss 1.7764 LearningRate 0.0020 Epoch: 18 Global Step: 194320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:22:00,359-Speed 5954.67 samples/sec Loss 1.7723 LearningRate 0.0020 Epoch: 18 Global 
Step: 194330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:22:07,228-Speed 5964.11 samples/sec Loss 1.7702 LearningRate 0.0020 Epoch: 18 Global Step: 194340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:22:14,074-Speed 5983.48 samples/sec Loss 1.7845 LearningRate 0.0019 Epoch: 18 Global Step: 194350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:22:20,929-Speed 5976.21 samples/sec Loss 1.7717 LearningRate 0.0019 Epoch: 18 Global Step: 194360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:22:27,776-Speed 5983.29 samples/sec Loss 1.7716 LearningRate 0.0019 Epoch: 18 Global Step: 194370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:22:34,633-Speed 5974.57 samples/sec Loss 1.7658 LearningRate 0.0019 Epoch: 18 Global Step: 194380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:22:41,480-Speed 5982.96 samples/sec Loss 1.7584 LearningRate 0.0019 Epoch: 18 Global Step: 194390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:22:48,326-Speed 5984.32 samples/sec Loss 1.7823 LearningRate 0.0019 Epoch: 18 Global Step: 194400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:22:55,175-Speed 5981.27 samples/sec Loss 1.7293 LearningRate 0.0019 Epoch: 18 Global Step: 194410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:23:02,033-Speed 5973.64 samples/sec Loss 1.7678 LearningRate 0.0019 Epoch: 18 Global Step: 194420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:23:08,896-Speed 5969.64 samples/sec Loss 1.7385 LearningRate 0.0019 Epoch: 18 Global Step: 194430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:23:15,744-Speed 5982.13 samples/sec Loss 1.7692 LearningRate 0.0019 Epoch: 18 Global Step: 194440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:23:22,578-Speed 5994.45 samples/sec Loss 1.7799 LearningRate 0.0019 Epoch: 18 Global Step: 194450 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 10:23:29,443-Speed 5968.33 samples/sec Loss 1.7399 LearningRate 0.0019 Epoch: 18 Global Step: 194460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:23:36,296-Speed 5977.77 samples/sec Loss 1.7434 LearningRate 0.0019 Epoch: 18 Global Step: 194470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:23:43,179-Speed 5952.24 samples/sec Loss 1.7399 LearningRate 0.0019 Epoch: 18 Global Step: 194480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:23:50,039-Speed 5972.59 samples/sec Loss 1.7338 LearningRate 0.0019 Epoch: 18 Global Step: 194490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:23:56,913-Speed 5959.23 samples/sec Loss 1.7473 LearningRate 0.0019 Epoch: 18 Global Step: 194500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:24:03,760-Speed 5983.95 samples/sec Loss 1.7626 LearningRate 0.0019 Epoch: 18 Global Step: 194510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:24:10,627-Speed 5966.05 samples/sec Loss 1.7466 LearningRate 0.0019 Epoch: 18 Global Step: 194520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:24:17,485-Speed 5974.07 samples/sec Loss 1.7643 LearningRate 0.0019 Epoch: 18 Global Step: 194530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:24:24,335-Speed 5980.45 samples/sec Loss 1.7885 LearningRate 0.0019 Epoch: 18 Global Step: 194540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:24:31,188-Speed 5978.96 samples/sec Loss 1.7330 LearningRate 0.0019 Epoch: 18 Global Step: 194550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:24:38,046-Speed 5973.09 samples/sec Loss 1.7408 LearningRate 0.0019 Epoch: 18 Global Step: 194560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 10:24:44,901-Speed 5976.83 samples/sec Loss 1.7314 LearningRate 0.0019 Epoch: 18 Global Step: 194570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 
10:24:51,750-Speed 5981.92 samples/sec Loss 1.7367 LearningRate 0.0019 Epoch: 18 Global Step: 194580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:24:58,601-Speed 5979.57 samples/sec Loss 1.7476 LearningRate 0.0019 Epoch: 18 Global Step: 194590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:25:05,459-Speed 5973.47 samples/sec Loss 1.7038 LearningRate 0.0019 Epoch: 18 Global Step: 194600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:25:12,443-Speed 5866.68 samples/sec Loss 1.7261 LearningRate 0.0019 Epoch: 18 Global Step: 194610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:25:19,305-Speed 5970.70 samples/sec Loss 1.7274 LearningRate 0.0019 Epoch: 18 Global Step: 194620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 10:25:26,172-Speed 5966.27 samples/sec Loss 1.7742 LearningRate 0.0019 Epoch: 18 Global Step: 194630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:25:33,112-Speed 5904.98 samples/sec Loss 1.7232 LearningRate 0.0019 Epoch: 18 Global Step: 194640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:25:40,065-Speed 5892.42 samples/sec Loss 1.7495 LearningRate 0.0019 Epoch: 18 Global Step: 194650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:25:46,918-Speed 5978.11 samples/sec Loss 1.7664 LearningRate 0.0019 Epoch: 18 Global Step: 194660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:25:53,775-Speed 5977.82 samples/sec Loss 1.7841 LearningRate 0.0019 Epoch: 18 Global Step: 194670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:26:00,632-Speed 5974.18 samples/sec Loss 1.7694 LearningRate 0.0019 Epoch: 18 Global Step: 194680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:26:07,506-Speed 5960.02 samples/sec Loss 1.7407 LearningRate 0.0018 Epoch: 18 Global Step: 194690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:26:14,356-Speed 5980.63 samples/sec Loss 
1.7221 LearningRate 0.0018 Epoch: 18 Global Step: 194700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:26:21,222-Speed 5966.93 samples/sec Loss 1.7469 LearningRate 0.0018 Epoch: 18 Global Step: 194710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:26:28,072-Speed 5980.75 samples/sec Loss 1.7537 LearningRate 0.0018 Epoch: 18 Global Step: 194720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:26:34,930-Speed 5974.77 samples/sec Loss 1.7400 LearningRate 0.0018 Epoch: 18 Global Step: 194730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:26:41,864-Speed 5907.97 samples/sec Loss 1.7681 LearningRate 0.0018 Epoch: 18 Global Step: 194740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:26:48,736-Speed 5961.35 samples/sec Loss 1.7200 LearningRate 0.0018 Epoch: 18 Global Step: 194750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:26:55,586-Speed 5981.42 samples/sec Loss 1.7412 LearningRate 0.0018 Epoch: 18 Global Step: 194760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:27:02,457-Speed 5961.68 samples/sec Loss 1.7615 LearningRate 0.0018 Epoch: 18 Global Step: 194770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:27:09,315-Speed 5974.26 samples/sec Loss 1.7476 LearningRate 0.0018 Epoch: 18 Global Step: 194780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:27:16,160-Speed 5985.36 samples/sec Loss 1.7276 LearningRate 0.0018 Epoch: 18 Global Step: 194790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:27:23,011-Speed 5979.19 samples/sec Loss 1.7462 LearningRate 0.0018 Epoch: 18 Global Step: 194800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:27:29,871-Speed 5972.79 samples/sec Loss 1.7468 LearningRate 0.0018 Epoch: 18 Global Step: 194810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:27:36,719-Speed 5982.37 samples/sec Loss 1.7058 LearningRate 0.0018 Epoch: 18 Global 
Step: 194820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:27:43,588-Speed 5964.10 samples/sec Loss 1.7613 LearningRate 0.0018 Epoch: 18 Global Step: 194830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:27:50,429-Speed 5989.08 samples/sec Loss 1.7428 LearningRate 0.0018 Epoch: 18 Global Step: 194840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:27:57,302-Speed 5960.51 samples/sec Loss 1.7261 LearningRate 0.0018 Epoch: 18 Global Step: 194850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:04,164-Speed 5970.02 samples/sec Loss 1.7654 LearningRate 0.0018 Epoch: 18 Global Step: 194860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:11,044-Speed 5956.31 samples/sec Loss 1.7441 LearningRate 0.0018 Epoch: 18 Global Step: 194870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:17,899-Speed 5976.37 samples/sec Loss 1.7211 LearningRate 0.0018 Epoch: 18 Global Step: 194880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:24,759-Speed 5972.20 samples/sec Loss 1.7770 LearningRate 0.0018 Epoch: 18 Global Step: 194890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:31,662-Speed 5934.92 samples/sec Loss 1.7382 LearningRate 0.0018 Epoch: 18 Global Step: 194900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:38,560-Speed 5939.79 samples/sec Loss 1.7394 LearningRate 0.0018 Epoch: 18 Global Step: 194910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:45,431-Speed 5962.23 samples/sec Loss 1.7137 LearningRate 0.0018 Epoch: 18 Global Step: 194920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:52,268-Speed 5991.70 samples/sec Loss 1.7464 LearningRate 0.0018 Epoch: 18 Global Step: 194930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:28:59,118-Speed 5981.10 samples/sec Loss 1.7214 LearningRate 0.0018 Epoch: 18 Global Step: 194940 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 10:29:05,951-Speed 5994.65 samples/sec Loss 1.7371 LearningRate 0.0018 Epoch: 18 Global Step: 194950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:29:12,814-Speed 5969.21 samples/sec Loss 1.7662 LearningRate 0.0018 Epoch: 18 Global Step: 194960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:29:19,673-Speed 5973.33 samples/sec Loss 1.7393 LearningRate 0.0018 Epoch: 18 Global Step: 194970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:29:26,533-Speed 5971.40 samples/sec Loss 1.7329 LearningRate 0.0018 Epoch: 18 Global Step: 194980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:29:33,386-Speed 5978.44 samples/sec Loss 1.7445 LearningRate 0.0018 Epoch: 18 Global Step: 194990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:29:40,235-Speed 5981.25 samples/sec Loss 1.7025 LearningRate 0.0018 Epoch: 18 Global Step: 195000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:30:06,980-[lfw][195000]XNorm: 23.604671 Training: 2022-01-09 10:30:06,980-[lfw][195000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-01-09 10:30:06,981-[lfw][195000]Accuracy-Highest: 0.99833 Training: 2022-01-09 10:30:38,026-[cfp_fp][195000]XNorm: 21.649061 Training: 2022-01-09 10:30:38,027-[cfp_fp][195000]Accuracy-Flip: 0.99286+-0.00313 Training: 2022-01-09 10:30:38,027-[cfp_fp][195000]Accuracy-Highest: 0.99286 Training: 2022-01-09 10:31:04,928-[agedb_30][195000]XNorm: 23.145942 Training: 2022-01-09 10:31:04,929-[agedb_30][195000]Accuracy-Flip: 0.98233+-0.00583 Training: 2022-01-09 10:31:04,930-[agedb_30][195000]Accuracy-Highest: 0.98233 Training: 2022-01-09 10:31:11,761-Speed 447.53 samples/sec Loss 1.7544 LearningRate 0.0018 Epoch: 18 Global Step: 195010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:31:18,587-Speed 6001.40 samples/sec Loss 1.7183 LearningRate 0.0018 Epoch: 18 Global Step: 195020 Fp16 Grad Scale: 16384 Required: 2 hours 
Training: 2022-01-09 10:31:25,454-Speed 5965.80 samples/sec Loss 1.7370 LearningRate 0.0018 Epoch: 18 Global Step: 195030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:31:32,299-Speed 5985.02 samples/sec Loss 1.7591 LearningRate 0.0017 Epoch: 18 Global Step: 195040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 10:31:39,157-Speed 5976.46 samples/sec Loss 1.7161 LearningRate 0.0017 Epoch: 18 Global Step: 195050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:31:46,034-Speed 5957.56 samples/sec Loss 1.7307 LearningRate 0.0017 Epoch: 18 Global Step: 195060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:31:52,906-Speed 5962.11 samples/sec Loss 1.7245 LearningRate 0.0017 Epoch: 18 Global Step: 195070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:31:59,772-Speed 5966.14 samples/sec Loss 1.7288 LearningRate 0.0017 Epoch: 18 Global Step: 195080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:32:06,643-Speed 5963.34 samples/sec Loss 1.7354 LearningRate 0.0017 Epoch: 18 Global Step: 195090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:32:13,497-Speed 5977.28 samples/sec Loss 1.7238 LearningRate 0.0017 Epoch: 18 Global Step: 195100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:32:20,363-Speed 5966.90 samples/sec Loss 1.7420 LearningRate 0.0017 Epoch: 18 Global Step: 195110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:32:27,237-Speed 5959.98 samples/sec Loss 1.7292 LearningRate 0.0017 Epoch: 18 Global Step: 195120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:32:34,099-Speed 5970.12 samples/sec Loss 1.7216 LearningRate 0.0017 Epoch: 18 Global Step: 195130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:32:40,961-Speed 5969.98 samples/sec Loss 1.7074 LearningRate 0.0017 Epoch: 18 Global Step: 195140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:32:47,815-Speed 
5977.75 samples/sec Loss 1.7156 LearningRate 0.0017 Epoch: 18 Global Step: 195150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:32:54,664-Speed 5981.67 samples/sec Loss 1.7159 LearningRate 0.0017 Epoch: 18 Global Step: 195160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:33:01,513-Speed 5981.69 samples/sec Loss 1.7444 LearningRate 0.0017 Epoch: 18 Global Step: 195170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:33:08,371-Speed 5973.39 samples/sec Loss 1.7191 LearningRate 0.0017 Epoch: 18 Global Step: 195180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:33:15,240-Speed 5964.54 samples/sec Loss 1.7140 LearningRate 0.0017 Epoch: 18 Global Step: 195190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:33:22,096-Speed 5975.24 samples/sec Loss 1.7133 LearningRate 0.0017 Epoch: 18 Global Step: 195200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:33:28,947-Speed 5980.90 samples/sec Loss 1.7083 LearningRate 0.0017 Epoch: 18 Global Step: 195210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:33:35,785-Speed 5990.90 samples/sec Loss 1.6837 LearningRate 0.0017 Epoch: 18 Global Step: 195220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:33:42,624-Speed 5990.12 samples/sec Loss 1.7575 LearningRate 0.0017 Epoch: 18 Global Step: 195230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:33:49,482-Speed 5973.41 samples/sec Loss 1.7172 LearningRate 0.0017 Epoch: 18 Global Step: 195240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:33:56,351-Speed 5965.18 samples/sec Loss 1.7695 LearningRate 0.0017 Epoch: 18 Global Step: 195250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:03,193-Speed 5987.97 samples/sec Loss 1.7496 LearningRate 0.0017 Epoch: 18 Global Step: 195260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:10,044-Speed 5979.97 samples/sec Loss 1.7412 
LearningRate 0.0017 Epoch: 18 Global Step: 195270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:16,887-Speed 5986.89 samples/sec Loss 1.7342 LearningRate 0.0017 Epoch: 18 Global Step: 195280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:23,724-Speed 5991.79 samples/sec Loss 1.7104 LearningRate 0.0017 Epoch: 18 Global Step: 195290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:30,577-Speed 5978.13 samples/sec Loss 1.7188 LearningRate 0.0017 Epoch: 18 Global Step: 195300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:37,459-Speed 5954.14 samples/sec Loss 1.7096 LearningRate 0.0017 Epoch: 18 Global Step: 195310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:44,326-Speed 5965.56 samples/sec Loss 1.7102 LearningRate 0.0017 Epoch: 18 Global Step: 195320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:51,176-Speed 5983.65 samples/sec Loss 1.6973 LearningRate 0.0017 Epoch: 18 Global Step: 195330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:34:58,035-Speed 5972.27 samples/sec Loss 1.7199 LearningRate 0.0017 Epoch: 18 Global Step: 195340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:35:04,892-Speed 5974.89 samples/sec Loss 1.7149 LearningRate 0.0017 Epoch: 18 Global Step: 195350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:35:11,738-Speed 5984.59 samples/sec Loss 1.7409 LearningRate 0.0017 Epoch: 18 Global Step: 195360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:35:18,585-Speed 5983.18 samples/sec Loss 1.7160 LearningRate 0.0017 Epoch: 18 Global Step: 195370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:35:25,420-Speed 5993.85 samples/sec Loss 1.7131 LearningRate 0.0017 Epoch: 18 Global Step: 195380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:35:32,254-Speed 5995.17 samples/sec Loss 1.7335 LearningRate 0.0017 Epoch: 18 Global Step: 
195390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:35:39,087-Speed 5995.28 samples/sec Loss 1.7317 LearningRate 0.0016 Epoch: 18 Global Step: 195400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:35:45,952-Speed 5967.33 samples/sec Loss 1.7064 LearningRate 0.0016 Epoch: 18 Global Step: 195410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:35:52,838-Speed 5949.67 samples/sec Loss 1.7099 LearningRate 0.0016 Epoch: 18 Global Step: 195420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:36:05,519-Speed 3230.33 samples/sec Loss 1.7341 LearningRate 0.0016 Epoch: 18 Global Step: 195430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:36:12,363-Speed 5987.63 samples/sec Loss 1.7249 LearningRate 0.0016 Epoch: 18 Global Step: 195440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:36:19,194-Speed 5996.40 samples/sec Loss 1.7054 LearningRate 0.0016 Epoch: 18 Global Step: 195450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:36:26,068-Speed 5960.53 samples/sec Loss 1.6874 LearningRate 0.0016 Epoch: 18 Global Step: 195460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:36:32,927-Speed 5972.56 samples/sec Loss 1.7170 LearningRate 0.0016 Epoch: 18 Global Step: 195470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:36:39,764-Speed 5991.72 samples/sec Loss 1.7128 LearningRate 0.0016 Epoch: 18 Global Step: 195480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:36:46,616-Speed 5979.47 samples/sec Loss 1.7087 LearningRate 0.0016 Epoch: 18 Global Step: 195490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:36:53,468-Speed 5978.88 samples/sec Loss 1.6823 LearningRate 0.0016 Epoch: 18 Global Step: 195500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:37:00,352-Speed 5951.45 samples/sec Loss 1.6892 LearningRate 0.0016 Epoch: 18 Global Step: 195510 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-01-09 10:37:07,210-Speed 5973.88 samples/sec Loss 1.7056 LearningRate 0.0016 Epoch: 18 Global Step: 195520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:37:14,107-Speed 5940.45 samples/sec Loss 1.6998 LearningRate 0.0016 Epoch: 18 Global Step: 195530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:37:20,956-Speed 5981.18 samples/sec Loss 1.7038 LearningRate 0.0016 Epoch: 18 Global Step: 195540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:37:27,823-Speed 5965.91 samples/sec Loss 1.7508 LearningRate 0.0016 Epoch: 18 Global Step: 195550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:37:34,673-Speed 5981.26 samples/sec Loss 1.6877 LearningRate 0.0016 Epoch: 18 Global Step: 195560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:37:41,544-Speed 5962.25 samples/sec Loss 1.7134 LearningRate 0.0016 Epoch: 18 Global Step: 195570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:37:48,479-Speed 5907.26 samples/sec Loss 1.7025 LearningRate 0.0016 Epoch: 18 Global Step: 195580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:37:55,335-Speed 5976.37 samples/sec Loss 1.7447 LearningRate 0.0016 Epoch: 18 Global Step: 195590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:38:02,223-Speed 5947.32 samples/sec Loss 1.6777 LearningRate 0.0016 Epoch: 18 Global Step: 195600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:38:09,097-Speed 5959.96 samples/sec Loss 1.7076 LearningRate 0.0016 Epoch: 18 Global Step: 195610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:38:15,976-Speed 5955.83 samples/sec Loss 1.7021 LearningRate 0.0016 Epoch: 18 Global Step: 195620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:38:22,827-Speed 5979.47 samples/sec Loss 1.6898 LearningRate 0.0016 Epoch: 18 Global Step: 195630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
10:38:29,676-Speed 5981.62 samples/sec Loss 1.6844 LearningRate 0.0016 Epoch: 18 Global Step: 195640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:38:36,546-Speed 5963.01 samples/sec Loss 1.6869 LearningRate 0.0016 Epoch: 18 Global Step: 195650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:38:43,415-Speed 5964.14 samples/sec Loss 1.7270 LearningRate 0.0016 Epoch: 18 Global Step: 195660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:38:50,261-Speed 5984.81 samples/sec Loss 1.7157 LearningRate 0.0016 Epoch: 18 Global Step: 195670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:38:57,202-Speed 5902.02 samples/sec Loss 1.6719 LearningRate 0.0016 Epoch: 18 Global Step: 195680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:39:04,058-Speed 5975.23 samples/sec Loss 1.6935 LearningRate 0.0016 Epoch: 18 Global Step: 195690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:39:11,015-Speed 5889.26 samples/sec Loss 1.6966 LearningRate 0.0016 Epoch: 18 Global Step: 195700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:39:17,866-Speed 5979.63 samples/sec Loss 1.6939 LearningRate 0.0016 Epoch: 18 Global Step: 195710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:39:24,748-Speed 5952.37 samples/sec Loss 1.6817 LearningRate 0.0016 Epoch: 18 Global Step: 195720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:39:31,603-Speed 5976.28 samples/sec Loss 1.7125 LearningRate 0.0016 Epoch: 18 Global Step: 195730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:39:38,481-Speed 5956.81 samples/sec Loss 1.7097 LearningRate 0.0016 Epoch: 18 Global Step: 195740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:39:45,344-Speed 5969.22 samples/sec Loss 1.7239 LearningRate 0.0016 Epoch: 18 Global Step: 195750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:39:52,211-Speed 5966.23 samples/sec Loss 
1.7072 LearningRate 0.0016 Epoch: 18 Global Step: 195760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:39:59,083-Speed 5961.07 samples/sec Loss 1.6745 LearningRate 0.0015 Epoch: 18 Global Step: 195770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:40:05,957-Speed 5960.22 samples/sec Loss 1.6872 LearningRate 0.0015 Epoch: 18 Global Step: 195780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:40:12,827-Speed 5963.71 samples/sec Loss 1.7020 LearningRate 0.0015 Epoch: 18 Global Step: 195790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:40:19,679-Speed 5978.83 samples/sec Loss 1.7194 LearningRate 0.0015 Epoch: 18 Global Step: 195800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:40:26,535-Speed 5975.05 samples/sec Loss 1.6713 LearningRate 0.0015 Epoch: 18 Global Step: 195810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:40:33,389-Speed 5977.83 samples/sec Loss 1.7442 LearningRate 0.0015 Epoch: 18 Global Step: 195820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:40:40,269-Speed 5954.39 samples/sec Loss 1.7045 LearningRate 0.0015 Epoch: 18 Global Step: 195830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:40:47,122-Speed 5977.68 samples/sec Loss 1.7158 LearningRate 0.0015 Epoch: 18 Global Step: 195840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:40:53,979-Speed 5974.35 samples/sec Loss 1.7337 LearningRate 0.0015 Epoch: 18 Global Step: 195850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:41:00,826-Speed 5984.06 samples/sec Loss 1.6786 LearningRate 0.0015 Epoch: 18 Global Step: 195860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:41:07,717-Speed 5944.79 samples/sec Loss 1.7079 LearningRate 0.0015 Epoch: 18 Global Step: 195870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:41:14,613-Speed 5940.94 samples/sec Loss 1.7059 LearningRate 0.0015 Epoch: 18 Global 
Step: 195880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:41:21,467-Speed 5976.40 samples/sec Loss 1.7095 LearningRate 0.0015 Epoch: 18 Global Step: 195890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:41:28,350-Speed 5952.16 samples/sec Loss 1.7063 LearningRate 0.0015 Epoch: 18 Global Step: 195900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:41:35,228-Speed 5956.55 samples/sec Loss 1.6683 LearningRate 0.0015 Epoch: 18 Global Step: 195910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:41:42,084-Speed 5975.63 samples/sec Loss 1.6920 LearningRate 0.0015 Epoch: 18 Global Step: 195920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:41:48,958-Speed 5960.21 samples/sec Loss 1.7375 LearningRate 0.0015 Epoch: 18 Global Step: 195930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:41:55,813-Speed 5975.95 samples/sec Loss 1.6900 LearningRate 0.0015 Epoch: 18 Global Step: 195940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:42:02,690-Speed 5957.42 samples/sec Loss 1.6653 LearningRate 0.0015 Epoch: 18 Global Step: 195950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:42:09,554-Speed 5969.30 samples/sec Loss 1.6878 LearningRate 0.0015 Epoch: 18 Global Step: 195960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:42:16,419-Speed 5967.28 samples/sec Loss 1.7122 LearningRate 0.0015 Epoch: 18 Global Step: 195970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:42:23,287-Speed 5965.25 samples/sec Loss 1.7153 LearningRate 0.0015 Epoch: 18 Global Step: 195980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:42:30,165-Speed 5956.27 samples/sec Loss 1.6809 LearningRate 0.0015 Epoch: 18 Global Step: 195990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:42:37,041-Speed 5958.23 samples/sec Loss 1.6897 LearningRate 0.0015 Epoch: 18 Global Step: 196000 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 10:42:43,912-Speed 5962.42 samples/sec Loss 1.7097 LearningRate 0.0015 Epoch: 18 Global Step: 196010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:42:50,780-Speed 5965.01 samples/sec Loss 1.6897 LearningRate 0.0015 Epoch: 18 Global Step: 196020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:42:57,636-Speed 5974.97 samples/sec Loss 1.6842 LearningRate 0.0015 Epoch: 18 Global Step: 196030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:43:04,488-Speed 5979.29 samples/sec Loss 1.6844 LearningRate 0.0015 Epoch: 18 Global Step: 196040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:43:11,373-Speed 5950.70 samples/sec Loss 1.7073 LearningRate 0.0015 Epoch: 18 Global Step: 196050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:43:18,235-Speed 5969.98 samples/sec Loss 1.7014 LearningRate 0.0015 Epoch: 18 Global Step: 196060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:43:25,092-Speed 5974.61 samples/sec Loss 1.7043 LearningRate 0.0015 Epoch: 18 Global Step: 196070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:43:31,950-Speed 5974.28 samples/sec Loss 1.6815 LearningRate 0.0015 Epoch: 18 Global Step: 196080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:43:38,807-Speed 5974.72 samples/sec Loss 1.7081 LearningRate 0.0015 Epoch: 18 Global Step: 196090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:43:45,667-Speed 5972.75 samples/sec Loss 1.6953 LearningRate 0.0015 Epoch: 18 Global Step: 196100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:43:52,524-Speed 5974.49 samples/sec Loss 1.6974 LearningRate 0.0015 Epoch: 18 Global Step: 196110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:43:59,379-Speed 5976.34 samples/sec Loss 1.6911 LearningRate 0.0015 Epoch: 18 Global Step: 196120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
10:44:06,234-Speed 5976.19 samples/sec Loss 1.6691 LearningRate 0.0015 Epoch: 18 Global Step: 196130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:44:13,155-Speed 5919.59 samples/sec Loss 1.6498 LearningRate 0.0015 Epoch: 18 Global Step: 196140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:44:20,067-Speed 5927.43 samples/sec Loss 1.7171 LearningRate 0.0014 Epoch: 18 Global Step: 196150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:44:26,920-Speed 5977.62 samples/sec Loss 1.6733 LearningRate 0.0014 Epoch: 18 Global Step: 196160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:44:33,783-Speed 5968.98 samples/sec Loss 1.7031 LearningRate 0.0014 Epoch: 18 Global Step: 196170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:44:40,642-Speed 5973.49 samples/sec Loss 1.7103 LearningRate 0.0014 Epoch: 18 Global Step: 196180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:44:47,503-Speed 5970.75 samples/sec Loss 1.6949 LearningRate 0.0014 Epoch: 18 Global Step: 196190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:44:54,380-Speed 5957.66 samples/sec Loss 1.6804 LearningRate 0.0014 Epoch: 18 Global Step: 196200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:45:01,254-Speed 5959.40 samples/sec Loss 1.7076 LearningRate 0.0014 Epoch: 18 Global Step: 196210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:45:08,124-Speed 5963.33 samples/sec Loss 1.6869 LearningRate 0.0014 Epoch: 18 Global Step: 196220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:45:15,008-Speed 5951.38 samples/sec Loss 1.6718 LearningRate 0.0014 Epoch: 18 Global Step: 196230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:45:21,884-Speed 5958.54 samples/sec Loss 1.7132 LearningRate 0.0014 Epoch: 18 Global Step: 196240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:45:28,747-Speed 5968.56 samples/sec Loss 
1.6746 LearningRate 0.0014 Epoch: 18 Global Step: 196250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:45:35,621-Speed 5960.29 samples/sec Loss 1.7088 LearningRate 0.0014 Epoch: 18 Global Step: 196260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:45:42,486-Speed 5967.99 samples/sec Loss 1.6844 LearningRate 0.0014 Epoch: 18 Global Step: 196270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:45:49,349-Speed 5968.91 samples/sec Loss 1.7075 LearningRate 0.0014 Epoch: 18 Global Step: 196280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:45:56,214-Speed 5968.17 samples/sec Loss 1.7055 LearningRate 0.0014 Epoch: 18 Global Step: 196290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:46:03,071-Speed 5974.26 samples/sec Loss 1.6624 LearningRate 0.0014 Epoch: 18 Global Step: 196300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:46:09,923-Speed 5979.40 samples/sec Loss 1.6900 LearningRate 0.0014 Epoch: 18 Global Step: 196310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:46:16,782-Speed 5972.18 samples/sec Loss 1.6779 LearningRate 0.0014 Epoch: 18 Global Step: 196320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:46:23,650-Speed 5969.13 samples/sec Loss 1.6666 LearningRate 0.0014 Epoch: 18 Global Step: 196330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:46:30,528-Speed 5956.28 samples/sec Loss 1.6806 LearningRate 0.0014 Epoch: 18 Global Step: 196340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:46:37,378-Speed 5979.90 samples/sec Loss 1.6754 LearningRate 0.0014 Epoch: 18 Global Step: 196350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:46:44,239-Speed 5971.53 samples/sec Loss 1.6923 LearningRate 0.0014 Epoch: 18 Global Step: 196360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:46:51,119-Speed 5955.22 samples/sec Loss 1.6986 LearningRate 0.0014 Epoch: 18 Global 
Step: 196370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:46:57,993-Speed 5959.10 samples/sec Loss 1.7057 LearningRate 0.0014 Epoch: 18 Global Step: 196380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:47:04,888-Speed 5942.19 samples/sec Loss 1.6580 LearningRate 0.0014 Epoch: 18 Global Step: 196390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:47:11,770-Speed 5952.76 samples/sec Loss 1.6517 LearningRate 0.0014 Epoch: 18 Global Step: 196400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:47:18,634-Speed 5968.50 samples/sec Loss 1.6372 LearningRate 0.0014 Epoch: 18 Global Step: 196410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:47:25,494-Speed 5972.31 samples/sec Loss 1.7040 LearningRate 0.0014 Epoch: 18 Global Step: 196420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:47:32,351-Speed 5973.78 samples/sec Loss 1.6886 LearningRate 0.0014 Epoch: 18 Global Step: 196430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:47:39,220-Speed 5964.61 samples/sec Loss 1.6997 LearningRate 0.0014 Epoch: 18 Global Step: 196440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:47:46,088-Speed 5964.73 samples/sec Loss 1.6828 LearningRate 0.0014 Epoch: 18 Global Step: 196450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:47:52,958-Speed 5963.71 samples/sec Loss 1.6812 LearningRate 0.0014 Epoch: 18 Global Step: 196460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:47:59,839-Speed 5954.15 samples/sec Loss 1.6826 LearningRate 0.0014 Epoch: 18 Global Step: 196470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:48:06,682-Speed 5986.65 samples/sec Loss 1.6810 LearningRate 0.0014 Epoch: 18 Global Step: 196480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:48:13,554-Speed 5961.27 samples/sec Loss 1.6864 LearningRate 0.0014 Epoch: 18 Global Step: 196490 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 10:48:20,410-Speed 5976.04 samples/sec Loss 1.6827 LearningRate 0.0014 Epoch: 18 Global Step: 196500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:48:27,272-Speed 5970.15 samples/sec Loss 1.6778 LearningRate 0.0014 Epoch: 18 Global Step: 196510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:48:34,128-Speed 5974.91 samples/sec Loss 1.6756 LearningRate 0.0014 Epoch: 18 Global Step: 196520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:48:41,012-Speed 5951.48 samples/sec Loss 1.6641 LearningRate 0.0014 Epoch: 18 Global Step: 196530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:48:47,882-Speed 5964.01 samples/sec Loss 1.6851 LearningRate 0.0013 Epoch: 18 Global Step: 196540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:48:54,733-Speed 5979.63 samples/sec Loss 1.6724 LearningRate 0.0013 Epoch: 18 Global Step: 196550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:49:01,579-Speed 5983.71 samples/sec Loss 1.6883 LearningRate 0.0013 Epoch: 18 Global Step: 196560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:49:08,436-Speed 5974.20 samples/sec Loss 1.6636 LearningRate 0.0013 Epoch: 18 Global Step: 196570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:49:15,299-Speed 5969.79 samples/sec Loss 1.6708 LearningRate 0.0013 Epoch: 18 Global Step: 196580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:49:22,144-Speed 5985.57 samples/sec Loss 1.6675 LearningRate 0.0013 Epoch: 18 Global Step: 196590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:49:29,005-Speed 5970.96 samples/sec Loss 1.6695 LearningRate 0.0013 Epoch: 18 Global Step: 196600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:49:35,856-Speed 5979.80 samples/sec Loss 1.6403 LearningRate 0.0013 Epoch: 18 Global Step: 196610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
10:49:42,711-Speed 5976.91 samples/sec Loss 1.6767 LearningRate 0.0013 Epoch: 18 Global Step: 196620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:49:49,575-Speed 5968.22 samples/sec Loss 1.6836 LearningRate 0.0013 Epoch: 18 Global Step: 196630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:49:56,425-Speed 5980.31 samples/sec Loss 1.6820 LearningRate 0.0013 Epoch: 18 Global Step: 196640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:03,274-Speed 5982.33 samples/sec Loss 1.7011 LearningRate 0.0013 Epoch: 18 Global Step: 196650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:10,148-Speed 5959.50 samples/sec Loss 1.6543 LearningRate 0.0013 Epoch: 18 Global Step: 196660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:17,018-Speed 5963.54 samples/sec Loss 1.6870 LearningRate 0.0013 Epoch: 18 Global Step: 196670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:23,935-Speed 5923.66 samples/sec Loss 1.6949 LearningRate 0.0013 Epoch: 18 Global Step: 196680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:30,788-Speed 5980.86 samples/sec Loss 1.6690 LearningRate 0.0013 Epoch: 18 Global Step: 196690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:37,647-Speed 5972.61 samples/sec Loss 1.6508 LearningRate 0.0013 Epoch: 18 Global Step: 196700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:44,499-Speed 5979.86 samples/sec Loss 1.6952 LearningRate 0.0013 Epoch: 18 Global Step: 196710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:51,341-Speed 5986.75 samples/sec Loss 1.6986 LearningRate 0.0013 Epoch: 18 Global Step: 196720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:50:58,207-Speed 5967.10 samples/sec Loss 1.6855 LearningRate 0.0013 Epoch: 18 Global Step: 196730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:51:05,067-Speed 5972.23 samples/sec Loss 
1.6619 LearningRate 0.0013 Epoch: 18 Global Step: 196740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:51:12,018-Speed 5893.46 samples/sec Loss 1.6710 LearningRate 0.0013 Epoch: 18 Global Step: 196750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:51:18,866-Speed 5982.48 samples/sec Loss 1.6756 LearningRate 0.0013 Epoch: 18 Global Step: 196760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:51:25,753-Speed 5951.65 samples/sec Loss 1.6881 LearningRate 0.0013 Epoch: 18 Global Step: 196770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:51:32,610-Speed 5974.54 samples/sec Loss 1.6740 LearningRate 0.0013 Epoch: 18 Global Step: 196780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:51:39,470-Speed 5971.90 samples/sec Loss 1.6464 LearningRate 0.0013 Epoch: 18 Global Step: 196790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:51:46,317-Speed 5983.52 samples/sec Loss 1.6416 LearningRate 0.0013 Epoch: 18 Global Step: 196800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:51:53,162-Speed 5985.84 samples/sec Loss 1.6949 LearningRate 0.0013 Epoch: 18 Global Step: 196810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:00,018-Speed 5975.16 samples/sec Loss 1.6878 LearningRate 0.0013 Epoch: 18 Global Step: 196820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:06,891-Speed 5960.74 samples/sec Loss 1.6578 LearningRate 0.0013 Epoch: 18 Global Step: 196830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:13,739-Speed 5982.60 samples/sec Loss 1.6684 LearningRate 0.0013 Epoch: 18 Global Step: 196840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:20,606-Speed 5966.38 samples/sec Loss 1.6891 LearningRate 0.0013 Epoch: 18 Global Step: 196850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:27,477-Speed 5962.12 samples/sec Loss 1.6828 LearningRate 0.0013 Epoch: 18 Global 
Step: 196860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:34,347-Speed 5963.77 samples/sec Loss 1.6570 LearningRate 0.0013 Epoch: 18 Global Step: 196870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:41,201-Speed 5976.94 samples/sec Loss 1.6649 LearningRate 0.0013 Epoch: 18 Global Step: 196880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:48,061-Speed 5972.48 samples/sec Loss 1.6776 LearningRate 0.0013 Epoch: 18 Global Step: 196890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:52:54,909-Speed 5981.80 samples/sec Loss 1.6634 LearningRate 0.0013 Epoch: 18 Global Step: 196900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:53:01,766-Speed 5975.03 samples/sec Loss 1.6733 LearningRate 0.0013 Epoch: 18 Global Step: 196910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:53:08,652-Speed 5949.76 samples/sec Loss 1.6719 LearningRate 0.0013 Epoch: 18 Global Step: 196920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:53:15,511-Speed 5972.70 samples/sec Loss 1.6668 LearningRate 0.0013 Epoch: 18 Global Step: 196930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:53:22,360-Speed 5981.69 samples/sec Loss 1.6895 LearningRate 0.0013 Epoch: 18 Global Step: 196940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:53:29,210-Speed 5980.18 samples/sec Loss 1.6773 LearningRate 0.0012 Epoch: 18 Global Step: 196950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:53:36,057-Speed 5986.63 samples/sec Loss 1.6850 LearningRate 0.0012 Epoch: 18 Global Step: 196960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:53:42,909-Speed 5978.48 samples/sec Loss 1.6660 LearningRate 0.0012 Epoch: 18 Global Step: 196970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:53:49,773-Speed 5968.42 samples/sec Loss 1.6837 LearningRate 0.0012 Epoch: 18 Global Step: 196980 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 10:53:56,622-Speed 5981.37 samples/sec Loss 1.6763 LearningRate 0.0012 Epoch: 18 Global Step: 196990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:54:03,468-Speed 5984.05 samples/sec Loss 1.6805 LearningRate 0.0012 Epoch: 18 Global Step: 197000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:54:10,311-Speed 5987.24 samples/sec Loss 1.6854 LearningRate 0.0012 Epoch: 18 Global Step: 197010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:54:17,169-Speed 5973.54 samples/sec Loss 1.6723 LearningRate 0.0012 Epoch: 18 Global Step: 197020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:54:40,395-Speed 1763.62 samples/sec Loss 1.6555 LearningRate 0.0012 Epoch: 19 Global Step: 197030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:54:47,215-Speed 6008.94 samples/sec Loss 1.6754 LearningRate 0.0012 Epoch: 19 Global Step: 197040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:54:54,059-Speed 5985.95 samples/sec Loss 1.6617 LearningRate 0.0012 Epoch: 19 Global Step: 197050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:55:00,909-Speed 5980.89 samples/sec Loss 1.6777 LearningRate 0.0012 Epoch: 19 Global Step: 197060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:55:07,744-Speed 5993.81 samples/sec Loss 1.6860 LearningRate 0.0012 Epoch: 19 Global Step: 197070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:55:14,585-Speed 5988.88 samples/sec Loss 1.6748 LearningRate 0.0012 Epoch: 19 Global Step: 197080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:55:21,480-Speed 5953.31 samples/sec Loss 1.6620 LearningRate 0.0012 Epoch: 19 Global Step: 197090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:55:28,314-Speed 5994.97 samples/sec Loss 1.6439 LearningRate 0.0012 Epoch: 19 Global Step: 197100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
10:55:35,149-Speed 5993.48 samples/sec Loss 1.6449 LearningRate 0.0012 Epoch: 19 Global Step: 197110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:55:42,016-Speed 5966.42 samples/sec Loss 1.6692 LearningRate 0.0012 Epoch: 19 Global Step: 197120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:55:48,874-Speed 5974.01 samples/sec Loss 1.6711 LearningRate 0.0012 Epoch: 19 Global Step: 197130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:55:55,743-Speed 5963.83 samples/sec Loss 1.6286 LearningRate 0.0012 Epoch: 19 Global Step: 197140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:56:02,633-Speed 5946.34 samples/sec Loss 1.6375 LearningRate 0.0012 Epoch: 19 Global Step: 197150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:56:09,494-Speed 5971.21 samples/sec Loss 1.6625 LearningRate 0.0012 Epoch: 19 Global Step: 197160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:56:16,354-Speed 5972.64 samples/sec Loss 1.6626 LearningRate 0.0012 Epoch: 19 Global Step: 197170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:56:23,219-Speed 5967.17 samples/sec Loss 1.6596 LearningRate 0.0012 Epoch: 19 Global Step: 197180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:56:30,175-Speed 5891.34 samples/sec Loss 1.6279 LearningRate 0.0012 Epoch: 19 Global Step: 197190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:56:37,054-Speed 5955.71 samples/sec Loss 1.6381 LearningRate 0.0012 Epoch: 19 Global Step: 197200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:56:43,938-Speed 5951.01 samples/sec Loss 1.6318 LearningRate 0.0012 Epoch: 19 Global Step: 197210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:56:50,787-Speed 5981.68 samples/sec Loss 1.6446 LearningRate 0.0012 Epoch: 19 Global Step: 197220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:56:57,656-Speed 5964.34 samples/sec Loss 
1.6106 LearningRate 0.0012 Epoch: 19 Global Step: 197230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:04,512-Speed 5975.59 samples/sec Loss 1.6629 LearningRate 0.0012 Epoch: 19 Global Step: 197240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:11,371-Speed 5972.67 samples/sec Loss 1.6403 LearningRate 0.0012 Epoch: 19 Global Step: 197250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:18,219-Speed 5985.18 samples/sec Loss 1.6476 LearningRate 0.0012 Epoch: 19 Global Step: 197260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:25,065-Speed 5984.41 samples/sec Loss 1.6260 LearningRate 0.0012 Epoch: 19 Global Step: 197270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:31,960-Speed 5941.65 samples/sec Loss 1.6383 LearningRate 0.0012 Epoch: 19 Global Step: 197280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:38,814-Speed 5977.51 samples/sec Loss 1.6413 LearningRate 0.0012 Epoch: 19 Global Step: 197290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:45,665-Speed 5978.94 samples/sec Loss 1.6378 LearningRate 0.0012 Epoch: 19 Global Step: 197300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:52,522-Speed 5974.96 samples/sec Loss 1.6518 LearningRate 0.0012 Epoch: 19 Global Step: 197310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:57:59,370-Speed 5982.45 samples/sec Loss 1.6792 LearningRate 0.0012 Epoch: 19 Global Step: 197320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:58:06,225-Speed 5976.26 samples/sec Loss 1.6328 LearningRate 0.0012 Epoch: 19 Global Step: 197330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:58:13,134-Speed 5929.31 samples/sec Loss 1.6555 LearningRate 0.0012 Epoch: 19 Global Step: 197340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:58:20,001-Speed 5965.97 samples/sec Loss 1.6415 LearningRate 0.0012 Epoch: 19 Global 
Step: 197350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:58:26,857-Speed 5976.22 samples/sec Loss 1.6878 LearningRate 0.0012 Epoch: 19 Global Step: 197360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:58:33,724-Speed 5966.65 samples/sec Loss 1.6525 LearningRate 0.0012 Epoch: 19 Global Step: 197370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:58:40,579-Speed 5976.23 samples/sec Loss 1.6505 LearningRate 0.0011 Epoch: 19 Global Step: 197380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:58:47,447-Speed 5964.89 samples/sec Loss 1.6568 LearningRate 0.0011 Epoch: 19 Global Step: 197390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 10:58:54,316-Speed 5964.84 samples/sec Loss 1.6495 LearningRate 0.0011 Epoch: 19 Global Step: 197400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:59:01,203-Speed 5948.88 samples/sec Loss 1.6294 LearningRate 0.0011 Epoch: 19 Global Step: 197410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:59:08,080-Speed 5956.77 samples/sec Loss 1.6652 LearningRate 0.0011 Epoch: 19 Global Step: 197420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:59:14,919-Speed 5990.92 samples/sec Loss 1.6386 LearningRate 0.0011 Epoch: 19 Global Step: 197430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:59:21,763-Speed 5985.82 samples/sec Loss 1.6645 LearningRate 0.0011 Epoch: 19 Global Step: 197440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:59:28,644-Speed 5953.73 samples/sec Loss 1.6629 LearningRate 0.0011 Epoch: 19 Global Step: 197450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:59:35,503-Speed 5972.18 samples/sec Loss 1.6520 LearningRate 0.0011 Epoch: 19 Global Step: 197460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:59:42,366-Speed 5969.21 samples/sec Loss 1.6514 LearningRate 0.0011 Epoch: 19 Global Step: 197470 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 10:59:49,214-Speed 5981.97 samples/sec Loss 1.6347 LearningRate 0.0011 Epoch: 19 Global Step: 197480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 10:59:56,071-Speed 5975.50 samples/sec Loss 1.6581 LearningRate 0.0011 Epoch: 19 Global Step: 197490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:00:02,927-Speed 5974.71 samples/sec Loss 1.6695 LearningRate 0.0011 Epoch: 19 Global Step: 197500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:00:09,812-Speed 5949.96 samples/sec Loss 1.6474 LearningRate 0.0011 Epoch: 19 Global Step: 197510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:00:16,670-Speed 5974.23 samples/sec Loss 1.6485 LearningRate 0.0011 Epoch: 19 Global Step: 197520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:00:23,527-Speed 5975.10 samples/sec Loss 1.6319 LearningRate 0.0011 Epoch: 19 Global Step: 197530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:00:30,389-Speed 5969.90 samples/sec Loss 1.6199 LearningRate 0.0011 Epoch: 19 Global Step: 197540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:00:37,237-Speed 5982.78 samples/sec Loss 1.6115 LearningRate 0.0011 Epoch: 19 Global Step: 197550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:00:44,082-Speed 5984.89 samples/sec Loss 1.6421 LearningRate 0.0011 Epoch: 19 Global Step: 197560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:00:50,943-Speed 5971.56 samples/sec Loss 1.6374 LearningRate 0.0011 Epoch: 19 Global Step: 197570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:00:57,819-Speed 5958.33 samples/sec Loss 1.6439 LearningRate 0.0011 Epoch: 19 Global Step: 197580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:01:04,669-Speed 5980.29 samples/sec Loss 1.6757 LearningRate 0.0011 Epoch: 19 Global Step: 197590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
11:01:11,545-Speed 5958.17 samples/sec Loss 1.6711 LearningRate 0.0011 Epoch: 19 Global Step: 197600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:01:18,421-Speed 5958.29 samples/sec Loss 1.6437 LearningRate 0.0011 Epoch: 19 Global Step: 197610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:01:25,296-Speed 5959.45 samples/sec Loss 1.6436 LearningRate 0.0011 Epoch: 19 Global Step: 197620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:01:32,150-Speed 5977.03 samples/sec Loss 1.6614 LearningRate 0.0011 Epoch: 19 Global Step: 197630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:01:39,036-Speed 5949.14 samples/sec Loss 1.6468 LearningRate 0.0011 Epoch: 19 Global Step: 197640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:01:45,885-Speed 5982.11 samples/sec Loss 1.6203 LearningRate 0.0011 Epoch: 19 Global Step: 197650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:01:52,760-Speed 5958.68 samples/sec Loss 1.6262 LearningRate 0.0011 Epoch: 19 Global Step: 197660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:01:59,622-Speed 5970.81 samples/sec Loss 1.6491 LearningRate 0.0011 Epoch: 19 Global Step: 197670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:02:06,516-Speed 5942.63 samples/sec Loss 1.6188 LearningRate 0.0011 Epoch: 19 Global Step: 197680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:02:13,367-Speed 5979.99 samples/sec Loss 1.6454 LearningRate 0.0011 Epoch: 19 Global Step: 197690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:02:20,235-Speed 5965.46 samples/sec Loss 1.6275 LearningRate 0.0011 Epoch: 19 Global Step: 197700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:02:27,117-Speed 5952.39 samples/sec Loss 1.6420 LearningRate 0.0011 Epoch: 19 Global Step: 197710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:02:33,990-Speed 5962.85 samples/sec Loss 
1.6197 LearningRate 0.0011 Epoch: 19 Global Step: 197720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:02:40,836-Speed 5984.22 samples/sec Loss 1.6502 LearningRate 0.0011 Epoch: 19 Global Step: 197730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:02:47,702-Speed 5966.79 samples/sec Loss 1.6319 LearningRate 0.0011 Epoch: 19 Global Step: 197740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:02:54,571-Speed 5964.56 samples/sec Loss 1.6257 LearningRate 0.0011 Epoch: 19 Global Step: 197750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:03:01,434-Speed 5969.59 samples/sec Loss 1.5994 LearningRate 0.0011 Epoch: 19 Global Step: 197760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:03:08,304-Speed 5964.09 samples/sec Loss 1.6708 LearningRate 0.0011 Epoch: 19 Global Step: 197770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:03:15,167-Speed 5969.26 samples/sec Loss 1.6316 LearningRate 0.0011 Epoch: 19 Global Step: 197780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:03:22,088-Speed 5920.48 samples/sec Loss 1.6243 LearningRate 0.0011 Epoch: 19 Global Step: 197790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:03:28,969-Speed 5953.59 samples/sec Loss 1.6188 LearningRate 0.0011 Epoch: 19 Global Step: 197800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:03:35,836-Speed 5966.08 samples/sec Loss 1.6303 LearningRate 0.0011 Epoch: 19 Global Step: 197810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:03:42,702-Speed 5967.40 samples/sec Loss 1.6556 LearningRate 0.0010 Epoch: 19 Global Step: 197820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:03:49,558-Speed 5975.36 samples/sec Loss 1.6501 LearningRate 0.0010 Epoch: 19 Global Step: 197830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:03:56,416-Speed 5973.41 samples/sec Loss 1.6492 LearningRate 0.0010 Epoch: 19 Global 
Step: 197840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:04:03,269-Speed 5978.26 samples/sec Loss 1.6516 LearningRate 0.0010 Epoch: 19 Global Step: 197850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:04:10,139-Speed 5963.18 samples/sec Loss 1.6266 LearningRate 0.0010 Epoch: 19 Global Step: 197860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:04:17,008-Speed 5963.87 samples/sec Loss 1.5936 LearningRate 0.0010 Epoch: 19 Global Step: 197870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:04:23,874-Speed 5972.38 samples/sec Loss 1.6376 LearningRate 0.0010 Epoch: 19 Global Step: 197880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:04:30,745-Speed 5962.90 samples/sec Loss 1.6287 LearningRate 0.0010 Epoch: 19 Global Step: 197890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:04:37,717-Speed 5876.24 samples/sec Loss 1.6258 LearningRate 0.0010 Epoch: 19 Global Step: 197900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:04:44,580-Speed 5969.79 samples/sec Loss 1.6130 LearningRate 0.0010 Epoch: 19 Global Step: 197910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:04:51,422-Speed 5988.74 samples/sec Loss 1.6379 LearningRate 0.0010 Epoch: 19 Global Step: 197920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:04:58,321-Speed 5937.81 samples/sec Loss 1.6336 LearningRate 0.0010 Epoch: 19 Global Step: 197930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:05:05,176-Speed 5978.14 samples/sec Loss 1.6379 LearningRate 0.0010 Epoch: 19 Global Step: 197940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:05:12,037-Speed 5972.15 samples/sec Loss 1.6234 LearningRate 0.0010 Epoch: 19 Global Step: 197950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:05:18,902-Speed 5967.12 samples/sec Loss 1.6220 LearningRate 0.0010 Epoch: 19 Global Step: 197960 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-09 11:05:25,756-Speed 5977.17 samples/sec Loss 1.6355 LearningRate 0.0010 Epoch: 19 Global Step: 197970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:05:32,620-Speed 5968.93 samples/sec Loss 1.6160 LearningRate 0.0010 Epoch: 19 Global Step: 197980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-09 11:05:39,476-Speed 5976.06 samples/sec Loss 1.6364 LearningRate 0.0010 Epoch: 19 Global Step: 197990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:05:46,321-Speed 5984.80 samples/sec Loss 1.6291 LearningRate 0.0010 Epoch: 19 Global Step: 198000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:05:53,169-Speed 5982.18 samples/sec Loss 1.6587 LearningRate 0.0010 Epoch: 19 Global Step: 198010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:06:00,035-Speed 5967.12 samples/sec Loss 1.6330 LearningRate 0.0010 Epoch: 19 Global Step: 198020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:06:06,911-Speed 5958.07 samples/sec Loss 1.6317 LearningRate 0.0010 Epoch: 19 Global Step: 198030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:06:13,784-Speed 5960.98 samples/sec Loss 1.6536 LearningRate 0.0010 Epoch: 19 Global Step: 198040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:06:20,641-Speed 5974.30 samples/sec Loss 1.6242 LearningRate 0.0010 Epoch: 19 Global Step: 198050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:06:27,521-Speed 5954.68 samples/sec Loss 1.6367 LearningRate 0.0010 Epoch: 19 Global Step: 198060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:06:34,388-Speed 5965.63 samples/sec Loss 1.6225 LearningRate 0.0010 Epoch: 19 Global Step: 198070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:06:41,245-Speed 5974.79 samples/sec Loss 1.6235 LearningRate 0.0010 Epoch: 19 Global Step: 198080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
11:06:48,098-Speed 5977.83 samples/sec Loss 1.6086 LearningRate 0.0010 Epoch: 19 Global Step: 198090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:06:54,950-Speed 5979.78 samples/sec Loss 1.6186 LearningRate 0.0010 Epoch: 19 Global Step: 198100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:07:01,841-Speed 5944.75 samples/sec Loss 1.6113 LearningRate 0.0010 Epoch: 19 Global Step: 198110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:07:08,687-Speed 5984.37 samples/sec Loss 1.6415 LearningRate 0.0010 Epoch: 19 Global Step: 198120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:07:15,555-Speed 5964.96 samples/sec Loss 1.6238 LearningRate 0.0010 Epoch: 19 Global Step: 198130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:07:22,417-Speed 5970.53 samples/sec Loss 1.6276 LearningRate 0.0010 Epoch: 19 Global Step: 198140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:07:29,299-Speed 5953.35 samples/sec Loss 1.6334 LearningRate 0.0010 Epoch: 19 Global Step: 198150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:07:36,157-Speed 5973.56 samples/sec Loss 1.6593 LearningRate 0.0010 Epoch: 19 Global Step: 198160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:07:43,006-Speed 5981.98 samples/sec Loss 1.6317 LearningRate 0.0010 Epoch: 19 Global Step: 198170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:07:49,851-Speed 5985.26 samples/sec Loss 1.6210 LearningRate 0.0010 Epoch: 19 Global Step: 198180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:07:56,704-Speed 5978.05 samples/sec Loss 1.6107 LearningRate 0.0010 Epoch: 19 Global Step: 198190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:08:03,575-Speed 5963.23 samples/sec Loss 1.5958 LearningRate 0.0010 Epoch: 19 Global Step: 198200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:08:10,424-Speed 5981.55 samples/sec Loss 
1.6519 LearningRate 0.0010 Epoch: 19 Global Step: 198210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:08:17,312-Speed 5947.66 samples/sec Loss 1.6077 LearningRate 0.0010 Epoch: 19 Global Step: 198220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:08:24,175-Speed 5969.59 samples/sec Loss 1.6037 LearningRate 0.0010 Epoch: 19 Global Step: 198230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:08:31,027-Speed 5980.90 samples/sec Loss 1.6306 LearningRate 0.0010 Epoch: 19 Global Step: 198240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:08:37,882-Speed 5976.56 samples/sec Loss 1.6256 LearningRate 0.0010 Epoch: 19 Global Step: 198250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:08:44,777-Speed 5941.24 samples/sec Loss 1.6471 LearningRate 0.0010 Epoch: 19 Global Step: 198260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:08:51,639-Speed 5970.27 samples/sec Loss 1.6248 LearningRate 0.0010 Epoch: 19 Global Step: 198270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:08:58,494-Speed 5976.35 samples/sec Loss 1.6193 LearningRate 0.0010 Epoch: 19 Global Step: 198280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:09:05,373-Speed 5955.74 samples/sec Loss 1.5961 LearningRate 0.0009 Epoch: 19 Global Step: 198290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:09:12,251-Speed 5956.80 samples/sec Loss 1.6094 LearningRate 0.0009 Epoch: 19 Global Step: 198300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:09:19,136-Speed 5949.77 samples/sec Loss 1.5846 LearningRate 0.0009 Epoch: 19 Global Step: 198310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:09:25,988-Speed 5978.50 samples/sec Loss 1.6093 LearningRate 0.0009 Epoch: 19 Global Step: 198320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:09:32,856-Speed 5965.48 samples/sec Loss 1.6465 LearningRate 0.0009 Epoch: 19 Global 
Step: 198330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:09:39,711-Speed 5978.85 samples/sec Loss 1.5720 LearningRate 0.0009 Epoch: 19 Global Step: 198340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:09:46,587-Speed 5957.73 samples/sec Loss 1.6154 LearningRate 0.0009 Epoch: 19 Global Step: 198350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:09:53,456-Speed 5964.38 samples/sec Loss 1.6294 LearningRate 0.0009 Epoch: 19 Global Step: 198360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:10:00,322-Speed 5967.94 samples/sec Loss 1.5990 LearningRate 0.0009 Epoch: 19 Global Step: 198370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-09 11:10:07,166-Speed 5985.48 samples/sec Loss 1.6204 LearningRate 0.0009 Epoch: 19 Global Step: 198380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:10:14,009-Speed 5987.50 samples/sec Loss 1.6429 LearningRate 0.0009 Epoch: 19 Global Step: 198390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:10:20,853-Speed 5987.47 samples/sec Loss 1.6207 LearningRate 0.0009 Epoch: 19 Global Step: 198400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:10:27,707-Speed 5977.01 samples/sec Loss 1.6190 LearningRate 0.0009 Epoch: 19 Global Step: 198410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:10:34,571-Speed 5970.31 samples/sec Loss 1.6015 LearningRate 0.0009 Epoch: 19 Global Step: 198420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:10:41,425-Speed 5977.47 samples/sec Loss 1.6139 LearningRate 0.0009 Epoch: 19 Global Step: 198430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:10:48,303-Speed 5955.75 samples/sec Loss 1.6456 LearningRate 0.0009 Epoch: 19 Global Step: 198440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:10:55,163-Speed 5972.44 samples/sec Loss 1.6292 LearningRate 0.0009 Epoch: 19 Global Step: 198450 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 11:11:02,017-Speed 5976.63 samples/sec Loss 1.6162 LearningRate 0.0009 Epoch: 19 Global Step: 198460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:11:08,864-Speed 5983.21 samples/sec Loss 1.5901 LearningRate 0.0009 Epoch: 19 Global Step: 198470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:11:15,717-Speed 5979.27 samples/sec Loss 1.6444 LearningRate 0.0009 Epoch: 19 Global Step: 198480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:11:22,562-Speed 5985.09 samples/sec Loss 1.6207 LearningRate 0.0009 Epoch: 19 Global Step: 198490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:11:29,413-Speed 5980.09 samples/sec Loss 1.5993 LearningRate 0.0009 Epoch: 19 Global Step: 198500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:11:36,278-Speed 5967.85 samples/sec Loss 1.6203 LearningRate 0.0009 Epoch: 19 Global Step: 198510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:11:43,141-Speed 5972.24 samples/sec Loss 1.6277 LearningRate 0.0009 Epoch: 19 Global Step: 198520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:11:49,977-Speed 5992.53 samples/sec Loss 1.6110 LearningRate 0.0009 Epoch: 19 Global Step: 198530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:11:56,824-Speed 5983.34 samples/sec Loss 1.6299 LearningRate 0.0009 Epoch: 19 Global Step: 198540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:12:03,687-Speed 5970.76 samples/sec Loss 1.6273 LearningRate 0.0009 Epoch: 19 Global Step: 198550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:12:10,545-Speed 5973.43 samples/sec Loss 1.6002 LearningRate 0.0009 Epoch: 19 Global Step: 198560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:12:17,401-Speed 5976.70 samples/sec Loss 1.6235 LearningRate 0.0009 Epoch: 19 Global Step: 198570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
11:12:24,264-Speed 5968.83 samples/sec Loss 1.6136 LearningRate 0.0009 Epoch: 19 Global Step: 198580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:12:31,122-Speed 5974.54 samples/sec Loss 1.5873 LearningRate 0.0009 Epoch: 19 Global Step: 198590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:12:37,983-Speed 5974.16 samples/sec Loss 1.6231 LearningRate 0.0009 Epoch: 19 Global Step: 198600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:12:44,832-Speed 5981.65 samples/sec Loss 1.6132 LearningRate 0.0009 Epoch: 19 Global Step: 198610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:12:51,687-Speed 5976.36 samples/sec Loss 1.6163 LearningRate 0.0009 Epoch: 19 Global Step: 198620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:12:58,552-Speed 5968.40 samples/sec Loss 1.6081 LearningRate 0.0009 Epoch: 19 Global Step: 198630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:13:05,400-Speed 5982.33 samples/sec Loss 1.5962 LearningRate 0.0009 Epoch: 19 Global Step: 198640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:13:12,257-Speed 5974.57 samples/sec Loss 1.6316 LearningRate 0.0009 Epoch: 19 Global Step: 198650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:13:19,121-Speed 5968.07 samples/sec Loss 1.6080 LearningRate 0.0009 Epoch: 19 Global Step: 198660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:13:25,986-Speed 5968.32 samples/sec Loss 1.6096 LearningRate 0.0009 Epoch: 19 Global Step: 198670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:13:32,828-Speed 5989.34 samples/sec Loss 1.6559 LearningRate 0.0009 Epoch: 19 Global Step: 198680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:13:39,667-Speed 5990.06 samples/sec Loss 1.6196 LearningRate 0.0009 Epoch: 19 Global Step: 198690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:13:46,669-Speed 5852.55 samples/sec Loss 
1.6055 LearningRate 0.0009 Epoch: 19 Global Step: 198700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:13:53,620-Speed 5893.63 samples/sec Loss 1.6319 LearningRate 0.0009 Epoch: 19 Global Step: 198710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:14:00,582-Speed 5884.48 samples/sec Loss 1.5941 LearningRate 0.0009 Epoch: 19 Global Step: 198720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:14:07,445-Speed 5969.64 samples/sec Loss 1.5862 LearningRate 0.0009 Epoch: 19 Global Step: 198730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:14:14,303-Speed 5975.99 samples/sec Loss 1.5943 LearningRate 0.0009 Epoch: 19 Global Step: 198740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:14:21,170-Speed 5966.16 samples/sec Loss 1.5957 LearningRate 0.0009 Epoch: 19 Global Step: 198750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:14:28,026-Speed 5976.30 samples/sec Loss 1.6007 LearningRate 0.0009 Epoch: 19 Global Step: 198760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:14:34,867-Speed 5988.00 samples/sec Loss 1.6084 LearningRate 0.0009 Epoch: 19 Global Step: 198770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:14:41,726-Speed 5974.43 samples/sec Loss 1.5994 LearningRate 0.0008 Epoch: 19 Global Step: 198780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:14:48,570-Speed 5985.22 samples/sec Loss 1.6331 LearningRate 0.0008 Epoch: 19 Global Step: 198790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:14:55,421-Speed 5980.21 samples/sec Loss 1.6011 LearningRate 0.0008 Epoch: 19 Global Step: 198800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:15:02,274-Speed 5977.96 samples/sec Loss 1.6198 LearningRate 0.0008 Epoch: 19 Global Step: 198810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:15:09,116-Speed 5987.72 samples/sec Loss 1.6167 LearningRate 0.0008 Epoch: 19 Global 
Step: 198820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:15:15,949-Speed 5995.23 samples/sec Loss 1.6204 LearningRate 0.0008 Epoch: 19 Global Step: 198830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:15:22,796-Speed 5983.80 samples/sec Loss 1.5832 LearningRate 0.0008 Epoch: 19 Global Step: 198840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:15:29,642-Speed 5983.77 samples/sec Loss 1.6080 LearningRate 0.0008 Epoch: 19 Global Step: 198850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:15:36,500-Speed 5973.98 samples/sec Loss 1.6121 LearningRate 0.0008 Epoch: 19 Global Step: 198860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:15:43,355-Speed 5976.80 samples/sec Loss 1.6132 LearningRate 0.0008 Epoch: 19 Global Step: 198870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:15:50,224-Speed 5965.06 samples/sec Loss 1.5926 LearningRate 0.0008 Epoch: 19 Global Step: 198880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:15:57,078-Speed 5977.56 samples/sec Loss 1.6076 LearningRate 0.0008 Epoch: 19 Global Step: 198890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:16:03,927-Speed 5980.87 samples/sec Loss 1.5986 LearningRate 0.0008 Epoch: 19 Global Step: 198900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:16:10,777-Speed 5980.17 samples/sec Loss 1.6031 LearningRate 0.0008 Epoch: 19 Global Step: 198910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:16:17,622-Speed 5985.50 samples/sec Loss 1.6084 LearningRate 0.0008 Epoch: 19 Global Step: 198920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:16:24,464-Speed 5987.77 samples/sec Loss 1.6028 LearningRate 0.0008 Epoch: 19 Global Step: 198930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:16:31,314-Speed 5980.47 samples/sec Loss 1.5922 LearningRate 0.0008 Epoch: 19 Global Step: 198940 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-09 11:16:38,189-Speed 5959.10 samples/sec Loss 1.5944 LearningRate 0.0008 Epoch: 19 Global Step: 198950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:16:45,066-Speed 5957.36 samples/sec Loss 1.6017 LearningRate 0.0008 Epoch: 19 Global Step: 198960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:16:51,943-Speed 5957.09 samples/sec Loss 1.6238 LearningRate 0.0008 Epoch: 19 Global Step: 198970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:16:58,785-Speed 5988.25 samples/sec Loss 1.5722 LearningRate 0.0008 Epoch: 19 Global Step: 198980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:17:05,655-Speed 5963.25 samples/sec Loss 1.5570 LearningRate 0.0008 Epoch: 19 Global Step: 198990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:17:12,579-Speed 5917.30 samples/sec Loss 1.6120 LearningRate 0.0008 Epoch: 19 Global Step: 199000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:17:19,484-Speed 5932.74 samples/sec Loss 1.6020 LearningRate 0.0008 Epoch: 19 Global Step: 199010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:17:26,336-Speed 5981.51 samples/sec Loss 1.5936 LearningRate 0.0008 Epoch: 19 Global Step: 199020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:17:33,180-Speed 5986.13 samples/sec Loss 1.6048 LearningRate 0.0008 Epoch: 19 Global Step: 199030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:17:40,036-Speed 5976.02 samples/sec Loss 1.5968 LearningRate 0.0008 Epoch: 19 Global Step: 199040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:17:46,894-Speed 5973.82 samples/sec Loss 1.6015 LearningRate 0.0008 Epoch: 19 Global Step: 199050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:17:53,741-Speed 5982.99 samples/sec Loss 1.6093 LearningRate 0.0008 Epoch: 19 Global Step: 199060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
11:18:00,585-Speed 5985.17 samples/sec Loss 1.6191 LearningRate 0.0008 Epoch: 19 Global Step: 199070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:18:07,442-Speed 5975.19 samples/sec Loss 1.5763 LearningRate 0.0008 Epoch: 19 Global Step: 199080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:18:14,295-Speed 5977.30 samples/sec Loss 1.6159 LearningRate 0.0008 Epoch: 19 Global Step: 199090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:18:21,146-Speed 5982.39 samples/sec Loss 1.6180 LearningRate 0.0008 Epoch: 19 Global Step: 199100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:18:28,029-Speed 5952.17 samples/sec Loss 1.6095 LearningRate 0.0008 Epoch: 19 Global Step: 199110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:18:34,871-Speed 5986.68 samples/sec Loss 1.6054 LearningRate 0.0008 Epoch: 19 Global Step: 199120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:18:41,724-Speed 5978.14 samples/sec Loss 1.6045 LearningRate 0.0008 Epoch: 19 Global Step: 199130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:18:48,589-Speed 5968.49 samples/sec Loss 1.5916 LearningRate 0.0008 Epoch: 19 Global Step: 199140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:18:55,540-Speed 5893.63 samples/sec Loss 1.5965 LearningRate 0.0008 Epoch: 19 Global Step: 199150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:19:02,510-Speed 5877.78 samples/sec Loss 1.6098 LearningRate 0.0008 Epoch: 19 Global Step: 199160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:19:09,374-Speed 5968.63 samples/sec Loss 1.5956 LearningRate 0.0008 Epoch: 19 Global Step: 199170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:19:16,222-Speed 5982.76 samples/sec Loss 1.5780 LearningRate 0.0008 Epoch: 19 Global Step: 199180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:19:23,067-Speed 5985.47 samples/sec Loss 
1.5825 LearningRate 0.0008 Epoch: 19 Global Step: 199190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:19:29,933-Speed 5966.48 samples/sec Loss 1.5962 LearningRate 0.0008 Epoch: 19 Global Step: 199200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:19:36,800-Speed 5966.01 samples/sec Loss 1.5916 LearningRate 0.0008 Epoch: 19 Global Step: 199210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:19:43,685-Speed 5951.03 samples/sec Loss 1.5780 LearningRate 0.0008 Epoch: 19 Global Step: 199220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:19:50,534-Speed 5981.26 samples/sec Loss 1.6039 LearningRate 0.0008 Epoch: 19 Global Step: 199230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:19:57,413-Speed 5956.25 samples/sec Loss 1.6030 LearningRate 0.0008 Epoch: 19 Global Step: 199240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:20:04,273-Speed 5971.47 samples/sec Loss 1.5831 LearningRate 0.0008 Epoch: 19 Global Step: 199250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:20:11,115-Speed 5987.44 samples/sec Loss 1.5902 LearningRate 0.0008 Epoch: 19 Global Step: 199260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:20:17,984-Speed 5964.34 samples/sec Loss 1.6051 LearningRate 0.0008 Epoch: 19 Global Step: 199270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:20:24,840-Speed 5975.52 samples/sec Loss 1.5660 LearningRate 0.0008 Epoch: 19 Global Step: 199280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:20:31,711-Speed 5963.99 samples/sec Loss 1.6402 LearningRate 0.0008 Epoch: 19 Global Step: 199290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:20:38,579-Speed 5964.73 samples/sec Loss 1.5590 LearningRate 0.0007 Epoch: 19 Global Step: 199300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:20:45,455-Speed 5958.35 samples/sec Loss 1.5908 LearningRate 0.0007 Epoch: 19 Global 
Step: 199310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:20:52,311-Speed 5974.85 samples/sec Loss 1.5784 LearningRate 0.0007 Epoch: 19 Global Step: 199320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:20:59,192-Speed 5954.44 samples/sec Loss 1.5782 LearningRate 0.0007 Epoch: 19 Global Step: 199330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:21:06,047-Speed 5976.28 samples/sec Loss 1.6191 LearningRate 0.0007 Epoch: 19 Global Step: 199340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:21:12,904-Speed 5974.74 samples/sec Loss 1.6094 LearningRate 0.0007 Epoch: 19 Global Step: 199350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:21:19,750-Speed 5983.97 samples/sec Loss 1.5989 LearningRate 0.0007 Epoch: 19 Global Step: 199360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:21:26,622-Speed 5961.95 samples/sec Loss 1.5815 LearningRate 0.0007 Epoch: 19 Global Step: 199370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:21:33,476-Speed 5976.95 samples/sec Loss 1.5950 LearningRate 0.0007 Epoch: 19 Global Step: 199380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:21:40,362-Speed 5949.31 samples/sec Loss 1.6008 LearningRate 0.0007 Epoch: 19 Global Step: 199390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:21:47,207-Speed 5986.26 samples/sec Loss 1.6180 LearningRate 0.0007 Epoch: 19 Global Step: 199400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:21:54,082-Speed 5960.67 samples/sec Loss 1.5548 LearningRate 0.0007 Epoch: 19 Global Step: 199410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:22:00,945-Speed 5968.77 samples/sec Loss 1.5759 LearningRate 0.0007 Epoch: 19 Global Step: 199420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:22:07,804-Speed 5973.17 samples/sec Loss 1.5991 LearningRate 0.0007 Epoch: 19 Global Step: 199430 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-09 11:22:14,667-Speed 5969.36 samples/sec Loss 1.5727 LearningRate 0.0007 Epoch: 19 Global Step: 199440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:22:21,560-Speed 5942.73 samples/sec Loss 1.5575 LearningRate 0.0007 Epoch: 19 Global Step: 199450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:22:28,422-Speed 5970.25 samples/sec Loss 1.6270 LearningRate 0.0007 Epoch: 19 Global Step: 199460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:22:35,291-Speed 5964.42 samples/sec Loss 1.5874 LearningRate 0.0007 Epoch: 19 Global Step: 199470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:22:42,146-Speed 5976.76 samples/sec Loss 1.6117 LearningRate 0.0007 Epoch: 19 Global Step: 199480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:22:49,035-Speed 5949.66 samples/sec Loss 1.5919 LearningRate 0.0007 Epoch: 19 Global Step: 199490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:22:55,890-Speed 5978.15 samples/sec Loss 1.5841 LearningRate 0.0007 Epoch: 19 Global Step: 199500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:23:02,747-Speed 5974.11 samples/sec Loss 1.5963 LearningRate 0.0007 Epoch: 19 Global Step: 199510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:23:09,601-Speed 5977.36 samples/sec Loss 1.5798 LearningRate 0.0007 Epoch: 19 Global Step: 199520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:23:16,474-Speed 5961.63 samples/sec Loss 1.5981 LearningRate 0.0007 Epoch: 19 Global Step: 199530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:23:23,371-Speed 5940.12 samples/sec Loss 1.6100 LearningRate 0.0007 Epoch: 19 Global Step: 199540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:23:30,228-Speed 5974.67 samples/sec Loss 1.6090 LearningRate 0.0007 Epoch: 19 Global Step: 199550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
11:23:37,097-Speed 5964.41 samples/sec Loss 1.5537 LearningRate 0.0007 Epoch: 19 Global Step: 199560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:23:43,952-Speed 5976.12 samples/sec Loss 1.5876 LearningRate 0.0007 Epoch: 19 Global Step: 199570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:23:50,823-Speed 5964.29 samples/sec Loss 1.5981 LearningRate 0.0007 Epoch: 19 Global Step: 199580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:23:57,793-Speed 5877.89 samples/sec Loss 1.5812 LearningRate 0.0007 Epoch: 19 Global Step: 199590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:04,646-Speed 5978.16 samples/sec Loss 1.5998 LearningRate 0.0007 Epoch: 19 Global Step: 199600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:11,500-Speed 5976.83 samples/sec Loss 1.5706 LearningRate 0.0007 Epoch: 19 Global Step: 199610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:18,337-Speed 5992.07 samples/sec Loss 1.5802 LearningRate 0.0007 Epoch: 19 Global Step: 199620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:25,226-Speed 5946.75 samples/sec Loss 1.5929 LearningRate 0.0007 Epoch: 19 Global Step: 199630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:32,085-Speed 5973.23 samples/sec Loss 1.5835 LearningRate 0.0007 Epoch: 19 Global Step: 199640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:38,980-Speed 5942.31 samples/sec Loss 1.5677 LearningRate 0.0007 Epoch: 19 Global Step: 199650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:45,846-Speed 5966.26 samples/sec Loss 1.5873 LearningRate 0.0007 Epoch: 19 Global Step: 199660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:52,685-Speed 5990.28 samples/sec Loss 1.5799 LearningRate 0.0007 Epoch: 19 Global Step: 199670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:24:59,566-Speed 5953.57 samples/sec Loss 
1.5857 LearningRate 0.0007 Epoch: 19 Global Step: 199680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:25:06,454-Speed 5947.91 samples/sec Loss 1.5868 LearningRate 0.0007 Epoch: 19 Global Step: 199690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:25:13,317-Speed 5970.84 samples/sec Loss 1.5704 LearningRate 0.0007 Epoch: 19 Global Step: 199700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 11:25:20,177-Speed 5971.57 samples/sec Loss 1.5993 LearningRate 0.0007 Epoch: 19 Global Step: 199710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:25:27,025-Speed 5985.10 samples/sec Loss 1.5915 LearningRate 0.0007 Epoch: 19 Global Step: 199720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 11:25:33,905-Speed 5955.48 samples/sec Loss 1.5920 LearningRate 0.0007 Epoch: 19 Global Step: 199730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:25:40,870-Speed 5881.99 samples/sec Loss 1.5801 LearningRate 0.0007 Epoch: 19 Global Step: 199740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:25:47,733-Speed 5969.91 samples/sec Loss 1.5621 LearningRate 0.0007 Epoch: 19 Global Step: 199750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:25:54,581-Speed 5982.46 samples/sec Loss 1.5788 LearningRate 0.0007 Epoch: 19 Global Step: 199760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:26:07,181-Speed 3251.22 samples/sec Loss 1.5546 LearningRate 0.0007 Epoch: 19 Global Step: 199770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:26:14,024-Speed 5987.28 samples/sec Loss 1.5962 LearningRate 0.0007 Epoch: 19 Global Step: 199780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:26:20,866-Speed 5987.47 samples/sec Loss 1.5995 LearningRate 0.0007 Epoch: 19 Global Step: 199790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:26:27,729-Speed 5969.33 samples/sec Loss 1.5545 LearningRate 0.0007 Epoch: 19 Global 
Step: 199800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:26:34,568-Speed 5990.35 samples/sec Loss 1.5918 LearningRate 0.0007 Epoch: 19 Global Step: 199810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:26:41,419-Speed 5979.91 samples/sec Loss 1.5775 LearningRate 0.0007 Epoch: 19 Global Step: 199820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:26:48,263-Speed 5984.84 samples/sec Loss 1.5654 LearningRate 0.0007 Epoch: 19 Global Step: 199830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:26:55,109-Speed 5984.45 samples/sec Loss 1.5702 LearningRate 0.0007 Epoch: 19 Global Step: 199840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:27:01,957-Speed 5983.09 samples/sec Loss 1.5982 LearningRate 0.0007 Epoch: 19 Global Step: 199850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:27:08,890-Speed 5908.90 samples/sec Loss 1.5821 LearningRate 0.0006 Epoch: 19 Global Step: 199860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:27:15,764-Speed 5959.85 samples/sec Loss 1.5808 LearningRate 0.0006 Epoch: 19 Global Step: 199870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:27:22,617-Speed 5978.31 samples/sec Loss 1.5932 LearningRate 0.0006 Epoch: 19 Global Step: 199880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:27:29,472-Speed 5976.00 samples/sec Loss 1.5849 LearningRate 0.0006 Epoch: 19 Global Step: 199890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:27:36,325-Speed 5977.99 samples/sec Loss 1.5932 LearningRate 0.0006 Epoch: 19 Global Step: 199900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:27:43,185-Speed 5972.37 samples/sec Loss 1.5581 LearningRate 0.0006 Epoch: 19 Global Step: 199910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:27:50,061-Speed 5957.17 samples/sec Loss 1.5563 LearningRate 0.0006 Epoch: 19 Global Step: 199920 Fp16 Grad Scale: 32768 
Required: 1 hours
Training: 2022-01-09 11:27:56,932-Speed 5962.99 samples/sec Loss 1.5771 LearningRate 0.0006 Epoch: 19 Global Step: 199930 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-09 11:28:03,790-Speed 5973.63 samples/sec Loss 1.6112 LearningRate 0.0006 Epoch: 19 Global Step: 199940 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-01-09 11:28:10,656-Speed 5965.75 samples/sec Loss 1.5818 LearningRate 0.0006 Epoch: 19 Global Step: 199950 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-01-09 11:28:17,506-Speed 5981.41 samples/sec Loss 1.5770 LearningRate 0.0006 Epoch: 19 Global Step: 199960 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-09 11:28:24,359-Speed 5979.35 samples/sec Loss 1.5902 LearningRate 0.0006 Epoch: 19 Global Step: 199970 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-09 11:28:31,223-Speed 5968.35 samples/sec Loss 1.5844 LearningRate 0.0006 Epoch: 19 Global Step: 199980 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-09 11:28:38,077-Speed 5978.17 samples/sec Loss 1.6035 LearningRate 0.0006 Epoch: 19 Global Step: 199990 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-09 11:28:44,940-Speed 5969.11 samples/sec Loss 1.5668 LearningRate 0.0006 Epoch: 19 Global Step: 200000 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-09 11:29:11,620-[lfw][200000]XNorm: 23.359737
Training: 2022-01-09 11:29:11,620-[lfw][200000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-09 11:29:11,621-[lfw][200000]Accuracy-Highest: 0.99833
Training: 2022-01-09 11:29:42,551-[cfp_fp][200000]XNorm: 21.488663
Training: 2022-01-09 11:29:42,552-[cfp_fp][200000]Accuracy-Flip: 0.99243+-0.00320
Training: 2022-01-09 11:29:42,553-[cfp_fp][200000]Accuracy-Highest: 0.99286
Training: 2022-01-09 11:30:09,161-[agedb_30][200000]XNorm: 22.877173
Training: 2022-01-09 11:30:09,161-[agedb_30][200000]Accuracy-Flip: 0.98300+-0.00515
Training: 2022-01-09 11:30:09,162-[agedb_30][200000]Accuracy-Highest: 0.98300
Training: 2022-01-09 11:30:16,016-Speed 449.74 samples/sec Loss 1.5715 LearningRate 0.0006 Epoch: 19 Global Step: 200010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:30:22,851-Speed 5994.20 samples/sec Loss 1.5735 LearningRate 0.0006 Epoch: 19 Global Step: 200020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:30:29,700-Speed 5982.53 samples/sec Loss 1.5884 LearningRate 0.0006 Epoch: 19 Global Step: 200030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:30:36,581-Speed 5954.30 samples/sec Loss 1.5814 LearningRate 0.0006 Epoch: 19 Global Step: 200040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:30:43,436-Speed 5976.01 samples/sec Loss 1.5978 LearningRate 0.0006 Epoch: 19 Global Step: 200050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:30:50,275-Speed 5990.13 samples/sec Loss 1.5836 LearningRate 0.0006 Epoch: 19 Global Step: 200060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:30:57,119-Speed 5985.76 samples/sec Loss 1.5743 LearningRate 0.0006 Epoch: 19 Global Step: 200070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:31:03,961-Speed 5987.75 samples/sec Loss 1.5940 LearningRate 0.0006 Epoch: 19 Global Step: 200080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:31:10,830-Speed 5963.81 samples/sec Loss 1.5589 LearningRate 0.0006 Epoch: 19 Global Step: 200090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:31:17,675-Speed 5985.45 samples/sec Loss 1.5705 LearningRate 0.0006 Epoch: 19 Global Step: 200100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:31:24,517-Speed 5987.66 samples/sec Loss 1.5858 LearningRate 0.0006 Epoch: 19 Global Step: 200110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:31:31,358-Speed 5988.53 samples/sec Loss 1.5636 LearningRate 0.0006 Epoch: 19 Global Step: 200120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:31:38,208-Speed 
5980.90 samples/sec Loss 1.5990 LearningRate 0.0006 Epoch: 19 Global Step: 200130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:31:45,076-Speed 5964.71 samples/sec Loss 1.5777 LearningRate 0.0006 Epoch: 19 Global Step: 200140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:31:55,229-Speed 6001.77 samples/sec Loss 1.5498 LearningRate 0.0006 Epoch: 19 Global Step: 200150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:32:02,070-Speed 5988.60 samples/sec Loss 1.5696 LearningRate 0.0006 Epoch: 19 Global Step: 200160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:32:08,900-Speed 5998.41 samples/sec Loss 1.5582 LearningRate 0.0006 Epoch: 19 Global Step: 200170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:32:15,737-Speed 5992.06 samples/sec Loss 1.5902 LearningRate 0.0006 Epoch: 19 Global Step: 200180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:32:22,585-Speed 5982.00 samples/sec Loss 1.5738 LearningRate 0.0006 Epoch: 19 Global Step: 200190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:32:29,420-Speed 5994.36 samples/sec Loss 1.5907 LearningRate 0.0006 Epoch: 19 Global Step: 200200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:32:36,269-Speed 5984.14 samples/sec Loss 1.5525 LearningRate 0.0006 Epoch: 19 Global Step: 200210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:32:43,131-Speed 5970.70 samples/sec Loss 1.5817 LearningRate 0.0006 Epoch: 19 Global Step: 200220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:32:49,981-Speed 5980.37 samples/sec Loss 1.5569 LearningRate 0.0006 Epoch: 19 Global Step: 200230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:32:56,834-Speed 5979.22 samples/sec Loss 1.5855 LearningRate 0.0006 Epoch: 19 Global Step: 200240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:03,665-Speed 5997.43 samples/sec Loss 1.5742 
LearningRate 0.0006 Epoch: 19 Global Step: 200250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:10,511-Speed 5984.02 samples/sec Loss 1.5227 LearningRate 0.0006 Epoch: 19 Global Step: 200260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:17,346-Speed 5994.16 samples/sec Loss 1.5615 LearningRate 0.0006 Epoch: 19 Global Step: 200270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:24,183-Speed 5991.85 samples/sec Loss 1.5727 LearningRate 0.0006 Epoch: 19 Global Step: 200280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:31,032-Speed 5982.84 samples/sec Loss 1.5422 LearningRate 0.0006 Epoch: 19 Global Step: 200290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:37,874-Speed 5988.03 samples/sec Loss 1.5631 LearningRate 0.0006 Epoch: 19 Global Step: 200300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:44,710-Speed 5992.69 samples/sec Loss 1.5616 LearningRate 0.0006 Epoch: 19 Global Step: 200310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:51,577-Speed 5965.92 samples/sec Loss 1.5887 LearningRate 0.0006 Epoch: 19 Global Step: 200320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:33:58,443-Speed 5966.72 samples/sec Loss 1.5746 LearningRate 0.0006 Epoch: 19 Global Step: 200330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 11:34:05,281-Speed 5990.92 samples/sec Loss 1.5633 LearningRate 0.0006 Epoch: 19 Global Step: 200340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:34:12,116-Speed 5993.66 samples/sec Loss 1.5620 LearningRate 0.0006 Epoch: 19 Global Step: 200350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:34:18,980-Speed 5968.14 samples/sec Loss 1.5726 LearningRate 0.0006 Epoch: 19 Global Step: 200360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:34:25,819-Speed 5990.76 samples/sec Loss 1.5785 LearningRate 0.0006 Epoch: 19 Global Step: 
200370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:34:32,668-Speed 5981.84 samples/sec Loss 1.5437 LearningRate 0.0006 Epoch: 19 Global Step: 200380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:34:39,526-Speed 5973.56 samples/sec Loss 1.5666 LearningRate 0.0006 Epoch: 19 Global Step: 200390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:34:46,391-Speed 5967.24 samples/sec Loss 1.5596 LearningRate 0.0006 Epoch: 19 Global Step: 200400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:34:53,245-Speed 5977.55 samples/sec Loss 1.5702 LearningRate 0.0006 Epoch: 19 Global Step: 200410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:35:00,089-Speed 5985.40 samples/sec Loss 1.5919 LearningRate 0.0006 Epoch: 19 Global Step: 200420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:35:06,937-Speed 5982.45 samples/sec Loss 1.5536 LearningRate 0.0006 Epoch: 19 Global Step: 200430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:35:13,798-Speed 5970.46 samples/sec Loss 1.5563 LearningRate 0.0006 Epoch: 19 Global Step: 200440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:35:20,664-Speed 5967.33 samples/sec Loss 1.5496 LearningRate 0.0006 Epoch: 19 Global Step: 200450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:35:27,540-Speed 5959.53 samples/sec Loss 1.5813 LearningRate 0.0005 Epoch: 19 Global Step: 200460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:35:34,381-Speed 5988.65 samples/sec Loss 1.5833 LearningRate 0.0005 Epoch: 19 Global Step: 200470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:35:41,228-Speed 5982.92 samples/sec Loss 1.5454 LearningRate 0.0005 Epoch: 19 Global Step: 200480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:35:48,068-Speed 5989.82 samples/sec Loss 1.5714 LearningRate 0.0005 Epoch: 19 Global Step: 200490 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-01-09 11:35:54,927-Speed 5972.96 samples/sec Loss 1.5702 LearningRate 0.0005 Epoch: 19 Global Step: 200500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:01,785-Speed 5973.43 samples/sec Loss 1.5730 LearningRate 0.0005 Epoch: 19 Global Step: 200510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:08,634-Speed 5982.60 samples/sec Loss 1.5530 LearningRate 0.0005 Epoch: 19 Global Step: 200520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:15,561-Speed 5913.71 samples/sec Loss 1.5444 LearningRate 0.0005 Epoch: 19 Global Step: 200530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:22,438-Speed 5958.73 samples/sec Loss 1.5966 LearningRate 0.0005 Epoch: 19 Global Step: 200540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:29,290-Speed 5980.61 samples/sec Loss 1.5792 LearningRate 0.0005 Epoch: 19 Global Step: 200550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:36,129-Speed 5989.85 samples/sec Loss 1.5595 LearningRate 0.0005 Epoch: 19 Global Step: 200560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:42,995-Speed 5966.69 samples/sec Loss 1.5571 LearningRate 0.0005 Epoch: 19 Global Step: 200570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:49,859-Speed 5968.58 samples/sec Loss 1.5724 LearningRate 0.0005 Epoch: 19 Global Step: 200580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:36:56,702-Speed 5987.86 samples/sec Loss 1.5849 LearningRate 0.0005 Epoch: 19 Global Step: 200590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:37:03,546-Speed 5987.17 samples/sec Loss 1.5743 LearningRate 0.0005 Epoch: 19 Global Step: 200600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:37:10,396-Speed 5980.42 samples/sec Loss 1.5617 LearningRate 0.0005 Epoch: 19 Global Step: 200610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
11:37:17,234-Speed 5990.89 samples/sec Loss 1.5659 LearningRate 0.0005 Epoch: 19 Global Step: 200620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:37:24,200-Speed 5881.09 samples/sec Loss 1.5880 LearningRate 0.0005 Epoch: 19 Global Step: 200630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:37:31,089-Speed 5947.20 samples/sec Loss 1.5568 LearningRate 0.0005 Epoch: 19 Global Step: 200640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:37:37,923-Speed 5994.37 samples/sec Loss 1.5297 LearningRate 0.0005 Epoch: 19 Global Step: 200650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:37:44,772-Speed 5981.17 samples/sec Loss 1.5672 LearningRate 0.0005 Epoch: 19 Global Step: 200660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:37:51,618-Speed 5984.95 samples/sec Loss 1.5544 LearningRate 0.0005 Epoch: 19 Global Step: 200670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:37:58,476-Speed 5974.02 samples/sec Loss 1.5379 LearningRate 0.0005 Epoch: 19 Global Step: 200680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:38:05,330-Speed 5976.44 samples/sec Loss 1.5624 LearningRate 0.0005 Epoch: 19 Global Step: 200690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:38:12,173-Speed 5986.90 samples/sec Loss 1.5590 LearningRate 0.0005 Epoch: 19 Global Step: 200700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:38:19,626-Speed 5497.00 samples/sec Loss 1.5389 LearningRate 0.0005 Epoch: 19 Global Step: 200710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:38:26,487-Speed 5970.78 samples/sec Loss 1.5803 LearningRate 0.0005 Epoch: 19 Global Step: 200720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:38:33,340-Speed 5978.53 samples/sec Loss 1.5623 LearningRate 0.0005 Epoch: 19 Global Step: 200730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:38:40,186-Speed 5983.55 samples/sec Loss 
1.5692 LearningRate 0.0005 Epoch: 19 Global Step: 200740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:38:47,077-Speed 5945.47 samples/sec Loss 1.5585 LearningRate 0.0005 Epoch: 19 Global Step: 200750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:38:53,918-Speed 5988.79 samples/sec Loss 1.5553 LearningRate 0.0005 Epoch: 19 Global Step: 200760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:39:00,761-Speed 5987.05 samples/sec Loss 1.5234 LearningRate 0.0005 Epoch: 19 Global Step: 200770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:39:07,618-Speed 5974.20 samples/sec Loss 1.5599 LearningRate 0.0005 Epoch: 19 Global Step: 200780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:39:14,468-Speed 5981.36 samples/sec Loss 1.5655 LearningRate 0.0005 Epoch: 19 Global Step: 200790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:39:21,319-Speed 5979.25 samples/sec Loss 1.5633 LearningRate 0.0005 Epoch: 19 Global Step: 200800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:39:28,168-Speed 5981.47 samples/sec Loss 1.5941 LearningRate 0.0005 Epoch: 19 Global Step: 200810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:39:35,010-Speed 5987.26 samples/sec Loss 1.5557 LearningRate 0.0005 Epoch: 19 Global Step: 200820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:39:41,861-Speed 5980.36 samples/sec Loss 1.5815 LearningRate 0.0005 Epoch: 19 Global Step: 200830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:39:48,724-Speed 5968.65 samples/sec Loss 1.5635 LearningRate 0.0005 Epoch: 19 Global Step: 200840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:39:55,561-Speed 5991.86 samples/sec Loss 1.5556 LearningRate 0.0005 Epoch: 19 Global Step: 200850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:40:02,433-Speed 5961.76 samples/sec Loss 1.5592 LearningRate 0.0005 Epoch: 19 Global 
Step: 200860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:40:09,305-Speed 5962.36 samples/sec Loss 1.5643 LearningRate 0.0005 Epoch: 19 Global Step: 200870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:40:16,155-Speed 5980.56 samples/sec Loss 1.5534 LearningRate 0.0005 Epoch: 19 Global Step: 200880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:40:23,022-Speed 5965.96 samples/sec Loss 1.5224 LearningRate 0.0005 Epoch: 19 Global Step: 200890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:40:29,876-Speed 5977.35 samples/sec Loss 1.5536 LearningRate 0.0005 Epoch: 19 Global Step: 200900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:40:36,727-Speed 5979.10 samples/sec Loss 1.5801 LearningRate 0.0005 Epoch: 19 Global Step: 200910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:40:43,577-Speed 5980.56 samples/sec Loss 1.5567 LearningRate 0.0005 Epoch: 19 Global Step: 200920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:40:50,423-Speed 5986.82 samples/sec Loss 1.5447 LearningRate 0.0005 Epoch: 19 Global Step: 200930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:40:57,266-Speed 5986.05 samples/sec Loss 1.5471 LearningRate 0.0005 Epoch: 19 Global Step: 200940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:41:04,122-Speed 5975.83 samples/sec Loss 1.5602 LearningRate 0.0005 Epoch: 19 Global Step: 200950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:41:10,993-Speed 5962.42 samples/sec Loss 1.5483 LearningRate 0.0005 Epoch: 19 Global Step: 200960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:41:17,866-Speed 5961.27 samples/sec Loss 1.5587 LearningRate 0.0005 Epoch: 19 Global Step: 200970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:41:24,737-Speed 5962.61 samples/sec Loss 1.5622 LearningRate 0.0005 Epoch: 19 Global Step: 200980 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-09 11:41:31,619-Speed 5952.46 samples/sec Loss 1.5248 LearningRate 0.0005 Epoch: 19 Global Step: 200990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:41:38,483-Speed 5968.28 samples/sec Loss 1.5479 LearningRate 0.0005 Epoch: 19 Global Step: 201000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:41:45,338-Speed 5976.81 samples/sec Loss 1.6080 LearningRate 0.0005 Epoch: 19 Global Step: 201010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:41:52,189-Speed 5978.98 samples/sec Loss 1.5522 LearningRate 0.0005 Epoch: 19 Global Step: 201020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:41:59,096-Speed 5933.79 samples/sec Loss 1.5434 LearningRate 0.0005 Epoch: 19 Global Step: 201030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:42:05,999-Speed 5934.77 samples/sec Loss 1.5972 LearningRate 0.0005 Epoch: 19 Global Step: 201040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:42:12,870-Speed 5962.23 samples/sec Loss 1.5906 LearningRate 0.0005 Epoch: 19 Global Step: 201050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:42:19,843-Speed 5875.74 samples/sec Loss 1.5255 LearningRate 0.0005 Epoch: 19 Global Step: 201060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:42:26,700-Speed 5974.15 samples/sec Loss 1.5712 LearningRate 0.0005 Epoch: 19 Global Step: 201070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:42:33,550-Speed 5981.08 samples/sec Loss 1.5479 LearningRate 0.0005 Epoch: 19 Global Step: 201080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:42:40,401-Speed 5979.69 samples/sec Loss 1.5414 LearningRate 0.0005 Epoch: 19 Global Step: 201090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:42:47,260-Speed 5973.18 samples/sec Loss 1.5431 LearningRate 0.0005 Epoch: 19 Global Step: 201100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
11:42:54,110-Speed 5980.80 samples/sec Loss 1.5211 LearningRate 0.0005 Epoch: 19 Global Step: 201110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:43:00,964-Speed 5976.06 samples/sec Loss 1.5582 LearningRate 0.0004 Epoch: 19 Global Step: 201120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:43:07,817-Speed 5980.67 samples/sec Loss 1.5381 LearningRate 0.0004 Epoch: 19 Global Step: 201130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:43:14,682-Speed 5967.00 samples/sec Loss 1.5792 LearningRate 0.0004 Epoch: 19 Global Step: 201140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:43:21,532-Speed 5981.06 samples/sec Loss 1.5576 LearningRate 0.0004 Epoch: 19 Global Step: 201150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:43:28,410-Speed 5956.32 samples/sec Loss 1.5237 LearningRate 0.0004 Epoch: 19 Global Step: 201160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:43:35,268-Speed 5974.00 samples/sec Loss 1.5392 LearningRate 0.0004 Epoch: 19 Global Step: 201170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:43:42,142-Speed 5961.93 samples/sec Loss 1.5471 LearningRate 0.0004 Epoch: 19 Global Step: 201180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:43:49,014-Speed 5961.51 samples/sec Loss 1.5446 LearningRate 0.0004 Epoch: 19 Global Step: 201190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:43:55,876-Speed 5970.08 samples/sec Loss 1.5452 LearningRate 0.0004 Epoch: 19 Global Step: 201200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:44:02,745-Speed 5964.39 samples/sec Loss 1.5706 LearningRate 0.0004 Epoch: 19 Global Step: 201210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:44:09,604-Speed 5972.89 samples/sec Loss 1.5558 LearningRate 0.0004 Epoch: 19 Global Step: 201220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:44:16,476-Speed 5962.49 samples/sec Loss 
1.5433 LearningRate 0.0004 Epoch: 19 Global Step: 201230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:44:23,361-Speed 5950.75 samples/sec Loss 1.5202 LearningRate 0.0004 Epoch: 19 Global Step: 201240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:44:30,217-Speed 5975.01 samples/sec Loss 1.5581 LearningRate 0.0004 Epoch: 19 Global Step: 201250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:44:43,152-Speed 3167.04 samples/sec Loss 1.5319 LearningRate 0.0004 Epoch: 19 Global Step: 201260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:44:50,002-Speed 5980.20 samples/sec Loss 1.5551 LearningRate 0.0004 Epoch: 19 Global Step: 201270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:44:56,839-Speed 5991.85 samples/sec Loss 1.5476 LearningRate 0.0004 Epoch: 19 Global Step: 201280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:03,705-Speed 5966.90 samples/sec Loss 1.5684 LearningRate 0.0004 Epoch: 19 Global Step: 201290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:10,574-Speed 5964.50 samples/sec Loss 1.5439 LearningRate 0.0004 Epoch: 19 Global Step: 201300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:17,424-Speed 5981.07 samples/sec Loss 1.5293 LearningRate 0.0004 Epoch: 19 Global Step: 201310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:24,283-Speed 5972.93 samples/sec Loss 1.5555 LearningRate 0.0004 Epoch: 19 Global Step: 201320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:31,132-Speed 5981.68 samples/sec Loss 1.5452 LearningRate 0.0004 Epoch: 19 Global Step: 201330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:38,010-Speed 5955.94 samples/sec Loss 1.5637 LearningRate 0.0004 Epoch: 19 Global Step: 201340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:44,876-Speed 5967.37 samples/sec Loss 1.5534 LearningRate 0.0004 Epoch: 19 Global 
Step: 201350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:51,750-Speed 5959.61 samples/sec Loss 1.5630 LearningRate 0.0004 Epoch: 19 Global Step: 201360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:45:58,614-Speed 5968.20 samples/sec Loss 1.5422 LearningRate 0.0004 Epoch: 19 Global Step: 201370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:46:05,513-Speed 5938.51 samples/sec Loss 1.5436 LearningRate 0.0004 Epoch: 19 Global Step: 201380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:46:12,386-Speed 5960.72 samples/sec Loss 1.5461 LearningRate 0.0004 Epoch: 19 Global Step: 201390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:46:19,254-Speed 5965.69 samples/sec Loss 1.5575 LearningRate 0.0004 Epoch: 19 Global Step: 201400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:46:26,131-Speed 5957.91 samples/sec Loss 1.5623 LearningRate 0.0004 Epoch: 19 Global Step: 201410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:46:33,010-Speed 5955.92 samples/sec Loss 1.5536 LearningRate 0.0004 Epoch: 19 Global Step: 201420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:46:39,878-Speed 5964.85 samples/sec Loss 1.5608 LearningRate 0.0004 Epoch: 19 Global Step: 201430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:46:46,753-Speed 5959.54 samples/sec Loss 1.5596 LearningRate 0.0004 Epoch: 19 Global Step: 201440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:46:53,626-Speed 5961.25 samples/sec Loss 1.5548 LearningRate 0.0004 Epoch: 19 Global Step: 201450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:47:00,500-Speed 5959.82 samples/sec Loss 1.5286 LearningRate 0.0004 Epoch: 19 Global Step: 201460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:47:07,359-Speed 5972.46 samples/sec Loss 1.5300 LearningRate 0.0004 Epoch: 19 Global Step: 201470 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 11:47:14,216-Speed 5975.10 samples/sec Loss 1.5500 LearningRate 0.0004 Epoch: 19 Global Step: 201480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:47:21,088-Speed 5964.30 samples/sec Loss 1.5398 LearningRate 0.0004 Epoch: 19 Global Step: 201490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:47:27,943-Speed 5975.92 samples/sec Loss 1.5286 LearningRate 0.0004 Epoch: 19 Global Step: 201500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:47:34,807-Speed 5967.87 samples/sec Loss 1.5434 LearningRate 0.0004 Epoch: 19 Global Step: 201510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:47:41,659-Speed 5979.47 samples/sec Loss 1.5383 LearningRate 0.0004 Epoch: 19 Global Step: 201520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:47:48,573-Speed 5925.03 samples/sec Loss 1.5545 LearningRate 0.0004 Epoch: 19 Global Step: 201530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:47:55,453-Speed 5954.09 samples/sec Loss 1.5532 LearningRate 0.0004 Epoch: 19 Global Step: 201540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:48:02,306-Speed 5978.39 samples/sec Loss 1.5331 LearningRate 0.0004 Epoch: 19 Global Step: 201550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:48:09,177-Speed 5962.68 samples/sec Loss 1.5294 LearningRate 0.0004 Epoch: 19 Global Step: 201560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:48:16,068-Speed 5945.37 samples/sec Loss 1.5263 LearningRate 0.0004 Epoch: 19 Global Step: 201570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:48:22,920-Speed 5978.51 samples/sec Loss 1.5534 LearningRate 0.0004 Epoch: 19 Global Step: 201580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:48:29,787-Speed 5966.25 samples/sec Loss 1.5146 LearningRate 0.0004 Epoch: 19 Global Step: 201590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 
11:48:36,652-Speed 5967.07 samples/sec Loss 1.5663 LearningRate 0.0004 Epoch: 19 Global Step: 201600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:48:43,508-Speed 5975.66 samples/sec Loss 1.5445 LearningRate 0.0004 Epoch: 19 Global Step: 201610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:48:50,369-Speed 5971.01 samples/sec Loss 1.5455 LearningRate 0.0004 Epoch: 19 Global Step: 201620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:48:57,267-Speed 5938.77 samples/sec Loss 1.5470 LearningRate 0.0004 Epoch: 19 Global Step: 201630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:49:04,135-Speed 5965.75 samples/sec Loss 1.5358 LearningRate 0.0004 Epoch: 19 Global Step: 201640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:49:10,987-Speed 5979.24 samples/sec Loss 1.5588 LearningRate 0.0004 Epoch: 19 Global Step: 201650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:49:17,871-Speed 5950.48 samples/sec Loss 1.5400 LearningRate 0.0004 Epoch: 19 Global Step: 201660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:49:24,740-Speed 5964.43 samples/sec Loss 1.5743 LearningRate 0.0004 Epoch: 19 Global Step: 201670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:49:31,603-Speed 5972.07 samples/sec Loss 1.5408 LearningRate 0.0004 Epoch: 19 Global Step: 201680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:49:38,466-Speed 5968.84 samples/sec Loss 1.5410 LearningRate 0.0004 Epoch: 19 Global Step: 201690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:49:45,326-Speed 5972.74 samples/sec Loss 1.5486 LearningRate 0.0004 Epoch: 19 Global Step: 201700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:49:52,182-Speed 5974.80 samples/sec Loss 1.5522 LearningRate 0.0004 Epoch: 19 Global Step: 201710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:49:59,041-Speed 5973.06 samples/sec Loss 
1.5306 LearningRate 0.0004 Epoch: 19 Global Step: 201720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:50:05,934-Speed 5943.36 samples/sec Loss 1.5514 LearningRate 0.0004 Epoch: 19 Global Step: 201730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:50:12,810-Speed 5959.22 samples/sec Loss 1.5394 LearningRate 0.0004 Epoch: 19 Global Step: 201740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:50:19,681-Speed 5963.16 samples/sec Loss 1.5195 LearningRate 0.0004 Epoch: 19 Global Step: 201750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:50:26,561-Speed 5953.77 samples/sec Loss 1.5446 LearningRate 0.0004 Epoch: 19 Global Step: 201760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:50:33,414-Speed 5978.17 samples/sec Loss 1.5427 LearningRate 0.0004 Epoch: 19 Global Step: 201770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:50:40,271-Speed 5974.70 samples/sec Loss 1.5429 LearningRate 0.0004 Epoch: 19 Global Step: 201780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:50:47,135-Speed 5967.58 samples/sec Loss 1.4964 LearningRate 0.0004 Epoch: 19 Global Step: 201790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:50:54,008-Speed 5960.97 samples/sec Loss 1.5073 LearningRate 0.0004 Epoch: 19 Global Step: 201800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:51:00,873-Speed 5967.55 samples/sec Loss 1.5584 LearningRate 0.0004 Epoch: 19 Global Step: 201810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:51:07,800-Speed 5915.04 samples/sec Loss 1.5222 LearningRate 0.0004 Epoch: 19 Global Step: 201820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:51:14,742-Speed 5901.79 samples/sec Loss 1.5314 LearningRate 0.0004 Epoch: 19 Global Step: 201830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:51:21,620-Speed 5958.02 samples/sec Loss 1.5260 LearningRate 0.0004 Epoch: 19 Global 
Step: 201840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:51:28,486-Speed 5966.45 samples/sec Loss 1.5417 LearningRate 0.0004 Epoch: 19 Global Step: 201850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:51:35,354-Speed 5965.22 samples/sec Loss 1.5564 LearningRate 0.0003 Epoch: 19 Global Step: 201860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:51:42,228-Speed 5959.10 samples/sec Loss 1.5367 LearningRate 0.0003 Epoch: 19 Global Step: 201870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:51:49,104-Speed 5959.00 samples/sec Loss 1.5189 LearningRate 0.0003 Epoch: 19 Global Step: 201880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:51:55,960-Speed 5975.70 samples/sec Loss 1.5356 LearningRate 0.0003 Epoch: 19 Global Step: 201890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:52:02,826-Speed 5966.27 samples/sec Loss 1.5238 LearningRate 0.0003 Epoch: 19 Global Step: 201900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:52:09,697-Speed 5962.65 samples/sec Loss 1.5586 LearningRate 0.0003 Epoch: 19 Global Step: 201910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:52:16,549-Speed 5978.70 samples/sec Loss 1.5233 LearningRate 0.0003 Epoch: 19 Global Step: 201920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:52:23,422-Speed 5962.35 samples/sec Loss 1.5144 LearningRate 0.0003 Epoch: 19 Global Step: 201930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:52:30,284-Speed 5970.79 samples/sec Loss 1.5496 LearningRate 0.0003 Epoch: 19 Global Step: 201940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:52:37,167-Speed 5951.99 samples/sec Loss 1.5588 LearningRate 0.0003 Epoch: 19 Global Step: 201950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:52:44,037-Speed 5963.04 samples/sec Loss 1.5506 LearningRate 0.0003 Epoch: 19 Global Step: 201960 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 11:52:50,916-Speed 5956.05 samples/sec Loss 1.5374 LearningRate 0.0003 Epoch: 19 Global Step: 201970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:52:57,776-Speed 5973.34 samples/sec Loss 1.5374 LearningRate 0.0003 Epoch: 19 Global Step: 201980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:53:04,635-Speed 5972.33 samples/sec Loss 1.5535 LearningRate 0.0003 Epoch: 19 Global Step: 201990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:53:11,493-Speed 5974.53 samples/sec Loss 1.5464 LearningRate 0.0003 Epoch: 19 Global Step: 202000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:53:18,380-Speed 5948.76 samples/sec Loss 1.5480 LearningRate 0.0003 Epoch: 19 Global Step: 202010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:53:25,241-Speed 5972.56 samples/sec Loss 1.5254 LearningRate 0.0003 Epoch: 19 Global Step: 202020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:53:32,093-Speed 5978.68 samples/sec Loss 1.5417 LearningRate 0.0003 Epoch: 19 Global Step: 202030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:53:38,949-Speed 5975.65 samples/sec Loss 1.5100 LearningRate 0.0003 Epoch: 19 Global Step: 202040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:53:45,808-Speed 5972.93 samples/sec Loss 1.5183 LearningRate 0.0003 Epoch: 19 Global Step: 202050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:53:52,679-Speed 5962.46 samples/sec Loss 1.5351 LearningRate 0.0003 Epoch: 19 Global Step: 202060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:53:59,566-Speed 5948.64 samples/sec Loss 1.5371 LearningRate 0.0003 Epoch: 19 Global Step: 202070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:54:06,411-Speed 5984.65 samples/sec Loss 1.5468 LearningRate 0.0003 Epoch: 19 Global Step: 202080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
11:54:13,269-Speed 5974.30 samples/sec Loss 1.5432 LearningRate 0.0003 Epoch: 19 Global Step: 202090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:54:20,120-Speed 5982.81 samples/sec Loss 1.5234 LearningRate 0.0003 Epoch: 19 Global Step: 202100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:54:26,973-Speed 5979.33 samples/sec Loss 1.5205 LearningRate 0.0003 Epoch: 19 Global Step: 202110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:54:33,837-Speed 5968.46 samples/sec Loss 1.5388 LearningRate 0.0003 Epoch: 19 Global Step: 202120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:54:40,742-Speed 5932.97 samples/sec Loss 1.5525 LearningRate 0.0003 Epoch: 19 Global Step: 202130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:54:47,598-Speed 5975.15 samples/sec Loss 1.5519 LearningRate 0.0003 Epoch: 19 Global Step: 202140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:54:54,461-Speed 5971.63 samples/sec Loss 1.5216 LearningRate 0.0003 Epoch: 19 Global Step: 202150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:01,352-Speed 5944.98 samples/sec Loss 1.5020 LearningRate 0.0003 Epoch: 19 Global Step: 202160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:08,214-Speed 5969.98 samples/sec Loss 1.5600 LearningRate 0.0003 Epoch: 19 Global Step: 202170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:15,074-Speed 5972.09 samples/sec Loss 1.5073 LearningRate 0.0003 Epoch: 19 Global Step: 202180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:21,918-Speed 5985.24 samples/sec Loss 1.5408 LearningRate 0.0003 Epoch: 19 Global Step: 202190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:28,790-Speed 5962.45 samples/sec Loss 1.5178 LearningRate 0.0003 Epoch: 19 Global Step: 202200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:35,668-Speed 5955.80 samples/sec Loss 
1.5133 LearningRate 0.0003 Epoch: 19 Global Step: 202210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:42,526-Speed 5976.12 samples/sec Loss 1.5348 LearningRate 0.0003 Epoch: 19 Global Step: 202220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:49,379-Speed 5978.39 samples/sec Loss 1.5357 LearningRate 0.0003 Epoch: 19 Global Step: 202230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:55:56,255-Speed 5958.09 samples/sec Loss 1.5569 LearningRate 0.0003 Epoch: 19 Global Step: 202240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:56:03,115-Speed 5971.89 samples/sec Loss 1.5426 LearningRate 0.0003 Epoch: 19 Global Step: 202250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:56:09,971-Speed 5975.96 samples/sec Loss 1.5239 LearningRate 0.0003 Epoch: 19 Global Step: 202260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:56:16,837-Speed 5966.51 samples/sec Loss 1.5066 LearningRate 0.0003 Epoch: 19 Global Step: 202270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:56:23,681-Speed 5985.44 samples/sec Loss 1.4980 LearningRate 0.0003 Epoch: 19 Global Step: 202280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:56:30,532-Speed 5980.29 samples/sec Loss 1.5315 LearningRate 0.0003 Epoch: 19 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:56:37,405-Speed 5960.54 samples/sec Loss 1.5401 LearningRate 0.0003 Epoch: 19 Global Step: 202300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:56:44,269-Speed 5968.09 samples/sec Loss 1.5424 LearningRate 0.0003 Epoch: 19 Global Step: 202310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:56:51,147-Speed 5956.62 samples/sec Loss 1.5409 LearningRate 0.0003 Epoch: 19 Global Step: 202320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:56:58,031-Speed 5954.27 samples/sec Loss 1.5247 LearningRate 0.0003 Epoch: 19 Global 
Step: 202330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:57:04,890-Speed 5972.13 samples/sec Loss 1.5379 LearningRate 0.0003 Epoch: 19 Global Step: 202340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:57:11,766-Speed 5958.15 samples/sec Loss 1.5259 LearningRate 0.0003 Epoch: 19 Global Step: 202350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:57:18,631-Speed 5967.82 samples/sec Loss 1.5507 LearningRate 0.0003 Epoch: 19 Global Step: 202360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:57:25,492-Speed 5972.05 samples/sec Loss 1.5298 LearningRate 0.0003 Epoch: 19 Global Step: 202370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:57:32,349-Speed 5974.50 samples/sec Loss 1.5476 LearningRate 0.0003 Epoch: 19 Global Step: 202380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:57:39,216-Speed 5966.21 samples/sec Loss 1.5526 LearningRate 0.0003 Epoch: 19 Global Step: 202390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:57:46,076-Speed 5971.55 samples/sec Loss 1.5053 LearningRate 0.0003 Epoch: 19 Global Step: 202400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:57:52,943-Speed 5968.10 samples/sec Loss 1.5185 LearningRate 0.0003 Epoch: 19 Global Step: 202410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:57:59,811-Speed 5965.42 samples/sec Loss 1.5307 LearningRate 0.0003 Epoch: 19 Global Step: 202420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:58:06,657-Speed 5984.53 samples/sec Loss 1.5256 LearningRate 0.0003 Epoch: 19 Global Step: 202430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:58:13,513-Speed 5975.18 samples/sec Loss 1.5256 LearningRate 0.0003 Epoch: 19 Global Step: 202440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:58:20,375-Speed 5970.54 samples/sec Loss 1.5236 LearningRate 0.0003 Epoch: 19 Global Step: 202450 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 11:58:27,225-Speed 5980.94 samples/sec Loss 1.5322 LearningRate 0.0003 Epoch: 19 Global Step: 202460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:58:34,096-Speed 5962.59 samples/sec Loss 1.5362 LearningRate 0.0003 Epoch: 19 Global Step: 202470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:58:40,954-Speed 5973.42 samples/sec Loss 1.5387 LearningRate 0.0003 Epoch: 19 Global Step: 202480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:58:47,832-Speed 5956.20 samples/sec Loss 1.5388 LearningRate 0.0003 Epoch: 19 Global Step: 202490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:58:54,698-Speed 5967.71 samples/sec Loss 1.5322 LearningRate 0.0003 Epoch: 19 Global Step: 202500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:59:01,555-Speed 5974.12 samples/sec Loss 1.5117 LearningRate 0.0003 Epoch: 19 Global Step: 202510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:59:08,436-Speed 5953.66 samples/sec Loss 1.5229 LearningRate 0.0003 Epoch: 19 Global Step: 202520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:59:15,294-Speed 5976.04 samples/sec Loss 1.5357 LearningRate 0.0003 Epoch: 19 Global Step: 202530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:59:22,179-Speed 5950.82 samples/sec Loss 1.5472 LearningRate 0.0003 Epoch: 19 Global Step: 202540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 11:59:29,020-Speed 5988.48 samples/sec Loss 1.5004 LearningRate 0.0003 Epoch: 19 Global Step: 202550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:59:35,863-Speed 5986.86 samples/sec Loss 1.5094 LearningRate 0.0003 Epoch: 19 Global Step: 202560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:59:42,740-Speed 5957.49 samples/sec Loss 1.5530 LearningRate 0.0003 Epoch: 19 Global Step: 202570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
11:59:49,587-Speed 5982.59 samples/sec Loss 1.5328 LearningRate 0.0003 Epoch: 19 Global Step: 202580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 11:59:56,427-Speed 5989.29 samples/sec Loss 1.5359 LearningRate 0.0003 Epoch: 19 Global Step: 202590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:00:03,277-Speed 5980.69 samples/sec Loss 1.5245 LearningRate 0.0003 Epoch: 19 Global Step: 202600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:00:10,138-Speed 5970.66 samples/sec Loss 1.5307 LearningRate 0.0003 Epoch: 19 Global Step: 202610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:00:16,991-Speed 5978.51 samples/sec Loss 1.5247 LearningRate 0.0003 Epoch: 19 Global Step: 202620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:00:23,855-Speed 5967.62 samples/sec Loss 1.5350 LearningRate 0.0003 Epoch: 19 Global Step: 202630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:00:30,737-Speed 5953.35 samples/sec Loss 1.5305 LearningRate 0.0003 Epoch: 19 Global Step: 202640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:00:37,594-Speed 5974.40 samples/sec Loss 1.5286 LearningRate 0.0003 Epoch: 19 Global Step: 202650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:00:44,446-Speed 5978.81 samples/sec Loss 1.5409 LearningRate 0.0003 Epoch: 19 Global Step: 202660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:00:51,345-Speed 5938.23 samples/sec Loss 1.5309 LearningRate 0.0003 Epoch: 19 Global Step: 202670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:00:58,197-Speed 5978.95 samples/sec Loss 1.5681 LearningRate 0.0003 Epoch: 19 Global Step: 202680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:01:05,054-Speed 5974.75 samples/sec Loss 1.5209 LearningRate 0.0003 Epoch: 19 Global Step: 202690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:01:11,920-Speed 5966.48 samples/sec Loss 
1.4903 LearningRate 0.0003 Epoch: 19 Global Step: 202700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:01:18,782-Speed 5971.31 samples/sec Loss 1.5517 LearningRate 0.0003 Epoch: 19 Global Step: 202710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:01:25,644-Speed 5971.19 samples/sec Loss 1.5057 LearningRate 0.0002 Epoch: 19 Global Step: 202720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:01:32,514-Speed 5963.12 samples/sec Loss 1.5129 LearningRate 0.0002 Epoch: 19 Global Step: 202730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:01:39,388-Speed 5960.04 samples/sec Loss 1.5261 LearningRate 0.0002 Epoch: 19 Global Step: 202740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:01:46,256-Speed 5965.51 samples/sec Loss 1.5486 LearningRate 0.0002 Epoch: 19 Global Step: 202750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:01:53,134-Speed 5955.74 samples/sec Loss 1.5389 LearningRate 0.0002 Epoch: 19 Global Step: 202760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:01:59,994-Speed 5971.90 samples/sec Loss 1.5490 LearningRate 0.0002 Epoch: 19 Global Step: 202770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:02:06,915-Speed 5919.58 samples/sec Loss 1.4950 LearningRate 0.0002 Epoch: 19 Global Step: 202780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:02:13,776-Speed 5970.51 samples/sec Loss 1.4944 LearningRate 0.0002 Epoch: 19 Global Step: 202790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:02:20,622-Speed 5984.39 samples/sec Loss 1.5272 LearningRate 0.0002 Epoch: 19 Global Step: 202800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:02:27,489-Speed 5965.78 samples/sec Loss 1.5141 LearningRate 0.0002 Epoch: 19 Global Step: 202810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:02:34,331-Speed 5987.80 samples/sec Loss 1.5160 LearningRate 0.0002 Epoch: 19 Global 
Step: 202820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:02:41,191-Speed 5972.14 samples/sec Loss 1.5233 LearningRate 0.0002 Epoch: 19 Global Step: 202830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:02:48,060-Speed 5964.23 samples/sec Loss 1.5131 LearningRate 0.0002 Epoch: 19 Global Step: 202840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:02:54,916-Speed 5974.68 samples/sec Loss 1.5201 LearningRate 0.0002 Epoch: 19 Global Step: 202850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:03:01,831-Speed 5924.26 samples/sec Loss 1.5095 LearningRate 0.0002 Epoch: 19 Global Step: 202860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:03:08,790-Speed 5887.46 samples/sec Loss 1.5130 LearningRate 0.0002 Epoch: 19 Global Step: 202870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:03:15,651-Speed 5970.93 samples/sec Loss 1.5174 LearningRate 0.0002 Epoch: 19 Global Step: 202880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:03:22,506-Speed 5976.84 samples/sec Loss 1.5331 LearningRate 0.0002 Epoch: 19 Global Step: 202890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:03:29,358-Speed 5978.72 samples/sec Loss 1.5039 LearningRate 0.0002 Epoch: 19 Global Step: 202900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:03:36,241-Speed 5951.62 samples/sec Loss 1.5160 LearningRate 0.0002 Epoch: 19 Global Step: 202910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:03:43,083-Speed 5987.32 samples/sec Loss 1.5097 LearningRate 0.0002 Epoch: 19 Global Step: 202920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:03:49,932-Speed 5982.09 samples/sec Loss 1.5315 LearningRate 0.0002 Epoch: 19 Global Step: 202930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:03:56,791-Speed 5972.96 samples/sec Loss 1.5306 LearningRate 0.0002 Epoch: 19 Global Step: 202940 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 12:04:03,657-Speed 5966.89 samples/sec Loss 1.5467 LearningRate 0.0002 Epoch: 19 Global Step: 202950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:04:10,518-Speed 5970.25 samples/sec Loss 1.5220 LearningRate 0.0002 Epoch: 19 Global Step: 202960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:04:22,907-Speed 3306.57 samples/sec Loss 1.5075 LearningRate 0.0002 Epoch: 19 Global Step: 202970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:04:29,739-Speed 5996.68 samples/sec Loss 1.4997 LearningRate 0.0002 Epoch: 19 Global Step: 202980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:04:36,596-Speed 5974.63 samples/sec Loss 1.5179 LearningRate 0.0002 Epoch: 19 Global Step: 202990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:04:43,475-Speed 5954.92 samples/sec Loss 1.4868 LearningRate 0.0002 Epoch: 19 Global Step: 203000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:04:50,314-Speed 5990.11 samples/sec Loss 1.4905 LearningRate 0.0002 Epoch: 19 Global Step: 203010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:04:57,161-Speed 5983.84 samples/sec Loss 1.5195 LearningRate 0.0002 Epoch: 19 Global Step: 203020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:05:04,025-Speed 5968.59 samples/sec Loss 1.5363 LearningRate 0.0002 Epoch: 19 Global Step: 203030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:05:10,870-Speed 5985.01 samples/sec Loss 1.5158 LearningRate 0.0002 Epoch: 19 Global Step: 203040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:05:17,728-Speed 5973.90 samples/sec Loss 1.5079 LearningRate 0.0002 Epoch: 19 Global Step: 203050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:05:24,601-Speed 5960.29 samples/sec Loss 1.5179 LearningRate 0.0002 Epoch: 19 Global Step: 203060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
12:05:31,502-Speed 5936.77 samples/sec Loss 1.5369 LearningRate 0.0002 Epoch: 19 Global Step: 203070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:05:38,348-Speed 5984.52 samples/sec Loss 1.5324 LearningRate 0.0002 Epoch: 19 Global Step: 203080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:05:45,242-Speed 5942.29 samples/sec Loss 1.5161 LearningRate 0.0002 Epoch: 19 Global Step: 203090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:05:52,097-Speed 5978.82 samples/sec Loss 1.5003 LearningRate 0.0002 Epoch: 19 Global Step: 203100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:05:58,941-Speed 5987.30 samples/sec Loss 1.5305 LearningRate 0.0002 Epoch: 19 Global Step: 203110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:06:05,814-Speed 5961.32 samples/sec Loss 1.5170 LearningRate 0.0002 Epoch: 19 Global Step: 203120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:06:12,687-Speed 5961.50 samples/sec Loss 1.5018 LearningRate 0.0002 Epoch: 19 Global Step: 203130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:06:19,575-Speed 5948.45 samples/sec Loss 1.5162 LearningRate 0.0002 Epoch: 19 Global Step: 203140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:06:26,461-Speed 5949.79 samples/sec Loss 1.5082 LearningRate 0.0002 Epoch: 19 Global Step: 203150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:06:33,338-Speed 5957.38 samples/sec Loss 1.5279 LearningRate 0.0002 Epoch: 19 Global Step: 203160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:06:40,194-Speed 5975.82 samples/sec Loss 1.5327 LearningRate 0.0002 Epoch: 19 Global Step: 203170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:06:47,041-Speed 5983.54 samples/sec Loss 1.4960 LearningRate 0.0002 Epoch: 19 Global Step: 203180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:06:53,906-Speed 5967.74 samples/sec Loss 
1.4861 LearningRate 0.0002 Epoch: 19 Global Step: 203190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:00,768-Speed 5970.68 samples/sec Loss 1.5053 LearningRate 0.0002 Epoch: 19 Global Step: 203200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:07,617-Speed 5981.24 samples/sec Loss 1.5306 LearningRate 0.0002 Epoch: 19 Global Step: 203210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:14,475-Speed 5974.28 samples/sec Loss 1.5011 LearningRate 0.0002 Epoch: 19 Global Step: 203220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:21,327-Speed 5979.01 samples/sec Loss 1.5308 LearningRate 0.0002 Epoch: 19 Global Step: 203230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:28,183-Speed 5975.04 samples/sec Loss 1.4988 LearningRate 0.0002 Epoch: 19 Global Step: 203240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:35,032-Speed 5981.37 samples/sec Loss 1.4795 LearningRate 0.0002 Epoch: 19 Global Step: 203250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:41,899-Speed 5966.43 samples/sec Loss 1.5105 LearningRate 0.0002 Epoch: 19 Global Step: 203260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:48,749-Speed 5980.67 samples/sec Loss 1.5065 LearningRate 0.0002 Epoch: 19 Global Step: 203270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:07:55,601-Speed 5978.69 samples/sec Loss 1.5304 LearningRate 0.0002 Epoch: 19 Global Step: 203280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:08:02,495-Speed 5942.66 samples/sec Loss 1.5293 LearningRate 0.0002 Epoch: 19 Global Step: 203290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:08:09,371-Speed 5958.22 samples/sec Loss 1.5335 LearningRate 0.0002 Epoch: 19 Global Step: 203300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:08:16,225-Speed 5976.78 samples/sec Loss 1.4977 LearningRate 0.0002 Epoch: 19 Global 
Step: 203310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:08:23,068-Speed 5987.00 samples/sec Loss 1.5355 LearningRate 0.0002 Epoch: 19 Global Step: 203320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:08:29,932-Speed 5968.43 samples/sec Loss 1.5205 LearningRate 0.0002 Epoch: 19 Global Step: 203330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:08:36,784-Speed 5978.84 samples/sec Loss 1.4926 LearningRate 0.0002 Epoch: 19 Global Step: 203340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:08:43,629-Speed 5985.53 samples/sec Loss 1.5427 LearningRate 0.0002 Epoch: 19 Global Step: 203350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:08:50,487-Speed 5976.02 samples/sec Loss 1.5280 LearningRate 0.0002 Epoch: 19 Global Step: 203360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:08:57,353-Speed 5966.55 samples/sec Loss 1.5002 LearningRate 0.0002 Epoch: 19 Global Step: 203370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:09:04,212-Speed 5972.73 samples/sec Loss 1.5248 LearningRate 0.0002 Epoch: 19 Global Step: 203380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:09:11,078-Speed 5967.08 samples/sec Loss 1.4954 LearningRate 0.0002 Epoch: 19 Global Step: 203390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:09:17,938-Speed 5972.02 samples/sec Loss 1.5439 LearningRate 0.0002 Epoch: 19 Global Step: 203400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:09:24,799-Speed 5975.57 samples/sec Loss 1.4995 LearningRate 0.0002 Epoch: 19 Global Step: 203410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:09:31,642-Speed 5986.22 samples/sec Loss 1.5210 LearningRate 0.0002 Epoch: 19 Global Step: 203420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:09:38,485-Speed 5989.46 samples/sec Loss 1.5035 LearningRate 0.0002 Epoch: 19 Global Step: 203430 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 12:09:45,360-Speed 5959.01 samples/sec Loss 1.5107 LearningRate 0.0002 Epoch: 19 Global Step: 203440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:09:52,225-Speed 5967.54 samples/sec Loss 1.5163 LearningRate 0.0002 Epoch: 19 Global Step: 203450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:09:59,076-Speed 5980.00 samples/sec Loss 1.4974 LearningRate 0.0002 Epoch: 19 Global Step: 203460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:10:05,951-Speed 5959.16 samples/sec Loss 1.5153 LearningRate 0.0002 Epoch: 19 Global Step: 203470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:10:12,809-Speed 5973.89 samples/sec Loss 1.5295 LearningRate 0.0002 Epoch: 19 Global Step: 203480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:10:19,663-Speed 5977.50 samples/sec Loss 1.5472 LearningRate 0.0002 Epoch: 19 Global Step: 203490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:10:26,544-Speed 5953.07 samples/sec Loss 1.5035 LearningRate 0.0002 Epoch: 19 Global Step: 203500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:10:33,439-Speed 5941.70 samples/sec Loss 1.5120 LearningRate 0.0002 Epoch: 19 Global Step: 203510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:10:40,309-Speed 5963.32 samples/sec Loss 1.4910 LearningRate 0.0002 Epoch: 19 Global Step: 203520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:10:47,163-Speed 5977.51 samples/sec Loss 1.5378 LearningRate 0.0002 Epoch: 19 Global Step: 203530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:10:54,013-Speed 5983.85 samples/sec Loss 1.5391 LearningRate 0.0002 Epoch: 19 Global Step: 203540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 12:11:00,880-Speed 5965.44 samples/sec Loss 1.4865 LearningRate 0.0002 Epoch: 19 Global Step: 203550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 
12:11:07,729-Speed 5981.09 samples/sec Loss 1.5008 LearningRate 0.0002 Epoch: 19 Global Step: 203560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:11:14,575-Speed 5984.57 samples/sec Loss 1.5128 LearningRate 0.0002 Epoch: 19 Global Step: 203570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:11:21,432-Speed 5974.21 samples/sec Loss 1.5023 LearningRate 0.0002 Epoch: 19 Global Step: 203580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:11:28,309-Speed 5956.86 samples/sec Loss 1.4914 LearningRate 0.0002 Epoch: 19 Global Step: 203590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:11:35,179-Speed 5963.26 samples/sec Loss 1.4950 LearningRate 0.0002 Epoch: 19 Global Step: 203600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:11:42,034-Speed 5976.79 samples/sec Loss 1.4940 LearningRate 0.0002 Epoch: 19 Global Step: 203610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:11:48,909-Speed 5959.12 samples/sec Loss 1.4983 LearningRate 0.0002 Epoch: 19 Global Step: 203620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:11:55,795-Speed 5948.87 samples/sec Loss 1.5241 LearningRate 0.0002 Epoch: 19 Global Step: 203630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:12:02,659-Speed 5968.81 samples/sec Loss 1.4978 LearningRate 0.0002 Epoch: 19 Global Step: 203640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:12:09,535-Speed 5958.54 samples/sec Loss 1.5268 LearningRate 0.0002 Epoch: 19 Global Step: 203650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:12:16,411-Speed 5958.14 samples/sec Loss 1.5069 LearningRate 0.0002 Epoch: 19 Global Step: 203660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:12:23,293-Speed 5953.16 samples/sec Loss 1.4930 LearningRate 0.0002 Epoch: 19 Global Step: 203670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:12:30,186-Speed 5943.17 samples/sec Loss 
1.5190 LearningRate 0.0002 Epoch: 19 Global Step: 203680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:12:37,047-Speed 5972.56 samples/sec Loss 1.5248 LearningRate 0.0002 Epoch: 19 Global Step: 203690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:12:43,898-Speed 5979.93 samples/sec Loss 1.5339 LearningRate 0.0002 Epoch: 19 Global Step: 203700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:12:50,766-Speed 5965.56 samples/sec Loss 1.5160 LearningRate 0.0002 Epoch: 19 Global Step: 203710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:12:57,619-Speed 5977.64 samples/sec Loss 1.5199 LearningRate 0.0002 Epoch: 19 Global Step: 203720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:13:04,461-Speed 5988.15 samples/sec Loss 1.5116 LearningRate 0.0002 Epoch: 19 Global Step: 203730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:13:11,324-Speed 5969.19 samples/sec Loss 1.5070 LearningRate 0.0002 Epoch: 19 Global Step: 203740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:13:18,172-Speed 5982.98 samples/sec Loss 1.5313 LearningRate 0.0002 Epoch: 19 Global Step: 203750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:13:25,019-Speed 5983.09 samples/sec Loss 1.4880 LearningRate 0.0002 Epoch: 19 Global Step: 203760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:13:31,891-Speed 5961.56 samples/sec Loss 1.5046 LearningRate 0.0001 Epoch: 19 Global Step: 203770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:13:38,760-Speed 5964.13 samples/sec Loss 1.4996 LearningRate 0.0001 Epoch: 19 Global Step: 203780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:13:45,614-Speed 5980.15 samples/sec Loss 1.4990 LearningRate 0.0001 Epoch: 19 Global Step: 203790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:13:52,455-Speed 5988.14 samples/sec Loss 1.5181 LearningRate 0.0001 Epoch: 19 Global 
Step: 203800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:13:59,336-Speed 5954.48 samples/sec Loss 1.5195 LearningRate 0.0001 Epoch: 19 Global Step: 203810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:14:06,186-Speed 5981.02 samples/sec Loss 1.4705 LearningRate 0.0001 Epoch: 19 Global Step: 203820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:14:13,039-Speed 5977.93 samples/sec Loss 1.5104 LearningRate 0.0001 Epoch: 19 Global Step: 203830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:14:19,919-Speed 5955.08 samples/sec Loss 1.5244 LearningRate 0.0001 Epoch: 19 Global Step: 203840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:14:26,788-Speed 5964.79 samples/sec Loss 1.4907 LearningRate 0.0001 Epoch: 19 Global Step: 203850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:14:33,656-Speed 5964.86 samples/sec Loss 1.5312 LearningRate 0.0001 Epoch: 19 Global Step: 203860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:14:40,515-Speed 5973.47 samples/sec Loss 1.5032 LearningRate 0.0001 Epoch: 19 Global Step: 203870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:14:47,391-Speed 5957.89 samples/sec Loss 1.5457 LearningRate 0.0001 Epoch: 19 Global Step: 203880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:14:54,244-Speed 5977.54 samples/sec Loss 1.4941 LearningRate 0.0001 Epoch: 19 Global Step: 203890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:15:01,073-Speed 5999.48 samples/sec Loss 1.5009 LearningRate 0.0001 Epoch: 19 Global Step: 203900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:15:07,921-Speed 5982.99 samples/sec Loss 1.5049 LearningRate 0.0001 Epoch: 19 Global Step: 203910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:15:14,773-Speed 5979.83 samples/sec Loss 1.5331 LearningRate 0.0001 Epoch: 19 Global Step: 203920 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 12:15:21,636-Speed 5969.67 samples/sec Loss 1.5104 LearningRate 0.0001 Epoch: 19 Global Step: 203930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:15:28,482-Speed 5984.33 samples/sec Loss 1.5001 LearningRate 0.0001 Epoch: 19 Global Step: 203940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:15:35,326-Speed 5985.91 samples/sec Loss 1.5024 LearningRate 0.0001 Epoch: 19 Global Step: 203950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:15:42,176-Speed 5980.87 samples/sec Loss 1.5260 LearningRate 0.0001 Epoch: 19 Global Step: 203960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:15:49,036-Speed 5971.83 samples/sec Loss 1.4960 LearningRate 0.0001 Epoch: 19 Global Step: 203970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:15:55,892-Speed 5975.39 samples/sec Loss 1.5067 LearningRate 0.0001 Epoch: 19 Global Step: 203980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:16:02,738-Speed 5983.93 samples/sec Loss 1.4776 LearningRate 0.0001 Epoch: 19 Global Step: 203990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:16:09,594-Speed 5975.66 samples/sec Loss 1.5173 LearningRate 0.0001 Epoch: 19 Global Step: 204000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:16:16,448-Speed 5979.55 samples/sec Loss 1.4861 LearningRate 0.0001 Epoch: 19 Global Step: 204010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:16:23,313-Speed 5966.63 samples/sec Loss 1.5274 LearningRate 0.0001 Epoch: 19 Global Step: 204020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:16:30,233-Speed 5920.34 samples/sec Loss 1.5157 LearningRate 0.0001 Epoch: 19 Global Step: 204030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:16:37,084-Speed 5980.72 samples/sec Loss 1.5331 LearningRate 0.0001 Epoch: 19 Global Step: 204040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 
12:16:43,943-Speed 5972.63 samples/sec Loss 1.5179 LearningRate 0.0001 Epoch: 19 Global Step: 204050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:16:50,793-Speed 5979.67 samples/sec Loss 1.4872 LearningRate 0.0001 Epoch: 19 Global Step: 204060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:16:57,647-Speed 5978.63 samples/sec Loss 1.5252 LearningRate 0.0001 Epoch: 19 Global Step: 204070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:04,491-Speed 5985.34 samples/sec Loss 1.5090 LearningRate 0.0001 Epoch: 19 Global Step: 204080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:11,337-Speed 5984.37 samples/sec Loss 1.4875 LearningRate 0.0001 Epoch: 19 Global Step: 204090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:18,175-Speed 5991.57 samples/sec Loss 1.5111 LearningRate 0.0001 Epoch: 19 Global Step: 204100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:25,040-Speed 5967.05 samples/sec Loss 1.5001 LearningRate 0.0001 Epoch: 19 Global Step: 204110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:31,896-Speed 5975.08 samples/sec Loss 1.4848 LearningRate 0.0001 Epoch: 19 Global Step: 204120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:38,749-Speed 5978.11 samples/sec Loss 1.5040 LearningRate 0.0001 Epoch: 19 Global Step: 204130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:45,673-Speed 5917.07 samples/sec Loss 1.5004 LearningRate 0.0001 Epoch: 19 Global Step: 204140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:52,528-Speed 5976.72 samples/sec Loss 1.5126 LearningRate 0.0001 Epoch: 19 Global Step: 204150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:17:59,377-Speed 5982.17 samples/sec Loss 1.4927 LearningRate 0.0001 Epoch: 19 Global Step: 204160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:18:06,235-Speed 5973.36 samples/sec Loss 
1.5143 LearningRate 0.0001 Epoch: 19 Global Step: 204170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:18:13,091-Speed 5976.27 samples/sec Loss 1.4911 LearningRate 0.0001 Epoch: 19 Global Step: 204180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:18:19,960-Speed 5964.31 samples/sec Loss 1.4917 LearningRate 0.0001 Epoch: 19 Global Step: 204190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:18:26,834-Speed 5960.10 samples/sec Loss 1.5101 LearningRate 0.0001 Epoch: 19 Global Step: 204200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:18:33,696-Speed 5969.29 samples/sec Loss 1.4865 LearningRate 0.0001 Epoch: 19 Global Step: 204210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:18:40,550-Speed 5977.65 samples/sec Loss 1.4758 LearningRate 0.0001 Epoch: 19 Global Step: 204220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:18:47,398-Speed 5982.25 samples/sec Loss 1.4888 LearningRate 0.0001 Epoch: 19 Global Step: 204230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:18:54,261-Speed 5969.70 samples/sec Loss 1.5213 LearningRate 0.0001 Epoch: 19 Global Step: 204240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:19:01,101-Speed 5989.19 samples/sec Loss 1.4992 LearningRate 0.0001 Epoch: 19 Global Step: 204250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:19:07,947-Speed 5983.52 samples/sec Loss 1.5425 LearningRate 0.0001 Epoch: 19 Global Step: 204260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:19:14,813-Speed 5967.20 samples/sec Loss 1.5011 LearningRate 0.0001 Epoch: 19 Global Step: 204270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:19:21,662-Speed 5981.17 samples/sec Loss 1.5106 LearningRate 0.0001 Epoch: 19 Global Step: 204280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:19:28,515-Speed 5978.36 samples/sec Loss 1.4941 LearningRate 0.0001 Epoch: 19 Global 
Step: 204290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:19:35,367-Speed 5978.99 samples/sec Loss 1.4900 LearningRate 0.0001 Epoch: 19 Global Step: 204300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:19:42,253-Speed 5950.30 samples/sec Loss 1.5095 LearningRate 0.0001 Epoch: 19 Global Step: 204310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:19:49,175-Speed 5918.52 samples/sec Loss 1.4808 LearningRate 0.0001 Epoch: 19 Global Step: 204320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:19:56,108-Speed 5908.79 samples/sec Loss 1.4662 LearningRate 0.0001 Epoch: 19 Global Step: 204330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:20:03,030-Speed 5918.59 samples/sec Loss 1.5052 LearningRate 0.0001 Epoch: 19 Global Step: 204340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:20:09,962-Speed 5910.16 samples/sec Loss 1.5010 LearningRate 0.0001 Epoch: 19 Global Step: 204350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:20:16,879-Speed 5924.33 samples/sec Loss 1.5020 LearningRate 0.0001 Epoch: 19 Global Step: 204360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:20:23,823-Speed 5899.59 samples/sec Loss 1.4817 LearningRate 0.0001 Epoch: 19 Global Step: 204370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:20:30,745-Speed 5918.45 samples/sec Loss 1.4981 LearningRate 0.0001 Epoch: 19 Global Step: 204380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:20:37,604-Speed 5973.27 samples/sec Loss 1.5045 LearningRate 0.0001 Epoch: 19 Global Step: 204390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:20:44,454-Speed 5980.16 samples/sec Loss 1.5284 LearningRate 0.0001 Epoch: 19 Global Step: 204400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:20:51,374-Speed 5920.29 samples/sec Loss 1.5241 LearningRate 0.0001 Epoch: 19 Global Step: 204410 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 12:20:58,224-Speed 5980.68 samples/sec Loss 1.4941 LearningRate 0.0001 Epoch: 19 Global Step: 204420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:21:05,074-Speed 5980.13 samples/sec Loss 1.4972 LearningRate 0.0001 Epoch: 19 Global Step: 204430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:21:11,914-Speed 5989.82 samples/sec Loss 1.4993 LearningRate 0.0001 Epoch: 19 Global Step: 204440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:21:18,770-Speed 5977.22 samples/sec Loss 1.5067 LearningRate 0.0001 Epoch: 19 Global Step: 204450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:21:25,647-Speed 5958.75 samples/sec Loss 1.5227 LearningRate 0.0001 Epoch: 19 Global Step: 204460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:21:32,521-Speed 5959.69 samples/sec Loss 1.5273 LearningRate 0.0001 Epoch: 19 Global Step: 204470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:21:39,374-Speed 5978.43 samples/sec Loss 1.5040 LearningRate 0.0001 Epoch: 19 Global Step: 204480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:21:46,227-Speed 5977.74 samples/sec Loss 1.5003 LearningRate 0.0001 Epoch: 19 Global Step: 204490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:21:53,079-Speed 5979.73 samples/sec Loss 1.4905 LearningRate 0.0001 Epoch: 19 Global Step: 204500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:21:59,974-Speed 5941.71 samples/sec Loss 1.5004 LearningRate 0.0001 Epoch: 19 Global Step: 204510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:22:06,859-Speed 5950.48 samples/sec Loss 1.5053 LearningRate 0.0001 Epoch: 19 Global Step: 204520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:22:13,709-Speed 5980.44 samples/sec Loss 1.4954 LearningRate 0.0001 Epoch: 19 Global Step: 204530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
12:22:20,590-Speed 5954.21 samples/sec Loss 1.4983 LearningRate 0.0001 Epoch: 19 Global Step: 204540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:22:27,445-Speed 5976.57 samples/sec Loss 1.5102 LearningRate 0.0001 Epoch: 19 Global Step: 204550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:22:34,300-Speed 5976.49 samples/sec Loss 1.5072 LearningRate 0.0001 Epoch: 19 Global Step: 204560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:22:41,194-Speed 5942.33 samples/sec Loss 1.4983 LearningRate 0.0001 Epoch: 19 Global Step: 204570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:22:48,103-Speed 5929.68 samples/sec Loss 1.5116 LearningRate 0.0001 Epoch: 19 Global Step: 204580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:22:55,005-Speed 5936.05 samples/sec Loss 1.4848 LearningRate 0.0001 Epoch: 19 Global Step: 204590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:23:01,869-Speed 5968.28 samples/sec Loss 1.5056 LearningRate 0.0001 Epoch: 19 Global Step: 204600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:23:08,831-Speed 5884.89 samples/sec Loss 1.4940 LearningRate 0.0001 Epoch: 19 Global Step: 204610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:23:15,775-Speed 5901.05 samples/sec Loss 1.5045 LearningRate 0.0001 Epoch: 19 Global Step: 204620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:23:22,629-Speed 5977.70 samples/sec Loss 1.5056 LearningRate 0.0001 Epoch: 19 Global Step: 204630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:23:29,482-Speed 5978.23 samples/sec Loss 1.4862 LearningRate 0.0001 Epoch: 19 Global Step: 204640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 12:23:36,322-Speed 5991.47 samples/sec Loss 1.5074 LearningRate 0.0001 Epoch: 19 Global Step: 204650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:23:43,177-Speed 5976.00 samples/sec Loss 
1.5140 LearningRate 0.0001 Epoch: 19 Global Step: 204660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:23:50,033-Speed 5975.92 samples/sec Loss 1.4783 LearningRate 0.0001 Epoch: 19 Global Step: 204670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:23:56,898-Speed 5967.49 samples/sec Loss 1.5047 LearningRate 0.0001 Epoch: 19 Global Step: 204680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:03,768-Speed 5965.74 samples/sec Loss 1.5257 LearningRate 0.0001 Epoch: 19 Global Step: 204690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:10,612-Speed 5986.57 samples/sec Loss 1.4892 LearningRate 0.0001 Epoch: 19 Global Step: 204700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:17,483-Speed 5962.72 samples/sec Loss 1.4943 LearningRate 0.0001 Epoch: 19 Global Step: 204710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:24,328-Speed 5985.09 samples/sec Loss 1.5183 LearningRate 0.0001 Epoch: 19 Global Step: 204720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:31,182-Speed 5976.91 samples/sec Loss 1.4678 LearningRate 0.0001 Epoch: 19 Global Step: 204730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:38,038-Speed 5975.61 samples/sec Loss 1.5193 LearningRate 0.0001 Epoch: 19 Global Step: 204740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:44,893-Speed 5976.44 samples/sec Loss 1.5034 LearningRate 0.0001 Epoch: 19 Global Step: 204750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:51,743-Speed 5981.33 samples/sec Loss 1.5013 LearningRate 0.0001 Epoch: 19 Global Step: 204760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:24:58,599-Speed 5975.03 samples/sec Loss 1.5184 LearningRate 0.0001 Epoch: 19 Global Step: 204770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:25:05,450-Speed 5979.49 samples/sec Loss 1.4927 LearningRate 0.0001 Epoch: 19 Global 
Step: 204780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:25:12,296-Speed 5983.93 samples/sec Loss 1.4982 LearningRate 0.0001 Epoch: 19 Global Step: 204790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:25:19,141-Speed 5984.22 samples/sec Loss 1.5074 LearningRate 0.0001 Epoch: 19 Global Step: 204800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:25:25,986-Speed 5985.56 samples/sec Loss 1.4912 LearningRate 0.0001 Epoch: 19 Global Step: 204810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:25:32,868-Speed 5953.58 samples/sec Loss 1.5144 LearningRate 0.0001 Epoch: 19 Global Step: 204820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 12:25:39,713-Speed 5984.84 samples/sec Loss 1.4973 LearningRate 0.0001 Epoch: 19 Global Step: 204830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:25:46,565-Speed 5979.13 samples/sec Loss 1.5049 LearningRate 0.0001 Epoch: 19 Global Step: 204840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:25:53,415-Speed 5980.70 samples/sec Loss 1.4961 LearningRate 0.0001 Epoch: 19 Global Step: 204850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:26:00,300-Speed 5950.37 samples/sec Loss 1.4932 LearningRate 0.0001 Epoch: 19 Global Step: 204860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:26:07,142-Speed 5988.07 samples/sec Loss 1.4968 LearningRate 0.0001 Epoch: 19 Global Step: 204870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:26:14,006-Speed 5970.39 samples/sec Loss 1.5117 LearningRate 0.0001 Epoch: 19 Global Step: 204880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:26:20,853-Speed 5984.13 samples/sec Loss 1.4984 LearningRate 0.0001 Epoch: 19 Global Step: 204890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:26:27,693-Speed 5989.27 samples/sec Loss 1.5031 LearningRate 0.0001 Epoch: 19 Global Step: 204900 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-09 12:26:34,544-Speed 5980.48 samples/sec Loss 1.4856 LearningRate 0.0001 Epoch: 19 Global Step: 204910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:26:41,399-Speed 5975.73 samples/sec Loss 1.4604 LearningRate 0.0001 Epoch: 19 Global Step: 204920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:26:48,244-Speed 5984.64 samples/sec Loss 1.4839 LearningRate 0.0001 Epoch: 19 Global Step: 204930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:26:55,102-Speed 5974.01 samples/sec Loss 1.5062 LearningRate 0.0001 Epoch: 19 Global Step: 204940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:27:01,980-Speed 5957.24 samples/sec Loss 1.4974 LearningRate 0.0001 Epoch: 19 Global Step: 204950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:27:08,829-Speed 5981.08 samples/sec Loss 1.4950 LearningRate 0.0001 Epoch: 19 Global Step: 204960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:27:15,703-Speed 5960.97 samples/sec Loss 1.5100 LearningRate 0.0001 Epoch: 19 Global Step: 204970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:27:22,561-Speed 5974.15 samples/sec Loss 1.4946 LearningRate 0.0001 Epoch: 19 Global Step: 204980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:27:29,430-Speed 5964.44 samples/sec Loss 1.4932 LearningRate 0.0001 Epoch: 19 Global Step: 204990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:27:36,284-Speed 5976.76 samples/sec Loss 1.4871 LearningRate 0.0001 Epoch: 19 Global Step: 205000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:28:03,131-[lfw][205000]XNorm: 23.454143 Training: 2022-01-09 12:28:03,132-[lfw][205000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-09 12:28:03,132-[lfw][205000]Accuracy-Highest: 0.99833 Training: 2022-01-09 12:28:34,222-[cfp_fp][205000]XNorm: 21.622859 Training: 2022-01-09 12:28:34,223-[cfp_fp][205000]Accuracy-Flip: 
0.99271+-0.00329 Training: 2022-01-09 12:28:34,224-[cfp_fp][205000]Accuracy-Highest: 0.99286 Training: 2022-01-09 12:29:00,925-[agedb_30][205000]XNorm: 22.974586 Training: 2022-01-09 12:29:00,926-[agedb_30][205000]Accuracy-Flip: 0.98267+-0.00554 Training: 2022-01-09 12:29:00,927-[agedb_30][205000]Accuracy-Highest: 0.98300 Training: 2022-01-09 12:29:07,753-Speed 447.81 samples/sec Loss 1.4973 LearningRate 0.0001 Epoch: 19 Global Step: 205010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:29:14,595-Speed 5988.60 samples/sec Loss 1.5018 LearningRate 0.0001 Epoch: 19 Global Step: 205020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:29:21,446-Speed 5979.25 samples/sec Loss 1.4811 LearningRate 0.0001 Epoch: 19 Global Step: 205030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:29:28,318-Speed 5962.55 samples/sec Loss 1.5022 LearningRate 0.0001 Epoch: 19 Global Step: 205040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:29:35,155-Speed 5991.65 samples/sec Loss 1.5117 LearningRate 0.0001 Epoch: 19 Global Step: 205050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:29:42,006-Speed 5979.84 samples/sec Loss 1.5086 LearningRate 0.0001 Epoch: 19 Global Step: 205060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:29:48,870-Speed 5968.12 samples/sec Loss 1.4964 LearningRate 0.0001 Epoch: 19 Global Step: 205070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:29:55,715-Speed 5984.96 samples/sec Loss 1.4936 LearningRate 0.0001 Epoch: 19 Global Step: 205080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:02,592-Speed 5956.91 samples/sec Loss 1.5108 LearningRate 0.0001 Epoch: 19 Global Step: 205090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:09,453-Speed 5971.28 samples/sec Loss 1.4880 LearningRate 0.0001 Epoch: 19 Global Step: 205100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:16,310-Speed 
5974.15 samples/sec Loss 1.5223 LearningRate 0.0001 Epoch: 19 Global Step: 205110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:23,195-Speed 5955.98 samples/sec Loss 1.4940 LearningRate 0.0001 Epoch: 19 Global Step: 205120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:30,089-Speed 5942.15 samples/sec Loss 1.4985 LearningRate 0.0001 Epoch: 19 Global Step: 205130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:36,968-Speed 5955.28 samples/sec Loss 1.5033 LearningRate 0.0001 Epoch: 19 Global Step: 205140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:43,831-Speed 5970.45 samples/sec Loss 1.5137 LearningRate 0.0001 Epoch: 19 Global Step: 205150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:50,688-Speed 5974.20 samples/sec Loss 1.5046 LearningRate 0.0001 Epoch: 19 Global Step: 205160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:30:57,552-Speed 5968.28 samples/sec Loss 1.4722 LearningRate 0.0001 Epoch: 19 Global Step: 205170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:31:04,442-Speed 5946.21 samples/sec Loss 1.5032 LearningRate 0.0001 Epoch: 19 Global Step: 205180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:31:11,297-Speed 5977.06 samples/sec Loss 1.4892 LearningRate 0.0001 Epoch: 19 Global Step: 205190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:31:18,159-Speed 5970.56 samples/sec Loss 1.5032 LearningRate 0.0001 Epoch: 19 Global Step: 205200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:31:25,022-Speed 5969.33 samples/sec Loss 1.4915 LearningRate 0.0001 Epoch: 19 Global Step: 205210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:31:31,890-Speed 5965.47 samples/sec Loss 1.4656 LearningRate 0.0001 Epoch: 19 Global Step: 205220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:31:38,798-Speed 5930.64 samples/sec Loss 1.4846 
LearningRate 0.0001 Epoch: 19 Global Step: 205230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:31:45,653-Speed 5976.77 samples/sec Loss 1.4877 LearningRate 0.0001 Epoch: 19 Global Step: 205240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:31:52,506-Speed 5977.85 samples/sec Loss 1.5146 LearningRate 0.0001 Epoch: 19 Global Step: 205250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:31:59,377-Speed 5962.43 samples/sec Loss 1.4800 LearningRate 0.0001 Epoch: 19 Global Step: 205260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:32:06,224-Speed 5982.62 samples/sec Loss 1.5029 LearningRate 0.0001 Epoch: 19 Global Step: 205270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:32:13,121-Speed 5940.69 samples/sec Loss 1.4662 LearningRate 0.0001 Epoch: 19 Global Step: 205280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:32:19,972-Speed 5979.64 samples/sec Loss 1.5031 LearningRate 0.0001 Epoch: 19 Global Step: 205290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:32:26,828-Speed 5975.50 samples/sec Loss 1.4907 LearningRate 0.0000 Epoch: 19 Global Step: 205300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:32:33,680-Speed 5978.83 samples/sec Loss 1.5058 LearningRate 0.0000 Epoch: 19 Global Step: 205310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:32:40,559-Speed 5955.50 samples/sec Loss 1.5026 LearningRate 0.0000 Epoch: 19 Global Step: 205320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:32:47,406-Speed 5982.80 samples/sec Loss 1.5175 LearningRate 0.0000 Epoch: 19 Global Step: 205330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:32:54,274-Speed 5965.31 samples/sec Loss 1.5022 LearningRate 0.0000 Epoch: 19 Global Step: 205340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:33:01,121-Speed 5983.29 samples/sec Loss 1.5004 LearningRate 0.0000 Epoch: 19 Global Step: 
205350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:33:07,975-Speed 5977.47 samples/sec Loss 1.4864 LearningRate 0.0000 Epoch: 19 Global Step: 205360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:33:14,847-Speed 5965.03 samples/sec Loss 1.4962 LearningRate 0.0000 Epoch: 19 Global Step: 205370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:33:21,706-Speed 5973.15 samples/sec Loss 1.4815 LearningRate 0.0000 Epoch: 19 Global Step: 205380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:33:28,581-Speed 5959.44 samples/sec Loss 1.4920 LearningRate 0.0000 Epoch: 19 Global Step: 205390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:33:35,429-Speed 5982.26 samples/sec Loss 1.4983 LearningRate 0.0000 Epoch: 19 Global Step: 205400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:33:42,302-Speed 5960.51 samples/sec Loss 1.4871 LearningRate 0.0000 Epoch: 19 Global Step: 205410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:33:49,156-Speed 5977.64 samples/sec Loss 1.5182 LearningRate 0.0000 Epoch: 19 Global Step: 205420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:33:56,012-Speed 5975.44 samples/sec Loss 1.4971 LearningRate 0.0000 Epoch: 19 Global Step: 205430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:34:02,867-Speed 5976.97 samples/sec Loss 1.4812 LearningRate 0.0000 Epoch: 19 Global Step: 205440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:34:09,725-Speed 5973.77 samples/sec Loss 1.4661 LearningRate 0.0000 Epoch: 19 Global Step: 205450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:34:16,579-Speed 5976.58 samples/sec Loss 1.4916 LearningRate 0.0000 Epoch: 19 Global Step: 205460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:34:23,463-Speed 5952.00 samples/sec Loss 1.5071 LearningRate 0.0000 Epoch: 19 Global Step: 205470 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-01-09 12:34:30,316-Speed 5978.32 samples/sec Loss 1.4695 LearningRate 0.0000 Epoch: 19 Global Step: 205480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:34:37,192-Speed 5957.29 samples/sec Loss 1.4829 LearningRate 0.0000 Epoch: 19 Global Step: 205490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:34:44,053-Speed 5971.72 samples/sec Loss 1.4960 LearningRate 0.0000 Epoch: 19 Global Step: 205500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:34:50,936-Speed 5951.61 samples/sec Loss 1.4594 LearningRate 0.0000 Epoch: 19 Global Step: 205510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:34:57,796-Speed 5972.06 samples/sec Loss 1.4759 LearningRate 0.0000 Epoch: 19 Global Step: 205520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:35:04,648-Speed 5978.88 samples/sec Loss 1.4655 LearningRate 0.0000 Epoch: 19 Global Step: 205530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:35:11,499-Speed 5981.85 samples/sec Loss 1.4785 LearningRate 0.0000 Epoch: 19 Global Step: 205540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:35:18,365-Speed 5966.63 samples/sec Loss 1.4987 LearningRate 0.0000 Epoch: 19 Global Step: 205550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:35:25,234-Speed 5964.77 samples/sec Loss 1.4826 LearningRate 0.0000 Epoch: 19 Global Step: 205560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:35:32,103-Speed 5964.19 samples/sec Loss 1.5212 LearningRate 0.0000 Epoch: 19 Global Step: 205570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:35:38,950-Speed 5983.39 samples/sec Loss 1.4603 LearningRate 0.0000 Epoch: 19 Global Step: 205580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:35:45,811-Speed 5970.68 samples/sec Loss 1.5029 LearningRate 0.0000 Epoch: 19 Global Step: 205590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 
12:35:52,656-Speed 5985.40 samples/sec Loss 1.5123 LearningRate 0.0000 Epoch: 19 Global Step: 205600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:35:59,531-Speed 5958.96 samples/sec Loss 1.4964 LearningRate 0.0000 Epoch: 19 Global Step: 205610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:36:06,396-Speed 5967.68 samples/sec Loss 1.5054 LearningRate 0.0000 Epoch: 19 Global Step: 205620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:36:13,259-Speed 5978.25 samples/sec Loss 1.5129 LearningRate 0.0000 Epoch: 19 Global Step: 205630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:36:20,127-Speed 5964.65 samples/sec Loss 1.4915 LearningRate 0.0000 Epoch: 19 Global Step: 205640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:36:27,007-Speed 5967.78 samples/sec Loss 1.5049 LearningRate 0.0000 Epoch: 19 Global Step: 205650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:36:33,856-Speed 5980.98 samples/sec Loss 1.5230 LearningRate 0.0000 Epoch: 19 Global Step: 205660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:36:40,713-Speed 5976.71 samples/sec Loss 1.4931 LearningRate 0.0000 Epoch: 19 Global Step: 205670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:36:47,566-Speed 5978.36 samples/sec Loss 1.4989 LearningRate 0.0000 Epoch: 19 Global Step: 205680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:36:54,419-Speed 5979.68 samples/sec Loss 1.4827 LearningRate 0.0000 Epoch: 19 Global Step: 205690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:37:01,294-Speed 5959.20 samples/sec Loss 1.4998 LearningRate 0.0000 Epoch: 19 Global Step: 205700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:37:08,157-Speed 5968.77 samples/sec Loss 1.5013 LearningRate 0.0000 Epoch: 19 Global Step: 205710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:37:15,003-Speed 5984.25 samples/sec Loss 
1.4782 LearningRate 0.0000 Epoch: 19 Global Step: 205720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:37:21,869-Speed 5966.76 samples/sec Loss 1.4844 LearningRate 0.0000 Epoch: 19 Global Step: 205730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:37:28,745-Speed 5961.39 samples/sec Loss 1.4913 LearningRate 0.0000 Epoch: 19 Global Step: 205740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:37:35,602-Speed 5974.74 samples/sec Loss 1.4935 LearningRate 0.0000 Epoch: 19 Global Step: 205750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:37:42,457-Speed 5976.92 samples/sec Loss 1.4598 LearningRate 0.0000 Epoch: 19 Global Step: 205760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:37:49,316-Speed 5972.36 samples/sec Loss 1.4909 LearningRate 0.0000 Epoch: 19 Global Step: 205770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:37:56,160-Speed 5985.73 samples/sec Loss 1.5134 LearningRate 0.0000 Epoch: 19 Global Step: 205780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:38:03,027-Speed 5966.38 samples/sec Loss 1.4926 LearningRate 0.0000 Epoch: 19 Global Step: 205790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:38:09,877-Speed 5981.09 samples/sec Loss 1.5133 LearningRate 0.0000 Epoch: 19 Global Step: 205800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:38:16,751-Speed 5959.67 samples/sec Loss 1.4765 LearningRate 0.0000 Epoch: 19 Global Step: 205810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:38:23,749-Speed 5853.72 samples/sec Loss 1.4968 LearningRate 0.0000 Epoch: 19 Global Step: 205820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:38:30,720-Speed 5877.81 samples/sec Loss 1.4725 LearningRate 0.0000 Epoch: 19 Global Step: 205830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:38:37,580-Speed 5972.03 samples/sec Loss 1.4962 LearningRate 0.0000 Epoch: 19 Global 
Step: 205840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:38:44,440-Speed 5971.82 samples/sec Loss 1.4653 LearningRate 0.0000 Epoch: 19 Global Step: 205850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:38:51,289-Speed 5981.87 samples/sec Loss 1.4577 LearningRate 0.0000 Epoch: 19 Global Step: 205860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:38:58,166-Speed 5957.41 samples/sec Loss 1.4573 LearningRate 0.0000 Epoch: 19 Global Step: 205870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:39:05,117-Speed 5894.10 samples/sec Loss 1.4899 LearningRate 0.0000 Epoch: 19 Global Step: 205880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:39:12,074-Speed 5888.79 samples/sec Loss 1.4717 LearningRate 0.0000 Epoch: 19 Global Step: 205890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:39:19,028-Speed 5890.92 samples/sec Loss 1.4909 LearningRate 0.0000 Epoch: 19 Global Step: 205900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:39:25,885-Speed 5975.63 samples/sec Loss 1.4900 LearningRate 0.0000 Epoch: 19 Global Step: 205910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:39:32,746-Speed 5971.40 samples/sec Loss 1.4908 LearningRate 0.0000 Epoch: 19 Global Step: 205920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:39:39,627-Speed 5952.89 samples/sec Loss 1.5082 LearningRate 0.0000 Epoch: 19 Global Step: 205930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:39:46,474-Speed 5984.41 samples/sec Loss 1.4709 LearningRate 0.0000 Epoch: 19 Global Step: 205940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:39:53,365-Speed 5944.30 samples/sec Loss 1.4643 LearningRate 0.0000 Epoch: 19 Global Step: 205950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:40:00,223-Speed 5974.03 samples/sec Loss 1.4800 LearningRate 0.0000 Epoch: 19 Global Step: 205960 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-09 12:40:07,097-Speed 5959.94 samples/sec Loss 1.4699 LearningRate 0.0000 Epoch: 19 Global Step: 205970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:40:13,944-Speed 5983.64 samples/sec Loss 1.4905 LearningRate 0.0000 Epoch: 19 Global Step: 205980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:40:20,787-Speed 5987.04 samples/sec Loss 1.4545 LearningRate 0.0000 Epoch: 19 Global Step: 205990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:40:27,649-Speed 5969.76 samples/sec Loss 1.4700 LearningRate 0.0000 Epoch: 19 Global Step: 206000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:40:34,506-Speed 5975.27 samples/sec Loss 1.4906 LearningRate 0.0000 Epoch: 19 Global Step: 206010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:40:41,358-Speed 5978.10 samples/sec Loss 1.4983 LearningRate 0.0000 Epoch: 19 Global Step: 206020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:40:48,206-Speed 5982.27 samples/sec Loss 1.4707 LearningRate 0.0000 Epoch: 19 Global Step: 206030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:40:55,066-Speed 5971.75 samples/sec Loss 1.4682 LearningRate 0.0000 Epoch: 19 Global Step: 206040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:41:01,918-Speed 5980.02 samples/sec Loss 1.5043 LearningRate 0.0000 Epoch: 19 Global Step: 206050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:41:08,759-Speed 5988.97 samples/sec Loss 1.4912 LearningRate 0.0000 Epoch: 19 Global Step: 206060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:41:15,602-Speed 5986.44 samples/sec Loss 1.4945 LearningRate 0.0000 Epoch: 19 Global Step: 206070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:41:22,441-Speed 5990.13 samples/sec Loss 1.4782 LearningRate 0.0000 Epoch: 19 Global Step: 206080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 
12:41:29,308-Speed 5966.50 samples/sec Loss 1.4757 LearningRate 0.0000 Epoch: 19 Global Step: 206090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:41:36,182-Speed 5959.82 samples/sec Loss 1.4916 LearningRate 0.0000 Epoch: 19 Global Step: 206100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:41:43,033-Speed 5979.48 samples/sec Loss 1.4970 LearningRate 0.0000 Epoch: 19 Global Step: 206110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:41:49,893-Speed 5972.46 samples/sec Loss 1.4619 LearningRate 0.0000 Epoch: 19 Global Step: 206120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:41:56,746-Speed 5977.94 samples/sec Loss 1.4895 LearningRate 0.0000 Epoch: 19 Global Step: 206130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:42:10,589-Speed 2959.18 samples/sec Loss 1.4999 LearningRate 0.0000 Epoch: 19 Global Step: 206140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:42:17,489-Speed 5939.98 samples/sec Loss 1.4825 LearningRate 0.0000 Epoch: 19 Global Step: 206150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:42:24,332-Speed 5986.81 samples/sec Loss 1.4730 LearningRate 0.0000 Epoch: 19 Global Step: 206160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:42:31,182-Speed 5980.54 samples/sec Loss 1.4823 LearningRate 0.0000 Epoch: 19 Global Step: 206170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:42:38,050-Speed 5965.56 samples/sec Loss 1.5093 LearningRate 0.0000 Epoch: 19 Global Step: 206180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:42:44,903-Speed 5978.11 samples/sec Loss 1.4835 LearningRate 0.0000 Epoch: 19 Global Step: 206190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:42:51,760-Speed 5977.51 samples/sec Loss 1.4997 LearningRate 0.0000 Epoch: 19 Global Step: 206200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:42:58,612-Speed 5978.94 samples/sec Loss 
1.5004 LearningRate 0.0000 Epoch: 19 Global Step: 206210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:43:05,483-Speed 5962.19 samples/sec Loss 1.4868 LearningRate 0.0000 Epoch: 19 Global Step: 206220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:43:12,333-Speed 5980.92 samples/sec Loss 1.4973 LearningRate 0.0000 Epoch: 19 Global Step: 206230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:43:19,198-Speed 5968.15 samples/sec Loss 1.4981 LearningRate 0.0000 Epoch: 19 Global Step: 206240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:43:26,067-Speed 5963.87 samples/sec Loss 1.5058 LearningRate 0.0000 Epoch: 19 Global Step: 206250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:43:32,926-Speed 5973.31 samples/sec Loss 1.4919 LearningRate 0.0000 Epoch: 19 Global Step: 206260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:43:39,774-Speed 5982.58 samples/sec Loss 1.4904 LearningRate 0.0000 Epoch: 19 Global Step: 206270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:43:46,636-Speed 5970.42 samples/sec Loss 1.4766 LearningRate 0.0000 Epoch: 19 Global Step: 206280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:43:53,508-Speed 5961.48 samples/sec Loss 1.4766 LearningRate 0.0000 Epoch: 19 Global Step: 206290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:44:00,371-Speed 5969.54 samples/sec Loss 1.4807 LearningRate 0.0000 Epoch: 19 Global Step: 206300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:44:07,230-Speed 5973.64 samples/sec Loss 1.4793 LearningRate 0.0000 Epoch: 19 Global Step: 206310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:44:14,078-Speed 5981.73 samples/sec Loss 1.4798 LearningRate 0.0000 Epoch: 19 Global Step: 206320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:44:20,945-Speed 5966.44 samples/sec Loss 1.4869 LearningRate 0.0000 Epoch: 19 Global 
Step: 206330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:44:27,798-Speed 5978.63 samples/sec Loss 1.4891 LearningRate 0.0000 Epoch: 19 Global Step: 206340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:44:34,659-Speed 5971.11 samples/sec Loss 1.4529 LearningRate 0.0000 Epoch: 19 Global Step: 206350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:44:41,507-Speed 5982.00 samples/sec Loss 1.5222 LearningRate 0.0000 Epoch: 19 Global Step: 206360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:44:48,402-Speed 5941.90 samples/sec Loss 1.4848 LearningRate 0.0000 Epoch: 19 Global Step: 206370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:44:55,260-Speed 5973.59 samples/sec Loss 1.4902 LearningRate 0.0000 Epoch: 19 Global Step: 206380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:45:02,133-Speed 5961.30 samples/sec Loss 1.4923 LearningRate 0.0000 Epoch: 19 Global Step: 206390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:45:08,991-Speed 5974.73 samples/sec Loss 1.4686 LearningRate 0.0000 Epoch: 19 Global Step: 206400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:45:15,846-Speed 5976.10 samples/sec Loss 1.5141 LearningRate 0.0000 Epoch: 19 Global Step: 206410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:45:22,695-Speed 5981.78 samples/sec Loss 1.4679 LearningRate 0.0000 Epoch: 19 Global Step: 206420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:45:29,548-Speed 5979.89 samples/sec Loss 1.4586 LearningRate 0.0000 Epoch: 19 Global Step: 206430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:45:36,435-Speed 5948.19 samples/sec Loss 1.4692 LearningRate 0.0000 Epoch: 19 Global Step: 206440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:45:43,295-Speed 5972.06 samples/sec Loss 1.4942 LearningRate 0.0000 Epoch: 19 Global Step: 206450 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-01-09 12:45:50,165-Speed 5963.15 samples/sec Loss 1.4813 LearningRate 0.0000 Epoch: 19 Global Step: 206460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:45:57,028-Speed 5969.49 samples/sec Loss 1.4887 LearningRate 0.0000 Epoch: 19 Global Step: 206470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:46:03,906-Speed 5958.65 samples/sec Loss 1.5010 LearningRate 0.0000 Epoch: 19 Global Step: 206480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:46:10,753-Speed 5983.68 samples/sec Loss 1.4597 LearningRate 0.0000 Epoch: 19 Global Step: 206490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:46:17,603-Speed 5980.35 samples/sec Loss 1.4771 LearningRate 0.0000 Epoch: 19 Global Step: 206500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:46:24,450-Speed 5983.20 samples/sec Loss 1.4856 LearningRate 0.0000 Epoch: 19 Global Step: 206510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 12:46:31,338-Speed 5947.74 samples/sec Loss 1.4824 LearningRate 0.0000 Epoch: 19 Global Step: 206520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:46:38,203-Speed 5968.04 samples/sec Loss 1.5069 LearningRate 0.0000 Epoch: 19 Global Step: 206530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:46:45,064-Speed 5970.30 samples/sec Loss 1.4820 LearningRate 0.0000 Epoch: 19 Global Step: 206540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:46:51,922-Speed 5974.22 samples/sec Loss 1.4859 LearningRate 0.0000 Epoch: 19 Global Step: 206550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:46:58,788-Speed 5966.65 samples/sec Loss 1.4710 LearningRate 0.0000 Epoch: 19 Global Step: 206560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:47:05,653-Speed 5967.31 samples/sec Loss 1.4852 LearningRate 0.0000 Epoch: 19 Global Step: 206570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 
12:47:12,513-Speed 5972.62 samples/sec Loss 1.4961 LearningRate 0.0000 Epoch: 19 Global Step: 206580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:47:19,374-Speed 5970.56 samples/sec Loss 1.4798 LearningRate 0.0000 Epoch: 19 Global Step: 206590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:47:26,233-Speed 5972.58 samples/sec Loss 1.5281 LearningRate 0.0000 Epoch: 19 Global Step: 206600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:47:33,083-Speed 5980.84 samples/sec Loss 1.4687 LearningRate 0.0000 Epoch: 19 Global Step: 206610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:47:40,118-Speed 5825.99 samples/sec Loss 1.4895 LearningRate 0.0000 Epoch: 19 Global Step: 206620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:47:47,086-Speed 5879.85 samples/sec Loss 1.4781 LearningRate 0.0000 Epoch: 19 Global Step: 206630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:47:53,924-Speed 5990.98 samples/sec Loss 1.5233 LearningRate 0.0000 Epoch: 19 Global Step: 206640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:48:00,778-Speed 5976.96 samples/sec Loss 1.4756 LearningRate 0.0000 Epoch: 19 Global Step: 206650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:48:07,619-Speed 5989.08 samples/sec Loss 1.4716 LearningRate 0.0000 Epoch: 19 Global Step: 206660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:48:14,477-Speed 5973.64 samples/sec Loss 1.4619 LearningRate 0.0000 Epoch: 19 Global Step: 206670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:48:21,339-Speed 5970.23 samples/sec Loss 1.4790 LearningRate 0.0000 Epoch: 19 Global Step: 206680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:48:28,203-Speed 5968.52 samples/sec Loss 1.4912 LearningRate 0.0000 Epoch: 19 Global Step: 206690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:48:35,058-Speed 5975.78 samples/sec Loss 
1.4995 LearningRate 0.0000 Epoch: 19 Global Step: 206700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:48:41,938-Speed 5954.68 samples/sec Loss 1.4879 LearningRate 0.0000 Epoch: 19 Global Step: 206710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:48:48,809-Speed 5962.72 samples/sec Loss 1.4759 LearningRate 0.0000 Epoch: 19 Global Step: 206720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:48:55,687-Speed 5956.50 samples/sec Loss 1.4704 LearningRate 0.0000 Epoch: 19 Global Step: 206730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:02,535-Speed 5981.99 samples/sec Loss 1.4970 LearningRate 0.0000 Epoch: 19 Global Step: 206740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:09,388-Speed 5978.42 samples/sec Loss 1.5044 LearningRate 0.0000 Epoch: 19 Global Step: 206750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:16,236-Speed 5982.54 samples/sec Loss 1.4825 LearningRate 0.0000 Epoch: 19 Global Step: 206760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:23,203-Speed 5879.66 samples/sec Loss 1.4891 LearningRate 0.0000 Epoch: 19 Global Step: 206770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:30,045-Speed 5987.62 samples/sec Loss 1.4959 LearningRate 0.0000 Epoch: 19 Global Step: 206780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:36,897-Speed 5978.98 samples/sec Loss 1.4690 LearningRate 0.0000 Epoch: 19 Global Step: 206790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:43,743-Speed 5984.16 samples/sec Loss 1.4949 LearningRate 0.0000 Epoch: 19 Global Step: 206800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:50,621-Speed 5956.05 samples/sec Loss 1.4791 LearningRate 0.0000 Epoch: 19 Global Step: 206810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:49:57,487-Speed 5967.19 samples/sec Loss 1.4862 LearningRate 0.0000 Epoch: 19 Global 
Step: 206820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:50:04,342-Speed 5976.03 samples/sec Loss 1.5036 LearningRate 0.0000 Epoch: 19 Global Step: 206830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:50:11,215-Speed 5961.84 samples/sec Loss 1.4929 LearningRate 0.0000 Epoch: 19 Global Step: 206840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:50:18,066-Speed 5979.12 samples/sec Loss 1.4767 LearningRate 0.0000 Epoch: 19 Global Step: 206850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:50:24,915-Speed 5981.60 samples/sec Loss 1.4572 LearningRate 0.0000 Epoch: 19 Global Step: 206860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:50:31,782-Speed 5965.97 samples/sec Loss 1.5182 LearningRate 0.0000 Epoch: 19 Global Step: 206870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:50:38,637-Speed 5976.10 samples/sec Loss 1.4795 LearningRate 0.0000 Epoch: 19 Global Step: 206880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:50:45,515-Speed 5957.22 samples/sec Loss 1.4759 LearningRate 0.0000 Epoch: 19 Global Step: 206890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:50:52,376-Speed 5970.58 samples/sec Loss 1.4836 LearningRate 0.0000 Epoch: 19 Global Step: 206900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:50:59,231-Speed 5977.41 samples/sec Loss 1.4887 LearningRate 0.0000 Epoch: 19 Global Step: 206910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:51:06,088-Speed 5973.84 samples/sec Loss 1.4588 LearningRate 0.0000 Epoch: 19 Global Step: 206920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:51:12,935-Speed 5983.42 samples/sec Loss 1.4833 LearningRate 0.0000 Epoch: 19 Global Step: 206930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:51:19,789-Speed 5977.54 samples/sec Loss 1.5066 LearningRate 0.0000 Epoch: 19 Global Step: 206940 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-09 12:51:26,634-Speed 5985.22 samples/sec Loss 1.4956 LearningRate 0.0000 Epoch: 19 Global Step: 206950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:51:33,508-Speed 5967.09 samples/sec Loss 1.4967 LearningRate 0.0000 Epoch: 19 Global Step: 206960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:51:40,367-Speed 5972.29 samples/sec Loss 1.5012 LearningRate 0.0000 Epoch: 19 Global Step: 206970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:51:47,231-Speed 5968.38 samples/sec Loss 1.4477 LearningRate 0.0000 Epoch: 19 Global Step: 206980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:51:54,092-Speed 5971.50 samples/sec Loss 1.4964 LearningRate 0.0000 Epoch: 19 Global Step: 206990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:52:00,943-Speed 5979.71 samples/sec Loss 1.4602 LearningRate 0.0000 Epoch: 19 Global Step: 207000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:52:07,814-Speed 5962.82 samples/sec Loss 1.4929 LearningRate 0.0000 Epoch: 19 Global Step: 207010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:52:14,673-Speed 5972.97 samples/sec Loss 1.5120 LearningRate 0.0000 Epoch: 19 Global Step: 207020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:52:21,540-Speed 5965.46 samples/sec Loss 1.4648 LearningRate 0.0000 Epoch: 19 Global Step: 207030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:52:28,418-Speed 5956.14 samples/sec Loss 1.4635 LearningRate 0.0000 Epoch: 19 Global Step: 207040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:52:35,264-Speed 5984.49 samples/sec Loss 1.4955 LearningRate 0.0000 Epoch: 19 Global Step: 207050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:52:42,132-Speed 5965.01 samples/sec Loss 1.5012 LearningRate 0.0000 Epoch: 19 Global Step: 207060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 
12:52:48,992-Speed 5971.65 samples/sec Loss 1.4935 LearningRate 0.0000 Epoch: 19 Global Step: 207070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:52:55,838-Speed 5984.38 samples/sec Loss 1.4688 LearningRate 0.0000 Epoch: 19 Global Step: 207080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:53:02,695-Speed 5975.08 samples/sec Loss 1.4872 LearningRate 0.0000 Epoch: 19 Global Step: 207090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:53:09,555-Speed 5974.66 samples/sec Loss 1.4717 LearningRate 0.0000 Epoch: 19 Global Step: 207100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:53:16,394-Speed 5990.38 samples/sec Loss 1.5151 LearningRate 0.0000 Epoch: 19 Global Step: 207110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:53:23,236-Speed 5987.94 samples/sec Loss 1.4629 LearningRate 0.0000 Epoch: 19 Global Step: 207120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:53:30,106-Speed 5965.15 samples/sec Loss 1.4757 LearningRate 0.0000 Epoch: 19 Global Step: 207130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:53:36,969-Speed 5969.58 samples/sec Loss 1.4917 LearningRate 0.0000 Epoch: 19 Global Step: 207140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:53:43,844-Speed 5959.78 samples/sec Loss 1.5104 LearningRate 0.0000 Epoch: 19 Global Step: 207150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:53:50,696-Speed 5978.73 samples/sec Loss 1.4786 LearningRate 0.0000 Epoch: 19 Global Step: 207160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:53:57,552-Speed 5975.63 samples/sec Loss 1.4782 LearningRate 0.0000 Epoch: 19 Global Step: 207170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:04,400-Speed 5982.11 samples/sec Loss 1.4684 LearningRate 0.0000 Epoch: 19 Global Step: 207180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:11,266-Speed 5967.61 samples/sec Loss 
1.4684 LearningRate 0.0000 Epoch: 19 Global Step: 207190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:18,112-Speed 5984.26 samples/sec Loss 1.4896 LearningRate 0.0000 Epoch: 19 Global Step: 207200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:24,985-Speed 5960.04 samples/sec Loss 1.5111 LearningRate 0.0000 Epoch: 19 Global Step: 207210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:31,842-Speed 5977.17 samples/sec Loss 1.5104 LearningRate 0.0000 Epoch: 19 Global Step: 207220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:38,717-Speed 5958.50 samples/sec Loss 1.4772 LearningRate 0.0000 Epoch: 19 Global Step: 207230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:45,588-Speed 5962.89 samples/sec Loss 1.4821 LearningRate 0.0000 Epoch: 19 Global Step: 207240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:52,458-Speed 5966.08 samples/sec Loss 1.4929 LearningRate 0.0000 Epoch: 19 Global Step: 207250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:54:59,335-Speed 5956.85 samples/sec Loss 1.4945 LearningRate 0.0000 Epoch: 19 Global Step: 207260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:55:06,206-Speed 5962.83 samples/sec Loss 1.5052 LearningRate 0.0000 Epoch: 19 Global Step: 207270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:55:13,052-Speed 5983.66 samples/sec Loss 1.4985 LearningRate 0.0000 Epoch: 19 Global Step: 207280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 12:55:19,893-Speed 5990.57 samples/sec Loss 1.4743 LearningRate 0.0000 Epoch: 19 Global Step: 207290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:55:26,748-Speed 5979.68 samples/sec Loss 1.4865 LearningRate 0.0000 Epoch: 19 Global Step: 207300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:55:33,616-Speed 5964.71 samples/sec Loss 1.4767 LearningRate 0.0000 Epoch: 19 Global 
Step: 207310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:55:40,478-Speed 5970.87 samples/sec Loss 1.4929 LearningRate 0.0000 Epoch: 19 Global Step: 207320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:55:47,326-Speed 5983.69 samples/sec Loss 1.5018 LearningRate 0.0000 Epoch: 19 Global Step: 207330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:55:54,211-Speed 5954.38 samples/sec Loss 1.4981 LearningRate 0.0000 Epoch: 19 Global Step: 207340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:56:01,056-Speed 5984.53 samples/sec Loss 1.4933 LearningRate 0.0000 Epoch: 19 Global Step: 207350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:56:07,912-Speed 5975.63 samples/sec Loss 1.4956 LearningRate 0.0000 Epoch: 19 Global Step: 207360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:56:14,765-Speed 5978.30 samples/sec Loss 1.5111 LearningRate 0.0000 Epoch: 19 Global Step: 207370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 12:56:21,627-Speed 5972.10 samples/sec Loss 1.5128 LearningRate 0.0000 Epoch: 19 Global Step: 207380 Fp16 Grad Scale: 32768 Required: -0 hours Training: 2022-01-09 12:56:28,484-Speed 5975.01 samples/sec Loss 1.4906 LearningRate 0.0000 Epoch: 19 Global Step: 207390 Fp16 Grad Scale: 65536 Required: -0 hours
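The entries above share a fixed layout (timestamp, Speed, Loss, LearningRate, Epoch, Global Step, Fp16 Grad Scale, Required). Below is a minimal sketch for pulling those fields out for plotting or sanity checks. The regular expression is inferred from the entries in this log, not taken from the training code, and the names `ENTRY` and `parse_entries` are hypothetical.

```python
import re

# Pattern inferred from the log entries above (an assumption, not the
# logger's actual format string).
ENTRY = re.compile(
    r"Training: (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})-"
    r"Speed (?P<speed>[\d.]+) samples/sec "
    r"Loss (?P<loss>[\d.]+) "
    r"LearningRate (?P<lr>[\d.]+) "
    r"Epoch: (?P<epoch>\d+) "
    r"Global Step: (?P<step>\d+) "
    r"Fp16 Grad Scale: (?P<scale>\d+) "
    r"Required: (?P<eta>-?\d+) hours"
)


def parse_entries(text):
    """Return one dict per training entry found in `text`."""
    # Collapse all whitespace runs (including line wraps that fall in the
    # middle of an entry) to single spaces before matching.
    flat = " ".join(text.split())
    entries = []
    for m in ENTRY.finditer(flat):
        entries.append({
            "timestamp": m.group("ts"),
            "speed": float(m.group("speed")),
            "loss": float(m.group("loss")),
            "lr": float(m.group("lr")),
            "epoch": int(m.group("epoch")),
            "step": int(m.group("step")),
            "grad_scale": int(m.group("scale")),
            "eta_hours": int(m.group("eta")),  # "-0" parses as 0
        })
    return entries


if __name__ == "__main__":
    import sys

    # Usage: python parse_log.py training.log
    with open(sys.argv[1], encoding="utf-8") as fh:
        rows = parse_entries(fh.read())
    if rows:
        mean_loss = sum(r["loss"] for r in rows) / len(rows)
        print(f"{len(rows)} entries, mean loss {mean_loss:.4f}")
```

Whitespace is collapsed first because this log wraps in the middle of entries, so matching line by line would miss any entry split across two lines. Configuration lines such as `lr 0.4` or `num_epoch 20` do not match the pattern and are skipped.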