Training: 2022-03-04 20:30:03,610-rank_id: 0
Training: 2022-03-04 20:31:59,200-Speed 9419.43 samples/sec Loss 42.4879 LearningRate 0.0000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:32:25,370-Speed 9391.83 samples/sec Loss 42.4783 LearningRate 0.0000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:32:51,479-Speed 9413.31 samples/sec Loss 42.4502 LearningRate 0.0000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:33:17,610-Speed 9405.67 samples/sec Loss 42.4231 LearningRate 0.0000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:33:43,805-Speed 9382.61 samples/sec Loss 42.3702 LearningRate 0.0000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:34:10,003-Speed 9381.28 samples/sec Loss 42.2705 LearningRate 0.0000 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:34:36,158-Speed 9397.12 samples/sec Loss 42.1277 LearningRate 0.0000 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:35:02,302-Speed 9400.49 samples/sec Loss 41.9391 LearningRate 0.0000 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:35:28,470-Speed 9392.39 samples/sec Loss 41.7228 LearningRate 0.0000 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:35:54,664-Speed 9383.88 samples/sec Loss 41.4637 LearningRate 0.0000 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:36:20,776-Speed 9412.54 samples/sec Loss 41.1974 LearningRate 0.0000 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:36:46,920-Speed 9400.55 samples/sec Loss 40.9261 LearningRate 0.0000 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:37:13,123-Speed 9379.80 samples/sec Loss 40.6231 LearningRate 0.0000 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:37:39,415-Speed 9347.83 samples/sec Loss 40.3331 LearningRate 0.0000 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:38:05,554-Speed 9403.86 samples/sec Loss 40.0366 LearningRate 0.0000 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:38:31,700-Speed 9400.08 samples/sec Loss 39.7915 LearningRate 0.0000 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:38:57,845-Speed 9400.66 samples/sec Loss 39.5708 LearningRate 0.0000 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:39:24,095-Speed 9362.74 samples/sec Loss 39.3797 LearningRate 0.0000 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-03-04 20:39:50,270-Speed 9389.90 samples/sec Loss 39.2291 LearningRate 0.0000 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:40:16,419-Speed 9398.99 samples/sec Loss 39.0980 LearningRate 0.0000 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:40:42,672-Speed 9361.62 samples/sec Loss 38.9949 LearningRate 0.0000 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-03-04 20:41:08,882-Speed 9377.16 samples/sec Loss 38.9242 LearningRate 0.0000 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 20:41:35,104-Speed 9372.62 samples/sec Loss 38.8745 LearningRate 0.0000 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 20:42:01,276-Speed 9390.97 samples/sec Loss 38.8414 LearningRate 0.0000 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:42:27,497-Speed 9373.19 samples/sec Loss 38.9515 LearningRate 0.0000 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:42:53,706-Speed 9377.37 samples/sec Loss 38.8448 LearningRate 0.0000 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:43:19,876-Speed 9391.23 samples/sec Loss 38.8188 LearningRate 0.0000 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:43:46,061-Speed 9385.98 samples/sec Loss 38.8203 LearningRate 0.0000 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:44:12,273-Speed 9376.50 samples/sec Loss 38.8265 LearningRate 0.0000 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:44:38,486-Speed 9376.31 samples/sec Loss 38.8582 LearningRate 0.0000 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:45:04,594-Speed 9413.55 samples/sec Loss 38.8541 LearningRate 0.0000 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:45:30,808-Speed 9375.91 samples/sec Loss 38.8525 LearningRate 0.0000 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:45:57,032-Speed 9372.07 samples/sec Loss 38.8563 LearningRate 0.0000 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:46:23,207-Speed 9389.50 samples/sec Loss 38.8785 LearningRate 0.0001 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:46:49,368-Speed 9394.78 samples/sec Loss 39.2829 LearningRate 0.0001 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:47:15,571-Speed 9379.68 samples/sec Loss 38.9493 LearningRate 0.0001 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:47:41,839-Speed 9356.14 samples/sec Loss 38.8881 LearningRate 0.0001 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:48:08,028-Speed 9384.69 samples/sec Loss 38.8905 LearningRate 0.0001 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:48:34,246-Speed 9374.21 samples/sec Loss 38.8509 LearningRate 0.0001 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:49:00,433-Speed 9385.70 samples/sec Loss 38.8412 LearningRate 0.0001 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 20:49:26,562-Speed 9406.34 samples/sec Loss 38.8563 LearningRate 0.0001 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:49:52,715-Speed 9397.81 samples/sec Loss 38.8280 LearningRate 0.0001 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:50:18,904-Speed 9384.39 samples/sec Loss 38.8262 LearningRate 0.0001 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:50:45,117-Speed 9376.49 samples/sec Loss 38.8329 LearningRate 0.0001 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:51:11,304-Speed 9385.23 samples/sec Loss 38.8253 LearningRate 0.0001 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 20:51:37,493-Speed 9384.63 samples/sec Loss 38.9925 LearningRate 0.0001 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:52:03,653-Speed 9395.14 samples/sec Loss 38.8594 LearningRate 0.0001 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:52:29,867-Speed 9375.51 samples/sec Loss 38.8976 LearningRate 0.0001 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:52:56,084-Speed 9374.63 samples/sec Loss 38.8854 LearningRate 0.0001 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:53:22,270-Speed 9386.17 samples/sec Loss 38.8807 LearningRate 0.0001 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:53:48,513-Speed 9365.29 samples/sec Loss 38.8871 LearningRate 0.0001 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:54:14,727-Speed 9375.61 samples/sec Loss 38.9119 LearningRate 0.0001 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:54:40,915-Speed 9384.97 samples/sec Loss 38.9298 LearningRate 0.0001 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:55:07,106-Speed 9383.92 samples/sec Loss 38.9156 LearningRate 0.0001 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:55:33,409-Speed 9344.05 samples/sec Loss 38.9260 LearningRate 0.0001 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:55:59,619-Speed 9377.20 samples/sec Loss 38.9574 LearningRate 0.0001 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 20:56:25,836-Speed 9374.61 samples/sec Loss 38.9363 LearningRate 0.0001 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 20:56:52,087-Speed 9362.49 samples/sec Loss 38.9533 LearningRate 0.0001 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 20:57:18,280-Speed 9383.19 samples/sec Loss 39.0026 LearningRate 0.0001 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 20:57:44,408-Speed 9407.27 samples/sec Loss 39.0228 LearningRate 0.0001 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:58:10,598-Speed 9384.32 samples/sec Loss 39.1995 LearningRate 0.0001 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:58:36,695-Speed 9417.55 samples/sec Loss 39.0336 LearningRate 0.0001 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:59:02,839-Speed 9401.05 samples/sec Loss 39.0512 LearningRate 0.0001 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:59:28,962-Speed 9408.85 samples/sec Loss 39.0467 LearningRate 0.0001 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 20:59:55,150-Speed 9385.12 samples/sec Loss 39.0546 LearningRate 0.0001 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 21:00:21,326-Speed 9389.25 samples/sec Loss 39.0568 LearningRate 0.0001 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 21:00:47,555-Speed 9370.18 samples/sec Loss 39.0575 LearningRate 0.0001 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 21:01:13,781-Speed 9371.21 samples/sec Loss 39.0660 LearningRate 0.0001 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 21:01:39,948-Speed 9392.72 samples/sec Loss 39.1022 LearningRate 0.0001 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 2048 Required: 50 hours
Training: 2022-03-04 21:02:06,110-Speed 9393.97 samples/sec Loss 39.0941 LearningRate 0.0001 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:02:32,311-Speed 9380.29 samples/sec Loss 39.1007 LearningRate 0.0001 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:02:58,481-Speed 9391.42 samples/sec Loss 39.1284 LearningRate 0.0001 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:03:24,681-Speed 9381.78 samples/sec Loss 39.1243 LearningRate 0.0001 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:03:50,941-Speed 9359.18 samples/sec Loss 39.1188 LearningRate 0.0001 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:04:17,137-Speed 9382.05 samples/sec Loss 39.1153 LearningRate 0.0001 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:04:43,356-Speed 9374.02 samples/sec Loss 39.1058 LearningRate 0.0001 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:05:09,471-Speed 9411.25 samples/sec Loss 39.1136 LearningRate 0.0001 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:05:35,713-Speed 9365.35 samples/sec Loss 39.1154 LearningRate 0.0001 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:06:01,867-Speed 9397.30 samples/sec Loss 39.1301 LearningRate 0.0001 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:06:28,107-Speed 9366.45 samples/sec Loss 39.1347 LearningRate 0.0001 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:06:54,378-Speed 9355.01 samples/sec Loss 39.1288 LearningRate 0.0001 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:07:20,602-Speed 9372.25 samples/sec Loss 39.1449 LearningRate 0.0001 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:07:46,810-Speed 9377.72 samples/sec Loss 39.1504 LearningRate 0.0001 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:08:12,969-Speed 9395.38 samples/sec Loss 39.1360 LearningRate 0.0001 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:08:39,202-Speed 9368.98 samples/sec Loss 39.1317 LearningRate 0.0001 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:09:05,391-Speed 9384.43 samples/sec Loss 39.1395 LearningRate 0.0001 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:09:31,678-Speed 9349.48 samples/sec Loss 39.1393 LearningRate 0.0001 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:09:57,882-Speed 9379.42 samples/sec Loss 39.1531 LearningRate 0.0001 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:10:24,145-Speed 9358.06 samples/sec Loss 39.1415 LearningRate 0.0001 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:10:50,435-Speed 9348.68 samples/sec Loss 39.1467 LearningRate 0.0001 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:11:16,652-Speed 9374.34 samples/sec Loss 39.1439 LearningRate 0.0001 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:11:42,948-Speed 9346.94 samples/sec Loss 39.1466 LearningRate 0.0001 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:12:09,155-Speed 9378.36 samples/sec Loss 39.1386 LearningRate 0.0001 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:12:35,364-Speed 9377.45 samples/sec Loss 39.1392 LearningRate 0.0001 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:13:01,530-Speed 9392.66 samples/sec Loss 39.1398 LearningRate 0.0001 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:13:27,758-Speed 9370.70 samples/sec Loss 39.1443 LearningRate 0.0001 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:13:53,904-Speed 9400.20 samples/sec Loss 39.1408 LearningRate 0.0001 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:14:20,053-Speed 9398.87 samples/sec Loss 39.1392 LearningRate 0.0001 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:14:46,301-Speed 9363.52 samples/sec Loss 39.1501 LearningRate 0.0001 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:15:12,537-Speed 9367.82 samples/sec Loss 39.1452 LearningRate 0.0001 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 21:15:38,844-Speed 9342.60 samples/sec Loss 39.1637 LearningRate 0.0001 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 21:16:05,141-Speed 9346.19 samples/sec Loss 39.1496 LearningRate 0.0001 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 21:16:31,394-Speed 9361.48 samples/sec Loss 39.1356 LearningRate 0.0002 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 21:16:57,631-Speed 9367.61 samples/sec Loss 39.1390 LearningRate 0.0002 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 21:17:23,957-Speed 9335.79 samples/sec Loss 39.1362 LearningRate 0.0002 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 21:17:50,204-Speed 9363.90 samples/sec Loss 39.1244 LearningRate 0.0002 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 21:18:16,532-Speed 9335.11 samples/sec Loss 39.1305 LearningRate 0.0002 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-03-04 21:18:42,846-Speed 9339.77 samples/sec Loss 39.1270 LearningRate 0.0002 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:19:09,167-Speed 9337.85 samples/sec Loss 39.1196 LearningRate 0.0002 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:19:35,540-Speed 9318.80 samples/sec Loss 39.1140 LearningRate 0.0002 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-03-04 21:20:01,821-Speed 9352.24 samples/sec Loss 39.0860 LearningRate 0.0002 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-03-04 21:20:28,119-Speed 9345.91 samples/sec Loss 39.1035 LearningRate 0.0002 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-03-04 21:20:54,449-Speed 9334.39 samples/sec Loss 39.0553 LearningRate 0.0002 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-03-04 21:21:20,800-Speed 9327.00 samples/sec Loss 39.0431 LearningRate 0.0002 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-03-04 21:21:47,151-Speed 9327.02 samples/sec Loss 39.0310 LearningRate 0.0002 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-03-04 21:22:13,509-Speed 9324.40 samples/sec Loss 39.0079 LearningRate 0.0002 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-03-04 21:22:39,813-Speed 9343.62 samples/sec Loss 39.0274 LearningRate 0.0002 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:23:06,089-Speed 9353.70 samples/sec Loss 39.0159 LearningRate 0.0002 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:23:32,466-Speed 9317.71 samples/sec Loss 38.9808 LearningRate 0.0002 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:23:58,788-Speed 9337.86 samples/sec Loss 38.9294 LearningRate 0.0002 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:24:25,118-Speed 9334.09 samples/sec Loss 38.8897 LearningRate 0.0002 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:24:51,523-Speed 9307.92 samples/sec Loss 38.9448 LearningRate 0.0002 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:25:17,845-Speed 9337.36 samples/sec Loss 38.9361 LearningRate 0.0002 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:25:44,167-Speed 9337.05 samples/sec Loss 38.8574 LearningRate 0.0002 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:26:10,450-Speed 9351.05 samples/sec Loss 38.8182 LearningRate 0.0002 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:26:36,744-Speed 9347.10 samples/sec Loss 38.7686 LearningRate 0.0002 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:27:03,042-Speed 9345.68 samples/sec Loss 38.7139 LearningRate 0.0002 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:27:29,363-Speed 9337.88 samples/sec Loss 38.6863 LearningRate 0.0002 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:27:55,654-Speed 9348.01 samples/sec Loss 38.6548 LearningRate 0.0002 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:28:21,943-Speed 9348.79 samples/sec Loss 38.6359 LearningRate 0.0002 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:28:48,206-Speed 9358.28 samples/sec Loss 38.6223 LearningRate 0.0002 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:29:14,494-Speed 9349.14 samples/sec Loss 38.6513 LearningRate 0.0002 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:29:40,799-Speed 9343.78 samples/sec Loss 38.6576 LearningRate 0.0002 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:30:07,059-Speed 9358.93 samples/sec Loss 38.5467 LearningRate 0.0002 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:30:33,371-Speed 9340.67 samples/sec Loss 38.5419 LearningRate 0.0002 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:30:59,675-Speed 9344.18 samples/sec Loss 38.5570 LearningRate 0.0002 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:31:25,912-Speed 9367.38 samples/sec Loss 38.5327 LearningRate 0.0002 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:31:52,235-Speed 9337.00 samples/sec Loss 38.4775 LearningRate 0.0002 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:32:18,528-Speed 9347.54 samples/sec Loss 38.5005 LearningRate 0.0002 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:32:44,889-Speed 9323.70 samples/sec Loss 38.4442 LearningRate 0.0002 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:33:11,220-Speed 9333.76 samples/sec Loss 38.7379 LearningRate 0.0002 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:33:37,555-Speed 9332.57 samples/sec Loss 38.4480 LearningRate 0.0002 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:34:03,808-Speed 9361.56 samples/sec Loss 38.4603 LearningRate 0.0002 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 21:34:30,102-Speed 9347.27 samples/sec Loss 38.4207 LearningRate 0.0002 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:34:56,409-Speed 9342.34 samples/sec Loss 38.3964 LearningRate 0.0002 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:35:22,667-Speed 9360.09 samples/sec Loss 38.3533 LearningRate 0.0002 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:35:48,940-Speed 9354.72 samples/sec Loss 38.2894 LearningRate 0.0002 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:36:15,310-Speed 9319.88 samples/sec Loss 38.2424 LearningRate 0.0002 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:36:41,670-Speed 9323.86 samples/sec Loss 38.1882 LearningRate 0.0002 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:37:07,981-Speed 9341.24 samples/sec Loss 38.1190 LearningRate 0.0002 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:37:34,300-Speed 9338.02 samples/sec Loss 38.1026 LearningRate 0.0002 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:38:00,658-Speed 9324.53 samples/sec Loss 38.0877 LearningRate 0.0002 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:38:26,968-Speed 9341.63 samples/sec Loss 38.4491 LearningRate 0.0002 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:38:53,266-Speed 9345.65 samples/sec Loss 38.1682 LearningRate 0.0002 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:39:19,621-Speed 9325.70 samples/sec Loss 38.0522 LearningRate 0.0002 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:39:46,067-Speed 9293.35 samples/sec Loss 37.9772 LearningRate 0.0002 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:40:12,413-Speed 9328.75 samples/sec Loss 37.9387 LearningRate 0.0002 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:40:38,759-Speed 9328.44 samples/sec Loss 37.9246 LearningRate 0.0002 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:41:05,094-Speed 9332.68 samples/sec Loss 37.8783 LearningRate 0.0002 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:41:31,414-Speed 9338.64 samples/sec Loss 37.8408 LearningRate 0.0002 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:41:57,758-Speed 9329.14 samples/sec Loss 37.7741 LearningRate 0.0002 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:42:23,968-Speed 9377.35 samples/sec Loss 37.7123 LearningRate 0.0002 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:42:50,277-Speed 9341.79 samples/sec Loss 37.6759 LearningRate 0.0002 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:43:16,577-Speed 9344.84 samples/sec Loss 37.6202 LearningRate 0.0002 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-03-04 21:43:42,814-Speed 9367.29 samples/sec Loss 37.5821 LearningRate 0.0002 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:44:09,079-Speed 9357.71 samples/sec Loss 37.5247 LearningRate 0.0002 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:44:35,361-Speed 9351.23 samples/sec Loss 37.4741 LearningRate 0.0002 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:45:01,636-Speed 9354.71 samples/sec Loss 37.5179 LearningRate 0.0002 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:45:27,946-Speed 9341.28 samples/sec Loss 37.4822 LearningRate 0.0002 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:45:54,205-Speed 9359.54 samples/sec Loss 37.4035 LearningRate 0.0002 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:46:20,482-Speed 9352.96 samples/sec Loss 37.3636 LearningRate 0.0002 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:47:38,019-Speed 3169.66 samples/sec Loss 37.3438 LearningRate 0.0003 Epoch: 1 Global Step: 1730 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:48:04,034-Speed 9447.26 samples/sec Loss 37.3141 LearningRate 0.0003 Epoch: 1 Global Step: 1740 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:48:30,307-Speed 9354.87 samples/sec Loss 37.2755 LearningRate 0.0003 Epoch: 1 Global Step: 1750 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:48:56,582-Speed 9353.76 samples/sec Loss 37.3661 LearningRate 0.0003 Epoch: 1 Global Step: 1760 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:49:22,737-Speed 9396.82 samples/sec Loss 37.3078 LearningRate 0.0003 Epoch: 1 Global Step: 1770 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:49:48,939-Speed 9380.16 samples/sec Loss 37.1904 LearningRate 0.0003 Epoch: 1 Global Step: 1780 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:50:15,152-Speed 9375.99 samples/sec Loss 37.1099 LearningRate 0.0003 Epoch: 1 Global Step: 1790 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:50:41,400-Speed 9363.92 samples/sec Loss 37.0244 LearningRate 0.0003 Epoch: 1 Global Step: 1800 Fp16 Grad Scale: 4096 Required: 50 hours
Training: 2022-03-04 21:51:07,631-Speed 9369.56 samples/sec Loss 36.9832 LearningRate 0.0003 Epoch: 1 Global Step: 1810 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:51:33,932-Speed 9344.38 samples/sec Loss 36.9613 LearningRate 0.0003 Epoch: 1 Global Step: 1820 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:52:00,100-Speed 9392.93 samples/sec Loss 36.8970 LearningRate 0.0003 Epoch: 1 Global Step: 1830 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:52:26,308-Speed 9377.76 samples/sec Loss 36.8320 LearningRate 0.0003 Epoch: 1 Global Step: 1840 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:52:52,468-Speed 9395.26 samples/sec Loss 36.8124 LearningRate 0.0003 Epoch: 1 Global Step: 1850 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:53:18,746-Speed 9352.69 samples/sec Loss 36.7429 LearningRate 0.0003 Epoch: 1 Global Step: 1860 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:53:45,001-Speed 9361.09 samples/sec Loss 36.7100 LearningRate 0.0003 Epoch: 1 Global Step: 1870 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:54:11,212-Speed 9376.71 samples/sec Loss 36.6441 LearningRate 0.0003 Epoch: 1 Global Step: 1880 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:54:37,389-Speed 9389.52 samples/sec Loss 36.6006 LearningRate 0.0003 Epoch: 1 Global Step: 1890 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-03-04 21:55:03,555-Speed 9392.88 samples/sec Loss 36.5648 LearningRate 0.0003 Epoch: 1 Global Step: 1900 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:55:29,785-Speed 9369.81 samples/sec Loss 36.4824 LearningRate 0.0003 Epoch: 1 Global Step: 1910 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-03-04 21:55:55,975-Speed 9384.43 samples/sec Loss 36.4613 LearningRate 0.0003 Epoch: 1 Global Step: 1920 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:56:22,191-Speed 9374.95 samples/sec Loss 36.4317 LearningRate 0.0003 Epoch: 1 Global Step: 1930 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:56:48,415-Speed 9372.10 samples/sec Loss 36.3634 LearningRate 0.0003 Epoch: 1 Global Step: 1940 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:57:14,675-Speed 9359.30 samples/sec Loss 36.3233 LearningRate 0.0003 Epoch: 1 Global Step: 1950 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:57:40,912-Speed 9367.58 samples/sec Loss 36.3043 LearningRate 0.0003 Epoch: 1 Global Step: 1960 Fp16 Grad Scale: 8192 Required: 49 hours
Training: 2022-03-04 21:58:07,106-Speed 9383.11 samples/sec Loss 36.2663 LearningRate 0.0003 Epoch: 1 Global Step: 1970 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:58:33,316-Speed 9376.95 samples/sec Loss 36.3529 LearningRate 0.0003 Epoch: 1 Global Step: 1980 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:58:59,681-Speed 9321.82 samples/sec Loss 36.2152 LearningRate 0.0003 Epoch: 1 Global Step: 1990 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:59:25,955-Speed 9354.75 samples/sec Loss 36.1151 LearningRate 0.0003 Epoch: 1 Global Step: 2000 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 21:59:52,157-Speed 9380.08 samples/sec Loss 36.0745 LearningRate 0.0003 Epoch: 1 Global Step: 2010 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:00:18,369-Speed 9376.33 samples/sec Loss 36.0269 LearningRate 0.0003 Epoch: 1 Global Step: 2020 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:00:44,596-Speed 9371.23 samples/sec Loss 35.9820 LearningRate 0.0003 Epoch: 1 Global Step: 2030 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:01:10,875-Speed 9352.50 samples/sec Loss 36.0160 LearningRate 0.0003 Epoch: 1 Global Step: 2040 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:01:37,089-Speed 9375.99 samples/sec Loss 35.9425 LearningRate 0.0003 Epoch: 1 Global Step: 2050 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:02:03,283-Speed 9382.93 samples/sec Loss 35.9677 LearningRate 0.0003 Epoch: 1 Global Step: 2060 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:02:29,521-Speed 9367.15 samples/sec Loss 35.9316 LearningRate 0.0003 Epoch: 1 Global Step: 2070 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:02:55,790-Speed 9355.98 samples/sec Loss 35.8029 LearningRate 0.0003 Epoch: 1 Global Step: 2080 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:03:22,044-Speed 9361.52 samples/sec Loss 35.7370 LearningRate 0.0003 Epoch: 1 Global Step: 2090 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:03:48,218-Speed 9390.17 samples/sec Loss 35.8043 LearningRate 0.0003 Epoch: 1 Global Step: 2100 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:04:14,445-Speed 9371.78 samples/sec Loss 35.6417 LearningRate 0.0003 Epoch: 1 Global Step: 2110 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:04:40,689-Speed 9365.08 samples/sec Loss 35.5674 LearningRate 0.0003 Epoch: 1 Global Step: 2120 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:05:06,924-Speed 9368.29 samples/sec Loss 35.5179 LearningRate 0.0003 Epoch: 1 Global Step: 2130 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:05:33,123-Speed 9381.76 samples/sec Loss 35.5799 LearningRate 0.0003 Epoch: 1 Global Step: 2140 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:05:59,356-Speed 9369.25 samples/sec Loss 35.6791 LearningRate 0.0003 Epoch: 1 Global Step: 2150 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:06:25,632-Speed 9353.51 samples/sec Loss 35.4853 LearningRate 0.0003 Epoch: 1 Global Step: 2160 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:06:51,840-Speed 9378.01 samples/sec Loss 35.4411 LearningRate 0.0003 Epoch: 1 Global Step: 2170 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:07:18,089-Speed 9363.11 samples/sec Loss 35.6056 LearningRate 0.0003 Epoch: 1 Global Step: 2180 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:07:44,457-Speed 9321.13 samples/sec Loss 35.3852 LearningRate 0.0003 Epoch: 1 Global Step: 2190 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:08:10,751-Speed 9347.27 samples/sec Loss 35.9786 LearningRate 0.0003 Epoch: 1 Global Step: 2200 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:08:36,988-Speed 9367.91 samples/sec Loss 35.5871 LearningRate 0.0003 Epoch: 1 Global Step: 2210 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:09:03,313-Speed 9336.12 samples/sec Loss 35.2985 LearningRate 0.0003 Epoch: 1 Global Step: 2220 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:09:29,516-Speed 9379.73 samples/sec Loss 35.1592 LearningRate 0.0003 Epoch: 1 Global Step: 2230 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:09:55,750-Speed 9368.40 samples/sec Loss 35.0873 LearningRate 0.0003 Epoch: 1 Global Step: 2240 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:10:22,012-Speed 9358.36 samples/sec Loss 35.0537 LearningRate 0.0003 Epoch: 1 Global Step: 2250 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:10:48,261-Speed 9363.23 samples/sec Loss 34.9923 LearningRate 0.0003 Epoch: 1 Global Step: 2260 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:11:14,510-Speed 9363.08 samples/sec Loss 35.1088 LearningRate 0.0003 Epoch: 1 Global Step: 2270 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:11:40,769-Speed 9359.87 samples/sec Loss 36.3821 LearningRate 0.0003 Epoch: 1 Global Step: 2280 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:12:06,911-Speed 9401.04 samples/sec Loss 35.1854 LearningRate 0.0003 Epoch: 1 Global Step: 2290 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:12:33,144-Speed 9368.92 samples/sec Loss 34.9644 LearningRate 0.0003 Epoch: 1 Global Step: 2300 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:12:59,462-Speed 9338.43 samples/sec Loss 34.8483 LearningRate 0.0003 Epoch: 1 Global Step: 2310 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:13:25,682-Speed 9373.22 samples/sec Loss 34.8001 LearningRate 0.0003 Epoch: 1 Global Step: 2320 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:13:52,046-Speed 9322.43 samples/sec Loss 34.6771 LearningRate 0.0003 Epoch: 1 Global Step: 2330 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:14:18,280-Speed 9368.37 samples/sec Loss 34.6543 LearningRate 0.0003 Epoch: 1 Global Step: 2340 Fp16 Grad Scale: 2048 Required: 49 hours
Training: 2022-03-04 22:14:44,495-Speed 9375.12 samples/sec Loss 34.5566 LearningRate 0.0003 Epoch: 1 Global Step: 2350 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:15:10,749-Speed 9361.46 samples/sec Loss 34.4858 LearningRate 0.0003 Epoch: 1 Global Step: 2360 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-03-04 22:15:36,933-Speed 9386.61
samples/sec Loss 34.5599 LearningRate 0.0003 Epoch: 1 Global Step: 2370 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:16:03,176-Speed 9364.94 samples/sec Loss 34.5383 LearningRate 0.0003 Epoch: 1 Global Step: 2380 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:16:29,436-Speed 9359.13 samples/sec Loss 34.4457 LearningRate 0.0003 Epoch: 1 Global Step: 2390 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:16:55,726-Speed 9348.26 samples/sec Loss 34.2967 LearningRate 0.0003 Epoch: 1 Global Step: 2400 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:17:21,944-Speed 9374.26 samples/sec Loss 34.1739 LearningRate 0.0003 Epoch: 1 Global Step: 2410 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:17:48,248-Speed 9343.47 samples/sec Loss 34.0907 LearningRate 0.0004 Epoch: 1 Global Step: 2420 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:18:14,472-Speed 9371.81 samples/sec Loss 33.9901 LearningRate 0.0004 Epoch: 1 Global Step: 2430 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:18:40,716-Speed 9365.07 samples/sec Loss 33.9421 LearningRate 0.0004 Epoch: 1 Global Step: 2440 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:19:06,984-Speed 9356.25 samples/sec Loss 33.9121 LearningRate 0.0004 Epoch: 1 Global Step: 2450 Fp16 Grad Scale: 8192 Required: 49 hours Training: 2022-03-04 22:19:33,188-Speed 9378.90 samples/sec Loss 33.8778 LearningRate 0.0004 Epoch: 1 Global Step: 2460 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:19:59,369-Speed 9387.56 samples/sec Loss 33.8028 LearningRate 0.0004 Epoch: 1 Global Step: 2470 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:20:25,588-Speed 9373.56 samples/sec Loss 33.7008 LearningRate 0.0004 Epoch: 1 Global Step: 2480 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:20:51,803-Speed 9375.03 samples/sec Loss 33.6673 LearningRate 0.0004 Epoch: 1 Global Step: 
2490 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:21:18,048-Speed 9364.72 samples/sec Loss 33.6720 LearningRate 0.0004 Epoch: 1 Global Step: 2500 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:21:44,264-Speed 9374.90 samples/sec Loss 33.8205 LearningRate 0.0004 Epoch: 1 Global Step: 2510 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:22:10,484-Speed 9373.39 samples/sec Loss 33.6063 LearningRate 0.0004 Epoch: 1 Global Step: 2520 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:22:36,747-Speed 9358.09 samples/sec Loss 33.4343 LearningRate 0.0004 Epoch: 1 Global Step: 2530 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:23:03,034-Speed 9350.45 samples/sec Loss 33.3491 LearningRate 0.0004 Epoch: 1 Global Step: 2540 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:23:29,300-Speed 9357.16 samples/sec Loss 33.5888 LearningRate 0.0004 Epoch: 1 Global Step: 2550 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:23:55,580-Speed 9351.87 samples/sec Loss 33.5354 LearningRate 0.0004 Epoch: 1 Global Step: 2560 Fp16 Grad Scale: 8192 Required: 49 hours Training: 2022-03-04 22:24:21,775-Speed 9382.32 samples/sec Loss 33.2971 LearningRate 0.0004 Epoch: 1 Global Step: 2570 Fp16 Grad Scale: 8192 Required: 49 hours Training: 2022-03-04 22:24:48,026-Speed 9362.34 samples/sec Loss 33.1971 LearningRate 0.0004 Epoch: 1 Global Step: 2580 Fp16 Grad Scale: 8192 Required: 49 hours Training: 2022-03-04 22:25:14,347-Speed 9337.74 samples/sec Loss 33.0794 LearningRate 0.0004 Epoch: 1 Global Step: 2590 Fp16 Grad Scale: 8192 Required: 49 hours Training: 2022-03-04 22:25:40,672-Speed 9336.12 samples/sec Loss 32.9905 LearningRate 0.0004 Epoch: 1 Global Step: 2600 Fp16 Grad Scale: 8192 Required: 49 hours Training: 2022-03-04 22:26:06,893-Speed 9372.96 samples/sec Loss 33.0474 LearningRate 0.0004 Epoch: 1 Global Step: 2610 Fp16 Grad Scale: 8192 Required: 49 hours Training: 2022-03-04 
22:26:33,070-Speed 9388.93 samples/sec Loss 32.9066 LearningRate 0.0004 Epoch: 1 Global Step: 2620 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:26:59,438-Speed 9320.66 samples/sec Loss 32.8521 LearningRate 0.0004 Epoch: 1 Global Step: 2630 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:27:25,645-Speed 9378.25 samples/sec Loss 32.6729 LearningRate 0.0004 Epoch: 1 Global Step: 2640 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:27:51,882-Speed 9367.23 samples/sec Loss 32.5752 LearningRate 0.0004 Epoch: 1 Global Step: 2650 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:28:18,102-Speed 9373.62 samples/sec Loss 32.4900 LearningRate 0.0004 Epoch: 1 Global Step: 2660 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:28:44,387-Speed 9350.52 samples/sec Loss 32.4219 LearningRate 0.0004 Epoch: 1 Global Step: 2670 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:29:10,644-Speed 9360.28 samples/sec Loss 32.3991 LearningRate 0.0004 Epoch: 1 Global Step: 2680 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:29:36,906-Speed 9358.53 samples/sec Loss 32.2771 LearningRate 0.0004 Epoch: 1 Global Step: 2690 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:30:03,133-Speed 9370.83 samples/sec Loss 32.1982 LearningRate 0.0004 Epoch: 1 Global Step: 2700 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:30:29,373-Speed 9366.34 samples/sec Loss 32.1197 LearningRate 0.0004 Epoch: 1 Global Step: 2710 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:30:55,555-Speed 9387.99 samples/sec Loss 32.2206 LearningRate 0.0004 Epoch: 1 Global Step: 2720 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:31:21,697-Speed 9401.42 samples/sec Loss 32.5546 LearningRate 0.0004 Epoch: 1 Global Step: 2730 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:31:47,866-Speed 9391.79 samples/sec Loss 31.9658 LearningRate 
0.0004 Epoch: 1 Global Step: 2740 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:32:14,149-Speed 9350.69 samples/sec Loss 31.8221 LearningRate 0.0004 Epoch: 1 Global Step: 2750 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:32:40,455-Speed 9342.74 samples/sec Loss 31.7245 LearningRate 0.0004 Epoch: 1 Global Step: 2760 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:33:06,695-Speed 9366.60 samples/sec Loss 31.6304 LearningRate 0.0004 Epoch: 1 Global Step: 2770 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:33:32,920-Speed 9371.24 samples/sec Loss 31.5504 LearningRate 0.0004 Epoch: 1 Global Step: 2780 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:33:59,090-Speed 9391.46 samples/sec Loss 31.4515 LearningRate 0.0004 Epoch: 1 Global Step: 2790 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:34:25,268-Speed 9388.30 samples/sec Loss 31.3570 LearningRate 0.0004 Epoch: 1 Global Step: 2800 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:34:51,511-Speed 9365.29 samples/sec Loss 31.2302 LearningRate 0.0004 Epoch: 1 Global Step: 2810 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:35:17,726-Speed 9375.33 samples/sec Loss 31.1367 LearningRate 0.0004 Epoch: 1 Global Step: 2820 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:35:44,089-Speed 9322.37 samples/sec Loss 31.0615 LearningRate 0.0004 Epoch: 1 Global Step: 2830 Fp16 Grad Scale: 1024 Required: 49 hours Training: 2022-03-04 22:36:10,478-Speed 9313.38 samples/sec Loss 31.0016 LearningRate 0.0004 Epoch: 1 Global Step: 2840 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:36:36,677-Speed 9380.88 samples/sec Loss 30.8827 LearningRate 0.0004 Epoch: 1 Global Step: 2850 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:37:02,944-Speed 9356.80 samples/sec Loss 30.7280 LearningRate 0.0004 Epoch: 1 Global Step: 2860 Fp16 Grad Scale: 2048 Required: 
49 hours Training: 2022-03-04 22:37:29,134-Speed 9384.01 samples/sec Loss 30.6384 LearningRate 0.0004 Epoch: 1 Global Step: 2870 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:37:55,373-Speed 9366.71 samples/sec Loss 30.5557 LearningRate 0.0004 Epoch: 1 Global Step: 2880 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:38:21,606-Speed 9368.57 samples/sec Loss 30.4550 LearningRate 0.0004 Epoch: 1 Global Step: 2890 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:38:47,922-Speed 9339.28 samples/sec Loss 30.3982 LearningRate 0.0004 Epoch: 1 Global Step: 2900 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:39:14,145-Speed 9372.41 samples/sec Loss 30.3677 LearningRate 0.0004 Epoch: 1 Global Step: 2910 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:39:40,351-Speed 9378.43 samples/sec Loss 30.2269 LearningRate 0.0004 Epoch: 1 Global Step: 2920 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:40:06,544-Speed 9382.99 samples/sec Loss 30.1060 LearningRate 0.0004 Epoch: 1 Global Step: 2930 Fp16 Grad Scale: 2048 Required: 49 hours Training: 2022-03-04 22:40:32,763-Speed 9374.07 samples/sec Loss 30.0018 LearningRate 0.0004 Epoch: 1 Global Step: 2940 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:40:58,944-Speed 9387.21 samples/sec Loss 29.8765 LearningRate 0.0004 Epoch: 1 Global Step: 2950 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:41:25,121-Speed 9388.99 samples/sec Loss 29.7620 LearningRate 0.0004 Epoch: 1 Global Step: 2960 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:41:51,301-Speed 9387.38 samples/sec Loss 29.6872 LearningRate 0.0004 Epoch: 1 Global Step: 2970 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:42:17,483-Speed 9387.41 samples/sec Loss 29.5593 LearningRate 0.0004 Epoch: 1 Global Step: 2980 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:42:43,703-Speed 9373.50 
samples/sec Loss 29.4333 LearningRate 0.0004 Epoch: 1 Global Step: 2990 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:43:09,948-Speed 9364.46 samples/sec Loss 29.3521 LearningRate 0.0004 Epoch: 1 Global Step: 3000 Fp16 Grad Scale: 4096 Required: 49 hours Training: 2022-03-04 22:43:36,187-Speed 9366.76 samples/sec Loss 29.2615 LearningRate 0.0004 Epoch: 1 Global Step: 3010 Fp16 Grad Scale: 4096 Required: 48 hours Training: 2022-03-04 22:44:02,420-Speed 9368.73 samples/sec Loss 29.1430 LearningRate 0.0004 Epoch: 1 Global Step: 3020 Fp16 Grad Scale: 4096 Required: 48 hours Training: 2022-03-04 22:44:28,648-Speed 9370.59 samples/sec Loss 29.0706 LearningRate 0.0004 Epoch: 1 Global Step: 3030 Fp16 Grad Scale: 4096 Required: 48 hours Training: 2022-03-04 22:44:54,898-Speed 9363.08 samples/sec Loss 28.9694 LearningRate 0.0004 Epoch: 1 Global Step: 3040 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:45:21,159-Speed 9358.54 samples/sec Loss 28.9417 LearningRate 0.0004 Epoch: 1 Global Step: 3050 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:45:47,395-Speed 9367.79 samples/sec Loss 28.7467 LearningRate 0.0004 Epoch: 1 Global Step: 3060 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:46:13,582-Speed 9385.30 samples/sec Loss 28.6278 LearningRate 0.0004 Epoch: 1 Global Step: 3070 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:46:39,849-Speed 9356.31 samples/sec Loss 28.5329 LearningRate 0.0004 Epoch: 1 Global Step: 3080 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:47:05,985-Speed 9403.47 samples/sec Loss 28.3850 LearningRate 0.0004 Epoch: 1 Global Step: 3090 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:47:32,217-Speed 9369.12 samples/sec Loss 28.2506 LearningRate 0.0004 Epoch: 1 Global Step: 3100 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:47:58,476-Speed 9359.67 samples/sec Loss 28.1208 LearningRate 0.0004 Epoch: 1 Global Step: 
3110 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:48:24,642-Speed 9392.73 samples/sec Loss 28.0529 LearningRate 0.0005 Epoch: 1 Global Step: 3120 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:48:50,815-Speed 9390.31 samples/sec Loss 27.9338 LearningRate 0.0005 Epoch: 1 Global Step: 3130 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-03-04 22:49:17,028-Speed 9376.05 samples/sec Loss 27.8545 LearningRate 0.0005 Epoch: 1 Global Step: 3140 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:49:43,263-Speed 9368.15 samples/sec Loss 27.6669 LearningRate 0.0005 Epoch: 1 Global Step: 3150 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:50:09,414-Speed 9398.08 samples/sec Loss 27.5537 LearningRate 0.0005 Epoch: 1 Global Step: 3160 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:50:35,618-Speed 9379.18 samples/sec Loss 27.4204 LearningRate 0.0005 Epoch: 1 Global Step: 3170 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:51:01,846-Speed 9370.54 samples/sec Loss 27.3992 LearningRate 0.0005 Epoch: 1 Global Step: 3180 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:51:28,054-Speed 9377.71 samples/sec Loss 27.2903 LearningRate 0.0005 Epoch: 1 Global Step: 3190 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:51:54,251-Speed 9381.53 samples/sec Loss 27.0632 LearningRate 0.0005 Epoch: 1 Global Step: 3200 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:52:20,535-Speed 9350.62 samples/sec Loss 26.9365 LearningRate 0.0005 Epoch: 1 Global Step: 3210 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:52:46,804-Speed 9356.09 samples/sec Loss 26.8220 LearningRate 0.0005 Epoch: 1 Global Step: 3220 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-03-04 22:53:13,050-Speed 9363.84 samples/sec Loss 26.7746 LearningRate 0.0005 Epoch: 1 Global Step: 3230 Fp16 Grad Scale: 16384 Required: 48 hours Training: 
2022-03-04 22:53:39,311-Speed 9358.85 samples/sec Loss 26.5821 LearningRate 0.0005 Epoch: 1 Global Step: 3240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:54:05,521-Speed 9376.89 samples/sec Loss 26.4576 LearningRate 0.0005 Epoch: 1 Global Step: 3250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:54:31,720-Speed 9380.88 samples/sec Loss 26.3202 LearningRate 0.0005 Epoch: 1 Global Step: 3260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:54:58,004-Speed 9350.70 samples/sec Loss 26.2488 LearningRate 0.0005 Epoch: 1 Global Step: 3270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:55:24,242-Speed 9367.08 samples/sec Loss 26.0808 LearningRate 0.0005 Epoch: 1 Global Step: 3280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:55:50,531-Speed 9348.67 samples/sec Loss 25.9360 LearningRate 0.0005 Epoch: 1 Global Step: 3290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:56:16,695-Speed 9393.71 samples/sec Loss 25.7775 LearningRate 0.0005 Epoch: 1 Global Step: 3300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:56:42,950-Speed 9360.75 samples/sec Loss 25.7067 LearningRate 0.0005 Epoch: 1 Global Step: 3310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:57:09,267-Speed 9338.61 samples/sec Loss 25.6807 LearningRate 0.0005 Epoch: 1 Global Step: 3320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:57:35,467-Speed 9380.54 samples/sec Loss 25.4889 LearningRate 0.0005 Epoch: 1 Global Step: 3330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-03-04 22:58:01,676-Speed 9378.17 samples/sec Loss 25.3300 LearningRate 0.0005 Epoch: 1 Global Step: 3340 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 22:58:27,889-Speed 9375.86 samples/sec Loss 25.3224 LearningRate 0.0005 Epoch: 1 Global Step: 3350 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 22:58:54,104-Speed 9375.46 samples/sec Loss 
25.0933 LearningRate 0.0005 Epoch: 1 Global Step: 3360 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 22:59:20,337-Speed 9368.70 samples/sec Loss 24.9737 LearningRate 0.0005 Epoch: 1 Global Step: 3370 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 22:59:46,518-Speed 9387.30 samples/sec Loss 24.8284 LearningRate 0.0005 Epoch: 1 Global Step: 3380 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 23:00:12,753-Speed 9368.94 samples/sec Loss 24.7039 LearningRate 0.0005 Epoch: 1 Global Step: 3390 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 23:00:39,051-Speed 9345.81 samples/sec Loss 24.5769 LearningRate 0.0005 Epoch: 1 Global Step: 3400 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 23:01:05,307-Speed 9360.29 samples/sec Loss 24.4282 LearningRate 0.0005 Epoch: 1 Global Step: 3410 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 23:01:31,533-Speed 9371.14 samples/sec Loss 24.3659 LearningRate 0.0005 Epoch: 1 Global Step: 3420 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 23:01:57,784-Speed 9362.38 samples/sec Loss 24.2225 LearningRate 0.0005 Epoch: 1 Global Step: 3430 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-03-04 23:02:24,069-Speed 9350.33 samples/sec Loss 24.1185 LearningRate 0.0005 Epoch: 1 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:02:50,272-Speed 9379.44 samples/sec Loss 23.9282 LearningRate 0.0005 Epoch: 1 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:04:10,139-Speed 3077.17 samples/sec Loss 23.8230 LearningRate 0.0005 Epoch: 2 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:04:36,139-Speed 9452.56 samples/sec Loss 23.7455 LearningRate 0.0005 Epoch: 2 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:05:02,222-Speed 9422.87 samples/sec Loss 23.5690 LearningRate 0.0005 Epoch: 2 Global Step: 
3480 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:05:28,301-Speed 9423.94 samples/sec Loss 23.4775 LearningRate 0.0005 Epoch: 2 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:05:54,446-Speed 9400.39 samples/sec Loss 23.3041 LearningRate 0.0005 Epoch: 2 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:06:20,546-Speed 9417.04 samples/sec Loss 23.2133 LearningRate 0.0005 Epoch: 2 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:06:46,535-Speed 9456.85 samples/sec Loss 23.1278 LearningRate 0.0005 Epoch: 2 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:07:12,568-Speed 9440.74 samples/sec Loss 22.9712 LearningRate 0.0005 Epoch: 2 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:07:38,584-Speed 9447.04 samples/sec Loss 22.8904 LearningRate 0.0005 Epoch: 2 Global Step: 3540 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:08:04,698-Speed 9411.53 samples/sec Loss 22.7847 LearningRate 0.0005 Epoch: 2 Global Step: 3550 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:08:30,751-Speed 9433.58 samples/sec Loss 22.6194 LearningRate 0.0005 Epoch: 2 Global Step: 3560 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:08:56,902-Speed 9398.39 samples/sec Loss 22.5263 LearningRate 0.0005 Epoch: 2 Global Step: 3570 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:09:22,972-Speed 9427.18 samples/sec Loss 22.4015 LearningRate 0.0005 Epoch: 2 Global Step: 3580 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:09:49,121-Speed 9399.20 samples/sec Loss 22.2545 LearningRate 0.0005 Epoch: 2 Global Step: 3590 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:10:15,291-Speed 9391.05 samples/sec Loss 22.2213 LearningRate 0.0005 Epoch: 2 Global Step: 3600 Fp16 Grad Scale: 262144 Required: 48 
hours Training: 2022-03-04 23:10:41,367-Speed 9425.38 samples/sec Loss 22.0199 LearningRate 0.0005 Epoch: 2 Global Step: 3610 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:11:07,572-Speed 9378.92 samples/sec Loss 21.9057 LearningRate 0.0005 Epoch: 2 Global Step: 3620 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:11:33,770-Speed 9381.39 samples/sec Loss 21.8135 LearningRate 0.0005 Epoch: 2 Global Step: 3630 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:12:00,149-Speed 9316.99 samples/sec Loss 21.7001 LearningRate 0.0005 Epoch: 2 Global Step: 3640 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:12:26,307-Speed 9395.63 samples/sec Loss 21.5335 LearningRate 0.0005 Epoch: 2 Global Step: 3650 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:12:52,480-Speed 9390.38 samples/sec Loss 21.4874 LearningRate 0.0005 Epoch: 2 Global Step: 3660 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:13:18,697-Speed 9374.67 samples/sec Loss 21.3727 LearningRate 0.0005 Epoch: 2 Global Step: 3670 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:13:44,987-Speed 9348.20 samples/sec Loss 21.1840 LearningRate 0.0005 Epoch: 2 Global Step: 3680 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:14:11,127-Speed 9402.22 samples/sec Loss 21.0678 LearningRate 0.0005 Epoch: 2 Global Step: 3690 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:14:37,264-Speed 9403.30 samples/sec Loss 21.0521 LearningRate 0.0005 Epoch: 2 Global Step: 3700 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:15:03,414-Speed 9398.75 samples/sec Loss 20.9104 LearningRate 0.0005 Epoch: 2 Global Step: 3710 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:15:29,520-Speed 9414.37 samples/sec Loss 20.6769 LearningRate 0.0005 Epoch: 2 Global Step: 3720 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 
23:15:55,688-Speed 9391.86 samples/sec Loss 20.6337 LearningRate 0.0005 Epoch: 2 Global Step: 3730 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:16:21,919-Speed 9369.79 samples/sec Loss 20.5441 LearningRate 0.0005 Epoch: 2 Global Step: 3740 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:16:48,131-Speed 9376.11 samples/sec Loss 20.4279 LearningRate 0.0005 Epoch: 2 Global Step: 3750 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:17:14,379-Speed 9363.36 samples/sec Loss 20.3371 LearningRate 0.0005 Epoch: 2 Global Step: 3760 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:17:40,670-Speed 9348.17 samples/sec Loss 20.1477 LearningRate 0.0005 Epoch: 2 Global Step: 3770 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:18:06,883-Speed 9375.85 samples/sec Loss 20.0800 LearningRate 0.0005 Epoch: 2 Global Step: 3780 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:18:33,114-Speed 9369.87 samples/sec Loss 19.9308 LearningRate 0.0005 Epoch: 2 Global Step: 3790 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:18:59,360-Speed 9363.77 samples/sec Loss 19.8609 LearningRate 0.0005 Epoch: 2 Global Step: 3800 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:19:25,579-Speed 9373.87 samples/sec Loss 19.7468 LearningRate 0.0006 Epoch: 2 Global Step: 3810 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:19:51,875-Speed 9346.31 samples/sec Loss 19.6391 LearningRate 0.0006 Epoch: 2 Global Step: 3820 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:20:18,282-Speed 9306.88 samples/sec Loss 19.4825 LearningRate 0.0006 Epoch: 2 Global Step: 3830 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:20:44,685-Speed 9308.44 samples/sec Loss 19.3966 LearningRate 0.0006 Epoch: 2 Global Step: 3840 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:21:10,970-Speed 9350.06 samples/sec Loss 
19.3295 LearningRate 0.0006 Epoch: 2 Global Step: 3850 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:21:37,176-Speed 9378.27 samples/sec Loss 19.1544 LearningRate 0.0006 Epoch: 2 Global Step: 3860 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:22:03,405-Speed 9370.34 samples/sec Loss 19.0620 LearningRate 0.0006 Epoch: 2 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 48 hours Training: 2022-03-04 23:22:29,603-Speed 9381.26 samples/sec Loss 19.0348 LearningRate 0.0006 Epoch: 2 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:22:55,890-Speed 9349.78 samples/sec Loss 18.8646 LearningRate 0.0006 Epoch: 2 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:23:22,053-Speed 9393.57 samples/sec Loss 18.8054 LearningRate 0.0006 Epoch: 2 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:23:48,219-Speed 9393.01 samples/sec Loss 18.6878 LearningRate 0.0006 Epoch: 2 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:24:14,392-Speed 9390.70 samples/sec Loss 18.5648 LearningRate 0.0006 Epoch: 2 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:24:40,571-Speed 9388.08 samples/sec Loss 18.4951 LearningRate 0.0006 Epoch: 2 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:25:06,809-Speed 9366.96 samples/sec Loss 18.3457 LearningRate 0.0006 Epoch: 2 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:25:33,031-Speed 9372.71 samples/sec Loss 18.3018 LearningRate 0.0006 Epoch: 2 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:25:59,182-Speed 9398.03 samples/sec Loss 18.2073 LearningRate 0.0006 Epoch: 2 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 48 hours Training: 2022-03-04 23:26:25,410-Speed 9370.57 samples/sec Loss 18.0502 LearningRate 0.0006 Epoch: 2 Global 
Step: 3970 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:26:51,607-Speed 9381.60 samples/sec Loss 18.0014 LearningRate 0.0006 Epoch: 2 Global Step: 3980 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:27:17,778-Speed 9390.74 samples/sec Loss 17.8644 LearningRate 0.0006 Epoch: 2 Global Step: 3990 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:27:43,990-Speed 9376.63 samples/sec Loss 17.7584 LearningRate 0.0006 Epoch: 2 Global Step: 4000 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:28:10,176-Speed 9385.65 samples/sec Loss 17.6530 LearningRate 0.0006 Epoch: 2 Global Step: 4010 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:28:36,453-Speed 9352.92 samples/sec Loss 17.5815 LearningRate 0.0006 Epoch: 2 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:29:02,752-Speed 9345.08 samples/sec Loss 17.4636 LearningRate 0.0006 Epoch: 2 Global Step: 4030 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:29:29,042-Speed 9348.51 samples/sec Loss 17.4181 LearningRate 0.0006 Epoch: 2 Global Step: 4040 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:29:55,335-Speed 9347.41 samples/sec Loss 17.3387 LearningRate 0.0006 Epoch: 2 Global Step: 4050 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:30:21,561-Speed 9370.97 samples/sec Loss 17.1701 LearningRate 0.0006 Epoch: 2 Global Step: 4060 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:30:47,770-Speed 9377.22 samples/sec Loss 17.1661 LearningRate 0.0006 Epoch: 2 Global Step: 4070 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:31:14,060-Speed 9348.65 samples/sec Loss 17.0247 LearningRate 0.0006 Epoch: 2 Global Step: 4080 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:31:40,254-Speed 9382.76 samples/sec Loss 16.8869 LearningRate 0.0006 Epoch: 2 Global Step: 4090 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:32:06,575-Speed 9337.39 samples/sec Loss 16.8229 LearningRate 0.0006 Epoch: 2 Global Step: 4100 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:32:32,807-Speed 9369.24 samples/sec Loss 16.7196 LearningRate 0.0006 Epoch: 2 Global Step: 4110 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:32:58,972-Speed 9393.48 samples/sec Loss 16.6642 LearningRate 0.0006 Epoch: 2 Global Step: 4120 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:33:25,205-Speed 9368.77 samples/sec Loss 16.5082 LearningRate 0.0006 Epoch: 2 Global Step: 4130 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:33:51,478-Speed 9354.51 samples/sec Loss 16.5088 LearningRate 0.0006 Epoch: 2 Global Step: 4140 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:34:17,707-Speed 9370.31 samples/sec Loss 16.4001 LearningRate 0.0006 Epoch: 2 Global Step: 4150 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:34:43,981-Speed 9354.26 samples/sec Loss 16.3546 LearningRate 0.0006 Epoch: 2 Global Step: 4160 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:35:10,191-Speed 9376.90 samples/sec Loss 16.3827 LearningRate 0.0006 Epoch: 2 Global Step: 4170 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:35:36,378-Speed 9384.82 samples/sec Loss 16.1647 LearningRate 0.0006 Epoch: 2 Global Step: 4180 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:36:02,494-Speed 9410.89 samples/sec Loss 16.0192 LearningRate 0.0006 Epoch: 2 Global Step: 4190 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:36:28,580-Speed 9421.42 samples/sec Loss 15.9341 LearningRate 0.0006 Epoch: 2 Global Step: 4200 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:36:54,793-Speed 9376.03 samples/sec Loss 15.8689 LearningRate 0.0006 Epoch: 2 Global Step: 4210 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:37:20,911-Speed 9410.04 samples/sec Loss 15.8010 LearningRate 0.0006 Epoch: 2 Global Step: 4220 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:37:47,078-Speed 9392.10 samples/sec Loss 15.7191 LearningRate 0.0006 Epoch: 2 Global Step: 4230 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:38:13,263-Speed 9386.25 samples/sec Loss 15.6540 LearningRate 0.0006 Epoch: 2 Global Step: 4240 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:38:39,433-Speed 9391.10 samples/sec Loss 15.5965 LearningRate 0.0006 Epoch: 2 Global Step: 4250 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:39:05,611-Speed 9388.41 samples/sec Loss 15.4284 LearningRate 0.0006 Epoch: 2 Global Step: 4260 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:39:31,774-Speed 9393.70 samples/sec Loss 15.3716 LearningRate 0.0006 Epoch: 2 Global Step: 4270 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:39:57,957-Speed 9386.85 samples/sec Loss 15.3776 LearningRate 0.0006 Epoch: 2 Global Step: 4280 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:40:24,174-Speed 9374.67 samples/sec Loss 15.2983 LearningRate 0.0006 Epoch: 2 Global Step: 4290 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:40:50,549-Speed 9318.25 samples/sec Loss 15.2279 LearningRate 0.0006 Epoch: 2 Global Step: 4300 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:41:16,827-Speed 9352.72 samples/sec Loss 15.0487 LearningRate 0.0006 Epoch: 2 Global Step: 4310 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:41:43,136-Speed 9341.68 samples/sec Loss 15.0302 LearningRate 0.0006 Epoch: 2 Global Step: 4320 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:42:09,333-Speed 9382.62 samples/sec Loss 14.9198 LearningRate 0.0006 Epoch: 2 Global Step: 4330 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:42:35,571-Speed 9366.95 samples/sec Loss 14.8411 LearningRate 0.0006 Epoch: 2 Global Step: 4340 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:43:01,792-Speed 9372.89 samples/sec Loss 14.7905 LearningRate 0.0006 Epoch: 2 Global Step: 4350 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:43:27,964-Speed 9390.41 samples/sec Loss 14.7908 LearningRate 0.0006 Epoch: 2 Global Step: 4360 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:43:54,193-Speed 9370.31 samples/sec Loss 14.6256 LearningRate 0.0006 Epoch: 2 Global Step: 4370 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:44:20,371-Speed 9388.79 samples/sec Loss 14.5524 LearningRate 0.0006 Epoch: 2 Global Step: 4380 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:44:46,543-Speed 9390.36 samples/sec Loss 14.4723 LearningRate 0.0006 Epoch: 2 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:45:12,699-Speed 9396.58 samples/sec Loss 14.4378 LearningRate 0.0006 Epoch: 2 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:45:38,896-Speed 9381.30 samples/sec Loss 14.3879 LearningRate 0.0006 Epoch: 2 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:46:05,061-Speed 9394.45 samples/sec Loss 14.3904 LearningRate 0.0006 Epoch: 2 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:46:31,282-Speed 9372.83 samples/sec Loss 14.2867 LearningRate 0.0006 Epoch: 2 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:46:57,449-Speed 9392.47 samples/sec Loss 14.1646 LearningRate 0.0006 Epoch: 2 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:47:23,780-Speed 9333.76 samples/sec Loss 14.0904 LearningRate 0.0006 Epoch: 2 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:47:50,115-Speed 9332.61 samples/sec Loss 14.0519 LearningRate 0.0006 Epoch: 2 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:48:16,389-Speed 9354.25 samples/sec Loss 13.9958 LearningRate 0.0006 Epoch: 2 Global Step: 4470 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:48:42,745-Speed 9325.36 samples/sec Loss 13.8737 LearningRate 0.0006 Epoch: 2 Global Step: 4480 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-03-04 23:49:08,956-Speed 9376.56 samples/sec Loss 13.8452 LearningRate 0.0006 Epoch: 2 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:49:35,221-Speed 9357.47 samples/sec Loss 13.7793 LearningRate 0.0007 Epoch: 2 Global Step: 4500 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-03-04 23:50:01,435-Speed 9375.75 samples/sec Loss 13.7203 LearningRate 0.0007 Epoch: 2 Global Step: 4510 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:50:27,570-Speed 9403.85 samples/sec Loss 13.6260 LearningRate 0.0007 Epoch: 2 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:50:53,710-Speed 9402.13 samples/sec Loss 13.5629 LearningRate 0.0007 Epoch: 2 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:51:19,926-Speed 9375.12 samples/sec Loss 13.5103 LearningRate 0.0007 Epoch: 2 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:51:46,109-Speed 9386.57 samples/sec Loss 13.5108 LearningRate 0.0007 Epoch: 2 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:52:12,233-Speed 9407.95 samples/sec Loss 13.4399 LearningRate 0.0007 Epoch: 2 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:52:38,476-Speed 9365.36 samples/sec Loss 13.3145 LearningRate 0.0007 Epoch: 2 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:53:04,691-Speed 9375.04 samples/sec Loss 13.2925 LearningRate 0.0007 Epoch: 2 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:53:30,884-Speed 9383.21 samples/sec Loss 13.1523 LearningRate 0.0007 Epoch: 2 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:53:57,063-Speed 9388.19 samples/sec Loss 13.1162 LearningRate 0.0007 Epoch: 2 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:54:23,228-Speed 9392.91 samples/sec Loss 13.1042 LearningRate 0.0007 Epoch: 2 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:54:49,529-Speed 9344.65 samples/sec Loss 13.0833 LearningRate 0.0007 Epoch: 2 Global Step: 4620 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:55:15,798-Speed 9355.93 samples/sec Loss 13.0501 LearningRate 0.0007 Epoch: 2 Global Step: 4630 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:55:42,032-Speed 9368.37 samples/sec Loss 12.9008 LearningRate 0.0007 Epoch: 2 Global Step: 4640 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:56:08,277-Speed 9364.65 samples/sec Loss 12.8563 LearningRate 0.0007 Epoch: 2 Global Step: 4650 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:56:34,472-Speed 9382.26 samples/sec Loss 12.8341 LearningRate 0.0007 Epoch: 2 Global Step: 4660 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:57:00,775-Speed 9343.95 samples/sec Loss 12.7723 LearningRate 0.0007 Epoch: 2 Global Step: 4670 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:57:27,054-Speed 9353.24 samples/sec Loss 12.7329 LearningRate 0.0007 Epoch: 2 Global Step: 4680 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:57:53,319-Speed 9357.83 samples/sec Loss 12.6433 LearningRate 0.0007 Epoch: 2 Global Step: 4690 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-04 23:58:19,457-Speed 9402.68 samples/sec Loss 12.6099 LearningRate 0.0007 Epoch: 2 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:58:45,719-Speed 9358.53 samples/sec Loss 12.5604 LearningRate 0.0007 Epoch: 2 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:59:11,939-Speed 9373.05 samples/sec Loss 12.5427 LearningRate 0.0007 Epoch: 2 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-04 23:59:38,141-Speed 9380.02 samples/sec Loss 12.4480 LearningRate 0.0007 Epoch: 2 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:00:04,294-Speed 9397.48 samples/sec Loss 12.4492 LearningRate 0.0007 Epoch: 2 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:00:30,467-Speed 9390.24 samples/sec Loss 12.3986 LearningRate 0.0007 Epoch: 2 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:00:56,683-Speed 9374.64 samples/sec Loss 12.2626 LearningRate 0.0007 Epoch: 2 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:01:22,898-Speed 9375.29 samples/sec Loss 12.2106 LearningRate 0.0007 Epoch: 2 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:01:49,112-Speed 9375.41 samples/sec Loss 12.2160 LearningRate 0.0007 Epoch: 2 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:02:15,337-Speed 9371.53 samples/sec Loss 12.1175 LearningRate 0.0007 Epoch: 2 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:02:41,640-Speed 9343.78 samples/sec Loss 12.0901 LearningRate 0.0007 Epoch: 2 Global Step: 4800 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-05 00:03:07,766-Speed 9407.24 samples/sec Loss 12.0638 LearningRate 0.0007 Epoch: 2 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:03:34,086-Speed 9337.84 samples/sec Loss 12.0581 LearningRate 0.0007 Epoch: 2 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:04:00,339-Speed 9361.57 samples/sec Loss 11.9482 LearningRate 0.0007 Epoch: 2 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:04:26,583-Speed 9364.53 samples/sec Loss 11.9483 LearningRate 0.0007 Epoch: 2 Global Step: 4840 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:04:52,746-Speed 9394.16 samples/sec Loss 11.8822 LearningRate 0.0007 Epoch: 2 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:05:18,855-Speed 9413.09 samples/sec Loss 11.8264 LearningRate 0.0007 Epoch: 2 Global Step: 4860 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:05:45,010-Speed 9397.00 samples/sec Loss 11.7611 LearningRate 0.0007 Epoch: 2 Global Step: 4870 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:06:11,238-Speed 9370.31 samples/sec Loss 11.7104 LearningRate 0.0007 Epoch: 2 Global Step: 4880 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:06:37,448-Speed 9376.79 samples/sec Loss 11.7299 LearningRate 0.0007 Epoch: 2 Global Step: 4890 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:07:03,609-Speed 9394.64 samples/sec Loss 11.6169 LearningRate 0.0007 Epoch: 2 Global Step: 4900 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:07:29,776-Speed 9392.93 samples/sec Loss 11.5412 LearningRate 0.0007 Epoch: 2 Global Step: 4910 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:07:56,043-Speed 9356.85 samples/sec Loss 11.5923 LearningRate 0.0007 Epoch: 2 Global Step: 4920 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:08:22,308-Speed 9357.39 samples/sec Loss 11.4901 LearningRate 0.0007 Epoch: 2 Global Step: 4930 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:08:48,609-Speed 9344.38 samples/sec Loss 11.4772 LearningRate 0.0007 Epoch: 2 Global Step: 4940 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:09:14,749-Speed 9402.17 samples/sec Loss 11.4281 LearningRate 0.0007 Epoch: 2 Global Step: 4950 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:09:40,872-Speed 9408.15 samples/sec Loss 11.3790 LearningRate 0.0007 Epoch: 2 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:10:07,070-Speed 9381.38 samples/sec Loss 11.3694 LearningRate 0.0007 Epoch: 2 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:10:33,180-Speed 9412.53 samples/sec Loss 11.2824 LearningRate 0.0007 Epoch: 2 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:10:59,481-Speed 9344.71 samples/sec Loss 11.2905 LearningRate 0.0007 Epoch: 2 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:11:25,589-Speed 9413.70 samples/sec Loss 11.2292 LearningRate 0.0007 Epoch: 2 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:11:51,851-Speed 9358.25 samples/sec Loss 11.1455 LearningRate 0.0007 Epoch: 2 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:12:18,060-Speed 9377.68 samples/sec Loss 11.1901 LearningRate 0.0007 Epoch: 2 Global Step: 5020 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:12:44,284-Speed 9371.91 samples/sec Loss 11.0826 LearningRate 0.0007 Epoch: 2 Global Step: 5030 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:13:10,482-Speed 9381.00 samples/sec Loss 11.0806 LearningRate 0.0007 Epoch: 2 Global Step: 5040 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:13:36,705-Speed 9372.41 samples/sec Loss 11.0331 LearningRate 0.0007 Epoch: 2 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:14:02,905-Speed 9380.39 samples/sec Loss 10.9913 LearningRate 0.0007 Epoch: 2 Global Step: 5060 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-05 00:14:29,133-Speed 9370.65 samples/sec Loss 10.9791 LearningRate 0.0007 Epoch: 2 Global Step: 5070 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-05 00:14:55,308-Speed 9389.33 samples/sec Loss 10.9060 LearningRate 0.0007 Epoch: 2 Global Step: 5080 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:15:21,515-Speed 9378.23 samples/sec Loss 10.8355 LearningRate 0.0007 Epoch: 2 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:15:47,597-Speed 9423.11 samples/sec Loss 10.7892 LearningRate 0.0007 Epoch: 2 Global Step: 5100 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:16:13,691-Speed 9418.67 samples/sec Loss 10.8642 LearningRate 0.0007 Epoch: 2 Global Step: 5110 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:16:39,970-Speed 9352.02 samples/sec Loss 10.7822 LearningRate 0.0007 Epoch: 2 Global Step: 5120 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:17:06,175-Speed 9378.86 samples/sec Loss 10.7967 LearningRate 0.0007 Epoch: 2 Global Step: 5130 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:17:32,404-Speed 9370.29 samples/sec Loss 10.7192 LearningRate 0.0007 Epoch: 2 Global Step: 5140 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:17:58,493-Speed 9420.55 samples/sec Loss 10.7132 LearningRate 0.0007 Epoch: 2 Global Step: 5150 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:18:24,598-Speed 9414.85 samples/sec Loss 10.6081 LearningRate 0.0007 Epoch: 2 Global Step: 5160 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:18:50,737-Speed 9402.33 samples/sec Loss 10.6645 LearningRate 0.0007 Epoch: 2 Global Step: 5170 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:19:16,922-Speed 9386.25 samples/sec Loss 10.6182 LearningRate 0.0007 Epoch: 2 Global Step: 5180 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:20:36,218-Speed 3099.32 samples/sec Loss 10.5035 LearningRate 0.0008 Epoch: 3 Global Step: 5190 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:21:02,213-Speed 9454.55 samples/sec Loss 10.4244 LearningRate 0.0008 Epoch: 3 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:21:28,318-Speed 9414.96 samples/sec Loss 10.3780 LearningRate 0.0008 Epoch: 3 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:21:54,574-Speed 9360.56 samples/sec Loss 10.3785 LearningRate 0.0008 Epoch: 3 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:22:20,726-Speed 9397.75 samples/sec Loss 10.2695 LearningRate 0.0008 Epoch: 3 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:22:46,847-Speed 9408.94 samples/sec Loss 10.3271 LearningRate 0.0008 Epoch: 3 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:23:13,026-Speed 9387.97 samples/sec Loss 10.4045 LearningRate 0.0008 Epoch: 3 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:23:39,265-Speed 9366.34 samples/sec Loss 10.2201 LearningRate 0.0008 Epoch: 3 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:24:05,430-Speed 9393.42 samples/sec Loss 10.1490 LearningRate 0.0008 Epoch: 3 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:24:31,626-Speed 9382.06 samples/sec Loss 10.1389 LearningRate 0.0008 Epoch: 3 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:24:57,818-Speed 9383.41 samples/sec Loss 10.2095 LearningRate 0.0008 Epoch: 3 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:25:24,090-Speed 9354.95 samples/sec Loss 10.2266 LearningRate 0.0008 Epoch: 3 Global Step: 5300 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-05 00:25:50,293-Speed 9379.24 samples/sec Loss 10.0993 LearningRate 0.0008 Epoch: 3 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:26:16,506-Speed 9376.21 samples/sec Loss 10.0482 LearningRate 0.0008 Epoch: 3 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:26:42,712-Speed 9378.46 samples/sec Loss 10.0422 LearningRate 0.0008 Epoch: 3 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:27:08,930-Speed 9374.07 samples/sec Loss 9.9599 LearningRate 0.0008 Epoch: 3 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:27:35,147-Speed 9374.68 samples/sec Loss 9.9745 LearningRate 0.0008 Epoch: 3 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:28:01,373-Speed 9371.33 samples/sec Loss 9.9956 LearningRate 0.0008 Epoch: 3 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:28:27,529-Speed 9396.32 samples/sec Loss 9.9758 LearningRate 0.0008 Epoch: 3 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:28:53,631-Speed 9415.78 samples/sec Loss 9.8970 LearningRate 0.0008 Epoch: 3 Global Step: 5380 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:29:19,797-Speed 9392.94 samples/sec Loss 9.8948 LearningRate 0.0008 Epoch: 3 Global Step: 5390 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:29:45,968-Speed 9391.03 samples/sec Loss 9.8361 LearningRate 0.0008 Epoch: 3 Global Step: 5400 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:30:12,082-Speed 9411.24 samples/sec Loss 9.7722 LearningRate 0.0008 Epoch: 3 Global Step: 5410 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:30:38,165-Speed 9422.45 samples/sec Loss 9.7329 LearningRate 0.0008 Epoch: 3 Global Step: 5420 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:31:04,287-Speed 9408.78 samples/sec Loss 9.8135 LearningRate 0.0008 Epoch: 3 Global Step: 5430 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:31:30,498-Speed 9376.64 samples/sec Loss 9.7378 LearningRate 0.0008 Epoch: 3 Global Step: 5440 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:31:56,701-Speed 9379.31 samples/sec Loss 9.6962 LearningRate 0.0008 Epoch: 3 Global Step: 5450 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:32:22,888-Speed 9385.16 samples/sec Loss 9.6719 LearningRate 0.0008 Epoch: 3 Global Step: 5460 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:32:49,062-Speed 9389.88 samples/sec Loss 9.7046 LearningRate 0.0008 Epoch: 3 Global Step: 5470 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:33:15,239-Speed 9388.63 samples/sec Loss 9.6187 LearningRate 0.0008 Epoch: 3 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:33:41,421-Speed 9386.87 samples/sec Loss 9.5813 LearningRate 0.0008 Epoch: 3 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:34:07,647-Speed 9371.52 samples/sec Loss 9.5887 LearningRate 0.0008 Epoch: 3 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:34:33,838-Speed 9383.69 samples/sec Loss 9.6137 LearningRate 0.0008 Epoch: 3 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:34:59,948-Speed 9413.02 samples/sec Loss 9.5180 LearningRate 0.0008 Epoch: 3 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:35:26,154-Speed 9378.47 samples/sec Loss 9.5224 LearningRate 0.0008 Epoch: 3 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:35:52,304-Speed 9398.19 samples/sec Loss 9.4515 LearningRate 0.0008 Epoch: 3 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:36:18,526-Speed 9372.75 samples/sec Loss 9.3836 LearningRate 0.0008 Epoch: 3 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:36:44,665-Speed 9402.35 samples/sec Loss 9.3551 LearningRate 0.0008 Epoch: 3 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:37:10,747-Speed 9423.07 samples/sec Loss 9.3328 LearningRate 0.0008 Epoch: 3 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:37:36,945-Speed 9381.05 samples/sec Loss 9.4622 LearningRate 0.0008 Epoch: 3 Global Step: 5580 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-05 00:38:03,070-Speed 9407.44 samples/sec Loss 9.3932 LearningRate 0.0008 Epoch: 3 Global Step: 5590 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-03-05 00:38:29,269-Speed 9381.12 samples/sec Loss 9.3170 LearningRate 0.0008 Epoch: 3 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:38:55,481-Speed 9376.18 samples/sec Loss 9.3042 LearningRate 0.0008 Epoch: 3 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:39:21,633-Speed 9397.75 samples/sec Loss 9.2678 LearningRate 0.0008 Epoch: 3 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:39:47,862-Speed 9370.19 samples/sec Loss 9.1868 LearningRate 0.0008 Epoch: 3 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:40:13,941-Speed 9423.96 samples/sec Loss 9.1914 LearningRate 0.0008 Epoch: 3 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:40:40,022-Speed 9423.20 samples/sec Loss 9.1134 LearningRate 0.0008 Epoch: 3 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:41:06,069-Speed 9435.39 samples/sec Loss 9.1663 LearningRate 0.0008 Epoch: 3 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:41:32,189-Speed 9409.50 samples/sec Loss 9.1508 LearningRate 0.0008 Epoch: 3 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:41:58,258-Speed 9427.51 samples/sec Loss 9.0630 LearningRate 0.0008 Epoch: 3 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:42:24,350-Speed 9419.49 samples/sec Loss 9.0734 LearningRate 0.0008 Epoch: 3 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:42:50,463-Speed 9411.49 samples/sec Loss 9.0755 LearningRate 0.0008 Epoch: 3 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:43:16,589-Speed 9407.26 samples/sec Loss 9.0723 LearningRate 0.0008 Epoch: 3 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:43:42,811-Speed 9372.75 samples/sec Loss 9.0221 LearningRate 0.0008 Epoch: 3 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:44:08,958-Speed 9399.28 samples/sec Loss 9.0340 LearningRate 0.0008 Epoch: 3 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:44:35,064-Speed 9414.27 samples/sec Loss 8.9136 LearningRate 0.0008 Epoch: 3 Global Step: 5740 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:45:01,203-Speed 9402.39 samples/sec Loss 8.9262 LearningRate 0.0008 Epoch: 3 Global Step: 5750 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:45:27,368-Speed 9393.41 samples/sec Loss 8.8822 LearningRate 0.0008 Epoch: 3 Global Step: 5760 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:45:53,503-Speed 9403.62 samples/sec Loss 8.8773 LearningRate 0.0008 Epoch: 3 Global Step: 5770 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:46:19,629-Speed 9407.16 samples/sec Loss 8.8142 LearningRate 0.0008 Epoch: 3 Global Step: 5780 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:46:45,812-Speed 9386.53 samples/sec Loss 8.9814 LearningRate 0.0008 Epoch: 3 Global Step: 5790 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:47:11,861-Speed 9435.06 samples/sec Loss 8.8660 LearningRate 0.0008 Epoch: 3 Global Step: 5800 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:47:37,895-Speed 9440.77 samples/sec Loss 8.8238 LearningRate 0.0008 Epoch: 3 Global Step: 5810 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:48:03,987-Speed 9419.25 samples/sec Loss 8.7663 LearningRate 0.0008 Epoch: 3 Global Step: 5820 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:48:30,173-Speed 9385.68 samples/sec Loss 8.7663 LearningRate 0.0008 Epoch: 3 Global Step: 5830 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-03-05 00:48:56,290-Speed 9410.32 samples/sec Loss 8.7775 LearningRate 0.0008 Epoch: 3 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:49:22,475-Speed 9386.24 samples/sec Loss 8.7606 LearningRate 0.0008 Epoch: 3 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:49:48,592-Speed 9410.52 samples/sec Loss 8.7120 LearningRate 0.0008 Epoch: 3 Global Step: 5860 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:50:14,749-Speed 9395.97 samples/sec Loss 8.6315 LearningRate 0.0008 Epoch: 3 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:50:40,883-Speed 9404.13 samples/sec Loss 8.6725 LearningRate 0.0009 Epoch: 3 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:51:07,099-Speed 9374.75 samples/sec Loss 8.6652 LearningRate 0.0009 Epoch: 3 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:51:33,304-Speed 9378.84 samples/sec Loss 8.7229 LearningRate 0.0009 Epoch: 3 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:51:59,466-Speed 9394.23 samples/sec Loss 8.6382 LearningRate 0.0009 Epoch: 3 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-03-05 00:52:25,645-Speed 9388.29 samples/sec Loss 8.5562 LearningRate 0.0009 Epoch: 3 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 00:52:51,836-Speed 9383.71 samples/sec Loss 8.5631 LearningRate 0.0009 Epoch: 3 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 00:53:17,972-Speed 9403.32 samples/sec Loss 8.5657 LearningRate 0.0009 Epoch: 3 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:53:44,163-Speed 9383.96 samples/sec Loss 8.5307 LearningRate 0.0009 Epoch: 3 Global Step: 5950 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:54:10,401-Speed 9367.25 samples/sec Loss 8.4970 LearningRate 0.0009 Epoch: 3 Global Step: 5960 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:54:36,558-Speed 9395.74 samples/sec Loss 8.4642 LearningRate 0.0009 Epoch: 3 Global Step: 5970 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:55:02,798-Speed 9366.40 samples/sec Loss 8.4620 LearningRate 0.0009 Epoch: 3 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:55:28,916-Speed 9409.79 samples/sec Loss 8.4322 LearningRate 0.0009 Epoch: 3 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:55:55,052-Speed 9403.94 samples/sec Loss 8.3797 LearningRate 0.0009 Epoch: 3 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:56:21,315-Speed 9357.93 samples/sec Loss 8.4301 LearningRate 0.0009 Epoch: 3 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:56:47,428-Speed 9412.34 samples/sec Loss 8.3944 LearningRate 0.0009 Epoch: 3 Global Step: 6020 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:57:13,539-Speed 9412.48 samples/sec Loss 8.3799 LearningRate 0.0009 Epoch: 3 Global Step: 6030 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:57:39,722-Speed 9386.53 samples/sec Loss 8.3421 LearningRate 0.0009 Epoch: 3 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 00:58:05,866-Speed 9400.88 samples/sec Loss 8.3068 LearningRate 0.0009 Epoch: 3 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 00:58:31,928-Speed 9430.07 samples/sec Loss 8.2714 LearningRate 0.0009 Epoch: 3 Global Step: 6060 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:58:58,065-Speed 9403.33 samples/sec Loss 8.3461 LearningRate 0.0009 Epoch: 3 Global Step: 6070 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:59:24,218-Speed 9397.26 samples/sec Loss 8.2961 LearningRate 0.0009 Epoch: 3 Global Step: 6080 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 00:59:50,315-Speed 9417.61 samples/sec Loss 8.3285 LearningRate 0.0009 Epoch: 3 Global Step: 6090 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:00:16,555-Speed 9367.14 samples/sec Loss 8.2633 LearningRate 0.0009 Epoch: 3 Global Step: 6100 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:00:42,662-Speed 9413.85 samples/sec Loss 8.2278 LearningRate 0.0009 Epoch: 3 Global Step: 6110 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:01:08,775-Speed 9412.01 samples/sec Loss 8.1667 LearningRate 0.0009 Epoch: 3 Global Step: 6120 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:01:34,825-Speed 9434.34 samples/sec Loss 8.1817 LearningRate 0.0009 Epoch: 3 Global Step: 6130 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:02:00,882-Speed 9431.82 samples/sec Loss 8.1854 LearningRate 0.0009 Epoch: 3 Global Step: 6140 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:02:26,991-Speed 9413.51 samples/sec Loss 8.1729 LearningRate 0.0009 Epoch: 3 Global Step: 6150 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:02:53,074-Speed 9422.38 samples/sec Loss 8.1389 LearningRate 0.0009 Epoch: 3 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:03:19,249-Speed 9389.76 samples/sec Loss 8.0828 LearningRate 0.0009 Epoch: 3 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:03:45,375-Speed 9406.80 samples/sec Loss 8.0918 LearningRate 0.0009 Epoch: 3 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:04:11,496-Speed 9408.81 samples/sec Loss 8.1057 LearningRate 0.0009 Epoch: 3 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:04:37,553-Speed 9432.33 samples/sec Loss 8.0808 LearningRate 0.0009 Epoch: 3 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:05:03,685-Speed 9404.93 samples/sec Loss 8.0145 LearningRate 0.0009 Epoch: 3 Global Step: 6210 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:05:29,889-Speed 9379.47 samples/sec Loss 8.0368 LearningRate 0.0009 Epoch: 3 Global Step: 6220 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:05:56,047-Speed 9395.54 samples/sec Loss 7.9887 LearningRate 0.0009 Epoch: 3 Global Step: 6230 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:06:22,176-Speed 9405.98 samples/sec Loss 8.1119 LearningRate 0.0009 Epoch: 3 Global Step: 6240 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:06:48,295-Speed 9409.58 samples/sec Loss 8.0351 LearningRate 0.0009 Epoch: 3 Global Step: 6250 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:07:14,515-Speed 9373.58 samples/sec Loss 7.9773 LearningRate 0.0009 Epoch: 3 Global Step: 6260 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:07:40,727-Speed 9376.07 samples/sec Loss 7.9356 LearningRate 0.0009 Epoch: 3 Global Step: 6270 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:08:06,819-Speed 9419.43 samples/sec Loss 7.9890 LearningRate 0.0009 Epoch: 3 Global Step: 6280 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:08:32,839-Speed 9445.56 samples/sec Loss 7.9465 LearningRate 0.0009 Epoch: 3 Global Step: 6290 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:08:58,958-Speed 9409.56 samples/sec Loss 7.9356 LearningRate 0.0009 Epoch: 3 Global Step: 6300 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-03-05 01:09:25,143-Speed 9385.86 samples/sec Loss 7.8832 LearningRate 0.0009 Epoch: 3 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:09:51,211-Speed 9428.21 samples/sec Loss 7.8497 LearningRate 0.0009 Epoch: 3 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:10:17,382-Speed 9390.77 samples/sec Loss 7.7864 LearningRate 0.0009 Epoch: 3 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:10:43,474-Speed 9419.77 samples/sec Loss 7.8885 LearningRate 0.0009 Epoch: 3 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:11:09,702-Speed 9370.58 samples/sec Loss 7.8661 LearningRate 0.0009 Epoch: 3 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:11:35,948-Speed 9363.88 samples/sec Loss 7.7723 LearningRate 0.0009 Epoch: 3 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:12:02,025-Speed 9424.88 samples/sec Loss 7.8317 LearningRate 0.0009 Epoch: 3 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:12:28,266-Speed 9365.80 samples/sec Loss 7.7841 LearningRate 0.0009 Epoch: 3 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:12:54,374-Speed 9413.58 samples/sec Loss 7.7514 LearningRate 0.0009 Epoch: 3 Global Step: 6390 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:13:20,440-Speed 9428.94 samples/sec Loss 7.8425 LearningRate 0.0009 Epoch: 3 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:13:46,544-Speed 9414.95 samples/sec Loss 7.8011 LearningRate 0.0009 Epoch: 3 Global Step: 6410 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-03-05 01:14:12,565-Speed 9444.90 samples/sec Loss 7.7263 LearningRate 0.0009 Epoch: 3 Global Step: 6420 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-03-05 01:14:38,673-Speed 9413.65 samples/sec Loss 7.6786 LearningRate 0.0009 Epoch: 3 Global Step: 6430 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-03-05 01:15:04,790-Speed 9410.44 samples/sec Loss 7.6738 LearningRate 0.0009 Epoch: 3 Global Step: 6440 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-03-05 01:15:30,899-Speed 9413.32 samples/sec Loss 7.6419 LearningRate 0.0009 Epoch: 3 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:15:57,020-Speed 9409.15 samples/sec Loss 7.6810 LearningRate 0.0009 Epoch: 3 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:16:23,123-Speed 9415.39 samples/sec Loss 7.6609 LearningRate 0.0009 Epoch: 3 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:16:49,324-Speed 9380.17 samples/sec Loss 7.6310 LearningRate 0.0009 Epoch: 3 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:17:15,495-Speed 9390.73 samples/sec Loss 7.6295 LearningRate 0.0009 Epoch: 3 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:17:41,805-Speed 9341.71 samples/sec Loss 7.5982 LearningRate 0.0009 Epoch: 3 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:18:07,944-Speed 9402.14 samples/sec Loss 7.5651 LearningRate 0.0009 Epoch: 3 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:18:34,103-Speed 9395.14 samples/sec Loss 7.5808 LearningRate 0.0009 Epoch: 3 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:19:00,230-Speed 9406.91 samples/sec Loss 7.5706 LearningRate 0.0009 Epoch: 3 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:19:26,379-Speed 9399.03 samples/sec Loss 7.5436 LearningRate 0.0009 Epoch: 3 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-03-05 01:19:52,516-Speed 9403.04 samples/sec Loss 7.5398 LearningRate 0.0009 Epoch: 3 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 46 hours
Training:
2022-03-05 01:20:18,723-Speed 9377.78 samples/sec Loss 7.5365 LearningRate 0.0009 Epoch: 3 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:20:44,894-Speed 9391.02 samples/sec Loss 7.5070 LearningRate 0.0010 Epoch: 3 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:21:11,005-Speed 9413.09 samples/sec Loss 7.5000 LearningRate 0.0010 Epoch: 3 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:21:37,167-Speed 9394.31 samples/sec Loss 7.4404 LearningRate 0.0010 Epoch: 3 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:22:03,290-Speed 9408.24 samples/sec Loss 7.4878 LearningRate 0.0010 Epoch: 3 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:22:29,429-Speed 9402.51 samples/sec Loss 7.5212 LearningRate 0.0010 Epoch: 3 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:22:55,620-Speed 9383.52 samples/sec Loss 7.4579 LearningRate 0.0010 Epoch: 3 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:23:21,723-Speed 9415.47 samples/sec Loss 7.4489 LearningRate 0.0010 Epoch: 3 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:23:47,770-Speed 9435.78 samples/sec Loss 7.4256 LearningRate 0.0010 Epoch: 3 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:24:13,955-Speed 9385.98 samples/sec Loss 7.4070 LearningRate 0.0010 Epoch: 3 Global Step: 6650 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-03-05 01:24:40,198-Speed 9365.05 samples/sec Loss 7.4203 LearningRate 0.0010 Epoch: 3 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:25:06,359-Speed 9394.58 samples/sec Loss 7.3474 LearningRate 0.0010 Epoch: 3 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:25:32,556-Speed 9381.73 samples/sec Loss 
7.3903 LearningRate 0.0010 Epoch: 3 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:25:58,681-Speed 9407.60 samples/sec Loss 7.3195 LearningRate 0.0010 Epoch: 3 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:26:24,798-Speed 9410.15 samples/sec Loss 7.3466 LearningRate 0.0010 Epoch: 3 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:26:50,907-Speed 9413.21 samples/sec Loss 7.3632 LearningRate 0.0010 Epoch: 3 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:27:17,033-Speed 9407.08 samples/sec Loss 7.3247 LearningRate 0.0010 Epoch: 3 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:27:43,136-Speed 9415.67 samples/sec Loss 7.3299 LearningRate 0.0010 Epoch: 3 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:28:09,264-Speed 9406.22 samples/sec Loss 7.3136 LearningRate 0.0010 Epoch: 3 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:28:35,516-Speed 9362.13 samples/sec Loss 7.2815 LearningRate 0.0010 Epoch: 3 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:29:01,716-Speed 9380.29 samples/sec Loss 7.2564 LearningRate 0.0010 Epoch: 3 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:29:27,893-Speed 9388.96 samples/sec Loss 7.2968 LearningRate 0.0010 Epoch: 3 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:29:54,053-Speed 9394.73 samples/sec Loss 7.2566 LearningRate 0.0010 Epoch: 3 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:30:20,170-Speed 9410.48 samples/sec Loss 7.2770 LearningRate 0.0010 Epoch: 3 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:30:46,262-Speed 9419.46 samples/sec Loss 7.2731 LearningRate 0.0010 Epoch: 3 Global Step: 6800 
Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:31:12,363-Speed 9415.96 samples/sec Loss 7.2177 LearningRate 0.0010 Epoch: 3 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:31:38,485-Speed 9408.61 samples/sec Loss 7.2937 LearningRate 0.0010 Epoch: 3 Global Step: 6820 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:32:04,641-Speed 9396.47 samples/sec Loss 7.2050 LearningRate 0.0010 Epoch: 3 Global Step: 6830 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:32:30,824-Speed 9386.48 samples/sec Loss 7.2383 LearningRate 0.0010 Epoch: 3 Global Step: 6840 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:32:56,925-Speed 9416.15 samples/sec Loss 7.2708 LearningRate 0.0010 Epoch: 3 Global Step: 6850 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:33:23,068-Speed 9400.76 samples/sec Loss 7.2365 LearningRate 0.0010 Epoch: 3 Global Step: 6860 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:33:49,227-Speed 9395.56 samples/sec Loss 7.2244 LearningRate 0.0010 Epoch: 3 Global Step: 6870 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:34:15,409-Speed 9387.13 samples/sec Loss 7.1772 LearningRate 0.0010 Epoch: 3 Global Step: 6880 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:34:41,675-Speed 9356.59 samples/sec Loss 7.1295 LearningRate 0.0010 Epoch: 3 Global Step: 6890 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:35:07,766-Speed 9419.85 samples/sec Loss 7.1183 LearningRate 0.0010 Epoch: 3 Global Step: 6900 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:35:33,866-Speed 9416.53 samples/sec Loss 7.1383 LearningRate 0.0010 Epoch: 3 Global Step: 6910 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:36:52,848-Speed 3111.63 samples/sec Loss 7.0619 LearningRate 0.0010 Epoch: 4 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 
01:37:18,741-Speed 9492.07 samples/sec Loss 7.0603 LearningRate 0.0010 Epoch: 4 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:37:44,895-Speed 9397.09 samples/sec Loss 7.0336 LearningRate 0.0010 Epoch: 4 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:38:10,994-Speed 9416.90 samples/sec Loss 6.9824 LearningRate 0.0010 Epoch: 4 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:38:37,156-Speed 9394.06 samples/sec Loss 7.0056 LearningRate 0.0010 Epoch: 4 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:39:03,238-Speed 9422.94 samples/sec Loss 7.0229 LearningRate 0.0010 Epoch: 4 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:39:29,343-Speed 9414.57 samples/sec Loss 6.9540 LearningRate 0.0010 Epoch: 4 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:39:55,449-Speed 9414.21 samples/sec Loss 6.9536 LearningRate 0.0010 Epoch: 4 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:40:21,587-Speed 9402.84 samples/sec Loss 6.9495 LearningRate 0.0010 Epoch: 4 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:40:47,690-Speed 9415.58 samples/sec Loss 6.9251 LearningRate 0.0010 Epoch: 4 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:41:13,803-Speed 9411.67 samples/sec Loss 6.9118 LearningRate 0.0010 Epoch: 4 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:41:39,835-Speed 9441.38 samples/sec Loss 6.9003 LearningRate 0.0010 Epoch: 4 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:42:05,951-Speed 9410.58 samples/sec Loss 6.9278 LearningRate 0.0010 Epoch: 4 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:42:32,085-Speed 9404.15 samples/sec Loss 6.9165 
LearningRate 0.0010 Epoch: 4 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:42:58,209-Speed 9408.01 samples/sec Loss 6.8950 LearningRate 0.0010 Epoch: 4 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:43:24,389-Speed 9387.85 samples/sec Loss 6.8704 LearningRate 0.0010 Epoch: 4 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:43:50,509-Speed 9409.41 samples/sec Loss 6.9546 LearningRate 0.0010 Epoch: 4 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:44:16,703-Speed 9382.50 samples/sec Loss 6.8495 LearningRate 0.0010 Epoch: 4 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:44:42,769-Speed 9428.79 samples/sec Loss 6.9050 LearningRate 0.0010 Epoch: 4 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:45:08,829-Speed 9431.00 samples/sec Loss 6.8117 LearningRate 0.0010 Epoch: 4 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:45:34,919-Speed 9420.31 samples/sec Loss 6.8458 LearningRate 0.0010 Epoch: 4 Global Step: 7120 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:46:01,039-Speed 9409.77 samples/sec Loss 6.8099 LearningRate 0.0010 Epoch: 4 Global Step: 7130 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:46:27,107-Speed 9428.12 samples/sec Loss 6.7774 LearningRate 0.0010 Epoch: 4 Global Step: 7140 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:46:53,268-Speed 9394.65 samples/sec Loss 6.8436 LearningRate 0.0010 Epoch: 4 Global Step: 7150 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:47:19,355-Speed 9421.13 samples/sec Loss 6.7902 LearningRate 0.0010 Epoch: 4 Global Step: 7160 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:47:45,531-Speed 9389.28 samples/sec Loss 6.7476 LearningRate 0.0010 Epoch: 4 Global Step: 7170 Fp16 Grad 
Scale: 65536 Required: 46 hours Training: 2022-03-05 01:48:11,610-Speed 9424.25 samples/sec Loss 6.7390 LearningRate 0.0010 Epoch: 4 Global Step: 7180 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:48:37,790-Speed 9387.54 samples/sec Loss 6.7377 LearningRate 0.0010 Epoch: 4 Global Step: 7190 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:49:03,905-Speed 9410.99 samples/sec Loss 6.7601 LearningRate 0.0010 Epoch: 4 Global Step: 7200 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:49:30,108-Speed 9379.65 samples/sec Loss 6.6831 LearningRate 0.0010 Epoch: 4 Global Step: 7210 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:49:56,305-Speed 9381.43 samples/sec Loss 6.6802 LearningRate 0.0010 Epoch: 4 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:50:22,432-Speed 9406.98 samples/sec Loss 6.7104 LearningRate 0.0010 Epoch: 4 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:50:48,636-Speed 9379.20 samples/sec Loss 6.6893 LearningRate 0.0010 Epoch: 4 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-03-05 01:51:14,743-Speed 9413.98 samples/sec Loss 6.6512 LearningRate 0.0010 Epoch: 4 Global Step: 7250 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:51:41,019-Speed 9353.30 samples/sec Loss 6.6788 LearningRate 0.0010 Epoch: 4 Global Step: 7260 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:52:07,204-Speed 9386.12 samples/sec Loss 6.6842 LearningRate 0.0010 Epoch: 4 Global Step: 7270 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:52:33,407-Speed 9379.27 samples/sec Loss 6.6308 LearningRate 0.0010 Epoch: 4 Global Step: 7280 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-03-05 01:52:59,555-Speed 9399.59 samples/sec Loss 6.5917 LearningRate 0.0010 Epoch: 4 Global Step: 7290 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 
01:53:25,684-Speed 9405.98 samples/sec Loss 6.6053 LearningRate 0.0010 Epoch: 4 Global Step: 7300 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 01:53:51,885-Speed 9380.13 samples/sec Loss 6.6214 LearningRate 0.0010 Epoch: 4 Global Step: 7310 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 01:54:17,989-Speed 9415.44 samples/sec Loss 6.6041 LearningRate 0.0010 Epoch: 4 Global Step: 7320 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 01:54:44,245-Speed 9360.50 samples/sec Loss 6.5264 LearningRate 0.0010 Epoch: 4 Global Step: 7330 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 01:55:10,383-Speed 9402.90 samples/sec Loss 6.5980 LearningRate 0.0010 Epoch: 4 Global Step: 7340 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 01:55:36,569-Speed 9385.66 samples/sec Loss 6.5166 LearningRate 0.0010 Epoch: 4 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:56:02,792-Speed 9372.31 samples/sec Loss 6.5335 LearningRate 0.0010 Epoch: 4 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:56:28,859-Speed 9428.48 samples/sec Loss 6.6218 LearningRate 0.0010 Epoch: 4 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:56:55,015-Speed 9396.36 samples/sec Loss 6.5264 LearningRate 0.0010 Epoch: 4 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:57:21,185-Speed 9391.45 samples/sec Loss 6.5322 LearningRate 0.0010 Epoch: 4 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:57:47,330-Speed 9400.44 samples/sec Loss 6.5185 LearningRate 0.0010 Epoch: 4 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:58:13,530-Speed 9380.74 samples/sec Loss 6.4990 LearningRate 0.0010 Epoch: 4 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:58:39,639-Speed 9413.41 samples/sec Loss 6.4377 
LearningRate 0.0010 Epoch: 4 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:59:05,813-Speed 9390.04 samples/sec Loss 6.4374 LearningRate 0.0010 Epoch: 4 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:59:31,932-Speed 9409.41 samples/sec Loss 6.4792 LearningRate 0.0010 Epoch: 4 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 01:59:58,000-Speed 9428.23 samples/sec Loss 6.4280 LearningRate 0.0010 Epoch: 4 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:00:24,195-Speed 9382.48 samples/sec Loss 6.4579 LearningRate 0.0010 Epoch: 4 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:00:50,341-Speed 9399.96 samples/sec Loss 6.4510 LearningRate 0.0010 Epoch: 4 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:01:16,469-Speed 9406.33 samples/sec Loss 6.4101 LearningRate 0.0010 Epoch: 4 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:01:42,657-Speed 9384.80 samples/sec Loss 6.3892 LearningRate 0.0010 Epoch: 4 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:02:08,840-Speed 9386.51 samples/sec Loss 6.4289 LearningRate 0.0010 Epoch: 4 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:02:35,029-Speed 9384.77 samples/sec Loss 6.4055 LearningRate 0.0010 Epoch: 4 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:03:01,196-Speed 9392.59 samples/sec Loss 6.3372 LearningRate 0.0010 Epoch: 4 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:03:27,409-Speed 9375.79 samples/sec Loss 6.3962 LearningRate 0.0010 Epoch: 4 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:03:53,688-Speed 9352.18 samples/sec Loss 6.2861 LearningRate 0.0010 Epoch: 4 Global Step: 7540 Fp16 
Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:04:19,944-Speed 9360.59 samples/sec Loss 6.3183 LearningRate 0.0010 Epoch: 4 Global Step: 7550 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-03-05 02:04:46,096-Speed 9398.42 samples/sec Loss 6.3467 LearningRate 0.0010 Epoch: 4 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:05:12,326-Speed 9369.80 samples/sec Loss 6.2746 LearningRate 0.0010 Epoch: 4 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:05:38,491-Speed 9393.13 samples/sec Loss 6.2870 LearningRate 0.0010 Epoch: 4 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:06:04,677-Speed 9385.44 samples/sec Loss 6.2968 LearningRate 0.0010 Epoch: 4 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:06:30,794-Speed 9411.28 samples/sec Loss 6.2425 LearningRate 0.0010 Epoch: 4 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:06:56,894-Speed 9416.40 samples/sec Loss 6.2853 LearningRate 0.0010 Epoch: 4 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:07:23,059-Speed 9393.15 samples/sec Loss 6.2480 LearningRate 0.0010 Epoch: 4 Global Step: 7620 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:07:49,218-Speed 9395.17 samples/sec Loss 6.2440 LearningRate 0.0010 Epoch: 4 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:08:15,444-Speed 9370.93 samples/sec Loss 6.2055 LearningRate 0.0010 Epoch: 4 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:08:41,602-Speed 9395.98 samples/sec Loss 6.2159 LearningRate 0.0010 Epoch: 4 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:09:07,764-Speed 9394.22 samples/sec Loss 6.2180 LearningRate 0.0010 Epoch: 4 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 45 hours Training: 
2022-03-05 02:09:33,907-Speed 9400.91 samples/sec Loss 6.2008 LearningRate 0.0010 Epoch: 4 Global Step: 7670 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:10:00,060-Speed 9397.16 samples/sec Loss 6.2154 LearningRate 0.0010 Epoch: 4 Global Step: 7680 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:10:26,334-Speed 9354.29 samples/sec Loss 6.1497 LearningRate 0.0010 Epoch: 4 Global Step: 7690 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:10:52,582-Speed 9363.35 samples/sec Loss 6.1427 LearningRate 0.0010 Epoch: 4 Global Step: 7700 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:11:18,725-Speed 9400.73 samples/sec Loss 6.1468 LearningRate 0.0010 Epoch: 4 Global Step: 7710 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:11:44,955-Speed 9370.05 samples/sec Loss 6.1768 LearningRate 0.0010 Epoch: 4 Global Step: 7720 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:12:11,248-Speed 9347.21 samples/sec Loss 6.1342 LearningRate 0.0010 Epoch: 4 Global Step: 7730 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:12:37,457-Speed 9377.39 samples/sec Loss 6.1000 LearningRate 0.0010 Epoch: 4 Global Step: 7740 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:13:03,662-Speed 9378.98 samples/sec Loss 6.1616 LearningRate 0.0010 Epoch: 4 Global Step: 7750 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:13:29,878-Speed 9374.72 samples/sec Loss 6.1043 LearningRate 0.0010 Epoch: 4 Global Step: 7760 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:13:56,195-Speed 9338.68 samples/sec Loss 6.1267 LearningRate 0.0010 Epoch: 4 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:14:22,405-Speed 9378.08 samples/sec Loss 6.0739 LearningRate 0.0010 Epoch: 4 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:14:48,657-Speed 9362.07 samples/sec Loss 6.0513 
LearningRate 0.0010 Epoch: 4 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:15:14,898-Speed 9365.87 samples/sec Loss 6.0824 LearningRate 0.0010 Epoch: 4 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:15:41,046-Speed 9399.26 samples/sec Loss 6.0179 LearningRate 0.0010 Epoch: 4 Global Step: 7810 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:16:07,230-Speed 9386.01 samples/sec Loss 6.0486 LearningRate 0.0010 Epoch: 4 Global Step: 7820 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:16:33,421-Speed 9384.42 samples/sec Loss 6.0717 LearningRate 0.0010 Epoch: 4 Global Step: 7830 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:16:59,634-Speed 9375.68 samples/sec Loss 5.9894 LearningRate 0.0010 Epoch: 4 Global Step: 7840 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:17:25,819-Speed 9386.23 samples/sec Loss 6.0351 LearningRate 0.0010 Epoch: 4 Global Step: 7850 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:17:51,959-Speed 9401.85 samples/sec Loss 6.0337 LearningRate 0.0010 Epoch: 4 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:18:18,111-Speed 9397.87 samples/sec Loss 6.0213 LearningRate 0.0010 Epoch: 4 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:18:44,307-Speed 9381.79 samples/sec Loss 6.0081 LearningRate 0.0010 Epoch: 4 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:19:10,583-Speed 9353.63 samples/sec Loss 5.9360 LearningRate 0.0010 Epoch: 4 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:19:36,851-Speed 9356.23 samples/sec Loss 5.9147 LearningRate 0.0010 Epoch: 4 Global Step: 7900 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:20:03,118-Speed 9356.32 samples/sec Loss 5.9679 LearningRate 0.0010 Epoch: 4 Global Step: 7910 Fp16 Grad Scale: 
131072 Required: 45 hours Training: 2022-03-05 02:20:29,357-Speed 9366.94 samples/sec Loss 5.9519 LearningRate 0.0010 Epoch: 4 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:20:55,571-Speed 9375.19 samples/sec Loss 5.9136 LearningRate 0.0010 Epoch: 4 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:21:21,798-Speed 9370.89 samples/sec Loss 5.9230 LearningRate 0.0010 Epoch: 4 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:21:48,030-Speed 9368.94 samples/sec Loss 5.9383 LearningRate 0.0010 Epoch: 4 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:22:14,154-Speed 9407.87 samples/sec Loss 5.9007 LearningRate 0.0010 Epoch: 4 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:22:40,296-Speed 9401.39 samples/sec Loss 5.9440 LearningRate 0.0010 Epoch: 4 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:23:06,380-Speed 9422.37 samples/sec Loss 5.9172 LearningRate 0.0010 Epoch: 4 Global Step: 7980 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:23:32,566-Speed 9385.65 samples/sec Loss 5.8890 LearningRate 0.0010 Epoch: 4 Global Step: 7990 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:23:58,735-Speed 9391.74 samples/sec Loss 5.8600 LearningRate 0.0010 Epoch: 4 Global Step: 8000 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:24:24,970-Speed 9368.04 samples/sec Loss 5.9298 LearningRate 0.0010 Epoch: 4 Global Step: 8010 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:24:51,133-Speed 9394.04 samples/sec Loss 5.8494 LearningRate 0.0010 Epoch: 4 Global Step: 8020 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:25:17,163-Speed 9441.62 samples/sec Loss 5.8181 LearningRate 0.0010 Epoch: 4 Global Step: 8030 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 
02:25:43,311-Speed 9399.44 samples/sec Loss 5.8442 LearningRate 0.0010 Epoch: 4 Global Step: 8040 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:26:09,572-Speed 9358.70 samples/sec Loss 5.8364 LearningRate 0.0010 Epoch: 4 Global Step: 8050 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:26:35,695-Speed 9408.34 samples/sec Loss 5.8310 LearningRate 0.0010 Epoch: 4 Global Step: 8060 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:27:01,873-Speed 9388.32 samples/sec Loss 5.8031 LearningRate 0.0010 Epoch: 4 Global Step: 8070 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:27:28,035-Speed 9394.22 samples/sec Loss 5.7445 LearningRate 0.0010 Epoch: 4 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:27:54,189-Speed 9397.08 samples/sec Loss 5.7991 LearningRate 0.0010 Epoch: 4 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:28:20,484-Speed 9346.84 samples/sec Loss 5.7472 LearningRate 0.0010 Epoch: 4 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:28:46,556-Speed 9426.76 samples/sec Loss 5.7216 LearningRate 0.0010 Epoch: 4 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:29:12,661-Speed 9414.71 samples/sec Loss 5.7794 LearningRate 0.0010 Epoch: 4 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:29:38,846-Speed 9385.89 samples/sec Loss 5.7425 LearningRate 0.0010 Epoch: 4 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:30:05,094-Speed 9363.27 samples/sec Loss 5.7537 LearningRate 0.0010 Epoch: 4 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:30:31,195-Speed 9416.01 samples/sec Loss 5.7553 LearningRate 0.0010 Epoch: 4 Global Step: 8150 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:30:57,355-Speed 9394.80 samples/sec Loss 5.7546 
LearningRate 0.0010 Epoch: 4 Global Step: 8160 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:31:23,629-Speed 9354.41 samples/sec Loss 5.7129 LearningRate 0.0010 Epoch: 4 Global Step: 8170 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:31:49,880-Speed 9362.12 samples/sec Loss 5.6976 LearningRate 0.0010 Epoch: 4 Global Step: 8180 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:32:16,104-Speed 9371.99 samples/sec Loss 5.7367 LearningRate 0.0010 Epoch: 4 Global Step: 8190 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:32:42,274-Speed 9391.35 samples/sec Loss 5.7191 LearningRate 0.0010 Epoch: 4 Global Step: 8200 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:33:08,462-Speed 9384.64 samples/sec Loss 5.6783 LearningRate 0.0010 Epoch: 4 Global Step: 8210 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:33:34,638-Speed 9389.07 samples/sec Loss 5.6504 LearningRate 0.0010 Epoch: 4 Global Step: 8220 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:34:00,762-Speed 9407.69 samples/sec Loss 5.6473 LearningRate 0.0010 Epoch: 4 Global Step: 8230 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:34:26,929-Speed 9392.43 samples/sec Loss 5.6755 LearningRate 0.0010 Epoch: 4 Global Step: 8240 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:34:53,050-Speed 9408.92 samples/sec Loss 5.6577 LearningRate 0.0010 Epoch: 4 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:35:19,195-Speed 9400.33 samples/sec Loss 5.6298 LearningRate 0.0010 Epoch: 4 Global Step: 8260 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:35:45,386-Speed 9383.92 samples/sec Loss 5.6035 LearningRate 0.0010 Epoch: 4 Global Step: 8270 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:36:11,626-Speed 9366.19 samples/sec Loss 5.5892 LearningRate 0.0010 Epoch: 4 Global Step: 8280 Fp16 Grad Scale: 
65536 Required: 45 hours Training: 2022-03-05 02:36:37,862-Speed 9367.83 samples/sec Loss 5.6366 LearningRate 0.0010 Epoch: 4 Global Step: 8290 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:37:04,098-Speed 9367.63 samples/sec Loss 5.6345 LearningRate 0.0010 Epoch: 4 Global Step: 8300 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:37:30,244-Speed 9399.94 samples/sec Loss 5.5931 LearningRate 0.0010 Epoch: 4 Global Step: 8310 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:37:56,357-Speed 9411.76 samples/sec Loss 5.5730 LearningRate 0.0010 Epoch: 4 Global Step: 8320 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:38:22,551-Speed 9382.77 samples/sec Loss 5.5694 LearningRate 0.0010 Epoch: 4 Global Step: 8330 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:38:48,686-Speed 9404.09 samples/sec Loss 5.6010 LearningRate 0.0010 Epoch: 4 Global Step: 8340 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:39:14,837-Speed 9398.07 samples/sec Loss 5.5628 LearningRate 0.0010 Epoch: 4 Global Step: 8350 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:39:41,048-Speed 9376.74 samples/sec Loss 5.5650 LearningRate 0.0010 Epoch: 4 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:40:07,234-Speed 9385.46 samples/sec Loss 5.5559 LearningRate 0.0010 Epoch: 4 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:40:33,412-Speed 9388.41 samples/sec Loss 5.5436 LearningRate 0.0010 Epoch: 4 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:40:59,594-Speed 9387.43 samples/sec Loss 5.5451 LearningRate 0.0010 Epoch: 4 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:41:25,806-Speed 9376.45 samples/sec Loss 5.5371 LearningRate 0.0010 Epoch: 4 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 
02:41:52,024-Speed 9374.17 samples/sec Loss 5.5727 LearningRate 0.0010 Epoch: 4 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:42:18,241-Speed 9374.26 samples/sec Loss 5.5139 LearningRate 0.0010 Epoch: 4 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:42:44,329-Speed 9421.19 samples/sec Loss 5.5018 LearningRate 0.0010 Epoch: 4 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:43:10,559-Speed 9369.42 samples/sec Loss 5.6199 LearningRate 0.0010 Epoch: 4 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:43:36,720-Speed 9394.53 samples/sec Loss 5.5683 LearningRate 0.0010 Epoch: 4 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:44:02,917-Speed 9381.77 samples/sec Loss 5.4605 LearningRate 0.0010 Epoch: 4 Global Step: 8460 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-03-05 02:44:29,079-Speed 9394.07 samples/sec Loss 5.4323 LearningRate 0.0010 Epoch: 4 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:44:55,311-Speed 9369.19 samples/sec Loss 5.4541 LearningRate 0.0010 Epoch: 4 Global Step: 8480 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:45:21,530-Speed 9373.71 samples/sec Loss 5.4300 LearningRate 0.0009 Epoch: 4 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-03-05 02:45:47,789-Speed 9359.62 samples/sec Loss 5.4392 LearningRate 0.0009 Epoch: 4 Global Step: 8500 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:46:14,064-Speed 9354.02 samples/sec Loss 5.4669 LearningRate 0.0009 Epoch: 4 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:46:40,433-Speed 9320.36 samples/sec Loss 5.4333 LearningRate 0.0009 Epoch: 4 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:47:06,746-Speed 9340.55 samples/sec Loss 5.4289 
LearningRate 0.0009 Epoch: 4 Global Step: 8530 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:47:33,044-Speed 9345.50 samples/sec Loss 5.4546 LearningRate 0.0009 Epoch: 4 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:47:59,371-Speed 9335.17 samples/sec Loss 5.4097 LearningRate 0.0009 Epoch: 4 Global Step: 8550 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:48:25,637-Speed 9357.32 samples/sec Loss 5.4641 LearningRate 0.0009 Epoch: 4 Global Step: 8560 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:48:51,894-Speed 9360.01 samples/sec Loss 5.4730 LearningRate 0.0009 Epoch: 4 Global Step: 8570 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:49:18,209-Speed 9339.45 samples/sec Loss 5.4073 LearningRate 0.0009 Epoch: 4 Global Step: 8580 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:49:44,536-Speed 9335.44 samples/sec Loss 5.4151 LearningRate 0.0009 Epoch: 4 Global Step: 8590 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:50:10,857-Speed 9337.33 samples/sec Loss 5.4005 LearningRate 0.0009 Epoch: 4 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:50:37,139-Speed 9351.68 samples/sec Loss 5.4643 LearningRate 0.0009 Epoch: 4 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:51:03,370-Speed 9369.43 samples/sec Loss 5.4210 LearningRate 0.0009 Epoch: 4 Global Step: 8620 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:51:29,610-Speed 9366.24 samples/sec Loss 5.4087 LearningRate 0.0009 Epoch: 4 Global Step: 8630 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:51:55,904-Speed 9346.93 samples/sec Loss 5.3932 LearningRate 0.0009 Epoch: 4 Global Step: 8640 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:53:13,566-Speed 3164.54 samples/sec Loss 5.2916 LearningRate 0.0009 Epoch: 5 Global Step: 8650 Fp16 Grad Scale: 
65536 Required: 45 hours Training: 2022-03-05 02:53:39,633-Speed 9428.54 samples/sec Loss 5.2671 LearningRate 0.0009 Epoch: 5 Global Step: 8660 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:54:05,782-Speed 9399.06 samples/sec Loss 5.2597 LearningRate 0.0009 Epoch: 5 Global Step: 8670 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-03-05 02:54:31,866-Speed 9422.43 samples/sec Loss 5.3309 LearningRate 0.0009 Epoch: 5 Global Step: 8680 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:54:57,967-Speed 9416.12 samples/sec Loss 5.3113 LearningRate 0.0009 Epoch: 5 Global Step: 8690 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:55:24,074-Speed 9413.96 samples/sec Loss 5.2650 LearningRate 0.0009 Epoch: 5 Global Step: 8700 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:55:50,281-Speed 9378.23 samples/sec Loss 5.2515 LearningRate 0.0009 Epoch: 5 Global Step: 8710 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 02:56:16,479-Speed 9381.46 samples/sec Loss 5.2377 LearningRate 0.0009 Epoch: 5 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:56:42,713-Speed 9368.30 samples/sec Loss 5.2604 LearningRate 0.0009 Epoch: 5 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:57:08,834-Speed 9408.86 samples/sec Loss 5.2486 LearningRate 0.0009 Epoch: 5 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:57:35,064-Speed 9369.79 samples/sec Loss 5.2462 LearningRate 0.0009 Epoch: 5 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:58:01,350-Speed 9349.76 samples/sec Loss 5.2310 LearningRate 0.0009 Epoch: 5 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:58:27,499-Speed 9399.09 samples/sec Loss 5.2179 LearningRate 0.0009 Epoch: 5 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 
02:58:53,687-Speed 9384.99 samples/sec Loss 5.2830 LearningRate 0.0009 Epoch: 5 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:59:19,866-Speed 9388.00 samples/sec Loss 5.2495 LearningRate 0.0009 Epoch: 5 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 02:59:46,079-Speed 9375.85 samples/sec Loss 5.2192 LearningRate 0.0009 Epoch: 5 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:00:12,277-Speed 9381.27 samples/sec Loss 5.2143 LearningRate 0.0009 Epoch: 5 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:00:38,483-Speed 9378.16 samples/sec Loss 5.2432 LearningRate 0.0009 Epoch: 5 Global Step: 8820 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-03-05 03:01:04,719-Speed 9367.72 samples/sec Loss 5.2120 LearningRate 0.0009 Epoch: 5 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-03-05 03:01:30,893-Speed 9389.98 samples/sec Loss 5.1857 LearningRate 0.0009 Epoch: 5 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:01:57,095-Speed 9379.80 samples/sec Loss 5.2238 LearningRate 0.0009 Epoch: 5 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:02:23,285-Speed 9384.82 samples/sec Loss 5.1910 LearningRate 0.0009 Epoch: 5 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:02:49,545-Speed 9358.96 samples/sec Loss 5.2175 LearningRate 0.0009 Epoch: 5 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:03:15,783-Speed 9367.07 samples/sec Loss 5.1841 LearningRate 0.0009 Epoch: 5 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:03:41,923-Speed 9402.20 samples/sec Loss 5.1832 LearningRate 0.0009 Epoch: 5 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:04:08,068-Speed 9400.28 samples/sec Loss 5.2126 
LearningRate 0.0009 Epoch: 5 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:04:34,183-Speed 9411.30 samples/sec Loss 5.1652 LearningRate 0.0009 Epoch: 5 Global Step: 8910 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:05:00,372-Speed 9384.38 samples/sec Loss 5.1651 LearningRate 0.0009 Epoch: 5 Global Step: 8920 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:05:26,652-Speed 9352.00 samples/sec Loss 5.1627 LearningRate 0.0009 Epoch: 5 Global Step: 8930 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:05:52,935-Speed 9350.76 samples/sec Loss 5.1642 LearningRate 0.0009 Epoch: 5 Global Step: 8940 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:06:19,116-Speed 9387.80 samples/sec Loss 5.2038 LearningRate 0.0009 Epoch: 5 Global Step: 8950 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:06:45,304-Speed 9384.65 samples/sec Loss 5.1607 LearningRate 0.0009 Epoch: 5 Global Step: 8960 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:07:11,470-Speed 9393.06 samples/sec Loss 5.1064 LearningRate 0.0009 Epoch: 5 Global Step: 8970 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:07:37,663-Speed 9383.11 samples/sec Loss 5.1087 LearningRate 0.0009 Epoch: 5 Global Step: 8980 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:08:03,928-Speed 9357.21 samples/sec Loss 5.1159 LearningRate 0.0009 Epoch: 5 Global Step: 8990 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:08:30,077-Speed 9398.94 samples/sec Loss 5.1300 LearningRate 0.0009 Epoch: 5 Global Step: 9000 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:08:56,263-Speed 9385.43 samples/sec Loss 5.0988 LearningRate 0.0009 Epoch: 5 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:09:22,481-Speed 9374.28 samples/sec Loss 5.1076 LearningRate 0.0009 Epoch: 5 Global Step: 9020 Fp16 Grad Scale: 
65536 Required: 44 hours Training: 2022-03-05 03:09:48,725-Speed 9364.85 samples/sec Loss 5.1376 LearningRate 0.0009 Epoch: 5 Global Step: 9030 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:10:14,920-Speed 9382.63 samples/sec Loss 5.0553 LearningRate 0.0009 Epoch: 5 Global Step: 9040 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:10:41,072-Speed 9398.12 samples/sec Loss 5.0832 LearningRate 0.0009 Epoch: 5 Global Step: 9050 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:11:07,246-Speed 9389.66 samples/sec Loss 5.0769 LearningRate 0.0009 Epoch: 5 Global Step: 9060 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:11:33,457-Speed 9377.06 samples/sec Loss 5.1012 LearningRate 0.0009 Epoch: 5 Global Step: 9070 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:11:59,602-Speed 9399.97 samples/sec Loss 5.0698 LearningRate 0.0009 Epoch: 5 Global Step: 9080 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:12:25,858-Speed 9361.46 samples/sec Loss 5.0479 LearningRate 0.0009 Epoch: 5 Global Step: 9090 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:12:52,167-Speed 9341.79 samples/sec Loss 5.0563 LearningRate 0.0009 Epoch: 5 Global Step: 9100 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:13:18,399-Speed 9369.13 samples/sec Loss 5.0402 LearningRate 0.0009 Epoch: 5 Global Step: 9110 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:13:44,706-Speed 9342.29 samples/sec Loss 5.0749 LearningRate 0.0009 Epoch: 5 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:14:10,869-Speed 9393.96 samples/sec Loss 5.0536 LearningRate 0.0009 Epoch: 5 Global Step: 9130 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:14:37,092-Speed 9372.12 samples/sec Loss 5.0214 LearningRate 0.0009 Epoch: 5 Global Step: 9140 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:15:03,289-Speed 
9382.01 samples/sec Loss 5.0128 LearningRate 0.0009 Epoch: 5 Global Step: 9150 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:15:29,588-Speed 9345.09 samples/sec Loss 5.0243 LearningRate 0.0009 Epoch: 5 Global Step: 9160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:15:55,757-Speed 9392.49 samples/sec Loss 4.9962 LearningRate 0.0009 Epoch: 5 Global Step: 9170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:16:21,970-Speed 9376.11 samples/sec Loss 4.9908 LearningRate 0.0009 Epoch: 5 Global Step: 9180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:16:48,189-Speed 9373.63 samples/sec Loss 5.0602 LearningRate 0.0009 Epoch: 5 Global Step: 9190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:17:14,466-Speed 9352.87 samples/sec Loss 5.0041 LearningRate 0.0009 Epoch: 5 Global Step: 9200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:17:40,729-Speed 9358.29 samples/sec Loss 4.9948 LearningRate 0.0009 Epoch: 5 Global Step: 9210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:18:06,888-Speed 9395.17 samples/sec Loss 5.0107 LearningRate 0.0009 Epoch: 5 Global Step: 9220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:18:33,148-Speed 9359.27 samples/sec Loss 4.9803 LearningRate 0.0009 Epoch: 5 Global Step: 9230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:18:59,432-Speed 9350.24 samples/sec Loss 4.9508 LearningRate 0.0009 Epoch: 5 Global Step: 9240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:19:25,707-Speed 9353.78 samples/sec Loss 5.0151 LearningRate 0.0009 Epoch: 5 Global Step: 9250 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:19:51,996-Speed 9348.88 samples/sec Loss 4.9666 LearningRate 0.0009 Epoch: 5 Global Step: 9260 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:20:18,203-Speed 9378.25 samples/sec Loss 4.9567 LearningRate 0.0009 Epoch: 5 Global 
Step: 9270 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:20:44,298-Speed 9418.22 samples/sec Loss 4.9636 LearningRate 0.0009 Epoch: 5 Global Step: 9280 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:21:10,458-Speed 9394.77 samples/sec Loss 4.9957 LearningRate 0.0009 Epoch: 5 Global Step: 9290 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:21:36,631-Speed 9390.45 samples/sec Loss 4.9159 LearningRate 0.0009 Epoch: 5 Global Step: 9300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:22:02,857-Speed 9371.27 samples/sec Loss 4.9514 LearningRate 0.0009 Epoch: 5 Global Step: 9310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:22:29,123-Speed 9356.85 samples/sec Loss 4.9378 LearningRate 0.0009 Epoch: 5 Global Step: 9320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:22:55,334-Speed 9376.50 samples/sec Loss 4.9474 LearningRate 0.0009 Epoch: 5 Global Step: 9330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:23:21,616-Speed 9351.30 samples/sec Loss 4.9160 LearningRate 0.0009 Epoch: 5 Global Step: 9340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:23:47,872-Speed 9360.72 samples/sec Loss 4.9088 LearningRate 0.0009 Epoch: 5 Global Step: 9350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:24:14,104-Speed 9369.12 samples/sec Loss 4.9144 LearningRate 0.0009 Epoch: 5 Global Step: 9360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:24:40,337-Speed 9368.82 samples/sec Loss 4.9214 LearningRate 0.0009 Epoch: 5 Global Step: 9370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:25:06,623-Speed 9349.72 samples/sec Loss 4.9139 LearningRate 0.0009 Epoch: 5 Global Step: 9380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-03-05 03:25:32,825-Speed 9380.07 samples/sec Loss 4.8911 LearningRate 0.0009 Epoch: 5 Global Step: 9390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 
2022-03-05 03:25:59,061-Speed 9367.63 samples/sec Loss 4.9088 LearningRate 0.0009 Epoch: 5 Global Step: 9400 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:26:25,256-Speed 9382.47 samples/sec Loss 4.8959 LearningRate 0.0009 Epoch: 5 Global Step: 9410 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:26:51,497-Speed 9365.75 samples/sec Loss 4.8635 LearningRate 0.0009 Epoch: 5 Global Step: 9420 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:27:17,845-Speed 9327.87 samples/sec Loss 4.8801 LearningRate 0.0009 Epoch: 5 Global Step: 9430 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:27:44,266-Speed 9301.97 samples/sec Loss 4.8663 LearningRate 0.0009 Epoch: 5 Global Step: 9440 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:28:10,560-Speed 9347.23 samples/sec Loss 4.8314 LearningRate 0.0009 Epoch: 5 Global Step: 9450 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:28:36,819-Speed 9359.46 samples/sec Loss 4.9014 LearningRate 0.0009 Epoch: 5 Global Step: 9460 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:29:03,082-Speed 9358.08 samples/sec Loss 4.8811 LearningRate 0.0009 Epoch: 5 Global Step: 9470 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:29:29,398-Speed 9339.01 samples/sec Loss 4.8639 LearningRate 0.0009 Epoch: 5 Global Step: 9480 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:29:55,631-Speed 9368.90 samples/sec Loss 4.8369 LearningRate 0.0009 Epoch: 5 Global Step: 9490 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:30:21,942-Speed 9340.71 samples/sec Loss 4.8345 LearningRate 0.0009 Epoch: 5 Global Step: 9500 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:30:48,312-Speed 9320.18 samples/sec Loss 4.7863 LearningRate 0.0009 Epoch: 5 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:31:14,573-Speed 9359.67 samples/sec Loss 4.8258 
LearningRate 0.0009 Epoch: 5 Global Step: 9520 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:31:40,787-Speed 9375.55 samples/sec Loss 4.8147 LearningRate 0.0009 Epoch: 5 Global Step: 9530 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:32:07,170-Speed 9315.34 samples/sec Loss 4.8319 LearningRate 0.0009 Epoch: 5 Global Step: 9540 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:32:33,619-Speed 9292.33 samples/sec Loss 4.8049 LearningRate 0.0009 Epoch: 5 Global Step: 9550 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:32:59,891-Speed 9354.80 samples/sec Loss 4.8439 LearningRate 0.0009 Epoch: 5 Global Step: 9560 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:33:26,323-Speed 9298.20 samples/sec Loss 4.8462 LearningRate 0.0009 Epoch: 5 Global Step: 9570 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:33:52,611-Speed 9349.25 samples/sec Loss 4.8052 LearningRate 0.0009 Epoch: 5 Global Step: 9580 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:34:18,875-Speed 9357.76 samples/sec Loss 4.7745 LearningRate 0.0009 Epoch: 5 Global Step: 9590 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:34:45,327-Speed 9291.31 samples/sec Loss 4.7804 LearningRate 0.0009 Epoch: 5 Global Step: 9600 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:35:11,625-Speed 9345.26 samples/sec Loss 4.7872 LearningRate 0.0009 Epoch: 5 Global Step: 9610 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:35:37,995-Speed 9320.13 samples/sec Loss 4.7670 LearningRate 0.0009 Epoch: 5 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:36:04,347-Speed 9326.48 samples/sec Loss 4.7372 LearningRate 0.0009 Epoch: 5 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:36:30,648-Speed 9344.67 samples/sec Loss 4.7561 LearningRate 0.0009 Epoch: 5 Global Step: 9640 Fp16 Grad Scale: 
131072 Required: 44 hours Training: 2022-03-05 03:36:56,892-Speed 9365.32 samples/sec Loss 4.7481 LearningRate 0.0009 Epoch: 5 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:37:23,020-Speed 9406.20 samples/sec Loss 4.7692 LearningRate 0.0009 Epoch: 5 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:37:49,133-Speed 9412.18 samples/sec Loss 4.7680 LearningRate 0.0009 Epoch: 5 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:38:15,268-Speed 9403.99 samples/sec Loss 4.7777 LearningRate 0.0009 Epoch: 5 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:38:41,621-Speed 9326.03 samples/sec Loss 4.7637 LearningRate 0.0009 Epoch: 5 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:39:07,888-Speed 9356.49 samples/sec Loss 4.7093 LearningRate 0.0009 Epoch: 5 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:39:34,173-Speed 9350.40 samples/sec Loss 4.7524 LearningRate 0.0009 Epoch: 5 Global Step: 9710 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:40:00,468-Speed 9346.73 samples/sec Loss 4.7299 LearningRate 0.0009 Epoch: 5 Global Step: 9720 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:40:26,914-Speed 9293.13 samples/sec Loss 4.7362 LearningRate 0.0009 Epoch: 5 Global Step: 9730 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:40:53,169-Speed 9360.87 samples/sec Loss 4.7119 LearningRate 0.0009 Epoch: 5 Global Step: 9740 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:41:19,422-Speed 9361.85 samples/sec Loss 4.7108 LearningRate 0.0009 Epoch: 5 Global Step: 9750 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:41:45,690-Speed 9356.29 samples/sec Loss 4.6979 LearningRate 0.0009 Epoch: 5 Global Step: 9760 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 
03:42:11,957-Speed 9356.45 samples/sec Loss 4.7021 LearningRate 0.0009 Epoch: 5 Global Step: 9770 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:42:38,189-Speed 9369.98 samples/sec Loss 4.6633 LearningRate 0.0009 Epoch: 5 Global Step: 9780 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:43:04,612-Speed 9301.12 samples/sec Loss 4.6534 LearningRate 0.0009 Epoch: 5 Global Step: 9790 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:43:30,850-Speed 9367.22 samples/sec Loss 4.6792 LearningRate 0.0009 Epoch: 5 Global Step: 9800 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:43:57,233-Speed 9315.11 samples/sec Loss 4.7396 LearningRate 0.0009 Epoch: 5 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:44:23,523-Speed 9348.58 samples/sec Loss 4.6864 LearningRate 0.0009 Epoch: 5 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:44:49,782-Speed 9359.63 samples/sec Loss 4.6572 LearningRate 0.0009 Epoch: 5 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:45:16,235-Speed 9290.89 samples/sec Loss 4.6723 LearningRate 0.0009 Epoch: 5 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:45:42,728-Speed 9276.92 samples/sec Loss 4.6455 LearningRate 0.0009 Epoch: 5 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:46:09,157-Speed 9299.33 samples/sec Loss 4.6541 LearningRate 0.0009 Epoch: 5 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-03-05 03:46:35,495-Speed 9331.36 samples/sec Loss 4.6515 LearningRate 0.0009 Epoch: 5 Global Step: 9870 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:47:01,709-Speed 9375.46 samples/sec Loss 4.6441 LearningRate 0.0009 Epoch: 5 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:47:27,919-Speed 9376.87 samples/sec Loss 4.6261 
LearningRate 0.0009 Epoch: 5 Global Step: 9890 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:47:54,213-Speed 9347.12 samples/sec Loss 4.6484 LearningRate 0.0009 Epoch: 5 Global Step: 9900 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:48:20,489-Speed 9353.68 samples/sec Loss 4.6058 LearningRate 0.0009 Epoch: 5 Global Step: 9910 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:48:46,758-Speed 9355.74 samples/sec Loss 4.6456 LearningRate 0.0009 Epoch: 5 Global Step: 9920 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:49:13,106-Speed 9327.77 samples/sec Loss 4.6315 LearningRate 0.0009 Epoch: 5 Global Step: 9930 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:49:39,484-Speed 9317.02 samples/sec Loss 4.6485 LearningRate 0.0009 Epoch: 5 Global Step: 9940 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:50:05,749-Speed 9357.66 samples/sec Loss 4.6476 LearningRate 0.0009 Epoch: 5 Global Step: 9950 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:50:32,227-Speed 9281.80 samples/sec Loss 4.6232 LearningRate 0.0009 Epoch: 5 Global Step: 9960 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-03-05 03:50:58,483-Speed 9360.46 samples/sec Loss 4.6023 LearningRate 0.0009 Epoch: 5 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 03:51:24,824-Speed 9330.53 samples/sec Loss 4.5982 LearningRate 0.0009 Epoch: 5 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 03:51:51,163-Speed 9330.95 samples/sec Loss 4.6288 LearningRate 0.0009 Epoch: 5 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 03:52:17,535-Speed 9319.48 samples/sec Loss 4.6102 LearningRate 0.0009 Epoch: 5 Global Step: 10000 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 03:52:43,888-Speed 9325.92 samples/sec Loss 4.6053 LearningRate 0.0009 Epoch: 5 Global Step: 10010 Fp16 Grad 
Scale: 131072 Required: 43 hours Training: 2022-03-05 03:53:10,301-Speed 9305.05 samples/sec Loss 4.6078 LearningRate 0.0009 Epoch: 5 Global Step: 10020 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:53:36,766-Speed 9286.76 samples/sec Loss 4.5559 LearningRate 0.0009 Epoch: 5 Global Step: 10030 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:54:03,209-Speed 9294.24 samples/sec Loss 4.5569 LearningRate 0.0009 Epoch: 5 Global Step: 10040 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:54:29,638-Speed 9299.37 samples/sec Loss 4.6055 LearningRate 0.0009 Epoch: 5 Global Step: 10050 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:54:56,027-Speed 9313.27 samples/sec Loss 4.5800 LearningRate 0.0009 Epoch: 5 Global Step: 10060 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:55:22,453-Speed 9300.28 samples/sec Loss 4.5639 LearningRate 0.0009 Epoch: 5 Global Step: 10070 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:55:48,824-Speed 9319.87 samples/sec Loss 4.5525 LearningRate 0.0009 Epoch: 5 Global Step: 10080 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:56:15,268-Speed 9293.86 samples/sec Loss 4.5503 LearningRate 0.0009 Epoch: 5 Global Step: 10090 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:56:41,676-Speed 9307.01 samples/sec Loss 4.5641 LearningRate 0.0009 Epoch: 5 Global Step: 10100 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:57:08,088-Speed 9305.17 samples/sec Loss 4.5420 LearningRate 0.0009 Epoch: 5 Global Step: 10110 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:57:34,512-Speed 9301.20 samples/sec Loss 4.5329 LearningRate 0.0009 Epoch: 5 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 03:58:00,951-Speed 9295.60 samples/sec Loss 4.5302 LearningRate 0.0009 Epoch: 5 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 
03:58:27,245-Speed 9347.39 samples/sec Loss 4.5423 LearningRate 0.0009 Epoch: 5 Global Step: 10140 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:58:53,748-Speed 9273.33 samples/sec Loss 4.5726 LearningRate 0.0009 Epoch: 5 Global Step: 10150 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:59:20,079-Speed 9333.95 samples/sec Loss 4.5319 LearningRate 0.0009 Epoch: 5 Global Step: 10160 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 03:59:46,521-Speed 9294.76 samples/sec Loss 4.5321 LearningRate 0.0009 Epoch: 5 Global Step: 10170 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:00:12,884-Speed 9322.35 samples/sec Loss 4.4972 LearningRate 0.0009 Epoch: 5 Global Step: 10180 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:00:39,322-Speed 9296.41 samples/sec Loss 4.5242 LearningRate 0.0009 Epoch: 5 Global Step: 10190 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:01:05,852-Speed 9263.64 samples/sec Loss 4.5197 LearningRate 0.0009 Epoch: 5 Global Step: 10200 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:01:32,407-Speed 9255.47 samples/sec Loss 4.4897 LearningRate 0.0009 Epoch: 5 Global Step: 10210 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:01:59,001-Speed 9241.58 samples/sec Loss 4.5316 LearningRate 0.0009 Epoch: 5 Global Step: 10220 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:02:25,422-Speed 9301.95 samples/sec Loss 4.5071 LearningRate 0.0009 Epoch: 5 Global Step: 10230 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:02:51,744-Speed 9336.92 samples/sec Loss 4.4745 LearningRate 0.0009 Epoch: 5 Global Step: 10240 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:03:18,085-Speed 9330.44 samples/sec Loss 4.4785 LearningRate 0.0009 Epoch: 5 Global Step: 10250 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:03:44,450-Speed 9321.75 samples/sec Loss 4.4893 
LearningRate 0.0009 Epoch: 5 Global Step: 10260 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:04:10,860-Speed 9305.91 samples/sec Loss 4.5536 LearningRate 0.0009 Epoch: 5 Global Step: 10270 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:04:37,206-Speed 9328.64 samples/sec Loss 4.4791 LearningRate 0.0009 Epoch: 5 Global Step: 10280 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:05:03,571-Speed 9322.77 samples/sec Loss 4.5119 LearningRate 0.0009 Epoch: 5 Global Step: 10290 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:05:29,838-Speed 9356.54 samples/sec Loss 4.5300 LearningRate 0.0009 Epoch: 5 Global Step: 10300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:05:56,320-Speed 9280.67 samples/sec Loss 4.4641 LearningRate 0.0009 Epoch: 5 Global Step: 10310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:06:22,623-Speed 9343.68 samples/sec Loss 4.4704 LearningRate 0.0009 Epoch: 5 Global Step: 10320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:06:48,950-Speed 9335.38 samples/sec Loss 4.4894 LearningRate 0.0009 Epoch: 5 Global Step: 10330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:07:15,275-Speed 9336.12 samples/sec Loss 4.5135 LearningRate 0.0009 Epoch: 5 Global Step: 10340 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:07:41,602-Speed 9335.51 samples/sec Loss 4.4829 LearningRate 0.0009 Epoch: 5 Global Step: 10350 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:08:08,020-Speed 9302.94 samples/sec Loss 4.5214 LearningRate 0.0009 Epoch: 5 Global Step: 10360 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:08:34,225-Speed 9378.78 samples/sec Loss 4.5018 LearningRate 0.0009 Epoch: 5 Global Step: 10370 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:09:53,735-Speed 3090.97 samples/sec Loss 4.3854 LearningRate 0.0009 Epoch: 6 Global Step: 10380 Fp16 
Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:10:19,817-Speed 9423.07 samples/sec Loss 4.3851 LearningRate 0.0009 Epoch: 6 Global Step: 10390 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:10:45,931-Speed 9411.58 samples/sec Loss 4.3645 LearningRate 0.0009 Epoch: 6 Global Step: 10400 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:11:12,059-Speed 9406.57 samples/sec Loss 4.3863 LearningRate 0.0009 Epoch: 6 Global Step: 10410 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:11:38,281-Speed 9372.82 samples/sec Loss 4.3680 LearningRate 0.0009 Epoch: 6 Global Step: 10420 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:12:04,421-Speed 9401.79 samples/sec Loss 4.3870 LearningRate 0.0009 Epoch: 6 Global Step: 10430 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:12:30,559-Speed 9402.78 samples/sec Loss 4.3441 LearningRate 0.0009 Epoch: 6 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:12:56,733-Speed 9390.26 samples/sec Loss 4.4055 LearningRate 0.0009 Epoch: 6 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:13:22,921-Speed 9384.75 samples/sec Loss 4.3932 LearningRate 0.0009 Epoch: 6 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:13:49,183-Speed 9358.58 samples/sec Loss 4.3772 LearningRate 0.0009 Epoch: 6 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:14:15,342-Speed 9395.15 samples/sec Loss 4.3810 LearningRate 0.0009 Epoch: 6 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:14:41,521-Speed 9388.05 samples/sec Loss 4.3855 LearningRate 0.0009 Epoch: 6 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:15:07,639-Speed 9410.02 samples/sec Loss 4.3734 LearningRate 0.0009 Epoch: 6 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 43 hours Training: 
2022-03-05 04:15:33,836-Speed 9381.51 samples/sec Loss 4.3913 LearningRate 0.0009 Epoch: 6 Global Step: 10510 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:16:00,056-Speed 9373.37 samples/sec Loss 4.3715 LearningRate 0.0009 Epoch: 6 Global Step: 10520 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:16:26,245-Speed 9384.55 samples/sec Loss 4.3703 LearningRate 0.0009 Epoch: 6 Global Step: 10530 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:16:52,465-Speed 9373.33 samples/sec Loss 4.3666 LearningRate 0.0009 Epoch: 6 Global Step: 10540 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:17:18,598-Speed 9404.79 samples/sec Loss 4.3270 LearningRate 0.0009 Epoch: 6 Global Step: 10550 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:17:44,795-Speed 9381.65 samples/sec Loss 4.3559 LearningRate 0.0009 Epoch: 6 Global Step: 10560 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:18:10,892-Speed 9417.76 samples/sec Loss 4.3744 LearningRate 0.0009 Epoch: 6 Global Step: 10570 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:18:37,053-Speed 9394.58 samples/sec Loss 4.3922 LearningRate 0.0009 Epoch: 6 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:19:03,241-Speed 9385.22 samples/sec Loss 4.3569 LearningRate 0.0009 Epoch: 6 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:19:29,421-Speed 9387.87 samples/sec Loss 4.3418 LearningRate 0.0009 Epoch: 6 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:19:55,602-Speed 9387.44 samples/sec Loss 4.3194 LearningRate 0.0009 Epoch: 6 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:20:21,831-Speed 9370.08 samples/sec Loss 4.3127 LearningRate 0.0009 Epoch: 6 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:20:48,012-Speed 9387.65 samples/sec 
Loss 4.3352 LearningRate 0.0009 Epoch: 6 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:21:14,199-Speed 9385.16 samples/sec Loss 4.3207 LearningRate 0.0009 Epoch: 6 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:21:40,447-Speed 9363.37 samples/sec Loss 4.3004 LearningRate 0.0009 Epoch: 6 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:22:06,670-Speed 9372.40 samples/sec Loss 4.3409 LearningRate 0.0009 Epoch: 6 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:22:32,900-Speed 9369.57 samples/sec Loss 4.3711 LearningRate 0.0009 Epoch: 6 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:22:59,118-Speed 9374.16 samples/sec Loss 4.3202 LearningRate 0.0009 Epoch: 6 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:23:25,319-Speed 9380.62 samples/sec Loss 4.2979 LearningRate 0.0009 Epoch: 6 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:23:51,477-Speed 9396.52 samples/sec Loss 4.3656 LearningRate 0.0009 Epoch: 6 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:24:17,642-Speed 9393.34 samples/sec Loss 4.3237 LearningRate 0.0009 Epoch: 6 Global Step: 10710 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-03-05 04:24:43,848-Speed 9378.23 samples/sec Loss 4.3157 LearningRate 0.0009 Epoch: 6 Global Step: 10720 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-03-05 04:25:10,045-Speed 9381.89 samples/sec Loss 4.2917 LearningRate 0.0009 Epoch: 6 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:25:36,203-Speed 9395.34 samples/sec Loss 4.2777 LearningRate 0.0009 Epoch: 6 Global Step: 10740 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:26:02,349-Speed 9399.77 samples/sec Loss 4.2762 LearningRate 0.0009 Epoch: 6 
Global Step: 10750 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:26:28,500-Speed 9398.15 samples/sec Loss 4.3057 LearningRate 0.0009 Epoch: 6 Global Step: 10760 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:26:54,821-Speed 9337.36 samples/sec Loss 4.3069 LearningRate 0.0009 Epoch: 6 Global Step: 10770 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:27:21,056-Speed 9368.23 samples/sec Loss 4.2916 LearningRate 0.0009 Epoch: 6 Global Step: 10780 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:27:47,396-Speed 9330.82 samples/sec Loss 4.2863 LearningRate 0.0009 Epoch: 6 Global Step: 10790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:28:13,726-Speed 9334.08 samples/sec Loss 4.2633 LearningRate 0.0009 Epoch: 6 Global Step: 10800 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:28:40,017-Speed 9348.12 samples/sec Loss 4.2793 LearningRate 0.0009 Epoch: 6 Global Step: 10810 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:29:06,352-Speed 9332.62 samples/sec Loss 4.2755 LearningRate 0.0009 Epoch: 6 Global Step: 10820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:29:32,606-Speed 9361.27 samples/sec Loss 4.2586 LearningRate 0.0009 Epoch: 6 Global Step: 10830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:29:58,886-Speed 9352.05 samples/sec Loss 4.2919 LearningRate 0.0009 Epoch: 6 Global Step: 10840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:30:25,105-Speed 9373.68 samples/sec Loss 4.2774 LearningRate 0.0009 Epoch: 6 Global Step: 10850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:30:51,443-Speed 9331.22 samples/sec Loss 4.2367 LearningRate 0.0009 Epoch: 6 Global Step: 10860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:31:17,683-Speed 9366.16 samples/sec Loss 4.2620 LearningRate 0.0009 Epoch: 6 Global Step: 10870 Fp16 Grad Scale: 32768 Required: 43 
hours Training: 2022-03-05 04:31:44,011-Speed 9334.86 samples/sec Loss 4.2552 LearningRate 0.0009 Epoch: 6 Global Step: 10880 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:32:10,311-Speed 9345.02 samples/sec Loss 4.2195 LearningRate 0.0009 Epoch: 6 Global Step: 10890 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:32:36,690-Speed 9316.88 samples/sec Loss 4.2535 LearningRate 0.0009 Epoch: 6 Global Step: 10900 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:33:03,035-Speed 9329.06 samples/sec Loss 4.2267 LearningRate 0.0009 Epoch: 6 Global Step: 10910 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:33:29,326-Speed 9348.07 samples/sec Loss 4.2684 LearningRate 0.0009 Epoch: 6 Global Step: 10920 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:33:55,581-Speed 9360.95 samples/sec Loss 4.2382 LearningRate 0.0009 Epoch: 6 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:34:21,857-Speed 9353.19 samples/sec Loss 4.1933 LearningRate 0.0009 Epoch: 6 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:34:48,301-Speed 9294.02 samples/sec Loss 4.1927 LearningRate 0.0009 Epoch: 6 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:35:14,718-Speed 9303.53 samples/sec Loss 4.2450 LearningRate 0.0009 Epoch: 6 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:35:41,398-Speed 9211.66 samples/sec Loss 4.2490 LearningRate 0.0009 Epoch: 6 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:36:07,865-Speed 9286.19 samples/sec Loss 4.2284 LearningRate 0.0009 Epoch: 6 Global Step: 10980 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:36:34,303-Speed 9296.12 samples/sec Loss 4.2057 LearningRate 0.0009 Epoch: 6 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:37:00,690-Speed 9314.15 
samples/sec Loss 4.1979 LearningRate 0.0009 Epoch: 6 Global Step: 11000 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:37:26,994-Speed 9343.36 samples/sec Loss 4.1977 LearningRate 0.0009 Epoch: 6 Global Step: 11010 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:37:53,361-Speed 9321.32 samples/sec Loss 4.2102 LearningRate 0.0009 Epoch: 6 Global Step: 11020 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:38:19,743-Speed 9315.84 samples/sec Loss 4.1919 LearningRate 0.0009 Epoch: 6 Global Step: 11030 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:38:46,102-Speed 9324.06 samples/sec Loss 4.1908 LearningRate 0.0009 Epoch: 6 Global Step: 11040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:39:12,584-Speed 9280.51 samples/sec Loss 4.1813 LearningRate 0.0009 Epoch: 6 Global Step: 11050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:39:39,077-Speed 9277.30 samples/sec Loss 4.1830 LearningRate 0.0009 Epoch: 6 Global Step: 11060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:40:05,513-Speed 9296.89 samples/sec Loss 4.2159 LearningRate 0.0009 Epoch: 6 Global Step: 11070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:40:32,006-Speed 9276.75 samples/sec Loss 4.1664 LearningRate 0.0009 Epoch: 6 Global Step: 11080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:40:58,441-Speed 9297.52 samples/sec Loss 4.1867 LearningRate 0.0009 Epoch: 6 Global Step: 11090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:41:24,987-Speed 9258.33 samples/sec Loss 4.1869 LearningRate 0.0009 Epoch: 6 Global Step: 11100 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:41:51,467-Speed 9281.31 samples/sec Loss 4.1780 LearningRate 0.0009 Epoch: 6 Global Step: 11110 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:42:18,002-Speed 9261.99 samples/sec Loss 4.1472 LearningRate 0.0009 Epoch: 6 
Global Step: 11120 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:42:44,549-Speed 9258.11 samples/sec Loss 4.1439 LearningRate 0.0009 Epoch: 6 Global Step: 11130 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:43:11,207-Speed 9219.38 samples/sec Loss 4.1699 LearningRate 0.0009 Epoch: 6 Global Step: 11140 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:43:38,008-Speed 9170.41 samples/sec Loss 4.1704 LearningRate 0.0009 Epoch: 6 Global Step: 11150 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:44:04,557-Speed 9257.31 samples/sec Loss 4.1334 LearningRate 0.0009 Epoch: 6 Global Step: 11160 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:44:30,976-Speed 9302.78 samples/sec Loss 4.1870 LearningRate 0.0009 Epoch: 6 Global Step: 11170 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:44:57,535-Speed 9253.55 samples/sec Loss 4.1509 LearningRate 0.0009 Epoch: 6 Global Step: 11180 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:45:24,112-Speed 9247.55 samples/sec Loss 4.1486 LearningRate 0.0009 Epoch: 6 Global Step: 11190 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:45:50,785-Speed 9214.50 samples/sec Loss 4.1123 LearningRate 0.0009 Epoch: 6 Global Step: 11200 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:46:17,359-Speed 9248.78 samples/sec Loss 4.1195 LearningRate 0.0009 Epoch: 6 Global Step: 11210 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:46:43,953-Speed 9241.35 samples/sec Loss 4.1457 LearningRate 0.0009 Epoch: 6 Global Step: 11220 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:47:10,626-Speed 9215.27 samples/sec Loss 4.1479 LearningRate 0.0009 Epoch: 6 Global Step: 11230 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:47:37,447-Speed 9163.24 samples/sec Loss 4.0932 LearningRate 0.0009 Epoch: 6 Global Step: 11240 Fp16 Grad Scale: 32768 Required: 43 
hours Training: 2022-03-05 04:48:04,056-Speed 9236.41 samples/sec Loss 4.1220 LearningRate 0.0009 Epoch: 6 Global Step: 11250 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:48:30,662-Speed 9237.66 samples/sec Loss 4.0950 LearningRate 0.0009 Epoch: 6 Global Step: 11260 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:48:57,219-Speed 9254.43 samples/sec Loss 4.1041 LearningRate 0.0009 Epoch: 6 Global Step: 11270 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-03-05 04:49:23,859-Speed 9225.66 samples/sec Loss 4.1125 LearningRate 0.0009 Epoch: 6 Global Step: 11280 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:49:50,434-Speed 9248.23 samples/sec Loss 4.1294 LearningRate 0.0009 Epoch: 6 Global Step: 11290 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:50:16,977-Speed 9259.59 samples/sec Loss 4.0979 LearningRate 0.0009 Epoch: 6 Global Step: 11300 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:50:43,616-Speed 9226.18 samples/sec Loss 4.1185 LearningRate 0.0009 Epoch: 6 Global Step: 11310 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:51:10,233-Speed 9233.49 samples/sec Loss 4.0918 LearningRate 0.0009 Epoch: 6 Global Step: 11320 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:51:36,823-Speed 9243.85 samples/sec Loss 4.0809 LearningRate 0.0009 Epoch: 6 Global Step: 11330 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:52:03,392-Speed 9250.27 samples/sec Loss 4.0894 LearningRate 0.0009 Epoch: 6 Global Step: 11340 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:52:29,837-Speed 9293.76 samples/sec Loss 4.0672 LearningRate 0.0009 Epoch: 6 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:52:56,396-Speed 9253.86 samples/sec Loss 4.1072 LearningRate 0.0009 Epoch: 6 Global Step: 11360 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:53:22,965-Speed 9250.32 
samples/sec Loss 4.0833 LearningRate 0.0009 Epoch: 6 Global Step: 11370 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-03-05 04:53:49,555-Speed 9243.32 samples/sec Loss 4.0623 LearningRate 0.0009 Epoch: 6 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:54:16,025-Speed 9284.89 samples/sec Loss 4.0567 LearningRate 0.0009 Epoch: 6 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-03-05 04:54:42,638-Speed 9234.75 samples/sec Loss 4.0584 LearningRate 0.0009 Epoch: 6 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 04:55:09,179-Speed 9260.03 samples/sec Loss 4.0645 LearningRate 0.0009 Epoch: 6 Global Step: 11410 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:55:35,577-Speed 9310.07 samples/sec Loss 4.0933 LearningRate 0.0009 Epoch: 6 Global Step: 11420 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:56:02,061-Speed 9280.15 samples/sec Loss 4.0858 LearningRate 0.0009 Epoch: 6 Global Step: 11430 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:56:28,503-Speed 9294.79 samples/sec Loss 4.0482 LearningRate 0.0009 Epoch: 6 Global Step: 11440 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:56:54,971-Speed 9285.41 samples/sec Loss 4.0657 LearningRate 0.0009 Epoch: 6 Global Step: 11450 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:57:21,685-Speed 9200.16 samples/sec Loss 4.0795 LearningRate 0.0009 Epoch: 6 Global Step: 11460 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:57:48,248-Speed 9252.59 samples/sec Loss 4.0390 LearningRate 0.0009 Epoch: 6 Global Step: 11470 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:58:14,902-Speed 9220.81 samples/sec Loss 4.0505 LearningRate 0.0009 Epoch: 6 Global Step: 11480 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:58:41,600-Speed 9205.86 samples/sec Loss 4.0421 LearningRate 0.0009 Epoch: 6 
Global Step: 11490 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:59:08,363-Speed 9183.23 samples/sec Loss 4.0082 LearningRate 0.0009 Epoch: 6 Global Step: 11500 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 04:59:35,038-Speed 9213.58 samples/sec Loss 4.0255 LearningRate 0.0009 Epoch: 6 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:00:01,708-Speed 9215.07 samples/sec Loss 4.0331 LearningRate 0.0009 Epoch: 6 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:00:28,361-Speed 9221.46 samples/sec Loss 4.0340 LearningRate 0.0009 Epoch: 6 Global Step: 11530 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:00:54,974-Speed 9235.09 samples/sec Loss 4.0377 LearningRate 0.0009 Epoch: 6 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:01:21,595-Speed 9232.18 samples/sec Loss 4.0418 LearningRate 0.0009 Epoch: 6 Global Step: 11550 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:01:48,141-Speed 9258.53 samples/sec Loss 4.0209 LearningRate 0.0009 Epoch: 6 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:02:14,718-Speed 9247.30 samples/sec Loss 4.0191 LearningRate 0.0009 Epoch: 6 Global Step: 11570 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:02:41,330-Speed 9235.54 samples/sec Loss 3.9881 LearningRate 0.0009 Epoch: 6 Global Step: 11580 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:03:07,950-Speed 9232.36 samples/sec Loss 4.0194 LearningRate 0.0009 Epoch: 6 Global Step: 11590 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:03:34,606-Speed 9220.20 samples/sec Loss 4.0299 LearningRate 0.0009 Epoch: 6 Global Step: 11600 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:04:01,232-Speed 9230.27 samples/sec Loss 3.9816 LearningRate 0.0009 Epoch: 6 Global Step: 11610 Fp16 Grad Scale: 32768 Required: 
42 hours Training: 2022-03-05 05:04:27,821-Speed 9243.52 samples/sec Loss 4.0328 LearningRate 0.0009 Epoch: 6 Global Step: 11620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:04:54,391-Speed 9250.10 samples/sec Loss 4.0067 LearningRate 0.0009 Epoch: 6 Global Step: 11630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:05:20,795-Speed 9307.90 samples/sec Loss 4.0017 LearningRate 0.0009 Epoch: 6 Global Step: 11640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:05:47,092-Speed 9346.05 samples/sec Loss 3.9691 LearningRate 0.0009 Epoch: 6 Global Step: 11650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:06:13,423-Speed 9333.84 samples/sec Loss 4.0049 LearningRate 0.0009 Epoch: 6 Global Step: 11660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:06:39,923-Speed 9274.79 samples/sec Loss 3.9609 LearningRate 0.0009 Epoch: 6 Global Step: 11670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:07:06,355-Speed 9298.28 samples/sec Loss 4.0212 LearningRate 0.0009 Epoch: 6 Global Step: 11680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:07:32,909-Speed 9255.53 samples/sec Loss 3.9503 LearningRate 0.0009 Epoch: 6 Global Step: 11690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:07:59,438-Speed 9264.29 samples/sec Loss 3.9697 LearningRate 0.0009 Epoch: 6 Global Step: 11700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:08:25,914-Speed 9282.95 samples/sec Loss 3.9778 LearningRate 0.0009 Epoch: 6 Global Step: 11710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-03-05 05:08:52,298-Speed 9315.04 samples/sec Loss 4.0403 LearningRate 0.0009 Epoch: 6 Global Step: 11720 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:09:18,694-Speed 9311.02 samples/sec Loss 3.9831 LearningRate 0.0009 Epoch: 6 Global Step: 11730 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:09:45,180-Speed 
9279.16 samples/sec Loss 3.9591 LearningRate 0.0009 Epoch: 6 Global Step: 11740 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:10:11,641-Speed 9287.90 samples/sec Loss 3.9863 LearningRate 0.0009 Epoch: 6 Global Step: 11750 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:10:38,194-Speed 9255.83 samples/sec Loss 3.9568 LearningRate 0.0009 Epoch: 6 Global Step: 11760 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:11:04,745-Speed 9256.46 samples/sec Loss 3.9589 LearningRate 0.0008 Epoch: 6 Global Step: 11770 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:11:31,253-Speed 9271.98 samples/sec Loss 3.9636 LearningRate 0.0008 Epoch: 6 Global Step: 11780 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:11:57,618-Speed 9321.97 samples/sec Loss 3.9330 LearningRate 0.0008 Epoch: 6 Global Step: 11790 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:12:24,054-Speed 9296.75 samples/sec Loss 3.9061 LearningRate 0.0008 Epoch: 6 Global Step: 11800 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:12:50,336-Speed 9351.18 samples/sec Loss 3.9446 LearningRate 0.0008 Epoch: 6 Global Step: 11810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:13:16,721-Speed 9314.89 samples/sec Loss 3.9631 LearningRate 0.0008 Epoch: 6 Global Step: 11820 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:13:43,259-Speed 9261.18 samples/sec Loss 3.9583 LearningRate 0.0008 Epoch: 6 Global Step: 11830 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:14:09,784-Speed 9265.59 samples/sec Loss 3.9326 LearningRate 0.0008 Epoch: 6 Global Step: 11840 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:14:36,314-Speed 9264.00 samples/sec Loss 3.9576 LearningRate 0.0008 Epoch: 6 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:15:02,902-Speed 9243.93 samples/sec Loss 3.9427 LearningRate 0.0008 
Epoch: 6 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:15:29,404-Speed 9273.60 samples/sec Loss 3.9276 LearningRate 0.0008 Epoch: 6 Global Step: 11870 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:15:55,996-Speed 9242.49 samples/sec Loss 3.9447 LearningRate 0.0008 Epoch: 6 Global Step: 11880 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:16:22,415-Speed 9302.80 samples/sec Loss 3.9627 LearningRate 0.0008 Epoch: 6 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:16:48,915-Speed 9274.47 samples/sec Loss 3.9219 LearningRate 0.0008 Epoch: 6 Global Step: 11900 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:17:15,532-Speed 9233.67 samples/sec Loss 3.9297 LearningRate 0.0008 Epoch: 6 Global Step: 11910 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:17:42,386-Speed 9152.24 samples/sec Loss 3.9099 LearningRate 0.0008 Epoch: 6 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:18:08,945-Speed 9253.74 samples/sec Loss 3.8764 LearningRate 0.0008 Epoch: 6 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:18:35,388-Speed 9294.58 samples/sec Loss 3.9105 LearningRate 0.0008 Epoch: 6 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:19:02,081-Speed 9207.34 samples/sec Loss 3.9661 LearningRate 0.0008 Epoch: 6 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:19:28,723-Speed 9225.24 samples/sec Loss 3.9365 LearningRate 0.0008 Epoch: 6 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:19:55,369-Speed 9223.67 samples/sec Loss 3.8992 LearningRate 0.0008 Epoch: 6 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:20:21,925-Speed 9254.73 samples/sec Loss 3.9749 LearningRate 0.0008 Epoch: 6 Global Step: 11980 Fp16 Grad Scale: 
65536 Required: 42 hours Training: 2022-03-05 05:20:48,467-Speed 9259.71 samples/sec Loss 3.9024 LearningRate 0.0008 Epoch: 6 Global Step: 11990 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:21:14,982-Speed 9269.16 samples/sec Loss 3.8782 LearningRate 0.0008 Epoch: 6 Global Step: 12000 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:21:41,520-Speed 9261.06 samples/sec Loss 3.9037 LearningRate 0.0008 Epoch: 6 Global Step: 12010 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:22:07,989-Speed 9285.06 samples/sec Loss 3.8851 LearningRate 0.0008 Epoch: 6 Global Step: 12020 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:22:34,499-Speed 9271.01 samples/sec Loss 3.8847 LearningRate 0.0008 Epoch: 6 Global Step: 12030 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:23:01,028-Speed 9264.03 samples/sec Loss 3.9077 LearningRate 0.0008 Epoch: 6 Global Step: 12040 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:23:27,704-Speed 9213.36 samples/sec Loss 3.9189 LearningRate 0.0008 Epoch: 6 Global Step: 12050 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:23:54,237-Speed 9262.82 samples/sec Loss 3.8815 LearningRate 0.0008 Epoch: 6 Global Step: 12060 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:24:20,783-Speed 9258.35 samples/sec Loss 3.9078 LearningRate 0.0008 Epoch: 6 Global Step: 12070 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:24:47,302-Speed 9267.76 samples/sec Loss 3.9332 LearningRate 0.0008 Epoch: 6 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:25:13,795-Speed 9276.69 samples/sec Loss 3.9283 LearningRate 0.0008 Epoch: 6 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:26:33,891-Speed 3068.38 samples/sec Loss 3.9080 LearningRate 0.0008 Epoch: 7 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 
05:26:59,771-Speed 9496.64 samples/sec Loss 3.8232 LearningRate 0.0008 Epoch: 7 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:27:25,799-Speed 9442.93 samples/sec Loss 3.8221 LearningRate 0.0008 Epoch: 7 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:27:52,090-Speed 9347.93 samples/sec Loss 3.8340 LearningRate 0.0008 Epoch: 7 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:28:18,565-Speed 9283.02 samples/sec Loss 3.8684 LearningRate 0.0008 Epoch: 7 Global Step: 12140 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:28:45,001-Speed 9296.94 samples/sec Loss 3.8362 LearningRate 0.0008 Epoch: 7 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:29:11,550-Speed 9257.57 samples/sec Loss 3.8095 LearningRate 0.0008 Epoch: 7 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:29:37,970-Speed 9302.39 samples/sec Loss 3.8048 LearningRate 0.0008 Epoch: 7 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:30:04,483-Speed 9269.90 samples/sec Loss 3.8324 LearningRate 0.0008 Epoch: 7 Global Step: 12180 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:30:30,875-Speed 9312.48 samples/sec Loss 3.7976 LearningRate 0.0008 Epoch: 7 Global Step: 12190 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:30:57,263-Speed 9313.48 samples/sec Loss 3.8625 LearningRate 0.0008 Epoch: 7 Global Step: 12200 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:31:23,550-Speed 9349.58 samples/sec Loss 3.8145 LearningRate 0.0008 Epoch: 7 Global Step: 12210 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:31:49,912-Speed 9322.86 samples/sec Loss 3.8044 LearningRate 0.0008 Epoch: 7 Global Step: 12220 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:32:16,265-Speed 9326.48 samples/sec Loss 
3.7914 LearningRate 0.0008 Epoch: 7 Global Step: 12230 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:32:42,756-Speed 9277.71 samples/sec Loss 3.8331 LearningRate 0.0008 Epoch: 7 Global Step: 12240 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:33:09,221-Speed 9286.32 samples/sec Loss 3.8283 LearningRate 0.0008 Epoch: 7 Global Step: 12250 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:33:35,747-Speed 9265.35 samples/sec Loss 3.8165 LearningRate 0.0008 Epoch: 7 Global Step: 12260 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:34:02,222-Speed 9283.56 samples/sec Loss 3.8043 LearningRate 0.0008 Epoch: 7 Global Step: 12270 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:34:28,745-Speed 9266.42 samples/sec Loss 3.8027 LearningRate 0.0008 Epoch: 7 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:34:55,310-Speed 9251.52 samples/sec Loss 3.8219 LearningRate 0.0008 Epoch: 7 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:35:21,880-Speed 9250.33 samples/sec Loss 3.8579 LearningRate 0.0008 Epoch: 7 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:35:48,483-Speed 9238.51 samples/sec Loss 3.8310 LearningRate 0.0008 Epoch: 7 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:36:15,080-Speed 9240.25 samples/sec Loss 3.7960 LearningRate 0.0008 Epoch: 7 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:36:41,683-Speed 9238.83 samples/sec Loss 3.7672 LearningRate 0.0008 Epoch: 7 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:37:08,205-Speed 9266.85 samples/sec Loss 3.7747 LearningRate 0.0008 Epoch: 7 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:37:34,824-Speed 9232.91 samples/sec Loss 3.7931 LearningRate 0.0008 Epoch: 7 Global Step: 
12350 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:38:01,337-Speed 9269.92 samples/sec Loss 3.7798 LearningRate 0.0008 Epoch: 7 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:38:27,872-Speed 9262.24 samples/sec Loss 3.8246 LearningRate 0.0008 Epoch: 7 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:38:54,439-Speed 9251.01 samples/sec Loss 3.8160 LearningRate 0.0008 Epoch: 7 Global Step: 12380 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:39:21,023-Speed 9244.87 samples/sec Loss 3.7872 LearningRate 0.0008 Epoch: 7 Global Step: 12390 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:39:47,630-Speed 9237.17 samples/sec Loss 3.7740 LearningRate 0.0008 Epoch: 7 Global Step: 12400 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:40:14,214-Speed 9245.21 samples/sec Loss 3.8145 LearningRate 0.0008 Epoch: 7 Global Step: 12410 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:40:40,655-Speed 9295.28 samples/sec Loss 3.8784 LearningRate 0.0008 Epoch: 7 Global Step: 12420 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:41:07,176-Speed 9267.10 samples/sec Loss 3.7892 LearningRate 0.0008 Epoch: 7 Global Step: 12430 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:41:33,807-Speed 9228.90 samples/sec Loss 3.7757 LearningRate 0.0008 Epoch: 7 Global Step: 12440 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:42:00,504-Speed 9205.83 samples/sec Loss 3.7821 LearningRate 0.0008 Epoch: 7 Global Step: 12450 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:42:27,052-Speed 9257.58 samples/sec Loss 3.7757 LearningRate 0.0008 Epoch: 7 Global Step: 12460 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:42:53,526-Speed 9283.66 samples/sec Loss 3.7703 LearningRate 0.0008 Epoch: 7 Global Step: 12470 Fp16 Grad Scale: 65536 Required: 42 hours 
Training: 2022-03-05 05:43:20,195-Speed 9215.51 samples/sec Loss 3.7571 LearningRate 0.0008 Epoch: 7 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:43:46,550-Speed 9325.48 samples/sec Loss 3.7585 LearningRate 0.0008 Epoch: 7 Global Step: 12490 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:44:13,207-Speed 9219.61 samples/sec Loss 3.7946 LearningRate 0.0008 Epoch: 7 Global Step: 12500 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:44:39,757-Speed 9257.55 samples/sec Loss 3.7487 LearningRate 0.0008 Epoch: 7 Global Step: 12510 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:45:06,318-Speed 9252.99 samples/sec Loss 3.7541 LearningRate 0.0008 Epoch: 7 Global Step: 12520 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:45:32,866-Speed 9257.80 samples/sec Loss 3.7577 LearningRate 0.0008 Epoch: 7 Global Step: 12530 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:45:59,467-Speed 9239.10 samples/sec Loss 3.7296 LearningRate 0.0008 Epoch: 7 Global Step: 12540 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:46:26,126-Speed 9219.39 samples/sec Loss 3.7704 LearningRate 0.0008 Epoch: 7 Global Step: 12550 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:46:52,752-Speed 9230.13 samples/sec Loss 3.7190 LearningRate 0.0008 Epoch: 7 Global Step: 12560 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:47:19,415-Speed 9217.77 samples/sec Loss 3.7333 LearningRate 0.0008 Epoch: 7 Global Step: 12570 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:47:45,976-Speed 9253.33 samples/sec Loss 3.7382 LearningRate 0.0008 Epoch: 7 Global Step: 12580 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:48:12,684-Speed 9201.95 samples/sec Loss 3.7398 LearningRate 0.0008 Epoch: 7 Global Step: 12590 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:48:39,206-Speed 9266.34 
samples/sec Loss 3.7470 LearningRate 0.0008 Epoch: 7 Global Step: 12600 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:49:05,765-Speed 9254.21 samples/sec Loss 3.7460 LearningRate 0.0008 Epoch: 7 Global Step: 12610 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:49:32,377-Speed 9235.50 samples/sec Loss 3.7799 LearningRate 0.0008 Epoch: 7 Global Step: 12620 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:49:59,024-Speed 9223.15 samples/sec Loss 3.7588 LearningRate 0.0008 Epoch: 7 Global Step: 12630 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-03-05 05:50:25,523-Speed 9275.03 samples/sec Loss 3.7113 LearningRate 0.0008 Epoch: 7 Global Step: 12640 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:50:51,806-Speed 9350.91 samples/sec Loss 3.7089 LearningRate 0.0008 Epoch: 7 Global Step: 12650 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:51:18,248-Speed 9294.64 samples/sec Loss 3.7309 LearningRate 0.0008 Epoch: 7 Global Step: 12660 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:51:44,692-Speed 9293.76 samples/sec Loss 3.6950 LearningRate 0.0008 Epoch: 7 Global Step: 12670 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:52:11,087-Speed 9311.54 samples/sec Loss 3.6999 LearningRate 0.0008 Epoch: 7 Global Step: 12680 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:52:37,474-Speed 9313.89 samples/sec Loss 3.7218 LearningRate 0.0008 Epoch: 7 Global Step: 12690 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:53:03,782-Speed 9342.31 samples/sec Loss 3.7163 LearningRate 0.0008 Epoch: 7 Global Step: 12700 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:53:30,108-Speed 9335.69 samples/sec Loss 3.7222 LearningRate 0.0008 Epoch: 7 Global Step: 12710 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:53:56,425-Speed 9339.07 samples/sec Loss 3.6733 LearningRate 0.0008 Epoch: 7 
Global Step: 12720 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:54:22,678-Speed 9361.65 samples/sec Loss 3.6950 LearningRate 0.0008 Epoch: 7 Global Step: 12730 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:54:49,023-Speed 9328.95 samples/sec Loss 3.6721 LearningRate 0.0008 Epoch: 7 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:55:15,267-Speed 9364.95 samples/sec Loss 3.6777 LearningRate 0.0008 Epoch: 7 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-03-05 05:55:41,527-Speed 9359.42 samples/sec Loss 3.7502 LearningRate 0.0008 Epoch: 7 Global Step: 12760 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:56:07,873-Speed 9328.56 samples/sec Loss 3.7476 LearningRate 0.0008 Epoch: 7 Global Step: 12770 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:56:34,286-Speed 9305.00 samples/sec Loss 3.6997 LearningRate 0.0008 Epoch: 7 Global Step: 12780 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:57:00,671-Speed 9315.38 samples/sec Loss 3.7200 LearningRate 0.0008 Epoch: 7 Global Step: 12790 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:57:26,930-Speed 9360.57 samples/sec Loss 3.7112 LearningRate 0.0008 Epoch: 7 Global Step: 12800 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:57:53,424-Speed 9276.40 samples/sec Loss 3.7005 LearningRate 0.0008 Epoch: 7 Global Step: 12810 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:58:19,714-Speed 9348.70 samples/sec Loss 3.6858 LearningRate 0.0008 Epoch: 7 Global Step: 12820 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:58:45,986-Speed 9354.60 samples/sec Loss 3.6538 LearningRate 0.0008 Epoch: 7 Global Step: 12830 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-03-05 05:59:12,272-Speed 9349.68 samples/sec Loss 3.6931 LearningRate 0.0008 Epoch: 7 Global Step: 12840 Fp16 Grad Scale: 65536 Required: 
42 hours Training: 2022-03-05 05:59:38,582-Speed 9341.50 samples/sec Loss 3.6921 LearningRate 0.0008 Epoch: 7 Global Step: 12850 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:00:04,836-Speed 9361.38 samples/sec Loss 3.6689 LearningRate 0.0008 Epoch: 7 Global Step: 12860 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:00:31,105-Speed 9355.85 samples/sec Loss 3.6435 LearningRate 0.0008 Epoch: 7 Global Step: 12870 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:00:57,413-Speed 9341.98 samples/sec Loss 3.6872 LearningRate 0.0008 Epoch: 7 Global Step: 12880 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:01:23,741-Speed 9335.15 samples/sec Loss 3.6858 LearningRate 0.0008 Epoch: 7 Global Step: 12890 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:01:49,931-Speed 9384.39 samples/sec Loss 3.6615 LearningRate 0.0008 Epoch: 7 Global Step: 12900 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:02:16,267-Speed 9331.99 samples/sec Loss 3.6571 LearningRate 0.0008 Epoch: 7 Global Step: 12910 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:02:42,534-Speed 9356.68 samples/sec Loss 3.6387 LearningRate 0.0008 Epoch: 7 Global Step: 12920 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:03:08,853-Speed 9338.41 samples/sec Loss 3.6477 LearningRate 0.0008 Epoch: 7 Global Step: 12930 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:03:35,162-Speed 9341.55 samples/sec Loss 3.6745 LearningRate 0.0008 Epoch: 7 Global Step: 12940 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:04:01,517-Speed 9325.38 samples/sec Loss 3.6283 LearningRate 0.0008 Epoch: 7 Global Step: 12950 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:04:27,750-Speed 9368.68 samples/sec Loss 3.6102 LearningRate 0.0008 Epoch: 7 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:04:54,074-Speed 
9336.41 samples/sec Loss 3.6383 LearningRate 0.0008 Epoch: 7 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:05:20,384-Speed 9341.31 samples/sec Loss 3.6549 LearningRate 0.0008 Epoch: 7 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:05:46,801-Speed 9303.61 samples/sec Loss 3.6238 LearningRate 0.0008 Epoch: 7 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:06:13,232-Speed 9298.36 samples/sec Loss 3.6387 LearningRate 0.0008 Epoch: 7 Global Step: 13000 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:06:39,738-Speed 9272.29 samples/sec Loss 3.6516 LearningRate 0.0008 Epoch: 7 Global Step: 13010 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:07:06,177-Speed 9295.75 samples/sec Loss 3.6193 LearningRate 0.0008 Epoch: 7 Global Step: 13020 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:07:32,561-Speed 9314.93 samples/sec Loss 3.6322 LearningRate 0.0008 Epoch: 7 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:07:59,025-Speed 9287.21 samples/sec Loss 3.6490 LearningRate 0.0008 Epoch: 7 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:08:25,527-Speed 9273.98 samples/sec Loss 3.6372 LearningRate 0.0008 Epoch: 7 Global Step: 13050 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:08:51,842-Speed 9339.58 samples/sec Loss 3.6303 LearningRate 0.0008 Epoch: 7 Global Step: 13060 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:09:18,248-Speed 9307.20 samples/sec Loss 3.5975 LearningRate 0.0008 Epoch: 7 Global Step: 13070 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:09:44,622-Speed 9318.79 samples/sec Loss 3.5972 LearningRate 0.0008 Epoch: 7 Global Step: 13080 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:10:10,891-Speed 9356.18 samples/sec Loss 3.6287 LearningRate 0.0008 
Epoch: 7 Global Step: 13090 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:10:37,308-Speed 9303.68 samples/sec Loss 3.6388 LearningRate 0.0008 Epoch: 7 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:11:03,638-Speed 9334.22 samples/sec Loss 3.6054 LearningRate 0.0008 Epoch: 7 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:11:30,046-Speed 9306.44 samples/sec Loss 3.6110 LearningRate 0.0008 Epoch: 7 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:11:56,495-Speed 9292.64 samples/sec Loss 3.5852 LearningRate 0.0008 Epoch: 7 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:12:22,960-Speed 9286.70 samples/sec Loss 3.6133 LearningRate 0.0008 Epoch: 7 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:12:49,325-Speed 9322.00 samples/sec Loss 3.6084 LearningRate 0.0008 Epoch: 7 Global Step: 13150 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:13:15,643-Speed 9338.50 samples/sec Loss 3.5983 LearningRate 0.0008 Epoch: 7 Global Step: 13160 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:13:42,094-Speed 9291.63 samples/sec Loss 3.5935 LearningRate 0.0008 Epoch: 7 Global Step: 13170 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:14:08,441-Speed 9328.08 samples/sec Loss 3.5767 LearningRate 0.0008 Epoch: 7 Global Step: 13180 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:14:34,798-Speed 9324.71 samples/sec Loss 3.5852 LearningRate 0.0008 Epoch: 7 Global Step: 13190 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:15:01,090-Speed 9347.92 samples/sec Loss 3.5823 LearningRate 0.0008 Epoch: 7 Global Step: 13200 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:15:27,433-Speed 9329.96 samples/sec Loss 3.5798 LearningRate 0.0008 Epoch: 7 Global Step: 13210 Fp16 Grad Scale: 65536 
Required: 41 hours Training: 2022-03-05 06:15:53,674-Speed 9366.12 samples/sec Loss 3.6014 LearningRate 0.0008 Epoch: 7 Global Step: 13220 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:16:20,047-Speed 9319.13 samples/sec Loss 3.5774 LearningRate 0.0008 Epoch: 7 Global Step: 13230 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:16:46,288-Speed 9365.95 samples/sec Loss 3.5877 LearningRate 0.0008 Epoch: 7 Global Step: 13240 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:17:12,503-Speed 9375.70 samples/sec Loss 3.5744 LearningRate 0.0008 Epoch: 7 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:17:38,818-Speed 9339.66 samples/sec Loss 3.5870 LearningRate 0.0008 Epoch: 7 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:18:05,096-Speed 9352.75 samples/sec Loss 3.5919 LearningRate 0.0008 Epoch: 7 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:18:31,383-Speed 9349.46 samples/sec Loss 3.5559 LearningRate 0.0008 Epoch: 7 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:18:57,749-Speed 9321.51 samples/sec Loss 3.6124 LearningRate 0.0008 Epoch: 7 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:19:24,012-Speed 9358.16 samples/sec Loss 3.5583 LearningRate 0.0008 Epoch: 7 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:19:50,190-Speed 9388.29 samples/sec Loss 3.5539 LearningRate 0.0008 Epoch: 7 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:20:16,435-Speed 9364.42 samples/sec Loss 3.5989 LearningRate 0.0008 Epoch: 7 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:20:42,647-Speed 9377.16 samples/sec Loss 3.5886 LearningRate 0.0008 Epoch: 7 Global Step: 13330 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 
06:21:08,787-Speed 9402.05 samples/sec Loss 3.5693 LearningRate 0.0008 Epoch: 7 Global Step: 13340 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:21:34,920-Speed 9404.41 samples/sec Loss 3.5667 LearningRate 0.0008 Epoch: 7 Global Step: 13350 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:22:01,110-Speed 9384.37 samples/sec Loss 3.5206 LearningRate 0.0008 Epoch: 7 Global Step: 13360 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:22:27,324-Speed 9375.49 samples/sec Loss 3.5453 LearningRate 0.0008 Epoch: 7 Global Step: 13370 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:22:53,649-Speed 9335.95 samples/sec Loss 3.5784 LearningRate 0.0008 Epoch: 7 Global Step: 13380 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:23:19,857-Speed 9377.81 samples/sec Loss 3.5552 LearningRate 0.0008 Epoch: 7 Global Step: 13390 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:23:46,129-Speed 9354.68 samples/sec Loss 3.5336 LearningRate 0.0008 Epoch: 7 Global Step: 13400 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:24:12,399-Speed 9355.45 samples/sec Loss 3.5229 LearningRate 0.0008 Epoch: 7 Global Step: 13410 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:24:38,575-Speed 9389.59 samples/sec Loss 3.5399 LearningRate 0.0008 Epoch: 7 Global Step: 13420 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:25:04,806-Speed 9369.17 samples/sec Loss 3.5515 LearningRate 0.0008 Epoch: 7 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:25:31,448-Speed 9225.10 samples/sec Loss 3.5428 LearningRate 0.0008 Epoch: 7 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:25:58,191-Speed 9190.02 samples/sec Loss 3.5084 LearningRate 0.0008 Epoch: 7 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:26:24,756-Speed 9251.76 samples/sec Loss 3.5344 
LearningRate 0.0008 Epoch: 7 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:26:51,251-Speed 9276.05 samples/sec Loss 3.5475 LearningRate 0.0008 Epoch: 7 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:27:17,691-Speed 9295.35 samples/sec Loss 3.5410 LearningRate 0.0008 Epoch: 7 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:27:44,152-Speed 9287.95 samples/sec Loss 3.5740 LearningRate 0.0008 Epoch: 7 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:28:10,365-Speed 9375.91 samples/sec Loss 3.5250 LearningRate 0.0008 Epoch: 7 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:28:36,566-Speed 9380.30 samples/sec Loss 3.5293 LearningRate 0.0008 Epoch: 7 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:29:02,814-Speed 9364.20 samples/sec Loss 3.5287 LearningRate 0.0008 Epoch: 7 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:29:28,972-Speed 9395.52 samples/sec Loss 3.5289 LearningRate 0.0008 Epoch: 7 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:29:55,154-Speed 9387.07 samples/sec Loss 3.5210 LearningRate 0.0008 Epoch: 7 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:30:21,331-Speed 9389.19 samples/sec Loss 3.5305 LearningRate 0.0008 Epoch: 7 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:30:47,551-Speed 9373.31 samples/sec Loss 3.5021 LearningRate 0.0008 Epoch: 7 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:31:13,723-Speed 9390.69 samples/sec Loss 3.5137 LearningRate 0.0008 Epoch: 7 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:31:39,889-Speed 9392.70 samples/sec Loss 3.5247 LearningRate 0.0008 Epoch: 7 Global Step: 
13580 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:32:06,088-Speed 9380.91 samples/sec Loss 3.4956 LearningRate 0.0008 Epoch: 7 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:32:32,349-Speed 9358.78 samples/sec Loss 3.5008 LearningRate 0.0008 Epoch: 7 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:32:58,530-Speed 9387.56 samples/sec Loss 3.4890 LearningRate 0.0008 Epoch: 7 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:33:24,747-Speed 9374.40 samples/sec Loss 3.5034 LearningRate 0.0008 Epoch: 7 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:33:50,891-Speed 9400.46 samples/sec Loss 3.5019 LearningRate 0.0008 Epoch: 7 Global Step: 13630 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-03-05 06:34:17,041-Speed 9398.72 samples/sec Loss 3.4808 LearningRate 0.0008 Epoch: 7 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:34:43,285-Speed 9364.59 samples/sec Loss 3.5095 LearningRate 0.0008 Epoch: 7 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:35:09,467-Speed 9387.01 samples/sec Loss 3.4719 LearningRate 0.0008 Epoch: 7 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:35:35,666-Speed 9380.80 samples/sec Loss 3.4913 LearningRate 0.0008 Epoch: 7 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:36:01,876-Speed 9377.00 samples/sec Loss 3.4935 LearningRate 0.0008 Epoch: 7 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:36:28,323-Speed 9293.01 samples/sec Loss 3.5251 LearningRate 0.0008 Epoch: 7 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:36:54,508-Speed 9385.90 samples/sec Loss 3.4966 LearningRate 0.0008 Epoch: 7 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 41 
hours Training: 2022-03-05 06:37:20,691-Speed 9386.77 samples/sec Loss 3.4942 LearningRate 0.0008 Epoch: 7 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:37:46,964-Speed 9354.35 samples/sec Loss 3.5043 LearningRate 0.0008 Epoch: 7 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:38:13,223-Speed 9359.44 samples/sec Loss 3.5048 LearningRate 0.0008 Epoch: 7 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:38:39,383-Speed 9394.83 samples/sec Loss 3.4845 LearningRate 0.0008 Epoch: 7 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:39:05,663-Speed 9351.99 samples/sec Loss 3.4448 LearningRate 0.0008 Epoch: 7 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:39:31,793-Speed 9405.73 samples/sec Loss 3.4816 LearningRate 0.0008 Epoch: 7 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:39:58,005-Speed 9378.34 samples/sec Loss 3.4882 LearningRate 0.0008 Epoch: 7 Global Step: 13770 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:40:24,282-Speed 9353.12 samples/sec Loss 3.4727 LearningRate 0.0008 Epoch: 7 Global Step: 13780 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:40:50,434-Speed 9397.62 samples/sec Loss 3.4705 LearningRate 0.0008 Epoch: 7 Global Step: 13790 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:41:16,651-Speed 9374.68 samples/sec Loss 3.5069 LearningRate 0.0008 Epoch: 7 Global Step: 13800 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:41:42,793-Speed 9401.72 samples/sec Loss 3.5037 LearningRate 0.0008 Epoch: 7 Global Step: 13810 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:42:08,913-Speed 9409.16 samples/sec Loss 3.5273 LearningRate 0.0008 Epoch: 7 Global Step: 13820 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:43:28,504-Speed 
3087.83 samples/sec Loss 3.4699 LearningRate 0.0008 Epoch: 8 Global Step: 13830 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:43:54,428-Speed 9480.49 samples/sec Loss 3.4148 LearningRate 0.0008 Epoch: 8 Global Step: 13840 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:44:20,498-Speed 9427.46 samples/sec Loss 3.4215 LearningRate 0.0008 Epoch: 8 Global Step: 13850 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:44:46,522-Speed 9444.12 samples/sec Loss 3.4434 LearningRate 0.0008 Epoch: 8 Global Step: 13860 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:45:12,549-Speed 9442.85 samples/sec Loss 3.4109 LearningRate 0.0008 Epoch: 8 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:45:38,549-Speed 9452.97 samples/sec Loss 3.4136 LearningRate 0.0008 Epoch: 8 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-03-05 06:46:04,525-Speed 9461.37 samples/sec Loss 3.4277 LearningRate 0.0008 Epoch: 8 Global Step: 13890 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:46:30,615-Speed 9420.00 samples/sec Loss 3.4242 LearningRate 0.0008 Epoch: 8 Global Step: 13900 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:46:56,751-Speed 9403.43 samples/sec Loss 3.4039 LearningRate 0.0008 Epoch: 8 Global Step: 13910 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:47:22,865-Speed 9411.53 samples/sec Loss 3.4109 LearningRate 0.0008 Epoch: 8 Global Step: 13920 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:47:48,946-Speed 9422.99 samples/sec Loss 3.4259 LearningRate 0.0008 Epoch: 8 Global Step: 13930 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:48:15,099-Speed 9397.51 samples/sec Loss 3.4072 LearningRate 0.0008 Epoch: 8 Global Step: 13940 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:48:41,285-Speed 9386.38 samples/sec Loss 3.4168 LearningRate 0.0008 
Epoch: 8 Global Step: 13950 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:49:07,380-Speed 9418.28 samples/sec Loss 3.4595 LearningRate 0.0008 Epoch: 8 Global Step: 13960 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:49:33,511-Speed 9405.22 samples/sec Loss 3.4217 LearningRate 0.0008 Epoch: 8 Global Step: 13970 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:49:59,665-Speed 9396.67 samples/sec Loss 3.4276 LearningRate 0.0008 Epoch: 8 Global Step: 13980 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:50:25,888-Speed 9372.59 samples/sec Loss 3.4061 LearningRate 0.0008 Epoch: 8 Global Step: 13990 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:50:52,254-Speed 9321.47 samples/sec Loss 3.4131 LearningRate 0.0008 Epoch: 8 Global Step: 14000 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:51:18,495-Speed 9365.83 samples/sec Loss 3.4156 LearningRate 0.0008 Epoch: 8 Global Step: 14010 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:51:44,722-Speed 9370.84 samples/sec Loss 3.4036 LearningRate 0.0008 Epoch: 8 Global Step: 14020 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:52:10,934-Speed 9376.41 samples/sec Loss 3.4135 LearningRate 0.0008 Epoch: 8 Global Step: 14030 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:52:37,040-Speed 9414.22 samples/sec Loss 3.4132 LearningRate 0.0008 Epoch: 8 Global Step: 14040 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:53:03,268-Speed 9370.74 samples/sec Loss 3.4121 LearningRate 0.0008 Epoch: 8 Global Step: 14050 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:53:29,503-Speed 9367.71 samples/sec Loss 3.4257 LearningRate 0.0008 Epoch: 8 Global Step: 14060 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-03-05 06:53:55,657-Speed 9396.92 samples/sec Loss 3.4334 LearningRate 0.0008 Epoch: 8 Global Step: 14070 Fp16 Grad Scale: 32768 
Required: 41 hours Training: 2022-03-05 06:54:21,867-Speed 9377.07 samples/sec Loss 3.4208 LearningRate 0.0008 Epoch: 8 Global Step: 14080 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 06:54:48,071-Speed 9379.27 samples/sec Loss 3.3963 LearningRate 0.0008 Epoch: 8 Global Step: 14090 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:55:14,329-Speed 9359.81 samples/sec Loss 3.4024 LearningRate 0.0008 Epoch: 8 Global Step: 14100 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:55:40,606-Speed 9352.87 samples/sec Loss 3.4206 LearningRate 0.0008 Epoch: 8 Global Step: 14110 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:56:06,972-Speed 9321.75 samples/sec Loss 3.4153 LearningRate 0.0008 Epoch: 8 Global Step: 14120 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:56:33,220-Speed 9363.44 samples/sec Loss 3.3754 LearningRate 0.0008 Epoch: 8 Global Step: 14130 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:56:59,470-Speed 9362.56 samples/sec Loss 3.3855 LearningRate 0.0008 Epoch: 8 Global Step: 14140 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:57:25,794-Speed 9336.53 samples/sec Loss 3.3989 LearningRate 0.0008 Epoch: 8 Global Step: 14150 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:57:52,119-Speed 9335.93 samples/sec Loss 3.3955 LearningRate 0.0008 Epoch: 8 Global Step: 14160 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:58:18,350-Speed 9369.52 samples/sec Loss 3.4185 LearningRate 0.0008 Epoch: 8 Global Step: 14170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:58:44,644-Speed 9347.19 samples/sec Loss 3.3853 LearningRate 0.0008 Epoch: 8 Global Step: 14180 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-03-05 06:59:10,862-Speed 9374.23 samples/sec Loss 3.4351 LearningRate 0.0008 Epoch: 8 Global Step: 14190 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 
06:59:37,219-Speed 9324.52 samples/sec Loss 3.4059 LearningRate 0.0008 Epoch: 8 Global Step: 14200 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 07:00:03,592-Speed 9318.87 samples/sec Loss 3.3777 LearningRate 0.0008 Epoch: 8 Global Step: 14210 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-03-05 07:00:29,775-Speed 9386.89 samples/sec Loss 3.3784 LearningRate 0.0008 Epoch: 8 Global Step: 14220 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:00:56,021-Speed 9364.13 samples/sec Loss 3.4086 LearningRate 0.0008 Epoch: 8 Global Step: 14230 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:01:22,232-Speed 9376.38 samples/sec Loss 3.3972 LearningRate 0.0008 Epoch: 8 Global Step: 14240 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:01:48,495-Speed 9358.25 samples/sec Loss 3.4212 LearningRate 0.0008 Epoch: 8 Global Step: 14250 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:02:14,735-Speed 9366.33 samples/sec Loss 3.3728 LearningRate 0.0008 Epoch: 8 Global Step: 14260 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:02:40,960-Speed 9371.48 samples/sec Loss 3.3756 LearningRate 0.0008 Epoch: 8 Global Step: 14270 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:03:07,243-Speed 9351.07 samples/sec Loss 3.3556 LearningRate 0.0008 Epoch: 8 Global Step: 14280 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:03:33,509-Speed 9356.91 samples/sec Loss 3.3547 LearningRate 0.0008 Epoch: 8 Global Step: 14290 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-03-05 07:03:59,841-Speed 9333.71 samples/sec Loss 3.3628 LearningRate 0.0008 Epoch: 8 Global Step: 14300 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-03-05 07:04:26,139-Speed 9345.54 samples/sec Loss 3.3461 LearningRate 0.0008 Epoch: 8 Global Step: 14310 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-03-05 07:04:52,341-Speed 9380.09 samples/sec Loss 3.3653 
LearningRate 0.0008 Epoch: 8 Global Step: 14320 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-03-05 07:05:18,469-Speed 9406.25 samples/sec Loss 3.3367 LearningRate 0.0008 Epoch: 8 Global Step: 14330 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-03-05 07:05:44,755-Speed 9349.74 samples/sec Loss 3.3870 LearningRate 0.0008 Epoch: 8 Global Step: 14340 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-03-05 07:06:10,997-Speed 9365.71 samples/sec Loss 3.3877 LearningRate 0.0008 Epoch: 8 Global Step: 14350 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-03-05 07:06:37,294-Speed 9346.05 samples/sec Loss 3.3529 LearningRate 0.0008 Epoch: 8 Global Step: 14360 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-03-05 07:07:03,502-Speed 9377.80 samples/sec Loss 3.3236 LearningRate 0.0008 Epoch: 8 Global Step: 14370 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:07:29,808-Speed 9342.68 samples/sec Loss 3.3520 LearningRate 0.0008 Epoch: 8 Global Step: 14380 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:07:56,126-Speed 9338.77 samples/sec Loss 3.3409 LearningRate 0.0008 Epoch: 8 Global Step: 14390 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:08:22,347-Speed 9373.16 samples/sec Loss 3.3367 LearningRate 0.0008 Epoch: 8 Global Step: 14400 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:08:48,538-Speed 9383.68 samples/sec Loss 3.3466 LearningRate 0.0008 Epoch: 8 Global Step: 14410 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:09:14,845-Speed 9342.81 samples/sec Loss 3.3640 LearningRate 0.0008 Epoch: 8 Global Step: 14420 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:09:41,176-Speed 9333.82 samples/sec Loss 3.3390 LearningRate 0.0008 Epoch: 8 Global Step: 14430 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-03-05 07:10:07,527-Speed 9326.99 samples/sec Loss 3.3688 LearningRate 0.0008 Epoch: 8 Global Step: 14440 Fp16 
Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:10:33,883-Speed 9325.13 samples/sec Loss 3.3279 LearningRate 0.0008 Epoch: 8 Global Step: 14450 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:11:00,276-Speed 9311.92 samples/sec Loss 3.3196 LearningRate 0.0008 Epoch: 8 Global Step: 14460 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:11:26,568-Speed 9347.66 samples/sec Loss 3.3353 LearningRate 0.0008 Epoch: 8 Global Step: 14470 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:11:52,846-Speed 9352.80 samples/sec Loss 3.2998 LearningRate 0.0008 Epoch: 8 Global Step: 14480 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:12:19,064-Speed 9373.97 samples/sec Loss 3.3514 LearningRate 0.0008 Epoch: 8 Global Step: 14490 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:12:45,375-Speed 9341.10 samples/sec Loss 3.3211 LearningRate 0.0008 Epoch: 8 Global Step: 14500 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:13:11,553-Speed 9388.58 samples/sec Loss 3.3354 LearningRate 0.0008 Epoch: 8 Global Step: 14510 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:13:37,817-Speed 9358.16 samples/sec Loss 3.3786 LearningRate 0.0008 Epoch: 8 Global Step: 14520 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:14:04,024-Speed 9378.11 samples/sec Loss 3.3333 LearningRate 0.0008 Epoch: 8 Global Step: 14530 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:14:30,274-Speed 9362.68 samples/sec Loss 3.3116 LearningRate 0.0008 Epoch: 8 Global Step: 14540 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:14:56,548-Speed 9353.89 samples/sec Loss 3.2889 LearningRate 0.0008 Epoch: 8 Global Step: 14550 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:15:22,753-Speed 9379.07 samples/sec Loss 3.2894 LearningRate 0.0008 Epoch: 8 Global Step: 14560 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:15:49,113-Speed 9323.95 samples/sec Loss 3.2919 LearningRate 0.0008 Epoch: 8 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:16:15,331-Speed 9374.22 samples/sec Loss 3.3368 LearningRate 0.0008 Epoch: 8 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:16:41,481-Speed 9398.42 samples/sec Loss 3.3319 LearningRate 0.0008 Epoch: 8 Global Step: 14590 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:17:07,594-Speed 9411.76 samples/sec Loss 3.3328 LearningRate 0.0008 Epoch: 8 Global Step: 14600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:17:33,792-Speed 9381.52 samples/sec Loss 3.3243 LearningRate 0.0008 Epoch: 8 Global Step: 14610 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:17:59,923-Speed 9405.33 samples/sec Loss 3.3287 LearningRate 0.0008 Epoch: 8 Global Step: 14620 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:18:26,173-Speed 9362.89 samples/sec Loss 3.3330 LearningRate 0.0008 Epoch: 8 Global Step: 14630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:18:52,281-Speed 9413.64 samples/sec Loss 3.2714 LearningRate 0.0008 Epoch: 8 Global Step: 14640 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:19:18,377-Speed 9418.10 samples/sec Loss 3.2877 LearningRate 0.0008 Epoch: 8 Global Step: 14650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:19:44,553-Speed 9389.22 samples/sec Loss 3.2963 LearningRate 0.0008 Epoch: 8 Global Step: 14660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:20:10,694-Speed 9401.52 samples/sec Loss 3.2704 LearningRate 0.0008 Epoch: 8 Global Step: 14670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:20:36,832-Speed 9403.00 samples/sec Loss 3.2906 LearningRate 0.0008 Epoch: 8 Global Step: 14680 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:21:02,977-Speed 9400.18 samples/sec Loss 3.2997 LearningRate 0.0008 Epoch: 8 Global Step: 14690 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:21:29,076-Speed 9416.96 samples/sec Loss 3.2867 LearningRate 0.0008 Epoch: 8 Global Step: 14700 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:21:55,229-Speed 9397.28 samples/sec Loss 3.2635 LearningRate 0.0008 Epoch: 8 Global Step: 14710 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:22:21,420-Speed 9384.04 samples/sec Loss 3.2727 LearningRate 0.0008 Epoch: 8 Global Step: 14720 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:22:47,520-Speed 9416.32 samples/sec Loss 3.2823 LearningRate 0.0008 Epoch: 8 Global Step: 14730 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:23:13,666-Speed 9400.21 samples/sec Loss 3.3053 LearningRate 0.0008 Epoch: 8 Global Step: 14740 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:23:39,861-Speed 9383.02 samples/sec Loss 3.3116 LearningRate 0.0008 Epoch: 8 Global Step: 14750 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:24:06,007-Speed 9399.96 samples/sec Loss 3.2857 LearningRate 0.0008 Epoch: 8 Global Step: 14760 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:24:32,116-Speed 9413.36 samples/sec Loss 3.2849 LearningRate 0.0008 Epoch: 8 Global Step: 14770 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-03-05 07:24:58,221-Speed 9415.01 samples/sec Loss 3.2729 LearningRate 0.0008 Epoch: 8 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:25:24,326-Speed 9414.70 samples/sec Loss 3.2468 LearningRate 0.0008 Epoch: 8 Global Step: 14790 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:25:50,400-Speed 9426.04 samples/sec Loss 3.2588 LearningRate 0.0008 Epoch: 8 Global Step: 14800 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:26:16,535-Speed 9404.16 samples/sec Loss 3.2524 LearningRate 0.0008 Epoch: 8 Global Step: 14810 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:26:42,705-Speed 9391.34 samples/sec Loss 3.2882 LearningRate 0.0008 Epoch: 8 Global Step: 14820 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:27:08,825-Speed 9409.26 samples/sec Loss 3.2802 LearningRate 0.0008 Epoch: 8 Global Step: 14830 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:27:34,960-Speed 9403.83 samples/sec Loss 3.2593 LearningRate 0.0008 Epoch: 8 Global Step: 14840 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:28:01,101-Speed 9401.93 samples/sec Loss 3.2722 LearningRate 0.0008 Epoch: 8 Global Step: 14850 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:28:27,246-Speed 9400.36 samples/sec Loss 3.2574 LearningRate 0.0008 Epoch: 8 Global Step: 14860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:28:53,493-Speed 9363.78 samples/sec Loss 3.2720 LearningRate 0.0008 Epoch: 8 Global Step: 14870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:29:19,634-Speed 9401.95 samples/sec Loss 3.2669 LearningRate 0.0008 Epoch: 8 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:29:45,935-Speed 9344.51 samples/sec Loss 3.2746 LearningRate 0.0008 Epoch: 8 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:30:12,081-Speed 9400.05 samples/sec Loss 3.2458 LearningRate 0.0008 Epoch: 8 Global Step: 14900 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:30:38,244-Speed 9393.84 samples/sec Loss 3.2525 LearningRate 0.0008 Epoch: 8 Global Step: 14910 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:31:04,345-Speed 9415.99 samples/sec Loss 3.2942 LearningRate 0.0008 Epoch: 8 Global Step: 14920 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:31:30,511-Speed 9392.90 samples/sec Loss 3.2507 LearningRate 0.0008 Epoch: 8 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:31:56,609-Speed 9417.11 samples/sec Loss 3.2510 LearningRate 0.0008 Epoch: 8 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:32:22,692-Speed 9422.60 samples/sec Loss 3.2242 LearningRate 0.0008 Epoch: 8 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:32:48,862-Speed 9391.52 samples/sec Loss 3.2485 LearningRate 0.0008 Epoch: 8 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:33:14,990-Speed 9406.35 samples/sec Loss 3.2569 LearningRate 0.0008 Epoch: 8 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:33:41,191-Speed 9380.39 samples/sec Loss 3.2587 LearningRate 0.0008 Epoch: 8 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:34:07,439-Speed 9363.46 samples/sec Loss 3.2562 LearningRate 0.0008 Epoch: 8 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:34:33,527-Speed 9420.59 samples/sec Loss 3.2714 LearningRate 0.0008 Epoch: 8 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:34:59,702-Speed 9389.86 samples/sec Loss 3.2206 LearningRate 0.0008 Epoch: 8 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:35:25,881-Speed 9387.87 samples/sec Loss 3.2320 LearningRate 0.0008 Epoch: 8 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:35:52,098-Speed 9374.19 samples/sec Loss 3.2518 LearningRate 0.0008 Epoch: 8 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:36:18,189-Speed 9419.81 samples/sec Loss 3.2321 LearningRate 0.0008 Epoch: 8 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:36:44,243-Speed 9433.04 samples/sec Loss 3.2069 LearningRate 0.0008 Epoch: 8 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:37:10,399-Speed 9396.23 samples/sec Loss 3.2311 LearningRate 0.0008 Epoch: 8 Global Step: 15060 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:37:36,560-Speed 9394.66 samples/sec Loss 3.2498 LearningRate 0.0008 Epoch: 8 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:38:02,659-Speed 9416.70 samples/sec Loss 3.2269 LearningRate 0.0008 Epoch: 8 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:38:28,809-Speed 9398.89 samples/sec Loss 3.2257 LearningRate 0.0008 Epoch: 8 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:38:55,001-Speed 9383.18 samples/sec Loss 3.1937 LearningRate 0.0008 Epoch: 8 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:39:21,295-Speed 9347.11 samples/sec Loss 3.1960 LearningRate 0.0008 Epoch: 8 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:39:47,529-Speed 9368.28 samples/sec Loss 3.2047 LearningRate 0.0008 Epoch: 8 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:40:13,737-Speed 9377.94 samples/sec Loss 3.2201 LearningRate 0.0008 Epoch: 8 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:40:39,962-Speed 9371.62 samples/sec Loss 3.2234 LearningRate 0.0008 Epoch: 8 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:41:06,171-Speed 9377.38 samples/sec Loss 3.1935 LearningRate 0.0008 Epoch: 8 Global Step: 15150 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:41:32,423-Speed 9361.95 samples/sec Loss 3.1971 LearningRate 0.0008 Epoch: 8 Global Step: 15160 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:41:58,594-Speed 9391.13 samples/sec Loss 3.2164 LearningRate 0.0008 Epoch: 8 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:42:24,756-Speed 9394.21 samples/sec Loss 3.2310 LearningRate 0.0008 Epoch: 8 Global Step: 15180 Fp16 Grad Scale: 262144 Required: 40 hours
Training: 2022-03-05 07:42:50,919-Speed 9393.66 samples/sec Loss 3.1851 LearningRate 0.0008 Epoch: 8 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:43:17,259-Speed 9330.63 samples/sec Loss 3.1924 LearningRate 0.0008 Epoch: 8 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:43:43,533-Speed 9354.32 samples/sec Loss 3.1934 LearningRate 0.0008 Epoch: 8 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:44:09,744-Speed 9376.80 samples/sec Loss 3.2017 LearningRate 0.0008 Epoch: 8 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:44:35,935-Speed 9383.82 samples/sec Loss 3.1937 LearningRate 0.0008 Epoch: 8 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:45:02,130-Speed 9382.39 samples/sec Loss 3.1985 LearningRate 0.0008 Epoch: 8 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:45:28,345-Speed 9375.28 samples/sec Loss 3.1966 LearningRate 0.0007 Epoch: 8 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:45:54,585-Speed 9366.02 samples/sec Loss 3.2228 LearningRate 0.0007 Epoch: 8 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:46:20,699-Speed 9411.50 samples/sec Loss 3.1849 LearningRate 0.0007 Epoch: 8 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:46:46,853-Speed 9396.88 samples/sec Loss 3.2044 LearningRate 0.0007 Epoch: 8 Global Step: 15280 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:47:13,061-Speed 9378.00 samples/sec Loss 3.1780 LearningRate 0.0007 Epoch: 8 Global Step: 15290 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:47:39,211-Speed 9398.54 samples/sec Loss 3.1843 LearningRate 0.0007 Epoch: 8 Global Step: 15300 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:48:05,343-Speed 9405.11 samples/sec Loss 3.2191 LearningRate 0.0007 Epoch: 8 Global Step: 15310 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:48:31,611-Speed 9356.42 samples/sec Loss 3.2112 LearningRate 0.0007 Epoch: 8 Global Step: 15320 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:48:57,876-Speed 9357.35 samples/sec Loss 3.1943 LearningRate 0.0007 Epoch: 8 Global Step: 15330 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:49:24,037-Speed 9394.53 samples/sec Loss 3.1756 LearningRate 0.0007 Epoch: 8 Global Step: 15340 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:49:50,178-Speed 9402.05 samples/sec Loss 3.1782 LearningRate 0.0007 Epoch: 8 Global Step: 15350 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:50:16,373-Speed 9382.34 samples/sec Loss 3.1768 LearningRate 0.0007 Epoch: 8 Global Step: 15360 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:50:42,644-Speed 9355.07 samples/sec Loss 3.1806 LearningRate 0.0007 Epoch: 8 Global Step: 15370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 07:51:08,823-Speed 9388.04 samples/sec Loss 3.1735 LearningRate 0.0007 Epoch: 8 Global Step: 15380 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:51:35,000-Speed 9388.67 samples/sec Loss 3.1783 LearningRate 0.0007 Epoch: 8 Global Step: 15390 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:52:01,199-Speed 9381.00 samples/sec Loss 3.2170 LearningRate 0.0007 Epoch: 8 Global Step: 15400 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:52:27,339-Speed 9402.25 samples/sec Loss 3.1750 LearningRate 0.0007 Epoch: 8 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:52:53,517-Speed 9388.18 samples/sec Loss 3.1415 LearningRate 0.0007 Epoch: 8 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:53:19,687-Speed 9391.38 samples/sec Loss 3.1730 LearningRate 0.0007 Epoch: 8 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:53:45,837-Speed 9398.63 samples/sec Loss 3.1888 LearningRate 0.0007 Epoch: 8 Global Step: 15440 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:54:11,990-Speed 9397.40 samples/sec Loss 3.1697 LearningRate 0.0007 Epoch: 8 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:54:38,173-Speed 9386.44 samples/sec Loss 3.1863 LearningRate 0.0007 Epoch: 8 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:55:04,361-Speed 9385.06 samples/sec Loss 3.1882 LearningRate 0.0007 Epoch: 8 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:55:30,475-Speed 9411.44 samples/sec Loss 3.1687 LearningRate 0.0007 Epoch: 8 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:55:56,635-Speed 9394.95 samples/sec Loss 3.1669 LearningRate 0.0007 Epoch: 8 Global Step: 15490 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:56:22,822-Speed 9385.12 samples/sec Loss 3.1795 LearningRate 0.0007 Epoch: 8 Global Step: 15500 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:56:49,128-Speed 9342.72 samples/sec Loss 3.1821 LearningRate 0.0007 Epoch: 8 Global Step: 15510 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 07:57:15,326-Speed 9381.35 samples/sec Loss 3.2000 LearningRate 0.0007 Epoch: 8 Global Step: 15520 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 07:57:41,532-Speed 9378.24 samples/sec Loss 3.1848 LearningRate 0.0007 Epoch: 8 Global Step: 15530 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 07:58:07,774-Speed 9366.01 samples/sec Loss 3.1626 LearningRate 0.0007 Epoch: 8 Global Step: 15540 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 07:58:33,959-Speed 9385.95 samples/sec Loss 3.2091 LearningRate 0.0007 Epoch: 8 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 07:59:53,376-Speed 3094.61 samples/sec Loss 3.1812 LearningRate 0.0007 Epoch: 9 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-03-05 08:00:19,323-Speed 9472.12 samples/sec Loss 3.1202 LearningRate 0.0007 Epoch: 9 Global Step: 15570 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-03-05 08:00:45,460-Speed 9403.40 samples/sec Loss 3.0903 LearningRate 0.0007 Epoch: 9 Global Step: 15580 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:01:11,599-Speed 9402.33 samples/sec Loss 3.1231 LearningRate 0.0007 Epoch: 9 Global Step: 15590 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:01:37,671-Speed 9426.83 samples/sec Loss 3.1271 LearningRate 0.0007 Epoch: 9 Global Step: 15600 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:02:03,790-Speed 9409.48 samples/sec Loss 3.1007 LearningRate 0.0007 Epoch: 9 Global Step: 15610 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:02:29,843-Speed 9433.42 samples/sec Loss 3.1237 LearningRate 0.0007 Epoch: 9 Global Step: 15620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:02:55,949-Speed 9414.59 samples/sec Loss 3.0881 LearningRate 0.0007 Epoch: 9 Global Step: 15630 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:03:22,085-Speed 9403.24 samples/sec Loss 3.1384 LearningRate 0.0007 Epoch: 9 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:03:48,248-Speed 9393.89 samples/sec Loss 3.1171 LearningRate 0.0007 Epoch: 9 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:04:14,303-Speed 9432.98 samples/sec Loss 3.1397 LearningRate 0.0007 Epoch: 9 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:04:40,396-Speed 9418.80 samples/sec Loss 3.1069 LearningRate 0.0007 Epoch: 9 Global Step: 15670 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:05:06,482-Speed 9421.78 samples/sec Loss 3.1185 LearningRate 0.0007 Epoch: 9 Global Step: 15680 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:05:32,513-Speed 9441.30 samples/sec Loss 3.1532 LearningRate 0.0007 Epoch: 9 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:05:58,553-Speed 9438.32 samples/sec Loss 3.1230 LearningRate 0.0007 Epoch: 9 Global Step: 15700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:06:24,554-Speed 9452.54 samples/sec Loss 3.1192 LearningRate 0.0007 Epoch: 9 Global Step: 15710 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:06:50,639-Speed 9422.18 samples/sec Loss 3.1188 LearningRate 0.0007 Epoch: 9 Global Step: 15720 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:07:16,716-Speed 9424.63 samples/sec Loss 3.1216 LearningRate 0.0007 Epoch: 9 Global Step: 15730 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:07:42,814-Speed 9417.40 samples/sec Loss 3.1094 LearningRate 0.0007 Epoch: 9 Global Step: 15740 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:08:09,010-Speed 9382.03 samples/sec Loss 3.1319 LearningRate 0.0007 Epoch: 9 Global Step: 15750 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:08:35,103-Speed 9419.25 samples/sec Loss 3.1117 LearningRate 0.0007 Epoch: 9 Global Step: 15760 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:09:01,259-Speed 9396.46 samples/sec Loss 3.1127 LearningRate 0.0007 Epoch: 9 Global Step: 15770 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:09:27,378-Speed 9409.75 samples/sec Loss 3.1381 LearningRate 0.0007 Epoch: 9 Global Step: 15780 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:09:53,509-Speed 9405.76 samples/sec Loss 3.0858 LearningRate 0.0007 Epoch: 9 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:10:19,664-Speed 9396.84 samples/sec Loss 3.1102 LearningRate 0.0007 Epoch: 9 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:10:45,819-Speed 9396.57 samples/sec Loss 3.0933 LearningRate 0.0007 Epoch: 9 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:11:12,098-Speed 9352.71 samples/sec Loss 3.1239 LearningRate 0.0007 Epoch: 9 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:11:38,306-Speed 9377.83 samples/sec Loss 3.1005 LearningRate 0.0007 Epoch: 9 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:12:04,478-Speed 9390.63 samples/sec Loss 3.1120 LearningRate 0.0007 Epoch: 9 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:12:30,624-Speed 9399.62 samples/sec Loss 3.1011 LearningRate 0.0007 Epoch: 9 Global Step: 15850 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:12:56,796-Speed 9391.07 samples/sec Loss 3.1059 LearningRate 0.0007 Epoch: 9 Global Step: 15860 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:13:22,913-Speed 9410.49 samples/sec Loss 3.0949 LearningRate 0.0007 Epoch: 9 Global Step: 15870 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:13:49,071-Speed 9395.44 samples/sec Loss 3.0945 LearningRate 0.0007 Epoch: 9 Global Step: 15880 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:14:15,287-Speed 9374.96 samples/sec Loss 3.1159 LearningRate 0.0007 Epoch: 9 Global Step: 15890 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:14:41,526-Speed 9366.97 samples/sec Loss 3.1021 LearningRate 0.0007 Epoch: 9 Global Step: 15900 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:15:07,679-Speed 9397.63 samples/sec Loss 3.1105 LearningRate 0.0007 Epoch: 9 Global Step: 15910 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:15:33,916-Speed 9367.08 samples/sec Loss 3.1044 LearningRate 0.0007 Epoch: 9 Global Step: 15920 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:16:00,089-Speed 9390.00 samples/sec Loss 3.0999 LearningRate 0.0007 Epoch: 9 Global Step: 15930 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:16:26,259-Speed 9391.50 samples/sec Loss 3.0826 LearningRate 0.0007 Epoch: 9 Global Step: 15940 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:16:52,364-Speed 9414.76 samples/sec Loss 3.0919 LearningRate 0.0007 Epoch: 9 Global Step: 15950 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:17:18,488-Speed 9407.86 samples/sec Loss 3.0644 LearningRate 0.0007 Epoch: 9 Global Step: 15960 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:17:44,677-Speed 9384.57 samples/sec Loss 3.0652 LearningRate 0.0007 Epoch: 9 Global Step: 15970 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:18:10,883-Speed 9378.27 samples/sec Loss 3.0758 LearningRate 0.0007 Epoch: 9 Global Step: 15980 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:18:37,068-Speed 9385.87 samples/sec Loss 3.0803 LearningRate 0.0007 Epoch: 9 Global Step: 15990 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:19:03,264-Speed 9382.28 samples/sec Loss 3.0964 LearningRate 0.0007 Epoch: 9 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:19:29,484-Speed 9373.30 samples/sec Loss 3.1086 LearningRate 0.0007 Epoch: 9 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:19:55,714-Speed 9369.93 samples/sec Loss 3.0965 LearningRate 0.0007 Epoch: 9 Global Step: 16020 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:20:22,026-Speed 9340.77 samples/sec Loss 3.0666 LearningRate 0.0007 Epoch: 9 Global Step: 16030 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:20:48,155-Speed 9405.82 samples/sec Loss 3.0581 LearningRate 0.0007 Epoch: 9 Global Step: 16040 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:21:14,363-Speed 9377.86 samples/sec Loss 3.0784 LearningRate 0.0007 Epoch: 9 Global Step: 16050 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:21:40,661-Speed 9345.60 samples/sec Loss 3.0557 LearningRate 0.0007 Epoch: 9 Global Step: 16060 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:22:06,841-Speed 9387.88 samples/sec Loss 3.0783 LearningRate 0.0007 Epoch: 9 Global Step: 16070 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:22:33,103-Speed 9358.39 samples/sec Loss 3.1065 LearningRate 0.0007 Epoch: 9 Global Step: 16080 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:22:59,341-Speed 9367.05 samples/sec Loss 3.0866 LearningRate 0.0007 Epoch: 9 Global Step: 16090 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:23:25,574-Speed 9368.78 samples/sec Loss 3.0569 LearningRate 0.0007 Epoch: 9 Global Step: 16100 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:23:51,815-Speed 9366.02 samples/sec Loss 3.0559 LearningRate 0.0007 Epoch: 9 Global Step: 16110 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:24:17,969-Speed 9396.92 samples/sec Loss 3.0299 LearningRate 0.0007 Epoch: 9 Global Step: 16120 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:24:44,240-Speed 9355.34 samples/sec Loss 3.0363 LearningRate 0.0007 Epoch: 9 Global Step: 16130 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:25:10,349-Speed 9413.33 samples/sec Loss 3.0381 LearningRate 0.0007 Epoch: 9 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:25:36,643-Speed 9347.20 samples/sec Loss 3.0678 LearningRate 0.0007 Epoch: 9 Global Step: 16150 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:26:02,911-Speed 9356.59 samples/sec Loss 3.0430 LearningRate 0.0007 Epoch: 9 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:26:29,017-Speed 9414.32 samples/sec Loss 3.0750 LearningRate 0.0007 Epoch: 9 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:26:55,164-Speed 9399.90 samples/sec Loss 3.0746 LearningRate 0.0007 Epoch: 9 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:27:21,277-Speed 9411.66 samples/sec Loss 3.0477 LearningRate 0.0007 Epoch: 9 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:27:47,402-Speed 9407.62 samples/sec Loss 3.0435 LearningRate 0.0007 Epoch: 9 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:28:13,555-Speed 9397.52 samples/sec Loss 3.0406 LearningRate 0.0007 Epoch: 9 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:28:39,764-Speed 9377.13 samples/sec Loss 3.0599 LearningRate 0.0007 Epoch: 9 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:29:05,942-Speed 9388.52 samples/sec Loss 3.0609 LearningRate 0.0007 Epoch: 9 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:29:32,026-Speed 9422.25 samples/sec Loss 3.0360 LearningRate 0.0007 Epoch: 9 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:29:58,117-Speed 9419.79 samples/sec Loss 3.0184 LearningRate 0.0007 Epoch: 9 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:30:24,219-Speed 9415.80 samples/sec Loss 3.0477 LearningRate 0.0007 Epoch: 9 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:30:50,384-Speed 9392.90 samples/sec Loss 3.0331 LearningRate 0.0007 Epoch: 9 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:31:16,571-Speed 9385.45 samples/sec Loss 3.0304 LearningRate 0.0007 Epoch: 9 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:31:42,729-Speed 9395.69 samples/sec Loss 3.0245 LearningRate 0.0007 Epoch: 9 Global Step: 16290 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:32:08,888-Speed 9395.36 samples/sec Loss 3.0443 LearningRate 0.0007 Epoch: 9 Global Step: 16300 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:32:34,950-Speed 9430.21 samples/sec Loss 3.0837 LearningRate 0.0007 Epoch: 9 Global Step: 16310 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:33:01,032-Speed 9422.72 samples/sec Loss 3.0492 LearningRate 0.0007 Epoch: 9 Global Step: 16320 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:33:27,148-Speed 9411.04 samples/sec Loss 3.0330 LearningRate 0.0007 Epoch: 9 Global Step: 16330 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:33:53,287-Speed 9402.43 samples/sec Loss 3.0146 LearningRate 0.0007 Epoch: 9 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:34:19,446-Speed 9395.21 samples/sec Loss 3.0164 LearningRate 0.0007 Epoch: 9 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:34:45,646-Speed 9380.40 samples/sec Loss 3.0126 LearningRate 0.0007 Epoch: 9 Global Step: 16360 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:35:11,766-Speed 9409.57 samples/sec Loss 3.0235 LearningRate 0.0007 Epoch: 9 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:35:37,918-Speed 9397.82 samples/sec Loss 3.0383 LearningRate 0.0007 Epoch: 9 Global Step: 16380 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:36:04,012-Speed 9418.66 samples/sec Loss 3.0221 LearningRate 0.0007 Epoch: 9 Global Step: 16390 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:36:30,100-Speed 9420.72 samples/sec Loss 3.0070 LearningRate 0.0007 Epoch: 9 Global Step: 16400 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:36:56,299-Speed 9380.79 samples/sec Loss 3.0024 LearningRate 0.0007 Epoch: 9 Global Step: 16410 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:37:22,514-Speed 9375.14 samples/sec Loss 3.0245 LearningRate 0.0007 Epoch: 9 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:37:48,706-Speed 9383.66 samples/sec Loss 2.9973 LearningRate 0.0007 Epoch: 9 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:38:14,758-Speed 9434.75 samples/sec Loss 3.0027 LearningRate 0.0007 Epoch: 9 Global Step: 16440 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:38:40,898-Speed 9402.24 samples/sec Loss 3.0252 LearningRate 0.0007 Epoch: 9 Global Step: 16450 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:39:06,988-Speed 9420.07 samples/sec Loss 3.0157 LearningRate 0.0007 Epoch: 9 Global Step: 16460 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:39:33,123-Speed 9404.24 samples/sec Loss 2.9890 LearningRate 0.0007 Epoch: 9 Global Step: 16470 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:39:59,343-Speed 9373.46 samples/sec Loss 3.0053 LearningRate 0.0007 Epoch: 9 Global Step: 16480 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:40:25,591-Speed 9363.21 samples/sec Loss 3.0051 LearningRate 0.0007 Epoch: 9 Global Step: 16490 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:40:51,734-Speed 9401.00 samples/sec Loss 2.9925 LearningRate 0.0007 Epoch: 9 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:41:17,883-Speed 9398.94 samples/sec Loss 2.9981 LearningRate 0.0007 Epoch: 9 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:41:43,952-Speed 9427.79 samples/sec Loss 2.9865 LearningRate 0.0007 Epoch: 9 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:42:10,093-Speed 9401.45 samples/sec Loss 3.0056 LearningRate 0.0007 Epoch: 9 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:42:36,232-Speed 9402.64 samples/sec Loss 2.9794 LearningRate 0.0007 Epoch: 9 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-03-05 08:43:02,297-Speed 9429.15 samples/sec Loss 2.9878 LearningRate 0.0007 Epoch: 9 Global Step: 16550 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:43:28,450-Speed 9397.49 samples/sec Loss 2.9710 LearningRate 0.0007 Epoch: 9 Global Step: 16560 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:43:54,642-Speed 9384.08 samples/sec Loss 3.0190 LearningRate 0.0007 Epoch: 9 Global Step: 16570 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:44:20,812-Speed 9391.13 samples/sec Loss 3.0196 LearningRate 0.0007 Epoch: 9 Global Step: 16580 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:44:46,901-Speed 9420.74 samples/sec Loss 3.0033 LearningRate 0.0007 Epoch: 9 Global Step: 16590 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:45:13,014-Speed 9411.73 samples/sec Loss 2.9768 LearningRate 0.0007 Epoch: 9 Global Step: 16600 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:45:39,205-Speed 9383.74 samples/sec Loss 2.9773 LearningRate 0.0007 Epoch: 9 Global Step: 16610 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:46:05,345-Speed 9402.39 samples/sec Loss 2.9800 LearningRate 0.0007 Epoch: 9 Global Step: 16620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:46:31,543-Speed 9381.00 samples/sec Loss 2.9570 LearningRate 0.0007 Epoch: 9 Global Step: 16630 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:46:57,691-Speed 9399.38 samples/sec Loss 2.9648 LearningRate 0.0007 Epoch: 9 Global Step: 16640 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:47:23,850-Speed 9395.34 samples/sec Loss 2.9993 LearningRate 0.0007 Epoch: 9 Global Step: 16650 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:47:50,032-Speed 9387.15 samples/sec Loss 2.9757 LearningRate 0.0007 Epoch: 9 Global Step: 16660 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:48:16,170-Speed 9402.70 samples/sec Loss 2.9682 LearningRate 0.0007 Epoch: 9 Global Step: 16670 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:48:42,308-Speed 9402.92 samples/sec Loss 2.9782 LearningRate 0.0007 Epoch: 9 Global Step: 16680 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:49:08,392-Speed 9422.29 samples/sec Loss 2.9838 LearningRate 0.0007 Epoch: 9 Global Step: 16690 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:49:34,445-Speed 9433.81 samples/sec Loss 2.9786 LearningRate 0.0007 Epoch: 9 Global Step: 16700 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:50:00,560-Speed 9410.85 samples/sec Loss 3.0351 LearningRate 0.0007 Epoch: 9 Global Step: 16710 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:50:26,665-Speed 9415.03 samples/sec Loss 2.9979 LearningRate 0.0007 Epoch: 9 Global Step: 16720 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:50:52,844-Speed 9387.83 samples/sec Loss 2.9558 LearningRate 0.0007 Epoch: 9 Global Step: 16730 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:51:19,005-Speed 9394.46 samples/sec Loss 2.9487 LearningRate 0.0007 Epoch: 9 Global Step: 16740 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:51:45,115-Speed 9413.10 samples/sec Loss 2.9659 LearningRate 0.0007 Epoch: 9 Global Step: 16750 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:52:11,318-Speed 9379.46 samples/sec Loss 2.9645 LearningRate 0.0007 Epoch: 9 Global Step: 16760 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:52:37,483-Speed 9393.24 samples/sec Loss 2.9719 LearningRate 0.0007 Epoch: 9 Global Step: 16770 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:53:03,567-Speed 9422.31 samples/sec Loss 2.9635 LearningRate 0.0007 Epoch: 9 Global Step: 16780 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-03-05 08:53:29,603-Speed 9439.72 samples/sec Loss 2.9905 LearningRate 0.0007 Epoch: 9 Global Step: 16790 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:53:55,816-Speed 9375.72 samples/sec Loss 2.9841 LearningRate 0.0007 Epoch: 9 Global Step: 16800 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:54:21,965-Speed 9399.31 samples/sec Loss 2.9569 LearningRate 0.0007 Epoch: 9 Global Step: 16810 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:54:48,103-Speed 9402.91 samples/sec Loss 2.9525 LearningRate 0.0007 Epoch: 9 Global Step: 16820 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:55:14,254-Speed 9398.02 samples/sec Loss 2.9427 LearningRate 0.0007 Epoch: 9 Global Step: 16830 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:55:40,375-Speed 9409.12 samples/sec Loss 2.9483 LearningRate 0.0007 Epoch: 9 Global Step: 16840 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:56:06,558-Speed 9386.50 samples/sec Loss 2.9533 LearningRate 0.0007 Epoch: 9 Global Step: 16850 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:56:32,708-Speed 9398.78 samples/sec Loss 2.9537 LearningRate 0.0007 Epoch: 9 Global Step: 16860 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:56:58,822-Speed 9411.34 samples/sec Loss 2.9897 LearningRate 0.0007 Epoch: 9 Global Step: 16870 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-03-05 08:57:24,920-Speed 9417.13 samples/sec Loss 2.9768 LearningRate 0.0007 Epoch: 9 Global Step: 16880 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-03-05 08:57:51,100-Speed 9387.81 samples/sec Loss 2.9272 LearningRate 0.0007 Epoch: 9 Global Step: 16890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 08:58:17,284-Speed 9386.49 samples/sec Loss 2.9290 LearningRate 0.0007 Epoch: 9 Global Step: 16900 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 08:58:43,414-Speed 9405.58 samples/sec Loss 2.9355 LearningRate 0.0007 Epoch: 9 Global Step: 16910 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 08:59:09,572-Speed 9395.71 samples/sec Loss 2.9499 LearningRate 0.0007 Epoch: 9 Global Step: 16920 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 08:59:35,714-Speed 9401.14 samples/sec Loss 2.9411 LearningRate 0.0007 Epoch: 9 Global Step: 16930 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 09:00:01,890-Speed 9389.48 samples/sec Loss 2.9458 LearningRate 0.0007 Epoch: 9 Global Step: 16940 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 09:00:28,038-Speed 9399.08 samples/sec Loss 2.9324 LearningRate 0.0007 Epoch: 9 Global Step: 16950 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 09:00:54,164-Speed 9407.15 samples/sec Loss 2.9489 LearningRate 0.0007 Epoch: 9 Global Step: 16960 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 09:01:20,345-Speed 9387.61 samples/sec Loss 2.9622 LearningRate 0.0007 Epoch: 9 Global Step: 16970 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 09:01:46,429-Speed 9422.34 samples/sec Loss 2.9515 LearningRate 0.0007 Epoch: 9 Global Step: 16980 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-03-05 09:02:12,548-Speed 9409.53 samples/sec Loss 2.9242 LearningRate 0.0007 Epoch: 9 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-03-05 09:02:38,700-Speed 9398.12 samples/sec Loss 2.9399 LearningRate 0.0007 Epoch: 9 Global Step: 17000 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-03-05 09:03:04,813-Speed 9411.63 samples/sec Loss 2.9142 LearningRate 0.0007 Epoch: 9 Global Step: 17010 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-03-05 09:03:30,905-Speed 9419.68 samples/sec Loss 2.9393 LearningRate 0.0007 Epoch: 9 Global Step: 17020 Fp16 Grad Scale: 131072
Required: 38 hours Training: 2022-03-05 09:03:57,097-Speed 9383.17 samples/sec Loss 2.9283 LearningRate 0.0007 Epoch: 9 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:04:23,212-Speed 9411.30 samples/sec Loss 2.9262 LearningRate 0.0007 Epoch: 9 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:04:49,323-Speed 9412.46 samples/sec Loss 2.9699 LearningRate 0.0007 Epoch: 9 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:05:15,448-Speed 9407.64 samples/sec Loss 2.9270 LearningRate 0.0007 Epoch: 9 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:05:41,594-Speed 9399.65 samples/sec Loss 2.9324 LearningRate 0.0007 Epoch: 9 Global Step: 17070 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:06:07,742-Speed 9399.34 samples/sec Loss 2.9241 LearningRate 0.0007 Epoch: 9 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:06:33,860-Speed 9409.86 samples/sec Loss 2.9179 LearningRate 0.0007 Epoch: 9 Global Step: 17090 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:07:00,022-Speed 9394.76 samples/sec Loss 2.9059 LearningRate 0.0007 Epoch: 9 Global Step: 17100 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:07:26,183-Speed 9394.75 samples/sec Loss 2.9089 LearningRate 0.0007 Epoch: 9 Global Step: 17110 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:07:52,459-Speed 9353.50 samples/sec Loss 2.9197 LearningRate 0.0007 Epoch: 9 Global Step: 17120 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:08:18,759-Speed 9344.72 samples/sec Loss 2.9041 LearningRate 0.0007 Epoch: 9 Global Step: 17130 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:08:44,862-Speed 9415.63 samples/sec Loss 2.9159 LearningRate 0.0007 Epoch: 9 Global Step: 17140 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 
09:09:10,984-Speed 9408.68 samples/sec Loss 2.9214 LearningRate 0.0007 Epoch: 9 Global Step: 17150 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:09:37,148-Speed 9393.56 samples/sec Loss 2.9335 LearningRate 0.0007 Epoch: 9 Global Step: 17160 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:10:03,342-Speed 9382.87 samples/sec Loss 2.9166 LearningRate 0.0007 Epoch: 9 Global Step: 17170 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:10:29,502-Speed 9394.68 samples/sec Loss 2.9064 LearningRate 0.0007 Epoch: 9 Global Step: 17180 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:10:55,632-Speed 9405.55 samples/sec Loss 2.9413 LearningRate 0.0007 Epoch: 9 Global Step: 17190 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:11:21,685-Speed 9433.93 samples/sec Loss 2.9068 LearningRate 0.0007 Epoch: 9 Global Step: 17200 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:11:47,818-Speed 9404.63 samples/sec Loss 2.9073 LearningRate 0.0007 Epoch: 9 Global Step: 17210 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:12:13,923-Speed 9414.78 samples/sec Loss 2.9068 LearningRate 0.0007 Epoch: 9 Global Step: 17220 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:12:40,042-Speed 9409.60 samples/sec Loss 2.9316 LearningRate 0.0007 Epoch: 9 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:13:06,144-Speed 9415.64 samples/sec Loss 2.9984 LearningRate 0.0007 Epoch: 9 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:13:32,287-Speed 9401.47 samples/sec Loss 2.9489 LearningRate 0.0007 Epoch: 9 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:13:58,408-Speed 9408.64 samples/sec Loss 2.9391 LearningRate 0.0007 Epoch: 9 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:14:24,536-Speed 9406.45 samples/sec Loss 2.9391 
LearningRate 0.0007 Epoch: 9 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:14:50,603-Speed 9428.64 samples/sec Loss 2.9434 LearningRate 0.0007 Epoch: 9 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:16:10,547-Speed 3074.23 samples/sec Loss 2.8809 LearningRate 0.0007 Epoch: 10 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:16:36,526-Speed 9460.29 samples/sec Loss 2.8619 LearningRate 0.0007 Epoch: 10 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:17:02,596-Speed 9427.30 samples/sec Loss 2.8712 LearningRate 0.0007 Epoch: 10 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:17:28,775-Speed 9388.37 samples/sec Loss 2.8709 LearningRate 0.0007 Epoch: 10 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:17:54,948-Speed 9390.24 samples/sec Loss 2.8637 LearningRate 0.0007 Epoch: 10 Global Step: 17330 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-03-05 09:18:21,043-Speed 9419.17 samples/sec Loss 2.8764 LearningRate 0.0007 Epoch: 10 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:18:47,080-Speed 9439.29 samples/sec Loss 2.8991 LearningRate 0.0007 Epoch: 10 Global Step: 17350 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:19:13,179-Speed 9416.72 samples/sec Loss 2.8714 LearningRate 0.0007 Epoch: 10 Global Step: 17360 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:19:39,224-Speed 9436.51 samples/sec Loss 2.8766 LearningRate 0.0007 Epoch: 10 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:20:05,271-Speed 9435.84 samples/sec Loss 2.8897 LearningRate 0.0007 Epoch: 10 Global Step: 17380 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:20:31,372-Speed 9416.44 samples/sec Loss 2.8740 LearningRate 0.0007 Epoch: 10 Global 
Step: 17390 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:20:57,441-Speed 9427.62 samples/sec Loss 2.8616 LearningRate 0.0007 Epoch: 10 Global Step: 17400 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:21:23,565-Speed 9408.34 samples/sec Loss 2.8478 LearningRate 0.0007 Epoch: 10 Global Step: 17410 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:21:49,832-Speed 9356.59 samples/sec Loss 2.8564 LearningRate 0.0007 Epoch: 10 Global Step: 17420 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:22:15,886-Speed 9433.31 samples/sec Loss 2.8894 LearningRate 0.0007 Epoch: 10 Global Step: 17430 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:22:42,018-Speed 9404.88 samples/sec Loss 2.8966 LearningRate 0.0007 Epoch: 10 Global Step: 17440 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:23:08,294-Speed 9353.50 samples/sec Loss 2.8629 LearningRate 0.0007 Epoch: 10 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:23:34,418-Speed 9407.79 samples/sec Loss 2.8977 LearningRate 0.0007 Epoch: 10 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:24:00,532-Speed 9411.18 samples/sec Loss 2.8642 LearningRate 0.0007 Epoch: 10 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:24:26,648-Speed 9410.79 samples/sec Loss 2.8771 LearningRate 0.0007 Epoch: 10 Global Step: 17480 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:24:52,841-Speed 9383.39 samples/sec Loss 2.8746 LearningRate 0.0007 Epoch: 10 Global Step: 17490 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:25:19,038-Speed 9381.49 samples/sec Loss 2.8746 LearningRate 0.0007 Epoch: 10 Global Step: 17500 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:25:45,206-Speed 9392.08 samples/sec Loss 2.8601 LearningRate 0.0007 Epoch: 10 Global Step: 17510 Fp16 Grad Scale: 65536 
Required: 38 hours Training: 2022-03-05 09:26:11,334-Speed 9406.24 samples/sec Loss 2.8532 LearningRate 0.0007 Epoch: 10 Global Step: 17520 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:26:37,475-Speed 9401.67 samples/sec Loss 2.8602 LearningRate 0.0007 Epoch: 10 Global Step: 17530 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:27:03,593-Speed 9410.19 samples/sec Loss 2.8665 LearningRate 0.0007 Epoch: 10 Global Step: 17540 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:27:29,677-Speed 9422.44 samples/sec Loss 2.8552 LearningRate 0.0007 Epoch: 10 Global Step: 17550 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:27:55,781-Speed 9415.34 samples/sec Loss 2.8546 LearningRate 0.0007 Epoch: 10 Global Step: 17560 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:28:21,857-Speed 9425.04 samples/sec Loss 2.8949 LearningRate 0.0007 Epoch: 10 Global Step: 17570 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:28:48,002-Speed 9400.37 samples/sec Loss 2.8715 LearningRate 0.0007 Epoch: 10 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:29:14,155-Speed 9397.72 samples/sec Loss 2.8475 LearningRate 0.0007 Epoch: 10 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:29:40,276-Speed 9409.07 samples/sec Loss 2.8688 LearningRate 0.0007 Epoch: 10 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:30:06,375-Speed 9416.76 samples/sec Loss 2.8766 LearningRate 0.0007 Epoch: 10 Global Step: 17610 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:30:32,515-Speed 9402.20 samples/sec Loss 2.8432 LearningRate 0.0007 Epoch: 10 Global Step: 17620 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:30:58,594-Speed 9424.12 samples/sec Loss 2.8595 LearningRate 0.0007 Epoch: 10 Global Step: 17630 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 
09:31:24,716-Speed 9409.37 samples/sec Loss 2.8547 LearningRate 0.0007 Epoch: 10 Global Step: 17640 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:31:50,840-Speed 9408.10 samples/sec Loss 2.8429 LearningRate 0.0007 Epoch: 10 Global Step: 17650 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:32:16,971-Speed 9405.23 samples/sec Loss 2.8506 LearningRate 0.0007 Epoch: 10 Global Step: 17660 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:32:43,113-Speed 9401.37 samples/sec Loss 2.8598 LearningRate 0.0007 Epoch: 10 Global Step: 17670 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:33:09,217-Speed 9415.29 samples/sec Loss 2.8587 LearningRate 0.0007 Epoch: 10 Global Step: 17680 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:33:35,385-Speed 9392.12 samples/sec Loss 2.8437 LearningRate 0.0007 Epoch: 10 Global Step: 17690 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:34:01,520-Speed 9403.91 samples/sec Loss 2.8354 LearningRate 0.0007 Epoch: 10 Global Step: 17700 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:34:27,705-Speed 9385.64 samples/sec Loss 2.8512 LearningRate 0.0007 Epoch: 10 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:34:53,892-Speed 9385.27 samples/sec Loss 2.8371 LearningRate 0.0007 Epoch: 10 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:35:20,051-Speed 9395.38 samples/sec Loss 2.8488 LearningRate 0.0007 Epoch: 10 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:35:46,226-Speed 9389.31 samples/sec Loss 2.8511 LearningRate 0.0007 Epoch: 10 Global Step: 17740 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:36:12,498-Speed 9355.07 samples/sec Loss 2.8483 LearningRate 0.0007 Epoch: 10 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:36:38,681-Speed 9386.69 samples/sec 
Loss 2.8423 LearningRate 0.0007 Epoch: 10 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:37:04,876-Speed 9382.25 samples/sec Loss 2.8479 LearningRate 0.0007 Epoch: 10 Global Step: 17770 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:37:30,986-Speed 9413.02 samples/sec Loss 2.8539 LearningRate 0.0007 Epoch: 10 Global Step: 17780 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:37:57,070-Speed 9422.17 samples/sec Loss 2.8521 LearningRate 0.0007 Epoch: 10 Global Step: 17790 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:38:23,154-Speed 9422.09 samples/sec Loss 2.8330 LearningRate 0.0007 Epoch: 10 Global Step: 17800 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:38:49,311-Speed 9396.06 samples/sec Loss 2.8436 LearningRate 0.0007 Epoch: 10 Global Step: 17810 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:39:15,450-Speed 9402.40 samples/sec Loss 2.8331 LearningRate 0.0007 Epoch: 10 Global Step: 17820 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:39:41,666-Speed 9375.08 samples/sec Loss 2.8087 LearningRate 0.0007 Epoch: 10 Global Step: 17830 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:40:07,840-Speed 9389.97 samples/sec Loss 2.8293 LearningRate 0.0007 Epoch: 10 Global Step: 17840 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:40:33,898-Speed 9431.83 samples/sec Loss 2.8205 LearningRate 0.0007 Epoch: 10 Global Step: 17850 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:41:00,039-Speed 9401.84 samples/sec Loss 2.8282 LearningRate 0.0007 Epoch: 10 Global Step: 17860 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:41:26,134-Speed 9418.26 samples/sec Loss 2.8118 LearningRate 0.0007 Epoch: 10 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:41:52,306-Speed 9390.68 samples/sec Loss 2.8204 LearningRate 0.0007 Epoch: 10 
Global Step: 17880 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:42:18,487-Speed 9387.47 samples/sec Loss 2.8361 LearningRate 0.0007 Epoch: 10 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:42:44,601-Speed 9411.14 samples/sec Loss 2.8287 LearningRate 0.0007 Epoch: 10 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:43:10,732-Speed 9405.50 samples/sec Loss 2.8390 LearningRate 0.0007 Epoch: 10 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:43:36,836-Speed 9414.68 samples/sec Loss 2.8110 LearningRate 0.0007 Epoch: 10 Global Step: 17920 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:44:02,966-Speed 9405.92 samples/sec Loss 2.8213 LearningRate 0.0007 Epoch: 10 Global Step: 17930 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:44:29,079-Speed 9411.94 samples/sec Loss 2.8048 LearningRate 0.0007 Epoch: 10 Global Step: 17940 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:44:55,151-Speed 9426.57 samples/sec Loss 2.8089 LearningRate 0.0007 Epoch: 10 Global Step: 17950 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:45:21,282-Speed 9405.71 samples/sec Loss 2.8352 LearningRate 0.0007 Epoch: 10 Global Step: 17960 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:45:47,379-Speed 9417.73 samples/sec Loss 2.8124 LearningRate 0.0007 Epoch: 10 Global Step: 17970 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:46:13,617-Speed 9366.88 samples/sec Loss 2.8098 LearningRate 0.0007 Epoch: 10 Global Step: 17980 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:46:39,782-Speed 9392.99 samples/sec Loss 2.8066 LearningRate 0.0007 Epoch: 10 Global Step: 17990 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:47:05,955-Speed 9390.28 samples/sec Loss 2.8074 LearningRate 0.0007 Epoch: 10 Global Step: 18000 Fp16 Grad Scale: 
65536 Required: 38 hours Training: 2022-03-05 09:47:32,037-Speed 9423.24 samples/sec Loss 2.8214 LearningRate 0.0007 Epoch: 10 Global Step: 18010 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:47:58,154-Speed 9410.25 samples/sec Loss 2.8221 LearningRate 0.0007 Epoch: 10 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:48:24,248-Speed 9418.76 samples/sec Loss 2.8083 LearningRate 0.0007 Epoch: 10 Global Step: 18030 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:48:50,352-Speed 9414.98 samples/sec Loss 2.7758 LearningRate 0.0007 Epoch: 10 Global Step: 18040 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:49:16,380-Speed 9442.48 samples/sec Loss 2.8225 LearningRate 0.0007 Epoch: 10 Global Step: 18050 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:49:42,516-Speed 9403.35 samples/sec Loss 2.8184 LearningRate 0.0007 Epoch: 10 Global Step: 18060 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:50:08,589-Speed 9426.46 samples/sec Loss 2.7717 LearningRate 0.0007 Epoch: 10 Global Step: 18070 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:50:34,749-Speed 9394.81 samples/sec Loss 2.7877 LearningRate 0.0007 Epoch: 10 Global Step: 18080 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:51:00,920-Speed 9390.81 samples/sec Loss 2.7942 LearningRate 0.0007 Epoch: 10 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:51:27,081-Speed 9394.56 samples/sec Loss 2.8230 LearningRate 0.0007 Epoch: 10 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:51:53,333-Speed 9361.72 samples/sec Loss 2.8154 LearningRate 0.0007 Epoch: 10 Global Step: 18110 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:52:19,445-Speed 9412.34 samples/sec Loss 2.8054 LearningRate 0.0007 Epoch: 10 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 38 hours Training: 
2022-03-05 09:52:45,600-Speed 9396.66 samples/sec Loss 2.7837 LearningRate 0.0007 Epoch: 10 Global Step: 18130 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:53:11,774-Speed 9389.85 samples/sec Loss 2.8134 LearningRate 0.0007 Epoch: 10 Global Step: 18140 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:53:37,942-Speed 9391.84 samples/sec Loss 2.7969 LearningRate 0.0007 Epoch: 10 Global Step: 18150 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:54:04,107-Speed 9393.34 samples/sec Loss 2.7867 LearningRate 0.0007 Epoch: 10 Global Step: 18160 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-03-05 09:54:30,170-Speed 9430.69 samples/sec Loss 2.8029 LearningRate 0.0007 Epoch: 10 Global Step: 18170 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:54:56,340-Speed 9391.35 samples/sec Loss 2.8152 LearningRate 0.0007 Epoch: 10 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:55:22,411-Speed 9427.17 samples/sec Loss 2.7902 LearningRate 0.0007 Epoch: 10 Global Step: 18190 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:55:48,537-Speed 9406.80 samples/sec Loss 2.7780 LearningRate 0.0007 Epoch: 10 Global Step: 18200 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:56:14,665-Speed 9406.76 samples/sec Loss 2.7896 LearningRate 0.0007 Epoch: 10 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:56:40,715-Speed 9434.68 samples/sec Loss 2.7602 LearningRate 0.0007 Epoch: 10 Global Step: 18220 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-03-05 09:57:06,757-Speed 9437.35 samples/sec Loss 2.7584 LearningRate 0.0007 Epoch: 10 Global Step: 18230 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 09:57:32,832-Speed 9425.74 samples/sec Loss 2.7761 LearningRate 0.0007 Epoch: 10 Global Step: 18240 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 09:57:58,976-Speed 9400.55 
samples/sec Loss 2.7608 LearningRate 0.0007 Epoch: 10 Global Step: 18250 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 09:58:25,140-Speed 9393.34 samples/sec Loss 2.7746 LearningRate 0.0007 Epoch: 10 Global Step: 18260 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 09:58:51,257-Speed 9410.54 samples/sec Loss 2.7659 LearningRate 0.0007 Epoch: 10 Global Step: 18270 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 09:59:17,297-Speed 9438.29 samples/sec Loss 2.7656 LearningRate 0.0007 Epoch: 10 Global Step: 18280 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 09:59:43,549-Speed 9361.63 samples/sec Loss 2.7966 LearningRate 0.0007 Epoch: 10 Global Step: 18290 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:00:09,694-Speed 9400.64 samples/sec Loss 2.7946 LearningRate 0.0007 Epoch: 10 Global Step: 18300 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:00:35,798-Speed 9415.04 samples/sec Loss 2.7658 LearningRate 0.0007 Epoch: 10 Global Step: 18310 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:01:01,947-Speed 9399.04 samples/sec Loss 2.7650 LearningRate 0.0007 Epoch: 10 Global Step: 18320 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:01:28,253-Speed 9342.82 samples/sec Loss 2.7689 LearningRate 0.0007 Epoch: 10 Global Step: 18330 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:01:54,443-Speed 9383.99 samples/sec Loss 2.7769 LearningRate 0.0007 Epoch: 10 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:02:20,559-Speed 9411.11 samples/sec Loss 2.7582 LearningRate 0.0007 Epoch: 10 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:02:46,698-Speed 9402.44 samples/sec Loss 2.7684 LearningRate 0.0007 Epoch: 10 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:03:12,822-Speed 9408.00 samples/sec Loss 2.7805 LearningRate 0.0007 
Epoch: 10 Global Step: 18370 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:03:38,939-Speed 9410.65 samples/sec Loss 2.7842 LearningRate 0.0007 Epoch: 10 Global Step: 18380 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:04:05,053-Speed 9411.43 samples/sec Loss 2.7667 LearningRate 0.0007 Epoch: 10 Global Step: 18390 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:04:31,222-Speed 9392.04 samples/sec Loss 2.7446 LearningRate 0.0007 Epoch: 10 Global Step: 18400 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:04:57,335-Speed 9411.85 samples/sec Loss 2.7648 LearningRate 0.0007 Epoch: 10 Global Step: 18410 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:05:23,471-Speed 9403.47 samples/sec Loss 2.7988 LearningRate 0.0007 Epoch: 10 Global Step: 18420 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:05:49,616-Speed 9400.33 samples/sec Loss 2.7766 LearningRate 0.0007 Epoch: 10 Global Step: 18430 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:06:15,725-Speed 9413.19 samples/sec Loss 2.7666 LearningRate 0.0007 Epoch: 10 Global Step: 18440 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:06:41,817-Speed 9419.72 samples/sec Loss 2.7418 LearningRate 0.0007 Epoch: 10 Global Step: 18450 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:07:07,952-Speed 9403.93 samples/sec Loss 2.7546 LearningRate 0.0007 Epoch: 10 Global Step: 18460 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:07:34,103-Speed 9398.45 samples/sec Loss 2.7640 LearningRate 0.0007 Epoch: 10 Global Step: 18470 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:08:00,244-Speed 9402.45 samples/sec Loss 2.7448 LearningRate 0.0007 Epoch: 10 Global Step: 18480 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:08:26,448-Speed 9379.39 samples/sec Loss 2.7507 LearningRate 0.0007 Epoch: 10 Global Step: 18490 Fp16 Grad 
Scale: 65536 Required: 37 hours Training: 2022-03-05 10:08:52,613-Speed 9393.02 samples/sec Loss 2.7764 LearningRate 0.0007 Epoch: 10 Global Step: 18500 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:09:18,731-Speed 9410.27 samples/sec Loss 2.7471 LearningRate 0.0007 Epoch: 10 Global Step: 18510 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:09:44,986-Speed 9361.21 samples/sec Loss 2.7329 LearningRate 0.0007 Epoch: 10 Global Step: 18520 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:10:11,168-Speed 9386.74 samples/sec Loss 2.7364 LearningRate 0.0007 Epoch: 10 Global Step: 18530 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:10:37,249-Speed 9423.60 samples/sec Loss 2.7415 LearningRate 0.0007 Epoch: 10 Global Step: 18540 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:11:03,371-Speed 9408.46 samples/sec Loss 2.7452 LearningRate 0.0007 Epoch: 10 Global Step: 18550 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:11:29,496-Speed 9407.97 samples/sec Loss 2.7603 LearningRate 0.0007 Epoch: 10 Global Step: 18560 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:11:55,576-Speed 9423.48 samples/sec Loss 2.7210 LearningRate 0.0007 Epoch: 10 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:12:21,625-Speed 9434.94 samples/sec Loss 2.7369 LearningRate 0.0007 Epoch: 10 Global Step: 18580 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:12:47,734-Speed 9413.28 samples/sec Loss 2.7544 LearningRate 0.0007 Epoch: 10 Global Step: 18590 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:13:13,894-Speed 9394.84 samples/sec Loss 2.7325 LearningRate 0.0007 Epoch: 10 Global Step: 18600 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:13:40,017-Speed 9408.47 samples/sec Loss 2.7320 LearningRate 0.0007 Epoch: 10 Global Step: 18610 Fp16 Grad Scale: 65536 Required: 37 hours Training: 
2022-03-05 10:14:06,097-Speed 9423.98 samples/sec Loss 2.7227 LearningRate 0.0007 Epoch: 10 Global Step: 18620 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:14:32,268-Speed 9391.07 samples/sec Loss 2.7171 LearningRate 0.0007 Epoch: 10 Global Step: 18630 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:14:58,350-Speed 9423.00 samples/sec Loss 2.7480 LearningRate 0.0007 Epoch: 10 Global Step: 18640 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:15:24,439-Speed 9420.41 samples/sec Loss 2.7544 LearningRate 0.0007 Epoch: 10 Global Step: 18650 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:15:50,499-Speed 9430.93 samples/sec Loss 2.7123 LearningRate 0.0007 Epoch: 10 Global Step: 18660 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:16:16,552-Speed 9433.47 samples/sec Loss 2.7124 LearningRate 0.0007 Epoch: 10 Global Step: 18670 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:16:42,659-Speed 9413.97 samples/sec Loss 2.7243 LearningRate 0.0007 Epoch: 10 Global Step: 18680 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:17:08,772-Speed 9411.99 samples/sec Loss 2.7154 LearningRate 0.0007 Epoch: 10 Global Step: 18690 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:17:34,833-Speed 9430.45 samples/sec Loss 2.7307 LearningRate 0.0007 Epoch: 10 Global Step: 18700 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:18:00,944-Speed 9412.95 samples/sec Loss 2.7474 LearningRate 0.0007 Epoch: 10 Global Step: 18710 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:18:27,089-Speed 9400.18 samples/sec Loss 2.7276 LearningRate 0.0007 Epoch: 10 Global Step: 18720 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:18:53,311-Speed 9372.73 samples/sec Loss 2.7317 LearningRate 0.0007 Epoch: 10 Global Step: 18730 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:19:19,424-Speed 9411.48 
samples/sec Loss 2.7279 LearningRate 0.0007 Epoch: 10 Global Step: 18740 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:19:45,617-Speed 9383.01 samples/sec Loss 2.7153 LearningRate 0.0007 Epoch: 10 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:20:11,808-Speed 9383.86 samples/sec Loss 2.7067 LearningRate 0.0007 Epoch: 10 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:20:37,933-Speed 9407.49 samples/sec Loss 2.7209 LearningRate 0.0007 Epoch: 10 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:21:04,037-Speed 9415.23 samples/sec Loss 2.7143 LearningRate 0.0007 Epoch: 10 Global Step: 18780 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:21:30,179-Speed 9401.68 samples/sec Loss 2.7140 LearningRate 0.0007 Epoch: 10 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:21:56,349-Speed 9391.40 samples/sec Loss 2.7321 LearningRate 0.0007 Epoch: 10 Global Step: 18800 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:22:22,510-Speed 9394.62 samples/sec Loss 2.7346 LearningRate 0.0007 Epoch: 10 Global Step: 18810 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:22:48,674-Speed 9393.73 samples/sec Loss 2.7311 LearningRate 0.0007 Epoch: 10 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:23:14,881-Speed 9378.35 samples/sec Loss 2.7119 LearningRate 0.0007 Epoch: 10 Global Step: 18830 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:23:40,997-Speed 9410.76 samples/sec Loss 2.6906 LearningRate 0.0007 Epoch: 10 Global Step: 18840 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:24:07,096-Speed 9417.02 samples/sec Loss 2.7274 LearningRate 0.0007 Epoch: 10 Global Step: 18850 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:24:33,176-Speed 9423.79 samples/sec Loss 2.7161 LearningRate 
0.0007 Epoch: 10 Global Step: 18860 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:24:59,321-Speed 9400.54 samples/sec Loss 2.7211 LearningRate 0.0007 Epoch: 10 Global Step: 18870 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:25:25,536-Speed 9375.27 samples/sec Loss 2.7147 LearningRate 0.0007 Epoch: 10 Global Step: 18880 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:25:51,670-Speed 9404.28 samples/sec Loss 2.7048 LearningRate 0.0007 Epoch: 10 Global Step: 18890 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:26:17,864-Speed 9383.17 samples/sec Loss 2.7148 LearningRate 0.0007 Epoch: 10 Global Step: 18900 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:26:43,976-Speed 9412.12 samples/sec Loss 2.6866 LearningRate 0.0007 Epoch: 10 Global Step: 18910 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:27:10,191-Speed 9375.06 samples/sec Loss 2.7055 LearningRate 0.0007 Epoch: 10 Global Step: 18920 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:27:36,309-Speed 9410.20 samples/sec Loss 2.7290 LearningRate 0.0007 Epoch: 10 Global Step: 18930 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:28:02,452-Speed 9400.87 samples/sec Loss 2.7122 LearningRate 0.0007 Epoch: 10 Global Step: 18940 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:28:28,593-Speed 9401.94 samples/sec Loss 2.7179 LearningRate 0.0007 Epoch: 10 Global Step: 18950 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:28:54,795-Speed 9380.06 samples/sec Loss 2.7120 LearningRate 0.0007 Epoch: 10 Global Step: 18960 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:29:20,966-Speed 9390.96 samples/sec Loss 2.7020 LearningRate 0.0006 Epoch: 10 Global Step: 18970 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:29:47,141-Speed 9389.65 samples/sec Loss 2.7008 LearningRate 0.0006 Epoch: 10 Global Step: 18980 Fp16 
Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:30:13,272-Speed 9405.26 samples/sec Loss 2.7242 LearningRate 0.0006 Epoch: 10 Global Step: 18990 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:30:39,391-Speed 9409.83 samples/sec Loss 2.7221 LearningRate 0.0006 Epoch: 10 Global Step: 19000 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:31:05,576-Speed 9385.92 samples/sec Loss 2.7196 LearningRate 0.0006 Epoch: 10 Global Step: 19010 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:32:25,097-Speed 3090.57 samples/sec Loss 2.6822 LearningRate 0.0006 Epoch: 11 Global Step: 19020 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:32:51,080-Speed 9458.94 samples/sec Loss 2.6798 LearningRate 0.0006 Epoch: 11 Global Step: 19030 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:33:17,102-Speed 9444.82 samples/sec Loss 2.6798 LearningRate 0.0006 Epoch: 11 Global Step: 19040 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:33:43,169-Speed 9428.55 samples/sec Loss 2.6817 LearningRate 0.0006 Epoch: 11 Global Step: 19050 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:34:09,342-Speed 9390.32 samples/sec Loss 2.6513 LearningRate 0.0006 Epoch: 11 Global Step: 19060 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:34:35,451-Speed 9413.22 samples/sec Loss 2.6685 LearningRate 0.0006 Epoch: 11 Global Step: 19070 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:35:01,472-Speed 9445.42 samples/sec Loss 2.6630 LearningRate 0.0006 Epoch: 11 Global Step: 19080 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:35:27,496-Speed 9444.12 samples/sec Loss 2.6662 LearningRate 0.0006 Epoch: 11 Global Step: 19090 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:35:53,618-Speed 9408.49 samples/sec Loss 2.6842 LearningRate 0.0006 Epoch: 11 Global Step: 19100 Fp16 Grad Scale: 65536 Required: 37 hours 
Training: 2022-03-05 10:36:19,627-Speed 9449.46 samples/sec Loss 2.6673 LearningRate 0.0006 Epoch: 11 Global Step: 19110 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:36:45,674-Speed 9435.81 samples/sec Loss 2.6567 LearningRate 0.0006 Epoch: 11 Global Step: 19120 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:37:11,767-Speed 9419.38 samples/sec Loss 2.6806 LearningRate 0.0006 Epoch: 11 Global Step: 19130 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:37:37,866-Speed 9416.90 samples/sec Loss 2.6837 LearningRate 0.0006 Epoch: 11 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:38:03,983-Speed 9410.17 samples/sec Loss 2.6669 LearningRate 0.0006 Epoch: 11 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:38:30,036-Speed 9433.79 samples/sec Loss 2.6395 LearningRate 0.0006 Epoch: 11 Global Step: 19160 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:38:56,060-Speed 9444.07 samples/sec Loss 2.6640 LearningRate 0.0006 Epoch: 11 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:39:22,103-Speed 9437.16 samples/sec Loss 2.6809 LearningRate 0.0006 Epoch: 11 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:39:48,176-Speed 9426.31 samples/sec Loss 2.6923 LearningRate 0.0006 Epoch: 11 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:40:14,328-Speed 9397.93 samples/sec Loss 2.6636 LearningRate 0.0006 Epoch: 11 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:40:40,469-Speed 9401.47 samples/sec Loss 2.6650 LearningRate 0.0006 Epoch: 11 Global Step: 19210 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:41:06,589-Speed 9409.57 samples/sec Loss 2.6611 LearningRate 0.0006 Epoch: 11 Global Step: 19220 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:41:32,735-Speed 
9399.69 samples/sec Loss 2.6716 LearningRate 0.0006 Epoch: 11 Global Step: 19230 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:41:58,799-Speed 9429.81 samples/sec Loss 2.6846 LearningRate 0.0006 Epoch: 11 Global Step: 19240 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:42:24,937-Speed 9402.68 samples/sec Loss 2.6629 LearningRate 0.0006 Epoch: 11 Global Step: 19250 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:42:51,012-Speed 9425.82 samples/sec Loss 2.6690 LearningRate 0.0006 Epoch: 11 Global Step: 19260 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:43:17,128-Speed 9410.78 samples/sec Loss 2.6732 LearningRate 0.0006 Epoch: 11 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:43:43,255-Speed 9406.77 samples/sec Loss 2.6607 LearningRate 0.0006 Epoch: 11 Global Step: 19280 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:44:09,336-Speed 9423.61 samples/sec Loss 2.6439 LearningRate 0.0006 Epoch: 11 Global Step: 19290 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:44:35,391-Speed 9432.87 samples/sec Loss 2.6569 LearningRate 0.0006 Epoch: 11 Global Step: 19300 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:45:01,528-Speed 9403.17 samples/sec Loss 2.7066 LearningRate 0.0006 Epoch: 11 Global Step: 19310 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:45:27,703-Speed 9389.35 samples/sec Loss 2.6831 LearningRate 0.0006 Epoch: 11 Global Step: 19320 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:45:53,807-Speed 9415.42 samples/sec Loss 2.6420 LearningRate 0.0006 Epoch: 11 Global Step: 19330 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:46:19,966-Speed 9395.03 samples/sec Loss 2.6705 LearningRate 0.0006 Epoch: 11 Global Step: 19340 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:46:46,122-Speed 9396.36 samples/sec Loss 2.6587 
LearningRate 0.0006 Epoch: 11 Global Step: 19350 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:47:12,240-Speed 9410.20 samples/sec Loss 2.6342 LearningRate 0.0006 Epoch: 11 Global Step: 19360 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:47:38,316-Speed 9425.29 samples/sec Loss 2.6457 LearningRate 0.0006 Epoch: 11 Global Step: 19370 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:48:04,463-Speed 9399.53 samples/sec Loss 2.6796 LearningRate 0.0006 Epoch: 11 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:48:30,566-Speed 9415.31 samples/sec Loss 2.6462 LearningRate 0.0006 Epoch: 11 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-03-05 10:48:56,669-Speed 9415.28 samples/sec Loss 2.6460 LearningRate 0.0006 Epoch: 11 Global Step: 19400 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:49:22,822-Speed 9397.44 samples/sec Loss 2.6369 LearningRate 0.0006 Epoch: 11 Global Step: 19410 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:49:49,032-Speed 9377.02 samples/sec Loss 2.6459 LearningRate 0.0006 Epoch: 11 Global Step: 19420 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:50:15,154-Speed 9408.48 samples/sec Loss 2.6428 LearningRate 0.0006 Epoch: 11 Global Step: 19430 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:50:41,301-Speed 9399.61 samples/sec Loss 2.6528 LearningRate 0.0006 Epoch: 11 Global Step: 19440 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:51:07,396-Speed 9418.45 samples/sec Loss 2.6264 LearningRate 0.0006 Epoch: 11 Global Step: 19450 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:51:33,485-Speed 9420.36 samples/sec Loss 2.6393 LearningRate 0.0006 Epoch: 11 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:51:59,608-Speed 9408.75 samples/sec Loss 2.6359 LearningRate 0.0006 Epoch: 11 Global 
Step: 19470 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:52:25,690-Speed 9423.03 samples/sec Loss 2.6501 LearningRate 0.0006 Epoch: 11 Global Step: 19480 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-03-05 10:52:51,858-Speed 9391.95 samples/sec Loss 2.6362 LearningRate 0.0006 Epoch: 11 Global Step: 19490 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:53:18,074-Speed 9374.66 samples/sec Loss 2.6670 LearningRate 0.0006 Epoch: 11 Global Step: 19500 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:53:44,215-Speed 9401.93 samples/sec Loss 2.6596 LearningRate 0.0006 Epoch: 11 Global Step: 19510 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:54:10,297-Speed 9423.35 samples/sec Loss 2.6485 LearningRate 0.0006 Epoch: 11 Global Step: 19520 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:54:36,376-Speed 9424.17 samples/sec Loss 2.6366 LearningRate 0.0006 Epoch: 11 Global Step: 19530 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:55:02,477-Speed 9416.02 samples/sec Loss 2.6235 LearningRate 0.0006 Epoch: 11 Global Step: 19540 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:55:28,582-Speed 9414.70 samples/sec Loss 2.6290 LearningRate 0.0006 Epoch: 11 Global Step: 19550 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:55:54,776-Speed 9382.75 samples/sec Loss 2.6634 LearningRate 0.0006 Epoch: 11 Global Step: 19560 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:56:20,888-Speed 9412.02 samples/sec Loss 2.6233 LearningRate 0.0006 Epoch: 11 Global Step: 19570 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:56:47,006-Speed 9410.07 samples/sec Loss 2.6508 LearningRate 0.0006 Epoch: 11 Global Step: 19580 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-03-05 10:57:13,185-Speed 9418.41 samples/sec Loss 2.6251 LearningRate 0.0006 Epoch: 11 Global Step: 19590 Fp16 Grad Scale: 65536 
Required: 36 hours Training: 2022-03-05 10:57:39,296-Speed 9412.38 samples/sec Loss 2.6196 LearningRate 0.0006 Epoch: 11 Global Step: 19600 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 10:58:05,370-Speed 9426.04 samples/sec Loss 2.6432 LearningRate 0.0006 Epoch: 11 Global Step: 19610 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 10:58:31,523-Speed 9429.94 samples/sec Loss 2.6230 LearningRate 0.0006 Epoch: 11 Global Step: 19620 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 10:58:57,655-Speed 9404.93 samples/sec Loss 2.6406 LearningRate 0.0006 Epoch: 11 Global Step: 19630 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 10:59:23,716-Speed 9430.45 samples/sec Loss 2.6232 LearningRate 0.0006 Epoch: 11 Global Step: 19640 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 10:59:49,935-Speed 9429.61 samples/sec Loss 2.6514 LearningRate 0.0006 Epoch: 11 Global Step: 19650 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:00:16,097-Speed 9394.02 samples/sec Loss 2.6308 LearningRate 0.0006 Epoch: 11 Global Step: 19660 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:00:42,309-Speed 9376.23 samples/sec Loss 2.6152 LearningRate 0.0006 Epoch: 11 Global Step: 19670 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:01:08,423-Speed 9411.51 samples/sec Loss 2.6041 LearningRate 0.0006 Epoch: 11 Global Step: 19680 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:01:34,711-Speed 9396.69 samples/sec Loss 2.6317 LearningRate 0.0006 Epoch: 11 Global Step: 19690 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:02:00,871-Speed 9394.85 samples/sec Loss 2.6222 LearningRate 0.0006 Epoch: 11 Global Step: 19700 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:02:27,080-Speed 9377.48 samples/sec Loss 2.6203 LearningRate 0.0006 Epoch: 11 Global Step: 19710 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 
11:02:53,332-Speed 9413.97 samples/sec Loss 2.6297 LearningRate 0.0006 Epoch: 11 Global Step: 19720 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:03:19,520-Speed 9384.75 samples/sec Loss 2.6116 LearningRate 0.0006 Epoch: 11 Global Step: 19730 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:03:45,764-Speed 9405.29 samples/sec Loss 2.6030 LearningRate 0.0006 Epoch: 11 Global Step: 19740 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-03-05 11:04:11,896-Speed 9404.91 samples/sec Loss 2.6266 LearningRate 0.0006 Epoch: 11 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:04:38,080-Speed 9385.90 samples/sec Loss 2.6058 LearningRate 0.0006 Epoch: 11 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:05:04,291-Speed 9376.72 samples/sec Loss 2.6326 LearningRate 0.0006 Epoch: 11 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:05:30,390-Speed 9417.00 samples/sec Loss 2.6043 LearningRate 0.0006 Epoch: 11 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:05:56,561-Speed 9437.17 samples/sec Loss 2.6206 LearningRate 0.0006 Epoch: 11 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:06:22,654-Speed 9418.96 samples/sec Loss 2.6116 LearningRate 0.0006 Epoch: 11 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:06:48,931-Speed 9393.18 samples/sec Loss 2.5967 LearningRate 0.0006 Epoch: 11 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:07:15,020-Speed 9420.34 samples/sec Loss 2.6016 LearningRate 0.0006 Epoch: 11 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:07:41,181-Speed 9394.62 samples/sec Loss 2.6347 LearningRate 0.0006 Epoch: 11 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:08:07,511-Speed 9380.92 samples/sec Loss 
2.6015 LearningRate 0.0006 Epoch: 11 Global Step: 19840 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:08:33,758-Speed 9363.72 samples/sec Loss 2.6034 LearningRate 0.0006 Epoch: 11 Global Step: 19850 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:08:59,898-Speed 9402.25 samples/sec Loss 2.5955 LearningRate 0.0006 Epoch: 11 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:09:25,993-Speed 9418.32 samples/sec Loss 2.6054 LearningRate 0.0006 Epoch: 11 Global Step: 19870 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:09:52,072-Speed 9423.85 samples/sec Loss 2.6189 LearningRate 0.0006 Epoch: 11 Global Step: 19880 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:10:18,137-Speed 9429.05 samples/sec Loss 2.6106 LearningRate 0.0006 Epoch: 11 Global Step: 19890 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:10:44,327-Speed 9419.67 samples/sec Loss 2.6147 LearningRate 0.0006 Epoch: 11 Global Step: 19900 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:11:10,550-Speed 9400.08 samples/sec Loss 2.5853 LearningRate 0.0006 Epoch: 11 Global Step: 19910 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:11:36,642-Speed 9419.46 samples/sec Loss 2.5684 LearningRate 0.0006 Epoch: 11 Global Step: 19920 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:12:02,851-Speed 9377.28 samples/sec Loss 2.5840 LearningRate 0.0006 Epoch: 11 Global Step: 19930 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:12:28,988-Speed 9403.26 samples/sec Loss 2.6108 LearningRate 0.0006 Epoch: 11 Global Step: 19940 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:12:55,198-Speed 9376.89 samples/sec Loss 2.6144 LearningRate 0.0006 Epoch: 11 Global Step: 19950 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:13:21,281-Speed 9422.55 samples/sec Loss 2.6017 LearningRate 0.0006 Epoch: 11 
Global Step: 19960 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:13:47,484-Speed 9379.45 samples/sec Loss 2.5995 LearningRate 0.0006 Epoch: 11 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:14:13,636-Speed 9397.47 samples/sec Loss 2.5725 LearningRate 0.0006 Epoch: 11 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:14:39,887-Speed 9362.49 samples/sec Loss 2.5608 LearningRate 0.0006 Epoch: 11 Global Step: 19990 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:15:06,121-Speed 9397.25 samples/sec Loss 2.6000 LearningRate 0.0006 Epoch: 11 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:15:32,183-Speed 9430.18 samples/sec Loss 2.5817 LearningRate 0.0006 Epoch: 11 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:15:58,337-Speed 9397.00 samples/sec Loss 2.6098 LearningRate 0.0006 Epoch: 11 Global Step: 20020 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:16:24,531-Speed 9382.74 samples/sec Loss 2.5933 LearningRate 0.0006 Epoch: 11 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:16:50,610-Speed 9424.09 samples/sec Loss 2.5860 LearningRate 0.0006 Epoch: 11 Global Step: 20040 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:17:16,697-Speed 9421.37 samples/sec Loss 2.5774 LearningRate 0.0006 Epoch: 11 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:17:42,845-Speed 9399.22 samples/sec Loss 2.5636 LearningRate 0.0006 Epoch: 11 Global Step: 20060 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:18:08,952-Speed 9413.87 samples/sec Loss 2.5692 LearningRate 0.0006 Epoch: 11 Global Step: 20070 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:18:35,043-Speed 9419.77 samples/sec Loss 2.5841 LearningRate 0.0006 Epoch: 11 Global Step: 20080 Fp16 Grad Scale: 
65536 Required: 36 hours Training: 2022-03-05 11:19:01,110-Speed 9429.28 samples/sec Loss 2.5839 LearningRate 0.0006 Epoch: 11 Global Step: 20090 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:19:27,247-Speed 9403.06 samples/sec Loss 2.5808 LearningRate 0.0006 Epoch: 11 Global Step: 20100 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:19:53,370-Speed 9408.39 samples/sec Loss 2.5639 LearningRate 0.0006 Epoch: 11 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:20:19,538-Speed 9391.83 samples/sec Loss 2.6005 LearningRate 0.0006 Epoch: 11 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:20:45,623-Speed 9421.97 samples/sec Loss 2.5775 LearningRate 0.0006 Epoch: 11 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:21:11,821-Speed 9381.50 samples/sec Loss 2.5662 LearningRate 0.0006 Epoch: 11 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:21:37,989-Speed 9391.94 samples/sec Loss 2.5690 LearningRate 0.0006 Epoch: 11 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:22:04,056-Speed 9428.35 samples/sec Loss 2.5673 LearningRate 0.0006 Epoch: 11 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:22:30,237-Speed 9387.23 samples/sec Loss 2.5852 LearningRate 0.0006 Epoch: 11 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:22:56,352-Speed 9411.04 samples/sec Loss 2.5702 LearningRate 0.0006 Epoch: 11 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:23:22,455-Speed 9415.22 samples/sec Loss 2.5741 LearningRate 0.0006 Epoch: 11 Global Step: 20190 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:23:48,584-Speed 9406.30 samples/sec Loss 2.5665 LearningRate 0.0006 Epoch: 11 Global Step: 20200 Fp16 Grad Scale: 65536 Required: 36 hours Training: 
2022-03-05 11:24:14,658-Speed 9425.74 samples/sec Loss 2.5708 LearningRate 0.0006 Epoch: 11 Global Step: 20210 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:24:40,783-Speed 9407.58 samples/sec Loss 2.5774 LearningRate 0.0006 Epoch: 11 Global Step: 20220 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:25:06,920-Speed 9402.88 samples/sec Loss 2.5571 LearningRate 0.0006 Epoch: 11 Global Step: 20230 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:25:33,111-Speed 9384.04 samples/sec Loss 2.5919 LearningRate 0.0006 Epoch: 11 Global Step: 20240 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:25:59,322-Speed 9376.61 samples/sec Loss 2.5797 LearningRate 0.0006 Epoch: 11 Global Step: 20250 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:26:25,397-Speed 9425.41 samples/sec Loss 2.5704 LearningRate 0.0006 Epoch: 11 Global Step: 20260 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:26:51,529-Speed 9405.18 samples/sec Loss 2.5628 LearningRate 0.0006 Epoch: 11 Global Step: 20270 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:27:17,722-Speed 9382.88 samples/sec Loss 2.5750 LearningRate 0.0006 Epoch: 11 Global Step: 20280 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:27:43,891-Speed 9391.96 samples/sec Loss 2.5521 LearningRate 0.0006 Epoch: 11 Global Step: 20290 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:28:10,012-Speed 9408.78 samples/sec Loss 2.5451 LearningRate 0.0006 Epoch: 11 Global Step: 20300 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:28:36,158-Speed 9400.29 samples/sec Loss 2.5393 LearningRate 0.0006 Epoch: 11 Global Step: 20310 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:29:02,374-Speed 9374.64 samples/sec Loss 2.5512 LearningRate 0.0006 Epoch: 11 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:29:28,527-Speed 9397.64 
samples/sec Loss 2.5682 LearningRate 0.0006 Epoch: 11 Global Step: 20330 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:29:54,697-Speed 9392.11 samples/sec Loss 2.5566 LearningRate 0.0006 Epoch: 11 Global Step: 20340 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:30:20,759-Speed 9430.36 samples/sec Loss 2.5457 LearningRate 0.0006 Epoch: 11 Global Step: 20350 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:30:46,882-Speed 9407.89 samples/sec Loss 2.5321 LearningRate 0.0006 Epoch: 11 Global Step: 20360 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:31:13,033-Speed 9398.13 samples/sec Loss 2.5660 LearningRate 0.0006 Epoch: 11 Global Step: 20370 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:31:39,132-Speed 9417.39 samples/sec Loss 2.5621 LearningRate 0.0006 Epoch: 11 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:32:05,258-Speed 9407.16 samples/sec Loss 2.5526 LearningRate 0.0006 Epoch: 11 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:32:31,324-Speed 9428.68 samples/sec Loss 2.5372 LearningRate 0.0006 Epoch: 11 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:32:57,402-Speed 9424.26 samples/sec Loss 2.5297 LearningRate 0.0006 Epoch: 11 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:33:23,555-Speed 9397.52 samples/sec Loss 2.5318 LearningRate 0.0006 Epoch: 11 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:33:49,663-Speed 9413.83 samples/sec Loss 2.5399 LearningRate 0.0006 Epoch: 11 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:34:15,795-Speed 9404.89 samples/sec Loss 2.5374 LearningRate 0.0006 Epoch: 11 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:34:41,888-Speed 9418.84 samples/sec Loss 2.5307 LearningRate 
0.0006 Epoch: 11 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:35:07,999-Speed 9412.46 samples/sec Loss 2.5397 LearningRate 0.0006 Epoch: 11 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:35:34,222-Speed 9372.31 samples/sec Loss 2.5587 LearningRate 0.0006 Epoch: 11 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:36:00,320-Speed 9417.32 samples/sec Loss 2.5396 LearningRate 0.0006 Epoch: 11 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:36:26,384-Speed 9429.47 samples/sec Loss 2.5787 LearningRate 0.0006 Epoch: 11 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:36:52,590-Speed 9378.46 samples/sec Loss 2.5854 LearningRate 0.0006 Epoch: 11 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:37:18,769-Speed 9388.09 samples/sec Loss 2.5513 LearningRate 0.0006 Epoch: 11 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:37:44,908-Speed 9402.39 samples/sec Loss 2.5402 LearningRate 0.0006 Epoch: 11 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:38:10,992-Speed 9422.37 samples/sec Loss 2.5429 LearningRate 0.0006 Epoch: 11 Global Step: 20530 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:38:37,168-Speed 9389.03 samples/sec Loss 2.5492 LearningRate 0.0006 Epoch: 11 Global Step: 20540 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:39:03,364-Speed 9381.96 samples/sec Loss 2.5446 LearningRate 0.0006 Epoch: 11 Global Step: 20550 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:39:29,498-Speed 9404.33 samples/sec Loss 2.5327 LearningRate 0.0006 Epoch: 11 Global Step: 20560 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:39:55,655-Speed 9396.06 samples/sec Loss 2.5399 LearningRate 0.0006 Epoch: 11 Global Step: 20570 Fp16 
Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:40:21,706-Speed 9434.53 samples/sec Loss 2.5269 LearningRate 0.0006 Epoch: 11 Global Step: 20580 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-03-05 11:40:47,794-Speed 9421.05 samples/sec Loss 2.5464 LearningRate 0.0006 Epoch: 11 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:41:13,987-Speed 9383.11 samples/sec Loss 2.5237 LearningRate 0.0006 Epoch: 11 Global Step: 20600 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:41:40,193-Speed 9378.19 samples/sec Loss 2.5173 LearningRate 0.0006 Epoch: 11 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:42:06,313-Speed 9409.33 samples/sec Loss 2.5201 LearningRate 0.0006 Epoch: 11 Global Step: 20620 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:42:32,459-Speed 9400.11 samples/sec Loss 2.5285 LearningRate 0.0006 Epoch: 11 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:42:58,549-Speed 9420.00 samples/sec Loss 2.5433 LearningRate 0.0006 Epoch: 11 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:43:24,635-Speed 9421.66 samples/sec Loss 2.5304 LearningRate 0.0006 Epoch: 11 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:43:50,835-Speed 9380.44 samples/sec Loss 2.5091 LearningRate 0.0006 Epoch: 11 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:44:16,975-Speed 9401.85 samples/sec Loss 2.5460 LearningRate 0.0006 Epoch: 11 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:44:43,063-Speed 9421.25 samples/sec Loss 2.5291 LearningRate 0.0006 Epoch: 11 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-03-05 11:45:09,177-Speed 9411.11 samples/sec Loss 2.5440 LearningRate 0.0006 Epoch: 11 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 36 hours 
Training: 2022-03-05 11:45:35,258-Speed 9423.59 samples/sec Loss 2.5459 LearningRate 0.0006 Epoch: 11 Global Step: 20700 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:46:01,367-Speed 9413.11 samples/sec Loss 2.5475 LearningRate 0.0006 Epoch: 11 Global Step: 20710 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:46:27,543-Speed 9389.14 samples/sec Loss 2.5384 LearningRate 0.0006 Epoch: 11 Global Step: 20720 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:46:53,689-Speed 9400.07 samples/sec Loss 2.5424 LearningRate 0.0006 Epoch: 11 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:47:19,759-Speed 9427.20 samples/sec Loss 2.5489 LearningRate 0.0006 Epoch: 11 Global Step: 20740 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:48:38,025-Speed 3140.11 samples/sec Loss 2.5074 LearningRate 0.0006 Epoch: 12 Global Step: 20750 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:49:03,939-Speed 9484.25 samples/sec Loss 2.4770 LearningRate 0.0006 Epoch: 12 Global Step: 20760 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:49:29,943-Speed 9451.47 samples/sec Loss 2.5063 LearningRate 0.0006 Epoch: 12 Global Step: 20770 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:49:56,036-Speed 9418.78 samples/sec Loss 2.4904 LearningRate 0.0006 Epoch: 12 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:50:22,031-Speed 9454.47 samples/sec Loss 2.4941 LearningRate 0.0006 Epoch: 12 Global Step: 20790 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:50:47,983-Speed 9470.42 samples/sec Loss 2.5037 LearningRate 0.0006 Epoch: 12 Global Step: 20800 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:51:13,948-Speed 9465.68 samples/sec Loss 2.5010 LearningRate 0.0006 Epoch: 12 Global Step: 20810 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:51:40,033-Speed 9421.99 samples/sec Loss 2.4928 LearningRate 0.0006 Epoch: 12 Global Step: 20820 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:52:06,169-Speed 9403.54 samples/sec Loss 2.5084 LearningRate 0.0006 Epoch: 12 Global Step: 20830 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:52:32,305-Speed 9403.52 samples/sec Loss 2.5045 LearningRate 0.0006 Epoch: 12 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:52:58,448-Speed 9401.12 samples/sec Loss 2.4926 LearningRate 0.0006 Epoch: 12 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:53:24,570-Speed 9408.55 samples/sec Loss 2.4882 LearningRate 0.0006 Epoch: 12 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:53:50,676-Speed 9414.37 samples/sec Loss 2.4859 LearningRate 0.0006 Epoch: 12 Global Step: 20870 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:54:16,777-Speed 9415.79 samples/sec Loss 2.5004 LearningRate 0.0006 Epoch: 12 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-03-05 11:54:42,849-Speed 9426.71 samples/sec Loss 2.5001 LearningRate 0.0006 Epoch: 12 Global Step: 20890 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:55:08,925-Speed 9425.19 samples/sec Loss 2.4973 LearningRate 0.0006 Epoch: 12 Global Step: 20900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:55:35,077-Speed 9397.72 samples/sec Loss 2.5085 LearningRate 0.0006 Epoch: 12 Global Step: 20910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:56:01,172-Speed 9418.55 samples/sec Loss 2.5029 LearningRate 0.0006 Epoch: 12 Global Step: 20920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:56:27,228-Speed 9432.38 samples/sec Loss 2.5073 LearningRate 0.0006 Epoch: 12 Global Step: 20930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:56:53,287-Speed 9431.31 samples/sec Loss 2.4859 LearningRate 0.0006 Epoch: 12 Global Step: 20940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-03-05 11:57:19,345-Speed 9431.69 samples/sec Loss 2.4884 LearningRate 0.0006 Epoch: 12 Global Step: 20950 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 11:57:45,441-Speed 9418.16 samples/sec Loss 2.4841 LearningRate 0.0006 Epoch: 12 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 11:58:11,515-Speed 9426.03 samples/sec Loss 2.4898 LearningRate 0.0006 Epoch: 12 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 11:58:37,601-Speed 9421.36 samples/sec Loss 2.4850 LearningRate 0.0006 Epoch: 12 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 11:59:03,719-Speed 9410.39 samples/sec Loss 2.4966 LearningRate 0.0006 Epoch: 12 Global Step: 20990 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-03-05 11:59:29,846-Speed 9406.73 samples/sec Loss 2.4927 LearningRate 0.0006 Epoch: 12 Global Step: 21000 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-03-05 11:59:56,025-Speed 9388.41 samples/sec Loss 2.5074 LearningRate 0.0006 Epoch: 12 Global Step: 21010 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-03-05 12:00:22,081-Speed 9432.56 samples/sec Loss 2.4781 LearningRate 0.0006 Epoch: 12 Global Step: 21020 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:00:48,148-Speed 9428.59 samples/sec Loss 2.4705 LearningRate 0.0006 Epoch: 12 Global Step: 21030 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:01:14,240-Speed 9419.39 samples/sec Loss 2.5094 LearningRate 0.0006 Epoch: 12 Global Step: 21040 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:01:40,354-Speed 9411.40 samples/sec Loss 2.5154 LearningRate 0.0006 Epoch: 12 Global Step: 21050 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:02:06,471-Speed 9410.66 samples/sec Loss 2.4989 LearningRate 0.0006 Epoch: 12 Global Step: 21060 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:02:32,556-Speed 9422.05 samples/sec Loss 2.4918 LearningRate 0.0006 Epoch: 12 Global Step: 21070 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:02:58,770-Speed 9375.52 samples/sec Loss 2.5088 LearningRate 0.0006 Epoch: 12 Global Step: 21080 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:03:24,912-Speed 9401.22 samples/sec Loss 2.4931 LearningRate 0.0006 Epoch: 12 Global Step: 21090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:03:51,112-Speed 9380.94 samples/sec Loss 2.4815 LearningRate 0.0006 Epoch: 12 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:04:17,283-Speed 9391.18 samples/sec Loss 2.4938 LearningRate 0.0006 Epoch: 12 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:04:43,320-Speed 9439.34 samples/sec Loss 2.4885 LearningRate 0.0006 Epoch: 12 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:05:09,449-Speed 9406.05 samples/sec Loss 2.4668 LearningRate 0.0006 Epoch: 12 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:05:35,551-Speed 9415.99 samples/sec Loss 2.4766 LearningRate 0.0006 Epoch: 12 Global Step: 21140 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:06:01,628-Speed 9424.99 samples/sec Loss 2.4749 LearningRate 0.0006 Epoch: 12 Global Step: 21150 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:06:27,722-Speed 9418.59 samples/sec Loss 2.4763 LearningRate 0.0006 Epoch: 12 Global Step: 21160 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:06:53,869-Speed 9399.68 samples/sec Loss 2.4876 LearningRate 0.0006 Epoch: 12 Global Step: 21170 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:07:19,974-Speed 9414.39 samples/sec Loss 2.4650 LearningRate 0.0006 Epoch: 12 Global Step: 21180 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:07:46,096-Speed 9408.68 samples/sec Loss 2.4630 LearningRate 0.0006 Epoch: 12 Global Step: 21190 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:08:12,218-Speed 9409.53 samples/sec Loss 2.4742 LearningRate 0.0006 Epoch: 12 Global Step: 21200 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:08:38,338-Speed 9409.43 samples/sec Loss 2.4986 LearningRate 0.0006 Epoch: 12 Global Step: 21210 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:09:04,404-Speed 9428.74 samples/sec Loss 2.4863 LearningRate 0.0006 Epoch: 12 Global Step: 21220 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:09:30,514-Speed 9412.68 samples/sec Loss 2.4664 LearningRate 0.0006 Epoch: 12 Global Step: 21230 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:09:56,635-Speed 9409.07 samples/sec Loss 2.4725 LearningRate 0.0006 Epoch: 12 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:10:22,705-Speed 9427.27 samples/sec Loss 2.4645 LearningRate 0.0006 Epoch: 12 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:10:48,815-Speed 9413.03 samples/sec Loss 2.4696 LearningRate 0.0006 Epoch: 12 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:11:15,003-Speed 9384.90 samples/sec Loss 2.4695 LearningRate 0.0006 Epoch: 12 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:11:41,169-Speed 9392.51 samples/sec Loss 2.4751 LearningRate 0.0006 Epoch: 12 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:12:07,339-Speed 9391.22 samples/sec Loss 2.4580 LearningRate 0.0006 Epoch: 12 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:12:33,413-Speed 9425.96 samples/sec Loss 2.4514 LearningRate 0.0006 Epoch: 12 Global Step: 21300 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:12:59,633-Speed 9373.60 samples/sec Loss 2.4590 LearningRate 0.0006 Epoch: 12 Global Step: 21310 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:13:25,739-Speed 9414.49 samples/sec Loss 2.4515 LearningRate 0.0006 Epoch: 12 Global Step: 21320 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:13:51,830-Speed 9419.89 samples/sec Loss 2.4612 LearningRate 0.0006 Epoch: 12 Global Step: 21330 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:14:17,961-Speed 9405.60 samples/sec Loss 2.4529 LearningRate 0.0006 Epoch: 12 Global Step: 21340 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:14:44,207-Speed 9364.18 samples/sec Loss 2.4710 LearningRate 0.0006 Epoch: 12 Global Step: 21350 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:15:10,330-Speed 9408.09 samples/sec Loss 2.4590 LearningRate 0.0006 Epoch: 12 Global Step: 21360 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:15:36,495-Speed 9393.20 samples/sec Loss 2.4623 LearningRate 0.0006 Epoch: 12 Global Step: 21370 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:16:02,673-Speed 9388.40 samples/sec Loss 2.4613 LearningRate 0.0006 Epoch: 12 Global Step: 21380 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:16:28,785-Speed 9412.10 samples/sec Loss 2.4530 LearningRate 0.0006 Epoch: 12 Global Step: 21390 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:16:54,885-Speed 9416.66 samples/sec Loss 2.4540 LearningRate 0.0006 Epoch: 12 Global Step: 21400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:17:20,979-Speed 9418.96 samples/sec Loss 2.4531 LearningRate 0.0006 Epoch: 12 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:17:47,042-Speed 9429.96 samples/sec Loss 2.4492 LearningRate 0.0006 Epoch: 12 Global Step: 21420 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:18:13,207-Speed 9393.07 samples/sec Loss 2.4475 LearningRate 0.0006 Epoch: 12 Global Step: 21430 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:18:39,309-Speed 9415.95 samples/sec Loss 2.4371 LearningRate 0.0006 Epoch: 12 Global Step: 21440 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:19:05,431-Speed 9408.74 samples/sec Loss 2.4543 LearningRate 0.0006 Epoch: 12 Global Step: 21450 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:19:31,582-Speed 9398.26 samples/sec Loss 2.4494 LearningRate 0.0006 Epoch: 12 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:19:57,751-Speed 9391.38 samples/sec Loss 2.4678 LearningRate 0.0006 Epoch: 12 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:20:23,775-Speed 9444.22 samples/sec Loss 2.4576 LearningRate 0.0006 Epoch: 12 Global Step: 21480 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:20:49,909-Speed 9404.37 samples/sec Loss 2.4492 LearningRate 0.0006 Epoch: 12 Global Step: 21490 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:21:16,024-Speed 9411.16 samples/sec Loss 2.4523 LearningRate 0.0006 Epoch: 12 Global Step: 21500 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:21:42,161-Speed 9403.05 samples/sec Loss 2.4456 LearningRate 0.0006 Epoch: 12 Global Step: 21510 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:22:08,281-Speed 9409.74 samples/sec Loss 2.4360 LearningRate 0.0006 Epoch: 12 Global Step: 21520 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:22:34,439-Speed 9395.48 samples/sec Loss 2.4529 LearningRate 0.0006 Epoch: 12 Global Step: 21530 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:23:00,673-Speed 9368.40 samples/sec Loss 2.4327 LearningRate 0.0006 Epoch: 12 Global Step: 21540 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:23:26,776-Speed 9415.31 samples/sec Loss 2.4607 LearningRate 0.0006 Epoch: 12 Global Step: 21550 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:23:52,926-Speed 9398.83 samples/sec Loss 2.4484 LearningRate 0.0006 Epoch: 12 Global Step: 21560 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:24:19,037-Speed 9412.58 samples/sec Loss 2.4521 LearningRate 0.0006 Epoch: 12 Global Step: 21570 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:24:45,204-Speed 9392.06 samples/sec Loss 2.4336 LearningRate 0.0006 Epoch: 12 Global Step: 21580 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:25:11,383-Speed 9388.27 samples/sec Loss 2.4343 LearningRate 0.0006 Epoch: 12 Global Step: 21590 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:25:37,585-Speed 9380.12 samples/sec Loss 2.4508 LearningRate 0.0006 Epoch: 12 Global Step: 21600 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:26:03,700-Speed 9411.06 samples/sec Loss 2.4215 LearningRate 0.0006 Epoch: 12 Global Step: 21610 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:26:29,910-Speed 9377.19 samples/sec Loss 2.4396 LearningRate 0.0006 Epoch: 12 Global Step: 21620 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:26:56,022-Speed 9412.06 samples/sec Loss 2.4354 LearningRate 0.0006 Epoch: 12 Global Step: 21630 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:27:22,154-Speed 9405.48 samples/sec Loss 2.4534 LearningRate 0.0006 Epoch: 12 Global Step: 21640 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:27:48,245-Speed 9419.68 samples/sec Loss 2.4252 LearningRate 0.0006 Epoch: 12 Global Step: 21650 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:28:14,369-Speed 9407.97 samples/sec Loss 2.4201 LearningRate 0.0006 Epoch: 12 Global Step: 21660 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:28:40,536-Speed 9392.32 samples/sec Loss 2.4310 LearningRate 0.0006 Epoch: 12 Global Step: 21670 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:29:06,690-Speed 9397.11 samples/sec Loss 2.4284 LearningRate 0.0006 Epoch: 12 Global Step: 21680 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:29:32,779-Speed 9420.74 samples/sec Loss 2.4253 LearningRate 0.0006 Epoch: 12 Global Step: 21690 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:29:58,882-Speed 9415.16 samples/sec Loss 2.4175 LearningRate 0.0006 Epoch: 12 Global Step: 21700 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:30:25,040-Speed 9395.72 samples/sec Loss 2.4156 LearningRate 0.0006 Epoch: 12 Global Step: 21710 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:30:51,076-Speed 9439.71 samples/sec Loss 2.4286 LearningRate 0.0006 Epoch: 12 Global Step: 21720 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:31:17,187-Speed 9412.49 samples/sec Loss 2.4282 LearningRate 0.0006 Epoch: 12 Global Step: 21730 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:31:43,325-Speed 9402.91 samples/sec Loss 2.4354 LearningRate 0.0006 Epoch: 12 Global Step: 21740 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:32:09,441-Speed 9410.29 samples/sec Loss 2.4119 LearningRate 0.0006 Epoch: 12 Global Step: 21750 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:32:35,558-Speed 9410.42 samples/sec Loss 2.4029 LearningRate 0.0006 Epoch: 12 Global Step: 21760 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:33:01,698-Speed 9401.94 samples/sec Loss 2.4567 LearningRate 0.0006 Epoch: 12 Global Step: 21770 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:33:27,797-Speed 9416.67 samples/sec Loss 2.4323 LearningRate 0.0006 Epoch: 12 Global Step: 21780 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:33:54,011-Speed 9376.05 samples/sec Loss 2.4246 LearningRate 0.0006 Epoch: 12 Global Step: 21790 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:34:20,092-Speed 9423.31 samples/sec Loss 2.4092 LearningRate 0.0006 Epoch: 12 Global Step: 21800 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:34:46,173-Speed 9423.28 samples/sec Loss 2.4012 LearningRate 0.0006 Epoch: 12 Global Step: 21810 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:35:12,248-Speed 9425.44 samples/sec Loss 2.4029 LearningRate 0.0006 Epoch: 12 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:35:38,341-Speed 9419.18 samples/sec Loss 2.4089 LearningRate 0.0006 Epoch: 12 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:36:04,555-Speed 9375.80 samples/sec Loss 2.4465 LearningRate 0.0006 Epoch: 12 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:36:30,725-Speed 9391.20 samples/sec Loss 2.3940 LearningRate 0.0006 Epoch: 12 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:36:56,918-Speed 9383.29 samples/sec Loss 2.4206 LearningRate 0.0006 Epoch: 12 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:37:23,013-Speed 9418.26 samples/sec Loss 2.4157 LearningRate 0.0006 Epoch: 12 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:37:49,185-Speed 9390.63 samples/sec Loss 2.4104 LearningRate 0.0006 Epoch: 12 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:38:15,293-Speed 9413.89 samples/sec Loss 2.4170 LearningRate 0.0006 Epoch: 12 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:38:41,350-Speed 9431.91 samples/sec Loss 2.4379 LearningRate 0.0006 Epoch: 12 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:39:07,510-Speed 9395.12 samples/sec Loss 2.4214 LearningRate 0.0006 Epoch: 12 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:39:33,613-Speed 9415.17 samples/sec Loss 2.4084 LearningRate 0.0006 Epoch: 12 Global Step: 21920 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-03-05 12:39:59,792-Speed 9388.47 samples/sec Loss 2.3876 LearningRate 0.0006 Epoch: 12 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-03-05 12:40:25,832-Speed 9438.19 samples/sec Loss 2.4127 LearningRate 0.0006 Epoch: 12 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:40:51,898-Speed 9428.90 samples/sec Loss 2.4109 LearningRate 0.0006 Epoch: 12 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:41:17,950-Speed 9433.47 samples/sec Loss 2.4157 LearningRate 0.0006 Epoch: 12 Global Step: 21960 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:41:44,041-Speed 9420.05 samples/sec Loss 2.4122 LearningRate 0.0006 Epoch: 12 Global Step: 21970 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:42:10,074-Speed 9440.61 samples/sec Loss 2.4072 LearningRate 0.0006 Epoch: 12 Global Step: 21980 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:42:36,233-Speed 9395.34 samples/sec Loss 2.4142 LearningRate 0.0006 Epoch: 12 Global Step: 21990 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:43:02,308-Speed 9425.46 samples/sec Loss 2.3955 LearningRate 0.0006 Epoch: 12 Global Step: 22000 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:43:28,423-Speed 9411.21 samples/sec Loss 2.3901 LearningRate 0.0006 Epoch: 12 Global Step: 22010 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:43:54,602-Speed 9388.01 samples/sec Loss 2.3963 LearningRate 0.0006 Epoch: 12 Global Step: 22020 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:44:20,726-Speed 9408.20 samples/sec Loss 2.4186 LearningRate 0.0006 Epoch: 12 Global Step: 22030 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:44:46,830-Speed 9415.15 samples/sec Loss 2.3774 LearningRate 0.0006 Epoch: 12 Global Step: 22040 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:45:12,916-Speed 9421.61 samples/sec Loss 2.3929 LearningRate 0.0006 Epoch: 12 Global Step: 22050 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-03-05 12:45:39,046-Speed 9405.75 samples/sec Loss 2.3851 LearningRate 0.0006 Epoch: 12 Global Step: 22060 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:46:05,178-Speed 9404.85 samples/sec Loss 2.3776 LearningRate 0.0006 Epoch: 12 Global Step: 22070 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:46:31,363-Speed 9386.13 samples/sec Loss 2.3969 LearningRate 0.0006 Epoch: 12 Global Step: 22080 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:46:57,568-Speed 9378.92 samples/sec Loss 2.3909 LearningRate 0.0006 Epoch: 12 Global Step: 22090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:47:23,648-Speed 9423.75 samples/sec Loss 2.3841 LearningRate 0.0006 Epoch: 12 Global Step: 22100 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:47:49,784-Speed 9403.27 samples/sec Loss 2.3900 LearningRate 0.0006 Epoch: 12 Global Step: 22110 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:48:15,902-Speed 9410.26 samples/sec Loss 2.3788 LearningRate 0.0006 Epoch: 12 Global Step: 22120 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:48:41,959-Speed 9432.06 samples/sec Loss 2.3898 LearningRate 0.0006 Epoch: 12 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:49:08,141-Speed 9387.07 samples/sec Loss 2.4066 LearningRate 0.0006 Epoch: 12 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:49:34,321-Speed 9387.79 samples/sec Loss 2.4005 LearningRate 0.0006 Epoch: 12 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:50:00,497-Speed 9389.39 samples/sec Loss 2.3892 LearningRate 0.0006 Epoch: 12 Global Step: 22160 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-03-05 12:50:26,681-Speed 9385.87 samples/sec Loss 2.3952 LearningRate 0.0006 Epoch: 12 Global Step: 22170 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-03-05 12:50:52,878-Speed 9381.97 samples/sec Loss 2.3892 LearningRate 0.0006 Epoch: 12 Global Step: 22180 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-03-05 12:51:19,126-Speed 9363.31 samples/sec Loss 2.3796 LearningRate 0.0006 Epoch: 12 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:51:45,427-Speed 9344.50 samples/sec Loss 2.3922 LearningRate 0.0006 Epoch: 12 Global Step: 22200 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:52:11,607-Speed 9388.00 samples/sec Loss 2.3850 LearningRate 0.0006 Epoch: 12 Global Step: 22210 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:52:37,767-Speed 9394.93 samples/sec Loss 2.3709 LearningRate 0.0006 Epoch: 12 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:53:03,935-Speed 9392.10 samples/sec Loss 2.4215 LearningRate 0.0006 Epoch: 12 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:53:30,000-Speed 9429.17 samples/sec Loss 2.4216 LearningRate 0.0006 Epoch: 12 Global Step: 22240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:53:56,145-Speed 9400.22 samples/sec Loss 2.3980 LearningRate 0.0006 Epoch: 12 Global Step: 22250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-05 12:54:22,266-Speed 9409.00 samples/sec Loss 2.3815 LearningRate 0.0006 Epoch: 12 Global Step: 22260 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:54:48,492-Speed 9371.10 samples/sec Loss 2.3603 LearningRate 0.0006 Epoch: 12 Global Step: 22270 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:55:14,681-Speed 9384.67 samples/sec Loss 2.3744 LearningRate 0.0006 Epoch: 12 Global Step: 22280 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:55:40,916-Speed 9368.22 samples/sec Loss 2.3885 LearningRate 0.0006 Epoch: 12 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 12:56:07,084-Speed 9391.95 samples/sec Loss 2.3702 LearningRate 0.0006 Epoch: 12 Global Step: 22300 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:56:33,167-Speed 9422.67 samples/sec Loss 2.3767 LearningRate 0.0006 Epoch: 12 Global Step: 22310 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:56:59,248-Speed 9423.61 samples/sec Loss 2.3950 LearningRate 0.0006 Epoch: 12 Global Step: 22320 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:57:25,411-Speed 9393.87 samples/sec Loss 2.3745 LearningRate 0.0006 Epoch: 12 Global Step: 22330 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:57:51,506-Speed 9418.52 samples/sec Loss 2.3696 LearningRate 0.0006 Epoch: 12 Global Step: 22340 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:58:17,621-Speed 9411.27 samples/sec Loss 2.3801 LearningRate 0.0006 Epoch: 12 Global Step: 22350 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:58:43,698-Speed 9425.00 samples/sec Loss 2.3793 LearningRate 0.0006 Epoch: 12 Global Step: 22360 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:59:09,787-Speed 9420.59 samples/sec Loss 2.3777 LearningRate 0.0006 Epoch: 12 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 12:59:35,908-Speed 9408.97 samples/sec Loss 2.3716 LearningRate 0.0006 Epoch: 12 Global Step: 22380 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:00:02,045-Speed 9402.98 samples/sec Loss 2.3615 LearningRate 0.0006 Epoch: 12 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:00:28,146-Speed 9416.40 samples/sec Loss 2.3631 LearningRate 0.0006 Epoch: 12 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:00:54,289-Speed 9400.81 samples/sec Loss 2.3949 LearningRate 0.0006 Epoch: 12 Global Step: 22410 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:01:20,338-Speed 9435.01 samples/sec Loss 2.3915 LearningRate 0.0006 Epoch: 12 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:01:46,509-Speed 9391.00 samples/sec Loss 2.4119 LearningRate 0.0006 Epoch: 12 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:02:12,609-Speed 9416.41 samples/sec Loss 2.3887 LearningRate 0.0006 Epoch: 12 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:02:38,817-Speed 9377.53 samples/sec Loss 2.3769 LearningRate 0.0006 Epoch: 12 Global Step: 22450 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:03:04,884-Speed 9428.52 samples/sec Loss 2.3950 LearningRate 0.0006 Epoch: 12 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:04:23,950-Speed 3108.37 samples/sec Loss 2.3930 LearningRate 0.0006 Epoch: 13 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:04:50,008-Speed 9431.82 samples/sec Loss 2.3370 LearningRate 0.0006 Epoch: 13 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:05:16,064-Speed 9432.66 samples/sec Loss 2.3524 LearningRate 0.0006 Epoch: 13 Global Step: 22490 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:05:42,191-Speed 9406.80 samples/sec Loss 2.3550 LearningRate 0.0006 Epoch: 13 Global Step: 22500 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:06:08,300-Speed 9413.39 samples/sec Loss 2.3399 LearningRate 0.0006 Epoch: 13 Global Step: 22510 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:06:34,400-Speed 9416.80 samples/sec Loss 2.3346 LearningRate 0.0006 Epoch: 13 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:07:00,495-Speed 9418.23 samples/sec Loss 2.3394 LearningRate 0.0006 Epoch: 13 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:07:26,618-Speed 9408.21 samples/sec Loss 2.3538 LearningRate 0.0006 Epoch: 13 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:07:52,719-Speed 9416.86 samples/sec Loss 2.3385 LearningRate 0.0006 Epoch: 13 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:08:18,857-Speed 9402.48 samples/sec Loss 2.3420 LearningRate 0.0006 Epoch: 13 Global Step: 22560 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:08:45,074-Speed 9374.55 samples/sec Loss 2.3369 LearningRate 0.0006 Epoch: 13 Global Step: 22570 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:09:11,233-Speed 9395.29 samples/sec Loss 2.3514 LearningRate 0.0006 Epoch: 13 Global Step: 22580 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:09:37,270-Speed 9440.11 samples/sec Loss 2.3567 LearningRate 0.0006 Epoch: 13 Global Step: 22590 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:10:03,365-Speed 9418.50 samples/sec Loss 2.3295 LearningRate 0.0006 Epoch: 13 Global Step: 22600 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:10:29,481-Speed 9410.66 samples/sec Loss 2.3340 LearningRate 0.0006 Epoch: 13 Global Step: 22610 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:10:55,656-Speed 9389.41 samples/sec Loss 2.3408 LearningRate 0.0006 Epoch: 13 Global Step: 22620 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:11:21,776-Speed 9409.47 samples/sec Loss 2.3448 LearningRate 0.0006 Epoch: 13 Global Step: 22630 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:11:47,895-Speed 9409.41 samples/sec Loss 2.3389 LearningRate 0.0006 Epoch: 13 Global Step: 22640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:12:13,978-Speed 9422.76 samples/sec Loss 2.3475 LearningRate 0.0006 Epoch: 13 Global Step: 22650 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:12:40,054-Speed 9425.17 samples/sec Loss 2.3467 LearningRate 0.0006 Epoch: 13 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:13:06,214-Speed 9394.92 samples/sec Loss 2.3639 LearningRate 0.0006 Epoch: 13 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:13:32,341-Speed 9406.73 samples/sec Loss 2.3367 LearningRate 0.0006 Epoch: 13 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:13:58,515-Speed 9389.81 samples/sec Loss 2.3334 LearningRate 0.0006 Epoch: 13 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:14:24,707-Speed 9383.48 samples/sec Loss 2.3279 LearningRate 0.0006 Epoch: 13 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:14:50,803-Speed 9418.16 samples/sec Loss 2.3610 LearningRate 0.0006 Epoch: 13 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-05 13:15:16,924-Speed 9408.66 samples/sec Loss 2.3495 LearningRate 0.0006 Epoch: 13 Global Step: 22720 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:15:43,046-Speed 9408.68 samples/sec Loss 2.3585 LearningRate 0.0006 Epoch: 13 Global Step: 22730 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:16:09,178-Speed 9405.05 samples/sec Loss 2.3448 LearningRate 0.0006 Epoch: 13 Global Step: 22740 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:16:35,274-Speed 9418.18 samples/sec Loss 2.3270 LearningRate 0.0006 Epoch: 13 Global Step: 22750 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:17:01,357-Speed 9422.88 samples/sec Loss 2.3546 LearningRate 0.0006 Epoch: 13 Global Step: 22760 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:17:27,526-Speed 9391.89 samples/sec Loss 2.3524 LearningRate 0.0006 Epoch: 13 Global Step: 22770 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:17:53,652-Speed 9407.48 samples/sec Loss 2.3389 LearningRate 0.0006 Epoch: 13 Global Step: 22780 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:18:19,757-Speed 9414.99 samples/sec Loss 2.3349 LearningRate 0.0006 Epoch: 13 Global Step: 22790 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:18:45,843-Speed 9422.01 samples/sec Loss 2.3454 LearningRate 0.0006 Epoch: 13 Global Step: 22800 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:19:11,976-Speed 9404.76 samples/sec Loss 2.3661 LearningRate 0.0006 Epoch: 13 Global Step: 22810 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:19:38,053-Speed 9424.73 samples/sec Loss 2.3442 LearningRate 0.0006 Epoch: 13 Global Step: 22820 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:20:04,265-Speed 9376.42 samples/sec Loss 2.3227 LearningRate 0.0006 Epoch: 13 Global Step: 22830 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:20:30,398-Speed 9404.56 samples/sec Loss 2.3297 LearningRate 0.0006 Epoch: 13 Global Step: 22840 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:20:56,582-Speed 9386.37 samples/sec Loss 2.3357 LearningRate 0.0006 Epoch: 13 Global Step: 22850 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:21:22,754-Speed 9390.58 samples/sec Loss 2.3415 LearningRate 0.0006 Epoch: 13 Global Step: 22860 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:21:48,950-Speed 9381.95 samples/sec Loss 2.3508 LearningRate 0.0006 Epoch: 13 Global Step: 22870 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:22:15,043-Speed 9419.07 samples/sec Loss 2.3399 LearningRate 0.0006 Epoch: 13 Global Step: 22880 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:22:41,136-Speed 9419.80 samples/sec Loss 2.3179 LearningRate 0.0006 Epoch: 13 Global Step: 22890 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:23:07,258-Speed 9408.45 samples/sec Loss 2.3266 LearningRate 0.0006 Epoch: 13 Global Step: 22900 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:23:33,431-Speed 9390.46 samples/sec Loss 2.3199 LearningRate 0.0006 Epoch: 13 Global Step: 22910 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:23:59,637-Speed 9378.45 samples/sec Loss 2.3235 LearningRate 0.0006 Epoch: 13 Global Step: 22920 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:24:25,766-Speed 9406.23 samples/sec Loss 2.3264 LearningRate 0.0006 Epoch: 13 Global Step: 22930 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:24:51,831-Speed 9429.50 samples/sec Loss 2.3210 LearningRate 0.0006 Epoch: 13 Global Step: 22940 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:25:18,014-Speed 9386.80 samples/sec Loss 2.3228 LearningRate 0.0006 Epoch: 13 Global Step: 22950 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:25:44,188-Speed 9389.60 samples/sec Loss 2.3308 LearningRate 0.0006 Epoch: 13 Global Step: 22960 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:26:10,364-Speed 9389.22 samples/sec Loss 2.3275 LearningRate 0.0006 Epoch: 13 Global Step: 22970 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:26:36,500-Speed 9403.98 samples/sec Loss 2.3398 LearningRate 0.0006 Epoch: 13 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:27:02,627-Speed 9407.66 samples/sec Loss 2.3270 LearningRate 0.0005 Epoch: 13 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:27:28,799-Speed 9390.65 samples/sec Loss 2.3190 LearningRate 0.0005 Epoch: 13 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:27:54,912-Speed 9411.59 samples/sec Loss 2.3224 LearningRate 0.0005 Epoch: 13 Global Step: 23010 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:28:21,074-Speed 9394.25 samples/sec Loss 2.3103 LearningRate 0.0005 Epoch: 13 Global Step: 23020 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:28:47,169-Speed 9418.50 samples/sec Loss 2.3034 LearningRate 0.0005 Epoch: 13 Global Step: 23030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:29:13,280-Speed 9412.47 samples/sec Loss 2.3250 LearningRate 0.0005 Epoch: 13 Global Step: 23040 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:29:39,370-Speed 9420.17 samples/sec Loss 2.3173 LearningRate 0.0005 Epoch: 13 Global Step: 23050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:30:05,458-Speed 9421.08 samples/sec Loss 2.3097 LearningRate 0.0005 Epoch: 13 Global Step: 23060 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:30:31,613-Speed 9396.91 samples/sec Loss 2.3160 LearningRate 0.0005 Epoch: 13 Global Step: 23070 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:30:57,754-Speed 9401.74 samples/sec Loss 2.3177 LearningRate 0.0005 Epoch: 13 Global Step: 23080 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:31:23,880-Speed 9406.96 samples/sec Loss 2.3183 LearningRate 0.0005 Epoch: 13 Global Step: 23090 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:31:50,069-Speed 9384.38 samples/sec Loss 2.3111 LearningRate 0.0005 Epoch: 13 Global Step: 23100 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:32:16,166-Speed 9417.72 samples/sec Loss 2.3039 LearningRate 0.0005 Epoch: 13 Global Step: 23110 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:32:42,219-Speed 9433.26 samples/sec Loss 2.3050 LearningRate 0.0005 Epoch: 13 Global Step: 23120 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:33:08,352-Speed 9404.89 samples/sec Loss 2.2989 LearningRate 0.0005 Epoch: 13 Global Step: 23130 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:33:34,559-Speed 9377.82 samples/sec Loss 2.2983 LearningRate 0.0005 Epoch: 13 Global Step: 23140 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:34:00,738-Speed 9388.28 samples/sec Loss 2.3189 LearningRate 0.0005 Epoch: 13 Global Step: 23150 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:34:26,927-Speed 9384.72 samples/sec Loss 2.3013 LearningRate 0.0005 Epoch: 13 Global Step: 23160 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-05 13:34:53,076-Speed 9398.78 samples/sec Loss 2.2926 LearningRate 0.0005 Epoch: 13 Global Step: 23170 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:35:19,182-Speed 9414.36 samples/sec Loss 2.3038 LearningRate 0.0005 Epoch: 13 Global Step: 23180 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:35:45,282-Speed 9416.61 samples/sec Loss 2.2939 LearningRate 0.0005 Epoch: 13 Global Step: 23190 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:36:11,366-Speed 9422.56 samples/sec Loss 2.3037 LearningRate 0.0005 Epoch: 13 Global Step: 23200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:36:37,411-Speed 9436.49 samples/sec Loss 2.2978 LearningRate 0.0005 Epoch: 13 Global Step: 23210 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:37:03,488-Speed 9424.84 samples/sec Loss 2.3170 LearningRate 0.0005 Epoch: 13 Global Step: 23220 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:37:29,586-Speed 9417.45 samples/sec Loss 2.3149 LearningRate 0.0005 Epoch: 13 Global Step: 23230 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:37:55,666-Speed 9423.80 samples/sec Loss 2.3062 LearningRate 0.0005 Epoch: 13 Global Step: 23240 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:38:21,751-Speed 9421.93 samples/sec Loss 2.3141 LearningRate 0.0005 Epoch: 13 Global Step: 23250 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-05 13:38:47,942-Speed 9384.20 samples/sec Loss 2.3009 LearningRate 0.0005 Epoch: 13 Global Step: 23260 Fp16 Grad Scale: 65536 Required: 34 hours
Training:
2022-03-05 13:39:14,080-Speed 9402.73 samples/sec Loss 2.3046 LearningRate 0.0005 Epoch: 13 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-03-05 13:39:40,247-Speed 9392.58 samples/sec Loss 2.2828 LearningRate 0.0005 Epoch: 13 Global Step: 23280 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:40:06,411-Speed 9393.85 samples/sec Loss 2.2858 LearningRate 0.0005 Epoch: 13 Global Step: 23290 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:40:32,531-Speed 9409.23 samples/sec Loss 2.2944 LearningRate 0.0005 Epoch: 13 Global Step: 23300 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:40:58,596-Speed 9429.10 samples/sec Loss 2.3020 LearningRate 0.0005 Epoch: 13 Global Step: 23310 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:41:24,823-Speed 9371.14 samples/sec Loss 2.2843 LearningRate 0.0005 Epoch: 13 Global Step: 23320 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:41:51,096-Speed 9354.45 samples/sec Loss 2.2932 LearningRate 0.0005 Epoch: 13 Global Step: 23330 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:42:17,289-Speed 9383.09 samples/sec Loss 2.2837 LearningRate 0.0005 Epoch: 13 Global Step: 23340 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:42:43,378-Speed 9420.45 samples/sec Loss 2.2800 LearningRate 0.0005 Epoch: 13 Global Step: 23350 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:43:09,552-Speed 9389.92 samples/sec Loss 2.3006 LearningRate 0.0005 Epoch: 13 Global Step: 23360 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:43:35,703-Speed 9398.29 samples/sec Loss 2.2966 LearningRate 0.0005 Epoch: 13 Global Step: 23370 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:44:01,852-Speed 9398.59 samples/sec Loss 2.2791 LearningRate 0.0005 Epoch: 13 Global Step: 23380 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:44:28,004-Speed 9398.09 
samples/sec Loss 2.2970 LearningRate 0.0005 Epoch: 13 Global Step: 23390 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:44:54,113-Speed 9413.07 samples/sec Loss 2.2735 LearningRate 0.0005 Epoch: 13 Global Step: 23400 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:45:20,279-Speed 9392.78 samples/sec Loss 2.2746 LearningRate 0.0005 Epoch: 13 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:45:46,373-Speed 9419.13 samples/sec Loss 2.2777 LearningRate 0.0005 Epoch: 13 Global Step: 23420 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:46:12,508-Speed 9403.71 samples/sec Loss 2.2667 LearningRate 0.0005 Epoch: 13 Global Step: 23430 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:46:38,607-Speed 9416.90 samples/sec Loss 2.2887 LearningRate 0.0005 Epoch: 13 Global Step: 23440 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:47:04,686-Speed 9423.87 samples/sec Loss 2.2971 LearningRate 0.0005 Epoch: 13 Global Step: 23450 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:47:30,780-Speed 9418.82 samples/sec Loss 2.2782 LearningRate 0.0005 Epoch: 13 Global Step: 23460 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:47:56,965-Speed 9386.25 samples/sec Loss 2.2926 LearningRate 0.0005 Epoch: 13 Global Step: 23470 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:48:23,071-Speed 9414.33 samples/sec Loss 2.2779 LearningRate 0.0005 Epoch: 13 Global Step: 23480 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:48:49,159-Speed 9420.88 samples/sec Loss 2.2582 LearningRate 0.0005 Epoch: 13 Global Step: 23490 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:49:15,287-Speed 9406.35 samples/sec Loss 2.2631 LearningRate 0.0005 Epoch: 13 Global Step: 23500 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:49:41,411-Speed 9407.91 samples/sec Loss 2.2652 LearningRate 0.0005 
Epoch: 13 Global Step: 23510 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-03-05 13:50:07,594-Speed 9387.46 samples/sec Loss 2.2682 LearningRate 0.0005 Epoch: 13 Global Step: 23520 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:50:33,746-Speed 9397.59 samples/sec Loss 2.2819 LearningRate 0.0005 Epoch: 13 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:50:59,823-Speed 9425.11 samples/sec Loss 2.2716 LearningRate 0.0005 Epoch: 13 Global Step: 23540 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:51:26,032-Speed 9377.06 samples/sec Loss 2.2710 LearningRate 0.0005 Epoch: 13 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:51:52,129-Speed 9417.92 samples/sec Loss 2.2697 LearningRate 0.0005 Epoch: 13 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:52:18,196-Speed 9429.16 samples/sec Loss 2.2902 LearningRate 0.0005 Epoch: 13 Global Step: 23570 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:52:44,364-Speed 9392.05 samples/sec Loss 2.2685 LearningRate 0.0005 Epoch: 13 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:53:10,536-Speed 9390.81 samples/sec Loss 2.2992 LearningRate 0.0005 Epoch: 13 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:53:36,717-Speed 9387.24 samples/sec Loss 2.2769 LearningRate 0.0005 Epoch: 13 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:54:03,072-Speed 9325.25 samples/sec Loss 2.2681 LearningRate 0.0005 Epoch: 13 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-03-05 13:54:29,316-Speed 9365.03 samples/sec Loss 2.2614 LearningRate 0.0005 Epoch: 13 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-05 13:54:55,649-Speed 9333.25 samples/sec Loss 2.2614 LearningRate 0.0005 Epoch: 13 Global Step: 23630 Fp16 Grad 
Scale: 131072 Required: 33 hours Training: 2022-03-05 13:55:21,973-Speed 9336.65 samples/sec Loss 2.2765 LearningRate 0.0005 Epoch: 13 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-05 13:55:48,195-Speed 9372.45 samples/sec Loss 2.2771 LearningRate 0.0005 Epoch: 13 Global Step: 23650 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:56:14,563-Speed 9321.04 samples/sec Loss 2.2779 LearningRate 0.0005 Epoch: 13 Global Step: 23660 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:56:40,870-Speed 9342.53 samples/sec Loss 2.2652 LearningRate 0.0005 Epoch: 13 Global Step: 23670 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:57:07,156-Speed 9349.73 samples/sec Loss 2.2783 LearningRate 0.0005 Epoch: 13 Global Step: 23680 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:57:33,350-Speed 9382.59 samples/sec Loss 2.2613 LearningRate 0.0005 Epoch: 13 Global Step: 23690 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:57:59,563-Speed 9376.22 samples/sec Loss 2.2678 LearningRate 0.0005 Epoch: 13 Global Step: 23700 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:58:25,844-Speed 9351.80 samples/sec Loss 2.2634 LearningRate 0.0005 Epoch: 13 Global Step: 23710 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:58:52,196-Speed 9326.17 samples/sec Loss 2.2511 LearningRate 0.0005 Epoch: 13 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:59:18,430-Speed 9368.46 samples/sec Loss 2.2715 LearningRate 0.0005 Epoch: 13 Global Step: 23730 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 13:59:44,727-Speed 9346.05 samples/sec Loss 2.2700 LearningRate 0.0005 Epoch: 13 Global Step: 23740 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:00:11,000-Speed 9354.44 samples/sec Loss 2.2436 LearningRate 0.0005 Epoch: 13 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 33 hours 
Training: 2022-03-05 14:00:37,236-Speed 9368.01 samples/sec Loss 2.2612 LearningRate 0.0005 Epoch: 13 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-05 14:01:03,424-Speed 9384.60 samples/sec Loss 2.2385 LearningRate 0.0005 Epoch: 13 Global Step: 23770 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:01:29,651-Speed 9370.95 samples/sec Loss 2.2444 LearningRate 0.0005 Epoch: 13 Global Step: 23780 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:01:55,840-Speed 9385.24 samples/sec Loss 2.2613 LearningRate 0.0005 Epoch: 13 Global Step: 23790 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:02:22,077-Speed 9367.42 samples/sec Loss 2.2544 LearningRate 0.0005 Epoch: 13 Global Step: 23800 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:02:48,248-Speed 9390.84 samples/sec Loss 2.2603 LearningRate 0.0005 Epoch: 13 Global Step: 23810 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:03:14,447-Speed 9380.84 samples/sec Loss 2.2663 LearningRate 0.0005 Epoch: 13 Global Step: 23820 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:03:40,565-Speed 9410.08 samples/sec Loss 2.2437 LearningRate 0.0005 Epoch: 13 Global Step: 23830 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:04:06,809-Speed 9364.81 samples/sec Loss 2.2454 LearningRate 0.0005 Epoch: 13 Global Step: 23840 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:04:32,989-Speed 9387.95 samples/sec Loss 2.2443 LearningRate 0.0005 Epoch: 13 Global Step: 23850 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:04:59,048-Speed 9431.12 samples/sec Loss 2.2651 LearningRate 0.0005 Epoch: 13 Global Step: 23860 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:05:25,116-Speed 9428.00 samples/sec Loss 2.2632 LearningRate 0.0005 Epoch: 13 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:05:51,221-Speed 
9414.97 samples/sec Loss 2.2398 LearningRate 0.0005 Epoch: 13 Global Step: 23880 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:06:17,426-Speed 9378.77 samples/sec Loss 2.2456 LearningRate 0.0005 Epoch: 13 Global Step: 23890 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:06:43,571-Speed 9400.31 samples/sec Loss 2.2355 LearningRate 0.0005 Epoch: 13 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:07:09,633-Speed 9430.17 samples/sec Loss 2.2468 LearningRate 0.0005 Epoch: 13 Global Step: 23910 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:07:35,724-Speed 9419.51 samples/sec Loss 2.2467 LearningRate 0.0005 Epoch: 13 Global Step: 23920 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:08:01,896-Speed 9390.96 samples/sec Loss 2.2437 LearningRate 0.0005 Epoch: 13 Global Step: 23930 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:08:28,037-Speed 9401.86 samples/sec Loss 2.2555 LearningRate 0.0005 Epoch: 13 Global Step: 23940 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:08:54,159-Speed 9408.72 samples/sec Loss 2.2430 LearningRate 0.0005 Epoch: 13 Global Step: 23950 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:09:20,241-Speed 9422.89 samples/sec Loss 2.2384 LearningRate 0.0005 Epoch: 13 Global Step: 23960 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:09:46,352-Speed 9412.47 samples/sec Loss 2.2307 LearningRate 0.0005 Epoch: 13 Global Step: 23970 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:10:12,532-Speed 9387.85 samples/sec Loss 2.2412 LearningRate 0.0005 Epoch: 13 Global Step: 23980 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:10:38,589-Speed 9432.22 samples/sec Loss 2.2532 LearningRate 0.0005 Epoch: 13 Global Step: 23990 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:11:04,724-Speed 9403.89 samples/sec Loss 2.2319 
LearningRate 0.0005 Epoch: 13 Global Step: 24000 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:11:30,870-Speed 9399.99 samples/sec Loss 2.2558 LearningRate 0.0005 Epoch: 13 Global Step: 24010 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:11:56,941-Speed 9426.66 samples/sec Loss 2.2425 LearningRate 0.0005 Epoch: 13 Global Step: 24020 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:12:22,998-Speed 9432.30 samples/sec Loss 2.2317 LearningRate 0.0005 Epoch: 13 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:12:49,194-Speed 9381.77 samples/sec Loss 2.2290 LearningRate 0.0005 Epoch: 13 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:13:15,329-Speed 9403.69 samples/sec Loss 2.2468 LearningRate 0.0005 Epoch: 13 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:13:41,468-Speed 9402.73 samples/sec Loss 2.2416 LearningRate 0.0005 Epoch: 13 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:14:07,536-Speed 9428.01 samples/sec Loss 2.2375 LearningRate 0.0005 Epoch: 13 Global Step: 24070 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:14:33,645-Speed 9413.51 samples/sec Loss 2.2415 LearningRate 0.0005 Epoch: 13 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:14:59,804-Speed 9394.93 samples/sec Loss 2.2426 LearningRate 0.0005 Epoch: 13 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:15:25,981-Speed 9388.74 samples/sec Loss 2.2399 LearningRate 0.0005 Epoch: 13 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:15:52,070-Speed 9420.55 samples/sec Loss 2.2365 LearningRate 0.0005 Epoch: 13 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-05 14:16:18,217-Speed 9399.56 samples/sec Loss 2.2474 LearningRate 0.0005 Epoch: 13 Global Step: 
24120 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-05 14:16:44,307-Speed 9420.21 samples/sec Loss 2.2249 LearningRate 0.0005 Epoch: 13 Global Step: 24130 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:17:10,374-Speed 9428.88 samples/sec Loss 2.2584 LearningRate 0.0005 Epoch: 13 Global Step: 24140 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:17:36,399-Speed 9443.60 samples/sec Loss 2.2329 LearningRate 0.0005 Epoch: 13 Global Step: 24150 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:18:02,472-Speed 9426.03 samples/sec Loss 2.2637 LearningRate 0.0005 Epoch: 13 Global Step: 24160 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:18:28,563-Speed 9419.91 samples/sec Loss 2.2593 LearningRate 0.0005 Epoch: 13 Global Step: 24170 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:18:54,646-Speed 9422.40 samples/sec Loss 2.2717 LearningRate 0.0005 Epoch: 13 Global Step: 24180 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:19:20,706-Speed 9431.34 samples/sec Loss 2.2575 LearningRate 0.0005 Epoch: 13 Global Step: 24190 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:20:38,883-Speed 3143.68 samples/sec Loss 2.2348 LearningRate 0.0005 Epoch: 14 Global Step: 24200 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:21:04,737-Speed 9506.12 samples/sec Loss 2.1885 LearningRate 0.0005 Epoch: 14 Global Step: 24210 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:21:30,706-Speed 9463.89 samples/sec Loss 2.2159 LearningRate 0.0005 Epoch: 14 Global Step: 24220 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:21:56,707-Speed 9452.46 samples/sec Loss 2.2135 LearningRate 0.0005 Epoch: 14 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-05 14:22:22,652-Speed 9472.83 samples/sec Loss 2.2177 LearningRate 0.0005 Epoch: 14 Global Step: 24240 Fp16 Grad Scale: 65536 Required: 33 
hours Training: 2022-03-05 14:22:48,640-Speed 9457.00 samples/sec Loss 2.2006 LearningRate 0.0005 Epoch: 14 Global Step: 24250 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:23:14,620-Speed 9460.35 samples/sec Loss 2.1888 LearningRate 0.0005 Epoch: 14 Global Step: 24260 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:23:40,592-Speed 9462.90 samples/sec Loss 2.1943 LearningRate 0.0005 Epoch: 14 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:24:06,541-Speed 9471.27 samples/sec Loss 2.2248 LearningRate 0.0005 Epoch: 14 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:24:32,589-Speed 9435.48 samples/sec Loss 2.2197 LearningRate 0.0005 Epoch: 14 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:24:58,633-Speed 9436.59 samples/sec Loss 2.2053 LearningRate 0.0005 Epoch: 14 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:25:24,635-Speed 9452.15 samples/sec Loss 2.2037 LearningRate 0.0005 Epoch: 14 Global Step: 24310 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:25:50,665-Speed 9442.09 samples/sec Loss 2.1874 LearningRate 0.0005 Epoch: 14 Global Step: 24320 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:26:16,683-Speed 9446.12 samples/sec Loss 2.2006 LearningRate 0.0005 Epoch: 14 Global Step: 24330 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:26:42,730-Speed 9435.49 samples/sec Loss 2.2506 LearningRate 0.0005 Epoch: 14 Global Step: 24340 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:27:08,804-Speed 9425.89 samples/sec Loss 2.2244 LearningRate 0.0005 Epoch: 14 Global Step: 24350 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:27:34,846-Speed 9437.52 samples/sec Loss 2.2029 LearningRate 0.0005 Epoch: 14 Global Step: 24360 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 
14:28:00,948-Speed 9415.61 samples/sec Loss 2.2316 LearningRate 0.0005 Epoch: 14 Global Step: 24370 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:28:27,029-Speed 9423.89 samples/sec Loss 2.2109 LearningRate 0.0005 Epoch: 14 Global Step: 24380 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:28:53,191-Speed 9394.06 samples/sec Loss 2.1909 LearningRate 0.0005 Epoch: 14 Global Step: 24390 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:29:19,370-Speed 9387.91 samples/sec Loss 2.1837 LearningRate 0.0005 Epoch: 14 Global Step: 24400 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:29:45,471-Speed 9416.43 samples/sec Loss 2.1840 LearningRate 0.0005 Epoch: 14 Global Step: 24410 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:30:11,576-Speed 9414.80 samples/sec Loss 2.2107 LearningRate 0.0005 Epoch: 14 Global Step: 24420 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:30:37,798-Speed 9372.66 samples/sec Loss 2.1989 LearningRate 0.0005 Epoch: 14 Global Step: 24430 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:31:03,907-Speed 9412.98 samples/sec Loss 2.2030 LearningRate 0.0005 Epoch: 14 Global Step: 24440 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:31:30,026-Speed 9409.75 samples/sec Loss 2.2154 LearningRate 0.0005 Epoch: 14 Global Step: 24450 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:31:56,052-Speed 9443.51 samples/sec Loss 2.2150 LearningRate 0.0005 Epoch: 14 Global Step: 24460 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:32:22,099-Speed 9435.53 samples/sec Loss 2.2106 LearningRate 0.0005 Epoch: 14 Global Step: 24470 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:32:48,230-Speed 9405.88 samples/sec Loss 2.2193 LearningRate 0.0005 Epoch: 14 Global Step: 24480 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:33:14,414-Speed 9386.44 samples/sec Loss 
2.2086 LearningRate 0.0005 Epoch: 14 Global Step: 24490 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:33:40,524-Speed 9412.79 samples/sec Loss 2.2012 LearningRate 0.0005 Epoch: 14 Global Step: 24500 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:34:06,616-Speed 9419.27 samples/sec Loss 2.1989 LearningRate 0.0005 Epoch: 14 Global Step: 24510 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:34:32,653-Speed 9439.69 samples/sec Loss 2.1946 LearningRate 0.0005 Epoch: 14 Global Step: 24520 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:34:58,760-Speed 9414.09 samples/sec Loss 2.1992 LearningRate 0.0005 Epoch: 14 Global Step: 24530 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:35:24,779-Speed 9445.68 samples/sec Loss 2.1876 LearningRate 0.0005 Epoch: 14 Global Step: 24540 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:35:50,933-Speed 9397.23 samples/sec Loss 2.1849 LearningRate 0.0005 Epoch: 14 Global Step: 24550 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:36:17,040-Speed 9414.08 samples/sec Loss 2.2051 LearningRate 0.0005 Epoch: 14 Global Step: 24560 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:36:43,013-Speed 9462.94 samples/sec Loss 2.2226 LearningRate 0.0005 Epoch: 14 Global Step: 24570 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:37:09,075-Speed 9430.27 samples/sec Loss 2.2351 LearningRate 0.0005 Epoch: 14 Global Step: 24580 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:37:35,232-Speed 9396.06 samples/sec Loss 2.2155 LearningRate 0.0005 Epoch: 14 Global Step: 24590 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:38:01,391-Speed 9395.18 samples/sec Loss 2.2105 LearningRate 0.0005 Epoch: 14 Global Step: 24600 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:38:27,476-Speed 9421.88 samples/sec Loss 2.2012 LearningRate 0.0005 Epoch: 14 Global 
Step: 24610 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:38:53,631-Speed 9397.03 samples/sec Loss 2.2046 LearningRate 0.0005 Epoch: 14 Global Step: 24620 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:39:19,769-Speed 9403.12 samples/sec Loss 2.1929 LearningRate 0.0005 Epoch: 14 Global Step: 24630 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:39:45,959-Speed 9384.01 samples/sec Loss 2.1855 LearningRate 0.0005 Epoch: 14 Global Step: 24640 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:40:12,143-Speed 9386.19 samples/sec Loss 2.1908 LearningRate 0.0005 Epoch: 14 Global Step: 24650 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:40:38,357-Speed 9375.52 samples/sec Loss 2.1939 LearningRate 0.0005 Epoch: 14 Global Step: 24660 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:41:04,493-Speed 9403.79 samples/sec Loss 2.2024 LearningRate 0.0005 Epoch: 14 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:41:30,738-Speed 9364.48 samples/sec Loss 2.1909 LearningRate 0.0005 Epoch: 14 Global Step: 24680 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:41:56,956-Speed 9374.20 samples/sec Loss 2.1838 LearningRate 0.0005 Epoch: 14 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:42:23,118-Speed 9394.12 samples/sec Loss 2.1746 LearningRate 0.0005 Epoch: 14 Global Step: 24700 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:42:49,268-Speed 9398.81 samples/sec Loss 2.1685 LearningRate 0.0005 Epoch: 14 Global Step: 24710 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:43:15,459-Speed 9383.97 samples/sec Loss 2.1978 LearningRate 0.0005 Epoch: 14 Global Step: 24720 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:43:41,599-Speed 9402.12 samples/sec Loss 2.1821 LearningRate 0.0005 Epoch: 14 Global Step: 24730 Fp16 Grad Scale: 32768 
Required: 33 hours Training: 2022-03-05 14:44:07,758-Speed 9395.08 samples/sec Loss 2.1972 LearningRate 0.0005 Epoch: 14 Global Step: 24740 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:44:33,949-Speed 9383.90 samples/sec Loss 2.1912 LearningRate 0.0005 Epoch: 14 Global Step: 24750 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:45:00,049-Speed 9416.74 samples/sec Loss 2.1707 LearningRate 0.0005 Epoch: 14 Global Step: 24760 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:45:26,211-Speed 9395.45 samples/sec Loss 2.1787 LearningRate 0.0005 Epoch: 14 Global Step: 24770 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:45:52,360-Speed 9399.06 samples/sec Loss 2.1979 LearningRate 0.0005 Epoch: 14 Global Step: 24780 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:46:18,491-Speed 9405.13 samples/sec Loss 2.1820 LearningRate 0.0005 Epoch: 14 Global Step: 24790 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-05 14:46:44,620-Speed 9406.20 samples/sec Loss 2.1811 LearningRate 0.0005 Epoch: 14 Global Step: 24800 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:47:10,734-Speed 9411.46 samples/sec Loss 2.1648 LearningRate 0.0005 Epoch: 14 Global Step: 24810 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:47:37,037-Speed 9344.36 samples/sec Loss 2.1704 LearningRate 0.0005 Epoch: 14 Global Step: 24820 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:48:03,312-Speed 9353.67 samples/sec Loss 2.1713 LearningRate 0.0005 Epoch: 14 Global Step: 24830 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:48:29,579-Speed 9356.88 samples/sec Loss 2.1759 LearningRate 0.0005 Epoch: 14 Global Step: 24840 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:48:55,713-Speed 9404.39 samples/sec Loss 2.1768 LearningRate 0.0005 Epoch: 14 Global Step: 24850 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 
14:49:21,886-Speed 9390.38 samples/sec Loss 2.1750 LearningRate 0.0005 Epoch: 14 Global Step: 24860 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:49:48,113-Speed 9371.28 samples/sec Loss 2.1762 LearningRate 0.0005 Epoch: 14 Global Step: 24870 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:50:14,301-Speed 9384.66 samples/sec Loss 2.1789 LearningRate 0.0005 Epoch: 14 Global Step: 24880 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:50:40,465-Speed 9393.42 samples/sec Loss 2.1798 LearningRate 0.0005 Epoch: 14 Global Step: 24890 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:51:06,669-Speed 9379.34 samples/sec Loss 2.1924 LearningRate 0.0005 Epoch: 14 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-05 14:51:32,826-Speed 9395.83 samples/sec Loss 2.1747 LearningRate 0.0005 Epoch: 14 Global Step: 24910 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-05 14:51:58,955-Speed 9406.18 samples/sec Loss 2.1631 LearningRate 0.0005 Epoch: 14 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:52:25,171-Speed 9374.95 samples/sec Loss 2.1705 LearningRate 0.0005 Epoch: 14 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:52:51,281-Speed 9412.80 samples/sec Loss 2.1749 LearningRate 0.0005 Epoch: 14 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:53:17,453-Speed 9391.25 samples/sec Loss 2.1969 LearningRate 0.0005 Epoch: 14 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:53:43,672-Speed 9373.82 samples/sec Loss 2.1848 LearningRate 0.0005 Epoch: 14 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:54:09,845-Speed 9390.27 samples/sec Loss 2.1752 LearningRate 0.0005 Epoch: 14 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-05 14:54:36,017-Speed 9390.45 samples/sec 
Loss 2.1698 LearningRate 0.0005 Epoch: 14 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:55:02,183-Speed 9392.82 samples/sec Loss 2.1713 LearningRate 0.0005 Epoch: 14 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:55:28,365-Speed 9387.45 samples/sec Loss 2.1650 LearningRate 0.0005 Epoch: 14 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:55:54,472-Speed 9414.34 samples/sec Loss 2.1598 LearningRate 0.0005 Epoch: 14 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:56:20,630-Speed 9395.66 samples/sec Loss 2.1634 LearningRate 0.0005 Epoch: 14 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 14:56:46,792-Speed 9393.96 samples/sec Loss 2.1524 LearningRate 0.0005 Epoch: 14 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 14:57:12,858-Speed 9429.09 samples/sec Loss 2.1556 LearningRate 0.0005 Epoch: 14 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:57:39,006-Speed 9399.63 samples/sec Loss 2.1573 LearningRate 0.0005 Epoch: 14 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:58:05,122-Speed 9410.69 samples/sec Loss 2.1579 LearningRate 0.0005 Epoch: 14 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:58:31,148-Speed 9443.39 samples/sec Loss 2.1452 LearningRate 0.0005 Epoch: 14 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:58:57,351-Speed 9379.43 samples/sec Loss 2.1612 LearningRate 0.0005 Epoch: 14 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:59:23,457-Speed 9414.13 samples/sec Loss 2.1712 LearningRate 0.0005 Epoch: 14 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 14:59:49,551-Speed 9419.08 samples/sec Loss 2.1611 LearningRate 0.0005 Epoch: 14 Global Step: 25100 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:00:15,721-Speed 9391.43 samples/sec Loss 2.1508 LearningRate 0.0005 Epoch: 14 Global Step: 25110 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:00:41,792-Speed 9426.85 samples/sec Loss 2.1496 LearningRate 0.0005 Epoch: 14 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:01:07,937-Speed 9400.15 samples/sec Loss 2.1554 LearningRate 0.0005 Epoch: 14 Global Step: 25130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:01:34,051-Speed 9411.57 samples/sec Loss 2.1622 LearningRate 0.0005 Epoch: 14 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:02:00,071-Speed 9445.80 samples/sec Loss 2.1562 LearningRate 0.0005 Epoch: 14 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:02:26,191-Speed 9409.25 samples/sec Loss 2.1502 LearningRate 0.0005 Epoch: 14 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:02:52,300-Speed 9413.37 samples/sec Loss 2.1657 LearningRate 0.0005 Epoch: 14 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:03:18,373-Speed 9426.20 samples/sec Loss 2.1364 LearningRate 0.0005 Epoch: 14 Global Step: 25180 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:03:44,535-Speed 9394.32 samples/sec Loss 2.1490 LearningRate 0.0005 Epoch: 14 Global Step: 25190 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:04:10,606-Speed 9427.04 samples/sec Loss 2.1369 LearningRate 0.0005 Epoch: 14 Global Step: 25200 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:04:36,747-Speed 9401.72 samples/sec Loss 2.1508 LearningRate 0.0005 Epoch: 14 Global Step: 25210 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:05:02,815-Speed 9427.91 samples/sec Loss 2.1412 LearningRate 0.0005 Epoch: 14 Global Step: 25220 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:05:28,876-Speed 9430.82 samples/sec Loss 2.1487 LearningRate 0.0005 Epoch: 14 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:05:54,968-Speed 9419.35 samples/sec Loss 2.1687 LearningRate 0.0005 Epoch: 14 Global Step: 25240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:06:21,101-Speed 9404.54 samples/sec Loss 2.1457 LearningRate 0.0005 Epoch: 14 Global Step: 25250 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:06:47,232-Speed 9405.25 samples/sec Loss 2.1433 LearningRate 0.0005 Epoch: 14 Global Step: 25260 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:07:13,298-Speed 9428.87 samples/sec Loss 2.1387 LearningRate 0.0005 Epoch: 14 Global Step: 25270 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:07:39,452-Speed 9397.16 samples/sec Loss 2.1361 LearningRate 0.0005 Epoch: 14 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:08:05,591-Speed 9402.44 samples/sec Loss 2.1517 LearningRate 0.0005 Epoch: 14 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:08:31,804-Speed 9375.98 samples/sec Loss 2.1457 LearningRate 0.0005 Epoch: 14 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:08:57,862-Speed 9431.37 samples/sec Loss 2.1395 LearningRate 0.0005 Epoch: 14 Global Step: 25310 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:09:23,947-Speed 9422.05 samples/sec Loss 2.1499 LearningRate 0.0005 Epoch: 14 Global Step: 25320 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:09:50,033-Speed 9421.71 samples/sec Loss 2.1393 LearningRate 0.0005 Epoch: 14 Global Step: 25330 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:10:16,170-Speed 9402.96 samples/sec Loss 2.1460 LearningRate 0.0005 Epoch: 14 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:10:42,314-Speed 9400.79 samples/sec Loss 2.1522 LearningRate 0.0005 Epoch: 14 Global Step: 25350 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:11:08,423-Speed 9413.38 samples/sec Loss 2.1367 LearningRate 0.0005 Epoch: 14 Global Step: 25360 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:11:34,537-Speed 9411.34 samples/sec Loss 2.1352 LearningRate 0.0005 Epoch: 14 Global Step: 25370 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:12:00,575-Speed 9438.76 samples/sec Loss 2.1501 LearningRate 0.0005 Epoch: 14 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:12:26,672-Speed 9417.82 samples/sec Loss 2.1563 LearningRate 0.0005 Epoch: 14 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:12:52,853-Speed 9387.61 samples/sec Loss 2.1359 LearningRate 0.0005 Epoch: 14 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:13:18,924-Speed 9426.81 samples/sec Loss 2.1293 LearningRate 0.0005 Epoch: 14 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:13:45,005-Speed 9423.27 samples/sec Loss 2.1283 LearningRate 0.0005 Epoch: 14 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:14:11,068-Speed 9429.78 samples/sec Loss 2.1229 LearningRate 0.0005 Epoch: 14 Global Step: 25430 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:14:37,257-Speed 9385.17 samples/sec Loss 2.1406 LearningRate 0.0005 Epoch: 14 Global Step: 25440 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:15:03,402-Speed 9400.32 samples/sec Loss 2.1293 LearningRate 0.0005 Epoch: 14 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:15:29,487-Speed 9421.89 samples/sec Loss 2.1242 LearningRate 0.0005 Epoch: 14 Global Step: 25460 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:15:55,556-Speed 9427.99 samples/sec Loss 2.1499 LearningRate 0.0005 Epoch: 14 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:16:21,637-Speed 9423.34 samples/sec Loss 2.1283 LearningRate 0.0005 Epoch: 14 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:16:47,750-Speed 9411.72 samples/sec Loss 2.1338 LearningRate 0.0005 Epoch: 14 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:17:13,896-Speed 9399.90 samples/sec Loss 2.1221 LearningRate 0.0005 Epoch: 14 Global Step: 25500 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:17:39,982-Speed 9421.55 samples/sec Loss 2.1163 LearningRate 0.0005 Epoch: 14 Global Step: 25510 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:18:06,124-Speed 9401.30 samples/sec Loss 2.1378 LearningRate 0.0005 Epoch: 14 Global Step: 25520 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:18:32,186-Speed 9430.46 samples/sec Loss 2.1279 LearningRate 0.0005 Epoch: 14 Global Step: 25530 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:18:58,253-Speed 9428.35 samples/sec Loss 2.1371 LearningRate 0.0005 Epoch: 14 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:19:24,374-Speed 9408.73 samples/sec Loss 2.1313 LearningRate 0.0005 Epoch: 14 Global Step: 25550 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:19:50,479-Speed 9414.66 samples/sec Loss 2.1388 LearningRate 0.0005 Epoch: 14 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:20:16,528-Speed 9435.05 samples/sec Loss 2.1304 LearningRate 0.0005 Epoch: 14 Global Step: 25570 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:20:42,677-Speed 9398.82 samples/sec Loss 2.1274 LearningRate 0.0005 Epoch: 14 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:21:08,819-Speed 9401.49 samples/sec Loss 2.1329 LearningRate 0.0005 Epoch: 14 Global Step: 25590 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:21:34,840-Speed 9445.03 samples/sec Loss 2.1323 LearningRate 0.0005 Epoch: 14 Global Step: 25600 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:22:00,887-Speed 9435.82 samples/sec Loss 2.1045 LearningRate 0.0005 Epoch: 14 Global Step: 25610 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:22:26,991-Speed 9414.92 samples/sec Loss 2.1133 LearningRate 0.0005 Epoch: 14 Global Step: 25620 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:22:53,094-Speed 9415.42 samples/sec Loss 2.1120 LearningRate 0.0005 Epoch: 14 Global Step: 25630 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:23:19,165-Speed 9427.01 samples/sec Loss 2.1074 LearningRate 0.0005 Epoch: 14 Global Step: 25640 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:23:45,253-Speed 9420.89 samples/sec Loss 2.1103 LearningRate 0.0005 Epoch: 14 Global Step: 25650 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:24:11,283-Speed 9441.85 samples/sec Loss 2.1410 LearningRate 0.0005 Epoch: 14 Global Step: 25660 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:24:37,367-Speed 9422.12 samples/sec Loss 2.1167 LearningRate 0.0005 Epoch: 14 Global Step: 25670 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:25:03,551-Speed 9386.40 samples/sec Loss 2.1116 LearningRate 0.0005 Epoch: 14 Global Step: 25680 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:25:29,718-Speed 9392.35 samples/sec Loss 2.1275 LearningRate 0.0005 Epoch: 14 Global Step: 25690 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:25:55,840-Speed 9408.57 samples/sec Loss 2.1394 LearningRate 0.0005 Epoch: 14 Global Step: 25700 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:26:21,968-Speed 9406.36 samples/sec Loss 2.1197 LearningRate 0.0005 Epoch: 14 Global Step: 25710 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:26:48,116-Speed 9399.18 samples/sec Loss 2.1243 LearningRate 0.0005 Epoch: 14 Global Step: 25720 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:27:14,222-Speed 9414.76 samples/sec Loss 2.1075 LearningRate 0.0005 Epoch: 14 Global Step: 25730 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:27:40,315-Speed 9419.38 samples/sec Loss 2.1267 LearningRate 0.0005 Epoch: 14 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:28:06,416-Speed 9417.35 samples/sec Loss 2.1149 LearningRate 0.0005 Epoch: 14 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:28:32,567-Speed 9398.05 samples/sec Loss 2.1213 LearningRate 0.0005 Epoch: 14 Global Step: 25760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:28:58,567-Speed 9452.97 samples/sec Loss 2.1242 LearningRate 0.0005 Epoch: 14 Global Step: 25770 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:29:24,633-Speed 9428.50 samples/sec Loss 2.1094 LearningRate 0.0005 Epoch: 14 Global Step: 25780 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:29:50,734-Speed 9416.33 samples/sec Loss 2.1114 LearningRate 0.0005 Epoch: 14 Global Step: 25790 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:30:16,768-Speed 9440.51 samples/sec Loss 2.0963 LearningRate 0.0005 Epoch: 14 Global Step: 25800 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:30:42,906-Speed 9402.65 samples/sec Loss 2.1167 LearningRate 0.0005 Epoch: 14 Global Step: 25810 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:31:09,065-Speed 9395.15 samples/sec Loss 2.1147 LearningRate 0.0005 Epoch: 14 Global Step: 25820 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:31:35,147-Speed 9423.16 samples/sec Loss 2.1142 LearningRate 0.0005 Epoch: 14 Global Step: 25830 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:32:01,302-Speed 9396.58 samples/sec Loss 2.1072 LearningRate 0.0005 Epoch: 14 Global Step: 25840 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:32:27,421-Speed 9409.90 samples/sec Loss 2.1023 LearningRate 0.0005 Epoch: 14 Global Step: 25850 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:32:53,548-Speed 9406.78 samples/sec Loss 2.1220 LearningRate 0.0005 Epoch: 14 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:33:19,651-Speed 9415.41 samples/sec Loss 2.1052 LearningRate 0.0005 Epoch: 14 Global Step: 25870 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:33:45,673-Speed 9444.93 samples/sec Loss 2.1180 LearningRate 0.0005 Epoch: 14 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:34:11,723-Speed 9434.42 samples/sec Loss 2.1347 LearningRate 0.0005 Epoch: 14 Global Step: 25890 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:34:37,754-Speed 9441.63 samples/sec Loss 2.1324 LearningRate 0.0005 Epoch: 14 Global Step: 25900 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:35:03,820-Speed 9428.61 samples/sec Loss 2.1267 LearningRate 0.0005 Epoch: 14 Global Step: 25910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:35:29,875-Speed 9432.93 samples/sec Loss 2.1499 LearningRate 0.0005 Epoch: 14 Global Step: 25920 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:36:47,735-Speed 3156.49 samples/sec Loss 2.1032 LearningRate 0.0005 Epoch: 15 Global Step: 25930 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:37:13,655-Speed 9482.09 samples/sec Loss 2.0739 LearningRate 0.0005 Epoch: 15 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:37:39,660-Speed 9451.05 samples/sec Loss 2.0663 LearningRate 0.0005 Epoch: 15 Global Step: 25950 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:38:05,782-Speed 9408.71 samples/sec Loss 2.0801 LearningRate 0.0005 Epoch: 15 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:38:31,815-Speed 9440.60 samples/sec Loss 2.0872 LearningRate 0.0005 Epoch: 15 Global Step: 25970 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:38:57,824-Speed 9449.51 samples/sec Loss 2.0945 LearningRate 0.0005 Epoch: 15 Global Step: 25980 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:39:23,806-Speed 9459.68 samples/sec Loss 2.0631 LearningRate 0.0005 Epoch: 15 Global Step: 25990 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:39:49,810-Speed 9451.37 samples/sec Loss 2.0824 LearningRate 0.0005 Epoch: 15 Global Step: 26000 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:40:15,897-Speed 9421.19 samples/sec Loss 2.1073 LearningRate 0.0005 Epoch: 15 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:40:42,021-Speed 9408.16 samples/sec Loss 2.0832 LearningRate 0.0005 Epoch: 15 Global Step: 26020 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-03-05 15:41:08,021-Speed 9452.51 samples/sec Loss 2.0846 LearningRate 0.0005 Epoch: 15 Global Step: 26030 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:41:34,060-Speed 9438.81 samples/sec Loss 2.0900 LearningRate 0.0005 Epoch: 15 Global Step: 26040 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:42:00,139-Speed 9424.27 samples/sec Loss 2.0809 LearningRate 0.0005 Epoch: 15 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:42:26,175-Speed 9439.43 samples/sec Loss 2.0732 LearningRate 0.0005 Epoch: 15 Global Step: 26060 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:42:52,168-Speed 9455.37 samples/sec Loss 2.0812 LearningRate 0.0005 Epoch: 15 Global Step: 26070 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:43:18,155-Speed 9457.54 samples/sec Loss 2.0723 LearningRate 0.0005 Epoch: 15 Global Step: 26080 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:43:44,191-Speed 9439.27 samples/sec Loss 2.0955 LearningRate 0.0005 Epoch: 15 Global Step: 26090 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:44:10,276-Speed 9422.24 samples/sec Loss 2.1377 LearningRate 0.0005 Epoch: 15 Global Step: 26100 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:44:36,383-Speed 9414.12 samples/sec Loss 2.0953 LearningRate 0.0005 Epoch: 15 Global Step: 26110 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:45:02,533-Speed 9398.61 samples/sec Loss 2.0814 LearningRate 0.0005 Epoch: 15 Global Step: 26120 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:45:28,626-Speed 9419.12 samples/sec Loss 2.0717 LearningRate 0.0005 Epoch: 15 Global Step: 26130 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:45:54,790-Speed 9393.40 samples/sec Loss 2.0804 LearningRate 0.0005 Epoch: 15 Global Step: 26140 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:46:20,853-Speed 9429.83 samples/sec Loss 2.0720 LearningRate 0.0005 Epoch: 15 Global Step: 26150 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:46:46,931-Speed 9424.60 samples/sec Loss 2.0898 LearningRate 0.0005 Epoch: 15 Global Step: 26160 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:47:13,044-Speed 9411.85 samples/sec Loss 2.0959 LearningRate 0.0005 Epoch: 15 Global Step: 26170 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-03-05 15:47:39,088-Speed 9436.73 samples/sec Loss 2.0878 LearningRate 0.0005 Epoch: 15 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:48:05,268-Speed 9387.62 samples/sec Loss 2.0785 LearningRate 0.0005 Epoch: 15 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:48:31,401-Speed 9404.68 samples/sec Loss 2.0763 LearningRate 0.0005 Epoch: 15 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:48:57,516-Speed 9411.14 samples/sec Loss 2.0684 LearningRate 0.0005 Epoch: 15 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:49:23,613-Speed 9417.77 samples/sec Loss 2.0887 LearningRate 0.0005 Epoch: 15 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:49:49,679-Speed 9428.87 samples/sec Loss 2.0990 LearningRate 0.0005 Epoch: 15 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:50:15,756-Speed 9424.64 samples/sec Loss 2.0850 LearningRate 0.0005 Epoch: 15 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:50:41,841-Speed 9422.26 samples/sec Loss 2.0803 LearningRate 0.0005 Epoch: 15 Global Step: 26250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:51:07,948-Speed 9413.72 samples/sec Loss 2.0813 LearningRate 0.0005 Epoch: 15 Global Step: 26260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:51:33,999-Speed 9434.43 samples/sec Loss 2.0803 LearningRate 0.0005 Epoch: 15 Global Step: 26270 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:52:00,036-Speed 9439.52 samples/sec Loss 2.1145 LearningRate 0.0005 Epoch: 15 Global Step: 26280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:52:26,102-Speed 9428.60 samples/sec Loss 2.0759 LearningRate 0.0005 Epoch: 15 Global Step: 26290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:52:52,272-Speed 9391.21 samples/sec Loss 2.0734 LearningRate 0.0005 Epoch: 15 Global Step: 26300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:53:18,327-Speed 9432.96 samples/sec Loss 2.0579 LearningRate 0.0005 Epoch: 15 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:53:44,469-Speed 9401.39 samples/sec Loss 2.0877 LearningRate 0.0005 Epoch: 15 Global Step: 26320 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:54:10,566-Speed 9417.48 samples/sec Loss 2.0828 LearningRate 0.0005 Epoch: 15 Global Step: 26330 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-03-05 15:54:36,614-Speed 9435.41 samples/sec Loss 2.0871 LearningRate 0.0005 Epoch: 15 Global Step: 26340 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:55:02,694-Speed 9423.76 samples/sec Loss 2.0653 LearningRate 0.0005 Epoch: 15 Global Step: 26350 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:55:28,829-Speed 9403.84 samples/sec Loss 2.0849 LearningRate 0.0005 Epoch: 15 Global Step: 26360 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:55:54,885-Speed 9432.33 samples/sec Loss 2.0599 LearningRate 0.0005 Epoch: 15 Global Step: 26370 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:56:20,938-Speed 9433.39 samples/sec Loss 2.0705 LearningRate 0.0005 Epoch: 15 Global Step: 26380 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:56:46,979-Speed 9437.84 samples/sec Loss 2.0829 LearningRate 0.0005 Epoch: 15 Global Step: 26390 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:57:13,021-Speed 9437.67 samples/sec Loss 2.0730 LearningRate 0.0005 Epoch: 15 Global Step: 26400 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:57:39,097-Speed 9425.19 samples/sec Loss 2.0648 LearningRate 0.0005 Epoch: 15 Global Step: 26410 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:58:05,131-Speed 9440.25 samples/sec Loss 2.0780 LearningRate 0.0005 Epoch: 15 Global Step: 26420 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:58:31,248-Speed 9410.44 samples/sec Loss 2.0732 LearningRate 0.0005 Epoch: 15 Global Step: 26430 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 15:58:57,327-Speed 9423.75 samples/sec Loss 2.0720 LearningRate 0.0005 Epoch: 15 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 15:59:23,382-Speed 9432.99 samples/sec Loss 2.0820 LearningRate 0.0005 Epoch: 15 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 15:59:49,418-Speed 9439.69 samples/sec Loss 2.0690 LearningRate 0.0005 Epoch: 15 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:00:15,491-Speed 9426.26 samples/sec Loss 2.0568 LearningRate 0.0005 Epoch: 15 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:00:41,495-Speed 9451.25 samples/sec Loss 2.0627 LearningRate 0.0005 Epoch: 15 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:01:07,531-Speed 9439.85 samples/sec Loss 2.0713 LearningRate 0.0005 Epoch: 15 Global Step: 26490 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:01:33,618-Speed 9421.13 samples/sec Loss 2.0508 LearningRate 0.0005 Epoch: 15 Global Step: 26500 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:01:59,743-Speed 9407.81 samples/sec Loss 2.0498 LearningRate 0.0005 Epoch: 15 Global Step: 26510 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:02:25,842-Speed 9416.73 samples/sec Loss 2.0577 LearningRate 0.0005 Epoch: 15 Global Step: 26520 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:02:51,961-Speed 9409.96 samples/sec Loss 2.0496 LearningRate 0.0005 Epoch: 15 Global Step: 26530 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:03:18,064-Speed 9415.63 samples/sec Loss 2.0590 LearningRate 0.0005 Epoch: 15 Global Step: 26540 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:03:44,325-Speed 9358.85 samples/sec Loss 2.0489 LearningRate 0.0005 Epoch: 15 Global Step: 26550 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:04:10,377-Speed 9434.00 samples/sec Loss 2.0571 LearningRate 0.0005 Epoch: 15 Global Step: 26560 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:04:36,452-Speed 9425.50 samples/sec Loss 2.0695 LearningRate 0.0005 Epoch: 15 Global Step: 26570 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:05:02,640-Speed 9385.12 samples/sec Loss 2.0600 LearningRate 0.0005 Epoch: 15 Global Step: 26580 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:05:28,686-Speed 9435.96 samples/sec Loss 2.0479 LearningRate 0.0005 Epoch: 15 Global Step: 26590 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:05:54,823-Speed 9403.50 samples/sec Loss 2.0555 LearningRate 0.0005 Epoch: 15 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:06:20,921-Speed 9417.21 samples/sec Loss 2.0610 LearningRate 0.0005 Epoch: 15 Global Step: 26610 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:06:47,015-Speed 9418.89 samples/sec Loss 2.0406 LearningRate 0.0005 Epoch: 15 Global Step: 26620 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:07:13,097-Speed 9422.66 samples/sec Loss 2.0466 LearningRate 0.0005 Epoch: 15 Global Step: 26630 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:07:39,144-Speed 9435.81 samples/sec Loss 2.0501 LearningRate 0.0005 Epoch: 15 Global Step: 26640 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:08:05,304-Speed 9394.79 samples/sec Loss 2.0664 LearningRate 0.0005 Epoch: 15 Global Step: 26650 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:08:31,385-Speed 9423.23 samples/sec Loss 2.0578 LearningRate 0.0005 Epoch: 15 Global Step: 26660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:08:57,441-Speed 9432.58 samples/sec Loss 2.0641 LearningRate 0.0005 Epoch: 15 Global Step: 26670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:09:23,520-Speed 9423.97 samples/sec Loss 2.0560 LearningRate 0.0005 Epoch: 15 Global Step: 26680 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:09:49,577-Speed 9432.26 samples/sec Loss 2.0350 LearningRate 0.0005 Epoch: 15 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-03-05 16:10:15,645-Speed 9428.25 samples/sec Loss 2.0432 LearningRate 0.0005 Epoch: 15 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-03-05 16:10:41,742-Speed 9417.57 samples/sec Loss 2.0558 LearningRate 0.0005 Epoch: 15 Global Step: 26710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:11:07,878-Speed 9403.80 samples/sec Loss 2.0494 LearningRate 0.0005 Epoch: 15 Global Step: 26720 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:11:34,114-Speed 9367.91 samples/sec Loss 2.0412 LearningRate 0.0005 Epoch: 15 Global Step: 26730 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:12:00,229-Speed 9411.18 samples/sec Loss 2.0454 LearningRate 0.0005 Epoch: 15 Global Step: 26740 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:12:26,399-Speed 9391.18 samples/sec Loss 2.0485 LearningRate 0.0005 Epoch: 15 Global Step: 26750 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:12:52,457-Speed 9431.86 samples/sec Loss 2.0225 LearningRate 0.0005 Epoch: 15 Global Step: 26760 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:13:18,574-Speed 9410.36 samples/sec Loss 2.0438 LearningRate 0.0005 Epoch: 15 Global Step: 26770 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:13:44,671-Speed 9417.53 samples/sec Loss 2.0266 LearningRate 0.0005 Epoch: 15 Global Step: 26780 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:14:10,813-Speed 9401.46 samples/sec Loss 2.0305 LearningRate 0.0005 Epoch: 15 Global Step: 26790 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:14:36,926-Speed 9411.87 samples/sec Loss 2.0194 LearningRate 0.0005 Epoch: 15 Global Step: 26800 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:15:03,067-Speed 9402.17 samples/sec Loss 2.0478 LearningRate 0.0005 Epoch: 15 Global Step: 26810 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:15:29,268-Speed 9380.00 samples/sec Loss 2.0243 LearningRate 0.0005 Epoch: 15 Global Step: 26820 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:15:55,388-Speed 9409.33 samples/sec Loss 2.0199 LearningRate 0.0005 Epoch: 15 Global Step: 26830 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:16:21,490-Speed 9415.86 samples/sec Loss 2.0570 LearningRate 0.0005 Epoch: 15 Global Step: 26840 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:16:47,745-Speed 9360.86 samples/sec Loss 2.0300 LearningRate 0.0005 Epoch: 15 Global Step: 26850 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:17:13,909-Speed 9393.63 samples/sec Loss 2.0410 LearningRate 0.0005 Epoch: 15 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:17:39,951-Speed 9437.45 samples/sec Loss 2.0496 LearningRate 0.0005 Epoch: 15 Global Step: 26870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:18:06,031-Speed 9423.83 samples/sec Loss 2.0498 LearningRate 0.0005 Epoch: 15 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:18:32,169-Speed 9402.86 samples/sec Loss 2.0273 LearningRate 0.0005 Epoch: 15 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:18:58,218-Speed 9434.80 samples/sec Loss 2.0306 LearningRate 0.0005 Epoch: 15 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:19:24,372-Speed 9397.14 samples/sec Loss 2.0284 LearningRate 0.0005 Epoch: 15 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:19:50,458-Speed 9421.48 samples/sec Loss 2.0367 LearningRate 0.0005 Epoch: 15 Global Step: 26920 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:20:16,602-Speed 9400.60 samples/sec Loss 2.0367 LearningRate 0.0005 Epoch: 15 Global Step: 26930 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:20:42,743-Speed 9401.54 samples/sec Loss 2.0642 LearningRate 0.0005 Epoch: 15 Global Step: 26940 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:21:08,873-Speed 9405.91 samples/sec Loss 2.0392 LearningRate 0.0005 Epoch: 15 Global Step: 26950 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:21:34,995-Speed 9408.62 samples/sec Loss 2.0390 LearningRate 0.0005 Epoch: 15 Global Step: 26960 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-03-05 16:22:01,069-Speed 9426.10 samples/sec Loss 2.0376 LearningRate 0.0005 Epoch: 15 Global Step: 26970 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:22:27,141-Speed 9426.69 samples/sec Loss 2.0277 LearningRate 0.0005 Epoch: 15 Global Step: 26980 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:22:53,225-Speed 9422.36 samples/sec Loss 2.0307 LearningRate 0.0005 Epoch: 15 Global Step: 26990 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:23:19,324-Speed 9416.89 samples/sec Loss 2.0392 LearningRate 0.0005 Epoch: 15 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:23:45,375-Speed 9434.09 samples/sec Loss 2.0337 LearningRate 0.0005 Epoch: 15 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:24:11,446-Speed 9427.52 samples/sec Loss 2.0322 LearningRate 0.0005 Epoch: 15 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:24:37,483-Speed 9439.15 samples/sec Loss 2.0293 LearningRate 0.0005 Epoch: 15 Global Step: 27030 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:25:03,554-Speed 9427.23 samples/sec Loss 2.0161 LearningRate 0.0005 Epoch: 15 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:25:29,643-Speed 9420.35 samples/sec Loss 2.0196 LearningRate 0.0005 Epoch: 15 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:25:55,810-Speed 9392.41 samples/sec Loss 2.0237 LearningRate 0.0005 Epoch: 15 Global Step: 27060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:26:21,997-Speed 9385.06 samples/sec Loss 2.0331 LearningRate 0.0005 Epoch: 15 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-03-05 16:26:48,135-Speed 9402.82 samples/sec Loss 2.0146 LearningRate 0.0005 Epoch: 15 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-03-05 16:27:14,151-Speed 9446.92 samples/sec Loss 2.0129 LearningRate 0.0005 Epoch: 15 Global Step: 27090 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:27:40,265-Speed 9411.47 samples/sec Loss 2.0098 LearningRate 0.0005 Epoch: 15 Global Step: 27100 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:28:06,345-Speed 9423.79 samples/sec Loss 2.0355 LearningRate 0.0005 Epoch: 15 Global Step: 27110 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:28:32,373-Speed 9442.51 samples/sec Loss 2.0286 LearningRate 0.0005 Epoch: 15 Global Step: 27120 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:28:58,396-Speed 9444.27 samples/sec Loss 2.0202 LearningRate 0.0005 Epoch: 15 Global Step: 27130 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:29:24,563-Speed 9392.33 samples/sec Loss 2.0224 LearningRate 0.0005 Epoch: 15 Global Step: 27140 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:29:50,665-Speed 9416.11 samples/sec Loss 2.0119 LearningRate 0.0005 Epoch: 15 Global Step: 27150 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:30:16,702-Speed 9439.09 samples/sec Loss 2.0387 LearningRate 0.0005 Epoch: 15 Global Step: 27160 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:30:42,782-Speed 9423.88 samples/sec Loss 2.0209 LearningRate 0.0005 Epoch: 15 Global Step: 27170 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:31:08,860-Speed 9424.57 samples/sec Loss 1.9995 LearningRate 0.0005 Epoch: 15 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:31:35,014-Speed 9396.84 samples/sec Loss 2.0126 LearningRate 0.0005 Epoch: 15 Global Step: 27190 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:32:01,195-Speed 9387.83 samples/sec Loss 2.0162 LearningRate 0.0005 Epoch: 15 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:32:27,319-Speed 9407.87 samples/sec Loss 2.0332 LearningRate 0.0005 Epoch: 15 Global Step: 27210 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:32:53,457-Speed 9402.73 samples/sec Loss 2.0290 LearningRate 0.0005 Epoch: 15 Global Step: 27220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:33:19,582-Speed 9407.50 samples/sec Loss 2.0102 LearningRate 0.0005 Epoch: 15 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:33:45,616-Speed 9440.61 samples/sec Loss 2.0160 LearningRate 0.0005 Epoch: 15 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:34:11,674-Speed 9431.84 samples/sec Loss 2.0111 LearningRate 0.0005 Epoch: 15 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:34:37,781-Speed 9413.77 samples/sec Loss 2.0139 LearningRate 0.0005 Epoch: 15 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:35:03,834-Speed 9433.38 samples/sec Loss 2.0235 LearningRate 0.0005 Epoch: 15 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:35:29,957-Speed 9408.07 samples/sec Loss 2.0215 LearningRate 0.0005 Epoch: 15 Global Step: 27280 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:35:56,100-Speed 9401.12 samples/sec Loss 2.0173 LearningRate 0.0005 Epoch: 15 Global Step: 27290 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-03-05 16:36:22,201-Speed 9416.53 samples/sec Loss 1.9894 LearningRate 0.0005 Epoch: 15 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-03-05 16:36:48,290-Speed 9420.16 samples/sec Loss 2.0122 LearningRate 0.0005 Epoch: 15 Global Step: 27310 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:37:14,423-Speed 9404.94 samples/sec Loss 2.0247 LearningRate 0.0005 Epoch: 15 Global Step: 27320 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:37:40,539-Speed 9410.97 samples/sec Loss 1.9989 LearningRate 0.0005 Epoch: 15 Global Step: 27330 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:38:06,586-Speed 9435.57 samples/sec Loss 2.0024 LearningRate 0.0005 Epoch: 15 Global Step: 27340 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:38:32,662-Speed 9425.36 samples/sec Loss 2.0275 LearningRate 0.0005 Epoch: 15 Global Step: 27350 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:38:58,783-Speed 9409.15 samples/sec Loss 2.0132 LearningRate 0.0005 Epoch: 15 Global Step: 27360 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:39:24,846-Speed 9430.01 samples/sec Loss 1.9980 LearningRate 0.0005 Epoch: 15 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:39:51,020-Speed 9389.78 samples/sec Loss 2.0101 LearningRate 0.0005 Epoch: 15 Global Step: 27380 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:40:17,125-Speed 9414.59 samples/sec Loss 2.0102 LearningRate 0.0004 Epoch: 15 Global Step: 27390 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-03-05 16:40:43,173-Speed 9435.71 samples/sec Loss 1.9994 LearningRate 0.0004 Epoch: 15 Global Step: 27400 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:41:09,282-Speed 9413.40 samples/sec Loss 2.0035 LearningRate 0.0004 Epoch: 15 Global Step: 27410 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:41:35,318-Speed 9439.53 samples/sec Loss 1.9965 LearningRate 0.0004 Epoch: 15 Global Step: 27420 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-03-05 16:42:01,429-Speed 9412.58 samples/sec Loss
2.0184 LearningRate 0.0004 Epoch: 15 Global Step: 27430 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-05 16:42:27,502-Speed 9426.23 samples/sec Loss 2.0228 LearningRate 0.0004 Epoch: 15 Global Step: 27440 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-05 16:42:53,569-Speed 9428.29 samples/sec Loss 2.0088 LearningRate 0.0004 Epoch: 15 Global Step: 27450 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-05 16:43:19,602-Speed 9441.07 samples/sec Loss 2.0085 LearningRate 0.0004 Epoch: 15 Global Step: 27460 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-05 16:43:45,722-Speed 9409.22 samples/sec Loss 2.0089 LearningRate 0.0004 Epoch: 15 Global Step: 27470 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-05 16:44:11,834-Speed 9412.04 samples/sec Loss 2.0128 LearningRate 0.0004 Epoch: 15 Global Step: 27480 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-05 16:44:37,925-Speed 9420.15 samples/sec Loss 2.0061 LearningRate 0.0004 Epoch: 15 Global Step: 27490 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-05 16:45:03,966-Speed 9437.61 samples/sec Loss 1.9907 LearningRate 0.0004 Epoch: 15 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:45:30,098-Speed 9405.00 samples/sec Loss 1.9940 LearningRate 0.0004 Epoch: 15 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:45:56,201-Speed 9415.76 samples/sec Loss 1.9799 LearningRate 0.0004 Epoch: 15 Global Step: 27520 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:46:22,314-Speed 9411.68 samples/sec Loss 1.9708 LearningRate 0.0004 Epoch: 15 Global Step: 27530 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:46:48,436-Speed 9408.84 samples/sec Loss 2.0011 LearningRate 0.0004 Epoch: 15 Global Step: 27540 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:47:14,498-Speed 9430.14 samples/sec Loss 2.0020 LearningRate 0.0004 Epoch: 15 Global 
Step: 27550 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:47:40,665-Speed 9392.17 samples/sec Loss 2.0167 LearningRate 0.0004 Epoch: 15 Global Step: 27560 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:48:06,743-Speed 9424.67 samples/sec Loss 1.9944 LearningRate 0.0004 Epoch: 15 Global Step: 27570 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:48:32,857-Speed 9411.11 samples/sec Loss 2.0127 LearningRate 0.0004 Epoch: 15 Global Step: 27580 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:48:59,030-Speed 9390.29 samples/sec Loss 1.9872 LearningRate 0.0004 Epoch: 15 Global Step: 27590 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:49:25,180-Speed 9400.90 samples/sec Loss 1.9950 LearningRate 0.0004 Epoch: 15 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-05 16:49:51,291-Speed 9413.26 samples/sec Loss 2.0167 LearningRate 0.0004 Epoch: 15 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-05 16:50:17,353-Speed 9430.32 samples/sec Loss 2.0148 LearningRate 0.0004 Epoch: 15 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-05 16:50:43,418-Speed 9428.87 samples/sec Loss 2.0066 LearningRate 0.0004 Epoch: 15 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-05 16:51:09,679-Speed 9358.84 samples/sec Loss 2.0189 LearningRate 0.0004 Epoch: 15 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-05 16:51:35,726-Speed 9436.01 samples/sec Loss 2.0409 LearningRate 0.0004 Epoch: 15 Global Step: 27650 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:52:54,415-Speed 3123.23 samples/sec Loss 1.9931 LearningRate 0.0004 Epoch: 16 Global Step: 27660 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:53:20,293-Speed 9497.12 samples/sec Loss 1.9799 LearningRate 0.0004 Epoch: 16 Global Step: 27670 Fp16 Grad Scale: 65536 
Required: 31 hours Training: 2022-03-05 16:53:46,267-Speed 9462.35 samples/sec Loss 1.9510 LearningRate 0.0004 Epoch: 16 Global Step: 27680 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-05 16:54:12,303-Speed 9439.63 samples/sec Loss 1.9701 LearningRate 0.0004 Epoch: 16 Global Step: 27690 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 16:54:38,168-Speed 9502.15 samples/sec Loss 1.9555 LearningRate 0.0004 Epoch: 16 Global Step: 27700 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 16:55:04,124-Speed 9468.64 samples/sec Loss 1.9607 LearningRate 0.0004 Epoch: 16 Global Step: 27710 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 16:55:30,058-Speed 9476.97 samples/sec Loss 1.9595 LearningRate 0.0004 Epoch: 16 Global Step: 27720 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:55:55,967-Speed 9485.88 samples/sec Loss 1.9689 LearningRate 0.0004 Epoch: 16 Global Step: 27730 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:56:21,910-Speed 9473.71 samples/sec Loss 1.9838 LearningRate 0.0004 Epoch: 16 Global Step: 27740 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:56:47,869-Speed 9467.63 samples/sec Loss 1.9728 LearningRate 0.0004 Epoch: 16 Global Step: 27750 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:57:13,871-Speed 9452.44 samples/sec Loss 1.9599 LearningRate 0.0004 Epoch: 16 Global Step: 27760 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:57:39,818-Speed 9471.99 samples/sec Loss 1.9735 LearningRate 0.0004 Epoch: 16 Global Step: 27770 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:58:05,887-Speed 9427.81 samples/sec Loss 1.9741 LearningRate 0.0004 Epoch: 16 Global Step: 27780 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:58:31,884-Speed 9453.61 samples/sec Loss 1.9641 LearningRate 0.0004 Epoch: 16 Global Step: 27790 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 
16:58:57,928-Speed 9436.58 samples/sec Loss 1.9744 LearningRate 0.0004 Epoch: 16 Global Step: 27800 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:59:23,942-Speed 9447.91 samples/sec Loss 1.9810 LearningRate 0.0004 Epoch: 16 Global Step: 27810 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 16:59:49,990-Speed 9435.32 samples/sec Loss 1.9853 LearningRate 0.0004 Epoch: 16 Global Step: 27820 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:00:16,112-Speed 9408.39 samples/sec Loss 1.9859 LearningRate 0.0004 Epoch: 16 Global Step: 27830 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:00:42,234-Speed 9408.83 samples/sec Loss 1.9824 LearningRate 0.0004 Epoch: 16 Global Step: 27840 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:01:08,318-Speed 9421.98 samples/sec Loss 1.9801 LearningRate 0.0004 Epoch: 16 Global Step: 27850 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:01:34,386-Speed 9427.99 samples/sec Loss 1.9646 LearningRate 0.0004 Epoch: 16 Global Step: 27860 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:02:00,610-Speed 9372.06 samples/sec Loss 1.9593 LearningRate 0.0004 Epoch: 16 Global Step: 27870 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:02:26,789-Speed 9387.86 samples/sec Loss 1.9694 LearningRate 0.0004 Epoch: 16 Global Step: 27880 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:02:52,972-Speed 9386.74 samples/sec Loss 1.9760 LearningRate 0.0004 Epoch: 16 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:03:19,119-Speed 9399.81 samples/sec Loss 1.9718 LearningRate 0.0004 Epoch: 16 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:03:45,333-Speed 9375.68 samples/sec Loss 1.9853 LearningRate 0.0004 Epoch: 16 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:04:11,516-Speed 9386.63 samples/sec Loss 
1.9622 LearningRate 0.0004 Epoch: 16 Global Step: 27920 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-05 17:04:37,628-Speed 9412.65 samples/sec Loss 1.9787 LearningRate 0.0004 Epoch: 16 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:05:03,772-Speed 9400.22 samples/sec Loss 1.9618 LearningRate 0.0004 Epoch: 16 Global Step: 27940 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:05:29,927-Speed 9396.91 samples/sec Loss 1.9706 LearningRate 0.0004 Epoch: 16 Global Step: 27950 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:05:56,136-Speed 9377.10 samples/sec Loss 1.9635 LearningRate 0.0004 Epoch: 16 Global Step: 27960 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:06:22,291-Speed 9396.87 samples/sec Loss 1.9793 LearningRate 0.0004 Epoch: 16 Global Step: 27970 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:06:48,427-Speed 9403.85 samples/sec Loss 1.9602 LearningRate 0.0004 Epoch: 16 Global Step: 27980 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:07:14,626-Speed 9380.96 samples/sec Loss 1.9819 LearningRate 0.0004 Epoch: 16 Global Step: 27990 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:07:40,789-Speed 9393.75 samples/sec Loss 1.9656 LearningRate 0.0004 Epoch: 16 Global Step: 28000 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:08:07,052-Speed 9358.43 samples/sec Loss 1.9618 LearningRate 0.0004 Epoch: 16 Global Step: 28010 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:08:33,231-Speed 9387.97 samples/sec Loss 1.9519 LearningRate 0.0004 Epoch: 16 Global Step: 28020 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:08:59,362-Speed 9405.66 samples/sec Loss 1.9679 LearningRate 0.0004 Epoch: 16 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:09:25,496-Speed 9404.69 samples/sec Loss 1.9751 LearningRate 0.0004 Epoch: 16 
Global Step: 28040 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:09:51,673-Speed 9388.72 samples/sec Loss 1.9547 LearningRate 0.0004 Epoch: 16 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:10:17,808-Speed 9404.05 samples/sec Loss 1.9706 LearningRate 0.0004 Epoch: 16 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:10:43,962-Speed 9397.32 samples/sec Loss 1.9730 LearningRate 0.0004 Epoch: 16 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:11:10,108-Speed 9399.74 samples/sec Loss 1.9751 LearningRate 0.0004 Epoch: 16 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:11:36,224-Speed 9410.81 samples/sec Loss 1.9509 LearningRate 0.0004 Epoch: 16 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:12:02,389-Speed 9392.80 samples/sec Loss 1.9670 LearningRate 0.0004 Epoch: 16 Global Step: 28100 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:12:28,587-Speed 9381.46 samples/sec Loss 1.9647 LearningRate 0.0004 Epoch: 16 Global Step: 28110 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:12:54,699-Speed 9412.04 samples/sec Loss 1.9607 LearningRate 0.0004 Epoch: 16 Global Step: 28120 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:13:20,902-Speed 9379.42 samples/sec Loss 1.9655 LearningRate 0.0004 Epoch: 16 Global Step: 28130 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:13:47,046-Speed 9400.85 samples/sec Loss 1.9469 LearningRate 0.0004 Epoch: 16 Global Step: 28140 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:14:13,280-Speed 9368.25 samples/sec Loss 1.9655 LearningRate 0.0004 Epoch: 16 Global Step: 28150 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:14:39,496-Speed 9374.59 samples/sec Loss 1.9598 LearningRate 0.0004 Epoch: 16 Global Step: 28160 Fp16 Grad Scale: 32768 
Required: 30 hours Training: 2022-03-05 17:15:05,748-Speed 9362.30 samples/sec Loss 1.9603 LearningRate 0.0004 Epoch: 16 Global Step: 28170 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:15:31,931-Speed 9386.30 samples/sec Loss 1.9633 LearningRate 0.0004 Epoch: 16 Global Step: 28180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:15:58,073-Speed 9401.87 samples/sec Loss 1.9574 LearningRate 0.0004 Epoch: 16 Global Step: 28190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:16:24,222-Speed 9399.19 samples/sec Loss 1.9591 LearningRate 0.0004 Epoch: 16 Global Step: 28200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:16:50,449-Speed 9370.74 samples/sec Loss 1.9487 LearningRate 0.0004 Epoch: 16 Global Step: 28210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:17:16,653-Speed 9379.29 samples/sec Loss 1.9401 LearningRate 0.0004 Epoch: 16 Global Step: 28220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:17:42,909-Speed 9360.60 samples/sec Loss 1.9433 LearningRate 0.0004 Epoch: 16 Global Step: 28230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:18:09,071-Speed 9394.11 samples/sec Loss 1.9322 LearningRate 0.0004 Epoch: 16 Global Step: 28240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:18:35,195-Speed 9408.04 samples/sec Loss 1.9607 LearningRate 0.0004 Epoch: 16 Global Step: 28250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:19:01,427-Speed 9369.39 samples/sec Loss 1.9690 LearningRate 0.0004 Epoch: 16 Global Step: 28260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:19:27,685-Speed 9359.69 samples/sec Loss 1.9463 LearningRate 0.0004 Epoch: 16 Global Step: 28270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-05 17:19:53,930-Speed 9364.63 samples/sec Loss 1.9474 LearningRate 0.0004 Epoch: 16 Global Step: 28280 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 
17:20:20,170-Speed 9366.40 samples/sec Loss 1.9553 LearningRate 0.0004 Epoch: 16 Global Step: 28290 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:20:46,330-Speed 9394.95 samples/sec Loss 1.9689 LearningRate 0.0004 Epoch: 16 Global Step: 28300 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:21:12,530-Speed 9380.53 samples/sec Loss 1.9410 LearningRate 0.0004 Epoch: 16 Global Step: 28310 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:21:38,770-Speed 9366.40 samples/sec Loss 1.9580 LearningRate 0.0004 Epoch: 16 Global Step: 28320 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:22:04,972-Speed 9379.71 samples/sec Loss 1.9391 LearningRate 0.0004 Epoch: 16 Global Step: 28330 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:22:31,231-Speed 9359.54 samples/sec Loss 1.9509 LearningRate 0.0004 Epoch: 16 Global Step: 28340 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:22:57,435-Speed 9379.19 samples/sec Loss 1.9494 LearningRate 0.0004 Epoch: 16 Global Step: 28350 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:23:23,658-Speed 9372.64 samples/sec Loss 1.9487 LearningRate 0.0004 Epoch: 16 Global Step: 28360 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:23:49,916-Speed 9359.69 samples/sec Loss 1.9371 LearningRate 0.0004 Epoch: 16 Global Step: 28370 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:24:16,166-Speed 9362.92 samples/sec Loss 1.9504 LearningRate 0.0004 Epoch: 16 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:24:42,381-Speed 9375.23 samples/sec Loss 1.9557 LearningRate 0.0004 Epoch: 16 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:25:08,624-Speed 9365.12 samples/sec Loss 1.9500 LearningRate 0.0004 Epoch: 16 Global Step: 28400 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:25:34,874-Speed 9362.67 samples/sec Loss 
1.9434 LearningRate 0.0004 Epoch: 16 Global Step: 28410 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:26:01,134-Speed 9359.25 samples/sec Loss 1.9253 LearningRate 0.0004 Epoch: 16 Global Step: 28420 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:26:27,326-Speed 9383.23 samples/sec Loss 1.9419 LearningRate 0.0004 Epoch: 16 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:26:53,550-Speed 9372.06 samples/sec Loss 1.9364 LearningRate 0.0004 Epoch: 16 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:27:19,698-Speed 9399.17 samples/sec Loss 1.9382 LearningRate 0.0004 Epoch: 16 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:27:45,871-Speed 9390.30 samples/sec Loss 1.9456 LearningRate 0.0004 Epoch: 16 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:28:12,121-Speed 9362.84 samples/sec Loss 1.9438 LearningRate 0.0004 Epoch: 16 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:28:38,311-Speed 9383.94 samples/sec Loss 1.9230 LearningRate 0.0004 Epoch: 16 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-05 17:29:04,537-Speed 9371.31 samples/sec Loss 1.9522 LearningRate 0.0004 Epoch: 16 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-05 17:29:30,728-Speed 9383.52 samples/sec Loss 1.9287 LearningRate 0.0004 Epoch: 16 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-05 17:29:56,906-Speed 9388.48 samples/sec Loss 1.9313 LearningRate 0.0004 Epoch: 16 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:30:23,102-Speed 9382.14 samples/sec Loss 1.9416 LearningRate 0.0004 Epoch: 16 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:30:49,310-Speed 9377.97 samples/sec Loss 1.9288 LearningRate 0.0004 Epoch: 16 
Global Step: 28530 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:31:15,552-Speed 9365.43 samples/sec Loss 1.9131 LearningRate 0.0004 Epoch: 16 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:31:41,757-Speed 9378.77 samples/sec Loss 1.9134 LearningRate 0.0004 Epoch: 16 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:32:08,035-Speed 9352.91 samples/sec Loss 1.9196 LearningRate 0.0004 Epoch: 16 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:32:34,220-Speed 9386.06 samples/sec Loss 1.9223 LearningRate 0.0004 Epoch: 16 Global Step: 28570 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:33:00,481-Speed 9358.43 samples/sec Loss 1.9382 LearningRate 0.0004 Epoch: 16 Global Step: 28580 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:33:26,746-Speed 9357.40 samples/sec Loss 1.9184 LearningRate 0.0004 Epoch: 16 Global Step: 28590 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:33:53,035-Speed 9348.78 samples/sec Loss 1.9254 LearningRate 0.0004 Epoch: 16 Global Step: 28600 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:34:19,254-Speed 9373.78 samples/sec Loss 1.9301 LearningRate 0.0004 Epoch: 16 Global Step: 28610 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:34:45,522-Speed 9356.19 samples/sec Loss 1.9278 LearningRate 0.0004 Epoch: 16 Global Step: 28620 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:35:11,703-Speed 9387.42 samples/sec Loss 1.9607 LearningRate 0.0004 Epoch: 16 Global Step: 28630 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:35:37,904-Speed 9379.89 samples/sec Loss 1.9288 LearningRate 0.0004 Epoch: 16 Global Step: 28640 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:36:04,172-Speed 9356.13 samples/sec Loss 1.9248 LearningRate 0.0004 Epoch: 16 Global Step: 28650 Fp16 Grad Scale: 32768 
Required: 30 hours Training: 2022-03-05 17:36:30,398-Speed 9371.68 samples/sec Loss 1.9140 LearningRate 0.0004 Epoch: 16 Global Step: 28660 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:36:56,592-Speed 9382.51 samples/sec Loss 1.9241 LearningRate 0.0004 Epoch: 16 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:37:22,769-Speed 9389.07 samples/sec Loss 1.9256 LearningRate 0.0004 Epoch: 16 Global Step: 28680 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:37:49,014-Speed 9364.43 samples/sec Loss 1.9343 LearningRate 0.0004 Epoch: 16 Global Step: 28690 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:38:15,206-Speed 9383.70 samples/sec Loss 1.9327 LearningRate 0.0004 Epoch: 16 Global Step: 28700 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:38:41,322-Speed 9410.89 samples/sec Loss 1.9252 LearningRate 0.0004 Epoch: 16 Global Step: 28710 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:39:07,497-Speed 9389.33 samples/sec Loss 1.9145 LearningRate 0.0004 Epoch: 16 Global Step: 28720 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:39:33,702-Speed 9378.85 samples/sec Loss 1.9176 LearningRate 0.0004 Epoch: 16 Global Step: 28730 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:39:59,902-Speed 9380.74 samples/sec Loss 1.9341 LearningRate 0.0004 Epoch: 16 Global Step: 28740 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:40:26,047-Speed 9400.59 samples/sec Loss 1.9216 LearningRate 0.0004 Epoch: 16 Global Step: 28750 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:40:52,211-Speed 9393.10 samples/sec Loss 1.9302 LearningRate 0.0004 Epoch: 16 Global Step: 28760 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:41:18,377-Speed 9392.68 samples/sec Loss 1.9139 LearningRate 0.0004 Epoch: 16 Global Step: 28770 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 
17:41:44,556-Speed 9388.15 samples/sec Loss 1.9145 LearningRate 0.0004 Epoch: 16 Global Step: 28780 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:42:10,672-Speed 9410.93 samples/sec Loss 1.9173 LearningRate 0.0004 Epoch: 16 Global Step: 28790 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:42:36,881-Speed 9377.30 samples/sec Loss 1.9140 LearningRate 0.0004 Epoch: 16 Global Step: 28800 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:43:03,038-Speed 9395.79 samples/sec Loss 1.9132 LearningRate 0.0004 Epoch: 16 Global Step: 28810 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:43:29,189-Speed 9398.37 samples/sec Loss 1.9118 LearningRate 0.0004 Epoch: 16 Global Step: 28820 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:43:55,423-Speed 9369.61 samples/sec Loss 1.9036 LearningRate 0.0004 Epoch: 16 Global Step: 28830 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:44:21,659-Speed 9367.60 samples/sec Loss 1.9165 LearningRate 0.0004 Epoch: 16 Global Step: 28840 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:44:47,861-Speed 9380.33 samples/sec Loss 1.9221 LearningRate 0.0004 Epoch: 16 Global Step: 28850 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:45:13,987-Speed 9407.20 samples/sec Loss 1.9162 LearningRate 0.0004 Epoch: 16 Global Step: 28860 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:45:40,043-Speed 9432.46 samples/sec Loss 1.9159 LearningRate 0.0004 Epoch: 16 Global Step: 28870 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:46:06,151-Speed 9413.49 samples/sec Loss 1.9108 LearningRate 0.0004 Epoch: 16 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:46:32,312-Speed 9394.65 samples/sec Loss 1.9128 LearningRate 0.0004 Epoch: 16 Global Step: 28890 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-05 17:46:58,367-Speed 9437.50 samples/sec 
Loss 1.9098 LearningRate 0.0004 Epoch: 16 Global Step: 28900 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:47:24,462-Speed 9418.31 samples/sec Loss 1.9271 LearningRate 0.0004 Epoch: 16 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:47:50,566-Speed 9415.07 samples/sec Loss 1.9177 LearningRate 0.0004 Epoch: 16 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:48:16,694-Speed 9406.57 samples/sec Loss 1.8918 LearningRate 0.0004 Epoch: 16 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:48:42,762-Speed 9428.23 samples/sec Loss 1.9125 LearningRate 0.0004 Epoch: 16 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:49:08,895-Speed 9404.77 samples/sec Loss 1.9170 LearningRate 0.0004 Epoch: 16 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-05 17:49:35,009-Speed 9411.48 samples/sec Loss 1.9153 LearningRate 0.0004 Epoch: 16 Global Step: 28960 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:50:01,177-Speed 9392.11 samples/sec Loss 1.9023 LearningRate 0.0004 Epoch: 16 Global Step: 28970 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:50:27,245-Speed 9428.16 samples/sec Loss 1.9250 LearningRate 0.0004 Epoch: 16 Global Step: 28980 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:50:53,330-Speed 9422.16 samples/sec Loss 1.9192 LearningRate 0.0004 Epoch: 16 Global Step: 28990 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:51:19,521-Speed 9383.90 samples/sec Loss 1.8999 LearningRate 0.0004 Epoch: 16 Global Step: 29000 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:51:45,594-Speed 9426.44 samples/sec Loss 1.9047 LearningRate 0.0004 Epoch: 16 Global Step: 29010 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:52:11,769-Speed 9389.29 samples/sec Loss 1.8927 LearningRate 0.0004 Epoch: 16 
Global Step: 29020 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-05 17:52:37,989-Speed 9373.40 samples/sec Loss 1.9072 LearningRate 0.0004 Epoch: 16 Global Step: 29030 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 17:53:04,156-Speed 9392.73 samples/sec Loss 1.8942 LearningRate 0.0004 Epoch: 16 Global Step: 29040 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 17:53:30,287-Speed 9405.13 samples/sec Loss 1.9040 LearningRate 0.0004 Epoch: 16 Global Step: 29050 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 17:53:56,363-Speed 9425.16 samples/sec Loss 1.9011 LearningRate 0.0004 Epoch: 16 Global Step: 29060 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 17:54:22,415-Speed 9433.64 samples/sec Loss 1.8996 LearningRate 0.0004 Epoch: 16 Global Step: 29070 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 17:54:48,554-Speed 9402.46 samples/sec Loss 1.8948 LearningRate 0.0004 Epoch: 16 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 17:55:14,627-Speed 9426.62 samples/sec Loss 1.9017 LearningRate 0.0004 Epoch: 16 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 17:55:40,680-Speed 9433.48 samples/sec Loss 1.8903 LearningRate 0.0004 Epoch: 16 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 17:56:06,753-Speed 9426.19 samples/sec Loss 1.9046 LearningRate 0.0004 Epoch: 16 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 17:56:32,822-Speed 9427.74 samples/sec Loss 1.9036 LearningRate 0.0004 Epoch: 16 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 17:56:58,871-Speed 9434.85 samples/sec Loss 1.8991 LearningRate 0.0004 Epoch: 16 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 17:57:24,940-Speed 9427.80 samples/sec Loss 1.9046 LearningRate 0.0004 Epoch: 16 Global Step: 29140 Fp16 Grad Scale: 32768 
Required: 29 hours Training: 2022-03-05 17:57:51,066-Speed 9407.36 samples/sec Loss 1.8985 LearningRate 0.0004 Epoch: 16 Global Step: 29150 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 17:58:17,245-Speed 9389.31 samples/sec Loss 1.8905 LearningRate 0.0004 Epoch: 16 Global Step: 29160 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 17:58:43,337-Speed 9419.40 samples/sec Loss 1.8869 LearningRate 0.0004 Epoch: 16 Global Step: 29170 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 17:59:09,461-Speed 9407.86 samples/sec Loss 1.8985 LearningRate 0.0004 Epoch: 16 Global Step: 29180 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 17:59:35,557-Speed 9418.03 samples/sec Loss 1.9051 LearningRate 0.0004 Epoch: 16 Global Step: 29190 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:00:01,720-Speed 9394.13 samples/sec Loss 1.9059 LearningRate 0.0004 Epoch: 16 Global Step: 29200 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:00:27,849-Speed 9405.94 samples/sec Loss 1.9088 LearningRate 0.0004 Epoch: 16 Global Step: 29210 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:00:53,904-Speed 9432.86 samples/sec Loss 1.8965 LearningRate 0.0004 Epoch: 16 Global Step: 29220 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:01:19,965-Speed 9430.75 samples/sec Loss 1.8966 LearningRate 0.0004 Epoch: 16 Global Step: 29230 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:01:46,016-Speed 9434.02 samples/sec Loss 1.8967 LearningRate 0.0004 Epoch: 16 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:02:12,102-Speed 9421.54 samples/sec Loss 1.9134 LearningRate 0.0004 Epoch: 16 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:02:38,286-Speed 9386.26 samples/sec Loss 1.8979 LearningRate 0.0004 Epoch: 16 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 
18:03:04,406-Speed 9409.27 samples/sec Loss 1.8967 LearningRate 0.0004 Epoch: 16 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:03:30,534-Speed 9406.35 samples/sec Loss 1.8920 LearningRate 0.0004 Epoch: 16 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:03:56,733-Speed 9381.05 samples/sec Loss 1.8958 LearningRate 0.0004 Epoch: 16 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:04:22,858-Speed 9407.39 samples/sec Loss 1.8989 LearningRate 0.0004 Epoch: 16 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:04:48,937-Speed 9424.27 samples/sec Loss 1.8927 LearningRate 0.0004 Epoch: 16 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:05:15,010-Speed 9425.91 samples/sec Loss 1.8951 LearningRate 0.0004 Epoch: 16 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:05:41,140-Speed 9405.82 samples/sec Loss 1.9060 LearningRate 0.0004 Epoch: 16 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:06:07,195-Speed 9432.71 samples/sec Loss 1.8988 LearningRate 0.0004 Epoch: 16 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-05 18:06:33,238-Speed 9437.27 samples/sec Loss 1.8930 LearningRate 0.0004 Epoch: 16 Global Step: 29350 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:06:59,363-Speed 9407.36 samples/sec Loss 1.9157 LearningRate 0.0004 Epoch: 16 Global Step: 29360 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:07:25,520-Speed 9395.87 samples/sec Loss 1.9141 LearningRate 0.0004 Epoch: 16 Global Step: 29370 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:07:51,661-Speed 9401.49 samples/sec Loss 1.9160 LearningRate 0.0004 Epoch: 16 Global Step: 29380 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:09:11,429-Speed 3081.05 samples/sec 
Loss 1.8538 LearningRate 0.0004 Epoch: 17 Global Step: 29390 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:09:37,371-Speed 9473.73 samples/sec Loss 1.8602 LearningRate 0.0004 Epoch: 17 Global Step: 29400 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:10:03,455-Speed 9422.51 samples/sec Loss 1.8552 LearningRate 0.0004 Epoch: 17 Global Step: 29410 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:10:29,549-Speed 9418.60 samples/sec Loss 1.8603 LearningRate 0.0004 Epoch: 17 Global Step: 29420 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:10:55,558-Speed 9449.77 samples/sec Loss 1.8610 LearningRate 0.0004 Epoch: 17 Global Step: 29430 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:11:21,564-Speed 9450.30 samples/sec Loss 1.8740 LearningRate 0.0004 Epoch: 17 Global Step: 29440 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:11:47,581-Speed 9446.71 samples/sec Loss 1.8548 LearningRate 0.0004 Epoch: 17 Global Step: 29450 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:12:13,629-Speed 9434.94 samples/sec Loss 1.8816 LearningRate 0.0004 Epoch: 17 Global Step: 29460 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:12:39,626-Speed 9454.28 samples/sec Loss 1.8741 LearningRate 0.0004 Epoch: 17 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:13:05,717-Speed 9419.72 samples/sec Loss 1.8732 LearningRate 0.0004 Epoch: 17 Global Step: 29480 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:13:31,769-Speed 9433.78 samples/sec Loss 1.8694 LearningRate 0.0004 Epoch: 17 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:13:57,932-Speed 9393.83 samples/sec Loss 1.8764 LearningRate 0.0004 Epoch: 17 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:14:23,972-Speed 9438.30 samples/sec Loss 1.8932 LearningRate 0.0004 Epoch: 17 
Global Step: 29510 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:14:50,017-Speed 9436.43 samples/sec Loss 1.8471 LearningRate 0.0004 Epoch: 17 Global Step: 29520 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:15:16,113-Speed 9417.84 samples/sec Loss 1.8574 LearningRate 0.0004 Epoch: 17 Global Step: 29530 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:15:42,171-Speed 9431.68 samples/sec Loss 1.8612 LearningRate 0.0004 Epoch: 17 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:16:08,234-Speed 9429.60 samples/sec Loss 1.8516 LearningRate 0.0004 Epoch: 17 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:16:34,283-Speed 9435.11 samples/sec Loss 1.8759 LearningRate 0.0004 Epoch: 17 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-05 18:17:00,445-Speed 9393.83 samples/sec Loss 1.8607 LearningRate 0.0004 Epoch: 17 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-05 18:17:26,482-Speed 9439.60 samples/sec Loss 1.8640 LearningRate 0.0004 Epoch: 17 Global Step: 29580 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:17:52,505-Speed 9444.56 samples/sec Loss 1.8589 LearningRate 0.0004 Epoch: 17 Global Step: 29590 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:18:18,451-Speed 9472.09 samples/sec Loss 1.8617 LearningRate 0.0004 Epoch: 17 Global Step: 29600 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:18:44,499-Speed 9435.63 samples/sec Loss 1.8729 LearningRate 0.0004 Epoch: 17 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:19:10,542-Speed 9437.24 samples/sec Loss 1.8633 LearningRate 0.0004 Epoch: 17 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:19:36,679-Speed 9403.30 samples/sec Loss 1.8621 LearningRate 0.0004 Epoch: 17 Global Step: 29630 Fp16 Grad Scale: 65536 
Required: 29 hours Training: 2022-03-05 18:20:02,811-Speed 9404.89 samples/sec Loss 1.8797 LearningRate 0.0004 Epoch: 17 Global Step: 29640 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:20:28,843-Speed 9441.24 samples/sec Loss 1.8807 LearningRate 0.0004 Epoch: 17 Global Step: 29650 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:20:54,883-Speed 9438.00 samples/sec Loss 1.8653 LearningRate 0.0004 Epoch: 17 Global Step: 29660 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:21:21,023-Speed 9402.14 samples/sec Loss 1.8756 LearningRate 0.0004 Epoch: 17 Global Step: 29670 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:21:47,137-Speed 9411.71 samples/sec Loss 1.8802 LearningRate 0.0004 Epoch: 17 Global Step: 29680 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:22:13,176-Speed 9438.76 samples/sec Loss 1.8491 LearningRate 0.0004 Epoch: 17 Global Step: 29690 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:22:39,275-Speed 9416.90 samples/sec Loss 1.8624 LearningRate 0.0004 Epoch: 17 Global Step: 29700 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:23:05,317-Speed 9437.23 samples/sec Loss 1.8661 LearningRate 0.0004 Epoch: 17 Global Step: 29710 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:23:31,496-Speed 9388.11 samples/sec Loss 1.8673 LearningRate 0.0004 Epoch: 17 Global Step: 29720 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:23:57,619-Speed 9408.84 samples/sec Loss 1.8587 LearningRate 0.0004 Epoch: 17 Global Step: 29730 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:24:23,720-Speed 9416.04 samples/sec Loss 1.8649 LearningRate 0.0004 Epoch: 17 Global Step: 29740 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:24:49,875-Speed 9397.01 samples/sec Loss 1.8648 LearningRate 0.0004 Epoch: 17 Global Step: 29750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 
18:25:15,918-Speed 9437.21 samples/sec Loss 1.8516 LearningRate 0.0004 Epoch: 17 Global Step: 29760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:25:42,039-Speed 9408.64 samples/sec Loss 1.8691 LearningRate 0.0004 Epoch: 17 Global Step: 29770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:26:08,141-Speed 9415.95 samples/sec Loss 1.8649 LearningRate 0.0004 Epoch: 17 Global Step: 29780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:26:34,349-Speed 9377.54 samples/sec Loss 1.8606 LearningRate 0.0004 Epoch: 17 Global Step: 29790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:27:00,640-Speed 9347.96 samples/sec Loss 1.8610 LearningRate 0.0004 Epoch: 17 Global Step: 29800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:27:26,832-Speed 9383.39 samples/sec Loss 1.8700 LearningRate 0.0004 Epoch: 17 Global Step: 29810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:27:52,958-Speed 9407.28 samples/sec Loss 1.8726 LearningRate 0.0004 Epoch: 17 Global Step: 29820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:28:19,099-Speed 9401.86 samples/sec Loss 1.8713 LearningRate 0.0004 Epoch: 17 Global Step: 29830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:28:45,203-Speed 9415.08 samples/sec Loss 1.8591 LearningRate 0.0004 Epoch: 17 Global Step: 29840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:29:11,329-Speed 9406.87 samples/sec Loss 1.8634 LearningRate 0.0004 Epoch: 17 Global Step: 29850 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:29:37,473-Speed 9400.61 samples/sec Loss 1.8610 LearningRate 0.0004 Epoch: 17 Global Step: 29860 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:30:03,522-Speed 9435.01 samples/sec Loss 1.8468 LearningRate 0.0004 Epoch: 17 Global Step: 29870 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:30:29,660-Speed 9403.16 samples/sec Loss 
1.8589 LearningRate 0.0004 Epoch: 17 Global Step: 29880 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:30:55,834-Speed 9389.85 samples/sec Loss 1.8580 LearningRate 0.0004 Epoch: 17 Global Step: 29890 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:31:21,930-Speed 9417.70 samples/sec Loss 1.8454 LearningRate 0.0004 Epoch: 17 Global Step: 29900 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:31:48,102-Speed 9390.66 samples/sec Loss 1.8580 LearningRate 0.0004 Epoch: 17 Global Step: 29910 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:32:14,319-Speed 9374.56 samples/sec Loss 1.8426 LearningRate 0.0004 Epoch: 17 Global Step: 29920 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:32:40,346-Speed 9442.90 samples/sec Loss 1.8508 LearningRate 0.0004 Epoch: 17 Global Step: 29930 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:33:06,431-Speed 9421.59 samples/sec Loss 1.8559 LearningRate 0.0004 Epoch: 17 Global Step: 29940 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:33:32,534-Speed 9415.59 samples/sec Loss 1.8481 LearningRate 0.0004 Epoch: 17 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:33:58,684-Speed 9398.16 samples/sec Loss 1.8464 LearningRate 0.0004 Epoch: 17 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:34:24,737-Speed 9433.55 samples/sec Loss 1.8487 LearningRate 0.0004 Epoch: 17 Global Step: 29970 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:34:50,799-Speed 9430.24 samples/sec Loss 1.8490 LearningRate 0.0004 Epoch: 17 Global Step: 29980 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:35:16,859-Speed 9430.65 samples/sec Loss 1.8443 LearningRate 0.0004 Epoch: 17 Global Step: 29990 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:35:42,885-Speed 9443.27 samples/sec Loss 1.8363 LearningRate 0.0004 Epoch: 17 Global 
Step: 30000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:36:09,063-Speed 9388.30 samples/sec Loss 1.8530 LearningRate 0.0004 Epoch: 17 Global Step: 30010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:36:35,168-Speed 9415.18 samples/sec Loss 1.8437 LearningRate 0.0004 Epoch: 17 Global Step: 30020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:37:01,380-Speed 9376.53 samples/sec Loss 1.8512 LearningRate 0.0004 Epoch: 17 Global Step: 30030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:37:27,571-Speed 9383.70 samples/sec Loss 1.8395 LearningRate 0.0004 Epoch: 17 Global Step: 30040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:37:53,630-Speed 9431.49 samples/sec Loss 1.8482 LearningRate 0.0004 Epoch: 17 Global Step: 30050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:38:19,759-Speed 9405.92 samples/sec Loss 1.8483 LearningRate 0.0004 Epoch: 17 Global Step: 30060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:38:45,868-Speed 9413.33 samples/sec Loss 1.8423 LearningRate 0.0004 Epoch: 17 Global Step: 30070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:39:11,977-Speed 9413.48 samples/sec Loss 1.8470 LearningRate 0.0004 Epoch: 17 Global Step: 30080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:39:38,105-Speed 9406.31 samples/sec Loss 1.8528 LearningRate 0.0004 Epoch: 17 Global Step: 30090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-03-05 18:40:04,250-Speed 9400.37 samples/sec Loss 1.8442 LearningRate 0.0004 Epoch: 17 Global Step: 30100 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:40:30,372-Speed 9408.36 samples/sec Loss 1.8419 LearningRate 0.0004 Epoch: 17 Global Step: 30110 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:40:56,542-Speed 9392.32 samples/sec Loss 1.8290 LearningRate 0.0004 Epoch: 17 Global Step: 30120 Fp16 Grad Scale: 32768 
Required: 29 hours Training: 2022-03-05 18:41:22,666-Speed 9408.03 samples/sec Loss 1.8469 LearningRate 0.0004 Epoch: 17 Global Step: 30130 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:41:48,825-Speed 9395.17 samples/sec Loss 1.8535 LearningRate 0.0004 Epoch: 17 Global Step: 30140 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:42:14,839-Speed 9447.67 samples/sec Loss 1.8401 LearningRate 0.0004 Epoch: 17 Global Step: 30150 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:42:40,876-Speed 9439.18 samples/sec Loss 1.8429 LearningRate 0.0004 Epoch: 17 Global Step: 30160 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:43:06,967-Speed 9420.83 samples/sec Loss 1.8372 LearningRate 0.0004 Epoch: 17 Global Step: 30170 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:43:33,000-Speed 9440.87 samples/sec Loss 1.8324 LearningRate 0.0004 Epoch: 17 Global Step: 30180 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:43:59,076-Speed 9425.27 samples/sec Loss 1.8529 LearningRate 0.0004 Epoch: 17 Global Step: 30190 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-05 18:44:25,177-Speed 9416.14 samples/sec Loss 1.8343 LearningRate 0.0004 Epoch: 17 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:44:51,273-Speed 9417.66 samples/sec Loss 1.8346 LearningRate 0.0004 Epoch: 17 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:45:17,356-Speed 9422.79 samples/sec Loss 1.8485 LearningRate 0.0004 Epoch: 17 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:45:43,408-Speed 9434.08 samples/sec Loss 1.8399 LearningRate 0.0004 Epoch: 17 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:46:09,493-Speed 9422.08 samples/sec Loss 1.8346 LearningRate 0.0004 Epoch: 17 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 
18:46:35,604-Speed 9412.16 samples/sec Loss 1.8389 LearningRate 0.0004 Epoch: 17 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:47:01,758-Speed 9397.15 samples/sec Loss 1.8270 LearningRate 0.0004 Epoch: 17 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:47:27,834-Speed 9425.18 samples/sec Loss 1.8319 LearningRate 0.0004 Epoch: 17 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:47:53,933-Speed 9417.10 samples/sec Loss 1.8368 LearningRate 0.0004 Epoch: 17 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:48:20,061-Speed 9406.17 samples/sec Loss 1.8195 LearningRate 0.0004 Epoch: 17 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:48:46,205-Speed 9400.64 samples/sec Loss 1.8183 LearningRate 0.0004 Epoch: 17 Global Step: 30300 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-05 18:49:12,284-Speed 9424.09 samples/sec Loss 1.8316 LearningRate 0.0004 Epoch: 17 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:49:38,471-Speed 9385.37 samples/sec Loss 1.8354 LearningRate 0.0004 Epoch: 17 Global Step: 30320 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:50:04,521-Speed 9434.49 samples/sec Loss 1.8281 LearningRate 0.0004 Epoch: 17 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:50:30,601-Speed 9423.98 samples/sec Loss 1.8327 LearningRate 0.0004 Epoch: 17 Global Step: 30340 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:50:56,733-Speed 9404.69 samples/sec Loss 1.8353 LearningRate 0.0004 Epoch: 17 Global Step: 30350 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:51:22,816-Speed 9422.82 samples/sec Loss 1.8296 LearningRate 0.0004 Epoch: 17 Global Step: 30360 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:51:48,907-Speed 9419.68 samples/sec 
Loss 1.8262 LearningRate 0.0004 Epoch: 17 Global Step: 30370 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:52:14,994-Speed 9421.19 samples/sec Loss 1.8206 LearningRate 0.0004 Epoch: 17 Global Step: 30380 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-05 18:52:41,115-Speed 9409.28 samples/sec Loss 1.8119 LearningRate 0.0004 Epoch: 17 Global Step: 30390 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:53:07,240-Speed 9407.56 samples/sec Loss 1.8240 LearningRate 0.0004 Epoch: 17 Global Step: 30400 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:53:33,379-Speed 9402.67 samples/sec Loss 1.8137 LearningRate 0.0004 Epoch: 17 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 18:53:59,389-Speed 9449.02 samples/sec Loss 1.8214 LearningRate 0.0004 Epoch: 17 Global Step: 30420 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:54:25,469-Speed 9423.79 samples/sec Loss 1.8263 LearningRate 0.0004 Epoch: 17 Global Step: 30430 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:54:51,571-Speed 9415.76 samples/sec Loss 1.8165 LearningRate 0.0004 Epoch: 17 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:55:17,686-Speed 9410.95 samples/sec Loss 1.8136 LearningRate 0.0004 Epoch: 17 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:55:43,771-Speed 9422.17 samples/sec Loss 1.8228 LearningRate 0.0004 Epoch: 17 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:56:09,866-Speed 9418.75 samples/sec Loss 1.8068 LearningRate 0.0004 Epoch: 17 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:56:35,981-Speed 9411.04 samples/sec Loss 1.8249 LearningRate 0.0004 Epoch: 17 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:57:02,053-Speed 9426.38 samples/sec Loss 1.8285 LearningRate 0.0004 Epoch: 17 
Global Step: 30490 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:57:28,259-Speed 9378.42 samples/sec Loss 1.8120 LearningRate 0.0004 Epoch: 17 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:57:54,470-Speed 9376.60 samples/sec Loss 1.8199 LearningRate 0.0004 Epoch: 17 Global Step: 30510 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:58:20,634-Speed 9393.62 samples/sec Loss 1.8161 LearningRate 0.0004 Epoch: 17 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:58:46,765-Speed 9405.13 samples/sec Loss 1.8084 LearningRate 0.0004 Epoch: 17 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:59:12,965-Speed 9380.55 samples/sec Loss 1.8194 LearningRate 0.0004 Epoch: 17 Global Step: 30540 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 18:59:39,149-Speed 9385.95 samples/sec Loss 1.8184 LearningRate 0.0004 Epoch: 17 Global Step: 30550 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:00:05,384-Speed 9368.00 samples/sec Loss 1.8297 LearningRate 0.0004 Epoch: 17 Global Step: 30560 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:00:31,533-Speed 9398.97 samples/sec Loss 1.8393 LearningRate 0.0004 Epoch: 17 Global Step: 30570 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:00:57,672-Speed 9402.57 samples/sec Loss 1.8147 LearningRate 0.0004 Epoch: 17 Global Step: 30580 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:01:23,837-Speed 9393.03 samples/sec Loss 1.8152 LearningRate 0.0004 Epoch: 17 Global Step: 30590 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:01:50,023-Speed 9385.63 samples/sec Loss 1.8026 LearningRate 0.0004 Epoch: 17 Global Step: 30600 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:02:16,139-Speed 9410.54 samples/sec Loss 1.8026 LearningRate 0.0004 Epoch: 17 Global Step: 30610 Fp16 Grad Scale: 65536 
Required: 28 hours Training: 2022-03-05 19:02:42,195-Speed 9432.80 samples/sec Loss 1.8162 LearningRate 0.0004 Epoch: 17 Global Step: 30620 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:03:08,283-Speed 9420.81 samples/sec Loss 1.8138 LearningRate 0.0004 Epoch: 17 Global Step: 30630 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:03:34,357-Speed 9425.75 samples/sec Loss 1.8101 LearningRate 0.0004 Epoch: 17 Global Step: 30640 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:04:00,472-Speed 9411.15 samples/sec Loss 1.8192 LearningRate 0.0004 Epoch: 17 Global Step: 30650 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:04:26,596-Speed 9407.78 samples/sec Loss 1.8107 LearningRate 0.0004 Epoch: 17 Global Step: 30660 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:04:52,763-Speed 9392.75 samples/sec Loss 1.8186 LearningRate 0.0004 Epoch: 17 Global Step: 30670 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:05:18,937-Speed 9389.78 samples/sec Loss 1.8113 LearningRate 0.0004 Epoch: 17 Global Step: 30680 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:05:45,139-Speed 9379.54 samples/sec Loss 1.8100 LearningRate 0.0004 Epoch: 17 Global Step: 30690 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:06:11,323-Speed 9386.52 samples/sec Loss 1.8151 LearningRate 0.0004 Epoch: 17 Global Step: 30700 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:06:37,485-Speed 9394.23 samples/sec Loss 1.8067 LearningRate 0.0004 Epoch: 17 Global Step: 30710 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:07:03,637-Speed 9397.46 samples/sec Loss 1.8090 LearningRate 0.0004 Epoch: 17 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 19:07:29,728-Speed 9419.77 samples/sec Loss 1.8006 LearningRate 0.0004 Epoch: 17 Global Step: 30730 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 
19:07:55,854-Speed 9407.02 samples/sec Loss 1.7973 LearningRate 0.0004 Epoch: 17 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:08:22,110-Speed 9360.79 samples/sec Loss 1.8023 LearningRate 0.0004 Epoch: 17 Global Step: 30750 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:08:48,201-Speed 9419.96 samples/sec Loss 1.7962 LearningRate 0.0004 Epoch: 17 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:09:14,305-Speed 9415.04 samples/sec Loss 1.8089 LearningRate 0.0004 Epoch: 17 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:09:40,506-Speed 9379.78 samples/sec Loss 1.7997 LearningRate 0.0004 Epoch: 17 Global Step: 30780 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:10:06,670-Speed 9393.46 samples/sec Loss 1.7892 LearningRate 0.0004 Epoch: 17 Global Step: 30790 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:10:32,855-Speed 9386.07 samples/sec Loss 1.7985 LearningRate 0.0004 Epoch: 17 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:10:59,012-Speed 9395.98 samples/sec Loss 1.8011 LearningRate 0.0004 Epoch: 17 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:11:25,139-Speed 9406.58 samples/sec Loss 1.8003 LearningRate 0.0004 Epoch: 17 Global Step: 30820 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:11:51,262-Speed 9408.39 samples/sec Loss 1.8043 LearningRate 0.0004 Epoch: 17 Global Step: 30830 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:12:17,329-Speed 9428.36 samples/sec Loss 1.7934 LearningRate 0.0004 Epoch: 17 Global Step: 30840 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:12:43,475-Speed 9400.10 samples/sec Loss 1.7894 LearningRate 0.0004 Epoch: 17 Global Step: 30850 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:13:09,557-Speed 9422.76 samples/sec Loss 
1.8043 LearningRate 0.0004 Epoch: 17 Global Step: 30860 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:13:35,660-Speed 9415.61 samples/sec Loss 1.8018 LearningRate 0.0004 Epoch: 17 Global Step: 30870 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:14:01,797-Speed 9403.67 samples/sec Loss 1.8000 LearningRate 0.0004 Epoch: 17 Global Step: 30880 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:14:27,946-Speed 9398.90 samples/sec Loss 1.8042 LearningRate 0.0004 Epoch: 17 Global Step: 30890 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:14:54,021-Speed 9425.55 samples/sec Loss 1.8130 LearningRate 0.0004 Epoch: 17 Global Step: 30900 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:15:20,112-Speed 9419.70 samples/sec Loss 1.7981 LearningRate 0.0004 Epoch: 17 Global Step: 30910 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:15:46,280-Speed 9392.05 samples/sec Loss 1.7995 LearningRate 0.0004 Epoch: 17 Global Step: 30920 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:16:12,413-Speed 9404.65 samples/sec Loss 1.7881 LearningRate 0.0004 Epoch: 17 Global Step: 30930 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:16:38,530-Speed 9410.26 samples/sec Loss 1.7980 LearningRate 0.0004 Epoch: 17 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 19:17:04,662-Speed 9404.93 samples/sec Loss 1.8061 LearningRate 0.0004 Epoch: 17 Global Step: 30950 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:17:30,802-Speed 9402.30 samples/sec Loss 1.8005 LearningRate 0.0004 Epoch: 17 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:17:56,941-Speed 9402.37 samples/sec Loss 1.7975 LearningRate 0.0004 Epoch: 17 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:18:23,071-Speed 9405.64 samples/sec Loss 1.7867 LearningRate 0.0004 Epoch: 17 
Global Step: 30980 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:18:49,227-Speed 9396.42 samples/sec Loss 1.7850 LearningRate 0.0004 Epoch: 17 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:19:15,367-Speed 9402.11 samples/sec Loss 1.7820 LearningRate 0.0004 Epoch: 17 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:19:41,455-Speed 9421.08 samples/sec Loss 1.7909 LearningRate 0.0004 Epoch: 17 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:20:07,602-Speed 9399.33 samples/sec Loss 1.8130 LearningRate 0.0004 Epoch: 17 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:20:33,640-Speed 9439.04 samples/sec Loss 1.8147 LearningRate 0.0004 Epoch: 17 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:20:59,684-Speed 9436.89 samples/sec Loss 1.8106 LearningRate 0.0004 Epoch: 17 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:21:25,851-Speed 9392.39 samples/sec Loss 1.8154 LearningRate 0.0004 Epoch: 17 Global Step: 31050 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 19:21:51,963-Speed 9412.28 samples/sec Loss 1.8082 LearningRate 0.0004 Epoch: 17 Global Step: 31060 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 19:22:18,070-Speed 9414.19 samples/sec Loss 1.8189 LearningRate 0.0004 Epoch: 17 Global Step: 31070 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 19:22:44,149-Speed 9423.97 samples/sec Loss 1.7969 LearningRate 0.0004 Epoch: 17 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:23:10,268-Speed 9409.55 samples/sec Loss 1.8044 LearningRate 0.0004 Epoch: 17 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:23:36,355-Speed 9421.20 samples/sec Loss 1.8200 LearningRate 0.0004 Epoch: 17 Global Step: 31100 Fp16 Grad Scale: 
65536 Required: 28 hours Training: 2022-03-05 19:24:02,493-Speed 9402.94 samples/sec Loss 1.8071 LearningRate 0.0004 Epoch: 17 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:25:22,145-Speed 3085.48 samples/sec Loss 1.7660 LearningRate 0.0004 Epoch: 18 Global Step: 31120 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:25:48,100-Speed 9469.01 samples/sec Loss 1.7552 LearningRate 0.0004 Epoch: 18 Global Step: 31130 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:26:14,194-Speed 9418.76 samples/sec Loss 1.7699 LearningRate 0.0004 Epoch: 18 Global Step: 31140 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:26:40,292-Speed 9417.46 samples/sec Loss 1.7712 LearningRate 0.0004 Epoch: 18 Global Step: 31150 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:27:06,477-Speed 9385.99 samples/sec Loss 1.7747 LearningRate 0.0004 Epoch: 18 Global Step: 31160 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:27:32,520-Speed 9437.10 samples/sec Loss 1.7665 LearningRate 0.0004 Epoch: 18 Global Step: 31170 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:27:58,725-Speed 9378.79 samples/sec Loss 1.7719 LearningRate 0.0004 Epoch: 18 Global Step: 31180 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:28:24,835-Speed 9413.03 samples/sec Loss 1.7741 LearningRate 0.0004 Epoch: 18 Global Step: 31190 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:28:50,956-Speed 9409.08 samples/sec Loss 1.7711 LearningRate 0.0004 Epoch: 18 Global Step: 31200 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:29:17,086-Speed 9405.60 samples/sec Loss 1.7695 LearningRate 0.0004 Epoch: 18 Global Step: 31210 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:29:43,172-Speed 9421.59 samples/sec Loss 1.7588 LearningRate 0.0004 Epoch: 18 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 28 hours Training: 
2022-03-05 19:30:09,303-Speed 9405.24 samples/sec Loss 1.7781 LearningRate 0.0004 Epoch: 18 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:30:35,352-Speed 9434.73 samples/sec Loss 1.7749 LearningRate 0.0004 Epoch: 18 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:31:01,408-Speed 9432.55 samples/sec Loss 1.7615 LearningRate 0.0004 Epoch: 18 Global Step: 31250 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:31:27,543-Speed 9403.96 samples/sec Loss 1.7736 LearningRate 0.0004 Epoch: 18 Global Step: 31260 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:31:53,691-Speed 9398.96 samples/sec Loss 1.7771 LearningRate 0.0004 Epoch: 18 Global Step: 31270 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:32:19,790-Speed 9416.68 samples/sec Loss 1.7687 LearningRate 0.0004 Epoch: 18 Global Step: 31280 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:32:46,013-Speed 9372.27 samples/sec Loss 1.7742 LearningRate 0.0004 Epoch: 18 Global Step: 31290 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:33:12,128-Speed 9411.43 samples/sec Loss 1.7759 LearningRate 0.0004 Epoch: 18 Global Step: 31300 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:33:38,255-Speed 9406.53 samples/sec Loss 1.7670 LearningRate 0.0004 Epoch: 18 Global Step: 31310 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:34:04,419-Speed 9393.33 samples/sec Loss 1.7761 LearningRate 0.0004 Epoch: 18 Global Step: 31320 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:34:30,546-Speed 9406.71 samples/sec Loss 1.7778 LearningRate 0.0004 Epoch: 18 Global Step: 31330 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:34:56,644-Speed 9417.28 samples/sec Loss 1.7752 LearningRate 0.0004 Epoch: 18 Global Step: 31340 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-05 19:35:22,750-Speed 9414.16 
samples/sec Loss 1.7661 LearningRate 0.0004 Epoch: 18 Global Step: 31350 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:35:48,970-Speed 9373.43 samples/sec Loss 1.7770 LearningRate 0.0004 Epoch: 18 Global Step: 31360 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:36:15,076-Speed 9414.28 samples/sec Loss 1.7712 LearningRate 0.0004 Epoch: 18 Global Step: 31370 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:36:41,171-Speed 9418.33 samples/sec Loss 1.7790 LearningRate 0.0004 Epoch: 18 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:37:07,302-Speed 9405.64 samples/sec Loss 1.7656 LearningRate 0.0004 Epoch: 18 Global Step: 31390 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:37:33,405-Speed 9415.36 samples/sec Loss 1.7605 LearningRate 0.0004 Epoch: 18 Global Step: 31400 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:37:59,474-Speed 9427.82 samples/sec Loss 1.7761 LearningRate 0.0004 Epoch: 18 Global Step: 31410 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:38:25,568-Speed 9418.67 samples/sec Loss 1.7634 LearningRate 0.0004 Epoch: 18 Global Step: 31420 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:38:51,657-Speed 9420.29 samples/sec Loss 1.7730 LearningRate 0.0004 Epoch: 18 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:39:17,729-Speed 9426.89 samples/sec Loss 1.7836 LearningRate 0.0004 Epoch: 18 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:39:43,850-Speed 9408.87 samples/sec Loss 1.7654 LearningRate 0.0004 Epoch: 18 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 19:40:09,907-Speed 9432.27 samples/sec Loss 1.7662 LearningRate 0.0004 Epoch: 18 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:40:36,115-Speed 9377.54 samples/sec Loss 1.7677 LearningRate 
0.0004 Epoch: 18 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:41:02,230-Speed 9411.25 samples/sec Loss 1.7633 LearningRate 0.0004 Epoch: 18 Global Step: 31480 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:41:28,381-Speed 9398.51 samples/sec Loss 1.7673 LearningRate 0.0004 Epoch: 18 Global Step: 31490 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:41:54,561-Speed 9387.59 samples/sec Loss 1.7733 LearningRate 0.0004 Epoch: 18 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:42:20,714-Speed 9397.51 samples/sec Loss 1.7631 LearningRate 0.0004 Epoch: 18 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:42:46,796-Speed 9422.67 samples/sec Loss 1.7763 LearningRate 0.0004 Epoch: 18 Global Step: 31520 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:43:12,890-Speed 9418.69 samples/sec Loss 1.7667 LearningRate 0.0004 Epoch: 18 Global Step: 31530 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:43:39,019-Speed 9406.16 samples/sec Loss 1.7670 LearningRate 0.0004 Epoch: 18 Global Step: 31540 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:44:05,199-Speed 9387.85 samples/sec Loss 1.7646 LearningRate 0.0004 Epoch: 18 Global Step: 31550 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:44:31,529-Speed 9333.98 samples/sec Loss 1.7564 LearningRate 0.0004 Epoch: 18 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-05 19:44:57,598-Speed 9427.71 samples/sec Loss 1.7676 LearningRate 0.0004 Epoch: 18 Global Step: 31570 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:45:23,753-Speed 9396.85 samples/sec Loss 1.7559 LearningRate 0.0004 Epoch: 18 Global Step: 31580 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:45:49,906-Speed 9397.32 samples/sec Loss 1.7613 LearningRate 0.0004 Epoch: 18 Global Step: 31590 Fp16 
Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:46:16,123-Speed 9374.85 samples/sec Loss 1.7699 LearningRate 0.0004 Epoch: 18 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:46:42,246-Speed 9407.92 samples/sec Loss 1.7568 LearningRate 0.0004 Epoch: 18 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:47:08,419-Speed 9389.98 samples/sec Loss 1.7619 LearningRate 0.0004 Epoch: 18 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:47:34,509-Speed 9420.09 samples/sec Loss 1.7575 LearningRate 0.0004 Epoch: 18 Global Step: 31630 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:48:00,631-Speed 9408.89 samples/sec Loss 1.7649 LearningRate 0.0004 Epoch: 18 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:48:26,746-Speed 9411.18 samples/sec Loss 1.7542 LearningRate 0.0004 Epoch: 18 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:48:52,789-Speed 9436.98 samples/sec Loss 1.7563 LearningRate 0.0004 Epoch: 18 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:49:18,884-Speed 9418.22 samples/sec Loss 1.7434 LearningRate 0.0004 Epoch: 18 Global Step: 31670 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:49:44,982-Speed 9417.45 samples/sec Loss 1.7538 LearningRate 0.0004 Epoch: 18 Global Step: 31680 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:50:11,202-Speed 9373.56 samples/sec Loss 1.7522 LearningRate 0.0004 Epoch: 18 Global Step: 31690 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:50:37,279-Speed 9424.76 samples/sec Loss 1.7623 LearningRate 0.0004 Epoch: 18 Global Step: 31700 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:51:03,389-Speed 9412.79 samples/sec Loss 1.7533 LearningRate 0.0004 Epoch: 18 Global Step: 31710 Fp16 Grad Scale: 65536 Required: 28 hours 
Training: 2022-03-05 19:51:29,519-Speed 9405.61 samples/sec Loss 1.7701 LearningRate 0.0004 Epoch: 18 Global Step: 31720 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:51:55,694-Speed 9389.57 samples/sec Loss 1.7597 LearningRate 0.0004 Epoch: 18 Global Step: 31730 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:52:21,855-Speed 9395.32 samples/sec Loss 1.7541 LearningRate 0.0004 Epoch: 18 Global Step: 31740 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-05 19:52:47,929-Speed 9426.09 samples/sec Loss 1.7577 LearningRate 0.0004 Epoch: 18 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 19:53:14,059-Speed 9405.55 samples/sec Loss 1.7457 LearningRate 0.0004 Epoch: 18 Global Step: 31760 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:53:40,220-Speed 9394.49 samples/sec Loss 1.7520 LearningRate 0.0004 Epoch: 18 Global Step: 31770 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:54:06,379-Speed 9395.27 samples/sec Loss 1.7591 LearningRate 0.0004 Epoch: 18 Global Step: 31780 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:54:32,561-Speed 9387.05 samples/sec Loss 1.7410 LearningRate 0.0004 Epoch: 18 Global Step: 31790 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:54:58,706-Speed 9400.37 samples/sec Loss 1.7396 LearningRate 0.0004 Epoch: 18 Global Step: 31800 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:55:24,771-Speed 9429.04 samples/sec Loss 1.7647 LearningRate 0.0004 Epoch: 18 Global Step: 31810 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:55:50,907-Speed 9403.62 samples/sec Loss 1.7391 LearningRate 0.0004 Epoch: 18 Global Step: 31820 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:56:17,078-Speed 9391.31 samples/sec Loss 1.7452 LearningRate 0.0004 Epoch: 18 Global Step: 31830 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:56:43,210-Speed 
9404.67 samples/sec Loss 1.7510 LearningRate 0.0004 Epoch: 18 Global Step: 31840 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:57:09,320-Speed 9413.18 samples/sec Loss 1.7459 LearningRate 0.0004 Epoch: 18 Global Step: 31850 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 19:57:35,383-Speed 9429.64 samples/sec Loss 1.7468 LearningRate 0.0004 Epoch: 18 Global Step: 31860 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 19:58:01,511-Speed 9406.32 samples/sec Loss 1.7483 LearningRate 0.0004 Epoch: 18 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 19:58:27,627-Speed 9411.01 samples/sec Loss 1.7409 LearningRate 0.0004 Epoch: 18 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 19:58:53,704-Speed 9424.96 samples/sec Loss 1.7478 LearningRate 0.0004 Epoch: 18 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 19:59:19,818-Speed 9411.25 samples/sec Loss 1.7425 LearningRate 0.0004 Epoch: 18 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 19:59:45,961-Speed 9401.21 samples/sec Loss 1.7499 LearningRate 0.0004 Epoch: 18 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:00:12,031-Speed 9427.39 samples/sec Loss 1.7287 LearningRate 0.0004 Epoch: 18 Global Step: 31920 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:00:38,156-Speed 9407.52 samples/sec Loss 1.7308 LearningRate 0.0004 Epoch: 18 Global Step: 31930 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:01:04,339-Speed 9386.65 samples/sec Loss 1.7347 LearningRate 0.0004 Epoch: 18 Global Step: 31940 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:01:30,433-Speed 9418.47 samples/sec Loss 1.7342 LearningRate 0.0004 Epoch: 18 Global Step: 31950 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:01:56,511-Speed 9424.67 samples/sec Loss 1.7356 
LearningRate 0.0004 Epoch: 18 Global Step: 31960 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:02:22,659-Speed 9399.29 samples/sec Loss 1.7433 LearningRate 0.0004 Epoch: 18 Global Step: 31970 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:02:48,794-Speed 9403.91 samples/sec Loss 1.7482 LearningRate 0.0004 Epoch: 18 Global Step: 31980 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:03:14,883-Speed 9420.55 samples/sec Loss 1.7354 LearningRate 0.0004 Epoch: 18 Global Step: 31990 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:03:41,007-Speed 9408.11 samples/sec Loss 1.7272 LearningRate 0.0004 Epoch: 18 Global Step: 32000 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:04:07,057-Speed 9434.32 samples/sec Loss 1.7276 LearningRate 0.0004 Epoch: 18 Global Step: 32010 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:04:33,137-Speed 9423.77 samples/sec Loss 1.7403 LearningRate 0.0004 Epoch: 18 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:04:59,258-Speed 9408.98 samples/sec Loss 1.7410 LearningRate 0.0004 Epoch: 18 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:05:25,375-Speed 9410.28 samples/sec Loss 1.7394 LearningRate 0.0004 Epoch: 18 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:05:51,468-Speed 9419.22 samples/sec Loss 1.7233 LearningRate 0.0004 Epoch: 18 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:06:17,645-Speed 9388.82 samples/sec Loss 1.7378 LearningRate 0.0004 Epoch: 18 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:06:43,758-Speed 9411.81 samples/sec Loss 1.7370 LearningRate 0.0004 Epoch: 18 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:07:09,867-Speed 9414.04 samples/sec Loss 1.7348 LearningRate 0.0004 Epoch: 18 Global Step: 
32080 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:07:36,113-Speed 9363.88 samples/sec Loss 1.7404 LearningRate 0.0004 Epoch: 18 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:08:02,258-Speed 9400.48 samples/sec Loss 1.7387 LearningRate 0.0004 Epoch: 18 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:08:28,368-Speed 9412.95 samples/sec Loss 1.7298 LearningRate 0.0004 Epoch: 18 Global Step: 32110 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:08:54,519-Speed 9398.32 samples/sec Loss 1.7325 LearningRate 0.0004 Epoch: 18 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:09:20,772-Speed 9362.19 samples/sec Loss 1.7305 LearningRate 0.0004 Epoch: 18 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:09:46,922-Speed 9398.64 samples/sec Loss 1.7282 LearningRate 0.0004 Epoch: 18 Global Step: 32140 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:10:13,074-Speed 9397.85 samples/sec Loss 1.7188 LearningRate 0.0004 Epoch: 18 Global Step: 32150 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:10:39,219-Speed 9400.25 samples/sec Loss 1.7293 LearningRate 0.0004 Epoch: 18 Global Step: 32160 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:11:05,349-Speed 9405.77 samples/sec Loss 1.7335 LearningRate 0.0004 Epoch: 18 Global Step: 32170 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:11:31,577-Speed 9370.24 samples/sec Loss 1.7263 LearningRate 0.0004 Epoch: 18 Global Step: 32180 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:11:57,800-Speed 9372.22 samples/sec Loss 1.7391 LearningRate 0.0004 Epoch: 18 Global Step: 32190 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:12:24,093-Speed 9347.61 samples/sec Loss 1.7488 LearningRate 0.0004 Epoch: 18 Global Step: 32200 Fp16 Grad Scale: 65536 Required: 27 
hours Training: 2022-03-05 20:12:50,284-Speed 9383.43 samples/sec Loss 1.7334 LearningRate 0.0004 Epoch: 18 Global Step: 32210 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:13:16,389-Speed 9415.15 samples/sec Loss 1.7251 LearningRate 0.0004 Epoch: 18 Global Step: 32220 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:13:42,526-Speed 9403.25 samples/sec Loss 1.7350 LearningRate 0.0004 Epoch: 18 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:14:08,690-Speed 9393.46 samples/sec Loss 1.7222 LearningRate 0.0004 Epoch: 18 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:14:34,826-Speed 9403.40 samples/sec Loss 1.7104 LearningRate 0.0004 Epoch: 18 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:15:00,864-Speed 9439.08 samples/sec Loss 1.7130 LearningRate 0.0004 Epoch: 18 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:15:26,984-Speed 9409.66 samples/sec Loss 1.7205 LearningRate 0.0004 Epoch: 18 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:15:53,053-Speed 9427.74 samples/sec Loss 1.7153 LearningRate 0.0004 Epoch: 18 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:16:19,159-Speed 9414.25 samples/sec Loss 1.7275 LearningRate 0.0004 Epoch: 18 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:16:45,232-Speed 9426.35 samples/sec Loss 1.7267 LearningRate 0.0004 Epoch: 18 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:17:11,382-Speed 9398.28 samples/sec Loss 1.7403 LearningRate 0.0004 Epoch: 18 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:17:37,525-Speed 9401.03 samples/sec Loss 1.7219 LearningRate 0.0003 Epoch: 18 Global Step: 32320 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 
20:18:03,596-Speed 9427.21 samples/sec Loss 1.7149 LearningRate 0.0003 Epoch: 18 Global Step: 32330 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:18:29,643-Speed 9435.39 samples/sec Loss 1.7194 LearningRate 0.0003 Epoch: 18 Global Step: 32340 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:18:55,681-Speed 9438.98 samples/sec Loss 1.7123 LearningRate 0.0003 Epoch: 18 Global Step: 32350 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:19:21,812-Speed 9405.50 samples/sec Loss 1.7209 LearningRate 0.0003 Epoch: 18 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:19:48,000-Speed 9384.78 samples/sec Loss 1.7102 LearningRate 0.0003 Epoch: 18 Global Step: 32370 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:20:14,258-Speed 9359.95 samples/sec Loss 1.7208 LearningRate 0.0003 Epoch: 18 Global Step: 32380 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:20:40,326-Speed 9427.99 samples/sec Loss 1.7249 LearningRate 0.0003 Epoch: 18 Global Step: 32390 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:21:06,492-Speed 9392.59 samples/sec Loss 1.7151 LearningRate 0.0003 Epoch: 18 Global Step: 32400 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:21:32,767-Speed 9353.64 samples/sec Loss 1.7160 LearningRate 0.0003 Epoch: 18 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:21:59,001-Speed 9368.47 samples/sec Loss 1.7261 LearningRate 0.0003 Epoch: 18 Global Step: 32420 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:22:25,091-Speed 9420.02 samples/sec Loss 1.7030 LearningRate 0.0003 Epoch: 18 Global Step: 32430 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:22:51,289-Speed 9381.37 samples/sec Loss 1.7097 LearningRate 0.0003 Epoch: 18 Global Step: 32440 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:23:17,449-Speed 9394.82 samples/sec 
Loss 1.7151 LearningRate 0.0003 Epoch: 18 Global Step: 32450 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:23:43,557-Speed 9413.76 samples/sec Loss 1.7051 LearningRate 0.0003 Epoch: 18 Global Step: 32460 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:24:09,758-Speed 9380.15 samples/sec Loss 1.7146 LearningRate 0.0003 Epoch: 18 Global Step: 32470 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:24:35,958-Speed 9380.30 samples/sec Loss 1.7075 LearningRate 0.0003 Epoch: 18 Global Step: 32480 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:25:02,177-Speed 9373.73 samples/sec Loss 1.7145 LearningRate 0.0003 Epoch: 18 Global Step: 32490 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:25:28,382-Speed 9378.74 samples/sec Loss 1.7148 LearningRate 0.0003 Epoch: 18 Global Step: 32500 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:25:54,684-Speed 9344.49 samples/sec Loss 1.7099 LearningRate 0.0003 Epoch: 18 Global Step: 32510 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:26:20,916-Speed 9369.02 samples/sec Loss 1.7126 LearningRate 0.0003 Epoch: 18 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:26:47,019-Speed 9415.36 samples/sec Loss 1.7221 LearningRate 0.0003 Epoch: 18 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:27:13,182-Speed 9393.88 samples/sec Loss 1.7021 LearningRate 0.0003 Epoch: 18 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:27:39,374-Speed 9384.13 samples/sec Loss 1.7255 LearningRate 0.0003 Epoch: 18 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:28:05,616-Speed 9365.35 samples/sec Loss 1.7170 LearningRate 0.0003 Epoch: 18 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:28:31,779-Speed 9393.75 samples/sec Loss 1.7121 LearningRate 0.0003 Epoch: 18 
Global Step: 32570 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:28:57,933-Speed 9397.16 samples/sec Loss 1.6984 LearningRate 0.0003 Epoch: 18 Global Step: 32580 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:29:24,004-Speed 9427.16 samples/sec Loss 1.7083 LearningRate 0.0003 Epoch: 18 Global Step: 32590 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:29:50,191-Speed 9385.40 samples/sec Loss 1.7045 LearningRate 0.0003 Epoch: 18 Global Step: 32600 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:30:16,271-Speed 9423.94 samples/sec Loss 1.7087 LearningRate 0.0003 Epoch: 18 Global Step: 32610 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:30:42,306-Speed 9439.74 samples/sec Loss 1.7105 LearningRate 0.0003 Epoch: 18 Global Step: 32620 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:31:08,372-Speed 9428.99 samples/sec Loss 1.7024 LearningRate 0.0003 Epoch: 18 Global Step: 32630 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:31:34,490-Speed 9409.73 samples/sec Loss 1.7116 LearningRate 0.0003 Epoch: 18 Global Step: 32640 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:32:00,598-Speed 9413.75 samples/sec Loss 1.7043 LearningRate 0.0003 Epoch: 18 Global Step: 32650 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:32:26,645-Speed 9435.93 samples/sec Loss 1.7135 LearningRate 0.0003 Epoch: 18 Global Step: 32660 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:32:52,719-Speed 9425.66 samples/sec Loss 1.7116 LearningRate 0.0003 Epoch: 18 Global Step: 32670 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:33:18,795-Speed 9425.10 samples/sec Loss 1.7092 LearningRate 0.0003 Epoch: 18 Global Step: 32680 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:33:44,915-Speed 9409.15 samples/sec Loss 1.7025 LearningRate 0.0003 Epoch: 18 Global Step: 32690 Fp16 Grad Scale: 32768 
Required: 27 hours Training: 2022-03-05 20:34:11,067-Speed 9397.88 samples/sec Loss 1.7130 LearningRate 0.0003 Epoch: 18 Global Step: 32700 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:34:37,317-Speed 9362.60 samples/sec Loss 1.6987 LearningRate 0.0003 Epoch: 18 Global Step: 32710 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:35:03,403-Speed 9421.70 samples/sec Loss 1.7009 LearningRate 0.0003 Epoch: 18 Global Step: 32720 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:35:29,667-Speed 9357.73 samples/sec Loss 1.7089 LearningRate 0.0003 Epoch: 18 Global Step: 32730 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:35:55,813-Speed 9399.90 samples/sec Loss 1.6939 LearningRate 0.0003 Epoch: 18 Global Step: 32740 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:36:21,896-Speed 9422.74 samples/sec Loss 1.6946 LearningRate 0.0003 Epoch: 18 Global Step: 32750 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:36:47,965-Speed 9427.93 samples/sec Loss 1.7000 LearningRate 0.0003 Epoch: 18 Global Step: 32760 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:37:14,155-Speed 9383.82 samples/sec Loss 1.7177 LearningRate 0.0003 Epoch: 18 Global Step: 32770 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:37:40,260-Speed 9414.96 samples/sec Loss 1.7062 LearningRate 0.0003 Epoch: 18 Global Step: 32780 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:38:06,340-Speed 9423.36 samples/sec Loss 1.7173 LearningRate 0.0003 Epoch: 18 Global Step: 32790 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:38:32,517-Speed 9389.01 samples/sec Loss 1.7294 LearningRate 0.0003 Epoch: 18 Global Step: 32800 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:38:58,591-Speed 9426.09 samples/sec Loss 1.7057 LearningRate 0.0003 Epoch: 18 Global Step: 32810 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 
20:39:24,652-Speed 9431.07 samples/sec Loss 1.7170 LearningRate 0.0003 Epoch: 18 Global Step: 32820 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:39:50,774-Speed 9408.58 samples/sec Loss 1.7156 LearningRate 0.0003 Epoch: 18 Global Step: 32830 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:41:10,888-Speed 3067.69 samples/sec Loss 1.7092 LearningRate 0.0003 Epoch: 19 Global Step: 32840 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:41:36,641-Speed 9543.29 samples/sec Loss 1.6822 LearningRate 0.0003 Epoch: 19 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:42:02,514-Speed 9499.47 samples/sec Loss 1.6768 LearningRate 0.0003 Epoch: 19 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:42:28,419-Speed 9487.24 samples/sec Loss 1.6857 LearningRate 0.0003 Epoch: 19 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:42:54,382-Speed 9466.35 samples/sec Loss 1.6928 LearningRate 0.0003 Epoch: 19 Global Step: 32880 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:43:20,277-Speed 9490.84 samples/sec Loss 1.6942 LearningRate 0.0003 Epoch: 19 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:43:46,186-Speed 9486.12 samples/sec Loss 1.6916 LearningRate 0.0003 Epoch: 19 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:44:12,106-Speed 9481.63 samples/sec Loss 1.6735 LearningRate 0.0003 Epoch: 19 Global Step: 32910 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:44:38,156-Speed 9434.73 samples/sec Loss 1.6952 LearningRate 0.0003 Epoch: 19 Global Step: 32920 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:45:04,078-Speed 9481.05 samples/sec Loss 1.6820 LearningRate 0.0003 Epoch: 19 Global Step: 32930 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:45:30,064-Speed 9458.03 samples/sec 
Loss 1.6818 LearningRate 0.0003 Epoch: 19 Global Step: 32940 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:45:55,966-Speed 9488.44 samples/sec Loss 1.6822 LearningRate 0.0003 Epoch: 19 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-05 20:46:21,873-Speed 9486.74 samples/sec Loss 1.6695 LearningRate 0.0003 Epoch: 19 Global Step: 32960 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:46:47,766-Speed 9491.89 samples/sec Loss 1.6900 LearningRate 0.0003 Epoch: 19 Global Step: 32970 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:47:13,766-Speed 9452.57 samples/sec Loss 1.6856 LearningRate 0.0003 Epoch: 19 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:47:39,681-Speed 9483.60 samples/sec Loss 1.6767 LearningRate 0.0003 Epoch: 19 Global Step: 32990 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:48:05,691-Speed 9449.38 samples/sec Loss 1.6777 LearningRate 0.0003 Epoch: 19 Global Step: 33000 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:48:31,684-Speed 9455.08 samples/sec Loss 1.6677 LearningRate 0.0003 Epoch: 19 Global Step: 33010 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:48:57,646-Speed 9466.43 samples/sec Loss 1.6709 LearningRate 0.0003 Epoch: 19 Global Step: 33020 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:49:23,614-Speed 9464.46 samples/sec Loss 1.6827 LearningRate 0.0003 Epoch: 19 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:49:49,641-Speed 9443.06 samples/sec Loss 1.6870 LearningRate 0.0003 Epoch: 19 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:50:15,591-Speed 9471.15 samples/sec Loss 1.6806 LearningRate 0.0003 Epoch: 19 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:50:41,573-Speed 9459.68 samples/sec Loss 1.6788 LearningRate 0.0003 Epoch: 19 
Global Step: 33060 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:51:07,568-Speed 9454.53 samples/sec Loss 1.6736 LearningRate 0.0003 Epoch: 19 Global Step: 33070 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:51:33,564-Speed 9453.82 samples/sec Loss 1.6836 LearningRate 0.0003 Epoch: 19 Global Step: 33080 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-05 20:51:59,584-Speed 9445.60 samples/sec Loss 1.6976 LearningRate 0.0003 Epoch: 19 Global Step: 33090 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:52:25,564-Speed 9459.90 samples/sec Loss 1.6737 LearningRate 0.0003 Epoch: 19 Global Step: 33100 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-05 20:52:51,490-Speed 9479.84 samples/sec Loss 1.6856 LearningRate 0.0003 Epoch: 19 Global Step: 33110 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 20:53:17,509-Speed 9445.81 samples/sec Loss 1.6828 LearningRate 0.0003 Epoch: 19 Global Step: 33120 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 20:53:43,551-Speed 9437.25 samples/sec Loss 1.6820 LearningRate 0.0003 Epoch: 19 Global Step: 33130 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 20:54:09,592-Speed 9437.81 samples/sec Loss 1.6858 LearningRate 0.0003 Epoch: 19 Global Step: 33140 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 20:54:35,594-Speed 9451.93 samples/sec Loss 1.6732 LearningRate 0.0003 Epoch: 19 Global Step: 33150 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 20:55:01,610-Speed 9447.29 samples/sec Loss 1.6888 LearningRate 0.0003 Epoch: 19 Global Step: 33160 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 20:55:27,670-Speed 9430.94 samples/sec Loss 1.6868 LearningRate 0.0003 Epoch: 19 Global Step: 33170 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 20:55:53,658-Speed 9457.18 samples/sec Loss 1.6806 LearningRate 0.0003 Epoch: 19 Global Step: 33180 Fp16 Grad Scale: 32768 
Required: 26 hours Training: 2022-03-05 20:56:19,646-Speed 9456.98 samples/sec Loss 1.6771 LearningRate 0.0003 Epoch: 19 Global Step: 33190 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 20:56:45,762-Speed 9410.71 samples/sec Loss 1.6836 LearningRate 0.0003 Epoch: 19 Global Step: 33200 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 20:57:11,812-Speed 9434.94 samples/sec Loss 1.6753 LearningRate 0.0003 Epoch: 19 Global Step: 33210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 20:57:37,919-Speed 9414.12 samples/sec Loss 1.6695 LearningRate 0.0003 Epoch: 19 Global Step: 33220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 20:58:04,038-Speed 9409.54 samples/sec Loss 1.6765 LearningRate 0.0003 Epoch: 19 Global Step: 33230 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 20:58:30,136-Speed 9417.33 samples/sec Loss 1.6716 LearningRate 0.0003 Epoch: 19 Global Step: 33240 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 20:58:56,185-Speed 9434.65 samples/sec Loss 1.6657 LearningRate 0.0003 Epoch: 19 Global Step: 33250 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 20:59:22,249-Speed 9429.72 samples/sec Loss 1.6663 LearningRate 0.0003 Epoch: 19 Global Step: 33260 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 20:59:48,408-Speed 9395.10 samples/sec Loss 1.6837 LearningRate 0.0003 Epoch: 19 Global Step: 33270 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:00:14,559-Speed 9397.92 samples/sec Loss 1.6742 LearningRate 0.0003 Epoch: 19 Global Step: 33280 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:00:40,591-Speed 9441.38 samples/sec Loss 1.6871 LearningRate 0.0003 Epoch: 19 Global Step: 33290 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:01:06,721-Speed 9405.53 samples/sec Loss 1.6796 LearningRate 0.0003 Epoch: 19 Global Step: 33300 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 
21:01:32,809-Speed 9421.12 samples/sec Loss 1.6818 LearningRate 0.0003 Epoch: 19 Global Step: 33310 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:01:58,918-Speed 9413.17 samples/sec Loss 1.6744 LearningRate 0.0003 Epoch: 19 Global Step: 33320 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:02:25,053-Speed 9403.75 samples/sec Loss 1.6710 LearningRate 0.0003 Epoch: 19 Global Step: 33330 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:02:51,116-Speed 9429.64 samples/sec Loss 1.6622 LearningRate 0.0003 Epoch: 19 Global Step: 33340 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:03:17,189-Speed 9426.35 samples/sec Loss 1.6768 LearningRate 0.0003 Epoch: 19 Global Step: 33350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:03:43,391-Speed 9379.99 samples/sec Loss 1.6649 LearningRate 0.0003 Epoch: 19 Global Step: 33360 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:04:09,528-Speed 9403.14 samples/sec Loss 1.6584 LearningRate 0.0003 Epoch: 19 Global Step: 33370 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:04:35,648-Speed 9409.16 samples/sec Loss 1.6744 LearningRate 0.0003 Epoch: 19 Global Step: 33380 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:05:01,734-Speed 9421.57 samples/sec Loss 1.6725 LearningRate 0.0003 Epoch: 19 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:05:27,894-Speed 9395.09 samples/sec Loss 1.6786 LearningRate 0.0003 Epoch: 19 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:05:54,013-Speed 9409.67 samples/sec Loss 1.6603 LearningRate 0.0003 Epoch: 19 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:06:20,118-Speed 9415.14 samples/sec Loss 1.6568 LearningRate 0.0003 Epoch: 19 Global Step: 33420 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:06:46,308-Speed 9384.04 samples/sec 
Loss 1.6626 LearningRate 0.0003 Epoch: 19 Global Step: 33430 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:07:12,421-Speed 9412.11 samples/sec Loss 1.6660 LearningRate 0.0003 Epoch: 19 Global Step: 33440 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:07:38,551-Speed 9405.73 samples/sec Loss 1.6728 LearningRate 0.0003 Epoch: 19 Global Step: 33450 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:08:04,610-Speed 9431.50 samples/sec Loss 1.6521 LearningRate 0.0003 Epoch: 19 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:08:30,760-Speed 9399.16 samples/sec Loss 1.6459 LearningRate 0.0003 Epoch: 19 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:08:56,976-Speed 9374.96 samples/sec Loss 1.6631 LearningRate 0.0003 Epoch: 19 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:09:23,095-Speed 9409.90 samples/sec Loss 1.6515 LearningRate 0.0003 Epoch: 19 Global Step: 33490 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:09:49,333-Speed 9367.13 samples/sec Loss 1.6584 LearningRate 0.0003 Epoch: 19 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:10:15,410-Speed 9424.60 samples/sec Loss 1.6629 LearningRate 0.0003 Epoch: 19 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:10:41,526-Speed 9410.96 samples/sec Loss 1.6600 LearningRate 0.0003 Epoch: 19 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:11:07,738-Speed 9376.24 samples/sec Loss 1.6558 LearningRate 0.0003 Epoch: 19 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:11:33,900-Speed 9394.20 samples/sec Loss 1.6733 LearningRate 0.0003 Epoch: 19 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:12:00,105-Speed 9378.71 samples/sec Loss 1.6761 LearningRate 0.0003 Epoch: 
19 Global Step: 33550 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:12:26,207-Speed 9415.90 samples/sec Loss 1.6674 LearningRate 0.0003 Epoch: 19 Global Step: 33560 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:12:52,340-Speed 9404.77 samples/sec Loss 1.6603 LearningRate 0.0003 Epoch: 19 Global Step: 33570 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:13:18,486-Speed 9399.96 samples/sec Loss 1.6566 LearningRate 0.0003 Epoch: 19 Global Step: 33580 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:13:44,568-Speed 9423.12 samples/sec Loss 1.6498 LearningRate 0.0003 Epoch: 19 Global Step: 33590 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:14:10,740-Speed 9390.41 samples/sec Loss 1.6554 LearningRate 0.0003 Epoch: 19 Global Step: 33600 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:14:36,803-Speed 9430.37 samples/sec Loss 1.6500 LearningRate 0.0003 Epoch: 19 Global Step: 33610 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:15:02,916-Speed 9411.48 samples/sec Loss 1.6578 LearningRate 0.0003 Epoch: 19 Global Step: 33620 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:15:29,040-Speed 9408.25 samples/sec Loss 1.6495 LearningRate 0.0003 Epoch: 19 Global Step: 33630 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:15:55,117-Speed 9424.59 samples/sec Loss 1.6539 LearningRate 0.0003 Epoch: 19 Global Step: 33640 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:16:21,169-Speed 9434.03 samples/sec Loss 1.6572 LearningRate 0.0003 Epoch: 19 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:16:47,216-Speed 9436.96 samples/sec Loss 1.6515 LearningRate 0.0003 Epoch: 19 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:17:13,315-Speed 9416.98 samples/sec Loss 1.6462 LearningRate 0.0003 Epoch: 19 Global Step: 33670 Fp16 Grad Scale: 
65536 Required: 26 hours Training: 2022-03-05 21:17:39,435-Speed 9409.23 samples/sec Loss 1.6559 LearningRate 0.0003 Epoch: 19 Global Step: 33680 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:18:05,553-Speed 9409.94 samples/sec Loss 1.6570 LearningRate 0.0003 Epoch: 19 Global Step: 33690 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:18:31,671-Speed 9409.91 samples/sec Loss 1.6368 LearningRate 0.0003 Epoch: 19 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:18:57,762-Speed 9419.64 samples/sec Loss 1.6531 LearningRate 0.0003 Epoch: 19 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:19:23,910-Speed 9399.38 samples/sec Loss 1.6413 LearningRate 0.0003 Epoch: 19 Global Step: 33720 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:19:50,000-Speed 9420.02 samples/sec Loss 1.6414 LearningRate 0.0003 Epoch: 19 Global Step: 33730 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:20:16,150-Speed 9398.85 samples/sec Loss 1.6415 LearningRate 0.0003 Epoch: 19 Global Step: 33740 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:20:42,244-Speed 9418.39 samples/sec Loss 1.6526 LearningRate 0.0003 Epoch: 19 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:21:08,391-Speed 9399.98 samples/sec Loss 1.6429 LearningRate 0.0003 Epoch: 19 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:21:34,477-Speed 9421.54 samples/sec Loss 1.6530 LearningRate 0.0003 Epoch: 19 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:22:00,665-Speed 9384.90 samples/sec Loss 1.6545 LearningRate 0.0003 Epoch: 19 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:22:26,783-Speed 9410.04 samples/sec Loss 1.6484 LearningRate 0.0003 Epoch: 19 Global Step: 33790 Fp16 Grad Scale: 65536 Required: 26 hours Training: 
2022-03-05 21:22:52,877-Speed 9418.64 samples/sec Loss 1.6394 LearningRate 0.0003 Epoch: 19 Global Step: 33800 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:23:19,033-Speed 9396.36 samples/sec Loss 1.6574 LearningRate 0.0003 Epoch: 19 Global Step: 33810 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:23:45,171-Speed 9402.83 samples/sec Loss 1.6357 LearningRate 0.0003 Epoch: 19 Global Step: 33820 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:24:11,298-Speed 9406.89 samples/sec Loss 1.6384 LearningRate 0.0003 Epoch: 19 Global Step: 33830 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:24:37,420-Speed 9408.43 samples/sec Loss 1.6415 LearningRate 0.0003 Epoch: 19 Global Step: 33840 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:25:03,550-Speed 9405.55 samples/sec Loss 1.6387 LearningRate 0.0003 Epoch: 19 Global Step: 33850 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:25:29,674-Speed 9408.05 samples/sec Loss 1.6443 LearningRate 0.0003 Epoch: 19 Global Step: 33860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:25:55,821-Speed 9399.60 samples/sec Loss 1.6455 LearningRate 0.0003 Epoch: 19 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:26:21,940-Speed 9409.71 samples/sec Loss 1.6513 LearningRate 0.0003 Epoch: 19 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:26:48,064-Speed 9407.93 samples/sec Loss 1.6545 LearningRate 0.0003 Epoch: 19 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:27:14,154-Speed 9420.26 samples/sec Loss 1.6455 LearningRate 0.0003 Epoch: 19 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:27:40,274-Speed 9409.27 samples/sec Loss 1.6238 LearningRate 0.0003 Epoch: 19 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:28:06,428-Speed 9396.88 
samples/sec Loss 1.6356 LearningRate 0.0003 Epoch: 19 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:28:32,469-Speed 9437.89 samples/sec Loss 1.6401 LearningRate 0.0003 Epoch: 19 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:28:58,522-Speed 9433.40 samples/sec Loss 1.6400 LearningRate 0.0003 Epoch: 19 Global Step: 33940 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:29:24,568-Speed 9435.94 samples/sec Loss 1.6318 LearningRate 0.0003 Epoch: 19 Global Step: 33950 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:29:50,670-Speed 9416.51 samples/sec Loss 1.6597 LearningRate 0.0003 Epoch: 19 Global Step: 33960 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:30:16,695-Speed 9443.47 samples/sec Loss 1.6486 LearningRate 0.0003 Epoch: 19 Global Step: 33970 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:30:42,837-Speed 9401.25 samples/sec Loss 1.6283 LearningRate 0.0003 Epoch: 19 Global Step: 33980 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:31:08,916-Speed 9424.11 samples/sec Loss 1.6336 LearningRate 0.0003 Epoch: 19 Global Step: 33990 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:31:35,015-Speed 9416.84 samples/sec Loss 1.6370 LearningRate 0.0003 Epoch: 19 Global Step: 34000 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:32:01,153-Speed 9402.93 samples/sec Loss 1.6291 LearningRate 0.0003 Epoch: 19 Global Step: 34010 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:32:27,283-Speed 9405.83 samples/sec Loss 1.6376 LearningRate 0.0003 Epoch: 19 Global Step: 34020 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:32:53,435-Speed 9397.62 samples/sec Loss 1.6356 LearningRate 0.0003 Epoch: 19 Global Step: 34030 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-05 21:33:19,588-Speed 9397.45 samples/sec Loss 1.6229 LearningRate 
0.0003 Epoch: 19 Global Step: 34040 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:33:45,730-Speed 9401.41 samples/sec Loss 1.6338 LearningRate 0.0003 Epoch: 19 Global Step: 34050 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:34:11,935-Speed 9378.94 samples/sec Loss 1.6396 LearningRate 0.0003 Epoch: 19 Global Step: 34060 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:34:38,112-Speed 9388.54 samples/sec Loss 1.6418 LearningRate 0.0003 Epoch: 19 Global Step: 34070 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:35:04,242-Speed 9405.98 samples/sec Loss 1.6362 LearningRate 0.0003 Epoch: 19 Global Step: 34080 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:35:30,453-Speed 9376.52 samples/sec Loss 1.6239 LearningRate 0.0003 Epoch: 19 Global Step: 34090 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:35:56,668-Speed 9375.18 samples/sec Loss 1.6284 LearningRate 0.0003 Epoch: 19 Global Step: 34100 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:36:22,758-Speed 9421.00 samples/sec Loss 1.6261 LearningRate 0.0003 Epoch: 19 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:36:48,919-Speed 9394.67 samples/sec Loss 1.6296 LearningRate 0.0003 Epoch: 19 Global Step: 34120 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:37:15,009-Speed 9419.97 samples/sec Loss 1.6279 LearningRate 0.0003 Epoch: 19 Global Step: 34130 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:37:41,123-Speed 9411.43 samples/sec Loss 1.6264 LearningRate 0.0003 Epoch: 19 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:38:07,313-Speed 9384.35 samples/sec Loss 1.6318 LearningRate 0.0003 Epoch: 19 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:38:33,445-Speed 9404.80 samples/sec Loss 1.6288 LearningRate 0.0003 Epoch: 19 Global Step: 34160 Fp16 
Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:38:59,669-Speed 9372.10 samples/sec Loss 1.6292 LearningRate 0.0003 Epoch: 19 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:39:25,834-Speed 9393.16 samples/sec Loss 1.6335 LearningRate 0.0003 Epoch: 19 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:39:52,023-Speed 9384.38 samples/sec Loss 1.6398 LearningRate 0.0003 Epoch: 19 Global Step: 34190 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:40:18,089-Speed 9428.95 samples/sec Loss 1.6250 LearningRate 0.0003 Epoch: 19 Global Step: 34200 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:40:44,188-Speed 9416.85 samples/sec Loss 1.6232 LearningRate 0.0003 Epoch: 19 Global Step: 34210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:41:10,313-Speed 9407.69 samples/sec Loss 1.6354 LearningRate 0.0003 Epoch: 19 Global Step: 34220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:41:36,451-Speed 9402.92 samples/sec Loss 1.6151 LearningRate 0.0003 Epoch: 19 Global Step: 34230 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:42:02,550-Speed 9417.02 samples/sec Loss 1.6208 LearningRate 0.0003 Epoch: 19 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:42:28,662-Speed 9412.37 samples/sec Loss 1.6144 LearningRate 0.0003 Epoch: 19 Global Step: 34250 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:42:54,704-Speed 9437.61 samples/sec Loss 1.6172 LearningRate 0.0003 Epoch: 19 Global Step: 34260 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:43:20,773-Speed 9427.57 samples/sec Loss 1.6220 LearningRate 0.0003 Epoch: 19 Global Step: 34270 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:43:46,907-Speed 9404.07 samples/sec Loss 1.6103 LearningRate 0.0003 Epoch: 19 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 26 hours 
Training: 2022-03-05 21:44:13,032-Speed 9407.83 samples/sec Loss 1.6206 LearningRate 0.0003 Epoch: 19 Global Step: 34290 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:44:39,116-Speed 9422.37 samples/sec Loss 1.6194 LearningRate 0.0003 Epoch: 19 Global Step: 34300 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:45:05,236-Speed 9409.48 samples/sec Loss 1.6184 LearningRate 0.0003 Epoch: 19 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:45:31,326-Speed 9419.97 samples/sec Loss 1.6304 LearningRate 0.0003 Epoch: 19 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:45:57,426-Speed 9416.51 samples/sec Loss 1.6151 LearningRate 0.0003 Epoch: 19 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:46:23,446-Speed 9445.43 samples/sec Loss 1.6270 LearningRate 0.0003 Epoch: 19 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:46:49,603-Speed 9396.32 samples/sec Loss 1.6353 LearningRate 0.0003 Epoch: 19 Global Step: 34350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:47:15,752-Speed 9398.69 samples/sec Loss 1.6149 LearningRate 0.0003 Epoch: 19 Global Step: 34360 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:47:41,888-Speed 9403.28 samples/sec Loss 1.6187 LearningRate 0.0003 Epoch: 19 Global Step: 34370 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-05 21:48:08,036-Speed 9399.28 samples/sec Loss 1.6125 LearningRate 0.0003 Epoch: 19 Global Step: 34380 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:48:34,216-Speed 9387.55 samples/sec Loss 1.6024 LearningRate 0.0003 Epoch: 19 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:49:00,434-Speed 9374.20 samples/sec Loss 1.6147 LearningRate 0.0003 Epoch: 19 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:49:26,640-Speed 
9378.72 samples/sec Loss 1.6102 LearningRate 0.0003 Epoch: 19 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:49:52,776-Speed 9403.45 samples/sec Loss 1.6199 LearningRate 0.0003 Epoch: 19 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:50:18,913-Speed 9403.33 samples/sec Loss 1.6049 LearningRate 0.0003 Epoch: 19 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:50:45,050-Speed 9402.91 samples/sec Loss 1.6348 LearningRate 0.0003 Epoch: 19 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-05 21:51:11,247-Speed 9381.63 samples/sec Loss 1.6202 LearningRate 0.0003 Epoch: 19 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 21:51:37,291-Speed 9436.64 samples/sec Loss 1.6187 LearningRate 0.0003 Epoch: 19 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 21:52:03,367-Speed 9425.18 samples/sec Loss 1.6186 LearningRate 0.0003 Epoch: 19 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 21:52:29,476-Speed 9413.14 samples/sec Loss 1.6170 LearningRate 0.0003 Epoch: 19 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-05 21:52:55,522-Speed 9435.86 samples/sec Loss 1.6205 LearningRate 0.0003 Epoch: 19 Global Step: 34490 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 21:53:21,660-Speed 9403.01 samples/sec Loss 1.6188 LearningRate 0.0003 Epoch: 19 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 21:53:47,776-Speed 9410.78 samples/sec Loss 1.6271 LearningRate 0.0003 Epoch: 19 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 21:54:13,888-Speed 9412.21 samples/sec Loss 1.6220 LearningRate 0.0003 Epoch: 19 Global Step: 34520 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:54:39,963-Speed 9425.14 samples/sec Loss 1.6208 
LearningRate 0.0003 Epoch: 19 Global Step: 34530 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:55:06,136-Speed 9390.14 samples/sec Loss 1.6239 LearningRate 0.0003 Epoch: 19 Global Step: 34540 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:55:32,193-Speed 9432.73 samples/sec Loss 1.6310 LearningRate 0.0003 Epoch: 19 Global Step: 34550 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:55:58,283-Speed 9419.97 samples/sec Loss 1.6248 LearningRate 0.0003 Epoch: 19 Global Step: 34560 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:57:16,320-Speed 3149.33 samples/sec Loss 1.6128 LearningRate 0.0003 Epoch: 20 Global Step: 34570 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:57:42,292-Speed 9463.27 samples/sec Loss 1.5910 LearningRate 0.0003 Epoch: 20 Global Step: 34580 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:58:08,335-Speed 9437.20 samples/sec Loss 1.5977 LearningRate 0.0003 Epoch: 20 Global Step: 34590 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:58:34,439-Speed 9415.29 samples/sec Loss 1.5920 LearningRate 0.0003 Epoch: 20 Global Step: 34600 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:59:00,517-Speed 9424.65 samples/sec Loss 1.5978 LearningRate 0.0003 Epoch: 20 Global Step: 34610 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 21:59:26,571-Speed 9432.99 samples/sec Loss 1.6085 LearningRate 0.0003 Epoch: 20 Global Step: 34620 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 21:59:52,697-Speed 9407.39 samples/sec Loss 1.5904 LearningRate 0.0003 Epoch: 20 Global Step: 34630 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:00:18,774-Speed 9424.96 samples/sec Loss 1.5981 LearningRate 0.0003 Epoch: 20 Global Step: 34640 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:00:44,866-Speed 9419.26 samples/sec Loss 1.6022 LearningRate 0.0003 Epoch: 20 Global Step: 
34650 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:01:10,899-Speed 9440.97 samples/sec Loss 1.5977 LearningRate 0.0003 Epoch: 20 Global Step: 34660 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:01:37,077-Speed 9388.50 samples/sec Loss 1.5985 LearningRate 0.0003 Epoch: 20 Global Step: 34670 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:02:03,148-Speed 9427.12 samples/sec Loss 1.6073 LearningRate 0.0003 Epoch: 20 Global Step: 34680 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:02:29,246-Speed 9417.10 samples/sec Loss 1.6081 LearningRate 0.0003 Epoch: 20 Global Step: 34690 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:02:55,410-Speed 9393.47 samples/sec Loss 1.5999 LearningRate 0.0003 Epoch: 20 Global Step: 34700 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:03:21,551-Speed 9402.02 samples/sec Loss 1.5916 LearningRate 0.0003 Epoch: 20 Global Step: 34710 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:03:47,648-Speed 9417.38 samples/sec Loss 1.6014 LearningRate 0.0003 Epoch: 20 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-05 22:04:13,718-Speed 9427.38 samples/sec Loss 1.5916 LearningRate 0.0003 Epoch: 20 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-05 22:04:39,743-Speed 9443.98 samples/sec Loss 1.5972 LearningRate 0.0003 Epoch: 20 Global Step: 34740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:05:05,833-Speed 9420.31 samples/sec Loss 1.6009 LearningRate 0.0003 Epoch: 20 Global Step: 34750 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:05:31,939-Speed 9414.88 samples/sec Loss 1.5998 LearningRate 0.0003 Epoch: 20 Global Step: 34760 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:05:58,065-Speed 9407.08 samples/sec Loss 1.5895 LearningRate 0.0003 Epoch: 20 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 25 
hours Training: 2022-03-05 22:06:24,229-Speed 9393.30 samples/sec Loss 1.5971 LearningRate 0.0003 Epoch: 20 Global Step: 34780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:06:50,504-Speed 9353.98 samples/sec Loss 1.5948 LearningRate 0.0003 Epoch: 20 Global Step: 34790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:07:16,725-Speed 9373.18 samples/sec Loss 1.5998 LearningRate 0.0003 Epoch: 20 Global Step: 34800 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:07:42,865-Speed 9401.99 samples/sec Loss 1.6016 LearningRate 0.0003 Epoch: 20 Global Step: 34810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:08:09,135-Speed 9355.59 samples/sec Loss 1.5968 LearningRate 0.0003 Epoch: 20 Global Step: 34820 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:08:35,196-Speed 9430.47 samples/sec Loss 1.5990 LearningRate 0.0003 Epoch: 20 Global Step: 34830 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:09:01,308-Speed 9411.95 samples/sec Loss 1.5998 LearningRate 0.0003 Epoch: 20 Global Step: 34840 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:09:27,470-Speed 9394.19 samples/sec Loss 1.5826 LearningRate 0.0003 Epoch: 20 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:09:53,605-Speed 9403.95 samples/sec Loss 1.5935 LearningRate 0.0003 Epoch: 20 Global Step: 34860 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:10:19,773-Speed 9391.98 samples/sec Loss 1.5969 LearningRate 0.0003 Epoch: 20 Global Step: 34870 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:10:45,982-Speed 9377.23 samples/sec Loss 1.5952 LearningRate 0.0003 Epoch: 20 Global Step: 34880 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:11:12,166-Speed 9386.13 samples/sec Loss 1.5940 LearningRate 0.0003 Epoch: 20 Global Step: 34890 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 
22:11:38,386-Speed 9373.61 samples/sec Loss 1.5965 LearningRate 0.0003 Epoch: 20 Global Step: 34900 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:12:04,582-Speed 9382.10 samples/sec Loss 1.5949 LearningRate 0.0003 Epoch: 20 Global Step: 34910 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:12:30,819-Speed 9367.13 samples/sec Loss 1.5938 LearningRate 0.0003 Epoch: 20 Global Step: 34920 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:12:56,854-Speed 9439.79 samples/sec Loss 1.5844 LearningRate 0.0003 Epoch: 20 Global Step: 34930 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:13:22,995-Speed 9401.83 samples/sec Loss 1.5948 LearningRate 0.0003 Epoch: 20 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-05 22:13:49,100-Speed 9414.41 samples/sec Loss 1.6011 LearningRate 0.0003 Epoch: 20 Global Step: 34950 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:14:15,216-Speed 9411.01 samples/sec Loss 1.5960 LearningRate 0.0003 Epoch: 20 Global Step: 34960 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:14:41,357-Speed 9401.53 samples/sec Loss 1.5987 LearningRate 0.0003 Epoch: 20 Global Step: 34970 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:15:07,420-Speed 9429.58 samples/sec Loss 1.5873 LearningRate 0.0003 Epoch: 20 Global Step: 34980 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:15:33,525-Speed 9415.05 samples/sec Loss 1.5979 LearningRate 0.0003 Epoch: 20 Global Step: 34990 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:15:59,678-Speed 9397.42 samples/sec Loss 1.5832 LearningRate 0.0003 Epoch: 20 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:16:25,764-Speed 9421.35 samples/sec Loss 1.5988 LearningRate 0.0003 Epoch: 20 Global Step: 35010 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:16:51,851-Speed 9421.11 samples/sec 
Loss 1.5856 LearningRate 0.0003 Epoch: 20 Global Step: 35020 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:17:17,978-Speed 9406.67 samples/sec Loss 1.5909 LearningRate 0.0003 Epoch: 20 Global Step: 35030 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:17:44,086-Speed 9414.32 samples/sec Loss 1.5941 LearningRate 0.0003 Epoch: 20 Global Step: 35040 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:18:10,187-Speed 9416.11 samples/sec Loss 1.5982 LearningRate 0.0003 Epoch: 20 Global Step: 35050 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:18:36,347-Speed 9394.75 samples/sec Loss 1.5991 LearningRate 0.0003 Epoch: 20 Global Step: 35060 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:19:02,499-Speed 9398.20 samples/sec Loss 1.6032 LearningRate 0.0003 Epoch: 20 Global Step: 35070 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:19:28,660-Speed 9394.49 samples/sec Loss 1.5878 LearningRate 0.0003 Epoch: 20 Global Step: 35080 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:19:54,739-Speed 9425.05 samples/sec Loss 1.5916 LearningRate 0.0003 Epoch: 20 Global Step: 35090 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:20:20,844-Speed 9414.75 samples/sec Loss 1.5908 LearningRate 0.0003 Epoch: 20 Global Step: 35100 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:20:46,935-Speed 9419.75 samples/sec Loss 1.5894 LearningRate 0.0003 Epoch: 20 Global Step: 35110 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:21:13,029-Speed 9418.71 samples/sec Loss 1.5767 LearningRate 0.0003 Epoch: 20 Global Step: 35120 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:21:39,166-Speed 9402.94 samples/sec Loss 1.5800 LearningRate 0.0003 Epoch: 20 Global Step: 35130 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:22:05,290-Speed 9408.10 samples/sec Loss 1.5769 LearningRate 0.0003 Epoch: 20 
Global Step: 35140 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:22:31,377-Speed 9421.03 samples/sec Loss 1.5858 LearningRate 0.0003 Epoch: 20 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:22:57,472-Speed 9418.36 samples/sec Loss 1.5794 LearningRate 0.0003 Epoch: 20 Global Step: 35160 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:23:23,587-Speed 9411.01 samples/sec Loss 1.5682 LearningRate 0.0003 Epoch: 20 Global Step: 35170 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:23:49,729-Speed 9401.51 samples/sec Loss 1.5781 LearningRate 0.0003 Epoch: 20 Global Step: 35180 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:24:15,847-Speed 9410.33 samples/sec Loss 1.5759 LearningRate 0.0003 Epoch: 20 Global Step: 35190 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:24:41,923-Speed 9425.22 samples/sec Loss 1.5789 LearningRate 0.0003 Epoch: 20 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:25:08,029-Speed 9414.04 samples/sec Loss 1.5833 LearningRate 0.0003 Epoch: 20 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:25:34,133-Speed 9415.07 samples/sec Loss 1.5834 LearningRate 0.0003 Epoch: 20 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:26:00,276-Speed 9401.02 samples/sec Loss 1.5729 LearningRate 0.0003 Epoch: 20 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:26:26,384-Speed 9413.91 samples/sec Loss 1.5800 LearningRate 0.0003 Epoch: 20 Global Step: 35240 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:26:52,518-Speed 9404.32 samples/sec Loss 1.5624 LearningRate 0.0003 Epoch: 20 Global Step: 35250 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-05 22:27:18,588-Speed 9427.35 samples/sec Loss 1.5706 LearningRate 0.0003 Epoch: 20 Global Step: 35260 Fp16 Grad Scale: 65536 
Required: 25 hours Training: 2022-03-05 22:27:44,736-Speed 9399.28 samples/sec Loss 1.5756 LearningRate 0.0003 Epoch: 20 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:28:10,886-Speed 9398.59 samples/sec Loss 1.5832 LearningRate 0.0003 Epoch: 20 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:28:36,992-Speed 9414.39 samples/sec Loss 1.5766 LearningRate 0.0003 Epoch: 20 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:29:03,038-Speed 9436.06 samples/sec Loss 1.5735 LearningRate 0.0003 Epoch: 20 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:29:29,130-Speed 9419.63 samples/sec Loss 1.5732 LearningRate 0.0003 Epoch: 20 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:29:55,210-Speed 9423.88 samples/sec Loss 1.5738 LearningRate 0.0003 Epoch: 20 Global Step: 35320 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:30:21,355-Speed 9400.26 samples/sec Loss 1.5766 LearningRate 0.0003 Epoch: 20 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:30:47,423-Speed 9428.42 samples/sec Loss 1.5773 LearningRate 0.0003 Epoch: 20 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:31:13,495-Speed 9426.80 samples/sec Loss 1.5722 LearningRate 0.0003 Epoch: 20 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:31:39,578-Speed 9422.48 samples/sec Loss 1.5736 LearningRate 0.0003 Epoch: 20 Global Step: 35360 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-05 22:32:05,679-Speed 9416.19 samples/sec Loss 1.5602 LearningRate 0.0003 Epoch: 20 Global Step: 35370 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:32:31,821-Speed 9401.34 samples/sec Loss 1.5804 LearningRate 0.0003 Epoch: 20 Global Step: 35380 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 
22:32:58,035-Speed 9375.67 samples/sec Loss 1.5760 LearningRate 0.0003 Epoch: 20 Global Step: 35390 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:33:24,197-Speed 9394.45 samples/sec Loss 1.5688 LearningRate 0.0003 Epoch: 20 Global Step: 35400 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:33:50,427-Speed 9369.78 samples/sec Loss 1.5639 LearningRate 0.0003 Epoch: 20 Global Step: 35410 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:34:16,547-Speed 9409.25 samples/sec Loss 1.5615 LearningRate 0.0003 Epoch: 20 Global Step: 35420 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:34:42,718-Speed 9390.85 samples/sec Loss 1.5613 LearningRate 0.0003 Epoch: 20 Global Step: 35430 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:35:08,848-Speed 9405.68 samples/sec Loss 1.5735 LearningRate 0.0003 Epoch: 20 Global Step: 35440 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:35:34,993-Speed 9400.31 samples/sec Loss 1.5528 LearningRate 0.0003 Epoch: 20 Global Step: 35450 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:36:01,219-Speed 9370.90 samples/sec Loss 1.5560 LearningRate 0.0003 Epoch: 20 Global Step: 35460 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:36:27,386-Speed 9392.49 samples/sec Loss 1.5540 LearningRate 0.0003 Epoch: 20 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:36:53,489-Speed 9415.55 samples/sec Loss 1.5654 LearningRate 0.0003 Epoch: 20 Global Step: 35480 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:37:19,696-Speed 9378.09 samples/sec Loss 1.5594 LearningRate 0.0003 Epoch: 20 Global Step: 35490 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:37:45,830-Speed 9404.02 samples/sec Loss 1.5592 LearningRate 0.0003 Epoch: 20 Global Step: 35500 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:38:12,012-Speed 9386.99 samples/sec Loss 
1.5674 LearningRate 0.0003 Epoch: 20 Global Step: 35510 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:38:38,097-Speed 9421.97 samples/sec Loss 1.5682 LearningRate 0.0003 Epoch: 20 Global Step: 35520 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:39:04,181-Speed 9422.19 samples/sec Loss 1.5582 LearningRate 0.0003 Epoch: 20 Global Step: 35530 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:39:30,402-Speed 9372.81 samples/sec Loss 1.5603 LearningRate 0.0003 Epoch: 20 Global Step: 35540 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:39:56,552-Speed 9398.74 samples/sec Loss 1.5782 LearningRate 0.0003 Epoch: 20 Global Step: 35550 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:40:22,634-Speed 9423.02 samples/sec Loss 1.5795 LearningRate 0.0003 Epoch: 20 Global Step: 35560 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:40:48,733-Speed 9416.83 samples/sec Loss 1.5585 LearningRate 0.0003 Epoch: 20 Global Step: 35570 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:41:14,821-Speed 9421.04 samples/sec Loss 1.5683 LearningRate 0.0003 Epoch: 20 Global Step: 35580 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:41:41,038-Speed 9374.55 samples/sec Loss 1.5697 LearningRate 0.0003 Epoch: 20 Global Step: 35590 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:42:07,174-Speed 9403.66 samples/sec Loss 1.5583 LearningRate 0.0003 Epoch: 20 Global Step: 35600 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:42:33,261-Speed 9421.10 samples/sec Loss 1.5607 LearningRate 0.0003 Epoch: 20 Global Step: 35610 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:42:59,365-Speed 9415.17 samples/sec Loss 1.5590 LearningRate 0.0003 Epoch: 20 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:43:25,466-Speed 9416.77 samples/sec Loss 1.5597 LearningRate 0.0003 Epoch: 20 Global 
Step: 35630 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:43:51,556-Speed 9420.21 samples/sec Loss 1.5587 LearningRate 0.0003 Epoch: 20 Global Step: 35640 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:44:17,805-Speed 9363.02 samples/sec Loss 1.5614 LearningRate 0.0003 Epoch: 20 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:44:43,981-Speed 9388.88 samples/sec Loss 1.5541 LearningRate 0.0003 Epoch: 20 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:45:10,162-Speed 9387.45 samples/sec Loss 1.5526 LearningRate 0.0003 Epoch: 20 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:45:36,268-Speed 9414.42 samples/sec Loss 1.5567 LearningRate 0.0003 Epoch: 20 Global Step: 35680 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-05 22:46:02,443-Speed 9389.50 samples/sec Loss 1.5539 LearningRate 0.0003 Epoch: 20 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:46:28,589-Speed 9399.85 samples/sec Loss 1.5639 LearningRate 0.0003 Epoch: 20 Global Step: 35700 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:46:54,759-Speed 9391.29 samples/sec Loss 1.5588 LearningRate 0.0003 Epoch: 20 Global Step: 35710 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:47:20,784-Speed 9443.82 samples/sec Loss 1.5633 LearningRate 0.0003 Epoch: 20 Global Step: 35720 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:47:47,038-Speed 9361.11 samples/sec Loss 1.5660 LearningRate 0.0003 Epoch: 20 Global Step: 35730 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:48:13,226-Speed 9384.79 samples/sec Loss 1.5625 LearningRate 0.0003 Epoch: 20 Global Step: 35740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-05 22:48:39,280-Speed 9433.20 samples/sec Loss 1.5517 LearningRate 0.0003 Epoch: 20 Global Step: 35750 Fp16 Grad Scale: 32768 
Required: 25 hours Training: 2022-03-05 22:49:05,325-Speed 9436.38 samples/sec Loss 1.5475 LearningRate 0.0003 Epoch: 20 Global Step: 35760 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:49:31,526-Speed 9380.18 samples/sec Loss 1.5591 LearningRate 0.0003 Epoch: 20 Global Step: 35770 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:49:57,574-Speed 9435.41 samples/sec Loss 1.5458 LearningRate 0.0003 Epoch: 20 Global Step: 35780 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:50:23,736-Speed 9394.06 samples/sec Loss 1.5488 LearningRate 0.0003 Epoch: 20 Global Step: 35790 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:50:49,937-Speed 9380.01 samples/sec Loss 1.5546 LearningRate 0.0003 Epoch: 20 Global Step: 35800 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-05 22:51:16,054-Speed 9410.23 samples/sec Loss 1.5549 LearningRate 0.0003 Epoch: 20 Global Step: 35810 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 22:51:42,101-Speed 9435.98 samples/sec Loss 1.5482 LearningRate 0.0003 Epoch: 20 Global Step: 35820 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 22:52:08,374-Speed 9354.39 samples/sec Loss 1.5510 LearningRate 0.0003 Epoch: 20 Global Step: 35830 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 22:52:34,518-Speed 9400.43 samples/sec Loss 1.5637 LearningRate 0.0003 Epoch: 20 Global Step: 35840 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 22:53:00,648-Speed 9405.70 samples/sec Loss 1.5441 LearningRate 0.0003 Epoch: 20 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:53:26,774-Speed 9407.24 samples/sec Loss 1.5537 LearningRate 0.0003 Epoch: 20 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:53:52,945-Speed 9390.80 samples/sec Loss 1.5403 LearningRate 0.0003 Epoch: 20 Global Step: 35870 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 
22:54:19,100-Speed 9396.97 samples/sec Loss 1.5578 LearningRate 0.0003 Epoch: 20 Global Step: 35880 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:54:45,252-Speed 9397.72 samples/sec Loss 1.5463 LearningRate 0.0003 Epoch: 20 Global Step: 35890 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:55:11,405-Speed 9397.44 samples/sec Loss 1.5456 LearningRate 0.0003 Epoch: 20 Global Step: 35900 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:55:37,601-Speed 9381.86 samples/sec Loss 1.5399 LearningRate 0.0003 Epoch: 20 Global Step: 35910 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:56:03,697-Speed 9417.79 samples/sec Loss 1.5521 LearningRate 0.0003 Epoch: 20 Global Step: 35920 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:56:29,813-Speed 9410.73 samples/sec Loss 1.5484 LearningRate 0.0003 Epoch: 20 Global Step: 35930 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:56:55,896-Speed 9422.79 samples/sec Loss 1.5459 LearningRate 0.0003 Epoch: 20 Global Step: 35940 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:57:22,011-Speed 9410.78 samples/sec Loss 1.5411 LearningRate 0.0003 Epoch: 20 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-05 22:57:48,098-Speed 9421.24 samples/sec Loss 1.5532 LearningRate 0.0003 Epoch: 20 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:58:14,206-Speed 9413.68 samples/sec Loss 1.5382 LearningRate 0.0003 Epoch: 20 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:58:40,263-Speed 9432.10 samples/sec Loss 1.5376 LearningRate 0.0003 Epoch: 20 Global Step: 35980 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:59:06,415-Speed 9397.60 samples/sec Loss 1.5479 LearningRate 0.0003 Epoch: 20 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:59:32,535-Speed 9409.32 samples/sec 
Loss 1.5306 LearningRate 0.0003 Epoch: 20 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 22:59:58,628-Speed 9419.02 samples/sec Loss 1.5494 LearningRate 0.0003 Epoch: 20 Global Step: 36010 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:00:24,680-Speed 9434.26 samples/sec Loss 1.5448 LearningRate 0.0003 Epoch: 20 Global Step: 36020 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:00:50,760-Speed 9423.61 samples/sec Loss 1.5489 LearningRate 0.0003 Epoch: 20 Global Step: 36030 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:01:16,867-Speed 9413.84 samples/sec Loss 1.5372 LearningRate 0.0003 Epoch: 20 Global Step: 36040 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:01:42,935-Speed 9428.31 samples/sec Loss 1.5325 LearningRate 0.0003 Epoch: 20 Global Step: 36050 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:02:09,074-Speed 9402.44 samples/sec Loss 1.5382 LearningRate 0.0003 Epoch: 20 Global Step: 36060 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:02:35,187-Speed 9411.73 samples/sec Loss 1.5478 LearningRate 0.0003 Epoch: 20 Global Step: 36070 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:03:01,289-Speed 9415.87 samples/sec Loss 1.5426 LearningRate 0.0003 Epoch: 20 Global Step: 36080 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:03:27,388-Speed 9417.03 samples/sec Loss 1.5431 LearningRate 0.0003 Epoch: 20 Global Step: 36090 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:03:53,544-Speed 9396.10 samples/sec Loss 1.5326 LearningRate 0.0003 Epoch: 20 Global Step: 36100 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:04:19,652-Speed 9413.71 samples/sec Loss 1.5365 LearningRate 0.0003 Epoch: 20 Global Step: 36110 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:04:45,745-Speed 9419.90 samples/sec Loss 1.5401 LearningRate 0.0003 Epoch: 20 
Global Step: 36120 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:05:11,852-Speed 9413.82 samples/sec Loss 1.5453 LearningRate 0.0003 Epoch: 20 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:05:37,969-Speed 9410.49 samples/sec Loss 1.5468 LearningRate 0.0003 Epoch: 20 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:06:04,188-Speed 9373.88 samples/sec Loss 1.5350 LearningRate 0.0003 Epoch: 20 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:06:30,249-Speed 9430.73 samples/sec Loss 1.5423 LearningRate 0.0003 Epoch: 20 Global Step: 36160 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:06:56,352-Speed 9415.42 samples/sec Loss 1.5284 LearningRate 0.0003 Epoch: 20 Global Step: 36170 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:07:22,478-Speed 9407.15 samples/sec Loss 1.5382 LearningRate 0.0003 Epoch: 20 Global Step: 36180 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:07:48,689-Speed 9376.79 samples/sec Loss 1.5488 LearningRate 0.0003 Epoch: 20 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:08:14,835-Speed 9399.89 samples/sec Loss 1.5445 LearningRate 0.0003 Epoch: 20 Global Step: 36200 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:08:40,962-Speed 9407.15 samples/sec Loss 1.5325 LearningRate 0.0003 Epoch: 20 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:09:07,103-Speed 9401.79 samples/sec Loss 1.5341 LearningRate 0.0003 Epoch: 20 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:09:33,222-Speed 9409.77 samples/sec Loss 1.5398 LearningRate 0.0003 Epoch: 20 Global Step: 36230 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:09:59,342-Speed 9409.31 samples/sec Loss 1.5413 LearningRate 0.0003 Epoch: 20 Global Step: 36240 Fp16 Grad Scale: 65536 
Required: 24 hours Training: 2022-03-05 23:10:25,412-Speed 9427.43 samples/sec Loss 1.5308 LearningRate 0.0003 Epoch: 20 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:10:51,562-Speed 9398.41 samples/sec Loss 1.5577 LearningRate 0.0003 Epoch: 20 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:11:17,699-Speed 9403.21 samples/sec Loss 1.5484 LearningRate 0.0003 Epoch: 20 Global Step: 36270 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:11:43,818-Speed 9409.76 samples/sec Loss 1.5475 LearningRate 0.0003 Epoch: 20 Global Step: 36280 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:12:09,888-Speed 9427.24 samples/sec Loss 1.5444 LearningRate 0.0003 Epoch: 20 Global Step: 36290 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:13:29,106-Speed 3102.36 samples/sec Loss 1.5365 LearningRate 0.0003 Epoch: 21 Global Step: 36300 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:13:55,008-Speed 9488.60 samples/sec Loss 1.5156 LearningRate 0.0003 Epoch: 21 Global Step: 36310 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:14:20,996-Speed 9457.33 samples/sec Loss 1.5155 LearningRate 0.0003 Epoch: 21 Global Step: 36320 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:14:46,975-Speed 9460.09 samples/sec Loss 1.5173 LearningRate 0.0003 Epoch: 21 Global Step: 36330 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:15:12,974-Speed 9453.08 samples/sec Loss 1.5117 LearningRate 0.0003 Epoch: 21 Global Step: 36340 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:15:39,028-Speed 9433.36 samples/sec Loss 1.5166 LearningRate 0.0003 Epoch: 21 Global Step: 36350 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:16:05,032-Speed 9451.07 samples/sec Loss 1.5230 LearningRate 0.0003 Epoch: 21 Global Step: 36360 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 
23:16:30,982-Speed 9470.94 samples/sec Loss 1.5163 LearningRate 0.0003 Epoch: 21 Global Step: 36370 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:16:57,007-Speed 9443.73 samples/sec Loss 1.5137 LearningRate 0.0003 Epoch: 21 Global Step: 36380 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:17:23,034-Speed 9442.80 samples/sec Loss 1.5128 LearningRate 0.0003 Epoch: 21 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:17:49,117-Speed 9422.70 samples/sec Loss 1.5114 LearningRate 0.0003 Epoch: 21 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:18:15,131-Speed 9447.58 samples/sec Loss 1.5281 LearningRate 0.0003 Epoch: 21 Global Step: 36410 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:18:41,149-Speed 9445.91 samples/sec Loss 1.5147 LearningRate 0.0003 Epoch: 21 Global Step: 36420 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:19:07,172-Speed 9444.16 samples/sec Loss 1.5133 LearningRate 0.0003 Epoch: 21 Global Step: 36430 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:19:33,360-Speed 9384.99 samples/sec Loss 1.5184 LearningRate 0.0003 Epoch: 21 Global Step: 36440 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:19:59,426-Speed 9428.62 samples/sec Loss 1.5183 LearningRate 0.0003 Epoch: 21 Global Step: 36450 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:20:25,519-Speed 9419.22 samples/sec Loss 1.5315 LearningRate 0.0003 Epoch: 21 Global Step: 36460 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:20:51,536-Speed 9446.54 samples/sec Loss 1.5235 LearningRate 0.0003 Epoch: 21 Global Step: 36470 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:21:17,703-Speed 9392.17 samples/sec Loss 1.5181 LearningRate 0.0003 Epoch: 21 Global Step: 36480 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:21:43,717-Speed 9447.72 samples/sec Loss 
1.5281 LearningRate 0.0003 Epoch: 21 Global Step: 36490 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:22:09,879-Speed 9394.27 samples/sec Loss 1.5155 LearningRate 0.0003 Epoch: 21 Global Step: 36500 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:22:36,067-Speed 9384.92 samples/sec Loss 1.5200 LearningRate 0.0003 Epoch: 21 Global Step: 36510 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:23:02,245-Speed 9388.35 samples/sec Loss 1.5215 LearningRate 0.0003 Epoch: 21 Global Step: 36520 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:23:28,254-Speed 9449.75 samples/sec Loss 1.5110 LearningRate 0.0003 Epoch: 21 Global Step: 36530 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:23:54,339-Speed 9422.10 samples/sec Loss 1.5290 LearningRate 0.0003 Epoch: 21 Global Step: 36540 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:24:20,376-Speed 9439.40 samples/sec Loss 1.5152 LearningRate 0.0003 Epoch: 21 Global Step: 36550 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:24:46,470-Speed 9418.80 samples/sec Loss 1.5189 LearningRate 0.0003 Epoch: 21 Global Step: 36560 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:25:12,755-Speed 9349.98 samples/sec Loss 1.5196 LearningRate 0.0003 Epoch: 21 Global Step: 36570 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:25:38,832-Speed 9424.92 samples/sec Loss 1.5206 LearningRate 0.0003 Epoch: 21 Global Step: 36580 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:26:04,998-Speed 9392.63 samples/sec Loss 1.5129 LearningRate 0.0003 Epoch: 21 Global Step: 36590 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:26:31,127-Speed 9406.20 samples/sec Loss 1.5303 LearningRate 0.0003 Epoch: 21 Global Step: 36600 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:26:57,243-Speed 9410.47 samples/sec Loss 1.5165 LearningRate 0.0003 Epoch: 21 Global 
Step: 36610 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:27:23,375-Speed 9405.43 samples/sec Loss 1.5079 LearningRate 0.0003 Epoch: 21 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:27:49,522-Speed 9399.80 samples/sec Loss 1.5147 LearningRate 0.0003 Epoch: 21 Global Step: 36630 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:28:15,677-Speed 9396.60 samples/sec Loss 1.5161 LearningRate 0.0003 Epoch: 21 Global Step: 36640 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:28:41,816-Speed 9402.70 samples/sec Loss 1.5150 LearningRate 0.0003 Epoch: 21 Global Step: 36650 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:29:08,025-Speed 9377.16 samples/sec Loss 1.5096 LearningRate 0.0003 Epoch: 21 Global Step: 36660 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:29:34,103-Speed 9424.51 samples/sec Loss 1.5102 LearningRate 0.0003 Epoch: 21 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-05 23:30:00,256-Speed 9397.53 samples/sec Loss 1.5216 LearningRate 0.0003 Epoch: 21 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-05 23:30:26,342-Speed 9421.59 samples/sec Loss 1.5049 LearningRate 0.0003 Epoch: 21 Global Step: 36690 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:30:52,500-Speed 9395.44 samples/sec Loss 1.5145 LearningRate 0.0003 Epoch: 21 Global Step: 36700 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:31:18,706-Speed 9378.56 samples/sec Loss 1.5049 LearningRate 0.0003 Epoch: 21 Global Step: 36710 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:31:44,830-Speed 9407.87 samples/sec Loss 1.5054 LearningRate 0.0003 Epoch: 21 Global Step: 36720 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:32:10,995-Speed 9392.75 samples/sec Loss 1.5062 LearningRate 0.0003 Epoch: 21 Global Step: 36730 Fp16 Grad Scale: 65536 
Required: 24 hours Training: 2022-03-05 23:32:37,105-Speed 9412.91 samples/sec Loss 1.5120 LearningRate 0.0003 Epoch: 21 Global Step: 36740 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:33:03,241-Speed 9403.74 samples/sec Loss 1.5148 LearningRate 0.0003 Epoch: 21 Global Step: 36750 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:33:29,345-Speed 9415.00 samples/sec Loss 1.5149 LearningRate 0.0003 Epoch: 21 Global Step: 36760 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:33:55,506-Speed 9394.54 samples/sec Loss 1.5096 LearningRate 0.0003 Epoch: 21 Global Step: 36770 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:34:21,646-Speed 9402.19 samples/sec Loss 1.5133 LearningRate 0.0003 Epoch: 21 Global Step: 36780 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:34:47,801-Speed 9396.35 samples/sec Loss 1.5036 LearningRate 0.0003 Epoch: 21 Global Step: 36790 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:35:13,884-Speed 9422.79 samples/sec Loss 1.5123 LearningRate 0.0003 Epoch: 21 Global Step: 36800 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:35:39,985-Speed 9416.17 samples/sec Loss 1.5049 LearningRate 0.0003 Epoch: 21 Global Step: 36810 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:36:06,231-Speed 9364.06 samples/sec Loss 1.5071 LearningRate 0.0003 Epoch: 21 Global Step: 36820 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:36:32,352-Speed 9409.05 samples/sec Loss 1.5149 LearningRate 0.0003 Epoch: 21 Global Step: 36830 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:36:58,482-Speed 9405.64 samples/sec Loss 1.5089 LearningRate 0.0003 Epoch: 21 Global Step: 36840 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:37:24,548-Speed 9428.86 samples/sec Loss 1.4923 LearningRate 0.0003 Epoch: 21 Global Step: 36850 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 
23:37:50,669-Speed 9408.66 samples/sec Loss 1.5018 LearningRate 0.0003 Epoch: 21 Global Step: 36860 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:38:16,761-Speed 9419.33 samples/sec Loss 1.4947 LearningRate 0.0003 Epoch: 21 Global Step: 36870 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:38:42,839-Speed 9425.60 samples/sec Loss 1.5055 LearningRate 0.0003 Epoch: 21 Global Step: 36880 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:39:08,949-Speed 9412.60 samples/sec Loss 1.5098 LearningRate 0.0003 Epoch: 21 Global Step: 36890 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:39:35,008-Speed 9431.50 samples/sec Loss 1.4984 LearningRate 0.0003 Epoch: 21 Global Step: 36900 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:40:01,064-Speed 9432.45 samples/sec Loss 1.5077 LearningRate 0.0003 Epoch: 21 Global Step: 36910 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:40:27,211-Speed 9399.31 samples/sec Loss 1.4954 LearningRate 0.0003 Epoch: 21 Global Step: 36920 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:40:53,311-Speed 9416.88 samples/sec Loss 1.5037 LearningRate 0.0003 Epoch: 21 Global Step: 36930 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:41:19,397-Speed 9421.37 samples/sec Loss 1.4990 LearningRate 0.0003 Epoch: 21 Global Step: 36940 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:41:45,490-Speed 9418.86 samples/sec Loss 1.5007 LearningRate 0.0003 Epoch: 21 Global Step: 36950 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:42:11,615-Speed 9407.71 samples/sec Loss 1.4941 LearningRate 0.0003 Epoch: 21 Global Step: 36960 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:42:37,733-Speed 9409.79 samples/sec Loss 1.4914 LearningRate 0.0003 Epoch: 21 Global Step: 36970 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:43:03,908-Speed 9389.47 samples/sec Loss 
1.5083 LearningRate 0.0003 Epoch: 21 Global Step: 36980 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:43:30,008-Speed 9416.54 samples/sec Loss 1.5009 LearningRate 0.0003 Epoch: 21 Global Step: 36990 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:43:56,105-Speed 9417.54 samples/sec Loss 1.5016 LearningRate 0.0003 Epoch: 21 Global Step: 37000 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:44:22,276-Speed 9391.35 samples/sec Loss 1.4959 LearningRate 0.0003 Epoch: 21 Global Step: 37010 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:44:48,409-Speed 9404.75 samples/sec Loss 1.5071 LearningRate 0.0003 Epoch: 21 Global Step: 37020 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:45:14,562-Speed 9397.53 samples/sec Loss 1.4949 LearningRate 0.0003 Epoch: 21 Global Step: 37030 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:45:40,714-Speed 9397.52 samples/sec Loss 1.4936 LearningRate 0.0003 Epoch: 21 Global Step: 37040 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:46:06,869-Speed 9396.70 samples/sec Loss 1.4954 LearningRate 0.0003 Epoch: 21 Global Step: 37050 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:46:33,036-Speed 9393.19 samples/sec Loss 1.4921 LearningRate 0.0003 Epoch: 21 Global Step: 37060 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:46:59,137-Speed 9416.10 samples/sec Loss 1.5016 LearningRate 0.0003 Epoch: 21 Global Step: 37070 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:47:25,261-Speed 9407.86 samples/sec Loss 1.4899 LearningRate 0.0003 Epoch: 21 Global Step: 37080 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:47:51,338-Speed 9424.89 samples/sec Loss 1.4864 LearningRate 0.0003 Epoch: 21 Global Step: 37090 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:48:17,518-Speed 9387.64 samples/sec Loss 1.5062 LearningRate 0.0003 Epoch: 21 Global 
Step: 37100 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:48:43,614-Speed 9419.76 samples/sec Loss 1.4975 LearningRate 0.0003 Epoch: 21 Global Step: 37110 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:49:09,778-Speed 9393.23 samples/sec Loss 1.4924 LearningRate 0.0003 Epoch: 21 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:49:35,960-Speed 9386.89 samples/sec Loss 1.4962 LearningRate 0.0003 Epoch: 21 Global Step: 37130 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:50:02,078-Speed 9410.13 samples/sec Loss 1.4947 LearningRate 0.0003 Epoch: 21 Global Step: 37140 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-05 23:50:28,205-Speed 9407.05 samples/sec Loss 1.4894 LearningRate 0.0003 Epoch: 21 Global Step: 37150 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:50:54,304-Speed 9417.08 samples/sec Loss 1.4884 LearningRate 0.0003 Epoch: 21 Global Step: 37160 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-05 23:51:20,466-Speed 9393.91 samples/sec Loss 1.4809 LearningRate 0.0003 Epoch: 21 Global Step: 37170 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-05 23:51:46,602-Speed 9403.47 samples/sec Loss 1.4806 LearningRate 0.0003 Epoch: 21 Global Step: 37180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-05 23:52:12,782-Speed 9387.99 samples/sec Loss 1.4855 LearningRate 0.0003 Epoch: 21 Global Step: 37190 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-05 23:52:39,003-Speed 9373.42 samples/sec Loss 1.4893 LearningRate 0.0003 Epoch: 21 Global Step: 37200 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-05 23:53:05,127-Speed 9407.68 samples/sec Loss 1.4920 LearningRate 0.0003 Epoch: 21 Global Step: 37210 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-05 23:53:31,258-Speed 9405.28 samples/sec Loss 1.4956 LearningRate 0.0003 Epoch: 21 Global Step: 37220 Fp16 Grad Scale: 32768 
Required: 23 hours Training: 2022-03-05 23:53:57,392-Speed 9404.41 samples/sec Loss 1.4898 LearningRate 0.0003 Epoch: 21 Global Step: 37230 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-05 23:54:23,492-Speed 9416.13 samples/sec Loss 1.4913 LearningRate 0.0003 Epoch: 21 Global Step: 37240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-05 23:54:49,576-Speed 9422.40 samples/sec Loss 1.4952 LearningRate 0.0003 Epoch: 21 Global Step: 37250 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:55:15,708-Speed 9405.22 samples/sec Loss 1.4830 LearningRate 0.0003 Epoch: 21 Global Step: 37260 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:55:41,831-Speed 9408.14 samples/sec Loss 1.4783 LearningRate 0.0003 Epoch: 21 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:56:07,986-Speed 9396.68 samples/sec Loss 1.4815 LearningRate 0.0003 Epoch: 21 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:56:34,177-Speed 9383.56 samples/sec Loss 1.4780 LearningRate 0.0003 Epoch: 21 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:57:00,321-Speed 9400.87 samples/sec Loss 1.4892 LearningRate 0.0003 Epoch: 21 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:57:26,484-Speed 9393.66 samples/sec Loss 1.4807 LearningRate 0.0003 Epoch: 21 Global Step: 37310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:57:52,611-Speed 9406.87 samples/sec Loss 1.4771 LearningRate 0.0003 Epoch: 21 Global Step: 37320 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:58:18,768-Speed 9395.82 samples/sec Loss 1.4828 LearningRate 0.0003 Epoch: 21 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 23:58:44,985-Speed 9374.40 samples/sec Loss 1.4822 LearningRate 0.0003 Epoch: 21 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-05 
23:59:11,184-Speed 9381.05 samples/sec Loss 1.4813 LearningRate 0.0003 Epoch: 21 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-03-05 23:59:37,343-Speed 9395.25 samples/sec Loss 1.4713 LearningRate 0.0003 Epoch: 21 Global Step: 37360 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:00:03,520-Speed 9389.07 samples/sec Loss 1.4845 LearningRate 0.0003 Epoch: 21 Global Step: 37370 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:00:29,728-Speed 9377.66 samples/sec Loss 1.4821 LearningRate 0.0003 Epoch: 21 Global Step: 37380 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:00:55,873-Speed 9400.56 samples/sec Loss 1.4743 LearningRate 0.0003 Epoch: 21 Global Step: 37390 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:01:21,976-Speed 9415.61 samples/sec Loss 1.4789 LearningRate 0.0003 Epoch: 21 Global Step: 37400 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:01:48,124-Speed 9399.43 samples/sec Loss 1.4850 LearningRate 0.0003 Epoch: 21 Global Step: 37410 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:02:14,307-Speed 9386.58 samples/sec Loss 1.4786 LearningRate 0.0003 Epoch: 21 Global Step: 37420 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:02:40,409-Speed 9415.61 samples/sec Loss 1.4758 LearningRate 0.0003 Epoch: 21 Global Step: 37430 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:03:06,549-Speed 9402.33 samples/sec Loss 1.4779 LearningRate 0.0003 Epoch: 21 Global Step: 37440 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:03:32,670-Speed 9409.17 samples/sec Loss 1.4802 LearningRate 0.0003 Epoch: 21 Global Step: 37450 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:03:58,813-Speed 9400.83 samples/sec Loss 1.4748 LearningRate 0.0003 Epoch: 21 Global Step: 37460 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:04:25,017-Speed 9379.58 samples/sec Loss 1.4802 LearningRate 0.0003 Epoch: 21 Global Step: 37470 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:04:51,185-Speed 9392.03 samples/sec Loss 1.4761 LearningRate 0.0003 Epoch: 21 Global Step: 37480 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:05:17,322-Speed 9403.23 samples/sec Loss 1.4722 LearningRate 0.0003 Epoch: 21 Global Step: 37490 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:05:43,397-Speed 9425.61 samples/sec Loss 1.4705 LearningRate 0.0003 Epoch: 21 Global Step: 37500 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:06:09,465-Speed 9428.15 samples/sec Loss 1.4745 LearningRate 0.0003 Epoch: 21 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:06:35,654-Speed 9384.73 samples/sec Loss 1.4614 LearningRate 0.0003 Epoch: 21 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:07:01,733-Speed 9424.21 samples/sec Loss 1.4805 LearningRate 0.0003 Epoch: 21 Global Step: 37530 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:07:27,805-Speed 9426.61 samples/sec Loss 1.4760 LearningRate 0.0003 Epoch: 21 Global Step: 37540 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:07:53,968-Speed 9393.76 samples/sec Loss 1.4727 LearningRate 0.0003 Epoch: 21 Global Step: 37550 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:08:20,067-Speed 9416.83 samples/sec Loss 1.4860 LearningRate 0.0003 Epoch: 21 Global Step: 37560 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:08:46,188-Speed 9408.82 samples/sec Loss 1.4712 LearningRate 0.0003 Epoch: 21 Global Step: 37570 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:09:12,372-Speed 9386.56 samples/sec Loss 1.4750 LearningRate 0.0003 Epoch: 21 Global Step: 37580 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:09:38,533-Speed 9394.56 samples/sec Loss 1.4723 LearningRate 0.0003 Epoch: 21 Global Step: 37590 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:10:04,651-Speed 9410.14 samples/sec Loss 1.4653 LearningRate 0.0003 Epoch: 21 Global Step: 37600 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:10:30,692-Speed 9437.65 samples/sec Loss 1.4673 LearningRate 0.0003 Epoch: 21 Global Step: 37610 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:10:56,776-Speed 9422.31 samples/sec Loss 1.4653 LearningRate 0.0003 Epoch: 21 Global Step: 37620 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:11:22,878-Speed 9415.87 samples/sec Loss 1.4723 LearningRate 0.0003 Epoch: 21 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:11:49,055-Speed 9388.74 samples/sec Loss 1.4627 LearningRate 0.0003 Epoch: 21 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:12:15,216-Speed 9394.42 samples/sec Loss 1.4735 LearningRate 0.0003 Epoch: 21 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:12:41,370-Speed 9397.29 samples/sec Loss 1.4673 LearningRate 0.0003 Epoch: 21 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:13:07,500-Speed 9405.56 samples/sec Loss 1.4686 LearningRate 0.0003 Epoch: 21 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:13:33,645-Speed 9400.49 samples/sec Loss 1.4674 LearningRate 0.0003 Epoch: 21 Global Step: 37680 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:13:59,792-Speed 9399.40 samples/sec Loss 1.4581 LearningRate 0.0003 Epoch: 21 Global Step: 37690 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:14:25,932-Speed 9402.41 samples/sec Loss 1.4682 LearningRate 0.0003 Epoch: 21 Global Step: 37700 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:14:51,978-Speed 9436.24 samples/sec Loss 1.4635 LearningRate 0.0003 Epoch: 21 Global Step: 37710 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:15:18,011-Speed 9440.67 samples/sec Loss 1.4604 LearningRate 0.0003 Epoch: 21 Global Step: 37720 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:15:44,074-Speed 9430.05 samples/sec Loss 1.4699 LearningRate 0.0003 Epoch: 21 Global Step: 37730 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:16:10,197-Speed 9408.26 samples/sec Loss 1.4556 LearningRate 0.0003 Epoch: 21 Global Step: 37740 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:16:36,329-Speed 9404.90 samples/sec Loss 1.4651 LearningRate 0.0003 Epoch: 21 Global Step: 37750 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:17:02,449-Speed 9409.44 samples/sec Loss 1.4555 LearningRate 0.0003 Epoch: 21 Global Step: 37760 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:17:28,525-Speed 9425.36 samples/sec Loss 1.4681 LearningRate 0.0003 Epoch: 21 Global Step: 37770 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:17:54,595-Speed 9427.19 samples/sec Loss 1.4793 LearningRate 0.0003 Epoch: 21 Global Step: 37780 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:18:20,703-Speed 9413.88 samples/sec Loss 1.4866 LearningRate 0.0003 Epoch: 21 Global Step: 37790 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:18:46,820-Speed 9410.36 samples/sec Loss 1.4700 LearningRate 0.0003 Epoch: 21 Global Step: 37800 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:19:13,019-Speed 9381.04 samples/sec Loss 1.4674 LearningRate 0.0003 Epoch: 21 Global Step: 37810 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:19:39,185-Speed 9392.96 samples/sec Loss 1.4641 LearningRate 0.0003 Epoch: 21 Global Step: 37820 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:20:05,273-Speed 9420.92 samples/sec Loss 1.4647 LearningRate 0.0003 Epoch: 21 Global Step: 37830 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:20:31,536-Speed 9358.08 samples/sec Loss 1.4627 LearningRate 0.0003 Epoch: 21 Global Step: 37840 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:20:57,699-Speed 9394.10 samples/sec Loss 1.4567 LearningRate 0.0003 Epoch: 21 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:21:23,816-Speed 9410.37 samples/sec Loss 1.4701 LearningRate 0.0003 Epoch: 21 Global Step: 37860 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:21:49,938-Speed 9408.71 samples/sec Loss 1.4530 LearningRate 0.0003 Epoch: 21 Global Step: 37870 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:22:16,116-Speed 9388.55 samples/sec Loss 1.4692 LearningRate 0.0003 Epoch: 21 Global Step: 37880 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:22:42,192-Speed 9425.25 samples/sec Loss 1.4530 LearningRate 0.0003 Epoch: 21 Global Step: 37890 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:23:08,311-Speed 9409.69 samples/sec Loss 1.4537 LearningRate 0.0003 Epoch: 21 Global Step: 37900 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:23:34,487-Speed 9389.39 samples/sec Loss 1.4608 LearningRate 0.0003 Epoch: 21 Global Step: 37910 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:24:00,653-Speed 9393.06 samples/sec Loss 1.4656 LearningRate 0.0003 Epoch: 21 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:24:26,877-Speed 9372.44 samples/sec Loss 1.4612 LearningRate 0.0003 Epoch: 21 Global Step: 37930 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:24:53,091-Speed 9375.46 samples/sec Loss 1.4818 LearningRate 0.0003 Epoch: 21 Global Step: 37940 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:25:19,173-Speed 9423.16 samples/sec Loss 1.4700 LearningRate 0.0003 Epoch: 21 Global Step: 37950 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:25:45,317-Speed 9400.39 samples/sec Loss 1.4772 LearningRate 0.0003 Epoch: 21 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:26:11,400-Speed 9422.59 samples/sec Loss 1.4609 LearningRate 0.0003 Epoch: 21 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:26:37,522-Speed 9408.62 samples/sec Loss 1.4565 LearningRate 0.0003 Epoch: 21 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-03-06 00:27:03,544-Speed 9444.76 samples/sec Loss 1.4651 LearningRate 0.0003 Epoch: 21 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:27:29,638-Speed 9418.61 samples/sec Loss 1.4725 LearningRate 0.0003 Epoch: 21 Global Step: 38000 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:27:55,748-Speed 9413.20 samples/sec Loss 1.4656 LearningRate 0.0003 Epoch: 21 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:28:21,794-Speed 9435.69 samples/sec Loss 1.4715 LearningRate 0.0002 Epoch: 21 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:29:41,170-Speed 3096.21 samples/sec Loss 1.4485 LearningRate 0.0002 Epoch: 22 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:30:07,129-Speed 9467.86 samples/sec Loss 1.4563 LearningRate 0.0002 Epoch: 22 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:30:33,142-Speed 9447.84 samples/sec Loss 1.4259 LearningRate 0.0002 Epoch: 22 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:30:59,207-Speed 9429.32 samples/sec Loss 1.4398 LearningRate 0.0002 Epoch: 22 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:31:25,346-Speed 9402.33 samples/sec Loss 1.4425 LearningRate 0.0002 Epoch: 22 Global Step: 38070 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:31:51,485-Speed 9402.48 samples/sec Loss 1.4401 LearningRate 0.0002 Epoch: 22 Global Step: 38080 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:32:17,626-Speed 9402.04 samples/sec Loss 1.4328 LearningRate 0.0002 Epoch: 22 Global Step: 38090 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:32:43,766-Speed 9402.04 samples/sec Loss 1.4351 LearningRate 0.0002 Epoch: 22 Global Step: 38100 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:33:09,908-Speed 9401.29 samples/sec Loss 1.4428 LearningRate 0.0002 Epoch: 22 Global Step: 38110 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:33:35,984-Speed 9425.06 samples/sec Loss 1.4325 LearningRate 0.0002 Epoch: 22 Global Step: 38120 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:34:02,171-Speed 9385.24 samples/sec Loss 1.4402 LearningRate 0.0002 Epoch: 22 Global Step: 38130 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:34:28,398-Speed 9370.97 samples/sec Loss 1.4406 LearningRate 0.0002 Epoch: 22 Global Step: 38140 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:34:54,494-Speed 9417.79 samples/sec Loss 1.4409 LearningRate 0.0002 Epoch: 22 Global Step: 38150 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:35:20,608-Speed 9411.28 samples/sec Loss 1.4252 LearningRate 0.0002 Epoch: 22 Global Step: 38160 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:35:46,717-Speed 9413.23 samples/sec Loss 1.4439 LearningRate 0.0002 Epoch: 22 Global Step: 38170 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:36:12,964-Speed 9363.72 samples/sec Loss 1.4329 LearningRate 0.0002 Epoch: 22 Global Step: 38180 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:36:39,038-Speed 9425.92 samples/sec Loss 1.4391 LearningRate 0.0002 Epoch: 22 Global Step: 38190 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:37:05,137-Speed 9416.59 samples/sec Loss 1.4501 LearningRate 0.0002 Epoch: 22 Global Step: 38200 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:37:31,215-Speed 9424.42 samples/sec Loss 1.4410 LearningRate 0.0002 Epoch: 22 Global Step: 38210 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:37:57,346-Speed 9405.36 samples/sec Loss 1.4546 LearningRate 0.0002 Epoch: 22 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:38:23,400-Speed 9433.22 samples/sec Loss 1.4373 LearningRate 0.0002 Epoch: 22 Global Step: 38230 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:38:49,473-Speed 9426.23 samples/sec Loss 1.4407 LearningRate 0.0002 Epoch: 22 Global Step: 38240 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:39:15,580-Speed 9413.94 samples/sec Loss 1.4412 LearningRate 0.0002 Epoch: 22 Global Step: 38250 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:39:41,767-Speed 9385.07 samples/sec Loss 1.4492 LearningRate 0.0002 Epoch: 22 Global Step: 38260 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:40:07,932-Speed 9392.97 samples/sec Loss 1.4442 LearningRate 0.0002 Epoch: 22 Global Step: 38270 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-03-06 00:40:34,023-Speed 9419.96 samples/sec Loss 1.4385 LearningRate 0.0002 Epoch: 22 Global Step: 38280 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-03-06 00:41:00,098-Speed 9425.37 samples/sec Loss 1.4372 LearningRate 0.0002 Epoch: 22 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:41:26,265-Speed 9392.15 samples/sec Loss 1.4248 LearningRate 0.0002 Epoch: 22 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:41:52,429-Speed 9393.62 samples/sec Loss 1.4396 LearningRate 0.0002 Epoch: 22 Global Step: 38310 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:42:18,504-Speed 9425.61 samples/sec Loss 1.4465 LearningRate 0.0002 Epoch: 22 Global Step: 38320 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:42:44,618-Speed 9411.66 samples/sec Loss 1.4451 LearningRate 0.0002 Epoch: 22 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:43:10,730-Speed 9412.00 samples/sec Loss 1.4467 LearningRate 0.0002 Epoch: 22 Global Step: 38340 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:43:36,859-Speed 9406.15 samples/sec Loss 1.4365 LearningRate 0.0002 Epoch: 22 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:44:03,037-Speed 9388.21 samples/sec Loss 1.4378 LearningRate 0.0002 Epoch: 22 Global Step: 38360 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:44:29,223-Speed 9385.73 samples/sec Loss 1.4494 LearningRate 0.0002 Epoch: 22 Global Step: 38370 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:44:55,389-Speed 9392.76 samples/sec Loss 1.4529 LearningRate 0.0002 Epoch: 22 Global Step: 38380 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:45:21,518-Speed 9405.99 samples/sec Loss 1.4417 LearningRate 0.0002 Epoch: 22 Global Step: 38390 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:45:47,649-Speed 9405.28 samples/sec Loss 1.4372 LearningRate 0.0002 Epoch: 22 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:46:13,805-Speed 9396.44 samples/sec Loss 1.4386 LearningRate 0.0002 Epoch: 22 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:46:39,882-Speed 9424.85 samples/sec Loss 1.4450 LearningRate 0.0002 Epoch: 22 Global Step: 38420 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:47:05,989-Speed 9414.84 samples/sec Loss 1.4364 LearningRate 0.0002 Epoch: 22 Global Step: 38430 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:47:32,289-Speed 9344.99 samples/sec Loss 1.4459 LearningRate 0.0002 Epoch: 22 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:47:58,499-Speed 9376.66 samples/sec Loss 1.4376 LearningRate 0.0002 Epoch: 22 Global Step: 38450 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:48:24,740-Speed 9365.99 samples/sec Loss 1.4396 LearningRate 0.0002 Epoch: 22 Global Step: 38460 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:48:51,012-Speed 9355.01 samples/sec Loss 1.4404 LearningRate 0.0002 Epoch: 22 Global Step: 38470 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:49:17,233-Speed 9373.03 samples/sec Loss 1.4390 LearningRate 0.0002 Epoch: 22 Global Step: 38480 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-06 00:49:43,384-Speed 9398.15 samples/sec Loss 1.4417 LearningRate 0.0002 Epoch: 22 Global Step: 38490 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:50:09,600-Speed 9374.85 samples/sec Loss 1.4343 LearningRate 0.0002 Epoch: 22 Global Step: 38500 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:50:35,783-Speed 9386.43 samples/sec Loss 1.4312 LearningRate 0.0002 Epoch: 22 Global Step: 38510 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:51:01,906-Speed 9408.57 samples/sec Loss 1.4385 LearningRate 0.0002 Epoch: 22 Global Step: 38520 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-03-06 00:51:28,199-Speed 9347.42 samples/sec Loss 1.4279 LearningRate 0.0002 Epoch: 22 Global Step: 38530 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:51:54,357-Speed 9395.42 samples/sec Loss 1.4259 LearningRate 0.0002 Epoch: 22 Global Step: 38540 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:52:20,522-Speed 9392.86 samples/sec Loss 1.4341 LearningRate 0.0002 Epoch: 22 Global Step: 38550 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:52:46,717-Speed 9382.73 samples/sec Loss 1.4340 LearningRate 0.0002 Epoch: 22 Global Step: 38560 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:53:12,895-Speed 9388.59 samples/sec Loss 1.4410 LearningRate 0.0002 Epoch: 22 Global Step: 38570 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:53:39,080-Speed 9386.16 samples/sec Loss 1.4416 LearningRate 0.0002 Epoch: 22 Global Step: 38580 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:54:05,265-Speed 9385.60 samples/sec Loss 1.4297 LearningRate 0.0002 Epoch: 22 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:54:31,434-Speed 9391.59 samples/sec Loss 1.4257 LearningRate 0.0002 Epoch: 22 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:54:57,577-Speed 9401.03 samples/sec Loss 1.4295 LearningRate 0.0002 Epoch: 22 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:55:23,734-Speed 9396.30 samples/sec Loss 1.4360 LearningRate 0.0002 Epoch: 22 Global Step: 38620 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:55:49,902-Speed 9392.17 samples/sec Loss 1.4220 LearningRate 0.0002 Epoch: 22 Global Step: 38630 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:56:16,156-Speed 9361.20 samples/sec Loss 1.4246 LearningRate 0.0002 Epoch: 22 Global Step: 38640 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:56:42,447-Speed 9348.27 samples/sec Loss 1.4288 LearningRate 0.0002 Epoch: 22 Global Step: 38650 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:57:08,774-Speed 9335.18 samples/sec Loss 1.4329 LearningRate 0.0002 Epoch: 22 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:57:34,949-Speed 9389.58 samples/sec Loss 1.4228 LearningRate 0.0002 Epoch: 22 Global Step: 38670 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 00:58:01,061-Speed 9412.19 samples/sec Loss 1.4259 LearningRate 0.0002 Epoch: 22 Global Step: 38680 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:58:27,217-Speed 9396.34 samples/sec Loss 1.4424 LearningRate 0.0002 Epoch: 22 Global Step: 38690 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:58:53,441-Speed 9372.14 samples/sec Loss 1.4189 LearningRate 0.0002 Epoch: 22 Global Step: 38700 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:59:19,579-Speed 9402.73 samples/sec Loss 1.4218 LearningRate 0.0002 Epoch: 22 Global Step: 38710 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 00:59:45,726-Speed 9399.69 samples/sec Loss 1.4289 LearningRate 0.0002 Epoch: 22 Global Step: 38720 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:00:11,889-Speed 9393.90 samples/sec Loss 1.4242 LearningRate 0.0002 Epoch: 22 Global Step: 38730 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:00:38,064-Speed 9389.50 samples/sec Loss 1.4275 LearningRate 0.0002 Epoch: 22 Global Step: 38740 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:01:04,323-Speed 9359.32 samples/sec Loss 1.4168 LearningRate 0.0002 Epoch: 22 Global Step: 38750 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:01:30,580-Speed 9360.24 samples/sec Loss 1.4156 LearningRate 0.0002 Epoch: 22 Global Step: 38760 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:01:56,777-Speed 9381.67 samples/sec Loss 1.4314 LearningRate 0.0002 Epoch: 22 Global Step: 38770 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:02:22,961-Speed 9386.58 samples/sec Loss 1.4104 LearningRate 0.0002 Epoch: 22 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:02:49,172-Speed 9376.62 samples/sec Loss 1.4223 LearningRate 0.0002 Epoch: 22 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:03:15,431-Speed 9359.54 samples/sec Loss 1.4222 LearningRate 0.0002 Epoch: 22 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:03:41,610-Speed 9387.88 samples/sec Loss 1.4184 LearningRate 0.0002 Epoch: 22 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:04:07,819-Speed 9377.49 samples/sec Loss 1.4317 LearningRate 0.0002 Epoch: 22 Global Step: 38820 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:04:34,068-Speed 9362.89 samples/sec Loss 1.4269 LearningRate 0.0002 Epoch: 22 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:05:00,238-Speed 9391.48 samples/sec Loss 1.4198 LearningRate 0.0002 Epoch: 22 Global Step: 38840 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:05:26,435-Speed 9381.39 samples/sec Loss 1.4256 LearningRate 0.0002 Epoch: 22 Global Step: 38850 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:05:52,635-Speed 9380.69 samples/sec Loss 1.4199 LearningRate 0.0002 Epoch: 22 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:06:18,913-Speed 9352.49 samples/sec Loss 1.4144 LearningRate 0.0002 Epoch: 22 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:06:45,062-Speed 9399.26 samples/sec Loss 1.4258 LearningRate 0.0002 Epoch: 22 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-03-06 01:07:11,251-Speed 9384.20 samples/sec Loss 1.4167 LearningRate 0.0002 Epoch: 22 Global Step: 38890 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-03-06 01:07:37,392-Speed 9401.94 samples/sec Loss 1.4196 LearningRate 0.0002 Epoch: 22 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:08:03,587-Speed 9382.22 samples/sec Loss 1.4115 LearningRate 0.0002 Epoch: 22 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:08:29,829-Speed 9365.52 samples/sec Loss 1.4008 LearningRate 0.0002 Epoch: 22 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:08:56,034-Speed 9378.97 samples/sec Loss 1.4145 LearningRate 0.0002 Epoch: 22 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:09:22,217-Speed 9386.59 samples/sec Loss 1.4195 LearningRate 0.0002 Epoch: 22 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:09:48,353-Speed 9403.59 samples/sec Loss 1.4142 LearningRate 0.0002 Epoch: 22 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:10:14,487-Speed 9404.00 samples/sec Loss 1.4169 LearningRate 0.0002 Epoch: 22 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:10:40,684-Speed 9381.72 samples/sec Loss 1.4106 LearningRate 0.0002 Epoch: 22 Global Step: 38970 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:11:06,860-Speed 9389.49 samples/sec Loss 1.4095 LearningRate 0.0002 Epoch: 22 Global Step: 38980 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:11:32,962-Speed 9415.56 samples/sec Loss 1.4102 LearningRate 0.0002 Epoch: 22 Global Step: 38990 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:11:59,234-Speed 9354.76 samples/sec Loss 1.4058 LearningRate 0.0002 Epoch: 22 Global Step: 39000 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:12:25,326-Speed 9419.22 samples/sec Loss 1.4131 LearningRate 0.0002 Epoch: 22 Global Step: 39010 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:12:51,519-Speed 9383.10 samples/sec Loss 1.4081 LearningRate 0.0002 Epoch: 22 Global Step: 39020 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:13:17,739-Speed 9373.37 samples/sec Loss 1.4037 LearningRate 0.0002 Epoch: 22 Global Step: 39030 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:13:43,876-Speed 9403.00 samples/sec Loss 1.4080 LearningRate 0.0002 Epoch: 22 Global Step: 39040 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:14:10,026-Speed 9398.77 samples/sec Loss 1.4141 LearningRate 0.0002 Epoch: 22 Global Step: 39050 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:14:36,160-Speed 9404.01 samples/sec Loss 1.4085 LearningRate 0.0002 Epoch: 22 Global Step: 39060 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:15:02,324-Speed 9393.56 samples/sec Loss 1.4068 LearningRate 0.0002 Epoch: 22 Global Step: 39070 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:15:28,395-Speed 9427.24 samples/sec Loss 1.4111 LearningRate 0.0002 Epoch: 22 Global Step: 39080 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:15:54,563-Speed 9392.00 samples/sec Loss 1.4102 LearningRate 0.0002 Epoch: 22 Global Step: 39090 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:16:20,627-Speed 9429.55 samples/sec Loss 1.4155 LearningRate 0.0002 Epoch: 22 Global Step: 39100 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:16:46,739-Speed 9412.25 samples/sec Loss 1.4132 LearningRate 0.0002 Epoch: 22 Global Step: 39110 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:17:12,851-Speed 9412.55 samples/sec Loss 1.4090 LearningRate 0.0002 Epoch: 22 Global Step: 39120 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:17:38,917-Speed 9428.57 samples/sec Loss 1.4065 LearningRate 0.0002 Epoch: 22 Global Step: 39130 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:18:05,004-Speed 9421.25 samples/sec Loss 1.4107 LearningRate 0.0002 Epoch: 22 Global Step: 39140 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:18:31,153-Speed 9398.86 samples/sec Loss 1.4147 LearningRate 0.0002 Epoch: 22 Global Step: 39150 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:18:57,342-Speed 9384.48 samples/sec Loss 1.4082 LearningRate 0.0002 Epoch: 22 Global Step: 39160 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:19:23,516-Speed 9389.82 samples/sec Loss 1.4015 LearningRate 0.0002 Epoch: 22 Global Step: 39170 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:19:49,708-Speed 9383.43 samples/sec Loss 1.4138 LearningRate 0.0002 Epoch: 22 Global Step: 39180 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:20:15,796-Speed 9420.89 samples/sec Loss 1.4101 LearningRate 0.0002 Epoch: 22 Global Step: 39190 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:20:42,044-Speed 9363.38 samples/sec Loss 1.4053 LearningRate 0.0002 Epoch: 22 Global Step: 39200 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:21:08,202-Speed 9395.79 samples/sec Loss 1.4042 LearningRate 0.0002 Epoch: 22 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:21:34,308-Speed 9414.53 samples/sec Loss 1.3971 LearningRate 0.0002 Epoch: 22 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:22:00,408-Speed 9416.63 samples/sec Loss 1.3892 LearningRate 0.0002 Epoch: 22 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:22:26,558-Speed 9398.41 samples/sec Loss 1.3965 LearningRate 0.0002 Epoch: 22 Global Step: 39240 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:22:52,711-Speed 9397.52 samples/sec Loss 1.4148 LearningRate 0.0002 Epoch: 22 Global Step: 39250 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:23:18,874-Speed 9393.95 samples/sec Loss 1.4087 LearningRate 0.0002 Epoch: 22 Global Step: 39260 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:23:44,985-Speed 9412.75 samples/sec Loss 1.4046 LearningRate 0.0002 Epoch: 22 Global Step: 39270 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:24:11,040-Speed 9432.64 samples/sec Loss 1.3975 LearningRate 0.0002 Epoch: 22 Global Step: 39280 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:24:37,191-Speed 9398.32 samples/sec Loss 1.4053 LearningRate 0.0002 Epoch: 22 Global Step: 39290 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:25:03,333-Speed 9401.24 samples/sec Loss 1.3981 LearningRate 0.0002 Epoch: 22 Global Step: 39300 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:25:29,486-Speed 9397.52 samples/sec Loss 1.4066 LearningRate 0.0002 Epoch: 22 Global Step: 39310 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:25:55,659-Speed 9390.53 samples/sec Loss 1.4038 LearningRate 0.0002 Epoch: 22 Global Step: 39320 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:26:21,804-Speed 9399.90 samples/sec Loss 1.4020 LearningRate 0.0002 Epoch: 22 Global Step: 39330 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:26:47,940-Speed 9403.74 samples/sec Loss 1.3973 LearningRate 0.0002 Epoch: 22 Global Step: 39340 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:27:14,044-Speed 9414.94 samples/sec Loss 1.3914 LearningRate 0.0002 Epoch: 22 Global Step: 39350 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:27:40,190-Speed 9399.90 samples/sec Loss 1.3985 LearningRate 0.0002 Epoch: 22 Global Step: 39360 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:28:06,387-Speed 9381.63 samples/sec Loss 1.3864 LearningRate 0.0002 Epoch: 22 Global Step: 39370 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:28:32,545-Speed 9395.55 samples/sec Loss 1.4036 LearningRate 0.0002 Epoch: 22 Global Step: 39380 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:28:58,649-Speed 9414.67 samples/sec Loss 1.3955 LearningRate 0.0002 Epoch: 22 Global Step: 39390 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:29:24,833-Speed 9386.51 samples/sec Loss 1.3987 LearningRate 0.0002 Epoch: 22 Global Step: 39400 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:29:50,941-Speed 9413.60 samples/sec Loss 1.3908 LearningRate 0.0002 Epoch: 22 Global Step: 39410 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:30:17,150-Speed 9377.60 samples/sec Loss 1.4047 LearningRate 0.0002 Epoch: 22 Global Step: 39420 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:30:43,263-Speed 9411.84 samples/sec Loss 1.3919 LearningRate 0.0002 Epoch: 22 Global Step: 39430 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:31:09,368-Speed 9414.47 samples/sec Loss 1.4084 LearningRate 0.0002 Epoch: 22 Global Step: 39440 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:31:35,459-Speed 9420.11 samples/sec Loss 1.4009 LearningRate 0.0002 Epoch: 22 Global Step: 39450 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:32:01,631-Speed 9390.47 samples/sec Loss 1.3944 LearningRate 0.0002 Epoch: 22 Global Step: 39460 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:32:27,701-Speed 9427.32 samples/sec Loss 1.3954 LearningRate 0.0002 Epoch: 22 Global Step: 39470 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-03-06 01:32:53,833-Speed 9404.85 samples/sec Loss 1.3898 LearningRate 0.0002 Epoch: 22 Global Step: 39480 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:33:19,978-Speed 9400.28 samples/sec Loss 1.3884 LearningRate 0.0002 Epoch: 22 Global Step: 39490 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:33:46,037-Speed 9431.36 samples/sec Loss 1.3948 LearningRate 0.0002 Epoch: 22 Global Step: 39500 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:34:12,092-Speed 9433.03 samples/sec Loss 1.3921 LearningRate 0.0002 Epoch: 22 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:34:38,158-Speed 9428.56 samples/sec Loss 1.3835 LearningRate 0.0002 Epoch: 22 Global Step: 39520 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:35:04,351-Speed 9383.08 samples/sec Loss 1.3964 LearningRate 0.0002 Epoch: 22 Global Step: 39530 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:35:30,391-Speed 9438.12 samples/sec Loss 1.3902 LearningRate 0.0002 Epoch: 22 Global Step: 39540 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:35:56,512-Speed 9409.03 samples/sec Loss 1.4015 LearningRate 0.0002 Epoch: 22 Global Step: 39550 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:36:22,625-Speed 9411.92 samples/sec Loss 1.3860 LearningRate 0.0002 Epoch: 22 Global Step: 39560 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:36:48,684-Speed 9431.02 samples/sec Loss 1.3799 LearningRate 0.0002 Epoch: 22 Global Step: 39570 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:37:14,770-Speed 9421.87 samples/sec Loss 1.3946 LearningRate 0.0002 Epoch: 22 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-03-06 01:37:40,891-Speed 9408.84 samples/sec Loss 1.3847 LearningRate 0.0002 Epoch: 22 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:38:07,048-Speed 9395.77 samples/sec Loss 1.3872 LearningRate 0.0002 Epoch: 22 Global Step: 39600 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:38:35,682-Speed 8583.31 samples/sec Loss 1.3906 LearningRate 0.0002 Epoch: 22 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:39:01,775-Speed 9419.10 samples/sec Loss 1.3766 LearningRate 0.0002 Epoch: 22 Global Step: 39620 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:39:27,830-Speed 9432.43 samples/sec Loss 1.3839 LearningRate 0.0002 Epoch: 22 Global Step: 39630 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:39:53,990-Speed 9395.41 samples/sec Loss 1.4102 LearningRate 0.0002 Epoch: 22 Global Step: 39640 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:40:20,073-Speed 9422.45 samples/sec Loss 1.3976 LearningRate 0.0002 Epoch: 22 Global Step: 39650 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:40:46,222-Speed 9399.20 samples/sec Loss 1.3925 LearningRate 0.0002 Epoch: 22 Global Step: 39660 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:41:12,393-Speed 9391.28 samples/sec Loss 1.3927 LearningRate 0.0002 Epoch: 22 Global Step: 39670 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:41:38,500-Speed 9413.90 samples/sec Loss 1.3874 LearningRate 0.0002 Epoch: 22 Global Step: 39680 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:42:04,559-Speed 9431.38 samples/sec Loss 1.3896 LearningRate 0.0002 Epoch: 22 Global Step: 39690 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:42:30,628-Speed 9427.60 samples/sec Loss 1.4071 LearningRate 0.0002 Epoch: 22 Global Step: 39700 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:42:56,851-Speed 9372.40 samples/sec Loss 1.4086 LearningRate 0.0002 Epoch: 22 Global Step: 39710 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:43:22,976-Speed 9407.99 samples/sec Loss 1.4044 LearningRate 0.0002 Epoch: 22 Global Step: 39720 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:43:49,146-Speed 9391.61 samples/sec Loss 1.4069 LearningRate 0.0002 Epoch: 22 Global Step: 39730 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:44:15,324-Speed 9388.35 samples/sec Loss 1.4100 LearningRate 0.0002 Epoch: 22 Global Step: 39740 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:44:41,404-Speed 9423.73 samples/sec Loss 1.3949 LearningRate 0.0002 Epoch: 22 Global Step: 39750 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:46:00,627-Speed 3102.20 samples/sec Loss 1.3827 LearningRate 0.0002 Epoch: 23 Global Step: 39760 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:46:26,686-Speed 9431.49 samples/sec Loss 1.3677 LearningRate 0.0002 Epoch: 23 Global Step: 39770 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:46:52,677-Speed 9455.91 samples/sec Loss 1.3738 LearningRate 0.0002 Epoch: 23 Global Step: 39780 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:47:18,673-Speed 9454.03 samples/sec Loss 1.3620 LearningRate 0.0002 Epoch: 23 Global Step: 39790 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:47:44,635-Speed 9467.10 samples/sec Loss 1.3764 LearningRate 0.0002 Epoch: 23 Global Step: 39800 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:48:10,596-Speed 9467.04 samples/sec Loss 1.3657 LearningRate 0.0002 Epoch: 23 Global Step: 39810 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:48:36,551-Speed 9469.85 samples/sec Loss 1.3698 LearningRate 0.0002 Epoch: 23 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-03-06 01:49:02,499-Speed 9471.53 samples/sec Loss 1.3760 LearningRate 0.0002 Epoch: 23 Global Step: 39830 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:49:28,647-Speed 9399.43 samples/sec Loss 1.3731 LearningRate 0.0002 Epoch: 23 Global Step: 39840 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:49:54,721-Speed 9425.97 samples/sec Loss 1.3694 LearningRate 0.0002 Epoch: 23 Global Step: 39850 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:50:20,758-Speed 9439.11 samples/sec Loss 1.3662 LearningRate 0.0002 Epoch: 23 Global Step: 39860 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:50:46,852-Speed 9418.78 samples/sec Loss 1.3665 LearningRate 0.0002 Epoch: 23 Global Step: 39870 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-06 01:51:12,920-Speed 9428.25 samples/sec Loss 1.3692 LearningRate 0.0002 Epoch: 23 Global Step: 39880 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:51:39,049-Speed 9405.81 samples/sec Loss 1.3709 LearningRate 0.0002 Epoch: 23 Global Step: 39890 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-03-06 01:52:05,120-Speed 9426.82 samples/sec Loss 1.3700 LearningRate 0.0002 Epoch: 23 Global Step: 39900 Fp16 Grad Scale: 32768 Required: 21 hours
Training: 2022-03-06 01:52:31,178-Speed 9431.74 samples/sec Loss 1.3669 LearningRate 0.0002 Epoch: 23 Global Step: 39910 Fp16 Grad Scale: 32768 Required: 21 hours
Training: 2022-03-06 01:52:57,289-Speed 9412.77 samples/sec
Loss 1.3691 LearningRate 0.0002 Epoch: 23 Global Step: 39920 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 01:53:23,362-Speed 9426.39 samples/sec Loss 1.3757 LearningRate 0.0002 Epoch: 23 Global Step: 39930 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 01:53:49,544-Speed 9386.83 samples/sec Loss 1.3692 LearningRate 0.0002 Epoch: 23 Global Step: 39940 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 01:54:15,688-Speed 9400.72 samples/sec Loss 1.3681 LearningRate 0.0002 Epoch: 23 Global Step: 39950 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 01:54:41,847-Speed 9395.01 samples/sec Loss 1.3731 LearningRate 0.0002 Epoch: 23 Global Step: 39960 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 01:55:07,906-Speed 9431.49 samples/sec Loss 1.3701 LearningRate 0.0002 Epoch: 23 Global Step: 39970 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 01:55:34,014-Speed 9413.47 samples/sec Loss 1.3692 LearningRate 0.0002 Epoch: 23 Global Step: 39980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:56:00,177-Speed 9393.81 samples/sec Loss 1.3726 LearningRate 0.0002 Epoch: 23 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:56:26,299-Speed 9408.61 samples/sec Loss 1.3657 LearningRate 0.0002 Epoch: 23 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:56:52,445-Speed 9399.80 samples/sec Loss 1.3718 LearningRate 0.0002 Epoch: 23 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:57:18,636-Speed 9383.72 samples/sec Loss 1.3658 LearningRate 0.0002 Epoch: 23 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:57:44,798-Speed 9394.32 samples/sec Loss 1.3605 LearningRate 0.0002 Epoch: 23 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:58:10,922-Speed 9407.75 samples/sec Loss 1.3633 LearningRate 0.0002 Epoch: 23 
Global Step: 40040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:58:37,060-Speed 9402.69 samples/sec Loss 1.3708 LearningRate 0.0002 Epoch: 23 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:59:03,329-Speed 9356.10 samples/sec Loss 1.3820 LearningRate 0.0002 Epoch: 23 Global Step: 40060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:59:29,424-Speed 9418.53 samples/sec Loss 1.3733 LearningRate 0.0002 Epoch: 23 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 01:59:55,575-Speed 9398.08 samples/sec Loss 1.3617 LearningRate 0.0002 Epoch: 23 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:00:21,668-Speed 9419.06 samples/sec Loss 1.3621 LearningRate 0.0002 Epoch: 23 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:00:47,761-Speed 9419.17 samples/sec Loss 1.3751 LearningRate 0.0002 Epoch: 23 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:01:13,830-Speed 9427.73 samples/sec Loss 1.3682 LearningRate 0.0002 Epoch: 23 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:01:39,929-Speed 9417.21 samples/sec Loss 1.3632 LearningRate 0.0002 Epoch: 23 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:02:06,016-Speed 9421.32 samples/sec Loss 1.3741 LearningRate 0.0002 Epoch: 23 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:02:32,209-Speed 9383.08 samples/sec Loss 1.3598 LearningRate 0.0002 Epoch: 23 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:02:58,309-Speed 9416.61 samples/sec Loss 1.3683 LearningRate 0.0002 Epoch: 23 Global Step: 40150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:03:24,513-Speed 9379.75 samples/sec Loss 1.3769 LearningRate 0.0002 Epoch: 23 Global Step: 40160 Fp16 Grad Scale: 
65536 Required: 21 hours Training: 2022-03-06 02:03:50,629-Speed 9410.63 samples/sec Loss 1.3673 LearningRate 0.0002 Epoch: 23 Global Step: 40170 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:04:16,702-Speed 9426.78 samples/sec Loss 1.3597 LearningRate 0.0002 Epoch: 23 Global Step: 40180 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:04:42,839-Speed 9403.36 samples/sec Loss 1.3759 LearningRate 0.0002 Epoch: 23 Global Step: 40190 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:05:08,934-Speed 9418.34 samples/sec Loss 1.3646 LearningRate 0.0002 Epoch: 23 Global Step: 40200 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:05:35,006-Speed 9426.86 samples/sec Loss 1.3716 LearningRate 0.0002 Epoch: 23 Global Step: 40210 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:06:01,206-Speed 9380.31 samples/sec Loss 1.3712 LearningRate 0.0002 Epoch: 23 Global Step: 40220 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:06:27,421-Speed 9375.06 samples/sec Loss 1.3611 LearningRate 0.0002 Epoch: 23 Global Step: 40230 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:06:53,572-Speed 9398.49 samples/sec Loss 1.3594 LearningRate 0.0002 Epoch: 23 Global Step: 40240 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:07:19,734-Speed 9394.21 samples/sec Loss 1.3562 LearningRate 0.0002 Epoch: 23 Global Step: 40250 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:07:45,810-Speed 9425.45 samples/sec Loss 1.3656 LearningRate 0.0002 Epoch: 23 Global Step: 40260 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:08:11,888-Speed 9424.44 samples/sec Loss 1.3598 LearningRate 0.0002 Epoch: 23 Global Step: 40270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:08:38,038-Speed 9398.21 samples/sec Loss 1.3613 LearningRate 0.0002 Epoch: 23 Global Step: 40280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-03-06 02:09:04,155-Speed 9410.68 samples/sec Loss 1.3564 LearningRate 0.0002 Epoch: 23 Global Step: 40290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:09:30,262-Speed 9413.82 samples/sec Loss 1.3576 LearningRate 0.0002 Epoch: 23 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:09:56,443-Speed 9387.56 samples/sec Loss 1.3528 LearningRate 0.0002 Epoch: 23 Global Step: 40310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:10:22,537-Speed 9418.67 samples/sec Loss 1.3554 LearningRate 0.0002 Epoch: 23 Global Step: 40320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:10:48,671-Speed 9404.09 samples/sec Loss 1.3591 LearningRate 0.0002 Epoch: 23 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:11:14,731-Speed 9431.14 samples/sec Loss 1.3611 LearningRate 0.0002 Epoch: 23 Global Step: 40340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:11:40,835-Speed 9414.99 samples/sec Loss 1.3639 LearningRate 0.0002 Epoch: 23 Global Step: 40350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:12:06,911-Speed 9425.32 samples/sec Loss 1.3503 LearningRate 0.0002 Epoch: 23 Global Step: 40360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:12:33,017-Speed 9414.06 samples/sec Loss 1.3523 LearningRate 0.0002 Epoch: 23 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:12:59,273-Speed 9360.37 samples/sec Loss 1.3562 LearningRate 0.0002 Epoch: 23 Global Step: 40380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:13:25,436-Speed 9394.19 samples/sec Loss 1.3681 LearningRate 0.0002 Epoch: 23 Global Step: 40390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:13:51,527-Speed 9419.57 samples/sec Loss 1.3627 LearningRate 0.0002 Epoch: 23 Global Step: 40400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:14:17,718-Speed 9383.86 
samples/sec Loss 1.3656 LearningRate 0.0002 Epoch: 23 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:14:43,873-Speed 9396.63 samples/sec Loss 1.3577 LearningRate 0.0002 Epoch: 23 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:15:10,004-Speed 9405.03 samples/sec Loss 1.3662 LearningRate 0.0002 Epoch: 23 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:15:36,194-Speed 9384.31 samples/sec Loss 1.3584 LearningRate 0.0002 Epoch: 23 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:16:02,392-Speed 9381.78 samples/sec Loss 1.3507 LearningRate 0.0002 Epoch: 23 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:16:28,529-Speed 9402.91 samples/sec Loss 1.3672 LearningRate 0.0002 Epoch: 23 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:16:54,643-Speed 9411.37 samples/sec Loss 1.3610 LearningRate 0.0002 Epoch: 23 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:17:20,798-Speed 9396.73 samples/sec Loss 1.3578 LearningRate 0.0002 Epoch: 23 Global Step: 40480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:17:46,896-Speed 9417.07 samples/sec Loss 1.3500 LearningRate 0.0002 Epoch: 23 Global Step: 40490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:18:13,088-Speed 9383.81 samples/sec Loss 1.3483 LearningRate 0.0002 Epoch: 23 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:18:39,199-Speed 9412.28 samples/sec Loss 1.3476 LearningRate 0.0002 Epoch: 23 Global Step: 40510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:19:05,344-Speed 9400.51 samples/sec Loss 1.3520 LearningRate 0.0002 Epoch: 23 Global Step: 40520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:19:31,484-Speed 9402.10 samples/sec Loss 1.3434 LearningRate 
0.0002 Epoch: 23 Global Step: 40530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:19:57,656-Speed 9390.31 samples/sec Loss 1.3543 LearningRate 0.0002 Epoch: 23 Global Step: 40540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:20:23,832-Speed 9389.57 samples/sec Loss 1.3558 LearningRate 0.0002 Epoch: 23 Global Step: 40550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:20:49,938-Speed 9414.18 samples/sec Loss 1.3499 LearningRate 0.0002 Epoch: 23 Global Step: 40560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:21:16,016-Speed 9424.36 samples/sec Loss 1.3483 LearningRate 0.0002 Epoch: 23 Global Step: 40570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:21:42,123-Speed 9413.81 samples/sec Loss 1.3524 LearningRate 0.0002 Epoch: 23 Global Step: 40580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:22:08,288-Speed 9393.22 samples/sec Loss 1.3440 LearningRate 0.0002 Epoch: 23 Global Step: 40590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:22:34,400-Speed 9412.12 samples/sec Loss 1.3524 LearningRate 0.0002 Epoch: 23 Global Step: 40600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:23:00,566-Speed 9392.75 samples/sec Loss 1.3529 LearningRate 0.0002 Epoch: 23 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:23:26,643-Speed 9424.57 samples/sec Loss 1.3419 LearningRate 0.0002 Epoch: 23 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:23:52,849-Speed 9378.43 samples/sec Loss 1.3467 LearningRate 0.0002 Epoch: 23 Global Step: 40630 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:24:19,055-Speed 9378.58 samples/sec Loss 1.3329 LearningRate 0.0002 Epoch: 23 Global Step: 40640 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:24:45,159-Speed 9415.66 samples/sec Loss 1.3426 LearningRate 0.0002 Epoch: 23 Global Step: 40650 Fp16 
Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:25:11,440-Speed 9351.72 samples/sec Loss 1.3401 LearningRate 0.0002 Epoch: 23 Global Step: 40660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:25:37,608-Speed 9391.97 samples/sec Loss 1.3346 LearningRate 0.0002 Epoch: 23 Global Step: 40670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:26:03,779-Speed 9391.13 samples/sec Loss 1.3420 LearningRate 0.0002 Epoch: 23 Global Step: 40680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:26:29,964-Speed 9386.32 samples/sec Loss 1.3328 LearningRate 0.0002 Epoch: 23 Global Step: 40690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:26:56,180-Speed 9374.75 samples/sec Loss 1.3460 LearningRate 0.0002 Epoch: 23 Global Step: 40700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:27:22,281-Speed 9416.24 samples/sec Loss 1.3448 LearningRate 0.0002 Epoch: 23 Global Step: 40710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:27:48,593-Speed 9340.84 samples/sec Loss 1.3385 LearningRate 0.0002 Epoch: 23 Global Step: 40720 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-06 02:28:14,859-Speed 9356.92 samples/sec Loss 1.3442 LearningRate 0.0002 Epoch: 23 Global Step: 40730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:28:41,013-Speed 9397.72 samples/sec Loss 1.3452 LearningRate 0.0002 Epoch: 23 Global Step: 40740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:29:07,194-Speed 9387.54 samples/sec Loss 1.3410 LearningRate 0.0002 Epoch: 23 Global Step: 40750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:29:33,306-Speed 9412.26 samples/sec Loss 1.3421 LearningRate 0.0002 Epoch: 23 Global Step: 40760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:29:59,489-Speed 9386.28 samples/sec Loss 1.3536 LearningRate 0.0002 Epoch: 23 Global Step: 40770 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-03-06 02:30:25,634-Speed 9400.42 samples/sec Loss 1.3438 LearningRate 0.0002 Epoch: 23 Global Step: 40780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:30:51,828-Speed 9382.77 samples/sec Loss 1.3466 LearningRate 0.0002 Epoch: 23 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:31:18,036-Speed 9377.48 samples/sec Loss 1.3458 LearningRate 0.0002 Epoch: 23 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:31:44,225-Speed 9384.51 samples/sec Loss 1.3440 LearningRate 0.0002 Epoch: 23 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:32:10,472-Speed 9363.80 samples/sec Loss 1.3350 LearningRate 0.0002 Epoch: 23 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:32:36,642-Speed 9396.29 samples/sec Loss 1.3343 LearningRate 0.0002 Epoch: 23 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:33:02,795-Speed 9397.45 samples/sec Loss 1.3392 LearningRate 0.0002 Epoch: 23 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:33:28,861-Speed 9428.68 samples/sec Loss 1.3431 LearningRate 0.0002 Epoch: 23 Global Step: 40850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:33:54,995-Speed 9404.35 samples/sec Loss 1.3374 LearningRate 0.0002 Epoch: 23 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:34:21,153-Speed 9395.61 samples/sec Loss 1.3350 LearningRate 0.0002 Epoch: 23 Global Step: 40870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:34:47,290-Speed 9403.50 samples/sec Loss 1.3293 LearningRate 0.0002 Epoch: 23 Global Step: 40880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:35:13,442-Speed 9397.40 samples/sec Loss 1.3431 LearningRate 0.0002 Epoch: 23 Global Step: 40890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:35:39,650-Speed 
9377.85 samples/sec Loss 1.3438 LearningRate 0.0002 Epoch: 23 Global Step: 40900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:36:05,789-Speed 9402.19 samples/sec Loss 1.3330 LearningRate 0.0002 Epoch: 23 Global Step: 40910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:36:31,913-Speed 9407.84 samples/sec Loss 1.3390 LearningRate 0.0002 Epoch: 23 Global Step: 40920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:36:58,059-Speed 9399.82 samples/sec Loss 1.3373 LearningRate 0.0002 Epoch: 23 Global Step: 40930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:37:24,160-Speed 9416.36 samples/sec Loss 1.3368 LearningRate 0.0002 Epoch: 23 Global Step: 40940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:37:50,304-Speed 9400.19 samples/sec Loss 1.3382 LearningRate 0.0002 Epoch: 23 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:38:16,397-Speed 9419.11 samples/sec Loss 1.3333 LearningRate 0.0002 Epoch: 23 Global Step: 40960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:38:42,540-Speed 9401.08 samples/sec Loss 1.3395 LearningRate 0.0002 Epoch: 23 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:39:08,700-Speed 9394.87 samples/sec Loss 1.3358 LearningRate 0.0002 Epoch: 23 Global Step: 40980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:39:34,847-Speed 9399.57 samples/sec Loss 1.3380 LearningRate 0.0002 Epoch: 23 Global Step: 40990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:40:00,986-Speed 9402.63 samples/sec Loss 1.3301 LearningRate 0.0002 Epoch: 23 Global Step: 41000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:40:27,126-Speed 9401.98 samples/sec Loss 1.3360 LearningRate 0.0002 Epoch: 23 Global Step: 41010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:40:53,227-Speed 9416.26 samples/sec Loss 1.3230 
LearningRate 0.0002 Epoch: 23 Global Step: 41020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:41:19,307-Speed 9423.94 samples/sec Loss 1.3230 LearningRate 0.0002 Epoch: 23 Global Step: 41030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:41:45,430-Speed 9408.21 samples/sec Loss 1.3262 LearningRate 0.0002 Epoch: 23 Global Step: 41040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:42:11,524-Speed 9418.48 samples/sec Loss 1.3229 LearningRate 0.0002 Epoch: 23 Global Step: 41050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:42:37,709-Speed 9385.95 samples/sec Loss 1.3278 LearningRate 0.0002 Epoch: 23 Global Step: 41060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:43:03,857-Speed 9399.35 samples/sec Loss 1.3244 LearningRate 0.0002 Epoch: 23 Global Step: 41070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:43:29,964-Speed 9414.23 samples/sec Loss 1.3271 LearningRate 0.0002 Epoch: 23 Global Step: 41080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:43:56,049-Speed 9421.74 samples/sec Loss 1.3234 LearningRate 0.0002 Epoch: 23 Global Step: 41090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:44:22,111-Speed 9430.25 samples/sec Loss 1.3305 LearningRate 0.0002 Epoch: 23 Global Step: 41100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:44:48,258-Speed 9399.30 samples/sec Loss 1.3270 LearningRate 0.0002 Epoch: 23 Global Step: 41110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:45:14,396-Speed 9403.08 samples/sec Loss 1.3246 LearningRate 0.0002 Epoch: 23 Global Step: 41120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:45:40,495-Speed 9417.16 samples/sec Loss 1.3245 LearningRate 0.0002 Epoch: 23 Global Step: 41130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:46:06,639-Speed 9400.45 samples/sec Loss 1.3369 LearningRate 0.0002 Epoch: 23 Global Step: 
41140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:46:32,781-Speed 9401.38 samples/sec Loss 1.3254 LearningRate 0.0002 Epoch: 23 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:46:58,940-Speed 9395.37 samples/sec Loss 1.3325 LearningRate 0.0002 Epoch: 23 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:47:25,078-Speed 9403.00 samples/sec Loss 1.3385 LearningRate 0.0002 Epoch: 23 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:47:51,277-Speed 9380.97 samples/sec Loss 1.3278 LearningRate 0.0002 Epoch: 23 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:48:17,462-Speed 9385.89 samples/sec Loss 1.3351 LearningRate 0.0002 Epoch: 23 Global Step: 41190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-06 02:48:43,659-Speed 9381.82 samples/sec Loss 1.3240 LearningRate 0.0002 Epoch: 23 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:49:09,813-Speed 9396.91 samples/sec Loss 1.3259 LearningRate 0.0002 Epoch: 23 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:49:35,981-Speed 9392.47 samples/sec Loss 1.3222 LearningRate 0.0002 Epoch: 23 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:50:02,244-Speed 9357.87 samples/sec Loss 1.3227 LearningRate 0.0002 Epoch: 23 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-06 02:50:28,431-Speed 9385.26 samples/sec Loss 1.3289 LearningRate 0.0002 Epoch: 23 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 02:50:54,650-Speed 9373.80 samples/sec Loss 1.3219 LearningRate 0.0002 Epoch: 23 Global Step: 41250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 02:51:20,847-Speed 9381.76 samples/sec Loss 1.3212 LearningRate 0.0002 Epoch: 23 Global Step: 41260 Fp16 Grad Scale: 65536 Required: 20 
hours Training: 2022-03-06 02:51:46,972-Speed 9407.55 samples/sec Loss 1.3259 LearningRate 0.0002 Epoch: 23 Global Step: 41270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 02:52:13,192-Speed 9373.47 samples/sec Loss 1.3196 LearningRate 0.0002 Epoch: 23 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 02:52:39,437-Speed 9364.63 samples/sec Loss 1.3247 LearningRate 0.0002 Epoch: 23 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 02:53:05,631-Speed 9382.40 samples/sec Loss 1.3333 LearningRate 0.0002 Epoch: 23 Global Step: 41300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 02:53:31,823-Speed 9383.63 samples/sec Loss 1.3270 LearningRate 0.0002 Epoch: 23 Global Step: 41310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 02:53:57,988-Speed 9393.27 samples/sec Loss 1.3262 LearningRate 0.0002 Epoch: 23 Global Step: 41320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:54:24,157-Speed 9391.23 samples/sec Loss 1.3295 LearningRate 0.0002 Epoch: 23 Global Step: 41330 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:54:50,308-Speed 9398.12 samples/sec Loss 1.3266 LearningRate 0.0002 Epoch: 23 Global Step: 41340 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:55:16,448-Speed 9402.35 samples/sec Loss 1.3292 LearningRate 0.0002 Epoch: 23 Global Step: 41350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:55:42,604-Speed 9396.33 samples/sec Loss 1.3206 LearningRate 0.0002 Epoch: 23 Global Step: 41360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:56:08,856-Speed 9362.34 samples/sec Loss 1.3236 LearningRate 0.0002 Epoch: 23 Global Step: 41370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:56:35,050-Speed 9382.55 samples/sec Loss 1.3179 LearningRate 0.0002 Epoch: 23 Global Step: 41380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 
02:57:01,239-Speed 9384.57 samples/sec Loss 1.3196 LearningRate 0.0002 Epoch: 23 Global Step: 41390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:57:27,449-Speed 9376.83 samples/sec Loss 1.3113 LearningRate 0.0002 Epoch: 23 Global Step: 41400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:57:53,592-Speed 9401.10 samples/sec Loss 1.3338 LearningRate 0.0002 Epoch: 23 Global Step: 41410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 02:58:19,746-Speed 9396.92 samples/sec Loss 1.3311 LearningRate 0.0002 Epoch: 23 Global Step: 41420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 02:58:45,887-Speed 9402.17 samples/sec Loss 1.3283 LearningRate 0.0002 Epoch: 23 Global Step: 41430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 02:59:12,034-Speed 9399.22 samples/sec Loss 1.3344 LearningRate 0.0002 Epoch: 23 Global Step: 41440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 02:59:38,151-Speed 9410.73 samples/sec Loss 1.3282 LearningRate 0.0002 Epoch: 23 Global Step: 41450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:00:04,290-Speed 9402.14 samples/sec Loss 1.3201 LearningRate 0.0002 Epoch: 23 Global Step: 41460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:00:30,516-Speed 9371.41 samples/sec Loss 1.3273 LearningRate 0.0002 Epoch: 23 Global Step: 41470 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:00:56,727-Speed 9376.42 samples/sec Loss 1.3273 LearningRate 0.0002 Epoch: 23 Global Step: 41480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:02:15,911-Speed 3103.70 samples/sec Loss 1.3062 LearningRate 0.0002 Epoch: 24 Global Step: 41490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:02:41,901-Speed 9456.83 samples/sec Loss 1.3068 LearningRate 0.0002 Epoch: 24 Global Step: 41500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:03:07,965-Speed 9429.29 samples/sec Loss 
1.2984 LearningRate 0.0002 Epoch: 24 Global Step: 41510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:03:33,960-Speed 9454.65 samples/sec Loss 1.3127 LearningRate 0.0002 Epoch: 24 Global Step: 41520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:04:00,013-Speed 9433.67 samples/sec Loss 1.3081 LearningRate 0.0002 Epoch: 24 Global Step: 41530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:04:25,959-Speed 9472.40 samples/sec Loss 1.3096 LearningRate 0.0002 Epoch: 24 Global Step: 41540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:04:51,956-Speed 9453.38 samples/sec Loss 1.3020 LearningRate 0.0002 Epoch: 24 Global Step: 41550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:05:17,896-Speed 9474.74 samples/sec Loss 1.3025 LearningRate 0.0002 Epoch: 24 Global Step: 41560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:05:43,806-Speed 9485.81 samples/sec Loss 1.3112 LearningRate 0.0002 Epoch: 24 Global Step: 41570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:06:09,817-Speed 9448.38 samples/sec Loss 1.3093 LearningRate 0.0002 Epoch: 24 Global Step: 41580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:06:35,814-Speed 9453.92 samples/sec Loss 1.3067 LearningRate 0.0002 Epoch: 24 Global Step: 41590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:07:01,763-Speed 9471.24 samples/sec Loss 1.3043 LearningRate 0.0002 Epoch: 24 Global Step: 41600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:07:27,804-Speed 9437.96 samples/sec Loss 1.2929 LearningRate 0.0002 Epoch: 24 Global Step: 41610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:07:53,845-Speed 9437.56 samples/sec Loss 1.2983 LearningRate 0.0002 Epoch: 24 Global Step: 41620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:08:19,881-Speed 9439.73 samples/sec Loss 1.3123 LearningRate 0.0002 Epoch: 24 Global 
Step: 41630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:08:45,900-Speed 9445.74 samples/sec Loss 1.3105 LearningRate 0.0002 Epoch: 24 Global Step: 41640 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:09:11,886-Speed 9457.67 samples/sec Loss 1.3025 LearningRate 0.0002 Epoch: 24 Global Step: 41650 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:09:37,903-Speed 9446.53 samples/sec Loss 1.2988 LearningRate 0.0002 Epoch: 24 Global Step: 41660 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:10:03,939-Speed 9439.98 samples/sec Loss 1.2989 LearningRate 0.0002 Epoch: 24 Global Step: 41670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:10:29,973-Speed 9440.47 samples/sec Loss 1.2986 LearningRate 0.0002 Epoch: 24 Global Step: 41680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:10:56,022-Speed 9434.93 samples/sec Loss 1.3051 LearningRate 0.0002 Epoch: 24 Global Step: 41690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:11:22,039-Speed 9446.50 samples/sec Loss 1.3047 LearningRate 0.0002 Epoch: 24 Global Step: 41700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:11:48,125-Speed 9421.27 samples/sec Loss 1.2989 LearningRate 0.0002 Epoch: 24 Global Step: 41710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:12:14,214-Speed 9420.53 samples/sec Loss 1.3054 LearningRate 0.0002 Epoch: 24 Global Step: 41720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:12:40,268-Speed 9432.90 samples/sec Loss 1.3141 LearningRate 0.0002 Epoch: 24 Global Step: 41730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:13:06,314-Speed 9436.40 samples/sec Loss 1.3056 LearningRate 0.0002 Epoch: 24 Global Step: 41740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:13:32,332-Speed 9446.04 samples/sec Loss 1.3157 LearningRate 0.0002 Epoch: 24 Global Step: 41750 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-03-06 03:13:58,371-Speed 9438.55 samples/sec Loss 1.2983 LearningRate 0.0002 Epoch: 24 Global Step: 41760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:14:24,446-Speed 9425.57 samples/sec Loss 1.3009 LearningRate 0.0002 Epoch: 24 Global Step: 41770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:14:50,488-Speed 9437.60 samples/sec Loss 1.3002 LearningRate 0.0002 Epoch: 24 Global Step: 41780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:15:16,600-Speed 9412.19 samples/sec Loss 1.3001 LearningRate 0.0002 Epoch: 24 Global Step: 41790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:15:42,651-Speed 9434.20 samples/sec Loss 1.3068 LearningRate 0.0002 Epoch: 24 Global Step: 41800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:16:08,754-Speed 9415.33 samples/sec Loss 1.3109 LearningRate 0.0002 Epoch: 24 Global Step: 41810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:16:34,803-Speed 9435.28 samples/sec Loss 1.3055 LearningRate 0.0002 Epoch: 24 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:17:00,871-Speed 9427.97 samples/sec Loss 1.3081 LearningRate 0.0002 Epoch: 24 Global Step: 41830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:17:26,972-Speed 9416.43 samples/sec Loss 1.2994 LearningRate 0.0002 Epoch: 24 Global Step: 41840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:17:53,033-Speed 9430.76 samples/sec Loss 1.3054 LearningRate 0.0002 Epoch: 24 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:18:19,072-Speed 9438.15 samples/sec Loss 1.2933 LearningRate 0.0002 Epoch: 24 Global Step: 41860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:18:45,213-Speed 9401.74 samples/sec Loss 1.2952 LearningRate 0.0002 Epoch: 24 Global Step: 41870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 
03:19:11,396-Speed 9386.89 samples/sec Loss 1.3011 LearningRate 0.0002 Epoch: 24 Global Step: 41880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:19:37,456-Speed 9430.87 samples/sec Loss 1.2933 LearningRate 0.0002 Epoch: 24 Global Step: 41890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:20:03,628-Speed 9390.70 samples/sec Loss 1.2991 LearningRate 0.0002 Epoch: 24 Global Step: 41900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:20:29,747-Speed 9409.51 samples/sec Loss 1.3006 LearningRate 0.0002 Epoch: 24 Global Step: 41910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:20:55,865-Speed 9409.90 samples/sec Loss 1.3019 LearningRate 0.0002 Epoch: 24 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 03:21:21,993-Speed 9406.35 samples/sec Loss 1.3049 LearningRate 0.0002 Epoch: 24 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 03:21:48,147-Speed 9397.05 samples/sec Loss 1.2993 LearningRate 0.0002 Epoch: 24 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 03:22:14,253-Speed 9414.19 samples/sec Loss 1.2944 LearningRate 0.0002 Epoch: 24 Global Step: 41950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:22:40,358-Speed 9414.80 samples/sec Loss 1.3051 LearningRate 0.0002 Epoch: 24 Global Step: 41960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:23:06,632-Speed 9353.91 samples/sec Loss 1.2984 LearningRate 0.0002 Epoch: 24 Global Step: 41970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:23:32,788-Speed 9396.51 samples/sec Loss 1.3000 LearningRate 0.0002 Epoch: 24 Global Step: 41980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:23:59,017-Speed 9370.17 samples/sec Loss 1.2992 LearningRate 0.0002 Epoch: 24 Global Step: 41990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:24:25,236-Speed 9373.50 samples/sec 
Loss 1.2992 LearningRate 0.0002 Epoch: 24 Global Step: 42000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:24:51,519-Speed 9351.10 samples/sec Loss 1.3014 LearningRate 0.0002 Epoch: 24 Global Step: 42010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:25:17,691-Speed 9390.52 samples/sec Loss 1.2892 LearningRate 0.0002 Epoch: 24 Global Step: 42020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:25:44,007-Speed 9339.07 samples/sec Loss 1.2918 LearningRate 0.0002 Epoch: 24 Global Step: 42030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:26:10,225-Speed 9374.30 samples/sec Loss 1.2954 LearningRate 0.0002 Epoch: 24 Global Step: 42040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:26:36,509-Speed 9350.45 samples/sec Loss 1.2887 LearningRate 0.0002 Epoch: 24 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 03:27:02,689-Speed 9387.69 samples/sec Loss 1.2968 LearningRate 0.0002 Epoch: 24 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 03:27:28,885-Speed 9382.09 samples/sec Loss 1.2856 LearningRate 0.0002 Epoch: 24 Global Step: 42070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 03:27:55,162-Speed 9352.95 samples/sec Loss 1.2948 LearningRate 0.0002 Epoch: 24 Global Step: 42080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 03:28:21,281-Speed 9409.81 samples/sec Loss 1.2839 LearningRate 0.0002 Epoch: 24 Global Step: 42090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:28:47,721-Speed 9295.61 samples/sec Loss 1.2856 LearningRate 0.0002 Epoch: 24 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:29:14,137-Speed 9303.65 samples/sec Loss 1.2929 LearningRate 0.0002 Epoch: 24 Global Step: 42110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:29:40,555-Speed 9303.47 samples/sec Loss 1.2835 LearningRate 0.0002 Epoch: 
24 Global Step: 42120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:30:06,820-Speed 9357.21 samples/sec Loss 1.2944 LearningRate 0.0002 Epoch: 24 Global Step: 42130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:30:33,124-Speed 9343.42 samples/sec Loss 1.2981 LearningRate 0.0002 Epoch: 24 Global Step: 42140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:30:59,416-Speed 9347.73 samples/sec Loss 1.2856 LearningRate 0.0002 Epoch: 24 Global Step: 42150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:31:25,859-Speed 9294.56 samples/sec Loss 1.2958 LearningRate 0.0002 Epoch: 24 Global Step: 42160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:31:52,122-Speed 9357.97 samples/sec Loss 1.2942 LearningRate 0.0002 Epoch: 24 Global Step: 42170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:32:18,451-Speed 9334.66 samples/sec Loss 1.2849 LearningRate 0.0002 Epoch: 24 Global Step: 42180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:32:44,734-Speed 9351.25 samples/sec Loss 1.2959 LearningRate 0.0002 Epoch: 24 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-06 03:33:11,065-Speed 9333.79 samples/sec Loss 1.2860 LearningRate 0.0002 Epoch: 24 Global Step: 42200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:33:37,263-Speed 9381.51 samples/sec Loss 1.2907 LearningRate 0.0002 Epoch: 24 Global Step: 42210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:34:03,597-Speed 9332.74 samples/sec Loss 1.2857 LearningRate 0.0002 Epoch: 24 Global Step: 42220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:34:29,792-Speed 9382.46 samples/sec Loss 1.2924 LearningRate 0.0002 Epoch: 24 Global Step: 42230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:34:56,056-Speed 9357.73 samples/sec Loss 1.2924 LearningRate 0.0002 Epoch: 24 Global Step: 42240 Fp16 Grad Scale: 
65536 Required: 20 hours Training: 2022-03-06 03:35:22,314-Speed 9359.82 samples/sec Loss 1.2792 LearningRate 0.0002 Epoch: 24 Global Step: 42250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:35:48,443-Speed 9405.75 samples/sec Loss 1.2902 LearningRate 0.0002 Epoch: 24 Global Step: 42260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:36:14,611-Speed 9392.16 samples/sec Loss 1.2983 LearningRate 0.0002 Epoch: 24 Global Step: 42270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:36:40,788-Speed 9388.94 samples/sec Loss 1.2886 LearningRate 0.0002 Epoch: 24 Global Step: 42280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:37:07,011-Speed 9372.25 samples/sec Loss 1.2822 LearningRate 0.0002 Epoch: 24 Global Step: 42290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:37:33,258-Speed 9364.10 samples/sec Loss 1.2824 LearningRate 0.0002 Epoch: 24 Global Step: 42300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:37:59,483-Speed 9371.24 samples/sec Loss 1.2817 LearningRate 0.0002 Epoch: 24 Global Step: 42310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-06 03:38:25,762-Speed 9352.25 samples/sec Loss 1.2775 LearningRate 0.0002 Epoch: 24 Global Step: 42320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:38:52,056-Speed 9347.33 samples/sec Loss 1.2786 LearningRate 0.0002 Epoch: 24 Global Step: 42330 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:39:18,348-Speed 9347.69 samples/sec Loss 1.2779 LearningRate 0.0002 Epoch: 24 Global Step: 42340 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:39:44,618-Speed 9355.39 samples/sec Loss 1.2816 LearningRate 0.0002 Epoch: 24 Global Step: 42350 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:40:10,957-Speed 9331.19 samples/sec Loss 1.2804 LearningRate 0.0002 Epoch: 24 Global Step: 42360 Fp16 Grad Scale: 16384 Required: 20 hours Training: 
2022-03-06 03:40:37,254-Speed 9345.81 samples/sec Loss 1.2763 LearningRate 0.0002 Epoch: 24 Global Step: 42370 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:41:03,522-Speed 9356.25 samples/sec Loss 1.2758 LearningRate 0.0002 Epoch: 24 Global Step: 42380 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:41:29,772-Speed 9363.09 samples/sec Loss 1.2821 LearningRate 0.0002 Epoch: 24 Global Step: 42390 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:41:56,082-Speed 9341.40 samples/sec Loss 1.2753 LearningRate 0.0002 Epoch: 24 Global Step: 42400 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:42:22,303-Speed 9372.89 samples/sec Loss 1.2745 LearningRate 0.0002 Epoch: 24 Global Step: 42410 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:42:48,509-Speed 9378.56 samples/sec Loss 1.2762 LearningRate 0.0002 Epoch: 24 Global Step: 42420 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:43:14,859-Speed 9327.30 samples/sec Loss 1.2715 LearningRate 0.0002 Epoch: 24 Global Step: 42430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:43:41,173-Speed 9339.90 samples/sec Loss 1.2863 LearningRate 0.0002 Epoch: 24 Global Step: 42440 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:44:07,458-Speed 9350.35 samples/sec Loss 1.2819 LearningRate 0.0002 Epoch: 24 Global Step: 42450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:44:33,651-Speed 9383.19 samples/sec Loss 1.2840 LearningRate 0.0002 Epoch: 24 Global Step: 42460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:44:59,859-Speed 9377.78 samples/sec Loss 1.2806 LearningRate 0.0002 Epoch: 24 Global Step: 42470 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:45:26,020-Speed 9394.46 samples/sec Loss 1.2808 LearningRate 0.0002 Epoch: 24 Global Step: 42480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:45:52,164-Speed 9400.48 
samples/sec Loss 1.2788 LearningRate 0.0002 Epoch: 24 Global Step: 42490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:46:18,276-Speed 9412.52 samples/sec Loss 1.2858 LearningRate 0.0002 Epoch: 24 Global Step: 42500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-06 03:46:44,367-Speed 9419.66 samples/sec Loss 1.2767 LearningRate 0.0002 Epoch: 24 Global Step: 42510 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:47:10,629-Speed 9358.47 samples/sec Loss 1.2813 LearningRate 0.0002 Epoch: 24 Global Step: 42520 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:47:36,756-Speed 9406.47 samples/sec Loss 1.2809 LearningRate 0.0002 Epoch: 24 Global Step: 42530 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:48:03,010-Speed 9361.39 samples/sec Loss 1.2722 LearningRate 0.0002 Epoch: 24 Global Step: 42540 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:48:29,348-Speed 9331.44 samples/sec Loss 1.2685 LearningRate 0.0002 Epoch: 24 Global Step: 42550 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:48:55,662-Speed 9340.03 samples/sec Loss 1.2709 LearningRate 0.0002 Epoch: 24 Global Step: 42560 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:49:21,938-Speed 9353.37 samples/sec Loss 1.2733 LearningRate 0.0002 Epoch: 24 Global Step: 42570 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:49:48,073-Speed 9404.01 samples/sec Loss 1.2707 LearningRate 0.0002 Epoch: 24 Global Step: 42580 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:50:14,308-Speed 9367.91 samples/sec Loss 1.2773 LearningRate 0.0002 Epoch: 24 Global Step: 42590 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:50:40,483-Speed 9389.79 samples/sec Loss 1.2814 LearningRate 0.0002 Epoch: 24 Global Step: 42600 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-06 03:51:06,642-Speed 9395.27 samples/sec Loss 1.2776 LearningRate 0.0002 
Epoch: 24 Global Step: 42610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:51:32,752-Speed 9412.64 samples/sec Loss 1.2755 LearningRate 0.0002 Epoch: 24 Global Step: 42620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:51:58,957-Speed 9378.84 samples/sec Loss 1.2701 LearningRate 0.0002 Epoch: 24 Global Step: 42630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:52:25,321-Speed 9322.91 samples/sec Loss 1.2690 LearningRate 0.0002 Epoch: 24 Global Step: 42640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:52:51,544-Speed 9372.60 samples/sec Loss 1.2705 LearningRate 0.0002 Epoch: 24 Global Step: 42650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:53:17,711-Speed 9392.45 samples/sec Loss 1.2709 LearningRate 0.0002 Epoch: 24 Global Step: 42660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:53:43,904-Speed 9383.19 samples/sec Loss 1.2639 LearningRate 0.0002 Epoch: 24 Global Step: 42670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:54:10,186-Speed 9351.60 samples/sec Loss 1.2727 LearningRate 0.0002 Epoch: 24 Global Step: 42680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:54:36,417-Speed 9369.46 samples/sec Loss 1.2635 LearningRate 0.0002 Epoch: 24 Global Step: 42690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:55:02,627-Speed 9376.89 samples/sec Loss 1.2728 LearningRate 0.0002 Epoch: 24 Global Step: 42700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 03:55:28,750-Speed 9408.30 samples/sec Loss 1.2714 LearningRate 0.0002 Epoch: 24 Global Step: 42710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:55:54,930-Speed 9387.97 samples/sec Loss 1.2681 LearningRate 0.0002 Epoch: 24 Global Step: 42720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:56:21,048-Speed 9410.62 samples/sec Loss 1.2616 LearningRate 0.0002 Epoch: 24 Global Step: 42730 Fp16 Grad 
Scale: 65536 Required: 19 hours Training: 2022-03-06 03:56:47,274-Speed 9371.30 samples/sec Loss 1.2673 LearningRate 0.0002 Epoch: 24 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:57:13,433-Speed 9395.22 samples/sec Loss 1.2571 LearningRate 0.0002 Epoch: 24 Global Step: 42750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:57:39,650-Speed 9374.29 samples/sec Loss 1.2735 LearningRate 0.0002 Epoch: 24 Global Step: 42760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:58:05,807-Speed 9395.86 samples/sec Loss 1.2607 LearningRate 0.0002 Epoch: 24 Global Step: 42770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:58:31,934-Speed 9406.96 samples/sec Loss 1.2704 LearningRate 0.0002 Epoch: 24 Global Step: 42780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:58:58,057-Speed 9408.02 samples/sec Loss 1.2605 LearningRate 0.0002 Epoch: 24 Global Step: 42790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:59:24,225-Speed 9392.09 samples/sec Loss 1.2620 LearningRate 0.0002 Epoch: 24 Global Step: 42800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 03:59:50,382-Speed 9395.97 samples/sec Loss 1.2730 LearningRate 0.0002 Epoch: 24 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:00:16,526-Speed 9400.50 samples/sec Loss 1.2644 LearningRate 0.0002 Epoch: 24 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:00:42,622-Speed 9418.25 samples/sec Loss 1.2622 LearningRate 0.0002 Epoch: 24 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:01:08,736-Speed 9411.37 samples/sec Loss 1.2604 LearningRate 0.0002 Epoch: 24 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:01:34,794-Speed 9431.80 samples/sec Loss 1.2592 LearningRate 0.0002 Epoch: 24 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-03-06 04:02:00,949-Speed 9396.44 samples/sec Loss 1.2675 LearningRate 0.0002 Epoch: 24 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:02:27,073-Speed 9408.22 samples/sec Loss 1.2686 LearningRate 0.0002 Epoch: 24 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:02:53,290-Speed 9375.33 samples/sec Loss 1.2557 LearningRate 0.0002 Epoch: 24 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:03:19,390-Speed 9416.47 samples/sec Loss 1.2592 LearningRate 0.0002 Epoch: 24 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:03:45,532-Speed 9401.20 samples/sec Loss 1.2651 LearningRate 0.0002 Epoch: 24 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:04:11,675-Speed 9401.27 samples/sec Loss 1.2655 LearningRate 0.0002 Epoch: 24 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:04:37,817-Speed 9401.18 samples/sec Loss 1.2636 LearningRate 0.0002 Epoch: 24 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:05:03,914-Speed 9418.59 samples/sec Loss 1.2655 LearningRate 0.0002 Epoch: 24 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:05:30,021-Speed 9414.16 samples/sec Loss 1.2619 LearningRate 0.0002 Epoch: 24 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:05:56,101-Speed 9423.68 samples/sec Loss 1.2646 LearningRate 0.0002 Epoch: 24 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:06:22,204-Speed 9415.35 samples/sec Loss 1.2631 LearningRate 0.0002 Epoch: 24 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:06:48,262-Speed 9431.70 samples/sec Loss 1.2674 LearningRate 0.0002 Epoch: 24 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:07:14,439-Speed 9388.54 
samples/sec Loss 1.2652 LearningRate 0.0002 Epoch: 24 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:07:40,539-Speed 9416.53 samples/sec Loss 1.2521 LearningRate 0.0002 Epoch: 24 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:08:06,654-Speed 9411.11 samples/sec Loss 1.2594 LearningRate 0.0002 Epoch: 24 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:08:32,762-Speed 9413.48 samples/sec Loss 1.2608 LearningRate 0.0002 Epoch: 24 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:08:58,896-Speed 9404.48 samples/sec Loss 1.2579 LearningRate 0.0002 Epoch: 24 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:09:25,056-Speed 9394.91 samples/sec Loss 1.2506 LearningRate 0.0002 Epoch: 24 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:09:51,260-Speed 9379.03 samples/sec Loss 1.2537 LearningRate 0.0002 Epoch: 24 Global Step: 43040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:10:17,479-Speed 9374.05 samples/sec Loss 1.2646 LearningRate 0.0002 Epoch: 24 Global Step: 43050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:10:43,647-Speed 9391.98 samples/sec Loss 1.2559 LearningRate 0.0002 Epoch: 24 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:11:09,839-Speed 9383.42 samples/sec Loss 1.2567 LearningRate 0.0002 Epoch: 24 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:11:35,937-Speed 9417.28 samples/sec Loss 1.2507 LearningRate 0.0002 Epoch: 24 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:12:02,157-Speed 9373.39 samples/sec Loss 1.2640 LearningRate 0.0002 Epoch: 24 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:12:28,271-Speed 9411.55 samples/sec Loss 1.2629 LearningRate 
0.0002 Epoch: 24 Global Step: 43100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:12:54,471-Speed 9380.78 samples/sec Loss 1.2487 LearningRate 0.0002 Epoch: 24 Global Step: 43110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:13:20,584-Speed 9411.80 samples/sec Loss 1.2510 LearningRate 0.0002 Epoch: 24 Global Step: 43120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:13:46,705-Speed 9409.16 samples/sec Loss 1.2666 LearningRate 0.0002 Epoch: 24 Global Step: 43130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:14:12,899-Speed 9382.40 samples/sec Loss 1.2593 LearningRate 0.0002 Epoch: 24 Global Step: 43140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:14:39,007-Speed 9414.04 samples/sec Loss 1.2666 LearningRate 0.0002 Epoch: 24 Global Step: 43150 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:15:05,145-Speed 9402.74 samples/sec Loss 1.2687 LearningRate 0.0002 Epoch: 24 Global Step: 43160 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:15:31,298-Speed 9398.15 samples/sec Loss 1.2599 LearningRate 0.0002 Epoch: 24 Global Step: 43170 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:15:57,531-Speed 9369.10 samples/sec Loss 1.2566 LearningRate 0.0002 Epoch: 24 Global Step: 43180 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:16:23,747-Speed 9374.53 samples/sec Loss 1.2648 LearningRate 0.0002 Epoch: 24 Global Step: 43190 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:16:49,928-Speed 9387.60 samples/sec Loss 1.2718 LearningRate 0.0002 Epoch: 24 Global Step: 43200 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:18:07,553-Speed 3166.07 samples/sec Loss 1.2672 LearningRate 0.0002 Epoch: 25 Global Step: 43210 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:18:33,616-Speed 9429.71 samples/sec Loss 1.2387 LearningRate 0.0002 Epoch: 25 Global Step: 43220 Fp16 
Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:18:59,727-Speed 9412.55 samples/sec Loss 1.2556 LearningRate 0.0002 Epoch: 25 Global Step: 43230 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:19:25,893-Speed 9393.16 samples/sec Loss 1.2375 LearningRate 0.0002 Epoch: 25 Global Step: 43240 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:19:52,314-Speed 9302.17 samples/sec Loss 1.2389 LearningRate 0.0002 Epoch: 25 Global Step: 43250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:20:18,737-Speed 9301.26 samples/sec Loss 1.2425 LearningRate 0.0002 Epoch: 25 Global Step: 43260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:20:45,029-Speed 9347.68 samples/sec Loss 1.2490 LearningRate 0.0002 Epoch: 25 Global Step: 43270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:21:11,286-Speed 9360.40 samples/sec Loss 1.2398 LearningRate 0.0002 Epoch: 25 Global Step: 43280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:21:37,840-Speed 9255.48 samples/sec Loss 1.2370 LearningRate 0.0002 Epoch: 25 Global Step: 43290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:22:04,260-Speed 9302.56 samples/sec Loss 1.2458 LearningRate 0.0002 Epoch: 25 Global Step: 43300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:22:30,799-Speed 9260.68 samples/sec Loss 1.2457 LearningRate 0.0002 Epoch: 25 Global Step: 43310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:22:57,171-Speed 9319.27 samples/sec Loss 1.2406 LearningRate 0.0002 Epoch: 25 Global Step: 43320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:23:23,635-Speed 9287.30 samples/sec Loss 1.2462 LearningRate 0.0002 Epoch: 25 Global Step: 43330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:23:49,977-Speed 9329.88 samples/sec Loss 1.2453 LearningRate 0.0002 Epoch: 25 Global Step: 43340 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-03-06 04:24:16,270-Speed 9347.35 samples/sec Loss 1.2416 LearningRate 0.0002 Epoch: 25 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:24:42,834-Speed 9252.40 samples/sec Loss 1.2366 LearningRate 0.0002 Epoch: 25 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:25:09,367-Speed 9262.82 samples/sec Loss 1.2449 LearningRate 0.0002 Epoch: 25 Global Step: 43370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:25:35,972-Speed 9237.74 samples/sec Loss 1.2440 LearningRate 0.0002 Epoch: 25 Global Step: 43380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:26:02,503-Speed 9263.28 samples/sec Loss 1.2421 LearningRate 0.0002 Epoch: 25 Global Step: 43390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:26:29,062-Speed 9253.85 samples/sec Loss 1.2505 LearningRate 0.0002 Epoch: 25 Global Step: 43400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:26:55,589-Speed 9264.88 samples/sec Loss 1.2419 LearningRate 0.0002 Epoch: 25 Global Step: 43410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:27:22,131-Speed 9259.77 samples/sec Loss 1.2452 LearningRate 0.0002 Epoch: 25 Global Step: 43420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:27:48,715-Speed 9245.19 samples/sec Loss 1.2365 LearningRate 0.0002 Epoch: 25 Global Step: 43430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:28:15,272-Speed 9254.60 samples/sec Loss 1.2308 LearningRate 0.0002 Epoch: 25 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:28:41,790-Speed 9268.31 samples/sec Loss 1.2499 LearningRate 0.0002 Epoch: 25 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:29:08,242-Speed 9291.22 samples/sec Loss 1.2396 LearningRate 0.0002 Epoch: 25 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:29:34,787-Speed 
9258.68 samples/sec Loss 1.2428 LearningRate 0.0002 Epoch: 25 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:30:01,286-Speed 9274.76 samples/sec Loss 1.2346 LearningRate 0.0002 Epoch: 25 Global Step: 43480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:30:27,619-Speed 9333.34 samples/sec Loss 1.2359 LearningRate 0.0002 Epoch: 25 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:30:54,171-Speed 9256.09 samples/sec Loss 1.2366 LearningRate 0.0002 Epoch: 25 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:31:20,655-Speed 9279.89 samples/sec Loss 1.2363 LearningRate 0.0002 Epoch: 25 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:31:47,233-Speed 9247.19 samples/sec Loss 1.2340 LearningRate 0.0002 Epoch: 25 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:32:13,629-Speed 9310.81 samples/sec Loss 1.2403 LearningRate 0.0002 Epoch: 25 Global Step: 43530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:32:40,004-Speed 9318.44 samples/sec Loss 1.2302 LearningRate 0.0002 Epoch: 25 Global Step: 43540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:33:06,496-Speed 9277.06 samples/sec Loss 1.2419 LearningRate 0.0002 Epoch: 25 Global Step: 43550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:33:33,055-Speed 9253.84 samples/sec Loss 1.2435 LearningRate 0.0002 Epoch: 25 Global Step: 43560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:33:59,651-Speed 9241.06 samples/sec Loss 1.2369 LearningRate 0.0002 Epoch: 25 Global Step: 43570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:34:26,255-Speed 9238.18 samples/sec Loss 1.2389 LearningRate 0.0002 Epoch: 25 Global Step: 43580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:34:52,723-Speed 9285.61 samples/sec Loss 1.2491 
LearningRate 0.0002 Epoch: 25 Global Step: 43590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:35:19,210-Speed 9278.53 samples/sec Loss 1.2358 LearningRate 0.0002 Epoch: 25 Global Step: 43600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:35:45,597-Speed 9314.98 samples/sec Loss 1.2413 LearningRate 0.0002 Epoch: 25 Global Step: 43610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:36:12,121-Speed 9265.96 samples/sec Loss 1.2393 LearningRate 0.0002 Epoch: 25 Global Step: 43620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:36:38,234-Speed 9411.81 samples/sec Loss 1.2399 LearningRate 0.0002 Epoch: 25 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:37:04,366-Speed 9404.88 samples/sec Loss 1.2329 LearningRate 0.0002 Epoch: 25 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:37:30,451-Speed 9422.28 samples/sec Loss 1.2228 LearningRate 0.0002 Epoch: 25 Global Step: 43650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:37:56,593-Speed 9401.44 samples/sec Loss 1.2336 LearningRate 0.0002 Epoch: 25 Global Step: 43660 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:38:22,850-Speed 9360.32 samples/sec Loss 1.2401 LearningRate 0.0002 Epoch: 25 Global Step: 43670 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:38:49,065-Speed 9375.32 samples/sec Loss 1.2424 LearningRate 0.0002 Epoch: 25 Global Step: 43680 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:39:15,306-Speed 9366.07 samples/sec Loss 1.2302 LearningRate 0.0002 Epoch: 25 Global Step: 43690 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:39:41,471-Speed 9392.85 samples/sec Loss 1.2449 LearningRate 0.0002 Epoch: 25 Global Step: 43700 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:40:07,698-Speed 9371.00 samples/sec Loss 1.2385 LearningRate 0.0002 Epoch: 25 Global Step: 
43710 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:40:33,868-Speed 9391.41 samples/sec Loss 1.2387 LearningRate 0.0002 Epoch: 25 Global Step: 43720 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:41:00,102-Speed 9368.30 samples/sec Loss 1.2366 LearningRate 0.0002 Epoch: 25 Global Step: 43730 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:41:26,286-Speed 9386.45 samples/sec Loss 1.2269 LearningRate 0.0002 Epoch: 25 Global Step: 43740 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:41:52,515-Speed 9370.10 samples/sec Loss 1.2382 LearningRate 0.0002 Epoch: 25 Global Step: 43750 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-06 04:42:18,773-Speed 9360.15 samples/sec Loss 1.2334 LearningRate 0.0002 Epoch: 25 Global Step: 43760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:42:45,007-Speed 9368.46 samples/sec Loss 1.2314 LearningRate 0.0002 Epoch: 25 Global Step: 43770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:43:11,200-Speed 9382.93 samples/sec Loss 1.2305 LearningRate 0.0002 Epoch: 25 Global Step: 43780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:43:37,499-Speed 9345.36 samples/sec Loss 1.2387 LearningRate 0.0002 Epoch: 25 Global Step: 43790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:44:03,717-Speed 9374.24 samples/sec Loss 1.2277 LearningRate 0.0002 Epoch: 25 Global Step: 43800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:44:29,953-Speed 9367.52 samples/sec Loss 1.2201 LearningRate 0.0002 Epoch: 25 Global Step: 43810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:44:56,097-Speed 9400.86 samples/sec Loss 1.2336 LearningRate 0.0002 Epoch: 25 Global Step: 43820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:45:22,215-Speed 9409.92 samples/sec Loss 1.2256 LearningRate 0.0002 Epoch: 25 Global Step: 43830 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-03-06 04:45:48,312-Speed 9417.47 samples/sec Loss 1.2265 LearningRate 0.0002 Epoch: 25 Global Step: 43840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:46:14,725-Speed 9305.12 samples/sec Loss 1.2324 LearningRate 0.0002 Epoch: 25 Global Step: 43850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-06 04:46:41,016-Speed 9348.14 samples/sec Loss 1.2357 LearningRate 0.0002 Epoch: 25 Global Step: 43860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:47:07,268-Speed 9361.78 samples/sec Loss 1.2283 LearningRate 0.0002 Epoch: 25 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:47:33,505-Speed 9368.14 samples/sec Loss 1.2250 LearningRate 0.0002 Epoch: 25 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:47:59,808-Speed 9343.49 samples/sec Loss 1.2300 LearningRate 0.0002 Epoch: 25 Global Step: 43890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:48:26,144-Speed 9332.17 samples/sec Loss 1.2231 LearningRate 0.0002 Epoch: 25 Global Step: 43900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:48:52,355-Speed 9376.42 samples/sec Loss 1.2304 LearningRate 0.0002 Epoch: 25 Global Step: 43910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:49:18,548-Speed 9383.23 samples/sec Loss 1.2319 LearningRate 0.0002 Epoch: 25 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:49:44,744-Speed 9381.93 samples/sec Loss 1.2255 LearningRate 0.0002 Epoch: 25 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:50:11,067-Speed 9336.70 samples/sec Loss 1.2284 LearningRate 0.0002 Epoch: 25 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 04:50:37,277-Speed 9377.11 samples/sec Loss 1.2341 LearningRate 0.0002 Epoch: 25 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-06 
04:51:03,550-Speed 9355.74 samples/sec Loss 1.2307 LearningRate 0.0002 Epoch: 25 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-06 04:51:29,880-Speed 9334.28 samples/sec Loss 1.2231 LearningRate 0.0002 Epoch: 25 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-06 04:51:56,162-Speed 9351.39 samples/sec Loss 1.2291 LearningRate 0.0002 Epoch: 25 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-06 04:52:22,277-Speed 9410.86 samples/sec Loss 1.2306 LearningRate 0.0002 Epoch: 25 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-06 04:52:48,513-Speed 9367.60 samples/sec Loss 1.2317 LearningRate 0.0002 Epoch: 25 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-06 04:53:14,911-Speed 9310.37 samples/sec Loss 1.2230 LearningRate 0.0002 Epoch: 25 Global Step: 44010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 04:53:41,428-Speed 9268.37 samples/sec Loss 1.2309 LearningRate 0.0002 Epoch: 25 Global Step: 44020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 04:54:07,995-Speed 9250.87 samples/sec Loss 1.2307 LearningRate 0.0002 Epoch: 25 Global Step: 44030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 04:54:34,429-Speed 9297.65 samples/sec Loss 1.2197 LearningRate 0.0002 Epoch: 25 Global Step: 44040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 04:55:00,722-Speed 9347.45 samples/sec Loss 1.2251 LearningRate 0.0002 Epoch: 25 Global Step: 44050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 04:55:27,165-Speed 9294.72 samples/sec Loss 1.2191 LearningRate 0.0002 Epoch: 25 Global Step: 44060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 04:55:53,472-Speed 9342.17 samples/sec Loss 1.2126 LearningRate 0.0002 Epoch: 25 Global Step: 44070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:56:19,976-Speed 9272.95 samples/sec 
Loss 1.2260 LearningRate 0.0002 Epoch: 25 Global Step: 44080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:56:46,545-Speed 9250.47 samples/sec Loss 1.2303 LearningRate 0.0002 Epoch: 25 Global Step: 44090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:57:13,199-Speed 9220.52 samples/sec Loss 1.2192 LearningRate 0.0002 Epoch: 25 Global Step: 44100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:57:39,701-Speed 9273.66 samples/sec Loss 1.2167 LearningRate 0.0002 Epoch: 25 Global Step: 44110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:58:06,624-Speed 9128.95 samples/sec Loss 1.2228 LearningRate 0.0002 Epoch: 25 Global Step: 44120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:58:33,209-Speed 9244.55 samples/sec Loss 1.2217 LearningRate 0.0002 Epoch: 25 Global Step: 44130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:58:59,773-Speed 9251.82 samples/sec Loss 1.2204 LearningRate 0.0002 Epoch: 25 Global Step: 44140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:59:26,274-Speed 9275.11 samples/sec Loss 1.2153 LearningRate 0.0002 Epoch: 25 Global Step: 44150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 04:59:52,914-Speed 9225.45 samples/sec Loss 1.2229 LearningRate 0.0002 Epoch: 25 Global Step: 44160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:00:19,478-Speed 9252.28 samples/sec Loss 1.2040 LearningRate 0.0002 Epoch: 25 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:00:45,797-Speed 9338.03 samples/sec Loss 1.2178 LearningRate 0.0002 Epoch: 25 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:01:12,010-Speed 9375.92 samples/sec Loss 1.2145 LearningRate 0.0002 Epoch: 25 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:01:38,232-Speed 9372.63 samples/sec Loss 1.2116 LearningRate 0.0002 Epoch: 25 
Global Step: 44200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:02:04,491-Speed 9359.47 samples/sec Loss 1.2192 LearningRate 0.0002 Epoch: 25 Global Step: 44210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:02:30,705-Speed 9375.45 samples/sec Loss 1.2171 LearningRate 0.0002 Epoch: 25 Global Step: 44220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:02:56,884-Speed 9388.09 samples/sec Loss 1.2062 LearningRate 0.0002 Epoch: 25 Global Step: 44230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:03:23,038-Speed 9397.01 samples/sec Loss 1.2138 LearningRate 0.0002 Epoch: 25 Global Step: 44240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:03:49,179-Speed 9401.40 samples/sec Loss 1.2131 LearningRate 0.0002 Epoch: 25 Global Step: 44250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:04:15,294-Speed 9411.17 samples/sec Loss 1.2170 LearningRate 0.0002 Epoch: 25 Global Step: 44260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:04:41,336-Speed 9437.45 samples/sec Loss 1.2246 LearningRate 0.0002 Epoch: 25 Global Step: 44270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:05:07,498-Speed 9394.29 samples/sec Loss 1.2133 LearningRate 0.0002 Epoch: 25 Global Step: 44280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:05:33,659-Speed 9394.41 samples/sec Loss 1.2167 LearningRate 0.0002 Epoch: 25 Global Step: 44290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:05:59,936-Speed 9352.98 samples/sec Loss 1.2055 LearningRate 0.0002 Epoch: 25 Global Step: 44300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:06:26,147-Speed 9376.60 samples/sec Loss 1.2014 LearningRate 0.0002 Epoch: 25 Global Step: 44310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:06:52,338-Speed 9383.78 samples/sec Loss 1.2057 LearningRate 0.0002 Epoch: 25 Global Step: 44320 Fp16 Grad Scale: 32768 
Required: 18 hours Training: 2022-03-06 05:07:18,446-Speed 9413.88 samples/sec Loss 1.2176 LearningRate 0.0002 Epoch: 25 Global Step: 44330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:07:44,665-Speed 9373.37 samples/sec Loss 1.2055 LearningRate 0.0002 Epoch: 25 Global Step: 44340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:08:10,910-Speed 9364.57 samples/sec Loss 1.2076 LearningRate 0.0002 Epoch: 25 Global Step: 44350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:08:37,137-Speed 9370.94 samples/sec Loss 1.2102 LearningRate 0.0002 Epoch: 25 Global Step: 44360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:09:03,378-Speed 9366.23 samples/sec Loss 1.2094 LearningRate 0.0002 Epoch: 25 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:09:29,583-Speed 9378.81 samples/sec Loss 1.2090 LearningRate 0.0002 Epoch: 25 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:09:55,828-Speed 9364.36 samples/sec Loss 1.2080 LearningRate 0.0002 Epoch: 25 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:10:22,133-Speed 9342.88 samples/sec Loss 1.2026 LearningRate 0.0002 Epoch: 25 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:10:48,516-Speed 9315.83 samples/sec Loss 1.2050 LearningRate 0.0002 Epoch: 25 Global Step: 44410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:11:14,717-Speed 9379.99 samples/sec Loss 1.2105 LearningRate 0.0002 Epoch: 25 Global Step: 44420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:11:41,109-Speed 9312.31 samples/sec Loss 1.2093 LearningRate 0.0002 Epoch: 25 Global Step: 44430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:12:07,546-Speed 9296.56 samples/sec Loss 1.2085 LearningRate 0.0002 Epoch: 25 Global Step: 44440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 
05:12:33,998-Speed 9291.28 samples/sec Loss 1.2048 LearningRate 0.0002 Epoch: 25 Global Step: 44450 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:13:00,443-Speed 9293.42 samples/sec Loss 1.2013 LearningRate 0.0002 Epoch: 25 Global Step: 44460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:13:26,667-Speed 9371.97 samples/sec Loss 1.2125 LearningRate 0.0002 Epoch: 25 Global Step: 44470 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:13:52,784-Speed 9411.33 samples/sec Loss 1.2014 LearningRate 0.0002 Epoch: 25 Global Step: 44480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:14:18,905-Speed 9408.98 samples/sec Loss 1.2052 LearningRate 0.0002 Epoch: 25 Global Step: 44490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:14:45,119-Speed 9375.69 samples/sec Loss 1.2028 LearningRate 0.0002 Epoch: 25 Global Step: 44500 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:15:11,277-Speed 9395.87 samples/sec Loss 1.2066 LearningRate 0.0002 Epoch: 25 Global Step: 44510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:15:37,434-Speed 9395.58 samples/sec Loss 1.2039 LearningRate 0.0002 Epoch: 25 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:16:03,570-Speed 9403.86 samples/sec Loss 1.2096 LearningRate 0.0002 Epoch: 25 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:16:29,743-Speed 9390.16 samples/sec Loss 1.2109 LearningRate 0.0002 Epoch: 25 Global Step: 44540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:16:55,903-Speed 9394.73 samples/sec Loss 1.2053 LearningRate 0.0002 Epoch: 25 Global Step: 44550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:17:22,061-Speed 9395.68 samples/sec Loss 1.2027 LearningRate 0.0002 Epoch: 25 Global Step: 44560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:17:48,194-Speed 9404.75 samples/sec Loss 
1.2081 LearningRate 0.0002 Epoch: 25 Global Step: 44570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:18:14,267-Speed 9426.09 samples/sec Loss 1.2105 LearningRate 0.0002 Epoch: 25 Global Step: 44580 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:18:40,383-Speed 9410.90 samples/sec Loss 1.2005 LearningRate 0.0002 Epoch: 25 Global Step: 44590 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:19:06,596-Speed 9375.90 samples/sec Loss 1.2026 LearningRate 0.0002 Epoch: 25 Global Step: 44600 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:19:32,753-Speed 9395.79 samples/sec Loss 1.1974 LearningRate 0.0002 Epoch: 25 Global Step: 44610 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:19:58,983-Speed 9370.19 samples/sec Loss 1.2090 LearningRate 0.0002 Epoch: 25 Global Step: 44620 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:20:25,189-Speed 9378.44 samples/sec Loss 1.2045 LearningRate 0.0002 Epoch: 25 Global Step: 44630 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:20:51,336-Speed 9399.39 samples/sec Loss 1.2005 LearningRate 0.0002 Epoch: 25 Global Step: 44640 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:21:17,463-Speed 9406.95 samples/sec Loss 1.1994 LearningRate 0.0002 Epoch: 25 Global Step: 44650 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:21:43,663-Speed 9380.24 samples/sec Loss 1.2044 LearningRate 0.0002 Epoch: 25 Global Step: 44660 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:22:09,955-Speed 9348.05 samples/sec Loss 1.2035 LearningRate 0.0002 Epoch: 25 Global Step: 44670 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-06 05:22:36,146-Speed 9383.69 samples/sec Loss 1.1984 LearningRate 0.0002 Epoch: 25 Global Step: 44680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:23:02,558-Speed 9305.16 samples/sec Loss 1.2034 LearningRate 0.0002 Epoch: 25 Global 
Step: 44690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:23:28,941-Speed 9315.22 samples/sec Loss 1.1998 LearningRate 0.0002 Epoch: 25 Global Step: 44700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:23:55,376-Speed 9297.17 samples/sec Loss 1.1953 LearningRate 0.0002 Epoch: 25 Global Step: 44710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:24:21,754-Speed 9317.51 samples/sec Loss 1.1976 LearningRate 0.0002 Epoch: 25 Global Step: 44720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:24:48,152-Speed 9310.04 samples/sec Loss 1.2084 LearningRate 0.0002 Epoch: 25 Global Step: 44730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:25:14,733-Speed 9246.05 samples/sec Loss 1.2004 LearningRate 0.0002 Epoch: 25 Global Step: 44740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:25:41,120-Speed 9313.92 samples/sec Loss 1.1959 LearningRate 0.0002 Epoch: 25 Global Step: 44750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:26:07,592-Speed 9284.45 samples/sec Loss 1.1869 LearningRate 0.0002 Epoch: 25 Global Step: 44760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:26:34,103-Speed 9270.20 samples/sec Loss 1.1935 LearningRate 0.0002 Epoch: 25 Global Step: 44770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:27:00,527-Speed 9301.15 samples/sec Loss 1.1934 LearningRate 0.0002 Epoch: 25 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:27:27,101-Speed 9248.64 samples/sec Loss 1.1980 LearningRate 0.0002 Epoch: 25 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:27:53,803-Speed 9204.43 samples/sec Loss 1.2010 LearningRate 0.0002 Epoch: 25 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:28:20,407-Speed 9238.05 samples/sec Loss 1.2046 LearningRate 0.0002 Epoch: 25 Global Step: 44810 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-03-06 05:28:47,162-Speed 9185.83 samples/sec Loss 1.2024 LearningRate 0.0002 Epoch: 25 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:29:13,428-Speed 9357.13 samples/sec Loss 1.1991 LearningRate 0.0002 Epoch: 25 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:29:39,640-Speed 9376.32 samples/sec Loss 1.1969 LearningRate 0.0002 Epoch: 25 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:30:05,834-Speed 9382.53 samples/sec Loss 1.2062 LearningRate 0.0002 Epoch: 25 Global Step: 44850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:30:32,114-Speed 9352.07 samples/sec Loss 1.2033 LearningRate 0.0002 Epoch: 25 Global Step: 44860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:30:58,373-Speed 9360.17 samples/sec Loss 1.1939 LearningRate 0.0002 Epoch: 25 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:31:24,742-Speed 9320.35 samples/sec Loss 1.2018 LearningRate 0.0002 Epoch: 25 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-06 05:31:51,272-Speed 9263.94 samples/sec Loss 1.2131 LearningRate 0.0002 Epoch: 25 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:32:17,691-Speed 9302.94 samples/sec Loss 1.2041 LearningRate 0.0002 Epoch: 25 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:32:44,166-Speed 9283.07 samples/sec Loss 1.1942 LearningRate 0.0002 Epoch: 25 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:33:10,682-Speed 9268.54 samples/sec Loss 1.2083 LearningRate 0.0002 Epoch: 25 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:33:37,284-Speed 9239.05 samples/sec Loss 1.2012 LearningRate 0.0002 Epoch: 25 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 
05:34:57,034-Speed 3081.65 samples/sec Loss 1.1992 LearningRate 0.0002 Epoch: 26 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:35:22,987-Speed 9469.93 samples/sec Loss 1.1890 LearningRate 0.0002 Epoch: 26 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:35:48,938-Speed 9470.61 samples/sec Loss 1.1850 LearningRate 0.0002 Epoch: 26 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:36:15,036-Speed 9417.67 samples/sec Loss 1.1781 LearningRate 0.0002 Epoch: 26 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:36:41,117-Speed 9423.44 samples/sec Loss 1.1857 LearningRate 0.0002 Epoch: 26 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:37:07,219-Speed 9415.86 samples/sec Loss 1.1748 LearningRate 0.0002 Epoch: 26 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-06 05:37:33,293-Speed 9425.67 samples/sec Loss 1.1854 LearningRate 0.0002 Epoch: 26 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-06 05:37:59,340-Speed 9435.60 samples/sec Loss 1.1899 LearningRate 0.0002 Epoch: 26 Global Step: 45010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:38:25,511-Speed 9391.10 samples/sec Loss 1.1848 LearningRate 0.0002 Epoch: 26 Global Step: 45020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:38:51,586-Speed 9425.64 samples/sec Loss 1.1719 LearningRate 0.0001 Epoch: 26 Global Step: 45030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:39:17,774-Speed 9385.01 samples/sec Loss 1.1709 LearningRate 0.0001 Epoch: 26 Global Step: 45040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:39:43,863-Speed 9420.37 samples/sec Loss 1.1753 LearningRate 0.0001 Epoch: 26 Global Step: 45050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:40:10,000-Speed 9404.12 samples/sec 
Loss 1.1867 LearningRate 0.0001 Epoch: 26 Global Step: 45060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:40:36,171-Speed 9391.07 samples/sec Loss 1.1839 LearningRate 0.0001 Epoch: 26 Global Step: 45070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:41:02,323-Speed 9397.75 samples/sec Loss 1.1784 LearningRate 0.0001 Epoch: 26 Global Step: 45080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:41:28,462-Speed 9402.71 samples/sec Loss 1.1842 LearningRate 0.0001 Epoch: 26 Global Step: 45090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:41:54,568-Speed 9414.18 samples/sec Loss 1.1859 LearningRate 0.0001 Epoch: 26 Global Step: 45100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:42:20,642-Speed 9425.71 samples/sec Loss 1.1720 LearningRate 0.0001 Epoch: 26 Global Step: 45110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:42:46,820-Speed 9388.48 samples/sec Loss 1.1849 LearningRate 0.0001 Epoch: 26 Global Step: 45120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:43:13,077-Speed 9360.32 samples/sec Loss 1.1830 LearningRate 0.0001 Epoch: 26 Global Step: 45130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:43:39,280-Speed 9379.20 samples/sec Loss 1.1747 LearningRate 0.0001 Epoch: 26 Global Step: 45140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:44:05,551-Speed 9355.25 samples/sec Loss 1.1814 LearningRate 0.0001 Epoch: 26 Global Step: 45150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:44:31,721-Speed 9391.12 samples/sec Loss 1.1881 LearningRate 0.0001 Epoch: 26 Global Step: 45160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:44:57,870-Speed 9398.86 samples/sec Loss 1.1787 LearningRate 0.0001 Epoch: 26 Global Step: 45170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:45:24,072-Speed 9380.02 samples/sec Loss 1.1772 LearningRate 0.0001 Epoch: 26 
Global Step: 45180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:45:50,298-Speed 9371.18 samples/sec Loss 1.1848 LearningRate 0.0001 Epoch: 26 Global Step: 45190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:46:16,558-Speed 9359.15 samples/sec Loss 1.1913 LearningRate 0.0001 Epoch: 26 Global Step: 45200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:46:42,865-Speed 9342.39 samples/sec Loss 1.1842 LearningRate 0.0001 Epoch: 26 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:47:09,344-Speed 9281.86 samples/sec Loss 1.1800 LearningRate 0.0001 Epoch: 26 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:47:35,814-Speed 9285.12 samples/sec Loss 1.1846 LearningRate 0.0001 Epoch: 26 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:48:02,264-Speed 9291.69 samples/sec Loss 1.1901 LearningRate 0.0001 Epoch: 26 Global Step: 45240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-06 05:48:28,534-Speed 9356.15 samples/sec Loss 1.1803 LearningRate 0.0001 Epoch: 26 Global Step: 45250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:48:54,948-Speed 9304.33 samples/sec Loss 1.1823 LearningRate 0.0001 Epoch: 26 Global Step: 45260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:49:21,291-Speed 9329.78 samples/sec Loss 1.1883 LearningRate 0.0001 Epoch: 26 Global Step: 45270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:49:47,637-Speed 9328.62 samples/sec Loss 1.1810 LearningRate 0.0001 Epoch: 26 Global Step: 45280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:50:13,984-Speed 9328.40 samples/sec Loss 1.1814 LearningRate 0.0001 Epoch: 26 Global Step: 45290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:50:40,446-Speed 9287.55 samples/sec Loss 1.1885 LearningRate 0.0001 Epoch: 26 Global Step: 45300 Fp16 Grad Scale: 32768 
Required: 18 hours Training: 2022-03-06 05:51:06,746-Speed 9345.17 samples/sec Loss 1.1771 LearningRate 0.0001 Epoch: 26 Global Step: 45310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:51:33,143-Speed 9310.39 samples/sec Loss 1.1771 LearningRate 0.0001 Epoch: 26 Global Step: 45320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:51:59,528-Speed 9315.09 samples/sec Loss 1.1766 LearningRate 0.0001 Epoch: 26 Global Step: 45330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-06 05:52:25,955-Speed 9299.87 samples/sec Loss 1.1744 LearningRate 0.0001 Epoch: 26 Global Step: 45340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 05:52:52,388-Speed 9297.87 samples/sec Loss 1.1800 LearningRate 0.0001 Epoch: 26 Global Step: 45350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 05:53:18,813-Speed 9300.73 samples/sec Loss 1.1732 LearningRate 0.0001 Epoch: 26 Global Step: 45360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 05:53:45,367-Speed 9256.20 samples/sec Loss 1.1780 LearningRate 0.0001 Epoch: 26 Global Step: 45370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 05:54:11,797-Speed 9298.99 samples/sec Loss 1.1786 LearningRate 0.0001 Epoch: 26 Global Step: 45380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 05:54:38,313-Speed 9268.62 samples/sec Loss 1.1803 LearningRate 0.0001 Epoch: 26 Global Step: 45390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 05:55:04,830-Speed 9268.48 samples/sec Loss 1.1699 LearningRate 0.0001 Epoch: 26 Global Step: 45400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 05:55:31,477-Speed 9223.31 samples/sec Loss 1.1836 LearningRate 0.0001 Epoch: 26 Global Step: 45410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 05:55:58,137-Speed 9218.87 samples/sec Loss 1.1758 LearningRate 0.0001 Epoch: 26 Global Step: 45420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 
05:56:24,772-Speed 9227.23 samples/sec Loss 1.1693 LearningRate 0.0001 Epoch: 26 Global Step: 45430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 05:56:51,368-Speed 9240.99 samples/sec Loss 1.1771 LearningRate 0.0001 Epoch: 26 Global Step: 45440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 05:57:17,818-Speed 9291.91 samples/sec Loss 1.1766 LearningRate 0.0001 Epoch: 26 Global Step: 45450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 05:57:44,320-Speed 9273.71 samples/sec Loss 1.1670 LearningRate 0.0001 Epoch: 26 Global Step: 45460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 05:58:10,791-Speed 9284.73 samples/sec Loss 1.1733 LearningRate 0.0001 Epoch: 26 Global Step: 45470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 05:58:37,232-Speed 9295.04 samples/sec Loss 1.1750 LearningRate 0.0001 Epoch: 26 Global Step: 45480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 05:59:03,526-Speed 9347.02 samples/sec Loss 1.1701 LearningRate 0.0001 Epoch: 26 Global Step: 45490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 05:59:29,760-Speed 9368.59 samples/sec Loss 1.1731 LearningRate 0.0001 Epoch: 26 Global Step: 45500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 05:59:56,021-Speed 9359.02 samples/sec Loss 1.1718 LearningRate 0.0001 Epoch: 26 Global Step: 45510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:00:22,210-Speed 9384.09 samples/sec Loss 1.1797 LearningRate 0.0001 Epoch: 26 Global Step: 45520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:00:48,531-Speed 9337.48 samples/sec Loss 1.1693 LearningRate 0.0001 Epoch: 26 Global Step: 45530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:01:14,776-Speed 9364.56 samples/sec Loss 1.1717 LearningRate 0.0001 Epoch: 26 Global Step: 45540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:01:41,039-Speed 9357.97 samples/sec Loss 
1.1717 LearningRate 0.0001 Epoch: 26 Global Step: 45550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:02:07,267-Speed 9370.90 samples/sec Loss 1.1755 LearningRate 0.0001 Epoch: 26 Global Step: 45560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:02:33,467-Speed 9380.33 samples/sec Loss 1.1767 LearningRate 0.0001 Epoch: 26 Global Step: 45570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:02:59,670-Speed 9379.39 samples/sec Loss 1.1683 LearningRate 0.0001 Epoch: 26 Global Step: 45580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:03:25,966-Speed 9346.58 samples/sec Loss 1.1724 LearningRate 0.0001 Epoch: 26 Global Step: 45590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:03:54,394-Speed 8645.40 samples/sec Loss 1.1694 LearningRate 0.0001 Epoch: 26 Global Step: 45600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:04:20,536-Speed 9401.46 samples/sec Loss 1.1725 LearningRate 0.0001 Epoch: 26 Global Step: 45610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:04:46,744-Speed 9377.71 samples/sec Loss 1.1791 LearningRate 0.0001 Epoch: 26 Global Step: 45620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:05:13,034-Speed 9348.03 samples/sec Loss 1.1730 LearningRate 0.0001 Epoch: 26 Global Step: 45630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-06 06:05:39,277-Speed 9365.29 samples/sec Loss 1.1710 LearningRate 0.0001 Epoch: 26 Global Step: 45640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 06:06:05,538-Speed 9358.73 samples/sec Loss 1.1597 LearningRate 0.0001 Epoch: 26 Global Step: 45650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 06:06:31,733-Speed 9382.47 samples/sec Loss 1.1729 LearningRate 0.0001 Epoch: 26 Global Step: 45660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-06 06:06:57,936-Speed 9379.31 samples/sec Loss 1.1800 LearningRate 0.0001 Epoch: 26 Global 
Step: 45670 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:07:24,061-Speed 9407.51 samples/sec Loss 1.1565 LearningRate 0.0001 Epoch: 26 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:07:50,276-Speed 9375.52 samples/sec Loss 1.1710 LearningRate 0.0001 Epoch: 26 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:08:16,552-Speed 9353.33 samples/sec Loss 1.1655 LearningRate 0.0001 Epoch: 26 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:08:42,726-Speed 9390.26 samples/sec Loss 1.1767 LearningRate 0.0001 Epoch: 26 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:09:08,935-Speed 9377.41 samples/sec Loss 1.1602 LearningRate 0.0001 Epoch: 26 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:09:35,120-Speed 9385.71 samples/sec Loss 1.1653 LearningRate 0.0001 Epoch: 26 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:10:01,223-Speed 9415.39 samples/sec Loss 1.1663 LearningRate 0.0001 Epoch: 26 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:10:27,419-Speed 9381.94 samples/sec Loss 1.1600 LearningRate 0.0001 Epoch: 26 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:10:53,571-Speed 9397.87 samples/sec Loss 1.1559 LearningRate 0.0001 Epoch: 26 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:11:19,713-Speed 9401.38 samples/sec Loss 1.1730 LearningRate 0.0001 Epoch: 26 Global Step: 45770 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:11:45,928-Speed 9375.15 samples/sec Loss 1.1716 LearningRate 0.0001 Epoch: 26 Global Step: 45780 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:12:12,151-Speed 9372.51 samples/sec Loss 1.1659 LearningRate 0.0001 Epoch: 26 Global Step: 45790 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:12:38,293-Speed 9401.20 samples/sec Loss 1.1662 LearningRate 0.0001 Epoch: 26 Global Step: 45800 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:13:04,544-Speed 9362.27 samples/sec Loss 1.1696 LearningRate 0.0001 Epoch: 26 Global Step: 45810 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:13:30,929-Speed 9314.81 samples/sec Loss 1.1603 LearningRate 0.0001 Epoch: 26 Global Step: 45820 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:13:57,174-Speed 9364.43 samples/sec Loss 1.1519 LearningRate 0.0001 Epoch: 26 Global Step: 45830 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:14:23,333-Speed 9395.08 samples/sec Loss 1.1555 LearningRate 0.0001 Epoch: 26 Global Step: 45840 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:14:49,511-Speed 9388.27 samples/sec Loss 1.1507 LearningRate 0.0001 Epoch: 26 Global Step: 45850 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:15:15,693-Speed 9387.16 samples/sec Loss 1.1623 LearningRate 0.0001 Epoch: 26 Global Step: 45860 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:15:41,815-Speed 9408.54 samples/sec Loss 1.1622 LearningRate 0.0001 Epoch: 26 Global Step: 45870 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:16:07,992-Speed 9388.56 samples/sec Loss 1.1520 LearningRate 0.0001 Epoch: 26 Global Step: 45880 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:16:34,191-Speed 9380.96 samples/sec Loss 1.1616 LearningRate 0.0001 Epoch: 26 Global Step: 45890 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:17:00,410-Speed 9373.78 samples/sec Loss 1.1675 LearningRate 0.0001 Epoch: 26 Global Step: 45900 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:17:26,825-Speed 9304.46 samples/sec Loss 1.1581 LearningRate 0.0001 Epoch: 26 Global Step: 45910 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:17:53,037-Speed 9376.20 samples/sec Loss 1.1569 LearningRate 0.0001 Epoch: 26 Global Step: 45920 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:18:19,396-Speed 9324.10 samples/sec Loss 1.1582 LearningRate 0.0001 Epoch: 26 Global Step: 45930 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:18:45,746-Speed 9326.92 samples/sec Loss 1.1477 LearningRate 0.0001 Epoch: 26 Global Step: 45940 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:19:11,936-Speed 9384.08 samples/sec Loss 1.1666 LearningRate 0.0001 Epoch: 26 Global Step: 45950 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:19:38,129-Speed 9383.45 samples/sec Loss 1.1538 LearningRate 0.0001 Epoch: 26 Global Step: 45960 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:20:04,426-Speed 9345.72 samples/sec Loss 1.1518 LearningRate 0.0001 Epoch: 26 Global Step: 45970 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:20:30,677-Speed 9362.76 samples/sec Loss 1.1575 LearningRate 0.0001 Epoch: 26 Global Step: 45980 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-03-06 06:20:56,818-Speed 9401.58 samples/sec Loss 1.1571 LearningRate 0.0001 Epoch: 26 Global Step: 45990 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:21:23,025-Speed 9378.38 samples/sec Loss 1.1661 LearningRate 0.0001 Epoch: 26 Global Step: 46000 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:21:49,227-Speed 9379.47 samples/sec Loss 1.1571 LearningRate 0.0001 Epoch: 26 Global Step: 46010 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:22:15,486-Speed 9359.50 samples/sec Loss 1.1591 LearningRate 0.0001 Epoch: 26 Global Step: 46020 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:22:41,678-Speed 9383.49 samples/sec Loss 1.1481 LearningRate 0.0001 Epoch: 26 Global Step: 46030 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:23:07,890-Speed 9376.34 samples/sec Loss 1.1507 LearningRate 0.0001 Epoch: 26 Global Step: 46040 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:23:34,054-Speed 9393.43 samples/sec Loss 1.1451 LearningRate 0.0001 Epoch: 26 Global Step: 46050 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:24:00,311-Speed 9360.09 samples/sec Loss 1.1555 LearningRate 0.0001 Epoch: 26 Global Step: 46060 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:24:26,494-Speed 9386.86 samples/sec Loss 1.1554 LearningRate 0.0001 Epoch: 26 Global Step: 46070 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:24:52,691-Speed 9381.49 samples/sec Loss 1.1566 LearningRate 0.0001 Epoch: 26 Global Step: 46080 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:25:19,068-Speed 9317.27 samples/sec Loss 1.1469 LearningRate 0.0001 Epoch: 26 Global Step: 46090 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:25:45,561-Speed 9276.79 samples/sec Loss 1.1449 LearningRate 0.0001 Epoch: 26 Global Step: 46100 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:26:11,893-Speed 9333.49 samples/sec Loss 1.1520 LearningRate 0.0001 Epoch: 26 Global Step: 46110 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:26:38,243-Speed 9327.31 samples/sec Loss 1.1483 LearningRate 0.0001 Epoch: 26 Global Step: 46120 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:27:04,642-Speed 9309.79 samples/sec Loss 1.1470 LearningRate 0.0001 Epoch: 26 Global Step: 46130 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:27:30,969-Speed 9335.11 samples/sec Loss 1.1533 LearningRate 0.0001 Epoch: 26 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:27:57,289-Speed 9338.11 samples/sec Loss 1.1547 LearningRate 0.0001 Epoch: 26 Global Step: 46150 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:28:23,567-Speed 9352.65 samples/sec Loss 1.1548 LearningRate 0.0001 Epoch: 26 Global Step: 46160 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:28:49,976-Speed 9306.09 samples/sec Loss 1.1440 LearningRate 0.0001 Epoch: 26 Global Step: 46170 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:29:16,233-Speed 9360.24 samples/sec Loss 1.1472 LearningRate 0.0001 Epoch: 26 Global Step: 46180 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:29:42,560-Speed 9335.47 samples/sec Loss 1.1561 LearningRate 0.0001 Epoch: 26 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-06 06:30:08,772-Speed 9376.61 samples/sec Loss 1.1494 LearningRate 0.0001 Epoch: 26 Global Step: 46200 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:30:34,978-Speed 9378.51 samples/sec Loss 1.1463 LearningRate 0.0001 Epoch: 26 Global Step: 46210 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:31:01,182-Speed 9378.90 samples/sec Loss 1.1494 LearningRate 0.0001 Epoch: 26 Global Step: 46220 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:31:27,417-Speed 9368.06 samples/sec Loss 1.1475 LearningRate 0.0001 Epoch: 26 Global Step: 46230 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:31:53,716-Speed 9345.46 samples/sec Loss 1.1538 LearningRate 0.0001 Epoch: 26 Global Step: 46240 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:32:19,987-Speed 9355.24 samples/sec Loss 1.1528 LearningRate 0.0001 Epoch: 26 Global Step: 46250 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:32:46,195-Speed 9377.72 samples/sec Loss 1.1598 LearningRate 0.0001 Epoch: 26 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:33:12,470-Speed 9353.72 samples/sec Loss 1.1511 LearningRate 0.0001 Epoch: 26 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:33:38,681-Speed 9376.67 samples/sec Loss 1.1420 LearningRate 0.0001 Epoch: 26 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:34:04,898-Speed 9374.31 samples/sec Loss 1.1436 LearningRate 0.0001 Epoch: 26 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:34:31,053-Speed 9396.78 samples/sec Loss 1.1486 LearningRate 0.0001 Epoch: 26 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:34:57,199-Speed 9399.96 samples/sec Loss 1.1501 LearningRate 0.0001 Epoch: 26 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:35:23,395-Speed 9382.13 samples/sec Loss 1.1469 LearningRate 0.0001 Epoch: 26 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:35:49,705-Speed 9341.17 samples/sec Loss 1.1368 LearningRate 0.0001 Epoch: 26 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:36:15,879-Speed 9389.89 samples/sec Loss 1.1471 LearningRate 0.0001 Epoch: 26 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:36:42,066-Speed 9385.47 samples/sec Loss 1.1441 LearningRate 0.0001 Epoch: 26 Global Step: 46350 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:37:08,309-Speed 9365.04 samples/sec Loss 1.1474 LearningRate 0.0001 Epoch: 26 Global Step: 46360 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:37:34,523-Speed 9375.30 samples/sec Loss 1.1444 LearningRate 0.0001 Epoch: 26 Global Step: 46370 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:38:00,752-Speed 9370.30 samples/sec Loss 1.1457 LearningRate 0.0001 Epoch: 26 Global Step: 46380 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:38:26,892-Speed 9402.00 samples/sec Loss 1.1518 LearningRate 0.0001 Epoch: 26 Global Step: 46390 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:38:52,987-Speed 9418.11 samples/sec Loss 1.1434 LearningRate 0.0001 Epoch: 26 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-06 06:39:19,093-Speed 9414.56 samples/sec Loss 1.1488 LearningRate 0.0001 Epoch: 26 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:39:45,280-Speed 9385.36 samples/sec Loss 1.1388 LearningRate 0.0001 Epoch: 26 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:40:11,500-Speed 9373.62 samples/sec Loss 1.1491 LearningRate 0.0001 Epoch: 26 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:40:37,649-Speed 9399.01 samples/sec Loss 1.1515 LearningRate 0.0001 Epoch: 26 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:41:03,956-Speed 9342.02 samples/sec Loss 1.1407 LearningRate 0.0001 Epoch: 26 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:41:30,149-Speed 9383.35 samples/sec Loss 1.1484 LearningRate 0.0001 Epoch: 26 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:41:56,417-Speed 9356.36 samples/sec Loss 1.1434 LearningRate 0.0001 Epoch: 26 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:42:22,652-Speed 9368.14 samples/sec Loss 1.1332 LearningRate 0.0001 Epoch: 26 Global Step: 46480 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:42:48,858-Speed 9378.19 samples/sec Loss 1.1453 LearningRate 0.0001 Epoch: 26 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:43:15,091-Speed 9368.97 samples/sec Loss 1.1393 LearningRate 0.0001 Epoch: 26 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:43:41,221-Speed 9405.54 samples/sec Loss 1.1356 LearningRate 0.0001 Epoch: 26 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:44:07,462-Speed 9365.96 samples/sec Loss 1.1430 LearningRate 0.0001 Epoch: 26 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:44:33,823-Speed 9323.23 samples/sec Loss 1.1356 LearningRate 0.0001 Epoch: 26 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:45:00,001-Speed 9388.16 samples/sec Loss 1.1414 LearningRate 0.0001 Epoch: 26 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:45:26,162-Speed 9394.50 samples/sec Loss 1.1504 LearningRate 0.0001 Epoch: 26 Global Step: 46550 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:45:52,356-Speed 9382.87 samples/sec Loss 1.1387 LearningRate 0.0001 Epoch: 26 Global Step: 46560 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:46:18,600-Speed 9364.90 samples/sec Loss 1.1373 LearningRate 0.0001 Epoch: 26 Global Step: 46570 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:46:44,777-Speed 9388.82 samples/sec Loss 1.1463 LearningRate 0.0001 Epoch: 26 Global Step: 46580 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:47:11,029-Speed 9361.84 samples/sec Loss 1.1415 LearningRate 0.0001 Epoch: 26 Global Step: 46590 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:47:37,257-Speed 9370.53 samples/sec Loss 1.1449 LearningRate 0.0001 Epoch: 26 Global Step: 46600 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:48:03,443-Speed 9385.43 samples/sec Loss 1.1509 LearningRate 0.0001 Epoch: 26 Global Step: 46610 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:48:29,785-Speed 9330.27 samples/sec Loss 1.1393 LearningRate 0.0001 Epoch: 26 Global Step: 46620 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:48:56,098-Speed 9340.18 samples/sec Loss 1.1403 LearningRate 0.0001 Epoch: 26 Global Step: 46630 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:49:22,291-Speed 9383.09 samples/sec Loss 1.1473 LearningRate 0.0001 Epoch: 26 Global Step: 46640 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-03-06 06:49:48,619-Speed 9334.68 samples/sec Loss 1.1502 LearningRate 0.0001 Epoch: 26 Global Step: 46650 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:50:14,957-Speed 9331.38 samples/sec Loss 1.1462 LearningRate 0.0001 Epoch: 26 Global Step: 46660 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:51:34,869-Speed 3075.44 samples/sec Loss 1.1319 LearningRate 0.0001 Epoch: 27 Global Step: 46670 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:52:00,844-Speed 9461.70 samples/sec Loss 1.1222 LearningRate 0.0001 Epoch: 27 Global Step: 46680 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:52:26,860-Speed 9447.16 samples/sec Loss 1.1239 LearningRate 0.0001 Epoch: 27 Global Step: 46690 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-03-06 06:52:52,976-Speed 9410.86 samples/sec Loss 1.1236 LearningRate 0.0001 Epoch: 27 Global Step: 46700 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 06:53:19,040-Speed 9429.63 samples/sec Loss 1.1236 LearningRate 0.0001 Epoch: 27 Global Step: 46710 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:53:45,068-Speed 9442.33 samples/sec Loss 1.1253 LearningRate 0.0001 Epoch: 27 Global Step: 46720 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:54:11,192-Speed 9408.08 samples/sec Loss 1.1304 LearningRate 0.0001 Epoch: 27 Global Step: 46730 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:54:37,373-Speed 9387.33 samples/sec Loss 1.1274 LearningRate 0.0001 Epoch: 27 Global Step: 46740 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:55:03,591-Speed 9374.58 samples/sec Loss 1.1273 LearningRate 0.0001 Epoch: 27 Global Step: 46750 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:55:29,949-Speed 9324.17 samples/sec Loss 1.1197 LearningRate 0.0001 Epoch: 27 Global Step: 46760 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:55:56,336-Speed 9314.20 samples/sec Loss 1.1229 LearningRate 0.0001 Epoch: 27 Global Step: 46770 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:56:22,591-Speed 9360.62 samples/sec Loss 1.1289 LearningRate 0.0001 Epoch: 27 Global Step: 46780 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:56:48,955-Speed 9322.11 samples/sec Loss 1.1256 LearningRate 0.0001 Epoch: 27 Global Step: 46790 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:57:15,303-Speed 9328.16 samples/sec Loss 1.1247 LearningRate 0.0001 Epoch: 27 Global Step: 46800 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 06:57:41,646-Speed 9329.37 samples/sec Loss 1.1234 LearningRate 0.0001 Epoch: 27 Global Step: 46810 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 06:58:07,957-Speed 9340.92 samples/sec Loss 1.1338 LearningRate 0.0001 Epoch: 27 Global Step: 46820 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 06:58:34,141-Speed 9386.62 samples/sec Loss 1.1229 LearningRate 0.0001 Epoch: 27 Global Step: 46830 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 06:59:00,532-Speed 9312.63 samples/sec Loss 1.1205 LearningRate 0.0001 Epoch: 27 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 06:59:26,861-Speed 9334.53 samples/sec Loss 1.1222 LearningRate 0.0001 Epoch: 27 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 06:59:53,244-Speed 9315.42 samples/sec Loss 1.1284 LearningRate 0.0001 Epoch: 27 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:00:19,511-Speed 9356.49 samples/sec Loss 1.1232 LearningRate 0.0001 Epoch: 27 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:00:45,752-Speed 9366.16 samples/sec Loss 1.1343 LearningRate 0.0001 Epoch: 27 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:01:11,917-Speed 9393.05 samples/sec Loss 1.1312 LearningRate 0.0001 Epoch: 27 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:01:38,082-Speed 9392.70 samples/sec Loss 1.1254 LearningRate 0.0001 Epoch: 27 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:02:04,339-Speed 9360.27 samples/sec Loss 1.1252 LearningRate 0.0001 Epoch: 27 Global Step: 46910 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:02:30,496-Speed 9396.73 samples/sec Loss 1.1291 LearningRate 0.0001 Epoch: 27 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:02:56,735-Speed 9366.49 samples/sec Loss 1.1225 LearningRate 0.0001 Epoch: 27 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:03:23,036-Speed 9344.81 samples/sec Loss 1.1233 LearningRate 0.0001 Epoch: 27 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:03:49,277-Speed 9365.77 samples/sec Loss 1.1286 LearningRate 0.0001 Epoch: 27 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:04:15,696-Speed 9302.76 samples/sec Loss 1.1265 LearningRate 0.0001 Epoch: 27 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:04:41,955-Speed 9359.70 samples/sec Loss 1.1252 LearningRate 0.0001 Epoch: 27 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:05:08,447-Speed 9277.04 samples/sec Loss 1.1239 LearningRate 0.0001 Epoch: 27 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:05:34,809-Speed 9322.65 samples/sec Loss 1.1253 LearningRate 0.0001 Epoch: 27 Global Step: 46990 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:06:01,143-Speed 9332.93 samples/sec Loss 1.1219 LearningRate 0.0001 Epoch: 27 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:06:27,538-Speed 9311.02 samples/sec Loss 1.1223 LearningRate 0.0001 Epoch: 27 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:06:53,812-Speed 9354.31 samples/sec Loss 1.1291 LearningRate 0.0001 Epoch: 27 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:07:20,114-Speed 9343.93 samples/sec Loss 1.1236 LearningRate 0.0001 Epoch: 27 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:07:46,447-Speed 9333.35 samples/sec Loss 1.1292 LearningRate 0.0001 Epoch: 27 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:08:12,679-Speed 9369.63 samples/sec Loss 1.1311 LearningRate 0.0001 Epoch: 27 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:08:38,891-Speed 9376.02 samples/sec Loss 1.1243 LearningRate 0.0001 Epoch: 27 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:09:05,145-Speed 9361.50 samples/sec Loss 1.1264 LearningRate 0.0001 Epoch: 27 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:09:31,536-Speed 9312.42 samples/sec Loss 1.1299 LearningRate 0.0001 Epoch: 27 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:09:57,856-Speed 9337.96 samples/sec Loss 1.1230 LearningRate 0.0001 Epoch: 27 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:10:29,416-Speed 7787.45 samples/sec Loss 1.1284 LearningRate 0.0001 Epoch: 27 Global Step: 47100 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:10:55,694-Speed 9352.60 samples/sec Loss 1.1356 LearningRate 0.0001 Epoch: 27 Global Step: 47110 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:11:21,903-Speed 9377.20 samples/sec Loss 1.1209 LearningRate 0.0001 Epoch: 27 Global Step: 47120 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:11:48,328-Speed 9300.68 samples/sec Loss 1.1194 LearningRate 0.0001 Epoch: 27 Global Step: 47130 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:12:14,797-Speed 9285.14 samples/sec Loss 1.1127 LearningRate 0.0001 Epoch: 27 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:12:41,328-Speed 9263.44 samples/sec Loss 1.1150 LearningRate 0.0001 Epoch: 27 Global Step: 47150 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:13:08,006-Speed 9212.76 samples/sec Loss 1.1271 LearningRate 0.0001 Epoch: 27 Global Step: 47160 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:13:34,453-Speed 9292.83 samples/sec Loss 1.1266 LearningRate 0.0001 Epoch: 27 Global Step: 47170 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:14:00,955-Speed 9273.68 samples/sec Loss 1.1278 LearningRate 0.0001 Epoch: 27 Global Step: 47180 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:14:27,369-Speed 9304.49 samples/sec Loss 1.1168 LearningRate 0.0001 Epoch: 27 Global Step: 47190 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:14:53,835-Speed 9286.83 samples/sec Loss 1.1111 LearningRate 0.0001 Epoch: 27 Global Step: 47200 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:15:20,388-Speed 9255.89 samples/sec Loss 1.1184 LearningRate 0.0001 Epoch: 27 Global Step: 47210 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:15:46,840-Speed 9291.10 samples/sec Loss 1.1203 LearningRate 0.0001 Epoch: 27 Global Step: 47220 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:16:13,441-Speed 9239.35 samples/sec Loss 1.1168 LearningRate 0.0001 Epoch: 27 Global Step: 47230 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:16:39,932-Speed 9277.60 samples/sec Loss 1.1141 LearningRate 0.0001 Epoch: 27 Global Step: 47240 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:17:06,595-Speed 9217.50 samples/sec Loss 1.1122 LearningRate 0.0001 Epoch: 27 Global Step: 47250 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:17:33,358-Speed 9182.99 samples/sec Loss 1.1126 LearningRate 0.0001 Epoch: 27 Global Step: 47260 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:17:59,802-Speed 9294.19 samples/sec Loss 1.1213 LearningRate 0.0001 Epoch: 27 Global Step: 47270 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:18:26,574-Speed 9180.03 samples/sec Loss 1.1117 LearningRate 0.0001 Epoch: 27 Global Step: 47280 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:18:53,252-Speed 9212.19 samples/sec Loss 1.1112 LearningRate 0.0001 Epoch: 27 Global Step: 47290 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:19:19,739-Speed 9279.13 samples/sec Loss 1.1243 LearningRate 0.0001 Epoch: 27 Global Step: 47300 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:19:46,200-Speed 9287.82 samples/sec Loss 1.1142 LearningRate 0.0001 Epoch: 27 Global Step: 47310 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:20:12,791-Speed 9242.59 samples/sec Loss 1.1238 LearningRate 0.0001 Epoch: 27 Global Step: 47320 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-03-06 07:20:39,432-Speed 9225.27 samples/sec Loss 1.1212 LearningRate 0.0001 Epoch: 27 Global Step: 47330 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:21:06,167-Speed 9192.79 samples/sec Loss 1.1099 LearningRate 0.0001 Epoch: 27 Global Step: 47340 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:21:32,969-Speed 9171.32 samples/sec Loss 1.1197 LearningRate 0.0001 Epoch: 27 Global Step: 47350 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:21:59,760-Speed 9173.28 samples/sec Loss 1.1196 LearningRate 0.0001 Epoch: 27 Global Step: 47360 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:22:26,433-Speed 9214.44 samples/sec Loss 1.1115 LearningRate 0.0001 Epoch: 27 Global Step: 47370 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:22:53,094-Speed 9218.23 samples/sec Loss 1.1061 LearningRate 0.0001 Epoch: 27 Global Step: 47380 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:23:19,722-Speed 9229.88 samples/sec Loss 1.1073 LearningRate 0.0001 Epoch: 27 Global Step: 47390 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:23:46,256-Speed 9262.45 samples/sec Loss 1.1167 LearningRate 0.0001 Epoch: 27 Global Step: 47400 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:24:12,823-Speed 9250.75 samples/sec Loss 1.1140 LearningRate 0.0001 Epoch: 27 Global Step: 47410 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:24:39,474-Speed 9221.99 samples/sec Loss 1.1216 LearningRate 0.0001 Epoch: 27 Global Step: 47420 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:25:05,940-Speed 9286.41 samples/sec Loss 1.1128 LearningRate 0.0001 Epoch: 27 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:25:32,669-Speed 9195.23 samples/sec Loss 1.1116 LearningRate 0.0001 Epoch: 27 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:25:59,164-Speed 9276.07 samples/sec Loss 1.1125 LearningRate 0.0001 Epoch: 27 Global Step: 47450 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:26:25,802-Speed 9226.15 samples/sec Loss 1.1125 LearningRate 0.0001 Epoch: 27 Global Step: 47460 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:26:52,341-Speed 9261.73 samples/sec Loss 1.1014 LearningRate 0.0001 Epoch: 27 Global Step: 47470 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:27:19,071-Speed 9194.45 samples/sec Loss 1.1167 LearningRate 0.0001 Epoch: 27 Global Step: 47480 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:27:45,859-Speed 9174.86 samples/sec Loss 1.1104 LearningRate 0.0001 Epoch: 27 Global Step: 47490 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:28:12,495-Speed 9226.95 samples/sec Loss 1.1115 LearningRate 0.0001 Epoch: 27 Global Step: 47500 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:28:39,084-Speed 9243.23 samples/sec Loss 1.1131 LearningRate 0.0001 Epoch: 27 Global Step: 47510 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:29:05,667-Speed 9245.65 samples/sec Loss 1.1051 LearningRate 0.0001 Epoch: 27 Global Step: 47520 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:29:32,139-Speed 9284.16 samples/sec Loss 1.1145 LearningRate 0.0001 Epoch: 27 Global Step: 47530 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:29:58,939-Speed 9170.35 samples/sec Loss 1.1052 LearningRate 0.0001 Epoch: 27 Global Step: 47540 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:30:25,562-Speed 9231.78 samples/sec Loss 1.1130 LearningRate 0.0001 Epoch: 27 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:30:52,340-Speed 9178.20 samples/sec Loss 1.1074 LearningRate 0.0001 Epoch: 27 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:31:18,981-Speed 9225.51 samples/sec Loss 1.1061 LearningRate 0.0001 Epoch: 27 Global Step: 47570 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:31:45,726-Speed 9189.43 samples/sec Loss 1.1071 LearningRate 0.0001 Epoch: 27 Global Step: 47580 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:32:12,341-Speed 9234.25 samples/sec Loss 1.1070 LearningRate 0.0001 Epoch: 27 Global Step: 47590 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:32:38,920-Speed 9246.81 samples/sec Loss 1.1067 LearningRate 0.0001 Epoch: 27 Global Step: 47600 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:33:05,464-Speed 9258.99 samples/sec Loss 1.1106 LearningRate 0.0001 Epoch: 27 Global Step: 47610 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:33:32,189-Speed 9196.19 samples/sec Loss 1.1022 LearningRate 0.0001 Epoch: 27 Global Step: 47620 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:33:58,854-Speed 9217.01 samples/sec Loss 1.1029 LearningRate 0.0001 Epoch: 27 Global Step: 47630 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:34:25,568-Speed 9199.95 samples/sec Loss 1.1019 LearningRate 0.0001 Epoch: 27 Global Step: 47640 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:34:52,173-Speed 9237.85 samples/sec Loss 1.0928 LearningRate 0.0001 Epoch: 27 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:35:19,053-Speed 9143.50 samples/sec Loss 1.0953 LearningRate 0.0001 Epoch: 27 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:35:45,692-Speed 9225.71 samples/sec Loss 1.1118 LearningRate 0.0001 Epoch: 27 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:36:12,383-Speed 9208.21 samples/sec Loss 1.0955 LearningRate 0.0001 Epoch: 27 Global Step: 47680 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:36:38,947-Speed 9252.11 samples/sec Loss 1.0963 LearningRate 0.0001 Epoch: 27 Global Step: 47690 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:37:05,484-Speed 9261.21 samples/sec Loss 1.1044 LearningRate 0.0001 Epoch: 27 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:37:32,072-Speed 9243.86 samples/sec Loss 1.1098 LearningRate 0.0001 Epoch: 27 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:37:58,662-Speed 9243.12 samples/sec Loss 1.1021 LearningRate 0.0001 Epoch: 27 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:38:25,246-Speed 9244.95 samples/sec Loss 1.1008 LearningRate 0.0001 Epoch: 27 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:38:51,763-Speed 9268.34 samples/sec Loss 1.0999 LearningRate 0.0001 Epoch: 27 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:39:18,214-Speed 9291.53 samples/sec Loss 1.1070 LearningRate 0.0001 Epoch: 27 Global Step: 47750 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:39:44,925-Speed 9201.11 samples/sec Loss 1.1038 LearningRate 0.0001 Epoch: 27 Global Step: 47760 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:40:11,445-Speed 9267.39 samples/sec Loss 1.1019 LearningRate 0.0001 Epoch: 27 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:40:37,986-Speed 9260.04 samples/sec Loss 1.1053 LearningRate 0.0001 Epoch: 27 Global Step: 47780 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:41:04,656-Speed 9215.07 samples/sec Loss 1.0996 LearningRate 0.0001 Epoch: 27 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:41:31,261-Speed 9237.92 samples/sec Loss 1.0982 LearningRate 0.0001 Epoch: 27 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:41:57,953-Speed 9207.74 samples/sec Loss 1.0971 LearningRate 0.0001 Epoch: 27 Global Step: 47810 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-06 07:42:24,417-Speed 9286.62 samples/sec Loss 1.0919 LearningRate 0.0001 Epoch: 27 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:42:50,984-Speed 9251.46 samples/sec Loss 1.0924 LearningRate 0.0001 Epoch: 27 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:43:17,536-Speed 9256.21 samples/sec Loss 1.1005 LearningRate 0.0001 Epoch: 27 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:43:44,196-Speed 9218.86 samples/sec Loss 1.0948 LearningRate 0.0001 Epoch: 27 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:44:11,038-Speed 9156.35 samples/sec Loss 1.0997 LearningRate 0.0001 Epoch: 27 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:44:37,931-Speed 9138.71 samples/sec Loss 1.0962 LearningRate 0.0001 Epoch: 27 Global Step: 47870 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:45:04,511-Speed 9246.20 samples/sec Loss 1.0940 LearningRate 0.0001 Epoch: 27 Global Step: 47880 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:45:31,247-Speed 9192.52 samples/sec Loss 1.1005 LearningRate 0.0001 Epoch: 27 Global Step: 47890 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:45:57,994-Speed 9188.87 samples/sec Loss 1.0994 LearningRate 0.0001 Epoch: 27 Global Step: 47900 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:46:24,590-Speed 9240.86 samples/sec Loss 1.0997 LearningRate 0.0001 Epoch: 27 Global Step: 47910 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:46:51,050-Speed 9288.25 samples/sec Loss 1.0997 LearningRate 0.0001 Epoch: 27 Global Step: 47920 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:47:17,400-Speed 9327.22 samples/sec Loss 1.0981 LearningRate 0.0001 Epoch: 27 Global Step: 47930 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:47:43,704-Speed 9343.34 samples/sec Loss 1.0948 LearningRate 0.0001 Epoch: 27 Global Step: 47940 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:48:10,131-Speed 9299.74 samples/sec Loss 1.0911 LearningRate 0.0001 Epoch: 27 Global Step: 47950 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:48:36,575-Speed 9295.31 samples/sec Loss 1.0990 LearningRate 0.0001 Epoch: 27 Global Step: 47960 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-06 07:49:03,031-Speed 9289.43 samples/sec Loss 1.0903 LearningRate 0.0001 Epoch: 27 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:49:29,685-Speed 9220.73 samples/sec Loss 1.0933 LearningRate 0.0001 Epoch: 27 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:50:01,809-Speed 7650.88 samples/sec Loss 1.0947 LearningRate 0.0001 Epoch: 27 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:50:28,222-Speed 9304.67 samples/sec Loss 1.0900 LearningRate 0.0001 Epoch: 27 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:50:54,715-Speed 9276.77 samples/sec Loss 1.0991 LearningRate 0.0001 Epoch: 27 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:51:21,149-Speed 9297.47 samples/sec Loss 1.0952 LearningRate 0.0001 Epoch: 27 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:51:47,613-Speed 9287.01 samples/sec Loss 1.1025 LearningRate 0.0001 Epoch: 27 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:52:14,130-Speed 9268.60 samples/sec Loss 1.0937 LearningRate 0.0001 Epoch: 27 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:52:40,625-Speed 9276.02 samples/sec Loss 1.0837 LearningRate 0.0001 Epoch: 27 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-06 07:53:07,063-Speed 9295.96 samples/sec Loss 1.0993 LearningRate 0.0001 Epoch: 27 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-03-06 07:53:33,528-Speed 9286.62 samples/sec Loss 1.0889 LearningRate 0.0001 Epoch: 27 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-03-06 07:54:00,173-Speed 9223.74 samples/sec Loss 1.0868 LearningRate 0.0001 Epoch: 27 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-03-06 07:54:26,622-Speed 9292.31 samples/sec Loss 1.0851 LearningRate 0.0001 Epoch: 27 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-03-06 07:54:53,078-Speed 9289.86 samples/sec Loss 1.0851 LearningRate 0.0001 Epoch: 27 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-03-06 07:55:19,628-Speed 9256.79 samples/sec Loss 1.0998 LearningRate 0.0001 Epoch: 27 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-03-06 07:55:46,043-Speed 9304.12 samples/sec Loss 1.0860 LearningRate 0.0001 Epoch: 27 Global
Step: 48120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 07:56:12,607-Speed 9252.11 samples/sec Loss 1.0842 LearningRate 0.0001 Epoch: 27 Global Step: 48130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 07:56:39,254-Speed 9223.40 samples/sec Loss 1.0892 LearningRate 0.0001 Epoch: 27 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 07:57:05,953-Speed 9205.08 samples/sec Loss 1.0946 LearningRate 0.0001 Epoch: 27 Global Step: 48150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 07:57:32,691-Speed 9191.77 samples/sec Loss 1.0931 LearningRate 0.0001 Epoch: 27 Global Step: 48160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 07:57:59,290-Speed 9240.22 samples/sec Loss 1.0836 LearningRate 0.0001 Epoch: 27 Global Step: 48170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-06 07:58:25,800-Speed 9270.67 samples/sec Loss 1.0970 LearningRate 0.0001 Epoch: 27 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-06 07:58:52,365-Speed 9251.65 samples/sec Loss 1.0985 LearningRate 0.0001 Epoch: 27 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-06 07:59:18,833-Speed 9285.88 samples/sec Loss 1.0964 LearningRate 0.0001 Epoch: 27 Global Step: 48200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 07:59:45,390-Speed 9254.40 samples/sec Loss 1.0921 LearningRate 0.0001 Epoch: 27 Global Step: 48210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:00:12,065-Speed 9213.63 samples/sec Loss 1.0934 LearningRate 0.0001 Epoch: 27 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:00:38,461-Speed 9310.88 samples/sec Loss 1.0854 LearningRate 0.0001 Epoch: 27 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:01:05,088-Speed 9229.90 samples/sec Loss 1.0896 LearningRate 0.0001 Epoch: 27 Global Step: 48240 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-03-06 08:01:31,580-Speed 9277.14 samples/sec Loss 1.0796 LearningRate 0.0001 Epoch: 27 Global Step: 48250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:01:58,074-Speed 9276.39 samples/sec Loss 1.0903 LearningRate 0.0001 Epoch: 27 Global Step: 48260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:02:24,877-Speed 9169.67 samples/sec Loss 1.0800 LearningRate 0.0001 Epoch: 27 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:02:51,341-Speed 9287.01 samples/sec Loss 1.0914 LearningRate 0.0001 Epoch: 27 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:03:19,957-Speed 8588.33 samples/sec Loss 1.0907 LearningRate 0.0001 Epoch: 27 Global Step: 48290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:03:46,474-Speed 9268.44 samples/sec Loss 1.0844 LearningRate 0.0001 Epoch: 27 Global Step: 48300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:04:12,983-Speed 9271.56 samples/sec Loss 1.0899 LearningRate 0.0001 Epoch: 27 Global Step: 48310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:04:39,479-Speed 9275.65 samples/sec Loss 1.0918 LearningRate 0.0001 Epoch: 27 Global Step: 48320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:05:06,029-Speed 9256.83 samples/sec Loss 1.0960 LearningRate 0.0001 Epoch: 27 Global Step: 48330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:05:32,545-Speed 9268.81 samples/sec Loss 1.0908 LearningRate 0.0001 Epoch: 27 Global Step: 48340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:05:59,054-Speed 9271.19 samples/sec Loss 1.0879 LearningRate 0.0001 Epoch: 27 Global Step: 48350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:06:25,531-Speed 9282.49 samples/sec Loss 1.0864 LearningRate 0.0001 Epoch: 27 Global Step: 48360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 
08:06:52,272-Speed 9190.99 samples/sec Loss 1.0939 LearningRate 0.0001 Epoch: 27 Global Step: 48370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:07:18,717-Speed 9293.74 samples/sec Loss 1.0901 LearningRate 0.0001 Epoch: 27 Global Step: 48380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:07:45,132-Speed 9304.05 samples/sec Loss 1.0946 LearningRate 0.0001 Epoch: 27 Global Step: 48390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:09:04,169-Speed 3109.51 samples/sec Loss 1.0838 LearningRate 0.0001 Epoch: 28 Global Step: 48400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:09:30,210-Speed 9437.92 samples/sec Loss 1.0876 LearningRate 0.0001 Epoch: 28 Global Step: 48410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:09:56,306-Speed 9418.12 samples/sec Loss 1.0762 LearningRate 0.0001 Epoch: 28 Global Step: 48420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:10:22,459-Speed 9397.35 samples/sec Loss 1.0776 LearningRate 0.0001 Epoch: 28 Global Step: 48430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:10:48,740-Speed 9351.58 samples/sec Loss 1.0752 LearningRate 0.0001 Epoch: 28 Global Step: 48440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:11:15,072-Speed 9333.37 samples/sec Loss 1.0773 LearningRate 0.0001 Epoch: 28 Global Step: 48450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:11:41,208-Speed 9403.79 samples/sec Loss 1.0794 LearningRate 0.0001 Epoch: 28 Global Step: 48460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:12:07,385-Speed 9388.84 samples/sec Loss 1.0712 LearningRate 0.0001 Epoch: 28 Global Step: 48470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:12:33,523-Speed 9402.55 samples/sec Loss 1.0750 LearningRate 0.0001 Epoch: 28 Global Step: 48480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:12:59,640-Speed 9410.42 samples/sec Loss 
1.0716 LearningRate 0.0001 Epoch: 28 Global Step: 48490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:13:25,776-Speed 9403.51 samples/sec Loss 1.0707 LearningRate 0.0001 Epoch: 28 Global Step: 48500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:13:51,876-Speed 9416.97 samples/sec Loss 1.0769 LearningRate 0.0001 Epoch: 28 Global Step: 48510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:14:18,184-Speed 9341.91 samples/sec Loss 1.0691 LearningRate 0.0001 Epoch: 28 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:14:44,350-Speed 9392.88 samples/sec Loss 1.0677 LearningRate 0.0001 Epoch: 28 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:15:10,470-Speed 9409.36 samples/sec Loss 1.0704 LearningRate 0.0001 Epoch: 28 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:15:36,531-Speed 9430.45 samples/sec Loss 1.0846 LearningRate 0.0001 Epoch: 28 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:16:02,662-Speed 9406.24 samples/sec Loss 1.0777 LearningRate 0.0001 Epoch: 28 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:16:28,820-Speed 9395.16 samples/sec Loss 1.0723 LearningRate 0.0001 Epoch: 28 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:16:54,955-Speed 9404.16 samples/sec Loss 1.0825 LearningRate 0.0001 Epoch: 28 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:17:21,061-Speed 9414.27 samples/sec Loss 1.0774 LearningRate 0.0001 Epoch: 28 Global Step: 48590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:17:47,200-Speed 9402.02 samples/sec Loss 1.0746 LearningRate 0.0001 Epoch: 28 Global Step: 48600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:18:13,431-Speed 9369.57 samples/sec Loss 1.0662 LearningRate 0.0001 Epoch: 28 Global 
Step: 48610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:18:39,596-Speed 9393.29 samples/sec Loss 1.0676 LearningRate 0.0001 Epoch: 28 Global Step: 48620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:19:05,800-Speed 9378.78 samples/sec Loss 1.0697 LearningRate 0.0001 Epoch: 28 Global Step: 48630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:19:31,927-Speed 9406.90 samples/sec Loss 1.0668 LearningRate 0.0001 Epoch: 28 Global Step: 48640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:19:58,155-Speed 9370.44 samples/sec Loss 1.0751 LearningRate 0.0001 Epoch: 28 Global Step: 48650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:20:24,340-Speed 9386.04 samples/sec Loss 1.0712 LearningRate 0.0001 Epoch: 28 Global Step: 48660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:20:50,666-Speed 9335.45 samples/sec Loss 1.0673 LearningRate 0.0001 Epoch: 28 Global Step: 48670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:21:16,849-Speed 9386.59 samples/sec Loss 1.0785 LearningRate 0.0001 Epoch: 28 Global Step: 48680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:21:43,011-Speed 9394.17 samples/sec Loss 1.0813 LearningRate 0.0001 Epoch: 28 Global Step: 48690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:22:09,191-Speed 9388.37 samples/sec Loss 1.0708 LearningRate 0.0001 Epoch: 28 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:22:35,264-Speed 9426.29 samples/sec Loss 1.0753 LearningRate 0.0001 Epoch: 28 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:23:01,430-Speed 9392.68 samples/sec Loss 1.0622 LearningRate 0.0001 Epoch: 28 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:23:27,577-Speed 9399.62 samples/sec Loss 1.0847 LearningRate 0.0001 Epoch: 28 Global Step: 48730 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-03-06 08:23:53,673-Speed 9417.93 samples/sec Loss 1.0720 LearningRate 0.0001 Epoch: 28 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:24:19,824-Speed 9398.09 samples/sec Loss 1.0816 LearningRate 0.0001 Epoch: 28 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:24:45,933-Speed 9413.10 samples/sec Loss 1.0834 LearningRate 0.0001 Epoch: 28 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:25:12,103-Speed 9391.20 samples/sec Loss 1.0754 LearningRate 0.0001 Epoch: 28 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:25:38,306-Speed 9379.65 samples/sec Loss 1.0734 LearningRate 0.0001 Epoch: 28 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:26:04,419-Speed 9412.12 samples/sec Loss 1.0736 LearningRate 0.0001 Epoch: 28 Global Step: 48790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:26:30,583-Speed 9393.36 samples/sec Loss 1.0810 LearningRate 0.0001 Epoch: 28 Global Step: 48800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:26:56,790-Speed 9378.10 samples/sec Loss 1.0673 LearningRate 0.0001 Epoch: 28 Global Step: 48810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:27:23,017-Speed 9370.82 samples/sec Loss 1.0710 LearningRate 0.0001 Epoch: 28 Global Step: 48820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:27:49,150-Speed 9404.63 samples/sec Loss 1.0732 LearningRate 0.0001 Epoch: 28 Global Step: 48830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:28:15,305-Speed 9396.60 samples/sec Loss 1.0742 LearningRate 0.0001 Epoch: 28 Global Step: 48840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:28:41,452-Speed 9399.67 samples/sec Loss 1.0744 LearningRate 0.0001 Epoch: 28 Global Step: 48850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 
08:29:07,566-Speed 9411.43 samples/sec Loss 1.0708 LearningRate 0.0001 Epoch: 28 Global Step: 48860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:29:33,763-Speed 9381.48 samples/sec Loss 1.0771 LearningRate 0.0001 Epoch: 28 Global Step: 48870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:29:59,930-Speed 9392.41 samples/sec Loss 1.0720 LearningRate 0.0001 Epoch: 28 Global Step: 48880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:30:26,072-Speed 9401.73 samples/sec Loss 1.0775 LearningRate 0.0001 Epoch: 28 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:30:52,274-Speed 9379.51 samples/sec Loss 1.0718 LearningRate 0.0001 Epoch: 28 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:31:18,413-Speed 9402.57 samples/sec Loss 1.0699 LearningRate 0.0001 Epoch: 28 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:31:44,521-Speed 9413.38 samples/sec Loss 1.0807 LearningRate 0.0001 Epoch: 28 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:32:10,604-Speed 9422.80 samples/sec Loss 1.0721 LearningRate 0.0001 Epoch: 28 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:32:36,667-Speed 9429.96 samples/sec Loss 1.0679 LearningRate 0.0001 Epoch: 28 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:33:02,784-Speed 9410.90 samples/sec Loss 1.0687 LearningRate 0.0001 Epoch: 28 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:33:28,919-Speed 9403.86 samples/sec Loss 1.0649 LearningRate 0.0001 Epoch: 28 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:33:55,064-Speed 9400.13 samples/sec Loss 1.0654 LearningRate 0.0001 Epoch: 28 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:34:21,165-Speed 9418.58 samples/sec Loss 
1.0619 LearningRate 0.0001 Epoch: 28 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:34:47,374-Speed 9377.19 samples/sec Loss 1.0712 LearningRate 0.0001 Epoch: 28 Global Step: 48990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-06 08:35:13,537-Speed 9393.89 samples/sec Loss 1.0652 LearningRate 0.0001 Epoch: 28 Global Step: 49000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-06 08:35:43,639-Speed 8164.48 samples/sec Loss 1.0600 LearningRate 0.0001 Epoch: 28 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:36:09,728-Speed 9420.57 samples/sec Loss 1.0578 LearningRate 0.0001 Epoch: 28 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:36:35,882-Speed 9397.14 samples/sec Loss 1.0652 LearningRate 0.0001 Epoch: 28 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:37:02,054-Speed 9390.75 samples/sec Loss 1.0601 LearningRate 0.0001 Epoch: 28 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:37:28,325-Speed 9355.42 samples/sec Loss 1.0583 LearningRate 0.0001 Epoch: 28 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:37:54,541-Speed 9374.49 samples/sec Loss 1.0732 LearningRate 0.0001 Epoch: 28 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:38:20,665-Speed 9408.00 samples/sec Loss 1.0652 LearningRate 0.0001 Epoch: 28 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:38:46,794-Speed 9406.06 samples/sec Loss 1.0603 LearningRate 0.0001 Epoch: 28 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:39:12,913-Speed 9409.89 samples/sec Loss 1.0568 LearningRate 0.0001 Epoch: 28 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:39:39,087-Speed 9389.76 samples/sec Loss 1.0640 LearningRate 0.0001 Epoch: 28 
Global Step: 49100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:40:13,194-Speed 7205.73 samples/sec Loss 1.0601 LearningRate 0.0001 Epoch: 28 Global Step: 49110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-06 08:40:39,196-Speed 9452.12 samples/sec Loss 1.0650 LearningRate 0.0001 Epoch: 28 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:41:05,390-Speed 9382.42 samples/sec Loss 1.0632 LearningRate 0.0001 Epoch: 28 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:41:31,506-Speed 9410.82 samples/sec Loss 1.0664 LearningRate 0.0001 Epoch: 28 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:41:57,646-Speed 9401.98 samples/sec Loss 1.0596 LearningRate 0.0001 Epoch: 28 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:42:23,761-Speed 9411.19 samples/sec Loss 1.0627 LearningRate 0.0001 Epoch: 28 Global Step: 49160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:42:49,889-Speed 9406.61 samples/sec Loss 1.0562 LearningRate 0.0001 Epoch: 28 Global Step: 49170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:43:15,927-Speed 9438.53 samples/sec Loss 1.0548 LearningRate 0.0001 Epoch: 28 Global Step: 49180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:43:42,041-Speed 9411.47 samples/sec Loss 1.0556 LearningRate 0.0001 Epoch: 28 Global Step: 49190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:44:08,172-Speed 9405.41 samples/sec Loss 1.0602 LearningRate 0.0001 Epoch: 28 Global Step: 49200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:44:34,299-Speed 9406.74 samples/sec Loss 1.0631 LearningRate 0.0001 Epoch: 28 Global Step: 49210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:45:00,418-Speed 9409.70 samples/sec Loss 1.0607 LearningRate 0.0001 Epoch: 28 Global Step: 49220 Fp16 Grad Scale: 32768 
Required: 15 hours Training: 2022-03-06 08:45:26,520-Speed 9415.77 samples/sec Loss 1.0667 LearningRate 0.0001 Epoch: 28 Global Step: 49230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:45:52,579-Speed 9431.11 samples/sec Loss 1.0598 LearningRate 0.0001 Epoch: 28 Global Step: 49240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:46:18,742-Speed 9393.84 samples/sec Loss 1.0574 LearningRate 0.0001 Epoch: 28 Global Step: 49250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:46:44,920-Speed 9388.61 samples/sec Loss 1.0486 LearningRate 0.0001 Epoch: 28 Global Step: 49260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:47:11,084-Speed 9393.48 samples/sec Loss 1.0608 LearningRate 0.0001 Epoch: 28 Global Step: 49270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:47:37,165-Speed 9423.27 samples/sec Loss 1.0599 LearningRate 0.0001 Epoch: 28 Global Step: 49280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:48:03,318-Speed 9397.34 samples/sec Loss 1.0606 LearningRate 0.0001 Epoch: 28 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:48:29,457-Speed 9402.24 samples/sec Loss 1.0598 LearningRate 0.0001 Epoch: 28 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:48:55,567-Speed 9413.09 samples/sec Loss 1.0577 LearningRate 0.0001 Epoch: 28 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:49:21,696-Speed 9406.05 samples/sec Loss 1.0573 LearningRate 0.0001 Epoch: 28 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:49:47,867-Speed 9391.16 samples/sec Loss 1.0567 LearningRate 0.0001 Epoch: 28 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-06 08:50:13,993-Speed 9406.97 samples/sec Loss 1.0496 LearningRate 0.0001 Epoch: 28 Global Step: 49340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 
08:50:40,203-Speed 9379.15 samples/sec Loss 1.0485 LearningRate 0.0001 Epoch: 28 Global Step: 49350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:51:08,601-Speed 8654.22 samples/sec Loss 1.0551 LearningRate 0.0001 Epoch: 28 Global Step: 49360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:51:34,684-Speed 9422.64 samples/sec Loss 1.0563 LearningRate 0.0001 Epoch: 28 Global Step: 49370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:52:00,963-Speed 9352.67 samples/sec Loss 1.0516 LearningRate 0.0001 Epoch: 28 Global Step: 49380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:52:27,167-Speed 9378.89 samples/sec Loss 1.0502 LearningRate 0.0001 Epoch: 28 Global Step: 49390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:52:53,307-Speed 9402.34 samples/sec Loss 1.0455 LearningRate 0.0001 Epoch: 28 Global Step: 49400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:53:19,408-Speed 9415.76 samples/sec Loss 1.0533 LearningRate 0.0001 Epoch: 28 Global Step: 49410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:53:45,529-Speed 9409.21 samples/sec Loss 1.0458 LearningRate 0.0001 Epoch: 28 Global Step: 49420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-06 08:54:11,672-Speed 9401.06 samples/sec Loss 1.0507 LearningRate 0.0001 Epoch: 28 Global Step: 49430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 08:54:37,807-Speed 9403.78 samples/sec Loss 1.0508 LearningRate 0.0001 Epoch: 28 Global Step: 49440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:55:03,957-Speed 9398.30 samples/sec Loss 1.0589 LearningRate 0.0001 Epoch: 28 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:55:30,141-Speed 9386.27 samples/sec Loss 1.0490 LearningRate 0.0001 Epoch: 28 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:55:56,253-Speed 9412.33 samples/sec Loss 
1.0512 LearningRate 0.0001 Epoch: 28 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:56:22,360-Speed 9414.39 samples/sec Loss 1.0494 LearningRate 0.0001 Epoch: 28 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:56:48,494-Speed 9403.98 samples/sec Loss 1.0599 LearningRate 0.0001 Epoch: 28 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:57:14,660-Speed 9393.00 samples/sec Loss 1.0552 LearningRate 0.0001 Epoch: 28 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:57:40,842-Speed 9387.09 samples/sec Loss 1.0548 LearningRate 0.0001 Epoch: 28 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:58:07,033-Speed 9383.67 samples/sec Loss 1.0437 LearningRate 0.0001 Epoch: 28 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:58:33,158-Speed 9407.52 samples/sec Loss 1.0553 LearningRate 0.0001 Epoch: 28 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 08:58:59,291-Speed 9404.40 samples/sec Loss 1.0451 LearningRate 0.0001 Epoch: 28 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-06 08:59:25,415-Speed 9408.04 samples/sec Loss 1.0518 LearningRate 0.0001 Epoch: 28 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-06 08:59:51,483-Speed 9428.18 samples/sec Loss 1.0502 LearningRate 0.0001 Epoch: 28 Global Step: 49560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:00:17,732-Speed 9362.76 samples/sec Loss 1.0583 LearningRate 0.0001 Epoch: 28 Global Step: 49570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:00:43,870-Speed 9402.81 samples/sec Loss 1.0534 LearningRate 0.0001 Epoch: 28 Global Step: 49580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:01:09,956-Speed 9421.52 samples/sec Loss 1.0636 LearningRate 0.0001 Epoch: 28 
Global Step: 49590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:01:36,068-Speed 9412.09 samples/sec Loss 1.0436 LearningRate 0.0001 Epoch: 28 Global Step: 49600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:02:02,173-Speed 9414.77 samples/sec Loss 1.0437 LearningRate 0.0001 Epoch: 28 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:02:28,269-Speed 9417.90 samples/sec Loss 1.0376 LearningRate 0.0001 Epoch: 28 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:02:54,333-Speed 9429.49 samples/sec Loss 1.0477 LearningRate 0.0001 Epoch: 28 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:03:20,425-Speed 9419.39 samples/sec Loss 1.0429 LearningRate 0.0001 Epoch: 28 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:03:46,519-Speed 9418.59 samples/sec Loss 1.0409 LearningRate 0.0001 Epoch: 28 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:04:12,582-Speed 9430.14 samples/sec Loss 1.0531 LearningRate 0.0001 Epoch: 28 Global Step: 49660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-06 09:04:38,594-Speed 9449.10 samples/sec Loss 1.0414 LearningRate 0.0001 Epoch: 28 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:05:04,670-Speed 9425.07 samples/sec Loss 1.0380 LearningRate 0.0001 Epoch: 28 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:05:30,836-Speed 9392.70 samples/sec Loss 1.0500 LearningRate 0.0001 Epoch: 28 Global Step: 49690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:05:56,943-Speed 9413.97 samples/sec Loss 1.0532 LearningRate 0.0001 Epoch: 28 Global Step: 49700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:06:23,059-Speed 9410.86 samples/sec Loss 1.0451 LearningRate 0.0001 Epoch: 28 Global Step: 49710 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-03-06 09:06:49,120-Speed 9430.61 samples/sec Loss 1.0473 LearningRate 0.0001 Epoch: 28 Global Step: 49720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:07:15,352-Speed 9368.95 samples/sec Loss 1.0397 LearningRate 0.0001 Epoch: 28 Global Step: 49730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:07:41,542-Speed 9384.15 samples/sec Loss 1.0511 LearningRate 0.0001 Epoch: 28 Global Step: 49740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:08:07,805-Speed 9358.31 samples/sec Loss 1.0393 LearningRate 0.0001 Epoch: 28 Global Step: 49750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:08:34,021-Speed 9374.65 samples/sec Loss 1.0474 LearningRate 0.0001 Epoch: 28 Global Step: 49760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:09:00,166-Speed 9400.59 samples/sec Loss 1.0453 LearningRate 0.0001 Epoch: 28 Global Step: 49770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:09:26,319-Speed 9397.22 samples/sec Loss 1.0420 LearningRate 0.0001 Epoch: 28 Global Step: 49780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:09:52,439-Speed 9409.52 samples/sec Loss 1.0448 LearningRate 0.0001 Epoch: 28 Global Step: 49790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:10:18,577-Speed 9402.81 samples/sec Loss 1.0439 LearningRate 0.0001 Epoch: 28 Global Step: 49800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:10:44,658-Speed 9423.53 samples/sec Loss 1.0499 LearningRate 0.0001 Epoch: 28 Global Step: 49810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:11:10,774-Speed 9410.99 samples/sec Loss 1.0397 LearningRate 0.0001 Epoch: 28 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:11:36,904-Speed 9405.53 samples/sec Loss 1.0441 LearningRate 0.0001 Epoch: 28 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 
09:12:03,090-Speed 9385.87 samples/sec Loss 1.0329 LearningRate 0.0001 Epoch: 28 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:12:29,248-Speed 9395.45 samples/sec Loss 1.0433 LearningRate 0.0001 Epoch: 28 Global Step: 49850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:12:55,357-Speed 9413.28 samples/sec Loss 1.0325 LearningRate 0.0001 Epoch: 28 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:13:21,484-Speed 9407.04 samples/sec Loss 1.0410 LearningRate 0.0001 Epoch: 28 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:13:47,560-Speed 9425.00 samples/sec Loss 1.0426 LearningRate 0.0001 Epoch: 28 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:14:13,735-Speed 9389.92 samples/sec Loss 1.0502 LearningRate 0.0001 Epoch: 28 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:14:39,906-Speed 9390.86 samples/sec Loss 1.0421 LearningRate 0.0001 Epoch: 28 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:15:05,996-Speed 9420.12 samples/sec Loss 1.0372 LearningRate 0.0001 Epoch: 28 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:15:32,118-Speed 9408.30 samples/sec Loss 1.0430 LearningRate 0.0001 Epoch: 28 Global Step: 49920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-06 09:15:58,311-Speed 9383.02 samples/sec Loss 1.0406 LearningRate 0.0001 Epoch: 28 Global Step: 49930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-06 09:16:24,404-Speed 9419.09 samples/sec Loss 1.0394 LearningRate 0.0001 Epoch: 28 Global Step: 49940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:16:50,571-Speed 9392.48 samples/sec Loss 1.0397 LearningRate 0.0001 Epoch: 28 Global Step: 49950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:17:16,751-Speed 9387.57 samples/sec 
Loss 1.0401 LearningRate 0.0001 Epoch: 28 Global Step: 49960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:17:42,915-Speed 9393.45 samples/sec Loss 1.0359 LearningRate 0.0001 Epoch: 28 Global Step: 49970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:18:09,068-Speed 9397.45 samples/sec Loss 1.0356 LearningRate 0.0001 Epoch: 28 Global Step: 49980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:18:35,210-Speed 9401.51 samples/sec Loss 1.0457 LearningRate 0.0001 Epoch: 28 Global Step: 49990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:19:01,362-Speed 9397.59 samples/sec Loss 1.0353 LearningRate 0.0001 Epoch: 28 Global Step: 50000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:19:27,483-Speed 9409.28 samples/sec Loss 1.0319 LearningRate 0.0001 Epoch: 28 Global Step: 50010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:19:53,636-Speed 9397.53 samples/sec Loss 1.0432 LearningRate 0.0001 Epoch: 28 Global Step: 50020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:20:19,852-Speed 9374.89 samples/sec Loss 1.0352 LearningRate 0.0001 Epoch: 28 Global Step: 50030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:20:46,043-Speed 9383.52 samples/sec Loss 1.0443 LearningRate 0.0001 Epoch: 28 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:21:12,153-Speed 9412.89 samples/sec Loss 1.0452 LearningRate 0.0001 Epoch: 28 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:21:38,401-Speed 9364.06 samples/sec Loss 1.0352 LearningRate 0.0001 Epoch: 28 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:22:04,539-Speed 9403.09 samples/sec Loss 1.0384 LearningRate 0.0001 Epoch: 28 Global Step: 50070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:22:30,686-Speed 9399.70 samples/sec Loss 1.0469 LearningRate 0.0001 Epoch: 28 
Global Step: 50080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:22:56,778-Speed 9419.29 samples/sec Loss 1.0459 LearningRate 0.0001 Epoch: 28 Global Step: 50090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:23:22,909-Speed 9405.53 samples/sec Loss 1.0487 LearningRate 0.0001 Epoch: 28 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:23:49,017-Speed 9413.84 samples/sec Loss 1.0486 LearningRate 0.0001 Epoch: 28 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:24:15,220-Speed 9379.37 samples/sec Loss 1.0522 LearningRate 0.0001 Epoch: 28 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:25:34,405-Speed 3103.67 samples/sec Loss 1.0318 LearningRate 0.0001 Epoch: 29 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:26:00,394-Speed 9456.68 samples/sec Loss 1.0308 LearningRate 0.0001 Epoch: 29 Global Step: 50140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:26:26,429-Speed 9440.08 samples/sec Loss 1.0290 LearningRate 0.0001 Epoch: 29 Global Step: 50150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:26:52,538-Speed 9413.24 samples/sec Loss 1.0376 LearningRate 0.0001 Epoch: 29 Global Step: 50160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:27:18,615-Speed 9424.79 samples/sec Loss 1.0283 LearningRate 0.0001 Epoch: 29 Global Step: 50170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:27:44,583-Speed 9464.34 samples/sec Loss 1.0250 LearningRate 0.0001 Epoch: 29 Global Step: 50180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:28:10,702-Speed 9409.55 samples/sec Loss 1.0340 LearningRate 0.0001 Epoch: 29 Global Step: 50190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:28:36,788-Speed 9421.84 samples/sec Loss 1.0273 LearningRate 0.0001 Epoch: 29 Global Step: 50200 Fp16 Grad Scale: 32768 
Required: 14 hours Training: 2022-03-06 09:29:02,806-Speed 9446.14 samples/sec Loss 1.0291 LearningRate 0.0001 Epoch: 29 Global Step: 50210 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:29:28,946-Speed 9402.38 samples/sec Loss 1.0247 LearningRate 0.0001 Epoch: 29 Global Step: 50220 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:29:55,048-Speed 9415.86 samples/sec Loss 1.0284 LearningRate 0.0001 Epoch: 29 Global Step: 50230 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:30:21,101-Speed 9433.41 samples/sec Loss 1.0250 LearningRate 0.0001 Epoch: 29 Global Step: 50240 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:30:47,180-Speed 9423.86 samples/sec Loss 1.0270 LearningRate 0.0001 Epoch: 29 Global Step: 50250 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:31:13,269-Speed 9420.49 samples/sec Loss 1.0271 LearningRate 0.0001 Epoch: 29 Global Step: 50260 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:31:39,325-Speed 9432.80 samples/sec Loss 1.0237 LearningRate 0.0001 Epoch: 29 Global Step: 50270 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:32:05,426-Speed 9416.17 samples/sec Loss 1.0331 LearningRate 0.0001 Epoch: 29 Global Step: 50280 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:32:31,486-Speed 9431.53 samples/sec Loss 1.0211 LearningRate 0.0001 Epoch: 29 Global Step: 50290 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:32:57,594-Speed 9413.53 samples/sec Loss 1.0227 LearningRate 0.0001 Epoch: 29 Global Step: 50300 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-06 09:33:23,636-Speed 9437.61 samples/sec Loss 1.0261 LearningRate 0.0001 Epoch: 29 Global Step: 50310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:33:49,742-Speed 9414.21 samples/sec Loss 1.0262 LearningRate 0.0001 Epoch: 29 Global Step: 50320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 
09:34:15,824-Speed 9423.20 samples/sec Loss 1.0252 LearningRate 0.0001 Epoch: 29 Global Step: 50330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:34:41,872-Speed 9435.32 samples/sec Loss 1.0272 LearningRate 0.0001 Epoch: 29 Global Step: 50340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:35:08,020-Speed 9398.97 samples/sec Loss 1.0257 LearningRate 0.0001 Epoch: 29 Global Step: 50350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:35:34,053-Speed 9440.85 samples/sec Loss 1.0279 LearningRate 0.0001 Epoch: 29 Global Step: 50360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:36:00,245-Speed 9383.50 samples/sec Loss 1.0311 LearningRate 0.0001 Epoch: 29 Global Step: 50370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:36:26,409-Speed 9393.58 samples/sec Loss 1.0289 LearningRate 0.0001 Epoch: 29 Global Step: 50380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:36:52,596-Speed 9385.25 samples/sec Loss 1.0263 LearningRate 0.0001 Epoch: 29 Global Step: 50390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:37:18,688-Speed 9419.32 samples/sec Loss 1.0282 LearningRate 0.0001 Epoch: 29 Global Step: 50400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:37:44,723-Speed 9440.03 samples/sec Loss 1.0217 LearningRate 0.0001 Epoch: 29 Global Step: 50410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:38:10,839-Speed 9410.85 samples/sec Loss 1.0225 LearningRate 0.0001 Epoch: 29 Global Step: 50420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:38:36,934-Speed 9418.35 samples/sec Loss 1.0230 LearningRate 0.0001 Epoch: 29 Global Step: 50430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:39:03,004-Speed 9427.28 samples/sec Loss 1.0208 LearningRate 0.0001 Epoch: 29 Global Step: 50440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:39:29,129-Speed 9407.67 samples/sec Loss 
1.0273 LearningRate 0.0001 Epoch: 29 Global Step: 50450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:39:55,230-Speed 9416.08 samples/sec Loss 1.0209 LearningRate 0.0001 Epoch: 29 Global Step: 50460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:40:21,386-Speed 9396.44 samples/sec Loss 1.0319 LearningRate 0.0001 Epoch: 29 Global Step: 50470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:40:47,545-Speed 9395.29 samples/sec Loss 1.0335 LearningRate 0.0001 Epoch: 29 Global Step: 50480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:41:13,708-Speed 9393.86 samples/sec Loss 1.0257 LearningRate 0.0001 Epoch: 29 Global Step: 50490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:41:39,831-Speed 9408.21 samples/sec Loss 1.0236 LearningRate 0.0001 Epoch: 29 Global Step: 50500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:42:06,021-Speed 9384.63 samples/sec Loss 1.0250 LearningRate 0.0001 Epoch: 29 Global Step: 50510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:42:32,141-Speed 9409.45 samples/sec Loss 1.0191 LearningRate 0.0001 Epoch: 29 Global Step: 50520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:42:58,239-Speed 9417.41 samples/sec Loss 1.0283 LearningRate 0.0001 Epoch: 29 Global Step: 50530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:43:24,335-Speed 9418.06 samples/sec Loss 1.0165 LearningRate 0.0001 Epoch: 29 Global Step: 50540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:43:50,483-Speed 9399.02 samples/sec Loss 1.0245 LearningRate 0.0001 Epoch: 29 Global Step: 50550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:44:16,646-Speed 9393.79 samples/sec Loss 1.0270 LearningRate 0.0001 Epoch: 29 Global Step: 50560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:44:42,826-Speed 9387.81 samples/sec Loss 1.0290 LearningRate 0.0001 Epoch: 29 Global 
Step: 50570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:45:08,993-Speed 9393.13 samples/sec Loss 1.0219 LearningRate 0.0001 Epoch: 29 Global Step: 50580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:45:35,228-Speed 9367.99 samples/sec Loss 1.0192 LearningRate 0.0001 Epoch: 29 Global Step: 50590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:46:01,333-Speed 9414.71 samples/sec Loss 1.0286 LearningRate 0.0001 Epoch: 29 Global Step: 50600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:46:27,429-Speed 9417.97 samples/sec Loss 1.0287 LearningRate 0.0001 Epoch: 29 Global Step: 50610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:46:53,599-Speed 9391.52 samples/sec Loss 1.0260 LearningRate 0.0001 Epoch: 29 Global Step: 50620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:47:19,786-Speed 9385.35 samples/sec Loss 1.0315 LearningRate 0.0001 Epoch: 29 Global Step: 50630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:47:46,009-Speed 9372.48 samples/sec Loss 1.0287 LearningRate 0.0001 Epoch: 29 Global Step: 50640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:48:12,275-Speed 9357.08 samples/sec Loss 1.0253 LearningRate 0.0001 Epoch: 29 Global Step: 50650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:48:38,468-Speed 9383.04 samples/sec Loss 1.0267 LearningRate 0.0001 Epoch: 29 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:49:04,747-Speed 9352.11 samples/sec Loss 1.0174 LearningRate 0.0001 Epoch: 29 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:49:30,863-Speed 9410.86 samples/sec Loss 1.0202 LearningRate 0.0001 Epoch: 29 Global Step: 50680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-06 09:49:56,938-Speed 9425.58 samples/sec Loss 1.0210 LearningRate 0.0001 Epoch: 29 Global Step: 50690 Fp16 Grad Scale: 32768 
Required: 14 hours Training: 2022-03-06 09:50:23,082-Speed 9400.85 samples/sec Loss 1.0237 LearningRate 0.0001 Epoch: 29 Global Step: 50700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:50:49,234-Speed 9397.75 samples/sec Loss 1.0157 LearningRate 0.0001 Epoch: 29 Global Step: 50710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:51:15,434-Speed 9380.45 samples/sec Loss 1.0142 LearningRate 0.0001 Epoch: 29 Global Step: 50720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:51:41,584-Speed 9398.81 samples/sec Loss 1.0287 LearningRate 0.0001 Epoch: 29 Global Step: 50730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:52:07,684-Speed 9416.41 samples/sec Loss 1.0184 LearningRate 0.0001 Epoch: 29 Global Step: 50740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:52:33,790-Speed 9414.30 samples/sec Loss 1.0200 LearningRate 0.0001 Epoch: 29 Global Step: 50750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:52:59,881-Speed 9419.76 samples/sec Loss 1.0167 LearningRate 0.0001 Epoch: 29 Global Step: 50760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:53:25,930-Speed 9435.61 samples/sec Loss 1.0170 LearningRate 0.0001 Epoch: 29 Global Step: 50770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-06 09:53:52,069-Speed 9402.48 samples/sec Loss 1.0216 LearningRate 0.0001 Epoch: 29 Global Step: 50780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:54:18,226-Speed 9396.04 samples/sec Loss 1.0171 LearningRate 0.0001 Epoch: 29 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 09:54:44,398-Speed 9390.37 samples/sec Loss 1.0296 LearningRate 0.0001 Epoch: 29 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 09:55:10,557-Speed 9395.40 samples/sec Loss 1.0160 LearningRate 0.0001 Epoch: 29 Global Step: 50810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 
09:55:36,747-Speed 9383.95 samples/sec Loss 1.0129 LearningRate 0.0001 Epoch: 29 Global Step: 50820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:56:02,893-Speed 9400.14 samples/sec Loss 1.0126 LearningRate 0.0001 Epoch: 29 Global Step: 50830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:56:29,029-Speed 9403.29 samples/sec Loss 1.0143 LearningRate 0.0001 Epoch: 29 Global Step: 50840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:56:55,166-Speed 9403.43 samples/sec Loss 1.0120 LearningRate 0.0001 Epoch: 29 Global Step: 50850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:57:21,313-Speed 9399.31 samples/sec Loss 1.0115 LearningRate 0.0001 Epoch: 29 Global Step: 50860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:57:47,398-Speed 9422.26 samples/sec Loss 1.0152 LearningRate 0.0001 Epoch: 29 Global Step: 50870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:58:13,510-Speed 9412.15 samples/sec Loss 1.0250 LearningRate 0.0001 Epoch: 29 Global Step: 50880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:58:39,641-Speed 9405.44 samples/sec Loss 1.0141 LearningRate 0.0001 Epoch: 29 Global Step: 50890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:59:05,788-Speed 9399.29 samples/sec Loss 1.0148 LearningRate 0.0001 Epoch: 29 Global Step: 50900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 09:59:31,988-Speed 9380.75 samples/sec Loss 1.0167 LearningRate 0.0001 Epoch: 29 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 09:59:58,107-Speed 9409.63 samples/sec Loss 1.0129 LearningRate 0.0001 Epoch: 29 Global Step: 50920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:00:24,222-Speed 9410.96 samples/sec Loss 1.0175 LearningRate 0.0001 Epoch: 29 Global Step: 50930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:00:52,839-Speed 8588.25 samples/sec Loss 
1.0166 LearningRate 0.0001 Epoch: 29 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:01:18,966-Speed 9406.93 samples/sec Loss 1.0093 LearningRate 0.0001 Epoch: 29 Global Step: 50950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:01:45,094-Speed 9406.40 samples/sec Loss 1.0083 LearningRate 0.0001 Epoch: 29 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:02:11,264-Speed 9391.43 samples/sec Loss 1.0049 LearningRate 0.0001 Epoch: 29 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:02:37,399-Speed 9403.92 samples/sec Loss 1.0079 LearningRate 0.0001 Epoch: 29 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:03:03,655-Speed 9360.81 samples/sec Loss 1.0041 LearningRate 0.0001 Epoch: 29 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:03:29,830-Speed 9389.30 samples/sec Loss 1.0099 LearningRate 0.0001 Epoch: 29 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:03:55,994-Speed 9393.31 samples/sec Loss 1.0079 LearningRate 0.0001 Epoch: 29 Global Step: 51010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-06 10:04:22,253-Speed 9359.69 samples/sec Loss 1.0108 LearningRate 0.0001 Epoch: 29 Global Step: 51020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-06 10:04:48,518-Speed 9357.54 samples/sec Loss 1.0061 LearningRate 0.0001 Epoch: 29 Global Step: 51030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-06 10:05:14,653-Speed 9404.24 samples/sec Loss 0.9984 LearningRate 0.0001 Epoch: 29 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:05:40,828-Speed 9389.28 samples/sec Loss 1.0046 LearningRate 0.0001 Epoch: 29 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:06:07,012-Speed 9386.57 samples/sec Loss 1.0090 LearningRate 0.0001 Epoch: 29 
Global Step: 51060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:06:33,257-Speed 9364.45 samples/sec Loss 0.9977 LearningRate 0.0001 Epoch: 29 Global Step: 51070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:06:59,542-Speed 9350.42 samples/sec Loss 1.0105 LearningRate 0.0001 Epoch: 29 Global Step: 51080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:07:25,761-Speed 9373.76 samples/sec Loss 1.0114 LearningRate 0.0001 Epoch: 29 Global Step: 51090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:07:51,970-Speed 9377.52 samples/sec Loss 1.0075 LearningRate 0.0001 Epoch: 29 Global Step: 51100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:08:18,045-Speed 9425.64 samples/sec Loss 1.0118 LearningRate 0.0001 Epoch: 29 Global Step: 51110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:08:44,142-Speed 9417.64 samples/sec Loss 1.0077 LearningRate 0.0001 Epoch: 29 Global Step: 51120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:09:10,207-Speed 9429.15 samples/sec Loss 1.0018 LearningRate 0.0001 Epoch: 29 Global Step: 51130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:09:36,403-Speed 9382.11 samples/sec Loss 0.9998 LearningRate 0.0001 Epoch: 29 Global Step: 51140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:10:02,456-Speed 9433.17 samples/sec Loss 1.0095 LearningRate 0.0001 Epoch: 29 Global Step: 51150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:10:28,616-Speed 9395.09 samples/sec Loss 1.0104 LearningRate 0.0001 Epoch: 29 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:10:54,790-Speed 9390.09 samples/sec Loss 0.9998 LearningRate 0.0001 Epoch: 29 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:11:20,896-Speed 9415.37 samples/sec Loss 1.0017 LearningRate 0.0001 Epoch: 29 Global Step: 51180 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-03-06 10:11:47,002-Speed 9414.20 samples/sec Loss 1.0085 LearningRate 0.0001 Epoch: 29 Global Step: 51190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:12:13,133-Speed 9405.40 samples/sec Loss 1.0095 LearningRate 0.0001 Epoch: 29 Global Step: 51200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:12:39,281-Speed 9399.09 samples/sec Loss 1.0068 LearningRate 0.0001 Epoch: 29 Global Step: 51210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:13:05,471-Speed 9384.28 samples/sec Loss 1.0120 LearningRate 0.0001 Epoch: 29 Global Step: 51220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:13:31,593-Speed 9408.74 samples/sec Loss 1.0061 LearningRate 0.0001 Epoch: 29 Global Step: 51230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:13:57,730-Speed 9403.28 samples/sec Loss 1.0019 LearningRate 0.0001 Epoch: 29 Global Step: 51240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:14:23,808-Speed 9424.29 samples/sec Loss 1.0079 LearningRate 0.0001 Epoch: 29 Global Step: 51250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:14:50,100-Speed 9347.88 samples/sec Loss 1.0040 LearningRate 0.0001 Epoch: 29 Global Step: 51260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:15:16,241-Speed 9401.65 samples/sec Loss 0.9998 LearningRate 0.0001 Epoch: 29 Global Step: 51270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:15:42,343-Speed 9415.98 samples/sec Loss 1.0084 LearningRate 0.0001 Epoch: 29 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:16:08,471-Speed 9406.52 samples/sec Loss 1.0019 LearningRate 0.0001 Epoch: 29 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:16:34,643-Speed 9390.67 samples/sec Loss 1.0069 LearningRate 0.0001 Epoch: 29 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 
10:17:00,807-Speed 9393.45 samples/sec Loss 0.9967 LearningRate 0.0001 Epoch: 29 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:17:27,042-Speed 9367.85 samples/sec Loss 1.0101 LearningRate 0.0001 Epoch: 29 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:17:53,185-Speed 9401.35 samples/sec Loss 1.0049 LearningRate 0.0001 Epoch: 29 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:18:19,346-Speed 9394.29 samples/sec Loss 1.0051 LearningRate 0.0001 Epoch: 29 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:18:45,470-Speed 9407.70 samples/sec Loss 0.9978 LearningRate 0.0001 Epoch: 29 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:19:11,618-Speed 9399.22 samples/sec Loss 0.9972 LearningRate 0.0001 Epoch: 29 Global Step: 51360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:19:37,853-Speed 9368.17 samples/sec Loss 1.0036 LearningRate 0.0001 Epoch: 29 Global Step: 51370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:20:03,997-Speed 9400.46 samples/sec Loss 1.0015 LearningRate 0.0001 Epoch: 29 Global Step: 51380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-06 10:20:30,118-Speed 9408.99 samples/sec Loss 1.0090 LearningRate 0.0001 Epoch: 29 Global Step: 51390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-06 10:20:56,177-Speed 9431.51 samples/sec Loss 0.9929 LearningRate 0.0001 Epoch: 29 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:21:22,346-Speed 9392.23 samples/sec Loss 0.9958 LearningRate 0.0001 Epoch: 29 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:21:48,494-Speed 9399.40 samples/sec Loss 0.9964 LearningRate 0.0001 Epoch: 29 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:22:14,689-Speed 9382.19 samples/sec 
Loss 0.9952 LearningRate 0.0001 Epoch: 29 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:22:40,846-Speed 9396.06 samples/sec Loss 1.0036 LearningRate 0.0001 Epoch: 29 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:23:06,920-Speed 9425.88 samples/sec Loss 0.9878 LearningRate 0.0001 Epoch: 29 Global Step: 51450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:23:33,036-Speed 9410.70 samples/sec Loss 1.0086 LearningRate 0.0001 Epoch: 29 Global Step: 51460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:23:59,224-Speed 9384.93 samples/sec Loss 1.0003 LearningRate 0.0001 Epoch: 29 Global Step: 51470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:24:25,375-Speed 9398.56 samples/sec Loss 0.9927 LearningRate 0.0001 Epoch: 29 Global Step: 51480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:24:51,556-Speed 9387.21 samples/sec Loss 0.9907 LearningRate 0.0001 Epoch: 29 Global Step: 51490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:25:17,704-Speed 9399.26 samples/sec Loss 0.9918 LearningRate 0.0001 Epoch: 29 Global Step: 51500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:25:43,840-Speed 9403.67 samples/sec Loss 0.9911 LearningRate 0.0001 Epoch: 29 Global Step: 51510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:26:09,966-Speed 9407.33 samples/sec Loss 0.9953 LearningRate 0.0001 Epoch: 29 Global Step: 51520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:26:36,153-Speed 9385.37 samples/sec Loss 1.0037 LearningRate 0.0001 Epoch: 29 Global Step: 51530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:27:02,376-Speed 9372.32 samples/sec Loss 0.9975 LearningRate 0.0001 Epoch: 29 Global Step: 51540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:27:28,572-Speed 9381.88 samples/sec Loss 0.9892 LearningRate 0.0001 Epoch: 29 
Global Step: 51550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:27:54,813-Speed 9365.88 samples/sec Loss 0.9962 LearningRate 0.0001 Epoch: 29 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:28:21,025-Speed 9376.44 samples/sec Loss 1.0000 LearningRate 0.0001 Epoch: 29 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:28:47,194-Speed 9391.69 samples/sec Loss 0.9908 LearningRate 0.0001 Epoch: 29 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:29:13,404-Speed 9376.99 samples/sec Loss 1.0017 LearningRate 0.0001 Epoch: 29 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:29:39,533-Speed 9406.15 samples/sec Loss 0.9935 LearningRate 0.0001 Epoch: 29 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:30:05,677-Speed 9401.25 samples/sec Loss 0.9993 LearningRate 0.0001 Epoch: 29 Global Step: 51610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:30:32,031-Speed 9325.45 samples/sec Loss 0.9995 LearningRate 0.0001 Epoch: 29 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:30:58,190-Speed 9395.52 samples/sec Loss 0.9960 LearningRate 0.0001 Epoch: 29 Global Step: 51630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:31:24,280-Speed 9419.80 samples/sec Loss 0.9943 LearningRate 0.0001 Epoch: 29 Global Step: 51640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:31:50,409-Speed 9406.12 samples/sec Loss 1.0003 LearningRate 0.0001 Epoch: 29 Global Step: 51650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:32:16,547-Speed 9403.04 samples/sec Loss 0.9944 LearningRate 0.0001 Epoch: 29 Global Step: 51660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:32:42,707-Speed 9394.54 samples/sec Loss 0.9936 LearningRate 0.0001 Epoch: 29 Global Step: 51670 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-03-06 10:33:08,752-Speed 9436.65 samples/sec Loss 0.9987 LearningRate 0.0001 Epoch: 29 Global Step: 51680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:33:34,830-Speed 9424.56 samples/sec Loss 0.9932 LearningRate 0.0001 Epoch: 29 Global Step: 51690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:34:00,913-Speed 9422.65 samples/sec Loss 0.9994 LearningRate 0.0001 Epoch: 29 Global Step: 51700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:34:27,024-Speed 9412.46 samples/sec Loss 0.9867 LearningRate 0.0001 Epoch: 29 Global Step: 51710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:34:53,176-Speed 9398.05 samples/sec Loss 1.0004 LearningRate 0.0001 Epoch: 29 Global Step: 51720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:35:19,333-Speed 9395.97 samples/sec Loss 0.9903 LearningRate 0.0001 Epoch: 29 Global Step: 51730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:35:45,454-Speed 9408.76 samples/sec Loss 0.9991 LearningRate 0.0001 Epoch: 29 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:36:11,582-Speed 9406.49 samples/sec Loss 0.9975 LearningRate 0.0001 Epoch: 29 Global Step: 51750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:36:37,680-Speed 9417.41 samples/sec Loss 0.9998 LearningRate 0.0001 Epoch: 29 Global Step: 51760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:37:03,774-Speed 9418.49 samples/sec Loss 0.9928 LearningRate 0.0001 Epoch: 29 Global Step: 51770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:37:29,874-Speed 9416.40 samples/sec Loss 0.9951 LearningRate 0.0001 Epoch: 29 Global Step: 51780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:37:56,027-Speed 9397.54 samples/sec Loss 0.9999 LearningRate 0.0001 Epoch: 29 Global Step: 51790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 
10:38:22,191-Speed 9393.30 samples/sec Loss 0.9888 LearningRate 0.0001 Epoch: 29 Global Step: 51800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:38:48,324-Speed 9404.63 samples/sec Loss 0.9997 LearningRate 0.0001 Epoch: 29 Global Step: 51810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:39:14,486-Speed 9394.22 samples/sec Loss 0.9882 LearningRate 0.0001 Epoch: 29 Global Step: 51820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:39:40,592-Speed 9414.19 samples/sec Loss 0.9956 LearningRate 0.0001 Epoch: 29 Global Step: 51830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:40:06,755-Speed 9394.16 samples/sec Loss 0.9962 LearningRate 0.0001 Epoch: 29 Global Step: 51840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-06 10:41:26,722-Speed 3073.32 samples/sec Loss 0.9964 LearningRate 0.0001 Epoch: 30 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:41:52,672-Speed 9470.64 samples/sec Loss 0.9931 LearningRate 0.0001 Epoch: 30 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:42:18,629-Speed 9468.73 samples/sec Loss 0.9891 LearningRate 0.0001 Epoch: 30 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:42:44,664-Speed 9439.96 samples/sec Loss 0.9894 LearningRate 0.0001 Epoch: 30 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:43:10,628-Speed 9465.80 samples/sec Loss 0.9867 LearningRate 0.0001 Epoch: 30 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:43:36,592-Speed 9465.85 samples/sec Loss 0.9907 LearningRate 0.0001 Epoch: 30 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:44:02,605-Speed 9448.27 samples/sec Loss 0.9802 LearningRate 0.0001 Epoch: 30 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-06 10:44:28,605-Speed 9452.46 samples/sec Loss 
0.9766 LearningRate 0.0001 Epoch: 30 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:44:54,624-Speed 9445.80 samples/sec Loss 0.9815 LearningRate 0.0001 Epoch: 30 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:45:20,535-Speed 9485.30 samples/sec Loss 0.9716 LearningRate 0.0001 Epoch: 30 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:45:46,568-Speed 9440.99 samples/sec Loss 0.9798 LearningRate 0.0001 Epoch: 30 Global Step: 51950 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-03-06 10:46:12,637-Speed 9427.43 samples/sec Loss 0.9843 LearningRate 0.0001 Epoch: 30 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-03-06 10:46:38,849-Speed 9376.36 samples/sec Loss 0.9790 LearningRate 0.0001 Epoch: 30 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:47:04,866-Speed 9446.89 samples/sec Loss 0.9793 LearningRate 0.0001 Epoch: 30 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:47:30,853-Speed 9457.25 samples/sec Loss 0.9837 LearningRate 0.0001 Epoch: 30 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:47:56,897-Speed 9436.91 samples/sec Loss 0.9824 LearningRate 0.0001 Epoch: 30 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:48:22,983-Speed 9421.75 samples/sec Loss 0.9776 LearningRate 0.0001 Epoch: 30 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:48:49,038-Speed 9433.27 samples/sec Loss 0.9887 LearningRate 0.0001 Epoch: 30 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:49:15,121-Speed 9422.63 samples/sec Loss 0.9834 LearningRate 0.0001 Epoch: 30 Global Step: 52030 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:49:41,139-Speed 9446.06 samples/sec Loss 0.9801 LearningRate 0.0001 Epoch: 30 Global Step: 52040 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:50:07,155-Speed 9446.84 samples/sec Loss 0.9792 LearningRate 0.0001 Epoch: 30 Global Step: 52050 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:50:33,229-Speed 9425.81 samples/sec Loss 0.9907 LearningRate 0.0001 Epoch: 30 Global Step: 52060 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:50:59,368-Speed 9402.35 samples/sec Loss 0.9824 LearningRate 0.0001 Epoch: 30 Global Step: 52070 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:51:25,425-Speed 9432.20 samples/sec Loss 0.9783 LearningRate 0.0001 Epoch: 30 Global Step: 52080 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:51:51,540-Speed 9411.05 samples/sec Loss 0.9885 LearningRate 0.0001 Epoch: 30 Global Step: 52090 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:52:17,593-Speed 9433.41 samples/sec Loss 0.9829 LearningRate 0.0001 Epoch: 30 Global Step: 52100 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:52:43,635-Speed 9437.49 samples/sec Loss 0.9866 LearningRate 0.0001 Epoch: 30 Global Step: 52110 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:53:09,783-Speed 9399.13 samples/sec Loss 0.9843 LearningRate 0.0001 Epoch: 30 Global Step: 52120 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-03-06 10:53:35,888-Speed 9414.66 samples/sec Loss 0.9836 LearningRate 0.0001 Epoch: 30 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-03-06 10:54:02,062-Speed 9389.90 samples/sec Loss 0.9858 LearningRate 0.0001 Epoch: 30 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 10:54:28,159-Speed 9417.52 samples/sec Loss 0.9824 LearningRate 0.0001 Epoch: 30 Global Step: 52150 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:54:54,270-Speed 9412.34 samples/sec Loss 0.9842 LearningRate 0.0001 Epoch: 30 Global Step: 52160 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:55:20,411-Speed 9402.08 samples/sec Loss 0.9840 LearningRate 0.0001 Epoch: 30 Global Step: 52170 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:55:46,533-Speed 9408.35 samples/sec Loss 0.9891 LearningRate 0.0001 Epoch: 30 Global Step: 52180 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:56:12,660-Speed 9406.91 samples/sec Loss 0.9801 LearningRate 0.0001 Epoch: 30 Global Step: 52190 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:56:39,017-Speed 9324.69 samples/sec Loss 0.9842 LearningRate 0.0001 Epoch: 30 Global Step: 52200 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:57:05,099-Speed 9422.80 samples/sec Loss 0.9822 LearningRate 0.0001 Epoch: 30 Global Step: 52210 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:57:31,294-Speed 9382.60 samples/sec Loss 0.9797 LearningRate 0.0001 Epoch: 30 Global Step: 52220 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:57:57,458-Speed 9393.10 samples/sec Loss 0.9767 LearningRate 0.0001 Epoch: 30 Global Step: 52230 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:58:23,584-Speed 9407.50 samples/sec Loss 0.9839 LearningRate 0.0001 Epoch: 30 Global Step: 52240 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:58:49,747-Speed 9393.70 samples/sec Loss 0.9804 LearningRate 0.0001 Epoch: 30 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 10:59:15,826-Speed 9423.74 samples/sec Loss 0.9790 LearningRate 0.0001 Epoch: 30 Global Step: 52260 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 10:59:41,968-Speed 9401.75 samples/sec Loss 0.9823 LearningRate 0.0001 Epoch: 30 Global Step: 52270 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:00:08,230-Speed 9358.46 samples/sec Loss 0.9752 LearningRate 0.0001 Epoch: 30 Global Step: 52280 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:00:34,394-Speed 9393.22 samples/sec Loss 0.9789 LearningRate 0.0001 Epoch: 30 Global Step: 52290 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:01:00,575-Speed 9387.52 samples/sec Loss 0.9880 LearningRate 0.0001 Epoch: 30 Global Step: 52300 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:01:26,888-Speed 9340.42 samples/sec Loss 0.9784 LearningRate 0.0001 Epoch: 30 Global Step: 52310 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:01:53,045-Speed 9396.12 samples/sec Loss 0.9789 LearningRate 0.0001 Epoch: 30 Global Step: 52320 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:02:19,193-Speed 9399.14 samples/sec Loss 0.9788 LearningRate 0.0001 Epoch: 30 Global Step: 52330 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:02:45,378-Speed 9386.34 samples/sec Loss 0.9800 LearningRate 0.0001 Epoch: 30 Global Step: 52340 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:03:11,516-Speed 9402.61 samples/sec Loss 0.9700 LearningRate 0.0001 Epoch: 30 Global Step: 52350 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:03:37,684-Speed 9392.17 samples/sec Loss 0.9797 LearningRate 0.0001 Epoch: 30 Global Step: 52360 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:04:03,803-Speed 9409.74 samples/sec Loss 0.9825 LearningRate 0.0001 Epoch: 30 Global Step: 52370 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:04:29,947-Speed 9400.45 samples/sec Loss 0.9820 LearningRate 0.0001 Epoch: 30 Global Step: 52380 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:04:56,184-Speed 9367.41 samples/sec Loss 0.9784 LearningRate 0.0001 Epoch: 30 Global Step: 52390 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:05:22,266-Speed 9423.16 samples/sec Loss 0.9725 LearningRate 0.0001 Epoch: 30 Global Step: 52400 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:05:48,381-Speed 9411.19 samples/sec Loss 0.9806 LearningRate 0.0001 Epoch: 30 Global Step: 52410 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:06:14,617-Speed 9367.84 samples/sec Loss 0.9743 LearningRate 0.0001 Epoch: 30 Global Step: 52420 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:06:40,770-Speed 9397.44 samples/sec Loss 0.9786 LearningRate 0.0001 Epoch: 30 Global Step: 52430 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:07:06,935-Speed 9393.17 samples/sec Loss 0.9815 LearningRate 0.0001 Epoch: 30 Global Step: 52440 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:07:33,053-Speed 9410.16 samples/sec Loss 0.9744 LearningRate 0.0001 Epoch: 30 Global Step: 52450 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:07:59,278-Speed 9371.79 samples/sec Loss 0.9727 LearningRate 0.0001 Epoch: 30 Global Step: 52460 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:08:25,367-Speed 9420.50 samples/sec Loss 0.9736 LearningRate 0.0001 Epoch: 30 Global Step: 52470 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:08:51,571-Speed 9379.15 samples/sec Loss 0.9711 LearningRate 0.0001 Epoch: 30 Global Step: 52480 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:09:17,727-Speed 9396.21 samples/sec Loss 0.9774 LearningRate 0.0001 Epoch: 30 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:09:43,966-Speed 9366.92 samples/sec Loss 0.9716 LearningRate 0.0001 Epoch: 30 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:10:10,137-Speed 9390.91 samples/sec Loss 0.9727 LearningRate 0.0001 Epoch: 30 Global Step: 52510 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:10:36,271-Speed 9404.16 samples/sec Loss 0.9709 LearningRate 0.0001 Epoch: 30 Global Step: 52520 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:11:03,694-Speed 8962.08 samples/sec Loss 0.9691 LearningRate 0.0001 Epoch: 30 Global Step: 52530 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:11:29,836-Speed 9401.39 samples/sec Loss 0.9780 LearningRate 0.0001 Epoch: 30 Global Step: 52540 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:11:55,958-Speed 9408.76 samples/sec Loss 0.9691 LearningRate 0.0001 Epoch: 30 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:12:22,061-Speed 9415.51 samples/sec Loss 0.9757 LearningRate 0.0001 Epoch: 30 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:12:48,200-Speed 9402.57 samples/sec Loss 0.9773 LearningRate 0.0001 Epoch: 30 Global Step: 52570 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:13:14,364-Speed 9393.51 samples/sec Loss 0.9719 LearningRate 0.0001 Epoch: 30 Global Step: 52580 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:13:40,604-Speed 9366.46 samples/sec Loss 0.9731 LearningRate 0.0001 Epoch: 30 Global Step: 52590 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:14:06,753-Speed 9398.99 samples/sec Loss 0.9784 LearningRate 0.0001 Epoch: 30 Global Step: 52600 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:14:32,931-Speed 9388.32 samples/sec Loss 0.9719 LearningRate 0.0001 Epoch: 30 Global Step: 52610 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:14:59,065-Speed 9404.45 samples/sec Loss 0.9792 LearningRate 0.0001 Epoch: 30 Global Step: 52620 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:15:25,269-Speed 9379.17 samples/sec Loss 0.9659 LearningRate 0.0001 Epoch: 30 Global Step: 52630 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:15:51,443-Speed 9389.68 samples/sec Loss 0.9720 LearningRate 0.0001 Epoch: 30 Global Step: 52640 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:16:17,571-Speed 9406.33 samples/sec Loss 0.9679 LearningRate 0.0001 Epoch: 30 Global Step: 52650 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:16:43,750-Speed 9388.37 samples/sec Loss 0.9790 LearningRate 0.0001 Epoch: 30 Global Step: 52660 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:17:09,870-Speed 9409.15 samples/sec Loss 0.9741 LearningRate 0.0001 Epoch: 30 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:17:36,165-Speed 9347.21 samples/sec Loss 0.9669 LearningRate 0.0001 Epoch: 30 Global Step: 52680 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:18:02,391-Speed 9371.23 samples/sec Loss 0.9642 LearningRate 0.0001 Epoch: 30 Global Step: 52690 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:18:28,563-Speed 9390.54 samples/sec Loss 0.9728 LearningRate 0.0001 Epoch: 30 Global Step: 52700 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:18:54,785-Speed 9372.72 samples/sec Loss 0.9711 LearningRate 0.0001 Epoch: 30 Global Step: 52710 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:19:21,086-Speed 9344.49 samples/sec Loss 0.9614 LearningRate 0.0001 Epoch: 30 Global Step: 52720 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:19:47,249-Speed 9394.45 samples/sec Loss 0.9677 LearningRate 0.0001 Epoch: 30 Global Step: 52730 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:20:13,509-Speed 9359.22 samples/sec Loss 0.9595 LearningRate 0.0001 Epoch: 30 Global Step: 52740 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:20:39,690-Speed 9387.36 samples/sec Loss 0.9692 LearningRate 0.0001 Epoch: 30 Global Step: 52750 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:21:05,959-Speed 9355.74 samples/sec Loss 0.9680 LearningRate 0.0001 Epoch: 30 Global Step: 52760 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:21:32,133-Speed 9390.09 samples/sec Loss 0.9624 LearningRate 0.0001 Epoch: 30 Global Step: 52770 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:21:58,338-Speed 9378.76 samples/sec Loss 0.9633 LearningRate 0.0001 Epoch: 30 Global Step: 52780 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:22:24,524-Speed 9385.39 samples/sec Loss 0.9742 LearningRate 0.0001 Epoch: 30 Global Step: 52790 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:22:50,655-Speed 9405.52 samples/sec Loss 0.9734 LearningRate 0.0001 Epoch: 30 Global Step: 52800 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:23:16,842-Speed 9385.12 samples/sec Loss 0.9711 LearningRate 0.0001 Epoch: 30 Global Step: 52810 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:23:42,942-Speed 9416.52 samples/sec Loss 0.9637 LearningRate 0.0001 Epoch: 30 Global Step: 52820 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:24:09,109-Speed 9392.58 samples/sec Loss 0.9633 LearningRate 0.0001 Epoch: 30 Global Step: 52830 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:24:35,244-Speed 9403.71 samples/sec Loss 0.9619 LearningRate 0.0001 Epoch: 30 Global Step: 52840 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:25:01,367-Speed 9408.42 samples/sec Loss 0.9670 LearningRate 0.0001 Epoch: 30 Global Step: 52850 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:25:27,556-Speed 9384.46 samples/sec Loss 0.9698 LearningRate 0.0001 Epoch: 30 Global Step: 52860 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-03-06 11:25:53,696-Speed 9402.46 samples/sec Loss 0.9603 LearningRate 0.0001 Epoch: 30 Global Step: 52870 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:26:19,823-Speed 9406.84 samples/sec Loss 0.9547 LearningRate 0.0001 Epoch: 30 Global Step: 52880 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:26:45,946-Speed 9407.83 samples/sec Loss 0.9632 LearningRate 0.0001 Epoch: 30 Global Step: 52890 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:27:12,080-Speed 9404.50 samples/sec Loss 0.9703 LearningRate 0.0001 Epoch: 30 Global Step: 52900 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:27:38,273-Speed 9383.00 samples/sec Loss 0.9612 LearningRate 0.0001 Epoch: 30 Global Step: 52910 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:28:04,492-Speed 9373.95 samples/sec Loss 0.9708 LearningRate 0.0001 Epoch: 30 Global Step: 52920 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:28:30,580-Speed 9420.93 samples/sec Loss 0.9641 LearningRate 0.0001 Epoch: 30 Global Step: 52930 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:28:56,699-Speed 9409.79 samples/sec Loss 0.9606 LearningRate 0.0001 Epoch: 30 Global Step: 52940 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:29:22,781-Speed 9422.87 samples/sec Loss 0.9647 LearningRate 0.0001 Epoch: 30 Global Step: 52950 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:29:48,939-Speed 9395.73 samples/sec Loss 0.9538 LearningRate 0.0001 Epoch: 30 Global Step: 52960 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:30:15,133-Speed 9382.93 samples/sec Loss 0.9595 LearningRate 0.0001 Epoch: 30 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:30:41,310-Speed 9388.51 samples/sec Loss 0.9625 LearningRate 0.0001 Epoch: 30 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:31:07,422-Speed 9412.32 samples/sec Loss 0.9587 LearningRate 0.0001 Epoch: 30 Global Step: 52990 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:31:33,622-Speed 9380.46 samples/sec Loss 0.9650 LearningRate 0.0001 Epoch: 30 Global Step: 53000 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:31:59,786-Speed 9393.86 samples/sec Loss 0.9601 LearningRate 0.0001 Epoch: 30 Global Step: 53010 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:32:25,898-Speed 9412.01 samples/sec Loss 0.9587 LearningRate 0.0001 Epoch: 30 Global Step: 53020 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:32:52,040-Speed 9401.23 samples/sec Loss 0.9661 LearningRate 0.0001 Epoch: 30 Global Step: 53030 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:33:18,141-Speed 9415.84 samples/sec Loss 0.9609 LearningRate 0.0001 Epoch: 30 Global Step: 53040 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:33:44,293-Speed 9397.80 samples/sec Loss 0.9514 LearningRate 0.0001 Epoch: 30 Global Step: 53050 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:34:10,398-Speed 9414.75 samples/sec Loss 0.9554 LearningRate 0.0001 Epoch: 30 Global Step: 53060 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:34:36,492-Speed 9418.78 samples/sec Loss 0.9490 LearningRate 0.0001 Epoch: 30 Global Step: 53070 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:35:02,581-Speed 9420.21 samples/sec Loss 0.9551 LearningRate 0.0001 Epoch: 30 Global Step: 53080 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:35:28,668-Speed 9421.27 samples/sec Loss 0.9566 LearningRate 0.0001 Epoch: 30 Global Step: 53090 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:35:54,767-Speed 9416.73 samples/sec Loss 0.9583 LearningRate 0.0001 Epoch: 30 Global Step: 53100 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:36:20,880-Speed 9411.86 samples/sec Loss 0.9546 LearningRate 0.0001 Epoch: 30 Global Step: 53110 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:36:47,059-Speed 9388.27 samples/sec Loss 0.9636 LearningRate 0.0001 Epoch: 30 Global Step: 53120 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:37:13,222-Speed 9393.52 samples/sec Loss 0.9565 LearningRate 0.0001 Epoch: 30 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:37:39,442-Speed 9373.56 samples/sec Loss 0.9610 LearningRate 0.0001 Epoch: 30 Global Step: 53140 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:38:05,587-Speed 9400.25 samples/sec Loss 0.9628 LearningRate 0.0001 Epoch: 30 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:38:31,749-Speed 9394.34 samples/sec Loss 0.9574 LearningRate 0.0001 Epoch: 30 Global Step: 53160 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:38:57,841-Speed 9419.11 samples/sec Loss 0.9609 LearningRate 0.0001 Epoch: 30 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:39:23,967-Speed 9407.24 samples/sec Loss 0.9634 LearningRate 0.0001 Epoch: 30 Global Step: 53180 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:39:50,260-Speed 9347.43 samples/sec Loss 0.9589 LearningRate 0.0001 Epoch: 30 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-03-06 11:40:16,336-Speed 9425.24 samples/sec Loss 0.9607 LearningRate 0.0001 Epoch: 30 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:40:42,559-Speed 9372.48 samples/sec Loss 0.9559 LearningRate 0.0001 Epoch: 30 Global Step: 53210 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:41:08,712-Speed 9397.55 samples/sec Loss 0.9591 LearningRate 0.0001 Epoch: 30 Global Step: 53220 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:41:34,841-Speed 9406.11 samples/sec Loss 0.9581 LearningRate 0.0001 Epoch: 30 Global Step: 53230 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:42:01,064-Speed 9372.43 samples/sec Loss 0.9594 LearningRate 0.0001 Epoch: 30 Global Step: 53240 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:42:27,191-Speed 9407.20 samples/sec Loss 0.9549 LearningRate 0.0001 Epoch: 30 Global Step: 53250 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:42:53,285-Speed 9418.78 samples/sec Loss 0.9557 LearningRate 0.0001 Epoch: 30 Global Step: 53260 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:43:19,440-Speed 9396.65 samples/sec Loss 0.9542 LearningRate 0.0001 Epoch: 30 Global Step: 53270 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:43:45,664-Speed 9372.15 samples/sec Loss 0.9514 LearningRate 0.0001 Epoch: 30 Global Step: 53280 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:44:11,785-Speed 9408.78 samples/sec Loss 0.9518 LearningRate 0.0001 Epoch: 30 Global Step: 53290 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:44:37,936-Speed 9399.58 samples/sec Loss 0.9564 LearningRate 0.0001 Epoch: 30 Global Step: 53300 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:45:04,284-Speed 9328.05 samples/sec Loss 0.9558 LearningRate 0.0001 Epoch: 30 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:45:30,472-Speed 9384.91 samples/sec Loss 0.9594 LearningRate 0.0001 Epoch: 30 Global Step: 53320 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:45:56,659-Speed 9385.28 samples/sec Loss 0.9561 LearningRate 0.0001 Epoch: 30 Global Step: 53330 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:46:22,810-Speed 9398.15 samples/sec Loss 0.9516 LearningRate 0.0001 Epoch: 30 Global Step: 53340 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:46:48,899-Speed 9420.77 samples/sec Loss 0.9557 LearningRate 0.0001 Epoch: 30 Global Step: 53350 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:47:15,082-Speed 9386.24 samples/sec Loss 0.9531 LearningRate 0.0001 Epoch: 30 Global Step: 53360 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:47:41,219-Speed 9403.34 samples/sec Loss 0.9580 LearningRate 0.0001 Epoch: 30 Global Step: 53370 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:48:07,420-Speed 9380.10 samples/sec Loss 0.9539 LearningRate 0.0001 Epoch: 30 Global Step: 53380 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:48:33,610-Speed 9384.46 samples/sec Loss 0.9525 LearningRate 0.0001 Epoch: 30 Global Step: 53390 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:48:59,766-Speed 9396.43 samples/sec Loss 0.9550 LearningRate 0.0001 Epoch: 30 Global Step: 53400 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:49:26,052-Speed 9349.71 samples/sec Loss 0.9508 LearningRate 0.0001 Epoch: 30 Global Step: 53410 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:49:52,217-Speed 9393.56 samples/sec Loss 0.9579 LearningRate 0.0001 Epoch: 30 Global Step: 53420 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-03-06 11:50:18,326-Speed 9413.38 samples/sec Loss 0.9507 LearningRate 0.0001 Epoch: 30 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:50:44,500-Speed 9390.61 samples/sec Loss 0.9508 LearningRate 0.0001 Epoch: 30 Global Step: 53440 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:51:10,661-Speed 9394.50 samples/sec Loss 0.9593 LearningRate 0.0001 Epoch: 30 Global Step: 53450 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:51:36,820-Speed 9395.58 samples/sec Loss 0.9533 LearningRate 0.0001 Epoch: 30 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:52:03,106-Speed 9349.90 samples/sec Loss 0.9472 LearningRate 0.0001 Epoch: 30 Global Step: 53470 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:52:29,318-Speed 9376.46 samples/sec Loss 0.9522 LearningRate 0.0001 Epoch: 30 Global Step: 53480 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:52:55,466-Speed 9398.86 samples/sec Loss 0.9549 LearningRate 0.0001 Epoch: 30 Global Step: 53490 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-03-06 11:53:21,615-Speed 9398.99 samples/sec Loss 0.9519 LearningRate 0.0001 Epoch: 30 Global Step: 53500 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 11:53:47,770-Speed 9396.68 samples/sec Loss 0.9577 LearningRate 0.0001 Epoch: 30 Global Step: 53510 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:54:13,934-Speed 9393.34 samples/sec Loss 0.9463 LearningRate 0.0001 Epoch: 30 Global Step: 53520 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:54:40,083-Speed 9398.82 samples/sec Loss 0.9486 LearningRate 0.0001 Epoch: 30 Global Step: 53530 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:55:06,215-Speed 9405.13 samples/sec Loss 0.9465 LearningRate 0.0001 Epoch: 30 Global Step: 53540 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:55:32,286-Speed 9426.96 samples/sec Loss 0.9435 LearningRate 0.0001 Epoch: 30 Global Step: 53550 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:55:58,425-Speed 9402.26 samples/sec Loss 0.9625 LearningRate 0.0001 Epoch: 30 Global Step: 53560 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:56:24,616-Speed 9384.00 samples/sec Loss 0.9594 LearningRate 0.0001 Epoch: 30 Global Step: 53570 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:57:43,282-Speed 3124.16 samples/sec Loss 0.9532 LearningRate 0.0001 Epoch: 31 Global Step: 53580 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:58:09,136-Speed 9506.04 samples/sec Loss 0.9442 LearningRate 0.0001 Epoch: 31 Global Step: 53590 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:58:35,095-Speed 9467.54 samples/sec Loss 0.9464 LearningRate 0.0001 Epoch: 31 Global Step: 53600 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 11:59:01,192-Speed 9417.74 samples/sec Loss 0.9413 LearningRate 0.0001 Epoch: 31 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 11:59:27,387-Speed 9382.40 samples/sec Loss 0.9434 LearningRate 0.0001 Epoch: 31 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 11:59:53,551-Speed 9393.56 samples/sec Loss 0.9514 LearningRate 0.0001 Epoch: 31 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:00:19,807-Speed 9360.36 samples/sec Loss 0.9425 LearningRate 0.0001 Epoch: 31 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:00:46,014-Speed 9378.20 samples/sec Loss 0.9438 LearningRate 0.0001 Epoch: 31 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:01:12,147-Speed 9404.64 samples/sec Loss 0.9442 LearningRate 0.0001 Epoch: 31 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:01:38,260-Speed 9411.91 samples/sec Loss 0.9409 LearningRate 0.0001 Epoch: 31 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:02:04,365-Speed 9414.56 samples/sec Loss 0.9421 LearningRate 0.0001 Epoch: 31 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:02:30,493-Speed 9406.12 samples/sec Loss 0.9435 LearningRate 0.0001 Epoch: 31 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:02:56,651-Speed 9395.88 samples/sec Loss 0.9444 LearningRate 0.0001 Epoch: 31 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:03:22,774-Speed 9408.08 samples/sec Loss 0.9425 LearningRate 0.0001 Epoch: 31 Global Step: 53710 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-03-06 12:03:48,867-Speed 9419.36 samples/sec Loss 0.9410 LearningRate 0.0001 Epoch: 31 Global Step: 53720 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:04:14,940-Speed 9426.38 samples/sec Loss 0.9450 LearningRate 0.0001 Epoch: 31 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:04:41,097-Speed 9395.96 samples/sec Loss 0.9443 LearningRate 0.0001 Epoch: 31 Global Step: 53740 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:05:07,209-Speed 9412.05 samples/sec Loss 0.9478 LearningRate 0.0001 Epoch: 31 Global Step: 53750 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:05:33,310-Speed 9416.38 samples/sec Loss 0.9406 LearningRate 0.0001 Epoch: 31 Global Step: 53760 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:05:59,413-Speed 9415.70 samples/sec Loss 0.9419 LearningRate 0.0001 Epoch: 31 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:06:25,549-Speed 9404.62 samples/sec Loss 0.9341 LearningRate 0.0001 Epoch: 31 Global Step: 53780 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:06:58,657-Speed 7423.28 samples/sec Loss 0.9438 LearningRate 0.0001 Epoch: 31 Global Step: 53790 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:07:24,818-Speed 9394.38 samples/sec Loss 0.9460 LearningRate 0.0001 Epoch: 31 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:07:50,969-Speed 9398.39 samples/sec Loss 0.9431 LearningRate 0.0001 Epoch: 31 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:08:17,110-Speed 9401.58 samples/sec Loss 0.9483 LearningRate 0.0001 Epoch: 31 Global Step: 53820 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-03-06 12:08:43,136-Speed 9443.21 samples/sec Loss 0.9460 LearningRate 0.0001 Epoch: 31 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:09:09,326-Speed 9384.12 samples/sec Loss 0.9433 LearningRate 0.0001 Epoch: 31 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:09:35,515-Speed 9384.95 samples/sec Loss 0.9428 LearningRate 0.0001 Epoch: 31 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:10:01,638-Speed 9408.06 samples/sec Loss 0.9404 LearningRate 0.0001 Epoch: 31 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:10:27,846-Speed 9377.73 samples/sec Loss 0.9370 LearningRate 0.0001 Epoch: 31 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:10:54,037-Speed 9384.00 samples/sec Loss 0.9387 LearningRate 0.0001 Epoch: 31 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:11:20,215-Speed 9388.53 samples/sec Loss 0.9429 LearningRate 0.0001 Epoch: 31 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:11:46,419-Speed 9378.98 samples/sec Loss 0.9469 LearningRate 0.0001 Epoch: 31 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:12:12,578-Speed 9395.02 samples/sec Loss 0.9378 LearningRate 0.0001 Epoch: 31 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:12:38,735-Speed 9396.28 samples/sec Loss 0.9375 LearningRate 0.0001 Epoch: 31 Global Step: 53920 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:13:04,919-Speed 9386.25 samples/sec Loss 0.9439 LearningRate 0.0001 Epoch: 31 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-03-06 12:13:30,985-Speed 9429.46 samples/sec Loss 0.9475 LearningRate 0.0001 Epoch: 31 Global Step: 53940 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:13:57,171-Speed 9385.66 samples/sec Loss 0.9429 LearningRate 0.0001 Epoch: 31 Global Step: 53950 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:14:23,282-Speed 9412.14 samples/sec Loss 0.9340 LearningRate 0.0001 Epoch: 31 Global Step: 53960 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:14:49,446-Speed 9393.65 samples/sec Loss 0.9457 LearningRate 0.0001 Epoch: 31 Global Step: 53970 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:15:15,637-Speed 9384.46 samples/sec Loss 0.9404 LearningRate 0.0001 Epoch: 31 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:15:43,867-Speed 8705.82 samples/sec Loss 0.9412 LearningRate 0.0001 Epoch: 31 Global Step: 53990 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:16:09,986-Speed 9409.91 samples/sec Loss 0.9363 LearningRate 0.0001 Epoch: 31 Global Step: 54000 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:16:36,137-Speed 9399.02 samples/sec Loss 0.9407 LearningRate 0.0001 Epoch: 31 Global Step: 54010 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:17:02,281-Speed 9400.90 samples/sec Loss 0.9468 LearningRate 0.0001 Epoch: 31 Global Step: 54020 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:17:28,325-Speed 9436.79 samples/sec Loss 0.9463 LearningRate 0.0001 Epoch: 31 Global Step: 54030 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:17:54,541-Speed 9374.93 samples/sec Loss 0.9326 LearningRate 0.0001 Epoch: 31 Global Step: 54040 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:18:20,754-Speed 9376.00 samples/sec Loss 0.9448 LearningRate 0.0001 Epoch: 31 Global Step: 54050 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:18:46,822-Speed 9428.22 samples/sec Loss 0.9381 LearningRate 0.0001 Epoch: 31 Global Step: 54060 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:19:12,958-Speed 9403.37 samples/sec Loss 0.9507 LearningRate 0.0001 Epoch: 31 Global Step: 54070 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:19:39,082-Speed 9407.94 samples/sec Loss 0.9385 LearningRate 0.0001 Epoch: 31 Global Step: 54080 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:20:05,215-Speed 9404.60 samples/sec Loss 0.9360 LearningRate 0.0001 Epoch: 31 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:20:31,370-Speed 9396.98 samples/sec Loss 0.9377 LearningRate 0.0001 Epoch: 31 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:20:57,473-Speed 9415.59 samples/sec Loss 0.9372 LearningRate 0.0001 Epoch: 31 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:21:23,641-Speed 9391.95 samples/sec Loss 0.9341 LearningRate 0.0001 Epoch: 31 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:21:49,745-Speed 9414.99 samples/sec Loss 0.9352 LearningRate 0.0001 Epoch: 31 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:22:15,898-Speed 9397.81 samples/sec Loss 0.9411 LearningRate 0.0001 Epoch: 31 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:22:42,057-Speed 9395.18 samples/sec Loss 0.9319 LearningRate 0.0001 Epoch: 31 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:23:08,131-Speed 9426.57 samples/sec Loss 0.9355 LearningRate 0.0001 Epoch: 31 Global Step: 54160 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:23:34,335-Speed 9379.18 samples/sec Loss 0.9345 LearningRate 0.0001 Epoch: 31 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:24:00,591-Speed 9360.76 samples/sec Loss 0.9390 LearningRate 0.0001 Epoch: 31 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-06 12:24:26,768-Speed 9388.86 samples/sec Loss 0.9334 LearningRate 0.0001 Epoch: 31 Global Step: 54190 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-03-06 12:24:52,824-Speed 9432.53 samples/sec Loss 0.9350 LearningRate 0.0001 Epoch: 31 Global Step: 54200 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:25:18,965-Speed 9401.64 samples/sec Loss 0.9322 LearningRate 0.0001 Epoch: 31 Global Step: 54210 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:25:45,031-Speed 9428.78 samples/sec Loss 0.9363 LearningRate 0.0001 Epoch: 31 Global Step: 54220 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:26:11,192-Speed 9394.78 samples/sec Loss 0.9326 LearningRate 0.0001 Epoch: 31 Global Step: 54230 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:26:37,410-Speed 9374.03 samples/sec Loss 0.9361 LearningRate 0.0001 Epoch: 31 Global Step: 54240 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:27:03,512-Speed 9416.13 samples/sec Loss 0.9325 LearningRate 0.0001 Epoch: 31 Global Step: 54250 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:27:29,642-Speed 9405.86 samples/sec Loss 0.9293 LearningRate 0.0001 Epoch: 31 Global Step: 54260 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:27:55,802-Speed 9394.98 samples/sec Loss 0.9408 LearningRate 0.0001 Epoch: 31 Global Step: 54270 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:28:21,901-Speed 9416.90 samples/sec Loss 0.9340 LearningRate 0.0001 Epoch: 31 Global Step: 54280 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:28:48,041-Speed 9402.07 samples/sec Loss 0.9293 LearningRate 0.0001 Epoch: 31 Global Step: 54290 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:29:14,140-Speed 9416.78 samples/sec Loss 0.9356 LearningRate 0.0001 Epoch: 31 Global Step: 54300 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:29:40,285-Speed 9400.18 samples/sec Loss 0.9305 LearningRate 0.0001 Epoch: 31 Global Step: 54310 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:30:06,406-Speed 9409.17 samples/sec Loss 0.9334 LearningRate 0.0001 Epoch: 31 Global Step: 54320 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:30:32,530-Speed 9407.92 samples/sec Loss 0.9300 LearningRate 0.0001 Epoch: 31 Global Step: 54330 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:30:58,726-Speed 9381.81 samples/sec Loss 0.9391 LearningRate 0.0001 Epoch: 31 Global Step: 54340 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:31:24,788-Speed 9430.92 samples/sec Loss 0.9298 LearningRate 0.0001 Epoch: 31 Global Step: 54350 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:31:50,919-Speed 9405.41 samples/sec Loss 0.9352 LearningRate 0.0001 Epoch: 31 Global Step: 54360 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-06 12:32:17,026-Speed 9413.85 samples/sec Loss
0.9324 LearningRate 0.0001 Epoch: 31 Global Step: 54370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:32:43,139-Speed 9411.77 samples/sec Loss 0.9307 LearningRate 0.0001 Epoch: 31 Global Step: 54380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:33:09,324-Speed 9385.96 samples/sec Loss 0.9369 LearningRate 0.0001 Epoch: 31 Global Step: 54390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:33:35,390-Speed 9428.75 samples/sec Loss 0.9350 LearningRate 0.0001 Epoch: 31 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:34:01,517-Speed 9406.77 samples/sec Loss 0.9276 LearningRate 0.0001 Epoch: 31 Global Step: 54410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:34:27,639-Speed 9408.42 samples/sec Loss 0.9203 LearningRate 0.0001 Epoch: 31 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:34:53,787-Speed 9399.36 samples/sec Loss 0.9339 LearningRate 0.0001 Epoch: 31 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:35:19,964-Speed 9388.55 samples/sec Loss 0.9285 LearningRate 0.0001 Epoch: 31 Global Step: 54440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:35:46,146-Speed 9387.24 samples/sec Loss 0.9295 LearningRate 0.0001 Epoch: 31 Global Step: 54450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:36:12,345-Speed 9380.59 samples/sec Loss 0.9310 LearningRate 0.0001 Epoch: 31 Global Step: 54460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:36:38,530-Speed 9386.06 samples/sec Loss 0.9301 LearningRate 0.0001 Epoch: 31 Global Step: 54470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:37:04,771-Speed 9365.81 samples/sec Loss 0.9281 LearningRate 0.0001 Epoch: 31 Global Step: 54480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:37:30,983-Speed 9376.26 samples/sec Loss 0.9172 LearningRate 0.0001 Epoch: 31 Global 
Step: 54490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:37:57,097-Speed 9411.75 samples/sec Loss 0.9249 LearningRate 0.0001 Epoch: 31 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-03-06 12:38:23,210-Speed 9411.68 samples/sec Loss 0.9234 LearningRate 0.0001 Epoch: 31 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-03-06 12:38:49,323-Speed 9411.94 samples/sec Loss 0.9299 LearningRate 0.0001 Epoch: 31 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:39:15,473-Speed 9398.52 samples/sec Loss 0.9284 LearningRate 0.0001 Epoch: 31 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:39:41,601-Speed 9406.42 samples/sec Loss 0.9298 LearningRate 0.0001 Epoch: 31 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:40:07,691-Speed 9420.28 samples/sec Loss 0.9256 LearningRate 0.0001 Epoch: 31 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:40:33,799-Speed 9413.84 samples/sec Loss 0.9176 LearningRate 0.0001 Epoch: 31 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:40:59,983-Speed 9385.91 samples/sec Loss 0.9274 LearningRate 0.0001 Epoch: 31 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:41:26,127-Speed 9400.89 samples/sec Loss 0.9267 LearningRate 0.0001 Epoch: 31 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:41:52,229-Speed 9415.99 samples/sec Loss 0.9272 LearningRate 0.0001 Epoch: 31 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:42:18,389-Speed 9394.95 samples/sec Loss 0.9301 LearningRate 0.0001 Epoch: 31 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:42:44,441-Speed 9433.83 samples/sec Loss 0.9312 LearningRate 0.0001 Epoch: 31 Global Step: 54610 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-03-06 12:43:10,616-Speed 9389.29 samples/sec Loss 0.9244 LearningRate 0.0001 Epoch: 31 Global Step: 54620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:43:36,821-Speed 9379.95 samples/sec Loss 0.9275 LearningRate 0.0001 Epoch: 31 Global Step: 54630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:44:02,965-Speed 9400.89 samples/sec Loss 0.9278 LearningRate 0.0001 Epoch: 31 Global Step: 54640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:44:29,104-Speed 9402.43 samples/sec Loss 0.9229 LearningRate 0.0001 Epoch: 31 Global Step: 54650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:44:55,315-Speed 9376.55 samples/sec Loss 0.9268 LearningRate 0.0001 Epoch: 31 Global Step: 54660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:45:21,454-Speed 9402.58 samples/sec Loss 0.9229 LearningRate 0.0001 Epoch: 31 Global Step: 54670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:45:47,677-Speed 9372.23 samples/sec Loss 0.9193 LearningRate 0.0001 Epoch: 31 Global Step: 54680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:46:13,825-Speed 9399.41 samples/sec Loss 0.9223 LearningRate 0.0001 Epoch: 31 Global Step: 54690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:46:39,974-Speed 9398.59 samples/sec Loss 0.9205 LearningRate 0.0001 Epoch: 31 Global Step: 54700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:47:06,143-Speed 9391.66 samples/sec Loss 0.9222 LearningRate 0.0001 Epoch: 31 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:47:32,321-Speed 9388.45 samples/sec Loss 0.9239 LearningRate 0.0001 Epoch: 31 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:47:58,444-Speed 9408.32 samples/sec Loss 0.9230 LearningRate 0.0001 Epoch: 31 Global Step: 54730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 
12:48:24,573-Speed 9405.88 samples/sec Loss 0.9272 LearningRate 0.0001 Epoch: 31 Global Step: 54740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:48:50,674-Speed 9416.43 samples/sec Loss 0.9227 LearningRate 0.0001 Epoch: 31 Global Step: 54750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:49:16,811-Speed 9403.11 samples/sec Loss 0.9234 LearningRate 0.0001 Epoch: 31 Global Step: 54760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:49:42,944-Speed 9404.53 samples/sec Loss 0.9218 LearningRate 0.0001 Epoch: 31 Global Step: 54770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:50:09,271-Speed 9335.52 samples/sec Loss 0.9198 LearningRate 0.0001 Epoch: 31 Global Step: 54780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:50:35,422-Speed 9398.26 samples/sec Loss 0.9154 LearningRate 0.0001 Epoch: 31 Global Step: 54790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:51:01,572-Speed 9398.91 samples/sec Loss 0.9227 LearningRate 0.0001 Epoch: 31 Global Step: 54800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:51:27,738-Speed 9392.65 samples/sec Loss 0.9208 LearningRate 0.0001 Epoch: 31 Global Step: 54810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:51:53,894-Speed 9396.14 samples/sec Loss 0.9267 LearningRate 0.0001 Epoch: 31 Global Step: 54820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:52:20,012-Speed 9410.24 samples/sec Loss 0.9286 LearningRate 0.0001 Epoch: 31 Global Step: 54830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-06 12:52:46,140-Speed 9406.47 samples/sec Loss 0.9124 LearningRate 0.0001 Epoch: 31 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:53:12,262-Speed 9408.59 samples/sec Loss 0.9188 LearningRate 0.0001 Epoch: 31 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-06 12:53:38,557-Speed 9346.98 samples/sec Loss 
0.9206 LearningRate 0.0001 Epoch: 31 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 12:54:04,733-Speed 9389.07 samples/sec Loss 0.9265 LearningRate 0.0001 Epoch: 31 Global Step: 54870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:54:30,840-Speed 9413.84 samples/sec Loss 0.9293 LearningRate 0.0001 Epoch: 31 Global Step: 54880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:54:56,958-Speed 9410.00 samples/sec Loss 0.9224 LearningRate 0.0001 Epoch: 31 Global Step: 54890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:55:23,087-Speed 9406.36 samples/sec Loss 0.9182 LearningRate 0.0001 Epoch: 31 Global Step: 54900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:55:49,240-Speed 9397.35 samples/sec Loss 0.9157 LearningRate 0.0001 Epoch: 31 Global Step: 54910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:56:15,380-Speed 9402.34 samples/sec Loss 0.9098 LearningRate 0.0001 Epoch: 31 Global Step: 54920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:56:41,496-Speed 9410.68 samples/sec Loss 0.9134 LearningRate 0.0001 Epoch: 31 Global Step: 54930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:57:07,625-Speed 9405.87 samples/sec Loss 0.9218 LearningRate 0.0001 Epoch: 31 Global Step: 54940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:57:33,813-Speed 9385.04 samples/sec Loss 0.9326 LearningRate 0.0001 Epoch: 31 Global Step: 54950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:58:00,003-Speed 9384.11 samples/sec Loss 0.9214 LearningRate 0.0001 Epoch: 31 Global Step: 54960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 12:58:26,183-Speed 9387.76 samples/sec Loss 0.9221 LearningRate 0.0001 Epoch: 31 Global Step: 54970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 12:58:52,275-Speed 9419.66 samples/sec Loss 0.9123 LearningRate 0.0001 Epoch: 31 Global 
Step: 54980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 12:59:18,419-Speed 9400.56 samples/sec Loss 0.9130 LearningRate 0.0001 Epoch: 31 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 12:59:44,584-Speed 9392.77 samples/sec Loss 0.9163 LearningRate 0.0001 Epoch: 31 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:00:10,743-Speed 9395.39 samples/sec Loss 0.9163 LearningRate 0.0001 Epoch: 31 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:00:36,951-Speed 9377.88 samples/sec Loss 0.9225 LearningRate 0.0001 Epoch: 31 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:01:03,096-Speed 9399.95 samples/sec Loss 0.9100 LearningRate 0.0001 Epoch: 31 Global Step: 55030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:01:29,217-Speed 9409.06 samples/sec Loss 0.9173 LearningRate 0.0001 Epoch: 31 Global Step: 55040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:01:55,378-Speed 9394.25 samples/sec Loss 0.9232 LearningRate 0.0001 Epoch: 31 Global Step: 55050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:02:21,567-Speed 9384.71 samples/sec Loss 0.9198 LearningRate 0.0001 Epoch: 31 Global Step: 55060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:02:47,784-Speed 9374.72 samples/sec Loss 0.9170 LearningRate 0.0001 Epoch: 31 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-03-06 13:03:13,977-Speed 9382.82 samples/sec Loss 0.9156 LearningRate 0.0001 Epoch: 31 Global Step: 55080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:03:40,151-Speed 9390.03 samples/sec Loss 0.9155 LearningRate 0.0001 Epoch: 31 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:04:06,295-Speed 9400.70 samples/sec Loss 0.9109 LearningRate 0.0001 Epoch: 31 Global Step: 55100 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-03-06 13:04:32,435-Speed 9402.01 samples/sec Loss 0.9144 LearningRate 0.0001 Epoch: 31 Global Step: 55110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:04:58,700-Speed 9357.60 samples/sec Loss 0.9211 LearningRate 0.0001 Epoch: 31 Global Step: 55120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:05:24,861-Speed 9394.27 samples/sec Loss 0.9130 LearningRate 0.0001 Epoch: 31 Global Step: 55130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:05:51,056-Speed 9382.58 samples/sec Loss 0.9067 LearningRate 0.0001 Epoch: 31 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:06:17,234-Speed 9388.16 samples/sec Loss 0.9151 LearningRate 0.0001 Epoch: 31 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:06:43,428-Speed 9382.66 samples/sec Loss 0.9124 LearningRate 0.0001 Epoch: 31 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:07:09,561-Speed 9404.71 samples/sec Loss 0.9159 LearningRate 0.0001 Epoch: 31 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:07:35,678-Speed 9410.57 samples/sec Loss 0.9158 LearningRate 0.0001 Epoch: 31 Global Step: 55180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-03-06 13:08:01,973-Speed 9346.77 samples/sec Loss 0.9097 LearningRate 0.0001 Epoch: 31 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:08:28,106-Speed 9404.88 samples/sec Loss 0.9151 LearningRate 0.0001 Epoch: 31 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:08:54,268-Speed 9394.06 samples/sec Loss 0.9305 LearningRate 0.0000 Epoch: 31 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:09:20,431-Speed 9393.89 samples/sec Loss 0.9198 LearningRate 0.0000 Epoch: 31 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 
13:09:46,613-Speed 9387.15 samples/sec Loss 0.9096 LearningRate 0.0000 Epoch: 31 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:10:12,825-Speed 9376.45 samples/sec Loss 0.9186 LearningRate 0.0000 Epoch: 31 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:10:38,966-Speed 9401.81 samples/sec Loss 0.9130 LearningRate 0.0000 Epoch: 31 Global Step: 55250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:11:05,156-Speed 9384.28 samples/sec Loss 0.9145 LearningRate 0.0000 Epoch: 31 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:11:31,268-Speed 9412.09 samples/sec Loss 0.9235 LearningRate 0.0000 Epoch: 31 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:11:57,408-Speed 9402.09 samples/sec Loss 0.9141 LearningRate 0.0000 Epoch: 31 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:12:23,568-Speed 9394.99 samples/sec Loss 0.9181 LearningRate 0.0000 Epoch: 31 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:12:49,672-Speed 9414.92 samples/sec Loss 0.9184 LearningRate 0.0000 Epoch: 31 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:14:09,731-Speed 3069.78 samples/sec Loss 0.9126 LearningRate 0.0000 Epoch: 32 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:14:35,733-Speed 9451.92 samples/sec Loss 0.9113 LearningRate 0.0000 Epoch: 32 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:15:01,788-Speed 9432.82 samples/sec Loss 0.9082 LearningRate 0.0000 Epoch: 32 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:15:27,842-Speed 9433.13 samples/sec Loss 0.9124 LearningRate 0.0000 Epoch: 32 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:15:53,954-Speed 9412.36 samples/sec Loss 
0.9028 LearningRate 0.0000 Epoch: 32 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:16:20,138-Speed 9386.03 samples/sec Loss 0.8992 LearningRate 0.0000 Epoch: 32 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:16:46,236-Speed 9417.25 samples/sec Loss 0.8999 LearningRate 0.0000 Epoch: 32 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:17:12,361-Speed 9407.68 samples/sec Loss 0.9061 LearningRate 0.0000 Epoch: 32 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:17:38,419-Speed 9432.15 samples/sec Loss 0.9032 LearningRate 0.0000 Epoch: 32 Global Step: 55390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-03-06 13:18:04,527-Speed 9413.60 samples/sec Loss 0.9075 LearningRate 0.0000 Epoch: 32 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-03-06 13:18:30,624-Speed 9417.51 samples/sec Loss 0.9115 LearningRate 0.0000 Epoch: 32 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:18:56,780-Speed 9396.18 samples/sec Loss 0.9043 LearningRate 0.0000 Epoch: 32 Global Step: 55420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:19:22,922-Speed 9401.50 samples/sec Loss 0.9113 LearningRate 0.0000 Epoch: 32 Global Step: 55430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:19:49,044-Speed 9408.70 samples/sec Loss 0.9040 LearningRate 0.0000 Epoch: 32 Global Step: 55440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:20:15,272-Speed 9370.62 samples/sec Loss 0.8977 LearningRate 0.0000 Epoch: 32 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:20:41,426-Speed 9397.26 samples/sec Loss 0.9109 LearningRate 0.0000 Epoch: 32 Global Step: 55460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:21:07,666-Speed 9366.27 samples/sec Loss 0.9032 LearningRate 0.0000 Epoch: 32 
Global Step: 55470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:21:33,833-Speed 9392.55 samples/sec Loss 0.9093 LearningRate 0.0000 Epoch: 32 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:21:59,916-Speed 9422.58 samples/sec Loss 0.9043 LearningRate 0.0000 Epoch: 32 Global Step: 55490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:22:26,078-Speed 9394.28 samples/sec Loss 0.9034 LearningRate 0.0000 Epoch: 32 Global Step: 55500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:22:52,243-Speed 9392.94 samples/sec Loss 0.9132 LearningRate 0.0000 Epoch: 32 Global Step: 55510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:23:18,370-Speed 9406.77 samples/sec Loss 0.9152 LearningRate 0.0000 Epoch: 32 Global Step: 55520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:23:44,564-Speed 9382.85 samples/sec Loss 0.9056 LearningRate 0.0000 Epoch: 32 Global Step: 55530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:24:10,775-Speed 9376.66 samples/sec Loss 0.9056 LearningRate 0.0000 Epoch: 32 Global Step: 55540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:24:36,928-Speed 9397.47 samples/sec Loss 0.9084 LearningRate 0.0000 Epoch: 32 Global Step: 55550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:25:03,049-Speed 9408.75 samples/sec Loss 0.9031 LearningRate 0.0000 Epoch: 32 Global Step: 55560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:25:29,137-Speed 9421.03 samples/sec Loss 0.9013 LearningRate 0.0000 Epoch: 32 Global Step: 55570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:25:55,252-Speed 9411.13 samples/sec Loss 0.9073 LearningRate 0.0000 Epoch: 32 Global Step: 55580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:26:21,405-Speed 9397.32 samples/sec Loss 0.9047 LearningRate 0.0000 Epoch: 32 Global Step: 55590 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-03-06 13:26:47,555-Speed 9398.76 samples/sec Loss 0.9071 LearningRate 0.0000 Epoch: 32 Global Step: 55600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:27:13,692-Speed 9403.31 samples/sec Loss 0.9074 LearningRate 0.0000 Epoch: 32 Global Step: 55610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:27:39,837-Speed 9399.94 samples/sec Loss 0.9106 LearningRate 0.0000 Epoch: 32 Global Step: 55620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:28:05,940-Speed 9415.74 samples/sec Loss 0.9054 LearningRate 0.0000 Epoch: 32 Global Step: 55630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:28:32,108-Speed 9391.76 samples/sec Loss 0.9052 LearningRate 0.0000 Epoch: 32 Global Step: 55640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:28:58,221-Speed 9411.95 samples/sec Loss 0.9015 LearningRate 0.0000 Epoch: 32 Global Step: 55650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:29:24,316-Speed 9418.28 samples/sec Loss 0.9074 LearningRate 0.0000 Epoch: 32 Global Step: 55660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:29:50,462-Speed 9399.79 samples/sec Loss 0.9049 LearningRate 0.0000 Epoch: 32 Global Step: 55670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:30:16,561-Speed 9417.07 samples/sec Loss 0.9046 LearningRate 0.0000 Epoch: 32 Global Step: 55680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:30:42,723-Speed 9394.00 samples/sec Loss 0.9157 LearningRate 0.0000 Epoch: 32 Global Step: 55690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:31:08,870-Speed 9399.90 samples/sec Loss 0.9132 LearningRate 0.0000 Epoch: 32 Global Step: 55700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:31:35,047-Speed 9388.69 samples/sec Loss 0.9074 LearningRate 0.0000 Epoch: 32 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 
13:32:01,243-Speed 9382.12 samples/sec Loss 0.9006 LearningRate 0.0000 Epoch: 32 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:32:27,442-Speed 9381.26 samples/sec Loss 0.9123 LearningRate 0.0000 Epoch: 32 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:32:53,610-Speed 9391.80 samples/sec Loss 0.9057 LearningRate 0.0000 Epoch: 32 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:33:19,890-Speed 9352.05 samples/sec Loss 0.9022 LearningRate 0.0000 Epoch: 32 Global Step: 55750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:33:46,100-Speed 9377.23 samples/sec Loss 0.9048 LearningRate 0.0000 Epoch: 32 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:34:12,331-Speed 9369.66 samples/sec Loss 0.9082 LearningRate 0.0000 Epoch: 32 Global Step: 55770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:34:38,589-Speed 9359.62 samples/sec Loss 0.9012 LearningRate 0.0000 Epoch: 32 Global Step: 55780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:35:04,744-Speed 9396.79 samples/sec Loss 0.8998 LearningRate 0.0000 Epoch: 32 Global Step: 55790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:35:30,879-Speed 9403.59 samples/sec Loss 0.8998 LearningRate 0.0000 Epoch: 32 Global Step: 55800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:35:57,114-Speed 9368.39 samples/sec Loss 0.8993 LearningRate 0.0000 Epoch: 32 Global Step: 55810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-03-06 13:36:23,302-Speed 9384.83 samples/sec Loss 0.9061 LearningRate 0.0000 Epoch: 32 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:36:49,485-Speed 9386.90 samples/sec Loss 0.9017 LearningRate 0.0000 Epoch: 32 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:37:15,577-Speed 9419.43 samples/sec 
Loss 0.9010 LearningRate 0.0000 Epoch: 32 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:37:41,734-Speed 9396.00 samples/sec Loss 0.9010 LearningRate 0.0000 Epoch: 32 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:38:07,829-Speed 9418.54 samples/sec Loss 0.9019 LearningRate 0.0000 Epoch: 32 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:38:33,961-Speed 9405.01 samples/sec Loss 0.8989 LearningRate 0.0000 Epoch: 32 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:39:00,139-Speed 9389.06 samples/sec Loss 0.9013 LearningRate 0.0000 Epoch: 32 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:39:26,351-Speed 9375.99 samples/sec Loss 0.8963 LearningRate 0.0000 Epoch: 32 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:39:52,516-Speed 9393.34 samples/sec Loss 0.8962 LearningRate 0.0000 Epoch: 32 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:40:18,732-Speed 9375.29 samples/sec Loss 0.8993 LearningRate 0.0000 Epoch: 32 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:40:44,747-Speed 9447.29 samples/sec Loss 0.8898 LearningRate 0.0000 Epoch: 32 Global Step: 55920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:41:10,960-Speed 9376.03 samples/sec Loss 0.8981 LearningRate 0.0000 Epoch: 32 Global Step: 55930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:41:37,081-Speed 9409.10 samples/sec Loss 0.8976 LearningRate 0.0000 Epoch: 32 Global Step: 55940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:42:03,224-Speed 9400.99 samples/sec Loss 0.8993 LearningRate 0.0000 Epoch: 32 Global Step: 55950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:42:29,330-Speed 9414.43 samples/sec Loss 0.8985 LearningRate 0.0000 Epoch: 32 
Global Step: 55960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:42:55,445-Speed 9410.94 samples/sec Loss 0.8963 LearningRate 0.0000 Epoch: 32 Global Step: 55970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:43:21,530-Speed 9421.97 samples/sec Loss 0.8997 LearningRate 0.0000 Epoch: 32 Global Step: 55980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:43:47,708-Speed 9388.80 samples/sec Loss 0.8996 LearningRate 0.0000 Epoch: 32 Global Step: 55990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:44:13,775-Speed 9428.46 samples/sec Loss 0.8944 LearningRate 0.0000 Epoch: 32 Global Step: 56000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:44:39,926-Speed 9398.08 samples/sec Loss 0.8973 LearningRate 0.0000 Epoch: 32 Global Step: 56010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:45:06,029-Speed 9415.46 samples/sec Loss 0.8975 LearningRate 0.0000 Epoch: 32 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:45:32,204-Speed 9389.51 samples/sec Loss 0.8966 LearningRate 0.0000 Epoch: 32 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:45:58,406-Speed 9380.02 samples/sec Loss 0.8963 LearningRate 0.0000 Epoch: 32 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:46:24,646-Speed 9366.29 samples/sec Loss 0.8947 LearningRate 0.0000 Epoch: 32 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:46:50,815-Speed 9391.98 samples/sec Loss 0.8962 LearningRate 0.0000 Epoch: 32 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-06 13:47:16,947-Speed 9405.03 samples/sec Loss 0.8965 LearningRate 0.0000 Epoch: 32 Global Step: 56070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-06 13:47:43,055-Speed 9413.69 samples/sec Loss 0.8918 LearningRate 0.0000 Epoch: 32 Global Step: 56080 Fp16 Grad Scale: 32768 
Required: 10 hours
Training: 2022-03-06 13:48:09,273-Speed 9374.21 samples/sec Loss 0.8944 LearningRate 0.0000 Epoch: 32 Global Step: 56090 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:48:35,360-Speed 9421.17 samples/sec Loss 0.9095 LearningRate 0.0000 Epoch: 32 Global Step: 56100 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:49:01,554-Speed 9383.19 samples/sec Loss 0.8980 LearningRate 0.0000 Epoch: 32 Global Step: 56110 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:49:27,797-Speed 9365.20 samples/sec Loss 0.9030 LearningRate 0.0000 Epoch: 32 Global Step: 56120 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:49:53,993-Speed 9381.95 samples/sec Loss 0.8985 LearningRate 0.0000 Epoch: 32 Global Step: 56130 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:50:20,190-Speed 9381.67 samples/sec Loss 0.9070 LearningRate 0.0000 Epoch: 32 Global Step: 56140 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:50:46,378-Speed 9385.08 samples/sec Loss 0.9039 LearningRate 0.0000 Epoch: 32 Global Step: 56150 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:51:12,575-Speed 9381.65 samples/sec Loss 0.8997 LearningRate 0.0000 Epoch: 32 Global Step: 56160 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:51:38,769-Speed 9382.90 samples/sec Loss 0.8920 LearningRate 0.0000 Epoch: 32 Global Step: 56170 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-06 13:52:05,045-Speed 9353.55 samples/sec Loss 0.8895 LearningRate 0.0000 Epoch: 32 Global Step: 56180 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:52:31,278-Speed 9369.04 samples/sec Loss 0.8961 LearningRate 0.0000 Epoch: 32 Global Step: 56190 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:52:57,506-Speed 9370.72 samples/sec Loss 0.8890 LearningRate 0.0000 Epoch: 32 Global Step: 56200 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:53:23,727-Speed 9373.27 samples/sec Loss 0.8923 LearningRate 0.0000 Epoch: 32 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-06 13:53:49,889-Speed 9394.36 samples/sec Loss 0.8907 LearningRate 0.0000 Epoch: 32 Global Step: 56220 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 13:54:16,079-Speed 9384.14 samples/sec Loss 0.8833 LearningRate 0.0000 Epoch: 32 Global Step: 56230 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 13:54:42,259-Speed 9387.78 samples/sec Loss 0.8893 LearningRate 0.0000 Epoch: 32 Global Step: 56240 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 13:55:08,547-Speed 9348.97 samples/sec Loss 0.8914 LearningRate 0.0000 Epoch: 32 Global Step: 56250 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 13:55:34,722-Speed 9389.75 samples/sec Loss 0.8924 LearningRate 0.0000 Epoch: 32 Global Step: 56260 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:56:00,918-Speed 9381.83 samples/sec Loss 0.8910 LearningRate 0.0000 Epoch: 32 Global Step: 56270 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:56:27,079-Speed 9394.48 samples/sec Loss 0.8936 LearningRate 0.0000 Epoch: 32 Global Step: 56280 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:56:53,332-Speed 9361.86 samples/sec Loss 0.8931 LearningRate 0.0000 Epoch: 32 Global Step: 56290 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:57:19,530-Speed 9381.06 samples/sec Loss 0.8947 LearningRate 0.0000 Epoch: 32 Global Step: 56300 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:57:45,690-Speed 9395.01 samples/sec Loss 0.8980 LearningRate 0.0000 Epoch: 32 Global Step: 56310 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:58:11,872-Speed 9387.06 samples/sec Loss 0.8931 LearningRate 0.0000 Epoch: 32 Global Step: 56320 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:58:38,091-Speed 9374.20 samples/sec Loss 0.8907 LearningRate 0.0000 Epoch: 32 Global Step: 56330 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:59:04,285-Speed 9382.91 samples/sec Loss 0.8940 LearningRate 0.0000 Epoch: 32 Global Step: 56340 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:59:30,464-Speed 9388.04 samples/sec Loss 0.8878 LearningRate 0.0000 Epoch: 32 Global Step: 56350 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 13:59:56,539-Speed 9425.29 samples/sec Loss 0.8895 LearningRate 0.0000 Epoch: 32 Global Step: 56360 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:00:22,630-Speed 9420.07 samples/sec Loss 0.8874 LearningRate 0.0000 Epoch: 32 Global Step: 56370 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:00:48,760-Speed 9406.61 samples/sec Loss 0.8873 LearningRate 0.0000 Epoch: 32 Global Step: 56380 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:01:14,974-Speed 9375.49 samples/sec Loss 0.8897 LearningRate 0.0000 Epoch: 32 Global Step: 56390 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:01:41,085-Speed 9412.40 samples/sec Loss 0.8929 LearningRate 0.0000 Epoch: 32 Global Step: 56400 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:02:07,258-Speed 9389.88 samples/sec Loss 0.8908 LearningRate 0.0000 Epoch: 32 Global Step: 56410 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:02:33,451-Speed 9383.31 samples/sec Loss 0.8896 LearningRate 0.0000 Epoch: 32 Global Step: 56420 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:02:59,558-Speed 9413.81 samples/sec Loss 0.8891 LearningRate 0.0000 Epoch: 32 Global Step: 56430 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:03:25,793-Speed 9367.86 samples/sec Loss 0.8920 LearningRate 0.0000 Epoch: 32 Global Step: 56440 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:03:51,992-Speed 9381.28 samples/sec Loss 0.8887 LearningRate 0.0000 Epoch: 32 Global Step: 56450 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:04:18,252-Speed 9358.86 samples/sec Loss 0.8949 LearningRate 0.0000 Epoch: 32 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:04:44,495-Speed 9365.49 samples/sec Loss 0.8939 LearningRate 0.0000 Epoch: 32 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:05:10,674-Speed 9388.20 samples/sec Loss 0.9013 LearningRate 0.0000 Epoch: 32 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:05:36,885-Speed 9376.63 samples/sec Loss 0.8905 LearningRate 0.0000 Epoch: 32 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:06:03,014-Speed 9406.22 samples/sec Loss 0.8881 LearningRate 0.0000 Epoch: 32 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:06:29,156-Speed 9401.49 samples/sec Loss 0.8859 LearningRate 0.0000 Epoch: 32 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:06:55,283-Speed 9407.66 samples/sec Loss 0.8890 LearningRate 0.0000 Epoch: 32 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:07:21,389-Speed 9414.26 samples/sec Loss 0.8832 LearningRate 0.0000 Epoch: 32 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:07:47,562-Speed 9390.39 samples/sec Loss 0.8863 LearningRate 0.0000 Epoch: 32 Global Step: 56540 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:08:13,674-Speed 9412.05 samples/sec Loss 0.8872 LearningRate 0.0000 Epoch: 32 Global Step: 56550 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:08:39,808-Speed 9404.08 samples/sec Loss 0.8876 LearningRate 0.0000 Epoch: 32 Global Step: 56560 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:09:05,963-Speed 9396.66 samples/sec Loss 0.8894 LearningRate 0.0000 Epoch: 32 Global Step: 56570 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:09:32,016-Speed 9433.65 samples/sec Loss 0.8851 LearningRate 0.0000 Epoch: 32 Global Step: 56580 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:09:58,103-Speed 9421.02 samples/sec Loss 0.8837 LearningRate 0.0000 Epoch: 32 Global Step: 56590 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:10:24,223-Speed 9409.44 samples/sec Loss 0.8862 LearningRate 0.0000 Epoch: 32 Global Step: 56600 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:10:50,353-Speed 9405.44 samples/sec Loss 0.8817 LearningRate 0.0000 Epoch: 32 Global Step: 56610 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:11:16,442-Speed 9420.71 samples/sec Loss 0.8876 LearningRate 0.0000 Epoch: 32 Global Step: 56620 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:11:42,557-Speed 9411.15 samples/sec Loss 0.8909 LearningRate 0.0000 Epoch: 32 Global Step: 56630 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:12:08,764-Speed 9378.05 samples/sec Loss 0.8922 LearningRate 0.0000 Epoch: 32 Global Step: 56640 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:12:34,918-Speed 9396.71 samples/sec Loss 0.8871 LearningRate 0.0000 Epoch: 32 Global Step: 56650 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:13:01,019-Speed 9416.65 samples/sec Loss 0.8830 LearningRate 0.0000 Epoch: 32 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:13:30,459-Speed 8348.14 samples/sec Loss 0.8877 LearningRate 0.0000 Epoch: 32 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:13:56,540-Speed 9423.77 samples/sec Loss 0.8806 LearningRate 0.0000 Epoch: 32 Global Step: 56680 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:14:22,722-Speed 9387.16 samples/sec Loss 0.8868 LearningRate 0.0000 Epoch: 32 Global Step: 56690 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:14:48,955-Speed 9368.77 samples/sec Loss 0.8858 LearningRate 0.0000 Epoch: 32 Global Step: 56700 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:15:15,138-Speed 9386.67 samples/sec Loss 0.8820 LearningRate 0.0000 Epoch: 32 Global Step: 56710 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:15:41,240-Speed 9415.86 samples/sec Loss 0.8828 LearningRate 0.0000 Epoch: 32 Global Step: 56720 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:16:07,326-Speed 9422.18 samples/sec Loss 0.8941 LearningRate 0.0000 Epoch: 32 Global Step: 56730 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:16:33,445-Speed 9409.69 samples/sec Loss 0.8882 LearningRate 0.0000 Epoch: 32 Global Step: 56740 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:16:59,559-Speed 9411.45 samples/sec Loss 0.8811 LearningRate 0.0000 Epoch: 32 Global Step: 56750 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:17:25,740-Speed 9387.53 samples/sec Loss 0.8796 LearningRate 0.0000 Epoch: 32 Global Step: 56760 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:17:51,945-Speed 9378.85 samples/sec Loss 0.8854 LearningRate 0.0000 Epoch: 32 Global Step: 56770 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:18:18,166-Speed 9373.04 samples/sec Loss 0.8814 LearningRate 0.0000 Epoch: 32 Global Step: 56780 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:18:44,379-Speed 9375.89 samples/sec Loss 0.8861 LearningRate 0.0000 Epoch: 32 Global Step: 56790 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:19:10,494-Speed 9411.03 samples/sec Loss 0.8824 LearningRate 0.0000 Epoch: 32 Global Step: 56800 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:19:36,689-Speed 9382.60 samples/sec Loss 0.8863 LearningRate 0.0000 Epoch: 32 Global Step: 56810 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:20:02,800-Speed 9412.89 samples/sec Loss 0.8824 LearningRate 0.0000 Epoch: 32 Global Step: 56820 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:20:28,963-Speed 9393.75 samples/sec Loss 0.8769 LearningRate 0.0000 Epoch: 32 Global Step: 56830 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:20:55,146-Speed 9386.42 samples/sec Loss 0.8873 LearningRate 0.0000 Epoch: 32 Global Step: 56840 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:21:21,295-Speed 9398.86 samples/sec Loss 0.8854 LearningRate 0.0000 Epoch: 32 Global Step: 56850 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:21:47,505-Speed 9377.35 samples/sec Loss 0.8865 LearningRate 0.0000 Epoch: 32 Global Step: 56860 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:22:13,704-Speed 9380.88 samples/sec Loss 0.8799 LearningRate 0.0000 Epoch: 32 Global Step: 56870 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:22:39,858-Speed 9397.20 samples/sec Loss 0.8808 LearningRate 0.0000 Epoch: 32 Global Step: 56880 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:23:05,937-Speed 9423.96 samples/sec Loss 0.8829 LearningRate 0.0000 Epoch: 32 Global Step: 56890 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:23:32,150-Speed 9375.78 samples/sec Loss 0.8836 LearningRate 0.0000 Epoch: 32 Global Step: 56900 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:23:58,285-Speed 9404.02 samples/sec Loss 0.8894 LearningRate 0.0000 Epoch: 32 Global Step: 56910 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:24:24,434-Speed 9398.86 samples/sec Loss 0.8911 LearningRate 0.0000 Epoch: 32 Global Step: 56920 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:24:50,541-Speed 9414.05 samples/sec Loss 0.8848 LearningRate 0.0000 Epoch: 32 Global Step: 56930 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:25:16,737-Speed 9381.97 samples/sec Loss 0.8904 LearningRate 0.0000 Epoch: 32 Global Step: 56940 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:25:43,003-Speed 9357.03 samples/sec Loss 0.8805 LearningRate 0.0000 Epoch: 32 Global Step: 56950 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:26:09,209-Speed 9378.20 samples/sec Loss 0.8772 LearningRate 0.0000 Epoch: 32 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:26:35,364-Speed 9397.04 samples/sec Loss 0.8745 LearningRate 0.0000 Epoch: 32 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:27:01,510-Speed 9399.94 samples/sec Loss 0.8785 LearningRate 0.0000 Epoch: 32 Global Step: 56980 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-03-06 14:27:27,664-Speed 9396.66 samples/sec Loss 0.8806 LearningRate 0.0000 Epoch: 32 Global Step: 56990 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-03-06 14:27:53,920-Speed 9360.47 samples/sec Loss 0.8800 LearningRate 0.0000 Epoch: 32 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:28:20,134-Speed 9375.65 samples/sec Loss 0.8815 LearningRate 0.0000 Epoch: 32 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:28:46,344-Speed 9377.00 samples/sec Loss 0.8847 LearningRate 0.0000 Epoch: 32 Global Step: 57020 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:29:12,505-Speed 9394.49 samples/sec Loss 0.8817 LearningRate 0.0000 Epoch: 32 Global Step: 57030 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:30:31,558-Speed 3108.85 samples/sec Loss 0.8796 LearningRate 0.0000 Epoch: 33 Global Step: 57040 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:30:57,529-Speed 9463.35 samples/sec Loss 0.8733 LearningRate 0.0000 Epoch: 33 Global Step: 57050 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:31:23,677-Speed 9399.03 samples/sec Loss 0.8707 LearningRate 0.0000 Epoch: 33 Global Step: 57060 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:31:49,675-Speed 9453.84 samples/sec Loss 0.8793 LearningRate 0.0000 Epoch: 33 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:32:15,734-Speed 9431.07 samples/sec Loss 0.8737 LearningRate 0.0000 Epoch: 33 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:32:41,855-Speed 9409.00 samples/sec Loss 0.8808 LearningRate 0.0000 Epoch: 33 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:33:07,886-Speed 9441.54 samples/sec Loss 0.8816 LearningRate 0.0000 Epoch: 33 Global Step: 57100 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-03-06 14:33:33,965-Speed 9424.10 samples/sec Loss 0.8689 LearningRate 0.0000 Epoch: 33 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:34:00,022-Speed 9432.15 samples/sec Loss 0.8749 LearningRate 0.0000 Epoch: 33 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:34:26,038-Speed 9446.66 samples/sec Loss 0.8732 LearningRate 0.0000 Epoch: 33 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:34:52,081-Speed 9437.49 samples/sec Loss 0.8777 LearningRate 0.0000 Epoch: 33 Global Step: 57140 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:35:18,139-Speed 9431.75 samples/sec Loss 0.8758 LearningRate 0.0000 Epoch: 33 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:35:44,275-Speed 9403.46 samples/sec Loss 0.8769 LearningRate 0.0000 Epoch: 33 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:36:10,303-Speed 9442.56 samples/sec Loss 0.8689 LearningRate 0.0000 Epoch: 33 Global Step: 57170 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:36:36,393-Speed 9420.21 samples/sec Loss 0.8753 LearningRate 0.0000 Epoch: 33 Global Step: 57180 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:37:02,520-Speed 9406.72 samples/sec Loss 0.8739 LearningRate 0.0000 Epoch: 33 Global Step: 57190 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:37:28,673-Speed 9397.13 samples/sec Loss 0.8707 LearningRate 0.0000 Epoch: 33 Global Step: 57200 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:37:54,806-Speed 9404.61 samples/sec Loss 0.8810 LearningRate 0.0000 Epoch: 33 Global Step: 57210 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:38:21,002-Speed 9383.21 samples/sec Loss 0.8749 LearningRate 0.0000 Epoch: 33 Global Step: 57220 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:38:47,181-Speed 9387.80 samples/sec Loss 0.8768 LearningRate 0.0000 Epoch: 33 Global Step: 57230 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:39:13,388-Speed 9377.99 samples/sec Loss 0.8769 LearningRate 0.0000 Epoch: 33 Global Step: 57240 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:39:39,573-Speed 9386.10 samples/sec Loss 0.8677 LearningRate 0.0000 Epoch: 33 Global Step: 57250 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:40:05,727-Speed 9397.02 samples/sec Loss 0.8805 LearningRate 0.0000 Epoch: 33 Global Step: 57260 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:40:31,978-Speed 9362.07 samples/sec Loss 0.8728 LearningRate 0.0000 Epoch: 33 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:40:58,077-Speed 9417.01 samples/sec Loss 0.8731 LearningRate 0.0000 Epoch: 33 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:41:24,166-Speed 9420.59 samples/sec Loss 0.8737 LearningRate 0.0000 Epoch: 33 Global Step: 57290 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:41:50,265-Speed 9416.96 samples/sec Loss 0.8787 LearningRate 0.0000 Epoch: 33 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:42:16,364-Speed 9417.03 samples/sec Loss 0.8740 LearningRate 0.0000 Epoch: 33 Global Step: 57310 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:42:42,434-Speed 9427.09 samples/sec Loss 0.8734 LearningRate 0.0000 Epoch: 33 Global Step: 57320 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:43:08,476-Speed 9437.85 samples/sec Loss 0.8766 LearningRate 0.0000 Epoch: 33 Global Step: 57330 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:43:34,603-Speed 9406.83 samples/sec Loss 0.8801 LearningRate 0.0000 Epoch: 33 Global Step: 57340 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:44:00,817-Speed 9375.57 samples/sec Loss 0.8748 LearningRate 0.0000 Epoch: 33 Global Step: 57350 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:44:26,963-Speed 9399.99 samples/sec Loss 0.8691 LearningRate 0.0000 Epoch: 33 Global Step: 57360 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:44:53,130-Speed 9392.36 samples/sec Loss 0.8818 LearningRate 0.0000 Epoch: 33 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-03-06 14:45:19,211-Speed 9423.22 samples/sec Loss 0.8745 LearningRate 0.0000 Epoch: 33 Global Step: 57380 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:45:48,924-Speed 8271.70 samples/sec Loss 0.8747 LearningRate 0.0000 Epoch: 33 Global Step: 57390 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:46:15,017-Speed 9419.24 samples/sec Loss 0.8720 LearningRate 0.0000 Epoch: 33 Global Step: 57400 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:46:41,145-Speed 9406.04 samples/sec Loss 0.8710 LearningRate 0.0000 Epoch: 33 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:47:07,274-Speed 9406.03 samples/sec Loss 0.8734 LearningRate 0.0000 Epoch: 33 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:47:33,330-Speed 9432.66 samples/sec Loss 0.8763 LearningRate 0.0000 Epoch: 33 Global Step: 57430 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:47:59,403-Speed 9426.25 samples/sec Loss 0.8777 LearningRate 0.0000 Epoch: 33 Global Step: 57440 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:48:25,526-Speed 9408.29 samples/sec Loss 0.8716 LearningRate 0.0000 Epoch: 33 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:48:51,659-Speed 9404.69 samples/sec Loss 0.8688 LearningRate 0.0000 Epoch: 33 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:49:17,717-Speed 9431.45 samples/sec Loss 0.8800 LearningRate 0.0000 Epoch: 33 Global Step: 57470 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:49:43,935-Speed 9374.21 samples/sec Loss 0.8731 LearningRate 0.0000 Epoch: 33 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-03-06 14:50:09,985-Speed 9434.48 samples/sec Loss 0.8645 LearningRate 0.0000 Epoch: 33 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:50:36,087-Speed 9415.86 samples/sec Loss 0.8647 LearningRate 0.0000 Epoch: 33 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:51:02,216-Speed 9406.00 samples/sec Loss 0.8793 LearningRate 0.0000 Epoch: 33 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:51:28,353-Speed 9403.10 samples/sec Loss 0.8652 LearningRate 0.0000 Epoch: 33 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-06 14:51:54,445-Speed 9419.20 samples/sec Loss 0.8742 LearningRate 0.0000 Epoch: 33 Global Step: 57530 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:52:20,535-Speed 9420.50 samples/sec Loss 0.8666 LearningRate 0.0000 Epoch: 33 Global Step: 57540 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-06 14:52:46,611-Speed 9424.91 samples/sec Loss 0.8688 LearningRate 0.0000 Epoch: 33 Global Step: 57550 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:53:12,689-Speed 9424.75 samples/sec Loss 0.8646 LearningRate 0.0000 Epoch: 33 Global Step: 57560 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:53:38,908-Speed 9373.74 samples/sec Loss 0.8710 LearningRate 0.0000 Epoch: 33 Global Step: 57570 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-03-06 14:54:05,177-Speed 9355.66 samples/sec Loss 0.8696 LearningRate 0.0000 Epoch: 33 Global Step: 57580 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 14:54:31,415-Speed 9367.29 samples/sec Loss 0.8784 LearningRate 0.0000 Epoch: 33 Global Step: 57590 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 14:54:57,645-Speed 9369.78 samples/sec Loss 0.8624 LearningRate 0.0000 Epoch: 33 Global Step: 57600 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 14:55:23,907-Speed 9358.36 samples/sec Loss 0.8715 LearningRate 0.0000 Epoch: 33 Global Step: 57610 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 14:55:50,141-Speed 9368.19 samples/sec Loss 0.8703 LearningRate 0.0000 Epoch: 33 Global Step: 57620 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 14:56:16,366-Speed 9371.99 samples/sec Loss 0.8687 LearningRate 0.0000 Epoch: 33 Global Step: 57630 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 14:56:42,555-Speed 9384.38 samples/sec Loss 0.8749 LearningRate 0.0000 Epoch: 33 Global Step: 57640 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 14:57:08,763-Speed 9377.91 samples/sec Loss 0.8758 LearningRate 0.0000 Epoch: 33 Global Step: 57650 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 14:57:34,866-Speed 9415.65 samples/sec Loss 0.8665 LearningRate 0.0000 Epoch: 33 Global Step: 57660 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 14:58:00,946-Speed 9423.65 samples/sec Loss 0.8706 LearningRate 0.0000 Epoch: 33 Global Step: 57670 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 14:58:28,073-Speed 9060.01 samples/sec Loss 0.8653 LearningRate 0.0000 Epoch: 33 Global Step: 57680 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 14:58:54,227-Speed 9397.20 samples/sec Loss 0.8691 LearningRate 0.0000 Epoch: 33 Global Step: 57690 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 14:59:20,394-Speed 9392.36 samples/sec Loss 0.8748 LearningRate 0.0000 Epoch: 33 Global Step: 57700 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 14:59:46,559-Speed 9393.03 samples/sec Loss 0.8692 LearningRate 0.0000 Epoch: 33 Global Step: 57710 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:00:12,730-Speed 9391.01 samples/sec Loss 0.8679 LearningRate 0.0000 Epoch: 33 Global Step: 57720 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:00:38,997-Speed 9356.69 samples/sec Loss 0.8705 LearningRate 0.0000 Epoch: 33 Global Step: 57730 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:01:05,169-Speed 9390.25 samples/sec Loss 0.8705 LearningRate 0.0000 Epoch: 33 Global Step: 57740 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:01:31,307-Speed 9402.98 samples/sec Loss 0.8704 LearningRate 0.0000 Epoch: 33 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:01:57,487-Speed 9387.57 samples/sec Loss 0.8752 LearningRate 0.0000 Epoch: 33 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:02:23,532-Speed 9436.28 samples/sec Loss 0.8672 LearningRate 0.0000 Epoch: 33 Global Step: 57770 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:02:49,685-Speed 9397.58 samples/sec Loss 0.8684 LearningRate 0.0000 Epoch: 33 Global Step: 57780 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:03:15,761-Speed 9424.96 samples/sec Loss 0.8711 LearningRate 0.0000 Epoch: 33 Global Step: 57790 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:03:41,897-Speed 9403.74 samples/sec Loss 0.8623 LearningRate 0.0000 Epoch: 33 Global Step: 57800 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:04:08,108-Speed 9376.81 samples/sec Loss 0.8666 LearningRate 0.0000 Epoch: 33 Global Step: 57810 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:04:34,282-Speed 9389.82 samples/sec Loss 0.8695 LearningRate 0.0000 Epoch: 33 Global Step: 57820 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:05:00,410-Speed 9406.50 samples/sec Loss 0.8672 LearningRate 0.0000 Epoch: 33 Global Step: 57830 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:05:26,726-Speed 9339.27 samples/sec Loss 0.8569 LearningRate 0.0000 Epoch: 33 Global Step: 57840 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:05:52,956-Speed 9369.92 samples/sec Loss 0.8613 LearningRate 0.0000 Epoch: 33 Global Step: 57850 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:06:19,178-Speed 9372.75 samples/sec Loss 0.8656 LearningRate 0.0000 Epoch: 33 Global Step: 57860 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:06:45,386-Speed 9377.69 samples/sec Loss 0.8636 LearningRate 0.0000 Epoch: 33 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:07:11,526-Speed 9401.94 samples/sec Loss 0.8648 LearningRate 0.0000 Epoch: 33 Global Step: 57880 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:07:37,727-Speed 9380.14 samples/sec Loss 0.8599 LearningRate 0.0000 Epoch: 33 Global Step: 57890 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:08:03,857-Speed 9406.00 samples/sec Loss 0.8628 LearningRate 0.0000 Epoch: 33 Global Step: 57900 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:08:30,035-Speed 9389.21 samples/sec Loss 0.8658 LearningRate 0.0000 Epoch: 33 Global Step: 57910 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:08:56,216-Speed 9387.38 samples/sec Loss 0.8600 LearningRate 0.0000 Epoch: 33 Global Step: 57920 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:09:22,373-Speed 9395.70 samples/sec Loss 0.8567 LearningRate 0.0000 Epoch: 33 Global Step: 57930 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:09:48,483-Speed 9412.90 samples/sec Loss 0.8629 LearningRate 0.0000 Epoch: 33 Global Step: 57940 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:10:14,603-Speed 9409.35 samples/sec Loss 0.8638 LearningRate 0.0000 Epoch: 33 Global Step: 57950 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:10:40,693-Speed 9420.18 samples/sec Loss 0.8628 LearningRate 0.0000 Epoch: 33 Global Step: 57960 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:11:06,855-Speed 9394.01 samples/sec Loss 0.8608 LearningRate 0.0000 Epoch: 33 Global Step: 57970 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:11:33,115-Speed 9359.33 samples/sec Loss 0.8662 LearningRate 0.0000 Epoch: 33 Global Step: 57980 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:11:59,268-Speed 9397.65 samples/sec Loss 0.8547 LearningRate 0.0000 Epoch: 33 Global Step: 57990 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:12:25,402-Speed 9404.31 samples/sec Loss 0.8585 LearningRate 0.0000 Epoch: 33 Global Step: 58000 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:12:51,780-Speed 9317.24 samples/sec Loss 0.8576 LearningRate 0.0000 Epoch: 33 Global Step: 58010 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:13:17,947-Speed 9392.24 samples/sec Loss 0.8580 LearningRate 0.0000 Epoch: 33 Global Step: 58020 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:13:44,214-Speed 9356.36 samples/sec Loss 0.8669 LearningRate 0.0000 Epoch: 33 Global Step: 58030 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:14:10,380-Speed 9392.75 samples/sec Loss 0.8637 LearningRate 0.0000 Epoch: 33 Global Step: 58040 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:14:36,540-Speed 9395.02 samples/sec Loss 0.8641 LearningRate 0.0000 Epoch: 33 Global Step: 58050 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:15:02,710-Speed 9391.24 samples/sec Loss 0.8598 LearningRate 0.0000 Epoch: 33 Global Step: 58060 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:15:28,846-Speed 9403.56 samples/sec Loss 0.8665 LearningRate 0.0000 Epoch: 33 Global Step: 58070 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:15:55,158-Speed 9340.41 samples/sec Loss 0.8614 LearningRate 0.0000 Epoch: 33 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:16:21,351-Speed 9383.22 samples/sec Loss 0.8642 LearningRate 0.0000 Epoch: 33 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:16:47,480-Speed 9406.15 samples/sec Loss 0.8577 LearningRate 0.0000 Epoch: 33 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:17:13,706-Speed 9371.33 samples/sec Loss 0.8603 LearningRate 0.0000 Epoch: 33 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:17:39,844-Speed 9402.68 samples/sec Loss 0.8590 LearningRate 0.0000 Epoch: 33 Global Step: 58120 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:18:05,894-Speed 9434.76 samples/sec Loss 0.8607 LearningRate 0.0000 Epoch: 33 Global Step: 58130 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:18:32,027-Speed 9404.94 samples/sec Loss 0.8608 LearningRate 0.0000 Epoch: 33 Global Step: 58140 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:18:58,097-Speed 9427.20 samples/sec Loss 0.8567 LearningRate 0.0000 Epoch: 33 Global Step: 58150 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:19:24,242-Speed 9400.11 samples/sec Loss 0.8535 LearningRate 0.0000 Epoch: 33 Global Step: 58160 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:19:50,413-Speed 9391.05 samples/sec Loss 0.8561 LearningRate 0.0000 Epoch: 33 Global Step: 58170 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:20:16,627-Speed 9375.76 samples/sec Loss 0.8660 LearningRate 0.0000 Epoch: 33 Global Step: 58180 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:20:42,905-Speed 9352.86 samples/sec Loss 0.8628 LearningRate 0.0000 Epoch: 33 Global Step: 58190 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:21:09,024-Speed 9409.65 samples/sec Loss 0.8564 LearningRate 0.0000 Epoch: 33 Global Step: 58200 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:21:35,153-Speed 9406.20 samples/sec Loss 0.8477 LearningRate 0.0000 Epoch: 33 Global Step: 58210 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:22:01,284-Speed 9405.06 samples/sec Loss 0.8613 LearningRate 0.0000 Epoch: 33 Global Step: 58220 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:22:27,540-Speed 9360.68 samples/sec Loss 0.8594 LearningRate 0.0000 Epoch: 33 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:22:53,684-Speed 9400.96 samples/sec Loss 0.8608 LearningRate 0.0000 Epoch: 33 Global Step: 58240 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:23:19,852-Speed 9391.90 samples/sec Loss 0.8653 LearningRate 0.0000 Epoch: 33 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:23:45,923-Speed 9427.04 samples/sec Loss 0.8566 LearningRate 0.0000 Epoch: 33 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:24:12,103-Speed 9387.48 samples/sec Loss 0.8587 LearningRate 0.0000 Epoch: 33 Global Step: 58270 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:24:38,243-Speed 9402.19 samples/sec Loss 0.8595 LearningRate 0.0000 Epoch: 33 Global Step: 58280 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:25:04,411-Speed 9392.02 samples/sec Loss 0.8593 LearningRate 0.0000 Epoch: 33 Global Step: 58290 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:25:30,487-Speed 9425.43 samples/sec Loss 0.8591 LearningRate 0.0000 Epoch: 33 Global Step: 58300 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:25:56,530-Speed 9436.87 samples/sec Loss 0.8569 LearningRate 0.0000 Epoch: 33 Global Step: 58310 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:26:22,720-Speed 9383.93 samples/sec Loss 0.8541 LearningRate 0.0000 Epoch: 33 Global Step: 58320 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:26:48,846-Speed 9407.55 samples/sec Loss 0.8540 LearningRate 0.0000 Epoch: 33 Global Step: 58330 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:27:15,122-Speed 9353.09 samples/sec Loss 0.8639 LearningRate 0.0000 Epoch: 33 Global Step: 58340 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:27:41,406-Speed 9350.82 samples/sec Loss 0.8642 LearningRate 0.0000 Epoch: 33 Global Step: 58350 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:28:07,547-Speed 9401.53 samples/sec Loss 0.8597 LearningRate 0.0000 Epoch: 33 Global Step: 58360 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:28:33,643-Speed 9417.71 samples/sec Loss 0.8653 LearningRate 0.0000 Epoch: 33 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:28:59,770-Speed 9406.92 samples/sec Loss 0.8547 LearningRate 0.0000 Epoch: 33 Global Step: 58380 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:29:25,908-Speed 9402.82 samples/sec Loss 0.8568 LearningRate 0.0000 Epoch: 33 Global Step: 58390 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:29:52,027-Speed 9409.33 samples/sec Loss 0.8558 LearningRate 0.0000 Epoch: 33 Global Step: 58400 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:30:18,226-Speed 9381.27 samples/sec Loss 0.8455 LearningRate 0.0000 Epoch: 33 Global Step: 58410 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:30:44,484-Speed 9359.75 samples/sec Loss 0.8560 LearningRate 0.0000 Epoch: 33 Global Step: 58420 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:31:10,630-Speed 9400.12 samples/sec Loss 0.8472 LearningRate 0.0000 Epoch: 33 Global Step: 58430 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:31:36,799-Speed 9391.69 samples/sec Loss 0.8549 LearningRate 0.0000 Epoch: 33 Global Step: 58440 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:32:02,980-Speed 9387.27 samples/sec Loss 0.8512 LearningRate 0.0000 Epoch: 33 Global Step: 58450 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:32:29,152-Speed 9390.70 samples/sec Loss 0.8554 LearningRate 0.0000 Epoch: 33 Global Step: 58460 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:32:55,287-Speed 9403.84 samples/sec Loss 0.8559 LearningRate 0.0000 Epoch: 33 Global Step: 58470 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:33:21,583-Speed 9346.47 samples/sec Loss 0.8552 LearningRate 0.0000 Epoch: 33 Global Step: 58480 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:33:47,715-Speed 9405.00 samples/sec Loss 0.8564 LearningRate 0.0000 Epoch: 33 Global Step: 58490 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:34:13,929-Speed 9375.43 samples/sec Loss 0.8618 LearningRate 0.0000 Epoch: 33 Global Step: 58500 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:34:40,148-Speed 9373.70 samples/sec Loss 0.8544 LearningRate 0.0000 Epoch: 33 Global Step: 58510 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:35:06,370-Speed 9372.92 samples/sec Loss 0.8535 LearningRate 0.0000 Epoch: 33 Global Step: 58520 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:35:32,468-Speed 9417.46 samples/sec Loss 0.8536 LearningRate 0.0000 Epoch: 33 Global Step: 58530 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-06 15:35:58,603-Speed 9403.67 samples/sec Loss 0.8601 LearningRate 0.0000 Epoch: 33 Global Step: 58540 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:36:24,713-Speed 9412.92 samples/sec Loss 0.8558 LearningRate 0.0000 Epoch: 33 Global Step: 58550 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:36:50,856-Speed 9400.89 samples/sec Loss 0.8557 LearningRate 0.0000 Epoch: 33 Global Step: 58560 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:37:16,996-Speed 9402.09 samples/sec Loss 0.8538 LearningRate 0.0000 Epoch: 33 Global Step: 58570 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:37:43,109-Speed 9412.18 samples/sec Loss 0.8524 LearningRate 0.0000 Epoch: 33 Global Step: 58580 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:38:09,195-Speed 9421.41 samples/sec Loss 0.8532 LearningRate 0.0000 Epoch: 33 Global Step: 58590 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:38:35,259-Speed 9429.36 samples/sec Loss 0.8546 LearningRate 0.0000 Epoch: 33 Global Step: 58600 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:39:01,374-Speed 9411.38 samples/sec Loss 0.8520 LearningRate 0.0000 Epoch: 33 Global Step: 58610 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:39:27,478-Speed 9415.45 samples/sec Loss 0.8544 LearningRate 0.0000 Epoch: 33 Global Step: 58620 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-06 15:39:53,597-Speed 9409.58 samples/sec Loss 0.8495 LearningRate 0.0000 Epoch: 33 Global Step: 58630 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:40:19,714-Speed 9410.78 samples/sec Loss 0.8556 LearningRate 0.0000 Epoch: 33 Global Step: 58640 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:40:45,785-Speed 9426.91 samples/sec Loss 0.8491 LearningRate 0.0000 Epoch: 33 Global Step: 58650 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:41:11,902-Speed 9410.27 samples/sec Loss 0.8514 LearningRate 0.0000 Epoch: 33 Global Step: 58660 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-06 15:41:37,984-Speed 9423.42 samples/sec Loss 0.8584 LearningRate 0.0000 Epoch: 33
Global Step: 58670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-06 15:42:04,079-Speed 9418.17 samples/sec Loss 0.8536 LearningRate 0.0000 Epoch: 33 Global Step: 58680 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-06 15:42:30,274-Speed 9382.20 samples/sec Loss 0.8567 LearningRate 0.0000 Epoch: 33 Global Step: 58690 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-06 15:42:56,387-Speed 9411.78 samples/sec Loss 0.8567 LearningRate 0.0000 Epoch: 33 Global Step: 58700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-06 15:43:22,522-Speed 9403.89 samples/sec Loss 0.8563 LearningRate 0.0000 Epoch: 33 Global Step: 58710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-06 15:43:48,608-Speed 9421.84 samples/sec Loss 0.8507 LearningRate 0.0000 Epoch: 33 Global Step: 58720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-06 15:44:14,795-Speed 9384.85 samples/sec Loss 0.8494 LearningRate 0.0000 Epoch: 33 Global Step: 58730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:44:40,910-Speed 9411.10 samples/sec Loss 0.8561 LearningRate 0.0000 Epoch: 33 Global Step: 58740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:45:06,999-Speed 9420.68 samples/sec Loss 0.8552 LearningRate 0.0000 Epoch: 33 Global Step: 58750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:45:33,162-Speed 9393.66 samples/sec Loss 0.8561 LearningRate 0.0000 Epoch: 33 Global Step: 58760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:46:52,490-Speed 3098.08 samples/sec Loss 0.8533 LearningRate 0.0000 Epoch: 34 Global Step: 58770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:47:18,358-Speed 9500.86 samples/sec Loss 0.8482 LearningRate 0.0000 Epoch: 34 Global Step: 58780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:47:44,317-Speed 9467.66 samples/sec Loss 0.8522 LearningRate 0.0000 Epoch: 34 Global Step: 58790 Fp16 Grad Scale: 32768 Required: 8 
hours Training: 2022-03-06 15:48:10,275-Speed 9468.16 samples/sec Loss 0.8456 LearningRate 0.0000 Epoch: 34 Global Step: 58800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:48:36,218-Speed 9473.54 samples/sec Loss 0.8460 LearningRate 0.0000 Epoch: 34 Global Step: 58810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:49:02,170-Speed 9470.00 samples/sec Loss 0.8463 LearningRate 0.0000 Epoch: 34 Global Step: 58820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:49:28,141-Speed 9463.93 samples/sec Loss 0.8445 LearningRate 0.0000 Epoch: 34 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:49:54,074-Speed 9476.84 samples/sec Loss 0.8491 LearningRate 0.0000 Epoch: 34 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:50:20,040-Speed 9465.18 samples/sec Loss 0.8462 LearningRate 0.0000 Epoch: 34 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:50:46,029-Speed 9456.77 samples/sec Loss 0.8557 LearningRate 0.0000 Epoch: 34 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:51:12,006-Speed 9461.20 samples/sec Loss 0.8450 LearningRate 0.0000 Epoch: 34 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:51:37,977-Speed 9464.10 samples/sec Loss 0.8522 LearningRate 0.0000 Epoch: 34 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:52:04,013-Speed 9439.91 samples/sec Loss 0.8454 LearningRate 0.0000 Epoch: 34 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:52:30,047-Speed 9440.40 samples/sec Loss 0.8490 LearningRate 0.0000 Epoch: 34 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:52:55,958-Speed 9484.96 samples/sec Loss 0.8393 LearningRate 0.0000 Epoch: 34 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:53:22,005-Speed 9435.72 
samples/sec Loss 0.8404 LearningRate 0.0000 Epoch: 34 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-06 15:53:47,987-Speed 9459.06 samples/sec Loss 0.8513 LearningRate 0.0000 Epoch: 34 Global Step: 58930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-06 15:54:13,982-Speed 9454.82 samples/sec Loss 0.8506 LearningRate 0.0000 Epoch: 34 Global Step: 58940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:54:39,956-Speed 9461.96 samples/sec Loss 0.8491 LearningRate 0.0000 Epoch: 34 Global Step: 58950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:55:05,926-Speed 9463.80 samples/sec Loss 0.8499 LearningRate 0.0000 Epoch: 34 Global Step: 58960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:55:31,949-Speed 9444.25 samples/sec Loss 0.8491 LearningRate 0.0000 Epoch: 34 Global Step: 58970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:55:57,925-Speed 9461.50 samples/sec Loss 0.8491 LearningRate 0.0000 Epoch: 34 Global Step: 58980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:56:23,969-Speed 9436.66 samples/sec Loss 0.8493 LearningRate 0.0000 Epoch: 34 Global Step: 58990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:56:50,007-Speed 9439.06 samples/sec Loss 0.8428 LearningRate 0.0000 Epoch: 34 Global Step: 59000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:57:16,017-Speed 9449.01 samples/sec Loss 0.8481 LearningRate 0.0000 Epoch: 34 Global Step: 59010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:57:42,031-Speed 9447.99 samples/sec Loss 0.8441 LearningRate 0.0000 Epoch: 34 Global Step: 59020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 15:58:08,073-Speed 9437.29 samples/sec Loss 0.8508 LearningRate 0.0000 Epoch: 34 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 15:58:34,123-Speed 9434.55 samples/sec Loss 0.8441 LearningRate 0.0000 Epoch: 34 
Global Step: 59040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 15:59:00,141-Speed 9446.81 samples/sec Loss 0.8414 LearningRate 0.0000 Epoch: 34 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 15:59:26,204-Speed 9429.93 samples/sec Loss 0.8492 LearningRate 0.0000 Epoch: 34 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 15:59:52,265-Speed 9430.25 samples/sec Loss 0.8474 LearningRate 0.0000 Epoch: 34 Global Step: 59070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:00:18,284-Speed 9445.96 samples/sec Loss 0.8536 LearningRate 0.0000 Epoch: 34 Global Step: 59080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:00:44,348-Speed 9429.32 samples/sec Loss 0.8443 LearningRate 0.0000 Epoch: 34 Global Step: 59090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:01:10,424-Speed 9425.21 samples/sec Loss 0.8440 LearningRate 0.0000 Epoch: 34 Global Step: 59100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:01:36,484-Speed 9430.88 samples/sec Loss 0.8462 LearningRate 0.0000 Epoch: 34 Global Step: 59110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:02:02,570-Speed 9421.74 samples/sec Loss 0.8535 LearningRate 0.0000 Epoch: 34 Global Step: 59120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:02:28,701-Speed 9405.29 samples/sec Loss 0.8450 LearningRate 0.0000 Epoch: 34 Global Step: 59130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-03-06 16:02:54,845-Speed 9400.50 samples/sec Loss 0.8451 LearningRate 0.0000 Epoch: 34 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-03-06 16:03:20,924-Speed 9424.30 samples/sec Loss 0.8504 LearningRate 0.0000 Epoch: 34 Global Step: 59150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:03:47,051-Speed 9406.70 samples/sec Loss 0.8491 LearningRate 0.0000 Epoch: 34 Global Step: 59160 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-03-06 16:04:13,138-Speed 9421.06 samples/sec Loss 0.8466 LearningRate 0.0000 Epoch: 34 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:04:39,279-Speed 9401.61 samples/sec Loss 0.8424 LearningRate 0.0000 Epoch: 34 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:05:05,459-Speed 9387.94 samples/sec Loss 0.8464 LearningRate 0.0000 Epoch: 34 Global Step: 59190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:05:31,602-Speed 9401.09 samples/sec Loss 0.8435 LearningRate 0.0000 Epoch: 34 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:05:57,689-Speed 9421.15 samples/sec Loss 0.8395 LearningRate 0.0000 Epoch: 34 Global Step: 59210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:06:23,891-Speed 9379.89 samples/sec Loss 0.8425 LearningRate 0.0000 Epoch: 34 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:06:49,985-Speed 9418.89 samples/sec Loss 0.8413 LearningRate 0.0000 Epoch: 34 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:07:16,142-Speed 9395.75 samples/sec Loss 0.8470 LearningRate 0.0000 Epoch: 34 Global Step: 59240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:07:42,288-Speed 9400.19 samples/sec Loss 0.8419 LearningRate 0.0000 Epoch: 34 Global Step: 59250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-03-06 16:08:08,411-Speed 9408.15 samples/sec Loss 0.8451 LearningRate 0.0000 Epoch: 34 Global Step: 59260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:08:34,611-Speed 9380.50 samples/sec Loss 0.8533 LearningRate 0.0000 Epoch: 34 Global Step: 59270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:09:00,824-Speed 9376.09 samples/sec Loss 0.8436 LearningRate 0.0000 Epoch: 34 Global Step: 59280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:09:27,056-Speed 9369.21 
samples/sec Loss 0.8516 LearningRate 0.0000 Epoch: 34 Global Step: 59290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:09:53,212-Speed 9396.37 samples/sec Loss 0.8465 LearningRate 0.0000 Epoch: 34 Global Step: 59300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:10:19,335-Speed 9408.09 samples/sec Loss 0.8445 LearningRate 0.0000 Epoch: 34 Global Step: 59310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:10:45,472-Speed 9403.47 samples/sec Loss 0.8476 LearningRate 0.0000 Epoch: 34 Global Step: 59320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:11:11,624-Speed 9397.62 samples/sec Loss 0.8461 LearningRate 0.0000 Epoch: 34 Global Step: 59330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:11:37,725-Speed 9415.95 samples/sec Loss 0.8497 LearningRate 0.0000 Epoch: 34 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:12:03,864-Speed 9402.78 samples/sec Loss 0.8389 LearningRate 0.0000 Epoch: 34 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:12:29,998-Speed 9404.49 samples/sec Loss 0.8358 LearningRate 0.0000 Epoch: 34 Global Step: 59360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-03-06 16:12:56,130-Speed 9404.93 samples/sec Loss 0.8471 LearningRate 0.0000 Epoch: 34 Global Step: 59370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-03-06 16:13:22,191-Speed 9430.56 samples/sec Loss 0.8436 LearningRate 0.0000 Epoch: 34 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:13:48,339-Speed 9399.25 samples/sec Loss 0.8405 LearningRate 0.0000 Epoch: 34 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:14:14,446-Speed 9414.23 samples/sec Loss 0.8439 LearningRate 0.0000 Epoch: 34 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:14:40,613-Speed 9392.51 samples/sec Loss 0.8384 LearningRate 0.0000 Epoch: 34 
Global Step: 59410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:15:06,731-Speed 9410.22 samples/sec Loss 0.8445 LearningRate 0.0000 Epoch: 34 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:15:32,882-Speed 9398.01 samples/sec Loss 0.8383 LearningRate 0.0000 Epoch: 34 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:15:59,011-Speed 9406.20 samples/sec Loss 0.8434 LearningRate 0.0000 Epoch: 34 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:16:25,220-Speed 9377.32 samples/sec Loss 0.8441 LearningRate 0.0000 Epoch: 34 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:16:54,582-Speed 8370.39 samples/sec Loss 0.8441 LearningRate 0.0000 Epoch: 34 Global Step: 59460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:17:20,774-Speed 9383.78 samples/sec Loss 0.8524 LearningRate 0.0000 Epoch: 34 Global Step: 59470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:17:46,980-Speed 9378.42 samples/sec Loss 0.8433 LearningRate 0.0000 Epoch: 34 Global Step: 59480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:18:13,139-Speed 9395.47 samples/sec Loss 0.8378 LearningRate 0.0000 Epoch: 34 Global Step: 59490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:18:39,252-Speed 9412.10 samples/sec Loss 0.8397 LearningRate 0.0000 Epoch: 34 Global Step: 59500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:19:05,433-Speed 9387.45 samples/sec Loss 0.8377 LearningRate 0.0000 Epoch: 34 Global Step: 59510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:19:31,539-Speed 9414.16 samples/sec Loss 0.8394 LearningRate 0.0000 Epoch: 34 Global Step: 59520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:19:57,744-Speed 9378.79 samples/sec Loss 0.8394 LearningRate 0.0000 Epoch: 34 Global Step: 59530 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-03-06 16:20:23,941-Speed 9381.69 samples/sec Loss 0.8424 LearningRate 0.0000 Epoch: 34 Global Step: 59540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:20:50,178-Speed 9367.62 samples/sec Loss 0.8367 LearningRate 0.0000 Epoch: 34 Global Step: 59550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:21:16,428-Speed 9362.71 samples/sec Loss 0.8473 LearningRate 0.0000 Epoch: 34 Global Step: 59560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:21:42,533-Speed 9414.79 samples/sec Loss 0.8415 LearningRate 0.0000 Epoch: 34 Global Step: 59570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:22:08,673-Speed 9402.10 samples/sec Loss 0.8465 LearningRate 0.0000 Epoch: 34 Global Step: 59580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:22:34,944-Speed 9355.03 samples/sec Loss 0.8358 LearningRate 0.0000 Epoch: 34 Global Step: 59590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:23:01,107-Speed 9394.11 samples/sec Loss 0.8364 LearningRate 0.0000 Epoch: 34 Global Step: 59600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:23:27,284-Speed 9388.69 samples/sec Loss 0.8445 LearningRate 0.0000 Epoch: 34 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:23:53,396-Speed 9412.31 samples/sec Loss 0.8364 LearningRate 0.0000 Epoch: 34 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:24:19,630-Speed 9368.26 samples/sec Loss 0.8400 LearningRate 0.0000 Epoch: 34 Global Step: 59630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:24:45,791-Speed 9394.53 samples/sec Loss 0.8377 LearningRate 0.0000 Epoch: 34 Global Step: 59640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:25:11,887-Speed 9417.88 samples/sec Loss 0.8354 LearningRate 0.0000 Epoch: 34 Global Step: 59650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:25:38,026-Speed 9402.55 
samples/sec Loss 0.8333 LearningRate 0.0000 Epoch: 34 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:26:04,170-Speed 9400.41 samples/sec Loss 0.8443 LearningRate 0.0000 Epoch: 34 Global Step: 59670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:26:30,319-Speed 9398.93 samples/sec Loss 0.8328 LearningRate 0.0000 Epoch: 34 Global Step: 59680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:26:56,478-Speed 9395.20 samples/sec Loss 0.8406 LearningRate 0.0000 Epoch: 34 Global Step: 59690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:27:22,705-Speed 9370.94 samples/sec Loss 0.8386 LearningRate 0.0000 Epoch: 34 Global Step: 59700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:27:48,856-Speed 9397.95 samples/sec Loss 0.8343 LearningRate 0.0000 Epoch: 34 Global Step: 59710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:28:15,002-Speed 9400.30 samples/sec Loss 0.8370 LearningRate 0.0000 Epoch: 34 Global Step: 59720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:28:41,045-Speed 9437.04 samples/sec Loss 0.8391 LearningRate 0.0000 Epoch: 34 Global Step: 59730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:29:07,177-Speed 9404.77 samples/sec Loss 0.8372 LearningRate 0.0000 Epoch: 34 Global Step: 59740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:29:33,322-Speed 9400.50 samples/sec Loss 0.8316 LearningRate 0.0000 Epoch: 34 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:29:59,398-Speed 9425.12 samples/sec Loss 0.8383 LearningRate 0.0000 Epoch: 34 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:30:25,599-Speed 9379.99 samples/sec Loss 0.8424 LearningRate 0.0000 Epoch: 34 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:30:51,844-Speed 9364.62 samples/sec Loss 0.8399 LearningRate 0.0000 Epoch: 34 
Global Step: 59780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:31:17,995-Speed 9398.07 samples/sec Loss 0.8279 LearningRate 0.0000 Epoch: 34 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:31:44,108-Speed 9411.77 samples/sec Loss 0.8366 LearningRate 0.0000 Epoch: 34 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:32:10,257-Speed 9398.82 samples/sec Loss 0.8354 LearningRate 0.0000 Epoch: 34 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:32:36,308-Speed 9433.97 samples/sec Loss 0.8425 LearningRate 0.0000 Epoch: 34 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:33:02,369-Speed 9430.98 samples/sec Loss 0.8404 LearningRate 0.0000 Epoch: 34 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:33:28,525-Speed 9396.34 samples/sec Loss 0.8254 LearningRate 0.0000 Epoch: 34 Global Step: 59840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:33:54,575-Speed 9434.42 samples/sec Loss 0.8325 LearningRate 0.0000 Epoch: 34 Global Step: 59850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:34:20,705-Speed 9406.03 samples/sec Loss 0.8358 LearningRate 0.0000 Epoch: 34 Global Step: 59860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:34:46,791-Speed 9421.46 samples/sec Loss 0.8341 LearningRate 0.0000 Epoch: 34 Global Step: 59870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:35:12,907-Speed 9410.79 samples/sec Loss 0.8279 LearningRate 0.0000 Epoch: 34 Global Step: 59880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:35:38,995-Speed 9421.32 samples/sec Loss 0.8319 LearningRate 0.0000 Epoch: 34 Global Step: 59890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:36:05,145-Speed 9398.34 samples/sec Loss 0.8354 LearningRate 0.0000 Epoch: 34 Global Step: 59900 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-03-06 16:36:31,271-Speed 9407.00 samples/sec Loss 0.8374 LearningRate 0.0000 Epoch: 34 Global Step: 59910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:36:57,446-Speed 9389.40 samples/sec Loss 0.8295 LearningRate 0.0000 Epoch: 34 Global Step: 59920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:37:23,564-Speed 9410.11 samples/sec Loss 0.8371 LearningRate 0.0000 Epoch: 34 Global Step: 59930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:37:49,674-Speed 9412.94 samples/sec Loss 0.8327 LearningRate 0.0000 Epoch: 34 Global Step: 59940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:38:15,802-Speed 9406.50 samples/sec Loss 0.8399 LearningRate 0.0000 Epoch: 34 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:38:41,907-Speed 9414.48 samples/sec Loss 0.8258 LearningRate 0.0000 Epoch: 34 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:39:08,023-Speed 9410.63 samples/sec Loss 0.8333 LearningRate 0.0000 Epoch: 34 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:39:34,218-Speed 9382.33 samples/sec Loss 0.8354 LearningRate 0.0000 Epoch: 34 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:40:00,348-Speed 9405.64 samples/sec Loss 0.8336 LearningRate 0.0000 Epoch: 34 Global Step: 59990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:40:26,446-Speed 9417.33 samples/sec Loss 0.8342 LearningRate 0.0000 Epoch: 34 Global Step: 60000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:40:52,538-Speed 9419.63 samples/sec Loss 0.8305 LearningRate 0.0000 Epoch: 34 Global Step: 60010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:41:18,752-Speed 9375.37 samples/sec Loss 0.8340 LearningRate 0.0000 Epoch: 34 Global Step: 60020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:41:44,880-Speed 9406.62 
samples/sec Loss 0.8351 LearningRate 0.0000 Epoch: 34 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:42:11,093-Speed 9376.11 samples/sec Loss 0.8298 LearningRate 0.0000 Epoch: 34 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:42:37,168-Speed 9425.45 samples/sec Loss 0.8293 LearningRate 0.0000 Epoch: 34 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:43:03,305-Speed 9403.12 samples/sec Loss 0.8245 LearningRate 0.0000 Epoch: 34 Global Step: 60060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:43:29,369-Speed 9429.39 samples/sec Loss 0.8340 LearningRate 0.0000 Epoch: 34 Global Step: 60070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:43:55,411-Speed 9437.67 samples/sec Loss 0.8308 LearningRate 0.0000 Epoch: 34 Global Step: 60080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:44:21,551-Speed 9402.06 samples/sec Loss 0.8440 LearningRate 0.0000 Epoch: 34 Global Step: 60090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:44:47,636-Speed 9421.77 samples/sec Loss 0.8349 LearningRate 0.0000 Epoch: 34 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:45:13,738-Speed 9415.71 samples/sec Loss 0.8356 LearningRate 0.0000 Epoch: 34 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:45:39,918-Speed 9387.67 samples/sec Loss 0.8420 LearningRate 0.0000 Epoch: 34 Global Step: 60120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:46:06,081-Speed 9393.91 samples/sec Loss 0.8355 LearningRate 0.0000 Epoch: 34 Global Step: 60130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:46:32,121-Speed 9438.11 samples/sec Loss 0.8340 LearningRate 0.0000 Epoch: 34 Global Step: 60140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:46:58,187-Speed 9428.86 samples/sec Loss 0.8305 LearningRate 0.0000 Epoch: 34 
Global Step: 60150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:47:24,267-Speed 9423.82 samples/sec Loss 0.8337 LearningRate 0.0000 Epoch: 34 Global Step: 60160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:47:50,329-Speed 9429.86 samples/sec Loss 0.8340 LearningRate 0.0000 Epoch: 34 Global Step: 60170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:48:16,367-Speed 9439.08 samples/sec Loss 0.8285 LearningRate 0.0000 Epoch: 34 Global Step: 60180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:48:42,457-Speed 9419.98 samples/sec Loss 0.8335 LearningRate 0.0000 Epoch: 34 Global Step: 60190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:49:08,555-Speed 9417.31 samples/sec Loss 0.8334 LearningRate 0.0000 Epoch: 34 Global Step: 60200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:49:34,707-Speed 9397.86 samples/sec Loss 0.8272 LearningRate 0.0000 Epoch: 34 Global Step: 60210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:50:00,829-Speed 9408.50 samples/sec Loss 0.8322 LearningRate 0.0000 Epoch: 34 Global Step: 60220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:50:27,117-Speed 9349.03 samples/sec Loss 0.8335 LearningRate 0.0000 Epoch: 34 Global Step: 60230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-06 16:50:53,262-Speed 9400.40 samples/sec Loss 0.8351 LearningRate 0.0000 Epoch: 34 Global Step: 60240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:51:19,435-Speed 9390.48 samples/sec Loss 0.8270 LearningRate 0.0000 Epoch: 34 Global Step: 60250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:51:45,577-Speed 9401.29 samples/sec Loss 0.8310 LearningRate 0.0000 Epoch: 34 Global Step: 60260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-06 16:52:11,672-Speed 9418.21 samples/sec Loss 0.8313 LearningRate 0.0000 Epoch: 34 Global Step: 60270 Fp16 Grad Scale: 65536 Required: 7 
hours
Training: 2022-03-06 16:52:37,800-Speed 9406.40 samples/sec Loss 0.8280 LearningRate 0.0000 Epoch: 34 Global Step: 60280 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-03-06 16:53:03,985-Speed 9386.23 samples/sec Loss 0.8298 LearningRate 0.0000 Epoch: 34 Global Step: 60290 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 16:53:30,012-Speed 9442.65 samples/sec Loss 0.8280 LearningRate 0.0000 Epoch: 34 Global Step: 60300 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 16:53:56,111-Speed 9417.16 samples/sec Loss 0.8330 LearningRate 0.0000 Epoch: 34 Global Step: 60310 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 16:54:22,161-Speed 9434.35 samples/sec Loss 0.8320 LearningRate 0.0000 Epoch: 34 Global Step: 60320 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 16:54:48,215-Speed 9432.98 samples/sec Loss 0.8298 LearningRate 0.0000 Epoch: 34 Global Step: 60330 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 16:55:14,285-Speed 9427.42 samples/sec Loss 0.8321 LearningRate 0.0000 Epoch: 34 Global Step: 60340 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 16:55:40,409-Speed 9407.73 samples/sec Loss 0.8368 LearningRate 0.0000 Epoch: 34 Global Step: 60350 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 16:56:06,488-Speed 9424.15 samples/sec Loss 0.8303 LearningRate 0.0000 Epoch: 34 Global Step: 60360 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 16:56:32,714-Speed 9371.18 samples/sec Loss 0.8329 LearningRate 0.0000 Epoch: 34 Global Step: 60370 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 16:56:58,854-Speed 9402.35 samples/sec Loss 0.8334 LearningRate 0.0000 Epoch: 34 Global Step: 60380 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 16:57:24,952-Speed 9416.94 samples/sec Loss 0.8254 LearningRate 0.0000 Epoch: 34 Global Step: 60390 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 16:57:50,943-Speed 9456.04 samples/sec Loss 0.8280 LearningRate 0.0000 Epoch: 34 Global Step: 60400 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 16:58:17,114-Speed 9390.74 samples/sec Loss 0.8262 LearningRate 0.0000 Epoch: 34 Global Step: 60410 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 16:58:43,251-Speed 9403.05 samples/sec Loss 0.8299 LearningRate 0.0000 Epoch: 34 Global Step: 60420 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 16:59:09,367-Speed 9410.86 samples/sec Loss 0.8357 LearningRate 0.0000 Epoch: 34 Global Step: 60430 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 16:59:35,448-Speed 9423.24 samples/sec Loss 0.8283 LearningRate 0.0000 Epoch: 34 Global Step: 60440 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 17:00:01,589-Speed 9402.03 samples/sec Loss 0.8321 LearningRate 0.0000 Epoch: 34 Global Step: 60450 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 17:00:27,668-Speed 9423.97 samples/sec Loss 0.8245 LearningRate 0.0000 Epoch: 34 Global Step: 60460 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 17:00:53,795-Speed 9406.92 samples/sec Loss 0.8205 LearningRate 0.0000 Epoch: 34 Global Step: 60470 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 17:01:20,032-Speed 9367.57 samples/sec Loss 0.8284 LearningRate 0.0000 Epoch: 34 Global Step: 60480 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 17:01:46,337-Speed 9342.87 samples/sec Loss 0.8314 LearningRate 0.0000 Epoch: 34 Global Step: 60490 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-06 17:03:03,993-Speed 3164.79 samples/sec Loss 0.8246 LearningRate 0.0000 Epoch: 35 Global Step: 60500 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:03:30,006-Speed 9448.08 samples/sec Loss 0.8264 LearningRate 0.0000 Epoch: 35 Global Step: 60510 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:03:56,123-Speed 9410.51 samples/sec Loss 0.8214 LearningRate 0.0000 Epoch: 35 Global Step: 60520 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:04:22,406-Speed 9350.84 samples/sec Loss 0.8243 LearningRate 0.0000 Epoch: 35 Global Step: 60530 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:04:48,715-Speed 9341.50 samples/sec Loss 0.8298 LearningRate 0.0000 Epoch: 35 Global Step: 60540 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:05:14,948-Speed 9369.06 samples/sec Loss 0.8223 LearningRate 0.0000 Epoch: 35 Global Step: 60550 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:05:41,844-Speed 9137.76 samples/sec Loss 0.8327 LearningRate 0.0000 Epoch: 35 Global Step: 60560 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:06:08,074-Speed 9369.61 samples/sec Loss 0.8257 LearningRate 0.0000 Epoch: 35 Global Step: 60570 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:06:34,188-Speed 9411.31 samples/sec Loss 0.8249 LearningRate 0.0000 Epoch: 35 Global Step: 60580 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:07:00,278-Speed 9420.58 samples/sec Loss 0.8242 LearningRate 0.0000 Epoch: 35 Global Step: 60590 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:07:26,497-Speed 9373.79 samples/sec Loss 0.8264 LearningRate 0.0000 Epoch: 35 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:07:52,635-Speed 9402.53 samples/sec Loss 0.8211 LearningRate 0.0000 Epoch: 35 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:08:18,764-Speed 9407.30 samples/sec Loss 0.8261 LearningRate 0.0000 Epoch: 35 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:08:44,899-Speed 9403.86 samples/sec Loss 0.8198 LearningRate 0.0000 Epoch: 35 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:09:11,260-Speed 9323.63 samples/sec Loss 0.8204 LearningRate 0.0000 Epoch: 35 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:09:37,399-Speed 9402.62 samples/sec Loss 0.8221 LearningRate 0.0000 Epoch: 35 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:10:03,551-Speed 9397.75 samples/sec Loss 0.8183 LearningRate 0.0000 Epoch: 35 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:10:29,694-Speed 9400.78 samples/sec Loss 0.8228 LearningRate 0.0000 Epoch: 35 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:10:55,845-Speed 9399.13 samples/sec Loss 0.8219 LearningRate 0.0000 Epoch: 35 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:11:22,064-Speed 9373.80 samples/sec Loss 0.8256 LearningRate 0.0000 Epoch: 35 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:11:48,161-Speed 9417.33 samples/sec Loss 0.8223 LearningRate 0.0000 Epoch: 35 Global Step: 60700 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-03-06 17:12:14,267-Speed 9414.40 samples/sec Loss 0.8279 LearningRate 0.0000 Epoch: 35 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-03-06 17:12:40,458-Speed 9383.79 samples/sec Loss 0.8292 LearningRate 0.0000 Epoch: 35 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-03-06 17:13:06,594-Speed 9403.67 samples/sec Loss 0.8245 LearningRate 0.0000 Epoch: 35 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-03-06 17:13:32,760-Speed 9392.94 samples/sec Loss 0.8259 LearningRate 0.0000 Epoch: 35 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:13:58,838-Speed 9424.20 samples/sec Loss 0.8263 LearningRate 0.0000 Epoch: 35 Global Step: 60750 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:14:24,963-Speed 9407.49 samples/sec Loss 0.8231 LearningRate 0.0000 Epoch: 35 Global Step: 60760 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:14:51,054-Speed 9419.96 samples/sec Loss 0.8239 LearningRate 0.0000 Epoch: 35 Global Step: 60770 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:15:17,193-Speed 9402.39 samples/sec Loss 0.8247 LearningRate 0.0000 Epoch: 35 Global Step: 60780 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:15:43,358-Speed 9393.19 samples/sec Loss 0.8230 LearningRate 0.0000 Epoch: 35 Global Step: 60790 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:16:09,487-Speed 9405.84 samples/sec Loss 0.8239 LearningRate 0.0000 Epoch: 35 Global Step: 60800 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:16:35,600-Speed 9412.18 samples/sec Loss 0.8269 LearningRate 0.0000 Epoch: 35 Global Step: 60810 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:17:01,718-Speed 9409.79 samples/sec Loss 0.8254 LearningRate 0.0000 Epoch: 35 Global Step: 60820 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:17:27,817-Speed 9416.91 samples/sec Loss 0.8211 LearningRate 0.0000 Epoch: 35 Global Step: 60830 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:17:53,921-Speed 9415.33 samples/sec Loss 0.8314 LearningRate 0.0000 Epoch: 35 Global Step: 60840 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:18:20,026-Speed 9414.48 samples/sec Loss 0.8242 LearningRate 0.0000 Epoch: 35 Global Step: 60850 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:18:46,085-Speed 9431.47 samples/sec Loss 0.8217 LearningRate 0.0000 Epoch: 35 Global Step: 60860 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:19:12,183-Speed 9417.18 samples/sec Loss 0.8250 LearningRate 0.0000 Epoch: 35 Global Step: 60870 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:19:38,282-Speed 9416.81 samples/sec Loss 0.8219 LearningRate 0.0000 Epoch: 35 Global Step: 60880 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:20:04,451-Speed 9391.77 samples/sec Loss 0.8219 LearningRate 0.0000 Epoch: 35 Global Step: 60890 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:20:30,650-Speed 9380.62 samples/sec Loss 0.8205 LearningRate 0.0000 Epoch: 35 Global Step: 60900 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:20:56,720-Speed 9427.63 samples/sec Loss 0.8256 LearningRate 0.0000 Epoch: 35 Global Step: 60910 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:21:22,853-Speed 9404.33 samples/sec Loss 0.8250 LearningRate 0.0000 Epoch: 35 Global Step: 60920 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:21:48,881-Speed 9443.47 samples/sec Loss 0.8253 LearningRate 0.0000 Epoch: 35 Global Step: 60930 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:22:14,916-Speed 9440.03 samples/sec Loss 0.8233 LearningRate 0.0000 Epoch: 35 Global Step: 60940 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:22:41,038-Speed 9408.75 samples/sec Loss 0.8308 LearningRate 0.0000 Epoch: 35 Global Step: 60950 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:23:07,204-Speed 9392.46 samples/sec Loss 0.8259 LearningRate 0.0000 Epoch: 35 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:23:33,328-Speed 9407.74 samples/sec Loss 0.8244 LearningRate 0.0000 Epoch: 35 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:23:59,463-Speed 9403.92 samples/sec Loss 0.8204 LearningRate 0.0000 Epoch: 35 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:24:25,578-Speed 9411.25 samples/sec Loss 0.8192 LearningRate 0.0000 Epoch: 35 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:24:51,726-Speed 9399.30 samples/sec Loss 0.8267 LearningRate 0.0000 Epoch: 35 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:25:17,830-Speed 9414.82 samples/sec Loss 0.8253 LearningRate 0.0000 Epoch: 35 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:25:43,994-Speed 9393.51 samples/sec Loss 0.8255 LearningRate 0.0000 Epoch: 35 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:26:10,112-Speed 9410.17 samples/sec Loss 0.8188 LearningRate 0.0000 Epoch: 35 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:26:36,269-Speed 9396.05 samples/sec Loss 0.8275 LearningRate 0.0000 Epoch: 35 Global Step: 61040 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:27:02,451-Speed 9386.78 samples/sec Loss 0.8190 LearningRate 0.0000 Epoch: 35 Global Step: 61050 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:27:28,652-Speed 9380.54 samples/sec Loss 0.8149 LearningRate 0.0000 Epoch: 35 Global Step: 61060 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:27:54,813-Speed 9394.39 samples/sec Loss 0.8196 LearningRate 0.0000 Epoch: 35 Global Step: 61070 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:28:20,957-Speed 9400.78 samples/sec Loss 0.8217 LearningRate 0.0000 Epoch: 35 Global Step: 61080 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:28:47,062-Speed 9414.84 samples/sec Loss 0.8186 LearningRate 0.0000 Epoch: 35 Global Step: 61090 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:29:13,195-Speed 9404.52 samples/sec Loss 0.8216 LearningRate 0.0000 Epoch: 35 Global Step: 61100 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:29:39,296-Speed 9415.82 samples/sec Loss 0.8198 LearningRate 0.0000 Epoch: 35 Global Step: 61110 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:30:05,455-Speed 9395.62 samples/sec Loss 0.8209 LearningRate 0.0000 Epoch: 35 Global Step: 61120 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:30:31,577-Speed 9408.25 samples/sec Loss 0.8226 LearningRate 0.0000 Epoch: 35 Global Step: 61130 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:30:57,686-Speed 9413.37 samples/sec Loss 0.8207 LearningRate 0.0000 Epoch: 35 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:31:23,843-Speed 9396.04 samples/sec Loss 0.8192 LearningRate 0.0000 Epoch: 35 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:31:50,056-Speed 9375.96 samples/sec Loss 0.8221 LearningRate 0.0000 Epoch: 35 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:32:16,128-Speed 9426.76 samples/sec Loss 0.8198 LearningRate 0.0000 Epoch: 35 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:32:42,355-Speed 9370.83 samples/sec Loss 0.8117 LearningRate 0.0000 Epoch: 35 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:33:08,501-Speed 9399.86 samples/sec Loss 0.8207 LearningRate 0.0000 Epoch: 35 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:33:34,720-Speed 9373.79 samples/sec Loss 0.8228 LearningRate 0.0000 Epoch: 35 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:34:00,807-Speed 9420.93 samples/sec Loss 0.8177 LearningRate 0.0000 Epoch: 35 Global Step: 61210 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:34:26,891-Speed 9423.53 samples/sec Loss 0.8260 LearningRate 0.0000 Epoch: 35 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:34:52,990-Speed 9416.58 samples/sec Loss 0.8147 LearningRate 0.0000 Epoch: 35 Global Step: 61230 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:35:19,150-Speed 9395.09 samples/sec Loss 0.8160 LearningRate 0.0000 Epoch: 35 Global Step: 61240 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:35:45,197-Speed 9435.62 samples/sec Loss 0.8209 LearningRate 0.0000 Epoch: 35 Global Step: 61250 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:36:11,334-Speed 9403.35 samples/sec Loss 0.8165 LearningRate 0.0000 Epoch: 35 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:36:37,472-Speed 9403.96 samples/sec Loss 0.8220 LearningRate 0.0000 Epoch: 35 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:37:03,610-Speed 9402.65 samples/sec Loss 0.8196 LearningRate 0.0000 Epoch: 35 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:37:29,798-Speed 9384.53 samples/sec Loss 0.8279 LearningRate 0.0000 Epoch: 35 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:37:55,948-Speed 9398.57 samples/sec Loss 0.8243 LearningRate 0.0000 Epoch: 35 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:38:22,159-Speed 9376.46 samples/sec Loss 0.8117 LearningRate 0.0000 Epoch: 35 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:38:48,245-Speed 9421.54 samples/sec Loss 0.8215 LearningRate 0.0000 Epoch: 35 Global Step: 61320 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:39:14,434-Speed 9384.78 samples/sec Loss 0.8203 LearningRate 0.0000 Epoch: 35 Global Step: 61330 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:39:40,574-Speed 9402.11 samples/sec Loss 0.8167 LearningRate 0.0000 Epoch: 35 Global Step: 61340 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:40:06,799-Speed 9371.32 samples/sec Loss 0.8135 LearningRate 0.0000 Epoch: 35 Global Step: 61350 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:40:32,913-Speed 9411.90 samples/sec Loss 0.8158 LearningRate 0.0000 Epoch: 35 Global Step: 61360 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:40:59,024-Speed 9412.31 samples/sec Loss 0.8238 LearningRate 0.0000 Epoch: 35 Global Step: 61370 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:41:25,073-Speed 9435.20 samples/sec Loss 0.8142 LearningRate 0.0000 Epoch: 35 Global Step: 61380 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:41:51,191-Speed 9409.87 samples/sec Loss 0.8145 LearningRate 0.0000 Epoch: 35 Global Step: 61390 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:42:17,284-Speed 9418.94 samples/sec Loss 0.8155 LearningRate 0.0000 Epoch: 35 Global Step: 61400 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:42:43,392-Speed 9413.93 samples/sec Loss 0.8174 LearningRate 0.0000 Epoch: 35 Global Step: 61410 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:43:09,529-Speed 9403.23 samples/sec Loss 0.8150 LearningRate 0.0000 Epoch: 35 Global Step: 61420 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:43:35,708-Speed 9387.79 samples/sec Loss 0.8105 LearningRate 0.0000 Epoch: 35 Global Step: 61430 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:44:01,837-Speed 9406.08 samples/sec Loss 0.8134 LearningRate 0.0000 Epoch: 35 Global Step: 61440 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:44:28,033-Speed 9382.29 samples/sec Loss 0.8159 LearningRate 0.0000 Epoch: 35 Global Step: 61450 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:44:54,154-Speed 9408.84 samples/sec Loss 0.8176 LearningRate 0.0000 Epoch: 35 Global Step: 61460 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:45:20,292-Speed 9403.07 samples/sec Loss 0.8134 LearningRate 0.0000 Epoch: 35 Global Step: 61470 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:45:46,409-Speed 9410.52 samples/sec Loss 0.8189 LearningRate 0.0000 Epoch: 35 Global Step: 61480 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:46:12,526-Speed 9410.38 samples/sec Loss 0.8177 LearningRate 0.0000 Epoch: 35 Global Step: 61490 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:46:38,584-Speed 9431.79 samples/sec Loss 0.8131 LearningRate 0.0000 Epoch: 35 Global Step: 61500 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:47:04,647-Speed 9429.81 samples/sec Loss 0.8183 LearningRate 0.0000 Epoch: 35 Global Step: 61510 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:47:30,716-Speed 9427.90 samples/sec Loss 0.8101 LearningRate 0.0000 Epoch: 35 Global Step: 61520 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:47:56,816-Speed 9416.66 samples/sec Loss 0.8148 LearningRate 0.0000 Epoch: 35 Global Step: 61530 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:48:22,992-Speed 9388.97 samples/sec Loss 0.8195 LearningRate 0.0000 Epoch: 35 Global Step: 61540 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:48:49,113-Speed 9408.86 samples/sec Loss 0.8111 LearningRate 0.0000 Epoch: 35 Global Step: 61550 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:49:15,303-Speed 9384.21 samples/sec Loss 0.8162 LearningRate 0.0000 Epoch: 35 Global Step: 61560 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:49:41,514-Speed 9376.74 samples/sec Loss 0.8094 LearningRate 0.0000 Epoch: 35 Global Step: 61570 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-06 17:50:07,610-Speed 9417.84 samples/sec Loss 0.8168 LearningRate 0.0000 Epoch: 35 Global Step: 61580 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:50:33,751-Speed 9401.65 samples/sec Loss 0.8151 LearningRate 0.0000 Epoch: 35 Global Step: 61590 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:50:59,802-Speed 9434.16 samples/sec Loss 0.8117 LearningRate 0.0000 Epoch: 35 Global Step: 61600 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:51:25,896-Speed 9419.11 samples/sec Loss 0.8164 LearningRate 0.0000 Epoch: 35 Global Step: 61610 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:51:51,957-Speed 9430.51 samples/sec Loss 0.8116 LearningRate 0.0000 Epoch: 35 Global Step: 61620 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:52:18,084-Speed 9406.74 samples/sec Loss 0.8192 LearningRate 0.0000 Epoch: 35 Global Step: 61630 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:52:44,233-Speed 9399.12 samples/sec Loss 0.8152 LearningRate 0.0000 Epoch: 35 Global Step: 61640 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-06 17:53:10,450-Speed 9374.32 samples/sec Loss 0.8208 LearningRate 0.0000 Epoch: 35 Global Step: 61650 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:53:36,579-Speed 9406.47 samples/sec Loss 0.8127 LearningRate 0.0000 Epoch: 35 Global Step: 61660 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:54:02,639-Speed 9431.01 samples/sec Loss 0.8136 LearningRate 0.0000 Epoch: 35 Global Step: 61670 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:54:28,732-Speed 9418.89 samples/sec Loss 0.8148 LearningRate 0.0000 Epoch: 35 Global Step: 61680 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-03-06 17:54:54,827-Speed 9418.21 samples/sec Loss 0.8113 LearningRate 0.0000 Epoch: 35 Global Step: 61690 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:55:20,921-Speed 9418.73 samples/sec Loss 0.8139 LearningRate 0.0000 Epoch: 35 Global Step: 61700 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:55:46,984-Speed 9430.09 samples/sec Loss 0.8142 LearningRate 0.0000 Epoch: 35 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:56:13,057-Speed 9426.03 samples/sec Loss 0.8097 LearningRate 0.0000 Epoch: 35 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:56:39,139-Speed 9423.37 samples/sec Loss 0.8085 LearningRate 0.0000 Epoch: 35 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:57:05,291-Speed 9397.37 samples/sec Loss 0.8172 LearningRate 0.0000 Epoch: 35 Global Step: 61740 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:57:31,490-Speed 9381.24 samples/sec Loss 0.8120 LearningRate 0.0000 Epoch: 35 Global Step: 61750 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:57:57,618-Speed 9406.39 samples/sec Loss 0.8219 LearningRate 0.0000 Epoch: 35 Global Step: 61760 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:58:23,713-Speed 9418.52 samples/sec Loss 0.8142 LearningRate 0.0000 Epoch: 35 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:58:49,815-Speed 9415.65 samples/sec Loss 0.8139 LearningRate 0.0000 Epoch: 35 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:59:15,940-Speed 9407.28 samples/sec Loss 0.8175 LearningRate 0.0000 Epoch: 35 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 17:59:41,997-Speed 9432.24 samples/sec Loss 0.8127 LearningRate 0.0000 Epoch: 35 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:00:08,079-Speed 9422.99 samples/sec Loss 0.8106 LearningRate 0.0000 Epoch: 35 Global Step: 61810 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:00:34,153-Speed 9425.86 samples/sec Loss 0.8123 LearningRate 0.0000 Epoch: 35 Global Step: 61820 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:01:00,224-Speed 9426.97 samples/sec Loss 0.8157 LearningRate 0.0000 Epoch: 35 Global Step: 61830 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:01:26,319-Speed 9418.31 samples/sec Loss 0.8123 LearningRate 0.0000 Epoch: 35 Global Step: 61840 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:01:52,422-Speed 9415.66 samples/sec Loss 0.8184 LearningRate 0.0000 Epoch: 35 Global Step: 61850 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:02:18,520-Speed 9417.41 samples/sec Loss 0.7997 LearningRate 0.0000 Epoch: 35 Global Step: 61860 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:02:44,554-Speed 9440.28 samples/sec Loss 0.8140 LearningRate 0.0000 Epoch: 35 Global Step: 61870 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:03:10,727-Speed 9390.26 samples/sec Loss 0.8191 LearningRate 0.0000 Epoch: 35 Global Step: 61880 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:03:36,877-Speed 9398.80 samples/sec Loss 0.8155 LearningRate 0.0000 Epoch: 35 Global Step: 61890 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:04:02,924-Speed 9435.14 samples/sec Loss 0.8110 LearningRate 0.0000 Epoch: 35 Global Step: 61900 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:04:28,931-Speed 9450.50 samples/sec Loss 0.8124 LearningRate 0.0000 Epoch: 35 Global Step: 61910 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:04:54,968-Speed 9439.25 samples/sec Loss 0.8146 LearningRate 0.0000 Epoch: 35 Global Step: 61920 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:05:21,102-Speed 9404.08 samples/sec Loss 0.8140 LearningRate 0.0000 Epoch: 35 Global Step: 61930 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:05:47,221-Speed 9409.85 samples/sec Loss 0.8045 LearningRate 0.0000 Epoch: 35 Global Step: 61940 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:06:13,300-Speed 9423.83 samples/sec Loss 0.8097 LearningRate 0.0000 Epoch: 35 Global Step: 61950 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:06:39,388-Speed 9420.75 samples/sec Loss 0.8118 LearningRate 0.0000 Epoch: 35 Global Step: 61960 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:07:05,533-Speed 9400.31 samples/sec Loss 0.8118 LearningRate 0.0000 Epoch: 35 Global Step: 61970 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:07:31,597-Speed 9429.69 samples/sec Loss 0.8141 LearningRate 0.0000 Epoch: 35 Global Step: 61980 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:07:57,692-Speed 9418.15 samples/sec Loss 0.8120 LearningRate 0.0000 Epoch: 35 Global Step: 61990 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:08:23,773-Speed 9423.28 samples/sec Loss 0.8125 LearningRate 0.0000 Epoch: 35 Global Step: 62000 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:08:49,883-Speed 9413.62 samples/sec Loss 0.8079 LearningRate 0.0000 Epoch: 35 Global Step: 62010 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:09:16,052-Speed 9391.74 samples/sec Loss 0.8133 LearningRate 0.0000 Epoch: 35 Global Step: 62020 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:09:42,183-Speed 9405.22 samples/sec Loss 0.8040 LearningRate 0.0000 Epoch: 35 Global Step: 62030 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:10:08,351-Speed 9392.21 samples/sec Loss 0.8165 LearningRate 0.0000 Epoch: 35 Global Step: 62040 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:10:34,397-Speed 9435.76 samples/sec Loss 0.8054 LearningRate 0.0000 Epoch: 35 Global Step: 62050 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:11:00,465-Speed 9428.44 samples/sec Loss 0.8025 LearningRate 0.0000 Epoch: 35 Global Step: 62060 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:11:26,558-Speed 9418.91 samples/sec Loss 0.8099 LearningRate 0.0000 Epoch: 35 Global Step: 62070 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:11:52,607-Speed 9435.11 samples/sec Loss 0.8086 LearningRate 0.0000 Epoch: 35 Global Step: 62080 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:12:18,704-Speed 9417.55 samples/sec Loss 0.8132 LearningRate 0.0000 Epoch: 35 Global Step: 62090 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:12:44,793-Speed 9420.22 samples/sec Loss 0.8114 LearningRate 0.0000 Epoch: 35 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:13:10,924-Speed 9405.61 samples/sec Loss 0.8123 LearningRate 0.0000 Epoch: 35 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:13:37,052-Speed 9406.20 samples/sec Loss 0.8165 LearningRate 0.0000 Epoch: 35 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:14:03,063-Speed 9448.78 samples/sec Loss 0.8118 LearningRate 0.0000 Epoch: 35 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:14:29,134-Speed 9426.68 samples/sec Loss 0.8160 LearningRate 0.0000 Epoch: 35 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:14:55,177-Speed 9437.27 samples/sec Loss 0.8080 LearningRate 0.0000 Epoch: 35 Global Step: 62150 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:15:21,201-Speed 9444.07 samples/sec Loss 0.8130 LearningRate 0.0000 Epoch: 35 Global Step: 62160 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:15:47,277-Speed 9425.20 samples/sec Loss 0.8112 LearningRate 0.0000 Epoch: 35 Global Step: 62170 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:16:13,438-Speed 9394.69 samples/sec Loss 0.8066 LearningRate 0.0000 Epoch: 35 Global Step: 62180 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:16:39,664-Speed 9371.49 samples/sec Loss 0.8110 LearningRate 0.0000 Epoch: 35 Global Step: 62190 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:17:05,835-Speed 9390.86 samples/sec Loss 0.8102 LearningRate 0.0000 Epoch: 35 Global Step: 62200 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:17:31,987-Speed 9398.76 samples/sec Loss 0.7982 LearningRate 0.0000 Epoch: 35 Global Step: 62210 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:18:52,217-Speed 3063.27 samples/sec Loss 0.8099 LearningRate 0.0000 Epoch: 36 Global Step: 62220 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:19:18,125-Speed 9486.41 samples/sec Loss 0.8064 LearningRate 0.0000 Epoch: 36 Global Step: 62230 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:19:44,006-Speed 9496.13 samples/sec Loss 0.8056 LearningRate 0.0000 Epoch: 36 Global Step: 62240 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:20:10,058-Speed 9433.66 samples/sec Loss 0.8118 LearningRate 0.0000 Epoch: 36 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:20:36,032-Speed 9462.20 samples/sec Loss 0.8075 LearningRate 0.0000 Epoch: 36 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:21:01,982-Speed 9470.87 samples/sec Loss 0.8114 LearningRate 0.0000 Epoch: 36 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:21:28,002-Speed 9445.46 samples/sec Loss 0.8105 LearningRate 0.0000 Epoch: 36 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:21:54,070-Speed 9428.42 samples/sec Loss 0.8067 LearningRate 0.0000 Epoch: 36 Global Step: 62290 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:22:20,138-Speed 9427.90 samples/sec Loss 0.8081 LearningRate 0.0000 Epoch: 36 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:22:46,129-Speed 9456.37 samples/sec Loss 0.8059 LearningRate 0.0000 Epoch: 36 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:23:12,217-Speed 9420.74 samples/sec Loss 0.8024 LearningRate 0.0000 Epoch: 36 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:23:38,366-Speed 9399.10 samples/sec Loss 0.8143 LearningRate 0.0000 Epoch: 36 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:24:04,534-Speed 9391.77 samples/sec Loss 0.8077 LearningRate 0.0000 Epoch: 36 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:24:30,621-Speed 9421.10 samples/sec Loss 0.7980 LearningRate 0.0000 Epoch: 36 Global Step: 62350 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-03-06 18:24:56,730-Speed 9413.68 samples/sec Loss 0.8026 LearningRate 0.0000 Epoch: 36 Global Step: 62360 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-03-06 18:25:22,742-Speed 9448.11 samples/sec Loss 0.8019 LearningRate 0.0000 Epoch: 36 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:25:48,901-Speed 9395.20 samples/sec Loss 0.8160 LearningRate 0.0000 Epoch: 36 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:26:14,916-Speed 9447.54 samples/sec Loss 0.8013 LearningRate 0.0000 Epoch: 36 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:26:40,933-Speed 9446.54 samples/sec Loss 0.8045 LearningRate 0.0000 Epoch: 36 Global Step: 62400 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:27:07,098-Speed 9392.98 samples/sec Loss 0.8023 LearningRate 0.0000 Epoch: 36 Global Step: 62410 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:27:33,147-Speed 9434.78 samples/sec Loss 0.8082 LearningRate 0.0000 Epoch: 36 Global Step: 62420 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:27:59,212-Speed 9429.09 samples/sec Loss 0.8073 LearningRate 0.0000 Epoch: 36 Global Step: 62430 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:28:25,274-Speed 9430.54 samples/sec Loss 0.8047 LearningRate 0.0000 Epoch: 36 Global Step: 62440 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:28:51,440-Speed 9392.44 samples/sec Loss 0.8041 LearningRate 0.0000 Epoch: 36 Global Step: 62450 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:29:17,599-Speed 9395.39 samples/sec Loss 0.8066 LearningRate 0.0000 Epoch: 36 Global Step: 62460 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:29:43,686-Speed 9421.09 samples/sec Loss 0.8094 LearningRate 0.0000 Epoch: 36 Global Step: 62470 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:30:09,792-Speed 9414.04 samples/sec Loss 0.8031 LearningRate 0.0000 Epoch: 36 Global Step: 62480 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:30:35,899-Speed 9413.91 samples/sec Loss 0.8042 LearningRate 0.0000 Epoch: 36 Global Step: 62490 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:31:02,128-Speed 9370.42 samples/sec Loss 0.8101 LearningRate 0.0000 Epoch: 36 Global Step: 62500 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:31:28,180-Speed 9433.63 samples/sec Loss 0.8023 LearningRate 0.0000 Epoch: 36 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:31:54,246-Speed 9428.73 samples/sec Loss 0.8005 LearningRate 0.0000 Epoch: 36 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:32:20,416-Speed 9391.31 samples/sec Loss 0.8009 LearningRate 0.0000 Epoch: 36 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:32:46,525-Speed 9413.46 samples/sec Loss 0.8077 LearningRate 0.0000 Epoch: 36 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:33:12,641-Speed 9410.65 samples/sec Loss 0.8029 LearningRate 0.0000 Epoch: 36 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:33:38,704-Speed 9430.32 samples/sec Loss 0.8083 LearningRate 0.0000 Epoch: 36 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:34:04,917-Speed 9375.83 samples/sec Loss 0.8079 LearningRate 0.0000 Epoch: 36 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:34:30,980-Speed 9429.79 samples/sec Loss 0.8033 LearningRate 0.0000 Epoch: 36 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:34:57,053-Speed 9426.69 samples/sec Loss 0.8026 LearningRate 0.0000 Epoch: 36 Global Step: 62590 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:35:23,202-Speed 9398.68 samples/sec Loss 0.8041 LearningRate 0.0000 Epoch: 36 Global Step: 62600 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:35:49,342-Speed 9402.05 samples/sec Loss 0.8072 LearningRate 0.0000 Epoch: 36 Global Step: 62610 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:36:15,407-Speed 9428.80 samples/sec Loss 0.7989 LearningRate 0.0000 Epoch: 36 Global Step: 62620 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:36:41,618-Speed 9376.91 samples/sec Loss 0.8087 LearningRate 0.0000 Epoch: 36 Global Step: 62630 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:37:07,703-Speed 9421.84 samples/sec Loss 0.8023 LearningRate 0.0000 Epoch: 36 Global Step: 62640 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:37:33,779-Speed 9425.19 samples/sec Loss 0.8131 LearningRate 0.0000 Epoch: 36 Global Step: 62650 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:37:59,934-Speed 9396.97 samples/sec Loss 0.8048 LearningRate 0.0000 Epoch: 36 Global Step: 62660 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:38:26,112-Speed 9388.24 samples/sec Loss 0.8097 LearningRate 0.0000 Epoch: 36 Global Step: 62670 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:38:52,253-Speed 9401.64 samples/sec Loss 0.8046 LearningRate 0.0000 Epoch: 36 Global Step: 62680 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:39:18,370-Speed 9410.41 samples/sec Loss 0.8108 LearningRate 0.0000 Epoch: 36 Global Step: 62690 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:39:44,557-Speed 9385.17 samples/sec Loss 0.8078 LearningRate 0.0000 Epoch: 36 Global Step: 62700 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:40:10,740-Speed 9386.75 samples/sec Loss 0.8044 LearningRate 0.0000 Epoch: 36 Global Step: 62710 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:40:36,902-Speed 9394.19 samples/sec Loss 0.8073 LearningRate 0.0000 Epoch: 36 Global Step: 62720 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:41:03,075-Speed 9390.41 samples/sec Loss 0.8127 LearningRate 0.0000 Epoch: 36 Global Step: 62730 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:41:29,224-Speed 9400.21 samples/sec Loss 0.8032 LearningRate 0.0000 Epoch: 36 Global Step: 62740 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-06 18:41:55,368-Speed 9400.61 samples/sec Loss 0.8069 LearningRate 0.0000 Epoch: 36 Global Step: 62750 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:42:21,481-Speed 9411.90 samples/sec Loss 0.8055 LearningRate 0.0000 Epoch: 36 Global Step: 62760 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:42:47,594-Speed 9411.54 samples/sec Loss 0.8049 LearningRate 0.0000 Epoch: 36 Global Step: 62770 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:43:13,778-Speed 9386.42 samples/sec Loss 0.8057 LearningRate 0.0000 Epoch: 36 Global Step: 62780 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:43:39,877-Speed 9416.75 samples/sec Loss 0.7999 LearningRate 0.0000 Epoch: 36 Global Step: 62790 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:44:06,012-Speed 9403.89 samples/sec Loss 0.8034 LearningRate 0.0000 Epoch: 36 Global Step: 62800 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:44:32,232-Speed 9373.56 samples/sec Loss 0.8084 LearningRate 0.0000 Epoch: 36 Global Step: 62810 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:44:58,294-Speed 9430.14 samples/sec Loss 0.8019 LearningRate 0.0000 Epoch: 36 Global Step: 62820 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:45:24,413-Speed 9409.86 samples/sec Loss 0.8009 LearningRate 0.0000 Epoch: 36 Global Step: 62830 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:45:50,505-Speed 9419.36 samples/sec Loss 0.8033 LearningRate 0.0000 Epoch: 36 Global Step: 62840 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-06 18:46:16,669-Speed 9393.36 samples/sec Loss 0.8018 LearningRate 0.0000 Epoch: 36 Global Step: 62850 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-06 18:46:42,763-Speed 9418.48 samples/sec Loss 0.8000 LearningRate 0.0000 Epoch: 36
Global Step: 62860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-06 18:47:08,957-Speed 9382.71 samples/sec Loss 0.7994 LearningRate 0.0000 Epoch: 36 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-06 18:47:35,140-Speed 9386.71 samples/sec Loss 0.8075 LearningRate 0.0000 Epoch: 36 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-06 18:48:01,203-Speed 9430.02 samples/sec Loss 0.8058 LearningRate 0.0000 Epoch: 36 Global Step: 62890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:48:27,312-Speed 9413.18 samples/sec Loss 0.8098 LearningRate 0.0000 Epoch: 36 Global Step: 62900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:48:53,470-Speed 9395.72 samples/sec Loss 0.8034 LearningRate 0.0000 Epoch: 36 Global Step: 62910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:49:19,566-Speed 9417.66 samples/sec Loss 0.8025 LearningRate 0.0000 Epoch: 36 Global Step: 62920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:49:45,612-Speed 9436.09 samples/sec Loss 0.8009 LearningRate 0.0000 Epoch: 36 Global Step: 62930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:50:11,768-Speed 9396.47 samples/sec Loss 0.7979 LearningRate 0.0000 Epoch: 36 Global Step: 62940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:50:37,953-Speed 9385.52 samples/sec Loss 0.8027 LearningRate 0.0000 Epoch: 36 Global Step: 62950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:51:04,083-Speed 9405.69 samples/sec Loss 0.8093 LearningRate 0.0000 Epoch: 36 Global Step: 62960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:51:30,243-Speed 9394.84 samples/sec Loss 0.8048 LearningRate 0.0000 Epoch: 36 Global Step: 62970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-06 18:51:56,398-Speed 9396.96 samples/sec Loss 0.8029 LearningRate 0.0000 Epoch: 36 Global Step: 62980 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-03-06 18:52:22,545-Speed 9399.51 samples/sec Loss 0.7961 LearningRate 0.0000 Epoch: 36 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-06 18:52:48,754-Speed 9377.18 samples/sec Loss 0.8045 LearningRate 0.0000 Epoch: 36 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-06 18:53:14,834-Speed 9423.99 samples/sec Loss 0.8018 LearningRate 0.0000 Epoch: 36 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 18:53:41,028-Speed 9382.74 samples/sec Loss 0.8089 LearningRate 0.0000 Epoch: 36 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 18:54:07,195-Speed 9392.11 samples/sec Loss 0.7981 LearningRate 0.0000 Epoch: 36 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 18:54:33,384-Speed 9384.52 samples/sec Loss 0.8056 LearningRate 0.0000 Epoch: 36 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 18:54:59,527-Speed 9401.01 samples/sec Loss 0.8009 LearningRate 0.0000 Epoch: 36 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 18:55:25,637-Speed 9413.03 samples/sec Loss 0.7965 LearningRate 0.0000 Epoch: 36 Global Step: 63060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 18:55:51,791-Speed 9397.11 samples/sec Loss 0.7975 LearningRate 0.0000 Epoch: 36 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 18:56:17,999-Speed 9377.79 samples/sec Loss 0.7994 LearningRate 0.0000 Epoch: 36 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 18:56:44,051-Speed 9433.78 samples/sec Loss 0.8000 LearningRate 0.0000 Epoch: 36 Global Step: 63090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 18:57:10,203-Speed 9397.77 samples/sec Loss 0.8066 LearningRate 0.0000 Epoch: 36 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 18:57:39,721-Speed 8326.16 
samples/sec Loss 0.7904 LearningRate 0.0000 Epoch: 36 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 18:58:05,831-Speed 9412.71 samples/sec Loss 0.8024 LearningRate 0.0000 Epoch: 36 Global Step: 63120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 18:58:32,016-Speed 9385.92 samples/sec Loss 0.7927 LearningRate 0.0000 Epoch: 36 Global Step: 63130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 18:58:58,181-Speed 9393.29 samples/sec Loss 0.8033 LearningRate 0.0000 Epoch: 36 Global Step: 63140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 18:59:24,346-Speed 9393.09 samples/sec Loss 0.7964 LearningRate 0.0000 Epoch: 36 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 18:59:50,563-Speed 9374.36 samples/sec Loss 0.8020 LearningRate 0.0000 Epoch: 36 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:00:16,780-Speed 9374.50 samples/sec Loss 0.7991 LearningRate 0.0000 Epoch: 36 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:00:42,885-Speed 9414.56 samples/sec Loss 0.7988 LearningRate 0.0000 Epoch: 36 Global Step: 63180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:01:09,101-Speed 9374.87 samples/sec Loss 0.7982 LearningRate 0.0000 Epoch: 36 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:01:35,238-Speed 9403.13 samples/sec Loss 0.7975 LearningRate 0.0000 Epoch: 36 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:02:01,345-Speed 9414.12 samples/sec Loss 0.7989 LearningRate 0.0000 Epoch: 36 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:02:27,510-Speed 9393.27 samples/sec Loss 0.7999 LearningRate 0.0000 Epoch: 36 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:02:53,676-Speed 9392.85 samples/sec Loss 0.7951 LearningRate 0.0000 Epoch: 36 
Global Step: 63230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:03:19,866-Speed 9383.93 samples/sec Loss 0.8005 LearningRate 0.0000 Epoch: 36 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:03:46,079-Speed 9375.86 samples/sec Loss 0.8025 LearningRate 0.0000 Epoch: 36 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:04:12,236-Speed 9396.05 samples/sec Loss 0.7916 LearningRate 0.0000 Epoch: 36 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:04:38,372-Speed 9403.42 samples/sec Loss 0.8051 LearningRate 0.0000 Epoch: 36 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:05:04,520-Speed 9400.12 samples/sec Loss 0.7951 LearningRate 0.0000 Epoch: 36 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:05:30,715-Speed 9382.26 samples/sec Loss 0.8049 LearningRate 0.0000 Epoch: 36 Global Step: 63290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:05:56,891-Speed 9389.31 samples/sec Loss 0.8060 LearningRate 0.0000 Epoch: 36 Global Step: 63300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:06:23,154-Speed 9357.99 samples/sec Loss 0.8040 LearningRate 0.0000 Epoch: 36 Global Step: 63310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:06:49,298-Speed 9400.63 samples/sec Loss 0.7984 LearningRate 0.0000 Epoch: 36 Global Step: 63320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:07:15,571-Speed 9354.35 samples/sec Loss 0.7994 LearningRate 0.0000 Epoch: 36 Global Step: 63330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:07:41,869-Speed 9345.71 samples/sec Loss 0.7982 LearningRate 0.0000 Epoch: 36 Global Step: 63340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:08:07,992-Speed 9407.87 samples/sec Loss 0.8007 LearningRate 0.0000 Epoch: 36 Global Step: 63350 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-03-06 19:08:34,121-Speed 9406.08 samples/sec Loss 0.7931 LearningRate 0.0000 Epoch: 36 Global Step: 63360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:09:00,356-Speed 9368.23 samples/sec Loss 0.7942 LearningRate 0.0000 Epoch: 36 Global Step: 63370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:09:26,464-Speed 9413.52 samples/sec Loss 0.7932 LearningRate 0.0000 Epoch: 36 Global Step: 63380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:09:52,550-Speed 9421.50 samples/sec Loss 0.7963 LearningRate 0.0000 Epoch: 36 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:10:18,714-Speed 9393.64 samples/sec Loss 0.8026 LearningRate 0.0000 Epoch: 36 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:10:44,902-Speed 9384.54 samples/sec Loss 0.7961 LearningRate 0.0000 Epoch: 36 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:11:11,173-Speed 9355.30 samples/sec Loss 0.8057 LearningRate 0.0000 Epoch: 36 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:11:37,335-Speed 9394.11 samples/sec Loss 0.7879 LearningRate 0.0000 Epoch: 36 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:12:03,461-Speed 9407.20 samples/sec Loss 0.7944 LearningRate 0.0000 Epoch: 36 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:12:29,543-Speed 9423.08 samples/sec Loss 0.7925 LearningRate 0.0000 Epoch: 36 Global Step: 63450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:12:55,659-Speed 9410.64 samples/sec Loss 0.7976 LearningRate 0.0000 Epoch: 36 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:13:21,762-Speed 9415.39 samples/sec Loss 0.7982 LearningRate 0.0000 Epoch: 36 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:13:47,890-Speed 9406.38 
samples/sec Loss 0.8038 LearningRate 0.0000 Epoch: 36 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:14:13,995-Speed 9414.54 samples/sec Loss 0.7948 LearningRate 0.0000 Epoch: 36 Global Step: 63490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:14:40,126-Speed 9405.32 samples/sec Loss 0.7945 LearningRate 0.0000 Epoch: 36 Global Step: 63500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:15:06,316-Speed 9384.10 samples/sec Loss 0.7991 LearningRate 0.0000 Epoch: 36 Global Step: 63510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:15:32,489-Speed 9390.17 samples/sec Loss 0.8040 LearningRate 0.0000 Epoch: 36 Global Step: 63520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:15:58,571-Speed 9423.06 samples/sec Loss 0.7956 LearningRate 0.0000 Epoch: 36 Global Step: 63530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:16:24,774-Speed 9379.38 samples/sec Loss 0.7969 LearningRate 0.0000 Epoch: 36 Global Step: 63540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:16:50,903-Speed 9405.97 samples/sec Loss 0.7938 LearningRate 0.0000 Epoch: 36 Global Step: 63550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:17:17,021-Speed 9409.92 samples/sec Loss 0.8007 LearningRate 0.0000 Epoch: 36 Global Step: 63560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:17:43,175-Speed 9396.92 samples/sec Loss 0.8048 LearningRate 0.0000 Epoch: 36 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:18:09,278-Speed 9415.64 samples/sec Loss 0.7972 LearningRate 0.0000 Epoch: 36 Global Step: 63580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:18:35,422-Speed 9400.76 samples/sec Loss 0.7959 LearningRate 0.0000 Epoch: 36 Global Step: 63590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:19:01,571-Speed 9398.86 samples/sec Loss 0.7954 LearningRate 0.0000 Epoch: 36 
Global Step: 63600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:19:27,736-Speed 9392.92 samples/sec Loss 0.7985 LearningRate 0.0000 Epoch: 36 Global Step: 63610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:19:53,885-Speed 9398.88 samples/sec Loss 0.7915 LearningRate 0.0000 Epoch: 36 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:20:20,074-Speed 9384.39 samples/sec Loss 0.7923 LearningRate 0.0000 Epoch: 36 Global Step: 63630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:20:46,219-Speed 9400.51 samples/sec Loss 0.8016 LearningRate 0.0000 Epoch: 36 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:21:12,413-Speed 9382.66 samples/sec Loss 0.7947 LearningRate 0.0000 Epoch: 36 Global Step: 63650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:21:38,591-Speed 9388.59 samples/sec Loss 0.7991 LearningRate 0.0000 Epoch: 36 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:22:04,741-Speed 9398.50 samples/sec Loss 0.7952 LearningRate 0.0000 Epoch: 36 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:22:30,876-Speed 9403.80 samples/sec Loss 0.7982 LearningRate 0.0000 Epoch: 36 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:22:56,955-Speed 9424.39 samples/sec Loss 0.7989 LearningRate 0.0000 Epoch: 36 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:23:23,068-Speed 9411.95 samples/sec Loss 0.7982 LearningRate 0.0000 Epoch: 36 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:23:49,194-Speed 9407.08 samples/sec Loss 0.7995 LearningRate 0.0000 Epoch: 36 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:24:15,343-Speed 9399.09 samples/sec Loss 0.8023 LearningRate 0.0000 Epoch: 36 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-03-06 19:24:41,566-Speed 9372.09 samples/sec Loss 0.7974 LearningRate 0.0000 Epoch: 36 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:25:07,697-Speed 9405.72 samples/sec Loss 0.7905 LearningRate 0.0000 Epoch: 36 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:25:33,852-Speed 9396.70 samples/sec Loss 0.7973 LearningRate 0.0000 Epoch: 36 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:26:00,003-Speed 9398.35 samples/sec Loss 0.7989 LearningRate 0.0000 Epoch: 36 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:26:26,120-Speed 9410.39 samples/sec Loss 0.7970 LearningRate 0.0000 Epoch: 36 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:26:52,301-Speed 9387.31 samples/sec Loss 0.7932 LearningRate 0.0000 Epoch: 36 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:27:18,509-Speed 9377.46 samples/sec Loss 0.8012 LearningRate 0.0000 Epoch: 36 Global Step: 63790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-03-06 19:27:44,613-Speed 9415.40 samples/sec Loss 0.7991 LearningRate 0.0000 Epoch: 36 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:28:10,734-Speed 9409.01 samples/sec Loss 0.7940 LearningRate 0.0000 Epoch: 36 Global Step: 63810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:28:36,878-Speed 9400.89 samples/sec Loss 0.7934 LearningRate 0.0000 Epoch: 36 Global Step: 63820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:29:03,125-Speed 9363.69 samples/sec Loss 0.8021 LearningRate 0.0000 Epoch: 36 Global Step: 63830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:29:29,358-Speed 9368.96 samples/sec Loss 0.7962 LearningRate 0.0000 Epoch: 36 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:29:55,504-Speed 9400.08 
samples/sec Loss 0.7937 LearningRate 0.0000 Epoch: 36 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:30:21,771-Speed 9356.21 samples/sec Loss 0.7935 LearningRate 0.0000 Epoch: 36 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:30:47,934-Speed 9393.75 samples/sec Loss 0.7965 LearningRate 0.0000 Epoch: 36 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:31:14,226-Speed 9347.80 samples/sec Loss 0.8011 LearningRate 0.0000 Epoch: 36 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:31:40,437-Speed 9376.62 samples/sec Loss 0.7932 LearningRate 0.0000 Epoch: 36 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:32:06,606-Speed 9391.88 samples/sec Loss 0.7929 LearningRate 0.0000 Epoch: 36 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:32:32,784-Speed 9388.57 samples/sec Loss 0.7914 LearningRate 0.0000 Epoch: 36 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:32:58,931-Speed 9399.60 samples/sec Loss 0.7986 LearningRate 0.0000 Epoch: 36 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:33:25,144-Speed 9375.93 samples/sec Loss 0.7949 LearningRate 0.0000 Epoch: 36 Global Step: 63930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:33:51,308-Speed 9393.72 samples/sec Loss 0.7966 LearningRate 0.0000 Epoch: 36 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:35:10,834-Speed 3090.33 samples/sec Loss 0.7901 LearningRate 0.0000 Epoch: 37 Global Step: 63950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:35:36,730-Speed 9490.68 samples/sec Loss 0.7897 LearningRate 0.0000 Epoch: 37 Global Step: 63960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:36:02,716-Speed 9458.27 samples/sec Loss 0.7895 LearningRate 0.0000 Epoch: 37 
Global Step: 63970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:36:28,797-Speed 9423.20 samples/sec Loss 0.7893 LearningRate 0.0000 Epoch: 37 Global Step: 63980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:36:54,819-Speed 9444.75 samples/sec Loss 0.7865 LearningRate 0.0000 Epoch: 37 Global Step: 63990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:37:20,859-Speed 9438.21 samples/sec Loss 0.7958 LearningRate 0.0000 Epoch: 37 Global Step: 64000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:37:46,791-Speed 9477.30 samples/sec Loss 0.7939 LearningRate 0.0000 Epoch: 37 Global Step: 64010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:38:12,734-Speed 9473.32 samples/sec Loss 0.7845 LearningRate 0.0000 Epoch: 37 Global Step: 64020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:38:38,703-Speed 9464.05 samples/sec Loss 0.7928 LearningRate 0.0000 Epoch: 37 Global Step: 64030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:39:04,662-Speed 9467.77 samples/sec Loss 0.7943 LearningRate 0.0000 Epoch: 37 Global Step: 64040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:39:30,674-Speed 9448.09 samples/sec Loss 0.7884 LearningRate 0.0000 Epoch: 37 Global Step: 64050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:39:56,664-Speed 9456.85 samples/sec Loss 0.7969 LearningRate 0.0000 Epoch: 37 Global Step: 64060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:40:22,616-Speed 9470.27 samples/sec Loss 0.7948 LearningRate 0.0000 Epoch: 37 Global Step: 64070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:40:48,623-Speed 9450.19 samples/sec Loss 0.7851 LearningRate 0.0000 Epoch: 37 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:41:14,661-Speed 9438.66 samples/sec Loss 0.7899 LearningRate 0.0000 Epoch: 37 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-03-06 19:41:40,689-Speed 9442.61 samples/sec Loss 0.7970 LearningRate 0.0000 Epoch: 37 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:42:06,700-Speed 9448.69 samples/sec Loss 0.7899 LearningRate 0.0000 Epoch: 37 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:42:32,751-Speed 9434.26 samples/sec Loss 0.7911 LearningRate 0.0000 Epoch: 37 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:42:58,803-Speed 9433.92 samples/sec Loss 0.7933 LearningRate 0.0000 Epoch: 37 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:43:24,929-Speed 9407.31 samples/sec Loss 0.7913 LearningRate 0.0000 Epoch: 37 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:43:50,895-Speed 9464.87 samples/sec Loss 0.7914 LearningRate 0.0000 Epoch: 37 Global Step: 64150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:44:17,211-Speed 9339.32 samples/sec Loss 0.7927 LearningRate 0.0000 Epoch: 37 Global Step: 64160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:44:43,283-Speed 9426.85 samples/sec Loss 0.7900 LearningRate 0.0000 Epoch: 37 Global Step: 64170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:45:09,446-Speed 9393.66 samples/sec Loss 0.7878 LearningRate 0.0000 Epoch: 37 Global Step: 64180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:45:35,525-Speed 9424.10 samples/sec Loss 0.7949 LearningRate 0.0000 Epoch: 37 Global Step: 64190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:46:01,686-Speed 9394.63 samples/sec Loss 0.7945 LearningRate 0.0000 Epoch: 37 Global Step: 64200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:46:27,758-Speed 9426.73 samples/sec Loss 0.7950 LearningRate 0.0000 Epoch: 37 Global Step: 64210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:46:53,896-Speed 9402.95 
samples/sec Loss 0.7946 LearningRate 0.0000 Epoch: 37 Global Step: 64220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:47:20,127-Speed 9369.32 samples/sec Loss 0.7957 LearningRate 0.0000 Epoch: 37 Global Step: 64230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:47:46,293-Speed 9392.86 samples/sec Loss 0.7893 LearningRate 0.0000 Epoch: 37 Global Step: 64240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-06 19:48:12,463-Speed 9391.18 samples/sec Loss 0.7866 LearningRate 0.0000 Epoch: 37 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:48:38,612-Speed 9398.85 samples/sec Loss 0.7951 LearningRate 0.0000 Epoch: 37 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:49:04,746-Speed 9404.28 samples/sec Loss 0.8014 LearningRate 0.0000 Epoch: 37 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:49:30,992-Speed 9364.16 samples/sec Loss 0.7986 LearningRate 0.0000 Epoch: 37 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:49:57,155-Speed 9393.92 samples/sec Loss 0.7963 LearningRate 0.0000 Epoch: 37 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:50:23,434-Speed 9352.43 samples/sec Loss 0.7933 LearningRate 0.0000 Epoch: 37 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:50:49,655-Speed 9373.53 samples/sec Loss 0.8001 LearningRate 0.0000 Epoch: 37 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:51:15,882-Speed 9370.88 samples/sec Loss 0.7945 LearningRate 0.0000 Epoch: 37 Global Step: 64320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:51:42,062-Speed 9387.32 samples/sec Loss 0.7925 LearningRate 0.0000 Epoch: 37 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:52:08,268-Speed 9378.82 samples/sec Loss 0.7930 LearningRate 0.0000 Epoch: 37 
Global Step: 64340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:52:34,502-Speed 9368.26 samples/sec Loss 0.7943 LearningRate 0.0000 Epoch: 37 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:53:00,698-Speed 9382.43 samples/sec Loss 0.8005 LearningRate 0.0000 Epoch: 37 Global Step: 64360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-06 19:53:26,834-Speed 9403.63 samples/sec Loss 0.7954 LearningRate 0.0000 Epoch: 37 Global Step: 64370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 19:53:53,074-Speed 9366.04 samples/sec Loss 0.7937 LearningRate 0.0000 Epoch: 37 Global Step: 64380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 19:54:19,271-Speed 9381.67 samples/sec Loss 0.7953 LearningRate 0.0000 Epoch: 37 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 19:54:45,503-Speed 9369.43 samples/sec Loss 0.7846 LearningRate 0.0000 Epoch: 37 Global Step: 64400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 19:55:11,658-Speed 9396.59 samples/sec Loss 0.7876 LearningRate 0.0000 Epoch: 37 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 19:55:37,859-Speed 9380.02 samples/sec Loss 0.7907 LearningRate 0.0000 Epoch: 37 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 19:56:04,015-Speed 9396.37 samples/sec Loss 0.7914 LearningRate 0.0000 Epoch: 37 Global Step: 64430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 19:56:30,227-Speed 9376.39 samples/sec Loss 0.7934 LearningRate 0.0000 Epoch: 37 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 19:56:56,436-Speed 9377.45 samples/sec Loss 0.7961 LearningRate 0.0000 Epoch: 37 Global Step: 64450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-03-06 19:57:22,581-Speed 9400.09 samples/sec Loss 0.7901 LearningRate 0.0000 Epoch: 37 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-03-06 19:57:48,765-Speed 9386.28 samples/sec Loss 0.7942 LearningRate 0.0000 Epoch: 37 Global Step: 64470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 19:58:14,915-Speed 9398.53 samples/sec Loss 0.7909 LearningRate 0.0000 Epoch: 37 Global Step: 64480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 19:58:41,064-Speed 9399.14 samples/sec Loss 0.7945 LearningRate 0.0000 Epoch: 37 Global Step: 64490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 19:59:07,253-Speed 9384.45 samples/sec Loss 0.7811 LearningRate 0.0000 Epoch: 37 Global Step: 64500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 19:59:36,100-Speed 8519.62 samples/sec Loss 0.7937 LearningRate 0.0000 Epoch: 37 Global Step: 64510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:00:02,249-Speed 9398.99 samples/sec Loss 0.7873 LearningRate 0.0000 Epoch: 37 Global Step: 64520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:00:28,404-Speed 9396.87 samples/sec Loss 0.7917 LearningRate 0.0000 Epoch: 37 Global Step: 64530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:00:54,552-Speed 9399.32 samples/sec Loss 0.7849 LearningRate 0.0000 Epoch: 37 Global Step: 64540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:01:20,690-Speed 9402.88 samples/sec Loss 0.7931 LearningRate 0.0000 Epoch: 37 Global Step: 64550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:01:46,918-Speed 9370.37 samples/sec Loss 0.7950 LearningRate 0.0000 Epoch: 37 Global Step: 64560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:02:13,069-Speed 9398.42 samples/sec Loss 0.7913 LearningRate 0.0000 Epoch: 37 Global Step: 64570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:02:39,263-Speed 9382.53 samples/sec Loss 0.7883 LearningRate 0.0000 Epoch: 37 Global Step: 64580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:03:05,404-Speed 9401.45 
samples/sec Loss 0.7891 LearningRate 0.0000 Epoch: 37 Global Step: 64590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:03:31,632-Speed 9370.87 samples/sec Loss 0.7861 LearningRate 0.0000 Epoch: 37 Global Step: 64600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:03:57,870-Speed 9366.99 samples/sec Loss 0.7923 LearningRate 0.0000 Epoch: 37 Global Step: 64610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:04:24,071-Speed 9380.22 samples/sec Loss 0.7899 LearningRate 0.0000 Epoch: 37 Global Step: 64620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:04:50,280-Speed 9377.10 samples/sec Loss 0.7930 LearningRate 0.0000 Epoch: 37 Global Step: 64630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:05:16,412-Speed 9404.99 samples/sec Loss 0.7903 LearningRate 0.0000 Epoch: 37 Global Step: 64640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:05:42,569-Speed 9396.03 samples/sec Loss 0.7961 LearningRate 0.0000 Epoch: 37 Global Step: 64650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:06:08,759-Speed 9385.25 samples/sec Loss 0.7878 LearningRate 0.0000 Epoch: 37 Global Step: 64660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:06:34,955-Speed 9381.99 samples/sec Loss 0.7854 LearningRate 0.0000 Epoch: 37 Global Step: 64670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:07:01,128-Speed 9389.99 samples/sec Loss 0.7877 LearningRate 0.0000 Epoch: 37 Global Step: 64680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:07:27,298-Speed 9391.53 samples/sec Loss 0.7884 LearningRate 0.0000 Epoch: 37 Global Step: 64690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:07:53,470-Speed 9390.75 samples/sec Loss 0.7907 LearningRate 0.0000 Epoch: 37 Global Step: 64700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:08:19,727-Speed 9360.20 samples/sec Loss 0.7948 LearningRate 0.0000 Epoch: 37 
Global Step: 64710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:08:45,881-Speed 9397.08 samples/sec Loss 0.7951 LearningRate 0.0000 Epoch: 37 Global Step: 64720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:09:12,087-Speed 9378.39 samples/sec Loss 0.7928 LearningRate 0.0000 Epoch: 37 Global Step: 64730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:09:38,297-Speed 9376.98 samples/sec Loss 0.7939 LearningRate 0.0000 Epoch: 37 Global Step: 64740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:10:04,435-Speed 9403.54 samples/sec Loss 0.7910 LearningRate 0.0000 Epoch: 37 Global Step: 64750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:10:30,616-Speed 9387.43 samples/sec Loss 0.7938 LearningRate 0.0000 Epoch: 37 Global Step: 64760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:10:56,756-Speed 9402.00 samples/sec Loss 0.7918 LearningRate 0.0000 Epoch: 37 Global Step: 64770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:11:23,004-Speed 9363.40 samples/sec Loss 0.7850 LearningRate 0.0000 Epoch: 37 Global Step: 64780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:11:49,175-Speed 9391.01 samples/sec Loss 0.7874 LearningRate 0.0000 Epoch: 37 Global Step: 64790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:12:15,414-Speed 9366.49 samples/sec Loss 0.7834 LearningRate 0.0000 Epoch: 37 Global Step: 64800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:12:41,586-Speed 9390.38 samples/sec Loss 0.7832 LearningRate 0.0000 Epoch: 37 Global Step: 64810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-06 20:13:07,808-Speed 9372.67 samples/sec Loss 0.7918 LearningRate 0.0000 Epoch: 37 Global Step: 64820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:13:34,062-Speed 9361.24 samples/sec Loss 0.7857 LearningRate 0.0000 Epoch: 37 Global Step: 64830 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-03-06 20:14:00,327-Speed 9357.40 samples/sec Loss 0.7884 LearningRate 0.0000 Epoch: 37 Global Step: 64840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:14:26,536-Speed 9377.57 samples/sec Loss 0.7859 LearningRate 0.0000 Epoch: 37 Global Step: 64850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:14:52,750-Speed 9375.67 samples/sec Loss 0.7930 LearningRate 0.0000 Epoch: 37 Global Step: 64860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:15:18,919-Speed 9391.63 samples/sec Loss 0.7882 LearningRate 0.0000 Epoch: 37 Global Step: 64870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:15:45,147-Speed 9371.09 samples/sec Loss 0.7845 LearningRate 0.0000 Epoch: 37 Global Step: 64880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:16:11,342-Speed 9382.41 samples/sec Loss 0.7920 LearningRate 0.0000 Epoch: 37 Global Step: 64890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:16:37,481-Speed 9402.33 samples/sec Loss 0.7860 LearningRate 0.0000 Epoch: 37 Global Step: 64900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:17:03,682-Speed 9380.15 samples/sec Loss 0.7871 LearningRate 0.0000 Epoch: 37 Global Step: 64910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:17:29,777-Speed 9418.28 samples/sec Loss 0.7871 LearningRate 0.0000 Epoch: 37 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:17:55,937-Speed 9394.97 samples/sec Loss 0.7862 LearningRate 0.0000 Epoch: 37 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:18:22,062-Speed 9407.46 samples/sec Loss 0.7888 LearningRate 0.0000 Epoch: 37 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:18:48,259-Speed 9381.69 samples/sec Loss 0.7868 LearningRate 0.0000 Epoch: 37 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:19:14,304-Speed 9436.47 
samples/sec Loss 0.7899 LearningRate 0.0000 Epoch: 37 Global Step: 64960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:19:40,386-Speed 9422.91 samples/sec Loss 0.7842 LearningRate 0.0000 Epoch: 37 Global Step: 64970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:20:06,555-Speed 9391.36 samples/sec Loss 0.7862 LearningRate 0.0000 Epoch: 37 Global Step: 64980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:20:32,745-Speed 9384.38 samples/sec Loss 0.7869 LearningRate 0.0000 Epoch: 37 Global Step: 64990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:20:58,919-Speed 9389.69 samples/sec Loss 0.7846 LearningRate 0.0000 Epoch: 37 Global Step: 65000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:21:25,024-Speed 9414.54 samples/sec Loss 0.7942 LearningRate 0.0000 Epoch: 37 Global Step: 65010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:21:51,142-Speed 9410.03 samples/sec Loss 0.7876 LearningRate 0.0000 Epoch: 37 Global Step: 65020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:22:17,324-Speed 9387.23 samples/sec Loss 0.7932 LearningRate 0.0000 Epoch: 37 Global Step: 65030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:22:43,435-Speed 9412.45 samples/sec Loss 0.7919 LearningRate 0.0000 Epoch: 37 Global Step: 65040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:23:09,526-Speed 9419.84 samples/sec Loss 0.7891 LearningRate 0.0000 Epoch: 37 Global Step: 65050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:23:35,654-Speed 9406.06 samples/sec Loss 0.7865 LearningRate 0.0000 Epoch: 37 Global Step: 65060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:24:01,878-Speed 9372.03 samples/sec Loss 0.7886 LearningRate 0.0000 Epoch: 37 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:24:28,023-Speed 9400.34 samples/sec Loss 0.7865 LearningRate 0.0000 Epoch: 37 
Global Step: 65080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:24:54,132-Speed 9413.20 samples/sec Loss 0.7887 LearningRate 0.0000 Epoch: 37 Global Step: 65090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:25:20,335-Speed 9379.86 samples/sec Loss 0.7822 LearningRate 0.0000 Epoch: 37 Global Step: 65100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:25:46,438-Speed 9415.54 samples/sec Loss 0.7830 LearningRate 0.0000 Epoch: 37 Global Step: 65110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:26:12,542-Speed 9415.13 samples/sec Loss 0.7880 LearningRate 0.0000 Epoch: 37 Global Step: 65120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:26:38,782-Speed 9366.75 samples/sec Loss 0.7840 LearningRate 0.0000 Epoch: 37 Global Step: 65130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:27:05,027-Speed 9364.45 samples/sec Loss 0.7866 LearningRate 0.0000 Epoch: 37 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:27:31,136-Speed 9413.44 samples/sec Loss 0.7837 LearningRate 0.0000 Epoch: 37 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:27:57,285-Speed 9398.83 samples/sec Loss 0.7908 LearningRate 0.0000 Epoch: 37 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:28:23,432-Speed 9400.04 samples/sec Loss 0.7859 LearningRate 0.0000 Epoch: 37 Global Step: 65170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-03-06 20:28:49,530-Speed 9417.08 samples/sec Loss 0.7737 LearningRate 0.0000 Epoch: 37 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:29:15,705-Speed 9389.36 samples/sec Loss 0.7897 LearningRate 0.0000 Epoch: 37 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:29:41,921-Speed 9374.95 samples/sec Loss 0.7862 LearningRate 0.0000 Epoch: 37 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-03-06 20:30:08,098-Speed 9388.45 samples/sec Loss 0.7951 LearningRate 0.0000 Epoch: 37 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:30:34,222-Speed 9409.01 samples/sec Loss 0.7845 LearningRate 0.0000 Epoch: 37 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:31:00,368-Speed 9399.78 samples/sec Loss 0.7864 LearningRate 0.0000 Epoch: 37 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:31:26,475-Speed 9413.95 samples/sec Loss 0.7855 LearningRate 0.0000 Epoch: 37 Global Step: 65240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:31:52,616-Speed 9401.61 samples/sec Loss 0.7827 LearningRate 0.0000 Epoch: 37 Global Step: 65250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:32:18,850-Speed 9368.52 samples/sec Loss 0.7797 LearningRate 0.0000 Epoch: 37 Global Step: 65260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:32:45,051-Speed 9380.36 samples/sec Loss 0.7882 LearningRate 0.0000 Epoch: 37 Global Step: 65270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:33:11,221-Speed 9391.11 samples/sec Loss 0.7885 LearningRate 0.0000 Epoch: 37 Global Step: 65280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:33:37,437-Speed 9374.92 samples/sec Loss 0.7834 LearningRate 0.0000 Epoch: 37 Global Step: 65290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:34:03,690-Speed 9361.77 samples/sec Loss 0.7892 LearningRate 0.0000 Epoch: 37 Global Step: 65300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:34:29,830-Speed 9401.99 samples/sec Loss 0.7952 LearningRate 0.0000 Epoch: 37 Global Step: 65310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:34:56,036-Speed 9378.42 samples/sec Loss 0.7786 LearningRate 0.0000 Epoch: 37 Global Step: 65320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:35:22,150-Speed 9411.26 
samples/sec Loss 0.7866 LearningRate 0.0000 Epoch: 37 Global Step: 65330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:35:48,301-Speed 9398.30 samples/sec Loss 0.7826 LearningRate 0.0000 Epoch: 37 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:36:14,440-Speed 9402.21 samples/sec Loss 0.7865 LearningRate 0.0000 Epoch: 37 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:36:40,679-Speed 9366.87 samples/sec Loss 0.7823 LearningRate 0.0000 Epoch: 37 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:37:06,835-Speed 9396.14 samples/sec Loss 0.7805 LearningRate 0.0000 Epoch: 37 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:37:32,952-Speed 9410.08 samples/sec Loss 0.7893 LearningRate 0.0000 Epoch: 37 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:37:59,199-Speed 9363.91 samples/sec Loss 0.7856 LearningRate 0.0000 Epoch: 37 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:38:25,440-Speed 9365.84 samples/sec Loss 0.7832 LearningRate 0.0000 Epoch: 37 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:38:55,935-Speed 8059.41 samples/sec Loss 0.7857 LearningRate 0.0000 Epoch: 37 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:39:22,067-Speed 9405.78 samples/sec Loss 0.7890 LearningRate 0.0000 Epoch: 37 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:39:48,235-Speed 9391.63 samples/sec Loss 0.7810 LearningRate 0.0000 Epoch: 37 Global Step: 65430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:40:14,419-Speed 9386.47 samples/sec Loss 0.7862 LearningRate 0.0000 Epoch: 37 Global Step: 65440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:40:40,633-Speed 9375.49 samples/sec Loss 0.7879 LearningRate 0.0000 Epoch: 37 
Global Step: 65450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:41:06,786-Speed 9397.55 samples/sec Loss 0.7844 LearningRate 0.0000 Epoch: 37 Global Step: 65460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:41:32,895-Speed 9413.30 samples/sec Loss 0.7964 LearningRate 0.0000 Epoch: 37 Global Step: 65470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:41:59,103-Speed 9377.66 samples/sec Loss 0.7942 LearningRate 0.0000 Epoch: 37 Global Step: 65480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:42:25,329-Speed 9371.17 samples/sec Loss 0.7852 LearningRate 0.0000 Epoch: 37 Global Step: 65490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:42:51,407-Speed 9424.54 samples/sec Loss 0.7809 LearningRate 0.0000 Epoch: 37 Global Step: 65500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:43:17,442-Speed 9440.28 samples/sec Loss 0.7938 LearningRate 0.0000 Epoch: 37 Global Step: 65510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:43:43,462-Speed 9445.44 samples/sec Loss 0.7881 LearningRate 0.0000 Epoch: 37 Global Step: 65520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:44:09,535-Speed 9425.93 samples/sec Loss 0.7821 LearningRate 0.0000 Epoch: 37 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:44:35,710-Speed 9389.67 samples/sec Loss 0.7845 LearningRate 0.0000 Epoch: 37 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:45:01,837-Speed 9406.73 samples/sec Loss 0.7877 LearningRate 0.0000 Epoch: 37 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:45:27,933-Speed 9417.93 samples/sec Loss 0.7854 LearningRate 0.0000 Epoch: 37 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:45:53,979-Speed 9435.94 samples/sec Loss 0.7850 LearningRate 0.0000 Epoch: 37 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-03-06 20:46:20,135-Speed 9396.56 samples/sec Loss 0.7860 LearningRate 0.0000 Epoch: 37 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:46:46,171-Speed 9439.41 samples/sec Loss 0.7775 LearningRate 0.0000 Epoch: 37 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:47:16,932-Speed 7989.67 samples/sec Loss 0.7869 LearningRate 0.0000 Epoch: 37 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:47:43,063-Speed 9405.44 samples/sec Loss 0.7890 LearningRate 0.0000 Epoch: 37 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:48:09,163-Speed 9416.29 samples/sec Loss 0.7840 LearningRate 0.0000 Epoch: 37 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:48:35,186-Speed 9444.53 samples/sec Loss 0.7877 LearningRate 0.0000 Epoch: 37 Global Step: 65630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:49:01,282-Speed 9417.96 samples/sec Loss 0.7786 LearningRate 0.0000 Epoch: 37 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:49:27,365-Speed 9422.73 samples/sec Loss 0.7885 LearningRate 0.0000 Epoch: 37 Global Step: 65650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:49:53,439-Speed 9426.11 samples/sec Loss 0.7896 LearningRate 0.0000 Epoch: 37 Global Step: 65660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:50:19,571-Speed 9404.75 samples/sec Loss 0.7853 LearningRate 0.0000 Epoch: 37 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:51:39,482-Speed 3075.46 samples/sec Loss 0.7889 LearningRate 0.0000 Epoch: 38 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:52:05,431-Speed 9471.24 samples/sec Loss 0.7885 LearningRate 0.0000 Epoch: 38 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:52:31,448-Speed 9446.73 
samples/sec Loss 0.7865 LearningRate 0.0000 Epoch: 38 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:52:57,590-Speed 9401.58 samples/sec Loss 0.7827 LearningRate 0.0000 Epoch: 38 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-06 20:53:23,719-Speed 9406.13 samples/sec Loss 0.7839 LearningRate 0.0000 Epoch: 38 Global Step: 65720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-06 20:53:49,860-Speed 9401.72 samples/sec Loss 0.7798 LearningRate 0.0000 Epoch: 38 Global Step: 65730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:54:15,961-Speed 9416.41 samples/sec Loss 0.7855 LearningRate 0.0000 Epoch: 38 Global Step: 65740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:54:41,996-Speed 9439.90 samples/sec Loss 0.7808 LearningRate 0.0000 Epoch: 38 Global Step: 65750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:55:08,110-Speed 9411.57 samples/sec Loss 0.7840 LearningRate 0.0000 Epoch: 38 Global Step: 65760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:55:34,202-Speed 9419.20 samples/sec Loss 0.7834 LearningRate 0.0000 Epoch: 38 Global Step: 65770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:56:00,311-Speed 9413.41 samples/sec Loss 0.7825 LearningRate 0.0000 Epoch: 38 Global Step: 65780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:56:26,408-Speed 9417.47 samples/sec Loss 0.7851 LearningRate 0.0000 Epoch: 38 Global Step: 65790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:56:52,548-Speed 9402.28 samples/sec Loss 0.7783 LearningRate 0.0000 Epoch: 38 Global Step: 65800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:57:18,655-Speed 9413.94 samples/sec Loss 0.7790 LearningRate 0.0000 Epoch: 38 Global Step: 65810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:57:44,801-Speed 9400.04 samples/sec Loss 0.7845 LearningRate 0.0000 Epoch: 38 
Global Step: 65820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 20:58:10,863-Speed 9430.09 samples/sec Loss 0.7816 LearningRate 0.0000 Epoch: 38 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 20:58:36,968-Speed 9414.74 samples/sec Loss 0.7824 LearningRate 0.0000 Epoch: 38 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 20:59:03,132-Speed 9393.19 samples/sec Loss 0.7853 LearningRate 0.0000 Epoch: 38 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 20:59:29,191-Speed 9432.48 samples/sec Loss 0.7845 LearningRate 0.0000 Epoch: 38 Global Step: 65860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 20:59:55,244-Speed 9433.59 samples/sec Loss 0.7760 LearningRate 0.0000 Epoch: 38 Global Step: 65870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:00:21,348-Speed 9414.86 samples/sec Loss 0.7870 LearningRate 0.0000 Epoch: 38 Global Step: 65880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:00:47,502-Speed 9397.24 samples/sec Loss 0.7784 LearningRate 0.0000 Epoch: 38 Global Step: 65890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:01:13,602-Speed 9416.54 samples/sec Loss 0.7849 LearningRate 0.0000 Epoch: 38 Global Step: 65900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:01:39,746-Speed 9400.61 samples/sec Loss 0.7849 LearningRate 0.0000 Epoch: 38 Global Step: 65910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:02:05,978-Speed 9369.07 samples/sec Loss 0.7831 LearningRate 0.0000 Epoch: 38 Global Step: 65920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:02:32,087-Speed 9413.31 samples/sec Loss 0.7832 LearningRate 0.0000 Epoch: 38 Global Step: 65930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:02:58,235-Speed 9399.02 samples/sec Loss 0.7830 LearningRate 0.0000 Epoch: 38 Global Step: 65940 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-03-06 21:03:24,418-Speed 9386.55 samples/sec Loss 0.7907 LearningRate 0.0000 Epoch: 38 Global Step: 65950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:03:50,494-Speed 9426.27 samples/sec Loss 0.7768 LearningRate 0.0000 Epoch: 38 Global Step: 65960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:04:16,681-Speed 9386.27 samples/sec Loss 0.7791 LearningRate 0.0000 Epoch: 38 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:04:42,901-Speed 9373.11 samples/sec Loss 0.7836 LearningRate 0.0000 Epoch: 38 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:05:09,001-Speed 9416.39 samples/sec Loss 0.7829 LearningRate 0.0000 Epoch: 38 Global Step: 65990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:05:35,104-Speed 9415.23 samples/sec Loss 0.7852 LearningRate 0.0000 Epoch: 38 Global Step: 66000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:06:01,237-Speed 9405.87 samples/sec Loss 0.7872 LearningRate 0.0000 Epoch: 38 Global Step: 66010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:06:27,440-Speed 9379.51 samples/sec Loss 0.7762 LearningRate 0.0000 Epoch: 38 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:06:53,569-Speed 9406.22 samples/sec Loss 0.7913 LearningRate 0.0000 Epoch: 38 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:07:19,700-Speed 9405.11 samples/sec Loss 0.7887 LearningRate 0.0000 Epoch: 38 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:07:45,832-Speed 9405.11 samples/sec Loss 0.7865 LearningRate 0.0000 Epoch: 38 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:08:11,927-Speed 9418.49 samples/sec Loss 0.7851 LearningRate 0.0000 Epoch: 38 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-03-06 21:08:38,023-Speed 9417.72 
samples/sec Loss 0.7838 LearningRate 0.0000 Epoch: 38 Global Step: 66070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:09:04,099-Speed 9425.28 samples/sec Loss 0.7866 LearningRate 0.0000 Epoch: 38 Global Step: 66080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:09:30,237-Speed 9402.56 samples/sec Loss 0.7825 LearningRate 0.0000 Epoch: 38 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:09:56,310-Speed 9426.55 samples/sec Loss 0.7845 LearningRate 0.0000 Epoch: 38 Global Step: 66100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:10:22,488-Speed 9388.29 samples/sec Loss 0.7869 LearningRate 0.0000 Epoch: 38 Global Step: 66110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:10:48,619-Speed 9405.28 samples/sec Loss 0.7876 LearningRate 0.0000 Epoch: 38 Global Step: 66120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:11:14,816-Speed 9381.79 samples/sec Loss 0.7894 LearningRate 0.0000 Epoch: 38 Global Step: 66130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:11:40,934-Speed 9409.85 samples/sec Loss 0.7866 LearningRate 0.0000 Epoch: 38 Global Step: 66140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:12:07,172-Speed 9367.03 samples/sec Loss 0.7811 LearningRate 0.0000 Epoch: 38 Global Step: 66150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:12:33,330-Speed 9395.62 samples/sec Loss 0.7864 LearningRate 0.0000 Epoch: 38 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:12:59,487-Speed 9395.80 samples/sec Loss 0.7838 LearningRate 0.0000 Epoch: 38 Global Step: 66170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-03-06 21:13:25,572-Speed 9421.94 samples/sec Loss 0.7890 LearningRate 0.0000 Epoch: 38 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:13:51,715-Speed 9401.15 samples/sec Loss 0.7822 LearningRate 0.0000 Epoch: 38 
Global Step: 66190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:14:17,879-Speed 9393.66 samples/sec Loss 0.7825 LearningRate 0.0000 Epoch: 38 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:14:43,977-Speed 9416.94 samples/sec Loss 0.7815 LearningRate 0.0000 Epoch: 38 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:15:10,148-Speed 9390.95 samples/sec Loss 0.7775 LearningRate 0.0000 Epoch: 38 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:15:36,217-Speed 9427.56 samples/sec Loss 0.7803 LearningRate 0.0000 Epoch: 38 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:16:02,297-Speed 9423.81 samples/sec Loss 0.7837 LearningRate 0.0000 Epoch: 38 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:16:28,526-Speed 9370.35 samples/sec Loss 0.7893 LearningRate 0.0000 Epoch: 38 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:16:54,570-Speed 9436.42 samples/sec Loss 0.7781 LearningRate 0.0000 Epoch: 38 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:17:20,660-Speed 9420.44 samples/sec Loss 0.7821 LearningRate 0.0000 Epoch: 38 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:17:46,764-Speed 9414.93 samples/sec Loss 0.7848 LearningRate 0.0000 Epoch: 38 Global Step: 66280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-03-06 21:18:12,826-Speed 9430.34 samples/sec Loss 0.7869 LearningRate 0.0000 Epoch: 38 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:18:38,956-Speed 9405.59 samples/sec Loss 0.7790 LearningRate 0.0000 Epoch: 38 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:19:05,090-Speed 9404.25 samples/sec Loss 0.7831 LearningRate 0.0000 Epoch: 38 Global Step: 66310 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-03-06 21:19:31,172-Speed 9423.17 samples/sec Loss 0.7807 LearningRate 0.0000 Epoch: 38 Global Step: 66320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:19:57,351-Speed 9387.88 samples/sec Loss 0.7860 LearningRate 0.0000 Epoch: 38 Global Step: 66330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:20:23,512-Speed 9394.46 samples/sec Loss 0.7803 LearningRate 0.0000 Epoch: 38 Global Step: 66340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:20:49,697-Speed 9386.22 samples/sec Loss 0.7837 LearningRate 0.0000 Epoch: 38 Global Step: 66350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:21:15,820-Speed 9408.16 samples/sec Loss 0.7815 LearningRate 0.0000 Epoch: 38 Global Step: 66360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:21:41,943-Speed 9408.43 samples/sec Loss 0.7784 LearningRate 0.0000 Epoch: 38 Global Step: 66370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:22:08,155-Speed 9376.38 samples/sec Loss 0.7897 LearningRate 0.0000 Epoch: 38 Global Step: 66380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:22:34,280-Speed 9407.36 samples/sec Loss 0.7881 LearningRate 0.0000 Epoch: 38 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-03-06 21:23:00,449-Speed 9391.75 samples/sec Loss 0.7779 LearningRate 0.0000 Epoch: 38 Global Step: 66400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:23:26,599-Speed 9398.52 samples/sec Loss 0.7707 LearningRate 0.0000 Epoch: 38 Global Step: 66410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:23:52,750-Speed 9398.26 samples/sec Loss 0.7883 LearningRate 0.0000 Epoch: 38 Global Step: 66420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:24:18,869-Speed 9409.33 samples/sec Loss 0.7839 LearningRate 0.0000 Epoch: 38 Global Step: 66430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:24:45,052-Speed 9386.72 
samples/sec Loss 0.7832 LearningRate 0.0000 Epoch: 38 Global Step: 66440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:25:11,223-Speed 9391.09 samples/sec Loss 0.7798 LearningRate 0.0000 Epoch: 38 Global Step: 66450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:25:37,360-Speed 9403.57 samples/sec Loss 0.7826 LearningRate 0.0000 Epoch: 38 Global Step: 66460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:26:03,473-Speed 9411.68 samples/sec Loss 0.7834 LearningRate 0.0000 Epoch: 38 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:26:29,583-Speed 9412.87 samples/sec Loss 0.7842 LearningRate 0.0000 Epoch: 38 Global Step: 66480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:26:55,728-Speed 9400.58 samples/sec Loss 0.7805 LearningRate 0.0000 Epoch: 38 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:27:21,843-Speed 9411.34 samples/sec Loss 0.7829 LearningRate 0.0000 Epoch: 38 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-03-06 21:27:47,913-Speed 9427.26 samples/sec Loss 0.7811 LearningRate 0.0000 Epoch: 38 Global Step: 66510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-03-06 21:28:14,048-Speed 9403.78 samples/sec Loss 0.7794 LearningRate 0.0000 Epoch: 38 Global Step: 66520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-03-06 21:28:40,191-Speed 9401.16 samples/sec Loss 0.7803 LearningRate 0.0000 Epoch: 38 Global Step: 66530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-03-06 21:29:06,246-Speed 9432.73 samples/sec Loss 0.7758 LearningRate 0.0000 Epoch: 38 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:29:32,341-Speed 9418.28 samples/sec Loss 0.7777 LearningRate 0.0000 Epoch: 38 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:29:58,442-Speed 9416.13 samples/sec Loss 0.7870 LearningRate 0.0000 Epoch: 
38 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:30:24,544-Speed 9415.83 samples/sec Loss 0.7860 LearningRate 0.0000 Epoch: 38 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:30:50,630-Speed 9421.53 samples/sec Loss 0.7737 LearningRate 0.0000 Epoch: 38 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:31:16,718-Speed 9420.80 samples/sec Loss 0.7820 LearningRate 0.0000 Epoch: 38 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:31:42,950-Speed 9369.19 samples/sec Loss 0.7768 LearningRate 0.0000 Epoch: 38 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:32:09,022-Speed 9426.75 samples/sec Loss 0.7733 LearningRate 0.0000 Epoch: 38 Global Step: 66610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:32:35,090-Speed 9428.04 samples/sec Loss 0.7824 LearningRate 0.0000 Epoch: 38 Global Step: 66620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:33:01,270-Speed 9387.68 samples/sec Loss 0.7806 LearningRate 0.0000 Epoch: 38 Global Step: 66630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:33:27,327-Speed 9431.95 samples/sec Loss 0.7764 LearningRate 0.0000 Epoch: 38 Global Step: 66640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:33:53,386-Speed 9431.45 samples/sec Loss 0.7800 LearningRate 0.0000 Epoch: 38 Global Step: 66650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:34:19,499-Speed 9411.71 samples/sec Loss 0.7798 LearningRate 0.0000 Epoch: 38 Global Step: 66660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:34:45,632-Speed 9404.56 samples/sec Loss 0.7847 LearningRate 0.0000 Epoch: 38 Global Step: 66670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:35:11,827-Speed 9382.23 samples/sec Loss 0.7789 LearningRate 0.0000 Epoch: 38 Global Step: 66680 Fp16 Grad Scale: 32768 Required: 
2 hours Training: 2022-03-06 21:35:37,928-Speed 9417.27 samples/sec Loss 0.7847 LearningRate 0.0000 Epoch: 38 Global Step: 66690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:36:04,000-Speed 9426.69 samples/sec Loss 0.7776 LearningRate 0.0000 Epoch: 38 Global Step: 66700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:36:30,127-Speed 9407.04 samples/sec Loss 0.7778 LearningRate 0.0000 Epoch: 38 Global Step: 66710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:36:56,220-Speed 9419.14 samples/sec Loss 0.7812 LearningRate 0.0000 Epoch: 38 Global Step: 66720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:37:22,348-Speed 9406.25 samples/sec Loss 0.7813 LearningRate 0.0000 Epoch: 38 Global Step: 66730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:37:48,400-Speed 9434.34 samples/sec Loss 0.7834 LearningRate 0.0000 Epoch: 38 Global Step: 66740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:38:14,476-Speed 9424.93 samples/sec Loss 0.7789 LearningRate 0.0000 Epoch: 38 Global Step: 66750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:38:40,481-Speed 9450.96 samples/sec Loss 0.7734 LearningRate 0.0000 Epoch: 38 Global Step: 66760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:39:06,565-Speed 9422.51 samples/sec Loss 0.7821 LearningRate 0.0000 Epoch: 38 Global Step: 66770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:39:32,704-Speed 9402.39 samples/sec Loss 0.7813 LearningRate 0.0000 Epoch: 38 Global Step: 66780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:40:01,331-Speed 8585.12 samples/sec Loss 0.7786 LearningRate 0.0000 Epoch: 38 Global Step: 66790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:40:27,455-Speed 9407.88 samples/sec Loss 0.7828 LearningRate 0.0000 Epoch: 38 Global Step: 66800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:40:53,498-Speed 9437.34 
samples/sec Loss 0.7788 LearningRate 0.0000 Epoch: 38 Global Step: 66810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:41:19,561-Speed 9429.97 samples/sec Loss 0.7775 LearningRate 0.0000 Epoch: 38 Global Step: 66820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:41:45,627-Speed 9428.71 samples/sec Loss 0.7794 LearningRate 0.0000 Epoch: 38 Global Step: 66830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:42:11,765-Speed 9402.94 samples/sec Loss 0.7753 LearningRate 0.0000 Epoch: 38 Global Step: 66840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:42:37,854-Speed 9420.19 samples/sec Loss 0.7776 LearningRate 0.0000 Epoch: 38 Global Step: 66850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:43:03,941-Speed 9421.31 samples/sec Loss 0.7820 LearningRate 0.0000 Epoch: 38 Global Step: 66860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:43:29,949-Speed 9450.08 samples/sec Loss 0.7814 LearningRate 0.0000 Epoch: 38 Global Step: 66870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:43:55,955-Speed 9450.20 samples/sec Loss 0.7849 LearningRate 0.0000 Epoch: 38 Global Step: 66880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:44:22,126-Speed 9391.18 samples/sec Loss 0.7794 LearningRate 0.0000 Epoch: 38 Global Step: 66890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:44:48,196-Speed 9427.48 samples/sec Loss 0.7829 LearningRate 0.0000 Epoch: 38 Global Step: 66900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:45:14,348-Speed 9397.60 samples/sec Loss 0.7785 LearningRate 0.0000 Epoch: 38 Global Step: 66910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:45:40,379-Speed 9441.54 samples/sec Loss 0.7776 LearningRate 0.0000 Epoch: 38 Global Step: 66920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:46:06,626-Speed 9363.86 samples/sec Loss 0.7793 LearningRate 0.0000 Epoch: 38 
Global Step: 66930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:46:32,797-Speed 9390.90 samples/sec Loss 0.7725 LearningRate 0.0000 Epoch: 38 Global Step: 66940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:46:58,959-Speed 9393.95 samples/sec Loss 0.7787 LearningRate 0.0000 Epoch: 38 Global Step: 66950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:47:25,029-Speed 9427.56 samples/sec Loss 0.7762 LearningRate 0.0000 Epoch: 38 Global Step: 66960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-06 21:47:51,201-Speed 9390.49 samples/sec Loss 0.7840 LearningRate 0.0000 Epoch: 38 Global Step: 66970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:48:17,294-Speed 9419.09 samples/sec Loss 0.7798 LearningRate 0.0000 Epoch: 38 Global Step: 66980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:48:43,436-Speed 9401.34 samples/sec Loss 0.7802 LearningRate 0.0000 Epoch: 38 Global Step: 66990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:49:09,593-Speed 9396.15 samples/sec Loss 0.7864 LearningRate 0.0000 Epoch: 38 Global Step: 67000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:49:35,689-Speed 9417.75 samples/sec Loss 0.7841 LearningRate 0.0000 Epoch: 38 Global Step: 67010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:50:01,761-Speed 9426.75 samples/sec Loss 0.7893 LearningRate 0.0000 Epoch: 38 Global Step: 67020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:50:27,835-Speed 9426.07 samples/sec Loss 0.7781 LearningRate 0.0000 Epoch: 38 Global Step: 67030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:50:53,931-Speed 9417.57 samples/sec Loss 0.7838 LearningRate 0.0000 Epoch: 38 Global Step: 67040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:51:20,029-Speed 9417.45 samples/sec Loss 0.7818 LearningRate 0.0000 Epoch: 38 Global Step: 67050 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-03-06 21:51:46,111-Speed 9422.87 samples/sec Loss 0.7813 LearningRate 0.0000 Epoch: 38 Global Step: 67060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-06 21:52:12,206-Speed 9418.69 samples/sec Loss 0.7765 LearningRate 0.0000 Epoch: 38 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:52:38,336-Speed 9405.86 samples/sec Loss 0.7841 LearningRate 0.0000 Epoch: 38 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-06 21:53:04,472-Speed 9403.54 samples/sec Loss 0.7743 LearningRate 0.0000 Epoch: 38 Global Step: 67090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:53:30,613-Speed 9401.78 samples/sec Loss 0.7800 LearningRate 0.0000 Epoch: 38 Global Step: 67100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:53:56,696-Speed 9422.83 samples/sec Loss 0.7791 LearningRate 0.0000 Epoch: 38 Global Step: 67110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:54:22,710-Speed 9447.80 samples/sec Loss 0.7749 LearningRate 0.0000 Epoch: 38 Global Step: 67120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:54:48,778-Speed 9428.10 samples/sec Loss 0.7774 LearningRate 0.0000 Epoch: 38 Global Step: 67130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:55:14,913-Speed 9403.96 samples/sec Loss 0.7735 LearningRate 0.0000 Epoch: 38 Global Step: 67140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:55:41,061-Speed 9399.04 samples/sec Loss 0.7779 LearningRate 0.0000 Epoch: 38 Global Step: 67150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:56:07,154-Speed 9418.89 samples/sec Loss 0.7815 LearningRate 0.0000 Epoch: 38 Global Step: 67160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:56:33,229-Speed 9425.61 samples/sec Loss 0.7769 LearningRate 0.0000 Epoch: 38 Global Step: 67170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:56:59,304-Speed 9425.47 
samples/sec Loss 0.7848 LearningRate 0.0000 Epoch: 38 Global Step: 67180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 21:57:25,372-Speed 9428.23 samples/sec Loss 0.7802 LearningRate 0.0000 Epoch: 38 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 21:57:51,450-Speed 9424.33 samples/sec Loss 0.7811 LearningRate 0.0000 Epoch: 38 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 21:58:17,604-Speed 9396.91 samples/sec Loss 0.7836 LearningRate 0.0000 Epoch: 38 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 21:58:43,733-Speed 9406.35 samples/sec Loss 0.7757 LearningRate 0.0000 Epoch: 38 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 21:59:09,896-Speed 9393.59 samples/sec Loss 0.7793 LearningRate 0.0000 Epoch: 38 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 21:59:35,879-Speed 9458.94 samples/sec Loss 0.7816 LearningRate 0.0000 Epoch: 38 Global Step: 67240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:00:01,892-Speed 9447.95 samples/sec Loss 0.7816 LearningRate 0.0000 Epoch: 38 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:00:27,931-Speed 9438.46 samples/sec Loss 0.7764 LearningRate 0.0000 Epoch: 38 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:00:53,971-Speed 9438.85 samples/sec Loss 0.7764 LearningRate 0.0000 Epoch: 38 Global Step: 67270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:01:19,998-Speed 9443.10 samples/sec Loss 0.7856 LearningRate 0.0000 Epoch: 38 Global Step: 67280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:01:46,002-Speed 9451.09 samples/sec Loss 0.7784 LearningRate 0.0000 Epoch: 38 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:02:12,059-Speed 9432.16 samples/sec Loss 0.7811 LearningRate 0.0000 Epoch: 38 
Global Step: 67300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:02:38,130-Speed 9427.10 samples/sec Loss 0.7780 LearningRate 0.0000 Epoch: 38 Global Step: 67310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:03:04,119-Speed 9456.58 samples/sec Loss 0.7749 LearningRate 0.0000 Epoch: 38 Global Step: 67320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:03:30,243-Speed 9407.96 samples/sec Loss 0.7799 LearningRate 0.0000 Epoch: 38 Global Step: 67330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:03:56,410-Speed 9392.35 samples/sec Loss 0.7844 LearningRate 0.0000 Epoch: 38 Global Step: 67340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:04:22,474-Speed 9429.66 samples/sec Loss 0.7814 LearningRate 0.0000 Epoch: 38 Global Step: 67350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:04:48,607-Speed 9404.47 samples/sec Loss 0.7828 LearningRate 0.0000 Epoch: 38 Global Step: 67360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:05:14,700-Speed 9418.91 samples/sec Loss 0.7799 LearningRate 0.0000 Epoch: 38 Global Step: 67370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:05:40,832-Speed 9404.87 samples/sec Loss 0.7804 LearningRate 0.0000 Epoch: 38 Global Step: 67380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:06:06,957-Speed 9407.49 samples/sec Loss 0.7752 LearningRate 0.0000 Epoch: 38 Global Step: 67390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:06:33,183-Speed 9371.29 samples/sec Loss 0.7840 LearningRate 0.0000 Epoch: 38 Global Step: 67400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:07:53,152-Speed 3073.25 samples/sec Loss 0.7821 LearningRate 0.0000 Epoch: 39 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:08:19,198-Speed 9435.88 samples/sec Loss 0.7812 LearningRate 0.0000 Epoch: 39 Global Step: 67420 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-03-06 22:08:45,251-Speed 9434.32 samples/sec Loss 0.7802 LearningRate 0.0000 Epoch: 39 Global Step: 67430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:09:11,257-Speed 9450.23 samples/sec Loss 0.7756 LearningRate 0.0000 Epoch: 39 Global Step: 67440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:09:37,276-Speed 9446.06 samples/sec Loss 0.7787 LearningRate 0.0000 Epoch: 39 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:10:03,392-Speed 9410.64 samples/sec Loss 0.7786 LearningRate 0.0000 Epoch: 39 Global Step: 67460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:10:29,445-Speed 9433.27 samples/sec Loss 0.7805 LearningRate 0.0000 Epoch: 39 Global Step: 67470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:10:55,650-Speed 9378.99 samples/sec Loss 0.7755 LearningRate 0.0000 Epoch: 39 Global Step: 67480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:11:21,724-Speed 9425.97 samples/sec Loss 0.7771 LearningRate 0.0000 Epoch: 39 Global Step: 67490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:11:47,767-Speed 9436.90 samples/sec Loss 0.7753 LearningRate 0.0000 Epoch: 39 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:12:13,852-Speed 9422.14 samples/sec Loss 0.7788 LearningRate 0.0000 Epoch: 39 Global Step: 67510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:12:39,995-Speed 9400.92 samples/sec Loss 0.7739 LearningRate 0.0000 Epoch: 39 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:13:06,033-Speed 9438.99 samples/sec Loss 0.7814 LearningRate 0.0000 Epoch: 39 Global Step: 67530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:13:32,149-Speed 9410.56 samples/sec Loss 0.7789 LearningRate 0.0000 Epoch: 39 Global Step: 67540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:13:58,245-Speed 9418.60 
samples/sec Loss 0.7732 LearningRate 0.0000 Epoch: 39 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:14:24,349-Speed 9414.95 samples/sec Loss 0.7766 LearningRate 0.0000 Epoch: 39 Global Step: 67560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:14:50,455-Speed 9414.28 samples/sec Loss 0.7783 LearningRate 0.0000 Epoch: 39 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:15:16,507-Speed 9433.83 samples/sec Loss 0.7768 LearningRate 0.0000 Epoch: 39 Global Step: 67580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:15:42,653-Speed 9400.00 samples/sec Loss 0.7751 LearningRate 0.0000 Epoch: 39 Global Step: 67590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:16:08,787-Speed 9404.39 samples/sec Loss 0.7759 LearningRate 0.0000 Epoch: 39 Global Step: 67600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:16:34,910-Speed 9408.18 samples/sec Loss 0.7777 LearningRate 0.0000 Epoch: 39 Global Step: 67610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:17:01,104-Speed 9382.70 samples/sec Loss 0.7779 LearningRate 0.0000 Epoch: 39 Global Step: 67620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:17:27,175-Speed 9427.23 samples/sec Loss 0.7806 LearningRate 0.0000 Epoch: 39 Global Step: 67630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:17:53,394-Speed 9374.06 samples/sec Loss 0.7800 LearningRate 0.0000 Epoch: 39 Global Step: 67640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:18:19,525-Speed 9405.04 samples/sec Loss 0.7726 LearningRate 0.0000 Epoch: 39 Global Step: 67650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:18:45,612-Speed 9421.39 samples/sec Loss 0.7737 LearningRate 0.0000 Epoch: 39 Global Step: 67660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:19:11,731-Speed 9409.38 samples/sec Loss 0.7816 LearningRate 0.0000 Epoch: 39 
Global Step: 67670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:19:37,931-Speed 9380.55 samples/sec Loss 0.7766 LearningRate 0.0000 Epoch: 39 Global Step: 67680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:20:04,045-Speed 9411.59 samples/sec Loss 0.7769 LearningRate 0.0000 Epoch: 39 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:20:30,223-Speed 9388.34 samples/sec Loss 0.7797 LearningRate 0.0000 Epoch: 39 Global Step: 67700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:20:56,388-Speed 9393.12 samples/sec Loss 0.7898 LearningRate 0.0000 Epoch: 39 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:21:22,527-Speed 9402.60 samples/sec Loss 0.7765 LearningRate 0.0000 Epoch: 39 Global Step: 67720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:21:48,740-Speed 9375.89 samples/sec Loss 0.7763 LearningRate 0.0000 Epoch: 39 Global Step: 67730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:22:14,794-Speed 9433.27 samples/sec Loss 0.7797 LearningRate 0.0000 Epoch: 39 Global Step: 67740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:22:40,869-Speed 9425.40 samples/sec Loss 0.7797 LearningRate 0.0000 Epoch: 39 Global Step: 67750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:23:06,966-Speed 9417.60 samples/sec Loss 0.7799 LearningRate 0.0000 Epoch: 39 Global Step: 67760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:23:33,039-Speed 9426.34 samples/sec Loss 0.7689 LearningRate 0.0000 Epoch: 39 Global Step: 67770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:23:59,164-Speed 9407.52 samples/sec Loss 0.7786 LearningRate 0.0000 Epoch: 39 Global Step: 67780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:24:25,332-Speed 9391.98 samples/sec Loss 0.7796 LearningRate 0.0000 Epoch: 39 Global Step: 67790 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-03-06 22:24:51,464-Speed 9404.97 samples/sec Loss 0.7832 LearningRate 0.0000 Epoch: 39 Global Step: 67800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:25:17,583-Speed 9409.88 samples/sec Loss 0.7822 LearningRate 0.0000 Epoch: 39 Global Step: 67810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:25:43,658-Speed 9425.34 samples/sec Loss 0.7790 LearningRate 0.0000 Epoch: 39 Global Step: 67820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:26:09,751-Speed 9419.38 samples/sec Loss 0.7784 LearningRate 0.0000 Epoch: 39 Global Step: 67830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:26:35,886-Speed 9403.68 samples/sec Loss 0.7819 LearningRate 0.0000 Epoch: 39 Global Step: 67840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-06 22:27:02,027-Speed 9401.68 samples/sec Loss 0.7761 LearningRate 0.0000 Epoch: 39 Global Step: 67850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:27:28,113-Speed 9421.62 samples/sec Loss 0.7767 LearningRate 0.0000 Epoch: 39 Global Step: 67860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:27:54,163-Speed 9434.78 samples/sec Loss 0.7818 LearningRate 0.0000 Epoch: 39 Global Step: 67870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:28:20,344-Speed 9387.40 samples/sec Loss 0.7796 LearningRate 0.0000 Epoch: 39 Global Step: 67880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:28:46,440-Speed 9418.05 samples/sec Loss 0.7791 LearningRate 0.0000 Epoch: 39 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:29:12,560-Speed 9409.55 samples/sec Loss 0.7783 LearningRate 0.0000 Epoch: 39 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:29:38,742-Speed 9386.77 samples/sec Loss 0.7775 LearningRate 0.0000 Epoch: 39 Global Step: 67910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:30:04,803-Speed 9430.87 
samples/sec Loss 0.7839 LearningRate 0.0000 Epoch: 39 Global Step: 67920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:30:30,938-Speed 9404.02 samples/sec Loss 0.7800 LearningRate 0.0000 Epoch: 39 Global Step: 67930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:30:57,043-Speed 9414.93 samples/sec Loss 0.7827 LearningRate 0.0000 Epoch: 39 Global Step: 67940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:31:23,132-Speed 9420.48 samples/sec Loss 0.7793 LearningRate 0.0000 Epoch: 39 Global Step: 67950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:31:49,241-Speed 9413.27 samples/sec Loss 0.7835 LearningRate 0.0000 Epoch: 39 Global Step: 67960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:32:15,324-Speed 9422.45 samples/sec Loss 0.7780 LearningRate 0.0000 Epoch: 39 Global Step: 67970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:32:41,508-Speed 9386.34 samples/sec Loss 0.7759 LearningRate 0.0000 Epoch: 39 Global Step: 67980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:33:07,602-Speed 9419.12 samples/sec Loss 0.7728 LearningRate 0.0000 Epoch: 39 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:33:33,691-Speed 9420.45 samples/sec Loss 0.7779 LearningRate 0.0000 Epoch: 39 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:33:59,824-Speed 9404.60 samples/sec Loss 0.7783 LearningRate 0.0000 Epoch: 39 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:34:25,925-Speed 9416.16 samples/sec Loss 0.7770 LearningRate 0.0000 Epoch: 39 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:34:53,373-Speed 8953.98 samples/sec Loss 0.7715 LearningRate 0.0000 Epoch: 39 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:35:19,448-Speed 9425.63 samples/sec Loss 0.7732 LearningRate 0.0000 Epoch: 
39 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:35:45,574-Speed 9406.82 samples/sec Loss 0.7850 LearningRate 0.0000 Epoch: 39 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:36:11,706-Speed 9404.87 samples/sec Loss 0.7795 LearningRate 0.0000 Epoch: 39 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:36:37,883-Speed 9389.65 samples/sec Loss 0.7744 LearningRate 0.0000 Epoch: 39 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:37:04,048-Speed 9393.18 samples/sec Loss 0.7796 LearningRate 0.0000 Epoch: 39 Global Step: 68080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:37:30,158-Speed 9413.12 samples/sec Loss 0.7820 LearningRate 0.0000 Epoch: 39 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:37:56,299-Speed 9401.71 samples/sec Loss 0.7751 LearningRate 0.0000 Epoch: 39 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:38:22,553-Speed 9361.32 samples/sec Loss 0.7774 LearningRate 0.0000 Epoch: 39 Global Step: 68110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:38:48,800-Speed 9363.63 samples/sec Loss 0.7776 LearningRate 0.0000 Epoch: 39 Global Step: 68120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:39:15,064-Speed 9357.73 samples/sec Loss 0.7767 LearningRate 0.0000 Epoch: 39 Global Step: 68130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:39:41,298-Speed 9368.40 samples/sec Loss 0.7817 LearningRate 0.0000 Epoch: 39 Global Step: 68140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:40:07,438-Speed 9402.25 samples/sec Loss 0.7826 LearningRate 0.0000 Epoch: 39 Global Step: 68150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:40:33,570-Speed 9404.75 samples/sec Loss 0.7818 LearningRate 0.0000 Epoch: 39 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 
1 hours Training: 2022-03-06 22:40:59,720-Speed 9398.34 samples/sec Loss 0.7719 LearningRate 0.0000 Epoch: 39 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:41:25,905-Speed 9385.90 samples/sec Loss 0.7721 LearningRate 0.0000 Epoch: 39 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:41:52,100-Speed 9382.97 samples/sec Loss 0.7745 LearningRate 0.0000 Epoch: 39 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:42:18,247-Speed 9399.20 samples/sec Loss 0.7795 LearningRate 0.0000 Epoch: 39 Global Step: 68200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:42:44,438-Speed 9384.04 samples/sec Loss 0.7773 LearningRate 0.0000 Epoch: 39 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:43:10,588-Speed 9398.74 samples/sec Loss 0.7753 LearningRate 0.0000 Epoch: 39 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:43:36,687-Speed 9416.78 samples/sec Loss 0.7768 LearningRate 0.0000 Epoch: 39 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:44:02,871-Speed 9386.43 samples/sec Loss 0.7705 LearningRate 0.0000 Epoch: 39 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:44:28,958-Speed 9420.97 samples/sec Loss 0.7799 LearningRate 0.0000 Epoch: 39 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:44:55,091-Speed 9405.04 samples/sec Loss 0.7744 LearningRate 0.0000 Epoch: 39 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:45:21,256-Speed 9393.24 samples/sec Loss 0.7797 LearningRate 0.0000 Epoch: 39 Global Step: 68270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:45:47,435-Speed 9387.70 samples/sec Loss 0.7779 LearningRate 0.0000 Epoch: 39 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:46:13,583-Speed 
9399.46 samples/sec Loss 0.7747 LearningRate 0.0000 Epoch: 39 Global Step: 68290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:46:39,803-Speed 9373.29 samples/sec Loss 0.7720 LearningRate 0.0000 Epoch: 39 Global Step: 68300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:47:05,928-Speed 9407.79 samples/sec Loss 0.7758 LearningRate 0.0000 Epoch: 39 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:47:31,999-Speed 9426.70 samples/sec Loss 0.7790 LearningRate 0.0000 Epoch: 39 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-03-06 22:47:58,066-Speed 9428.44 samples/sec Loss 0.7799 LearningRate 0.0000 Epoch: 39 Global Step: 68330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:48:24,260-Speed 9382.84 samples/sec Loss 0.7767 LearningRate 0.0000 Epoch: 39 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:48:50,420-Speed 9394.66 samples/sec Loss 0.7771 LearningRate 0.0000 Epoch: 39 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:49:16,481-Speed 9430.61 samples/sec Loss 0.7763 LearningRate 0.0000 Epoch: 39 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:49:42,635-Speed 9397.39 samples/sec Loss 0.7800 LearningRate 0.0000 Epoch: 39 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:50:08,798-Speed 9393.74 samples/sec Loss 0.7718 LearningRate 0.0000 Epoch: 39 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:50:34,968-Speed 9391.08 samples/sec Loss 0.7763 LearningRate 0.0000 Epoch: 39 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:51:01,096-Speed 9406.30 samples/sec Loss 0.7804 LearningRate 0.0000 Epoch: 39 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:51:29,739-Speed 8581.89 samples/sec Loss 0.7734 LearningRate 0.0000 
Epoch: 39 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:51:55,895-Speed 9396.44 samples/sec Loss 0.7779 LearningRate 0.0000 Epoch: 39 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:52:22,002-Speed 9414.20 samples/sec Loss 0.7761 LearningRate 0.0000 Epoch: 39 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-06 22:52:48,183-Speed 9387.44 samples/sec Loss 0.7789 LearningRate 0.0000 Epoch: 39 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 22:53:14,281-Speed 9417.33 samples/sec Loss 0.7794 LearningRate 0.0000 Epoch: 39 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 22:53:40,506-Speed 9371.53 samples/sec Loss 0.7740 LearningRate 0.0000 Epoch: 39 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 22:54:06,678-Speed 9390.53 samples/sec Loss 0.7745 LearningRate 0.0000 Epoch: 39 Global Step: 68470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 22:54:32,772-Speed 9418.94 samples/sec Loss 0.7724 LearningRate 0.0000 Epoch: 39 Global Step: 68480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 22:54:58,973-Speed 9380.28 samples/sec Loss 0.7744 LearningRate 0.0000 Epoch: 39 Global Step: 68490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 22:55:25,010-Speed 9439.27 samples/sec Loss 0.7840 LearningRate 0.0000 Epoch: 39 Global Step: 68500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 22:55:51,048-Speed 9439.30 samples/sec Loss 0.7712 LearningRate 0.0000 Epoch: 39 Global Step: 68510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 22:56:17,302-Speed 9361.25 samples/sec Loss 0.7696 LearningRate 0.0000 Epoch: 39 Global Step: 68520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 22:56:43,344-Speed 9437.75 samples/sec Loss 0.7732 LearningRate 0.0000 Epoch: 39 Global Step: 68530 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-03-06 22:57:09,466-Speed 9408.57 samples/sec Loss 0.7796 LearningRate 0.0000 Epoch: 39 Global Step: 68540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 22:57:35,624-Speed 9395.63 samples/sec Loss 0.7775 LearningRate 0.0000 Epoch: 39 Global Step: 68550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 22:58:01,819-Speed 9382.19 samples/sec Loss 0.7773 LearningRate 0.0000 Epoch: 39 Global Step: 68560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 22:58:27,922-Speed 9415.43 samples/sec Loss 0.7785 LearningRate 0.0000 Epoch: 39 Global Step: 68570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 22:58:54,058-Speed 9403.81 samples/sec Loss 0.7750 LearningRate 0.0000 Epoch: 39 Global Step: 68580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 22:59:20,257-Speed 9380.87 samples/sec Loss 0.7847 LearningRate 0.0000 Epoch: 39 Global Step: 68590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 22:59:46,427-Speed 9391.52 samples/sec Loss 0.7843 LearningRate 0.0000 Epoch: 39 Global Step: 68600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-06 23:00:12,568-Speed 9401.89 samples/sec Loss 0.7818 LearningRate 0.0000 Epoch: 39 Global Step: 68610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 23:00:38,798-Speed 9369.80 samples/sec Loss 0.7731 LearningRate 0.0000 Epoch: 39 Global Step: 68620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 23:01:05,044-Speed 9364.11 samples/sec Loss 0.7791 LearningRate 0.0000 Epoch: 39 Global Step: 68630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 23:01:31,229-Speed 9385.94 samples/sec Loss 0.7739 LearningRate 0.0000 Epoch: 39 Global Step: 68640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 23:01:57,352-Speed 9408.18 samples/sec Loss 0.7806 LearningRate 0.0000 Epoch: 39 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-06 
23:02:23,484-Speed 9405.13 samples/sec Loss 0.7772 LearningRate 0.0000 Epoch: 39 Global Step: 68660 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:02:49,738-Speed 9361.23 samples/sec Loss 0.7767 LearningRate 0.0000 Epoch: 39 Global Step: 68670 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:03:15,839-Speed 9416.27 samples/sec Loss 0.7820 LearningRate 0.0000 Epoch: 39 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:03:41,975-Speed 9403.18 samples/sec Loss 0.7789 LearningRate 0.0000 Epoch: 39 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:04:08,109-Speed 9404.53 samples/sec Loss 0.7738 LearningRate 0.0000 Epoch: 39 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:04:34,217-Speed 9413.48 samples/sec Loss 0.7732 LearningRate 0.0000 Epoch: 39 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:05:00,405-Speed 9385.00 samples/sec Loss 0.7767 LearningRate 0.0000 Epoch: 39 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:05:26,538-Speed 9404.54 samples/sec Loss 0.7723 LearningRate 0.0000 Epoch: 39 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:05:52,604-Speed 9429.03 samples/sec Loss 0.7786 LearningRate 0.0000 Epoch: 39 Global Step: 68740 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:06:18,765-Speed 9394.54 samples/sec Loss 0.7790 LearningRate 0.0000 Epoch: 39 Global Step: 68750 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:06:44,830-Speed 9429.11 samples/sec Loss 0.7810 LearningRate 0.0000 Epoch: 39 Global Step: 68760 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:07:10,906-Speed 9425.49 samples/sec Loss 0.7760 LearningRate 0.0000 Epoch: 39 Global Step: 68770 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:07:37,062-Speed 9396.33 samples/sec Loss 0.7794 LearningRate 0.0000 Epoch: 39 Global Step: 68780 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:08:03,149-Speed 9421.08 samples/sec Loss 0.7755 LearningRate 0.0000 Epoch: 39 Global Step: 68790 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:08:29,236-Speed 9421.27 samples/sec Loss 0.7787 LearningRate 0.0000 Epoch: 39 Global Step: 68800 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:08:55,319-Speed 9422.64 samples/sec Loss 0.7805 LearningRate 0.0000 Epoch: 39 Global Step: 68810 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:09:21,449-Speed 9405.90 samples/sec Loss 0.7784 LearningRate 0.0000 Epoch: 39 Global Step: 68820 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:09:47,542-Speed 9419.17 samples/sec Loss 0.7817 LearningRate 0.0000 Epoch: 39 Global Step: 68830 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:10:13,671-Speed 9406.44 samples/sec Loss 0.7804 LearningRate 0.0000 Epoch: 39 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:10:39,837-Speed 9392.67 samples/sec Loss 0.7740 LearningRate 0.0000 Epoch: 39 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:11:05,974-Speed 9403.11 samples/sec Loss 0.7755 LearningRate 0.0000 Epoch: 39 Global Step: 68860 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:11:32,034-Speed 9431.02 samples/sec Loss 0.7756 LearningRate 0.0000 Epoch: 39 Global Step: 68870 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:11:58,104-Speed 9427.48 samples/sec Loss 0.7768 LearningRate 0.0000 Epoch: 39 Global Step: 68880 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:12:24,226-Speed 9408.34 samples/sec Loss 0.7773 LearningRate 0.0000 Epoch: 39 Global Step: 68890 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:12:50,380-Speed 9397.17 samples/sec Loss 0.7810 LearningRate 0.0000 Epoch: 39 Global Step: 68900 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:13:16,492-Speed 9412.16 samples/sec Loss 0.7749 LearningRate 0.0000 Epoch: 39 Global Step: 68910 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:13:42,606-Speed 9411.46 samples/sec Loss 0.7774 LearningRate 0.0000 Epoch: 39 Global Step: 68920 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:14:08,724-Speed 9409.74 samples/sec Loss 0.7786 LearningRate 0.0000 Epoch: 39 Global Step: 68930 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:14:34,943-Speed 9373.96 samples/sec Loss 0.7717 LearningRate 0.0000 Epoch: 39 Global Step: 68940 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:15:01,065-Speed 9408.90 samples/sec Loss 0.7746 LearningRate 0.0000 Epoch: 39 Global Step: 68950 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:15:27,255-Speed 9384.09 samples/sec Loss 0.7720 LearningRate 0.0000 Epoch: 39 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:15:53,376-Speed 9408.91 samples/sec Loss 0.7760 LearningRate 0.0000 Epoch: 39 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:16:19,461-Speed 9422.02 samples/sec Loss 0.7796 LearningRate 0.0000 Epoch: 39 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:16:45,655-Speed 9382.88 samples/sec Loss 0.7747 LearningRate 0.0000 Epoch: 39 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:17:11,822-Speed 9392.05 samples/sec Loss 0.7783 LearningRate 0.0000 Epoch: 39 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:17:37,991-Speed 9392.00 samples/sec Loss 0.7785 LearningRate 0.0000 Epoch: 39 Global Step: 69010 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:18:04,269-Speed 9352.67 samples/sec Loss 0.7741 LearningRate 0.0000 Epoch: 39 Global Step: 69020 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-06 23:18:30,557-Speed 9349.07 samples/sec Loss 0.7782 LearningRate 0.0000 Epoch: 39 Global Step: 69030 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:18:56,679-Speed 9408.79 samples/sec Loss 0.7796 LearningRate 0.0000 Epoch: 39 Global Step: 69040 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:19:22,745-Speed 9428.85 samples/sec Loss 0.7809 LearningRate 0.0000 Epoch: 39 Global Step: 69050 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:19:48,841-Speed 9418.13 samples/sec Loss 0.7767 LearningRate 0.0000 Epoch: 39 Global Step: 69060 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-06 23:20:14,911-Speed 9427.00 samples/sec Loss 0.7794 LearningRate 0.0000 Epoch: 39 Global Step: 69070 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-06 23:20:41,045-Speed 9404.34 samples/sec Loss 0.7818 LearningRate 0.0000 Epoch: 39 Global Step: 69080 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-06 23:21:07,078-Speed 9440.52 samples/sec Loss 0.7765 LearningRate 0.0000 Epoch: 39 Global Step: 69090 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-06 23:21:33,136-Speed 9431.79 samples/sec Loss 0.7808 LearningRate 0.0000 Epoch: 39 Global Step: 69100 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-06 23:21:59,235-Speed 9416.93 samples/sec Loss 0.7772 LearningRate 0.0000 Epoch: 39 Global Step: 69110 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-06 23:22:25,339-Speed 9415.14 samples/sec Loss 0.7787 LearningRate 0.0000 Epoch: 39 Global Step: 69120 Fp16 Grad Scale: 16384 Required: -0 hours
Training: 2022-03-06 23:22:51,452-Speed 9411.79 samples/sec Loss 0.7833 LearningRate 0.0000 Epoch: 39 Global Step: 69130 Fp16 Grad Scale: 16384 Required: -0 hours