Training: 2022-03-25 22:36:07,736-rank_id: 0
Training: 2022-03-25 22:36:58,412-Speed 24752.70 samples/sec Loss 42.4928 LearningRate 0.0000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-03-25 22:37:08,404-Speed 24601.27 samples/sec Loss 42.4611 LearningRate 0.0000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-03-25 22:37:18,224-Speed 25033.10 samples/sec Loss 42.4434 LearningRate 0.0000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-03-25 22:37:28,098-Speed 24893.11 samples/sec Loss 42.4066 LearningRate 0.0000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-03-25 22:37:37,970-Speed 24897.43 samples/sec Loss 42.3443 LearningRate 0.0000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-03-25 22:37:47,692-Speed 25282.15 samples/sec Loss 42.2692 LearningRate 0.0000 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-03-25 22:37:57,593-Speed 24827.06 samples/sec Loss 42.1428 LearningRate 0.0000 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-03-25 22:38:07,403-Speed 25055.80 samples/sec Loss 41.9780 LearningRate 0.0000 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-03-25 22:38:17,218-Speed 25045.76 samples/sec Loss 41.7694 LearningRate 0.0000 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-03-25 22:38:26,940-Speed 25283.01 samples/sec Loss 41.5106 LearningRate 0.0000 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-03-25 22:38:36,646-Speed 25329.46 samples/sec Loss 41.2360 LearningRate 0.0000 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-03-25 22:38:46,629-Speed 24623.08 samples/sec Loss 40.9357 LearningRate 0.0000 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-03-25 22:38:56,435-Speed 25067.54 samples/sec Loss 40.6204 LearningRate 0.0000 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-03-25 22:39:06,262-Speed 25011.07 samples/sec Loss 40.3145 LearningRate 0.0000 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-03-25 22:39:15,977-Speed 25303.17 samples/sec Loss 40.0316 LearningRate 0.0000 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-03-25 22:39:25,906-Speed 24753.12 samples/sec Loss 39.7716 LearningRate 0.0000 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-03-25 22:39:35,660-Speed 25199.90 samples/sec Loss 39.5489 LearningRate 0.0000 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-03-25 22:39:45,573-Speed 24795.62 samples/sec Loss 39.3533 LearningRate 0.0000 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-03-25 22:39:55,358-Speed 25125.76 samples/sec Loss 39.1919 LearningRate 0.0000 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 262144 Required: 19 hours
Training: 2022-03-25 22:40:05,157-Speed 25084.93 samples/sec Loss 39.0729 LearningRate 0.0000 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-03-25 22:40:14,887-Speed 25261.59 samples/sec Loss 38.9766 LearningRate 0.0000 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:40:24,604-Speed 25297.40 samples/sec Loss 38.8966 LearningRate 0.0000 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:40:35,552-Speed 22449.68 samples/sec Loss 38.8501 LearningRate 0.0000 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:40:45,377-Speed 25019.55 samples/sec Loss 38.8173 LearningRate 0.0000 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:40:55,209-Speed 24999.09 samples/sec Loss 38.7820 LearningRate 0.0000 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:41:04,964-Speed 25195.94 samples/sec Loss 38.7643 LearningRate 0.0000 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:41:14,766-Speed 25078.41 samples/sec Loss 38.7602 LearningRate 0.0000 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:41:24,563-Speed 25089.73 samples/sec Loss 38.7441 LearningRate 0.0000 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:41:35,110-Speed 23305.06 samples/sec Loss 38.7394 LearningRate 0.0000 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:41:44,998-Speed 24859.36 samples/sec Loss 38.7302 LearningRate 0.0000 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:41:54,839-Speed 24976.39 samples/sec Loss 38.7246 LearningRate 0.0000 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:42:04,500-Speed 25444.07 samples/sec Loss 38.7257 LearningRate 0.0000 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:42:14,379-Speed 24881.89 samples/sec Loss 38.7195 LearningRate 0.0000 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:42:24,089-Speed 25313.48 samples/sec Loss 38.7181 LearningRate 0.0001 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:42:33,808-Speed 25290.30 samples/sec Loss 38.7167 LearningRate 0.0001 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:42:43,576-Speed 25165.10 samples/sec Loss 38.7178 LearningRate 0.0001 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:42:53,337-Speed 25179.68 samples/sec Loss 38.7127 LearningRate 0.0001 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:43:03,190-Speed 24946.59 samples/sec Loss 38.7139 LearningRate 0.0001 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:43:12,944-Speed 25211.06 samples/sec Loss 38.7209 LearningRate 0.0001 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:43:22,649-Speed 25325.49 samples/sec Loss 38.7303 LearningRate 0.0001 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:43:32,536-Speed 24861.02 samples/sec Loss 38.7329 LearningRate 0.0001 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:43:42,371-Speed 24992.98 samples/sec Loss 38.7332 LearningRate 0.0001 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:43:52,363-Speed 24606.66 samples/sec Loss 38.7629 LearningRate 0.0001 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:44:02,356-Speed 24599.41 samples/sec Loss 38.8732 LearningRate 0.0001 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:44:12,129-Speed 25150.83 samples/sec Loss 38.7793 LearningRate 0.0001 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:44:21,837-Speed 25321.16 samples/sec Loss 38.7960 LearningRate 0.0001 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:44:32,812-Speed 22395.31 samples/sec Loss 38.8065 LearningRate 0.0001 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:44:42,564-Speed 25209.48 samples/sec Loss 38.8675 LearningRate 0.0001 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:44:52,349-Speed 25120.53 samples/sec Loss 38.8335 LearningRate 0.0001 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:45:02,109-Speed 25183.51 samples/sec Loss 38.8670 LearningRate 0.0001 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:45:11,904-Speed 25093.14 samples/sec Loss 38.8441 LearningRate 0.0001 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:45:21,631-Speed 25267.33 samples/sec Loss 38.8459 LearningRate 0.0001 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:45:31,663-Speed 24500.13 samples/sec Loss 38.8537 LearningRate 0.0001 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:45:41,489-Speed 25016.00 samples/sec Loss 38.8839 LearningRate 0.0001 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:45:51,305-Speed 25038.74 samples/sec Loss 38.8862 LearningRate 0.0001 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:46:01,105-Speed 25081.81 samples/sec Loss 38.8807 LearningRate 0.0001 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:46:10,969-Speed 24918.48 samples/sec Loss 38.8994 LearningRate 0.0001 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:46:20,722-Speed 25200.57 samples/sec Loss 38.9110 LearningRate 0.0001 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:46:30,525-Speed 25073.68 samples/sec Loss 38.9053 LearningRate 0.0001 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:46:40,177-Speed 25465.73 samples/sec Loss 38.9112 LearningRate 0.0001 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:46:49,897-Speed 25287.79 samples/sec Loss 38.9328 LearningRate 0.0001 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:46:59,527-Speed 25523.88 samples/sec Loss 38.9337 LearningRate 0.0001 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:47:09,322-Speed 25094.00 samples/sec Loss 38.9425 LearningRate 0.0001 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:47:19,173-Speed 24953.98 samples/sec Loss 38.9526 LearningRate 0.0001 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:47:30,472-Speed 21753.67 samples/sec Loss 38.9557 LearningRate 0.0001 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:47:40,195-Speed 25279.94 samples/sec Loss 38.9613 LearningRate 0.0001 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:47:50,004-Speed 25057.28 samples/sec Loss 38.9586 LearningRate 0.0001 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:47:59,841-Speed 24986.01 samples/sec Loss 38.9529 LearningRate 0.0001 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:48:09,576-Speed 25248.77 samples/sec Loss 38.9621 LearningRate 0.0001 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:48:19,331-Speed 25199.78 samples/sec Loss 38.9775 LearningRate 0.0001 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:48:29,087-Speed 25194.89 samples/sec Loss 38.9830 LearningRate 0.0001 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:48:38,797-Speed 25319.98 samples/sec Loss 38.9819 LearningRate 0.0001 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:48:48,522-Speed 25272.41 samples/sec Loss 38.9803 LearningRate 0.0001 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:48:58,343-Speed 25027.31 samples/sec Loss 38.9782 LearningRate 0.0001 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:49:08,065-Speed 25284.50 samples/sec Loss 38.9735 LearningRate 0.0001 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:49:17,930-Speed 24917.86 samples/sec Loss 38.9842 LearningRate 0.0001 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:49:27,816-Speed 24867.03 samples/sec Loss 38.9782 LearningRate 0.0001 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:49:37,695-Speed 24878.41 samples/sec Loss 38.9766 LearningRate 0.0001 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:49:47,415-Speed 25288.16 samples/sec Loss 38.9930 LearningRate 0.0001 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:49:57,106-Speed 25364.67 samples/sec Loss 38.9862 LearningRate 0.0001 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:50:06,886-Speed 25130.70 samples/sec Loss 38.9670 LearningRate 0.0001 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:50:16,699-Speed 25049.66 samples/sec Loss 38.9634 LearningRate 0.0001 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:50:26,525-Speed 25013.46 samples/sec Loss 38.9432 LearningRate 0.0001 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:50:37,507-Speed 22382.83 samples/sec Loss 38.9405 LearningRate 0.0001 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:50:47,280-Speed 25149.70 samples/sec Loss 38.9218 LearningRate 0.0001 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:50:57,092-Speed 25050.26 samples/sec Loss 38.8972 LearningRate 0.0001 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:51:06,865-Speed 25155.23 samples/sec Loss 38.8819 LearningRate 0.0001 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:51:16,679-Speed 25047.79 samples/sec Loss 38.8412 LearningRate 0.0001 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:51:26,401-Speed 25279.53 samples/sec Loss 38.8155 LearningRate 0.0001 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:51:36,468-Speed 24417.38 samples/sec Loss 38.7776 LearningRate 0.0001 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:51:46,288-Speed 25029.46 samples/sec Loss 38.7361 LearningRate 0.0001 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:51:56,058-Speed 25157.95 samples/sec Loss 38.7040 LearningRate 0.0001 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:52:05,902-Speed 24968.12 samples/sec Loss 38.6560 LearningRate 0.0001 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:52:15,684-Speed 25125.92 samples/sec Loss 38.6261 LearningRate 0.0001 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:52:25,446-Speed 25179.22 samples/sec Loss 38.5882 LearningRate 0.0001 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:52:35,462-Speed 24540.79 samples/sec Loss 38.5673 LearningRate 0.0001 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:52:45,302-Speed 24986.35 samples/sec Loss 38.5227 LearningRate 0.0001 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:52:55,141-Speed 24985.96 samples/sec Loss 38.4887 LearningRate 0.0001 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:53:04,954-Speed 25046.79 samples/sec Loss 38.4526 LearningRate 0.0001 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:53:14,837-Speed 24876.65 samples/sec Loss 38.4188 LearningRate 0.0001 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:53:24,687-Speed 24955.08 samples/sec Loss 38.3958 LearningRate 0.0001 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:53:34,689-Speed 24573.73 samples/sec Loss 38.3687 LearningRate 0.0001 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-03-25 22:53:44,512-Speed 25023.17 samples/sec Loss 38.3508 LearningRate 0.0002 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:53:54,279-Speed 25165.41 samples/sec Loss 38.3439 LearningRate 0.0002 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:54:04,055-Speed 25144.15 samples/sec Loss 38.2910 LearningRate 0.0002 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:54:13,922-Speed 24908.75 samples/sec Loss 38.2325 LearningRate 0.0002 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:54:23,600-Speed 25396.45 samples/sec Loss 38.2070 LearningRate 0.0002 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:54:34,131-Speed 23339.99 samples/sec Loss 38.1601 LearningRate 0.0002 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:54:43,846-Speed 25305.78 samples/sec Loss 38.1244 LearningRate 0.0002 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:54:53,591-Speed 25224.01 samples/sec Loss 38.1062 LearningRate 0.0002 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:55:03,334-Speed 25228.52 samples/sec Loss 38.0669 LearningRate 0.0002 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 22:55:13,187-Speed 24945.68 samples/sec Loss 38.0845 LearningRate 0.0002 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:55:23,041-Speed 24944.14 samples/sec Loss 38.0333 LearningRate 0.0002 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:55:32,868-Speed 25013.82 samples/sec Loss 37.9502 LearningRate 0.0002 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:55:42,595-Speed 25268.36 samples/sec Loss 37.9103 LearningRate 0.0002 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:55:52,386-Speed 25105.04 samples/sec Loss 37.8745 LearningRate 0.0002 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:56:02,242-Speed 24937.07 samples/sec Loss 37.8555 LearningRate 0.0002 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 22:56:12,045-Speed 25073.10 samples/sec Loss 37.7807 LearningRate 0.0002 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:56:21,797-Speed 25204.43 samples/sec Loss 37.7258 LearningRate 0.0002 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:56:31,531-Speed 25251.95 samples/sec Loss 37.7467 LearningRate 0.0002 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:56:42,084-Speed 23291.91 samples/sec Loss 37.6412 LearningRate 0.0002 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:56:51,906-Speed 25022.93 samples/sec Loss 37.5869 LearningRate 0.0002 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:57:01,780-Speed 24893.63 samples/sec Loss 37.5390 LearningRate 0.0002 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:57:11,522-Speed 25238.10 samples/sec Loss 37.4656 LearningRate 0.0002 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:57:21,332-Speed 25056.54 samples/sec Loss 37.3967 LearningRate 0.0002 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:57:31,079-Speed 25218.55 samples/sec Loss 37.3382 LearningRate 0.0002 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:57:41,158-Speed 24385.51 samples/sec Loss 37.3044 LearningRate 0.0002 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:57:50,909-Speed 25205.78 samples/sec Loss 37.2822 LearningRate 0.0002 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:58:00,655-Speed 25220.17 samples/sec Loss 37.2265 LearningRate 0.0002 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:58:10,395-Speed 25236.94 samples/sec Loss 37.1217 LearningRate 0.0002 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:58:20,271-Speed 24888.00 samples/sec Loss 37.1116 LearningRate 0.0002 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 22:58:30,195-Speed 24768.94 samples/sec Loss 37.0782 LearningRate 0.0002 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 22:58:39,960-Speed 25171.81 samples/sec Loss 37.0565 LearningRate 0.0002 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 22:58:49,774-Speed 25046.03 samples/sec Loss 36.9526 LearningRate 0.0002 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 22:58:59,534-Speed 25182.55 samples/sec Loss 36.8588 LearningRate 0.0002 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 22:59:09,293-Speed 25186.89 samples/sec Loss 36.8086 LearningRate 0.0002 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 22:59:19,027-Speed 25252.06 samples/sec Loss 36.7707 LearningRate 0.0002 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 22:59:28,767-Speed 25237.80 samples/sec Loss 36.7119 LearningRate 0.0002 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 22:59:38,574-Speed 25073.98 samples/sec Loss 36.6268 LearningRate 0.0002 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 22:59:48,289-Speed 25301.34 samples/sec Loss 36.6037 LearningRate 0.0002 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 22:59:58,191-Speed 24823.01 samples/sec Loss 36.6277 LearningRate 0.0002 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:00:07,998-Speed 25063.98 samples/sec Loss 36.5858 LearningRate 0.0002 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:00:17,726-Speed 25272.23 samples/sec Loss 36.5115 LearningRate 0.0002 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:00:27,473-Speed 25219.59 samples/sec Loss 36.6534 LearningRate 0.0002 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:00:37,185-Speed 25307.38 samples/sec Loss 36.4391 LearningRate 0.0002 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:00:47,005-Speed 25029.88 samples/sec Loss 36.3262 LearningRate 0.0002 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:00:56,650-Speed 25486.44 samples/sec Loss 36.2863 LearningRate 0.0002 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:01:06,397-Speed 25217.03 samples/sec Loss 36.1859 LearningRate 0.0002 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:01:16,278-Speed 24875.33 samples/sec Loss 36.0993 LearningRate 0.0002 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:01:26,049-Speed 25154.64 samples/sec Loss 35.9905 LearningRate 0.0002 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:01:35,841-Speed 25103.46 samples/sec Loss 35.9407 LearningRate 0.0002 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 1024 Required: 19 hours
Training: 2022-03-25 23:01:45,730-Speed 24860.70 samples/sec Loss 35.9075 LearningRate 0.0002 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:01:55,543-Speed 25046.41 samples/sec Loss 35.8568 LearningRate 0.0002 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:02:05,468-Speed 24766.90 samples/sec Loss 35.8091 LearningRate 0.0002 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:02:15,258-Speed 25107.07 samples/sec Loss 35.7093 LearningRate 0.0002 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:02:24,948-Speed 25365.59 samples/sec Loss 35.6379 LearningRate 0.0002 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:02:34,600-Speed 25467.27 samples/sec Loss 35.5819 LearningRate 0.0002 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:02:44,354-Speed 25199.03 samples/sec Loss 35.5177 LearningRate 0.0002 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:02:54,098-Speed 25225.16 samples/sec Loss 35.4733 LearningRate 0.0002 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:03:03,803-Speed 25326.54 samples/sec Loss 35.3953 LearningRate 0.0002 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:03:13,656-Speed 24945.85 samples/sec Loss 35.3190 LearningRate 0.0002 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:03:23,359-Speed 25336.58 samples/sec Loss 35.2584 LearningRate 0.0002 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:03:33,050-Speed 25361.03 samples/sec Loss 35.1773 LearningRate 0.0002 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:03:42,810-Speed 25183.60 samples/sec Loss 35.1168 LearningRate 0.0002 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 4096 Required: 18 hours
Training: 2022-03-25 23:03:52,591-Speed 25138.79 samples/sec Loss 35.0389 LearningRate 0.0002 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 4096 Required: 18 hours
Training: 2022-03-25 23:04:02,432-Speed 24975.44 samples/sec Loss 34.9721 LearningRate 0.0002 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 4096 Required: 18 hours
Training: 2022-03-25 23:04:12,221-Speed 25111.65 samples/sec Loss 34.9036 LearningRate 0.0002 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 4096 Required: 18 hours
Training: 2022-03-25 23:04:21,921-Speed 25338.85 samples/sec Loss 34.8275 LearningRate 0.0002 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 4096 Required: 18 hours
Training: 2022-03-25 23:04:31,677-Speed 25202.05 samples/sec Loss 34.7599 LearningRate 0.0002 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 4096 Required: 18 hours
Training: 2022-03-25 23:04:41,463-Speed 25118.88 samples/sec Loss 34.6675 LearningRate 0.0002 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 4096 Required: 18 hours
Training: 2022-03-25 23:04:51,190-Speed 25269.60 samples/sec Loss 34.5887 LearningRate 0.0002 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 4096 Required: 18 hours
Training: 2022-03-25 23:05:49,792-Speed 4193.82 samples/sec Loss 34.5386 LearningRate 0.0003 Epoch: 1 Global Step: 1730 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:05:59,663-Speed 24900.48 samples/sec Loss 34.4704 LearningRate 0.0003 Epoch: 1 Global Step: 1740 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:06:09,497-Speed 24996.18 samples/sec Loss 34.3774 LearningRate 0.0003 Epoch: 1 Global Step: 1750 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:06:19,438-Speed 24731.98 samples/sec Loss 34.3104 LearningRate 0.0003 Epoch: 1 Global Step: 1760 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:06:29,398-Speed 24677.92 samples/sec Loss 34.2475 LearningRate 0.0003 Epoch: 1 Global Step: 1770 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:06:39,262-Speed 24920.67 samples/sec Loss 34.1715 LearningRate 0.0003 Epoch: 1 Global Step: 1780 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:06:49,171-Speed 24803.68 samples/sec Loss 34.0740 LearningRate 0.0003 Epoch: 1 Global Step: 1790 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:06:58,917-Speed 25219.17 samples/sec Loss 34.0136 LearningRate 0.0003 Epoch: 1 Global Step: 1800 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:07:08,612-Speed 25354.10 samples/sec Loss 33.9507 LearningRate 0.0003 Epoch: 1 Global Step: 1810 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:07:18,426-Speed 25043.86 samples/sec Loss 33.8603 LearningRate 0.0003 Epoch: 1 Global Step: 1820 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:07:28,089-Speed 25436.74 samples/sec Loss 33.7683 LearningRate 0.0003 Epoch: 1 Global Step: 1830 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:07:37,888-Speed 25083.87 samples/sec Loss 33.6876 LearningRate 0.0003 Epoch: 1 Global Step: 1840 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:07:47,612-Speed 25284.22 samples/sec Loss 33.6100 LearningRate 0.0003 Epoch: 1 Global Step: 1850 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:07:57,263-Speed 25466.85 samples/sec Loss 33.5325 LearningRate 0.0003 Epoch: 1 Global Step: 1860 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:08:06,923-Speed 25443.08 samples/sec Loss 33.4317 LearningRate 0.0003 Epoch: 1 Global Step: 1870 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:08:16,658-Speed 25247.89 samples/sec Loss 33.3378 LearningRate 0.0003 Epoch: 1 Global Step: 1880 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:08:26,313-Speed 25458.70 samples/sec Loss 33.2390 LearningRate 0.0003 Epoch: 1 Global Step: 1890 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:08:36,048-Speed 25248.86 samples/sec Loss 33.1705 LearningRate 0.0003 Epoch: 1 Global Step: 1900 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:08:45,991-Speed 24720.53 samples/sec Loss 33.0983 LearningRate 0.0003 Epoch: 1 Global Step: 1910 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:08:55,799-Speed 25063.41 samples/sec Loss 32.9861 LearningRate 0.0003 Epoch: 1 Global Step: 1920 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:09:05,705-Speed 24811.03 samples/sec Loss 32.9448 LearningRate 0.0003 Epoch: 1 Global Step: 1930 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 23:09:15,629-Speed 24767.99 samples/sec Loss 32.8222 LearningRate 0.0003 Epoch: 1 Global Step: 1940 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 23:09:25,504-Speed 24891.05 samples/sec Loss 32.6941 LearningRate 0.0003 Epoch: 1 Global Step: 1950 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 23:09:35,375-Speed 24909.39 samples/sec Loss 32.6399 LearningRate 0.0003 Epoch: 1 Global Step: 1960 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 23:09:45,208-Speed 24995.24 samples/sec Loss 32.5305 LearningRate 0.0003 Epoch: 1 Global Step: 1970 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 23:09:54,978-Speed 25165.76 samples/sec Loss 32.4602 LearningRate 0.0003 Epoch: 1 Global Step: 1980 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 23:10:04,881-Speed 24823.34 samples/sec Loss 32.3620 LearningRate 0.0003 Epoch: 1 Global Step: 1990 Fp16 Grad Scale: 32768 Required: 19 hours
Training: 2022-03-25 23:10:14,713-Speed 24999.25 samples/sec Loss 32.2720 LearningRate 0.0003 Epoch: 1 Global Step: 2000 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:10:24,651-Speed 24732.22 samples/sec Loss 32.1940 LearningRate 0.0003 Epoch: 1 Global Step: 2010 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:10:34,452-Speed 25077.44 samples/sec Loss 32.0679 LearningRate 0.0003 Epoch: 1 Global Step: 2020 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:10:44,484-Speed 24502.03 samples/sec Loss 31.9575 LearningRate 0.0003 Epoch: 1 Global Step: 2030 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:10:54,512-Speed 24511.65 samples/sec Loss 31.9039 LearningRate 0.0003 Epoch: 1 Global Step: 2040 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:11:04,360-Speed 24955.93 samples/sec Loss 31.7882 LearningRate 0.0003 Epoch: 1 Global Step: 2050 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:11:14,238-Speed 24882.98 samples/sec Loss 31.7037 LearningRate 0.0003 Epoch: 1 Global Step: 2060 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:11:24,179-Speed 24725.06 samples/sec Loss 31.5904 LearningRate 0.0003 Epoch: 1 Global Step: 2070 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:11:34,099-Speed 24777.58 samples/sec Loss 31.4572 LearningRate 0.0003 Epoch: 1 Global Step: 2080 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:11:44,070-Speed 24650.79 samples/sec Loss 31.3781 LearningRate 0.0003 Epoch: 1 Global Step: 2090 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-03-25 23:11:53,994-Speed 24767.28 samples/sec Loss 31.3101 LearningRate 0.0003 Epoch: 1 Global Step: 2100 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:12:04,102-Speed 24316.11 samples/sec Loss 31.1987 LearningRate 0.0003 Epoch: 1 Global Step: 2110 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:12:14,130-Speed 24510.62 samples/sec Loss 31.0848 LearningRate 0.0003 Epoch: 1 Global Step: 2120 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:12:23,979-Speed 24959.30 samples/sec Loss 31.0179 LearningRate 0.0003 Epoch: 1 Global Step: 2130 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:12:33,916-Speed 24740.25 samples/sec Loss 30.9685 LearningRate 0.0003 Epoch: 1 Global Step: 2140 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:12:43,804-Speed 24856.78 samples/sec Loss 30.8054 LearningRate 0.0003 Epoch: 1 Global Step: 2150 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-03-25 23:12:53,660-Speed 24941.99 samples/sec Loss 30.7122 LearningRate 0.0003 Epoch: 1 Global Step: 2160 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:13:03,415-Speed 25199.00 samples/sec Loss 30.6227 LearningRate 0.0003 Epoch: 1 Global Step: 2170 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:13:13,225-Speed 25055.10 samples/sec Loss 30.5507 LearningRate 0.0003 Epoch: 1 Global Step: 2180 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:13:22,938-Speed 25305.01 samples/sec Loss 30.3906 LearningRate 0.0003 Epoch: 1 Global Step: 2190 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:13:32,605-Speed 25426.92 samples/sec Loss 30.2623 LearningRate 0.0003 Epoch: 1 Global Step: 2200 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:13:42,483-Speed 24882.28 samples/sec Loss 30.1496 LearningRate 0.0003 Epoch: 1 Global Step: 2210 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:13:52,180-Speed 25348.24 samples/sec Loss 30.0365 LearningRate 0.0003 Epoch: 1 Global Step: 2220 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:14:01,911-Speed 25259.35 samples/sec Loss 30.0293 LearningRate 0.0003 Epoch: 1 Global Step: 2230 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:14:11,532-Speed 25546.73 samples/sec Loss 29.9535 LearningRate 0.0003 Epoch: 1 Global Step: 2240 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:14:21,301-Speed 25161.23 samples/sec Loss 29.7620 LearningRate 0.0003 Epoch: 1 Global Step: 2250 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:14:31,095-Speed 25095.03 samples/sec Loss 29.6482 LearningRate 0.0003 Epoch: 1 Global Step: 2260 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:14:40,891-Speed 25091.62 samples/sec Loss 29.5748 LearningRate 0.0003 Epoch: 1 Global Step: 2270 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:14:50,613-Speed 25280.88 samples/sec Loss 29.4368 LearningRate 0.0003 Epoch: 1 Global Step: 2280 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:15:00,361-Speed 25215.68 samples/sec Loss 29.3380 LearningRate 0.0003 Epoch: 1 Global Step: 2290 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:15:10,128-Speed 25164.90 samples/sec Loss 29.2221 LearningRate 0.0003 Epoch: 1 Global Step: 2300 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:15:19,873-Speed 25221.98 samples/sec Loss 29.1020 LearningRate 0.0003 Epoch: 1 Global Step: 2310 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:15:29,623-Speed 25210.88 samples/sec Loss 28.9722 LearningRate 0.0003 Epoch: 1 Global Step: 2320 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:15:39,326-Speed 25332.75 samples/sec Loss 28.9109 LearningRate 0.0003 Epoch: 1 Global Step: 2330 Fp16 Grad Scale: 2048 Required: 19 hours
Training: 2022-03-25 23:15:49,064-Speed 25239.42 samples/sec Loss 28.7423 LearningRate 0.0003 Epoch: 1 Global Step: 2340 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-03-25 23:15:58,844-Speed 25133.41 samples/sec Loss 28.6458 LearningRate 0.0003 Epoch: 1 Global Step: 2350 Fp16
Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:16:08,542-Speed 25345.84 samples/sec Loss 28.5085 LearningRate 0.0003 Epoch: 1 Global Step: 2360 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:16:18,335-Speed 25097.38 samples/sec Loss 28.4597 LearningRate 0.0003 Epoch: 1 Global Step: 2370 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:16:28,218-Speed 24871.83 samples/sec Loss 28.3246 LearningRate 0.0003 Epoch: 1 Global Step: 2380 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:16:38,003-Speed 25120.47 samples/sec Loss 28.1763 LearningRate 0.0003 Epoch: 1 Global Step: 2390 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:16:47,768-Speed 25173.69 samples/sec Loss 28.0865 LearningRate 0.0003 Epoch: 1 Global Step: 2400 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:16:57,578-Speed 25053.55 samples/sec Loss 27.9546 LearningRate 0.0003 Epoch: 1 Global Step: 2410 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:17:07,364-Speed 25117.12 samples/sec Loss 27.8524 LearningRate 0.0004 Epoch: 1 Global Step: 2420 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:17:17,083-Speed 25289.73 samples/sec Loss 27.7783 LearningRate 0.0004 Epoch: 1 Global Step: 2430 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:17:26,896-Speed 25050.44 samples/sec Loss 27.6127 LearningRate 0.0004 Epoch: 1 Global Step: 2440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:17:36,626-Speed 25260.47 samples/sec Loss 27.5015 LearningRate 0.0004 Epoch: 1 Global Step: 2450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:17:46,362-Speed 25247.13 samples/sec Loss 27.3621 LearningRate 0.0004 Epoch: 1 Global Step: 2460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:17:56,105-Speed 25225.55 samples/sec Loss 27.2799 LearningRate 0.0004 Epoch: 1 Global Step: 2470 Fp16 Grad Scale: 4096 Required: 19 hours Training: 
2022-03-25 23:18:05,877-Speed 25154.32 samples/sec Loss 27.1513 LearningRate 0.0004 Epoch: 1 Global Step: 2480 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:18:15,617-Speed 25234.62 samples/sec Loss 27.0488 LearningRate 0.0004 Epoch: 1 Global Step: 2490 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:18:25,371-Speed 25199.02 samples/sec Loss 26.9313 LearningRate 0.0004 Epoch: 1 Global Step: 2500 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:18:35,077-Speed 25323.45 samples/sec Loss 26.8186 LearningRate 0.0004 Epoch: 1 Global Step: 2510 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:18:44,966-Speed 24854.79 samples/sec Loss 26.7437 LearningRate 0.0004 Epoch: 1 Global Step: 2520 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:18:54,740-Speed 25148.84 samples/sec Loss 26.5722 LearningRate 0.0004 Epoch: 1 Global Step: 2530 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:19:04,462-Speed 25280.59 samples/sec Loss 26.3833 LearningRate 0.0004 Epoch: 1 Global Step: 2540 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:19:14,176-Speed 25304.65 samples/sec Loss 26.3266 LearningRate 0.0004 Epoch: 1 Global Step: 2550 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:19:24,024-Speed 24957.82 samples/sec Loss 26.2671 LearningRate 0.0004 Epoch: 1 Global Step: 2560 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-03-25 23:19:33,707-Speed 25382.74 samples/sec Loss 26.0960 LearningRate 0.0004 Epoch: 1 Global Step: 2570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:19:43,584-Speed 24885.25 samples/sec Loss 25.9456 LearningRate 0.0004 Epoch: 1 Global Step: 2580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:19:53,320-Speed 25245.32 samples/sec Loss 25.8807 LearningRate 0.0004 Epoch: 1 Global Step: 2590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:20:03,045-Speed 25274.59 samples/sec Loss 
25.7070 LearningRate 0.0004 Epoch: 1 Global Step: 2600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:20:12,819-Speed 25145.51 samples/sec Loss 25.5839 LearningRate 0.0004 Epoch: 1 Global Step: 2610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:20:22,516-Speed 25347.50 samples/sec Loss 25.4604 LearningRate 0.0004 Epoch: 1 Global Step: 2620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:20:32,445-Speed 24754.29 samples/sec Loss 25.3822 LearningRate 0.0004 Epoch: 1 Global Step: 2630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:20:42,198-Speed 25200.97 samples/sec Loss 25.1756 LearningRate 0.0004 Epoch: 1 Global Step: 2640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:20:51,950-Speed 25203.18 samples/sec Loss 25.1051 LearningRate 0.0004 Epoch: 1 Global Step: 2650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:21:01,748-Speed 25084.55 samples/sec Loss 24.9740 LearningRate 0.0004 Epoch: 1 Global Step: 2660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-03-25 23:21:11,554-Speed 25064.55 samples/sec Loss 24.8707 LearningRate 0.0004 Epoch: 1 Global Step: 2670 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-25 23:21:21,424-Speed 24902.45 samples/sec Loss 24.7900 LearningRate 0.0004 Epoch: 1 Global Step: 2680 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-25 23:21:31,246-Speed 25025.59 samples/sec Loss 24.6314 LearningRate 0.0004 Epoch: 1 Global Step: 2690 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-25 23:21:40,971-Speed 25274.11 samples/sec Loss 24.4532 LearningRate 0.0004 Epoch: 1 Global Step: 2700 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:21:50,797-Speed 25013.45 samples/sec Loss 24.3664 LearningRate 0.0004 Epoch: 1 Global Step: 2710 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:22:00,591-Speed 25097.87 samples/sec Loss 24.2363 LearningRate 0.0004 Epoch: 1 Global Step: 
2720 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:22:10,442-Speed 24950.15 samples/sec Loss 24.1528 LearningRate 0.0004 Epoch: 1 Global Step: 2730 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:22:20,216-Speed 25149.74 samples/sec Loss 23.9904 LearningRate 0.0004 Epoch: 1 Global Step: 2740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:22:29,987-Speed 25152.79 samples/sec Loss 23.8639 LearningRate 0.0004 Epoch: 1 Global Step: 2750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:22:39,760-Speed 25156.50 samples/sec Loss 23.7269 LearningRate 0.0004 Epoch: 1 Global Step: 2760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:22:49,762-Speed 24573.17 samples/sec Loss 23.6158 LearningRate 0.0004 Epoch: 1 Global Step: 2770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:22:59,587-Speed 25018.46 samples/sec Loss 23.4984 LearningRate 0.0004 Epoch: 1 Global Step: 2780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:23:09,342-Speed 25195.89 samples/sec Loss 23.3760 LearningRate 0.0004 Epoch: 1 Global Step: 2790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:23:19,085-Speed 25227.78 samples/sec Loss 23.2356 LearningRate 0.0004 Epoch: 1 Global Step: 2800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:23:28,974-Speed 24855.70 samples/sec Loss 23.1308 LearningRate 0.0004 Epoch: 1 Global Step: 2810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:23:38,697-Speed 25278.51 samples/sec Loss 23.0694 LearningRate 0.0004 Epoch: 1 Global Step: 2820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:23:48,461-Speed 25173.10 samples/sec Loss 22.9124 LearningRate 0.0004 Epoch: 1 Global Step: 2830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-25 23:23:58,250-Speed 25108.78 samples/sec Loss 22.7914 LearningRate 0.0004 Epoch: 1 Global Step: 2840 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-03-25 23:24:08,131-Speed 24873.48 samples/sec Loss 22.6964 LearningRate 0.0004 Epoch: 1 Global Step: 2850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:24:17,891-Speed 25181.79 samples/sec Loss 22.5736 LearningRate 0.0004 Epoch: 1 Global Step: 2860 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:24:27,707-Speed 25039.65 samples/sec Loss 22.3991 LearningRate 0.0004 Epoch: 1 Global Step: 2870 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:24:37,535-Speed 25015.40 samples/sec Loss 22.3019 LearningRate 0.0004 Epoch: 1 Global Step: 2880 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:24:47,245-Speed 25319.49 samples/sec Loss 22.1840 LearningRate 0.0004 Epoch: 1 Global Step: 2890 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:24:56,964-Speed 25288.64 samples/sec Loss 22.0701 LearningRate 0.0004 Epoch: 1 Global Step: 2900 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:25:06,791-Speed 25009.78 samples/sec Loss 21.8759 LearningRate 0.0004 Epoch: 1 Global Step: 2910 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:25:16,718-Speed 24763.15 samples/sec Loss 21.8232 LearningRate 0.0004 Epoch: 1 Global Step: 2920 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:25:26,487-Speed 25160.05 samples/sec Loss 21.7388 LearningRate 0.0004 Epoch: 1 Global Step: 2930 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:25:36,348-Speed 24926.86 samples/sec Loss 21.5801 LearningRate 0.0004 Epoch: 1 Global Step: 2940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:25:46,084-Speed 25247.66 samples/sec Loss 21.4782 LearningRate 0.0004 Epoch: 1 Global Step: 2950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:25:55,828-Speed 25224.72 samples/sec Loss 21.3349 LearningRate 0.0004 Epoch: 1 Global Step: 2960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:26:05,583-Speed 
25196.26 samples/sec Loss 21.2125 LearningRate 0.0004 Epoch: 1 Global Step: 2970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:26:15,368-Speed 25121.02 samples/sec Loss 21.1110 LearningRate 0.0004 Epoch: 1 Global Step: 2980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:26:25,104-Speed 25246.69 samples/sec Loss 20.9764 LearningRate 0.0004 Epoch: 1 Global Step: 2990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:26:34,962-Speed 24933.38 samples/sec Loss 20.8678 LearningRate 0.0004 Epoch: 1 Global Step: 3000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:26:44,706-Speed 25232.20 samples/sec Loss 20.7297 LearningRate 0.0004 Epoch: 1 Global Step: 3010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:26:54,407-Speed 25336.29 samples/sec Loss 20.6361 LearningRate 0.0004 Epoch: 1 Global Step: 3020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:27:04,185-Speed 25139.13 samples/sec Loss 20.5041 LearningRate 0.0004 Epoch: 1 Global Step: 3030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:27:13,954-Speed 25159.72 samples/sec Loss 20.4249 LearningRate 0.0004 Epoch: 1 Global Step: 3040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:27:23,674-Speed 25289.26 samples/sec Loss 20.2864 LearningRate 0.0004 Epoch: 1 Global Step: 3050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:27:33,494-Speed 25028.91 samples/sec Loss 20.2047 LearningRate 0.0004 Epoch: 1 Global Step: 3060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:27:43,282-Speed 25113.80 samples/sec Loss 20.1353 LearningRate 0.0004 Epoch: 1 Global Step: 3070 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:27:53,064-Speed 25133.60 samples/sec Loss 19.9164 LearningRate 0.0004 Epoch: 1 Global Step: 3080 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:28:02,854-Speed 25105.58 samples/sec Loss 19.8458 
LearningRate 0.0004 Epoch: 1 Global Step: 3090 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:28:12,713-Speed 24930.63 samples/sec Loss 19.7557 LearningRate 0.0004 Epoch: 1 Global Step: 3100 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:28:22,443-Speed 25261.38 samples/sec Loss 19.6165 LearningRate 0.0004 Epoch: 1 Global Step: 3110 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:28:32,217-Speed 25148.00 samples/sec Loss 19.5574 LearningRate 0.0005 Epoch: 1 Global Step: 3120 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:28:42,247-Speed 24507.88 samples/sec Loss 19.4151 LearningRate 0.0005 Epoch: 1 Global Step: 3130 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:28:52,265-Speed 24533.52 samples/sec Loss 19.3071 LearningRate 0.0005 Epoch: 1 Global Step: 3140 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:29:02,288-Speed 24521.72 samples/sec Loss 19.2118 LearningRate 0.0005 Epoch: 1 Global Step: 3150 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:29:12,310-Speed 24525.79 samples/sec Loss 19.1147 LearningRate 0.0005 Epoch: 1 Global Step: 3160 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-25 23:29:22,324-Speed 24546.47 samples/sec Loss 19.0137 LearningRate 0.0005 Epoch: 1 Global Step: 3170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:29:32,335-Speed 24551.03 samples/sec Loss 18.9172 LearningRate 0.0005 Epoch: 1 Global Step: 3180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:29:42,374-Speed 24484.38 samples/sec Loss 18.7752 LearningRate 0.0005 Epoch: 1 Global Step: 3190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:29:52,433-Speed 24433.77 samples/sec Loss 18.6822 LearningRate 0.0005 Epoch: 1 Global Step: 3200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:30:02,430-Speed 24587.29 samples/sec Loss 18.6161 LearningRate 0.0005 Epoch: 1 Global Step: 
3210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:30:12,467-Speed 24488.55 samples/sec Loss 18.4775 LearningRate 0.0005 Epoch: 1 Global Step: 3220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:30:22,298-Speed 25002.10 samples/sec Loss 18.3985 LearningRate 0.0005 Epoch: 1 Global Step: 3230 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:30:32,127-Speed 25007.64 samples/sec Loss 18.2802 LearningRate 0.0005 Epoch: 1 Global Step: 3240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:30:42,041-Speed 24794.09 samples/sec Loss 18.2469 LearningRate 0.0005 Epoch: 1 Global Step: 3250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:30:51,821-Speed 25134.25 samples/sec Loss 18.1722 LearningRate 0.0005 Epoch: 1 Global Step: 3260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-25 23:31:01,640-Speed 25033.96 samples/sec Loss 18.0265 LearningRate 0.0005 Epoch: 1 Global Step: 3270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:31:11,488-Speed 24966.02 samples/sec Loss 17.9772 LearningRate 0.0005 Epoch: 1 Global Step: 3280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:31:21,408-Speed 24778.64 samples/sec Loss 17.8233 LearningRate 0.0005 Epoch: 1 Global Step: 3290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:31:31,202-Speed 25098.75 samples/sec Loss 17.7702 LearningRate 0.0005 Epoch: 1 Global Step: 3300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:31:41,083-Speed 24876.48 samples/sec Loss 17.7215 LearningRate 0.0005 Epoch: 1 Global Step: 3310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:31:50,968-Speed 24866.10 samples/sec Loss 17.6012 LearningRate 0.0005 Epoch: 1 Global Step: 3320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:32:00,850-Speed 24871.41 samples/sec Loss 17.4883 LearningRate 0.0005 Epoch: 1 Global Step: 3330 Fp16 Grad Scale: 65536 Required: 18 
hours Training: 2022-03-25 23:32:10,637-Speed 25115.86 samples/sec Loss 17.4178 LearningRate 0.0005 Epoch: 1 Global Step: 3340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:32:20,426-Speed 25106.67 samples/sec Loss 17.3480 LearningRate 0.0005 Epoch: 1 Global Step: 3350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:32:30,319-Speed 24846.18 samples/sec Loss 17.2066 LearningRate 0.0005 Epoch: 1 Global Step: 3360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-25 23:32:40,196-Speed 24888.77 samples/sec Loss 17.1509 LearningRate 0.0005 Epoch: 1 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:32:50,028-Speed 25005.31 samples/sec Loss 17.0565 LearningRate 0.0005 Epoch: 1 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:32:59,910-Speed 24873.97 samples/sec Loss 16.9704 LearningRate 0.0005 Epoch: 1 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:33:09,769-Speed 24931.94 samples/sec Loss 16.9162 LearningRate 0.0005 Epoch: 1 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:33:19,576-Speed 25061.83 samples/sec Loss 16.8070 LearningRate 0.0005 Epoch: 1 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:33:29,445-Speed 24913.75 samples/sec Loss 16.7561 LearningRate 0.0005 Epoch: 1 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:33:39,278-Speed 24995.35 samples/sec Loss 16.6870 LearningRate 0.0005 Epoch: 1 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:33:49,088-Speed 25057.64 samples/sec Loss 16.5930 LearningRate 0.0005 Epoch: 1 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:33:58,959-Speed 24899.82 samples/sec Loss 16.4984 LearningRate 0.0005 Epoch: 1 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 
23:34:57,707-Speed 4183.37 samples/sec Loss 16.3728 LearningRate 0.0005 Epoch: 2 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:35:07,667-Speed 24679.81 samples/sec Loss 16.3247 LearningRate 0.0005 Epoch: 2 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:35:17,465-Speed 25086.25 samples/sec Loss 16.2328 LearningRate 0.0005 Epoch: 2 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:35:27,276-Speed 25051.62 samples/sec Loss 16.1593 LearningRate 0.0005 Epoch: 2 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:35:37,009-Speed 25255.88 samples/sec Loss 16.0609 LearningRate 0.0005 Epoch: 2 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:35:46,731-Speed 25279.23 samples/sec Loss 15.9801 LearningRate 0.0005 Epoch: 2 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:35:56,502-Speed 25158.14 samples/sec Loss 15.9200 LearningRate 0.0005 Epoch: 2 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:36:06,189-Speed 25372.74 samples/sec Loss 15.8597 LearningRate 0.0005 Epoch: 2 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:36:15,986-Speed 25087.32 samples/sec Loss 15.8265 LearningRate 0.0005 Epoch: 2 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:36:25,788-Speed 25074.90 samples/sec Loss 15.7255 LearningRate 0.0005 Epoch: 2 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:36:35,561-Speed 25152.20 samples/sec Loss 15.6217 LearningRate 0.0005 Epoch: 2 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:36:45,406-Speed 24965.75 samples/sec Loss 15.5476 LearningRate 0.0005 Epoch: 2 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:36:55,122-Speed 25298.09 
samples/sec Loss 15.5001 LearningRate 0.0005 Epoch: 2 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:37:04,945-Speed 25021.74 samples/sec Loss 15.4222 LearningRate 0.0005 Epoch: 2 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:37:14,671-Speed 25272.34 samples/sec Loss 15.3599 LearningRate 0.0005 Epoch: 2 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:37:24,428-Speed 25198.43 samples/sec Loss 15.3094 LearningRate 0.0005 Epoch: 2 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:37:34,106-Speed 25401.97 samples/sec Loss 15.1692 LearningRate 0.0005 Epoch: 2 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:37:43,939-Speed 24999.28 samples/sec Loss 15.1055 LearningRate 0.0005 Epoch: 2 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:37:53,773-Speed 24994.45 samples/sec Loss 15.0738 LearningRate 0.0005 Epoch: 2 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:38:03,526-Speed 25205.36 samples/sec Loss 15.0974 LearningRate 0.0005 Epoch: 2 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:38:13,313-Speed 25116.48 samples/sec Loss 14.9016 LearningRate 0.0005 Epoch: 2 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:38:23,031-Speed 25292.23 samples/sec Loss 14.8859 LearningRate 0.0005 Epoch: 2 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:38:32,843-Speed 25050.72 samples/sec Loss 14.7951 LearningRate 0.0005 Epoch: 2 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:38:42,582-Speed 25244.15 samples/sec Loss 14.8158 LearningRate 0.0005 Epoch: 2 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:38:52,380-Speed 25090.83 samples/sec Loss 14.7362 
LearningRate 0.0005 Epoch: 2 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:39:02,083-Speed 25329.69 samples/sec Loss 14.7002 LearningRate 0.0005 Epoch: 2 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:39:11,820-Speed 25243.74 samples/sec Loss 14.5990 LearningRate 0.0005 Epoch: 2 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:39:21,606-Speed 25118.53 samples/sec Loss 14.5215 LearningRate 0.0005 Epoch: 2 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:39:31,435-Speed 25006.42 samples/sec Loss 14.4906 LearningRate 0.0005 Epoch: 2 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:39:41,149-Speed 25302.30 samples/sec Loss 14.4105 LearningRate 0.0005 Epoch: 2 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:39:50,888-Speed 25239.12 samples/sec Loss 14.3434 LearningRate 0.0005 Epoch: 2 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:40:00,718-Speed 25003.55 samples/sec Loss 14.2330 LearningRate 0.0005 Epoch: 2 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:40:10,424-Speed 25323.92 samples/sec Loss 14.1956 LearningRate 0.0005 Epoch: 2 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:40:20,182-Speed 25196.42 samples/sec Loss 14.2314 LearningRate 0.0005 Epoch: 2 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:40:29,971-Speed 25106.13 samples/sec Loss 14.1366 LearningRate 0.0005 Epoch: 2 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:40:39,726-Speed 25197.32 samples/sec Loss 14.0105 LearningRate 0.0006 Epoch: 2 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:40:49,456-Speed 25260.65 samples/sec Loss 13.9939 LearningRate 0.0006 Epoch: 2 
Global Step: 3820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:40:59,147-Speed 25361.11 samples/sec Loss 13.9411 LearningRate 0.0006 Epoch: 2 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:41:08,842-Speed 25354.19 samples/sec Loss 13.9214 LearningRate 0.0006 Epoch: 2 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:41:18,552-Speed 25312.57 samples/sec Loss 13.7856 LearningRate 0.0006 Epoch: 2 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:41:28,330-Speed 25140.65 samples/sec Loss 13.7677 LearningRate 0.0006 Epoch: 2 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:41:38,009-Speed 25393.24 samples/sec Loss 13.7017 LearningRate 0.0006 Epoch: 2 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-03-25 23:41:47,749-Speed 25236.41 samples/sec Loss 13.6327 LearningRate 0.0006 Epoch: 2 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:41:57,579-Speed 25003.21 samples/sec Loss 13.5456 LearningRate 0.0006 Epoch: 2 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:42:07,314-Speed 25248.67 samples/sec Loss 13.5768 LearningRate 0.0006 Epoch: 2 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:42:17,079-Speed 25170.45 samples/sec Loss 13.4974 LearningRate 0.0006 Epoch: 2 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:42:26,946-Speed 24910.98 samples/sec Loss 13.5260 LearningRate 0.0006 Epoch: 2 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:42:36,681-Speed 25249.00 samples/sec Loss 13.4213 LearningRate 0.0006 Epoch: 2 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-25 23:42:46,454-Speed 25151.89 samples/sec Loss 13.3690 LearningRate 0.0006 Epoch: 2 Global Step: 3940 Fp16 Grad 
Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:42:56,184-Speed 25259.51 samples/sec Loss 13.2970 LearningRate 0.0006 Epoch: 2 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:43:05,888-Speed 25329.59 samples/sec Loss 13.2026 LearningRate 0.0006 Epoch: 2 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:43:15,707-Speed 25035.02 samples/sec Loss 13.1293 LearningRate 0.0006 Epoch: 2 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:43:25,471-Speed 25173.89 samples/sec Loss 13.1073 LearningRate 0.0006 Epoch: 2 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:43:35,303-Speed 24999.28 samples/sec Loss 13.0804 LearningRate 0.0006 Epoch: 2 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:43:45,046-Speed 25231.76 samples/sec Loss 13.0401 LearningRate 0.0006 Epoch: 2 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:43:54,865-Speed 25031.91 samples/sec Loss 13.0453 LearningRate 0.0006 Epoch: 2 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:44:04,620-Speed 25198.44 samples/sec Loss 12.9254 LearningRate 0.0006 Epoch: 2 Global Step: 4020 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:44:14,465-Speed 24966.65 samples/sec Loss 12.8736 LearningRate 0.0006 Epoch: 2 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:44:24,209-Speed 25224.69 samples/sec Loss 12.8687 LearningRate 0.0006 Epoch: 2 Global Step: 4040 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:44:33,957-Speed 25214.93 samples/sec Loss 12.7996 LearningRate 0.0006 Epoch: 2 Global Step: 4050 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:44:43,707-Speed 25208.57 samples/sec Loss 12.7144 LearningRate 0.0006 Epoch: 2 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:44:53,418-Speed 25310.90 samples/sec Loss 12.7112 LearningRate 0.0006 Epoch: 2 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:45:03,119-Speed 25336.58 samples/sec Loss 12.6514 LearningRate 0.0006 Epoch: 2 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:45:12,904-Speed 25121.01 samples/sec Loss 12.6124 LearningRate 0.0006 Epoch: 2 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:45:22,654-Speed 25210.84 samples/sec Loss 12.5718 LearningRate 0.0006 Epoch: 2 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:45:32,360-Speed 25326.26 samples/sec Loss 12.6235 LearningRate 0.0006 Epoch: 2 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:45:42,072-Speed 25307.38 samples/sec Loss 12.5090 LearningRate 0.0006 Epoch: 2 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:45:51,819-Speed 25217.09 samples/sec Loss 12.4400 LearningRate 0.0006 Epoch: 2 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:46:01,570-Speed 25211.03 samples/sec Loss 12.4379 LearningRate 0.0006 Epoch: 2 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:46:11,289-Speed 25290.91 samples/sec Loss 12.3433 LearningRate 0.0006 Epoch: 2 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:46:21,151-Speed 24923.95 samples/sec Loss 12.2806 LearningRate 0.0006 Epoch: 2 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:46:30,910-Speed 25187.19 samples/sec Loss 12.3000 LearningRate 0.0006 Epoch: 2 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:46:40,615-Speed 25325.31 samples/sec Loss 12.2398 LearningRate 0.0006 Epoch: 2 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:46:50,560-Speed 24715.79 samples/sec Loss 12.1789 LearningRate 0.0006 Epoch: 2 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:47:00,302-Speed 25235.29 samples/sec Loss 12.1331 LearningRate 0.0006 Epoch: 2 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:47:10,118-Speed 25040.59 samples/sec Loss 12.1272 LearningRate 0.0006 Epoch: 2 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:47:19,898-Speed 25133.54 samples/sec Loss 12.0899 LearningRate 0.0006 Epoch: 2 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:47:29,693-Speed 25094.69 samples/sec Loss 12.0265 LearningRate 0.0006 Epoch: 2 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:47:39,535-Speed 24974.47 samples/sec Loss 12.0830 LearningRate 0.0006 Epoch: 2 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:47:49,328-Speed 25098.36 samples/sec Loss 11.9514 LearningRate 0.0006 Epoch: 2 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:47:59,096-Speed 25162.92 samples/sec Loss 11.9148 LearningRate 0.0006 Epoch: 2 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:48:08,934-Speed 24984.37 samples/sec Loss 11.8832 LearningRate 0.0006 Epoch: 2 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:48:18,646-Speed 25307.81 samples/sec Loss 11.9524 LearningRate 0.0006 Epoch: 2 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:48:28,471-Speed 25016.85 samples/sec Loss 11.9027 LearningRate 0.0006 Epoch: 2 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:48:38,259-Speed 25112.91 samples/sec Loss 11.7897 LearningRate 0.0006 Epoch: 2 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:48:48,109-Speed 24959.77 samples/sec Loss 11.7582 LearningRate 0.0006 Epoch: 2 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:48:57,852-Speed 25229.31 samples/sec Loss 11.7283 LearningRate 0.0006 Epoch: 2 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:49:07,633-Speed 25131.05 samples/sec Loss 11.6573 LearningRate 0.0006 Epoch: 2 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:49:17,375-Speed 25231.12 samples/sec Loss 11.6564 LearningRate 0.0006 Epoch: 2 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:49:27,208-Speed 24996.63 samples/sec Loss 11.6620 LearningRate 0.0006 Epoch: 2 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:49:37,022-Speed 25044.90 samples/sec Loss 11.5483 LearningRate 0.0006 Epoch: 2 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:49:46,783-Speed 25180.21 samples/sec Loss 11.5244 LearningRate 0.0006 Epoch: 2 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:49:56,497-Speed 25303.70 samples/sec Loss 11.5562 LearningRate 0.0006 Epoch: 2 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:50:06,302-Speed 25076.01 samples/sec Loss 11.5065 LearningRate 0.0006 Epoch: 2 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:50:16,100-Speed 25085.51 samples/sec Loss 11.4748 LearningRate 0.0006 Epoch: 2 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:50:25,898-Speed 25086.95 samples/sec Loss 11.4196 LearningRate 0.0006 Epoch: 2 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:50:35,707-Speed 25060.21 samples/sec Loss 11.2997 LearningRate 0.0006 Epoch: 2 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:50:45,503-Speed 25093.93 samples/sec Loss 11.3310 LearningRate 0.0006 Epoch: 2 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:50:55,251-Speed 25216.23 samples/sec Loss 11.3364 LearningRate 0.0006 Epoch: 2 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:51:05,080-Speed 25005.39 samples/sec Loss 11.2949 LearningRate 0.0006 Epoch: 2 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:51:14,932-Speed 24948.79 samples/sec Loss 11.2041 LearningRate 0.0006 Epoch: 2 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:51:24,668-Speed 25252.34 samples/sec Loss 11.2298 LearningRate 0.0006 Epoch: 2 Global Step: 4470 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:51:34,469-Speed 25078.05 samples/sec Loss 11.2525 LearningRate 0.0006 Epoch: 2 Global Step: 4480 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:51:44,352-Speed 24871.25 samples/sec Loss 11.1555 LearningRate 0.0006 Epoch: 2 Global Step: 4490 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:51:54,140-Speed 25113.61 samples/sec Loss 11.1016 LearningRate 0.0007 Epoch: 2 Global Step: 4500 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:52:03,951-Speed 25054.44 samples/sec Loss 11.0671 LearningRate 0.0007 Epoch: 2 Global Step: 4510 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:52:13,792-Speed 24975.34 samples/sec Loss 11.0565 LearningRate 0.0007 Epoch: 2 Global Step: 4520 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:52:23,575-Speed 25124.43 samples/sec Loss 11.0361 LearningRate 0.0007 Epoch: 2 Global Step: 4530 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:52:33,275-Speed 25341.27 samples/sec Loss 11.0075 LearningRate 0.0007 Epoch: 2 Global Step: 4540 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:52:42,960-Speed 25377.26 samples/sec Loss 11.0614 LearningRate 0.0007 Epoch: 2 Global Step: 4550 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:52:52,737-Speed 25139.88 samples/sec Loss 11.0280 LearningRate 0.0007 Epoch: 2 Global Step: 4560 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-25 23:53:02,567-Speed 25005.29 samples/sec Loss 10.9427 LearningRate 0.0007 Epoch: 2 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:53:12,352-Speed 25119.12 samples/sec Loss 10.8824 LearningRate 0.0007 Epoch: 2 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:53:22,106-Speed 25198.72 samples/sec Loss 10.8102 LearningRate 0.0007 Epoch: 2 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:53:31,876-Speed 25161.07 samples/sec Loss 10.8607 LearningRate 0.0007 Epoch: 2 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:53:41,722-Speed 24964.47 samples/sec Loss 10.8664 LearningRate 0.0007 Epoch: 2 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:53:51,532-Speed 25055.69 samples/sec Loss 10.8339 LearningRate 0.0007 Epoch: 2 Global Step: 4620 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:54:01,219-Speed 25371.86 samples/sec Loss 10.6886 LearningRate 0.0007 Epoch: 2 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:54:10,985-Speed 25169.49 samples/sec Loss 10.7700 LearningRate 0.0007 Epoch: 2 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:54:20,850-Speed 24917.16 samples/sec Loss 10.7013 LearningRate 0.0007 Epoch: 2 Global Step: 4650 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:54:30,635-Speed 25118.22 samples/sec Loss 10.6198 LearningRate 0.0007 Epoch: 2 Global Step: 4660 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:54:40,454-Speed 25040.51 samples/sec Loss 10.5787 LearningRate 0.0007 Epoch: 2 Global Step: 4670 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:54:50,191-Speed 25243.10 samples/sec Loss 10.5600 LearningRate 0.0007 Epoch: 2 Global Step: 4680 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:54:59,953-Speed 25181.89 samples/sec Loss 10.5815 LearningRate 0.0007 Epoch: 2 Global Step: 4690 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:55:09,811-Speed 24933.24 samples/sec Loss 10.5900 LearningRate 0.0007 Epoch: 2 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:55:19,663-Speed 24950.51 samples/sec Loss 10.5495 LearningRate 0.0007 Epoch: 2 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:55:29,453-Speed 25108.22 samples/sec Loss 10.4854 LearningRate 0.0007 Epoch: 2 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:55:39,274-Speed 25028.60 samples/sec Loss 10.4569 LearningRate 0.0007 Epoch: 2 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:55:49,103-Speed 25008.11 samples/sec Loss 10.4924 LearningRate 0.0007 Epoch: 2 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:55:58,931-Speed 25009.09 samples/sec Loss 10.4289 LearningRate 0.0007 Epoch: 2 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:56:08,699-Speed 25162.58 samples/sec Loss 10.4140 LearningRate 0.0007 Epoch: 2 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:56:18,449-Speed 25212.08 samples/sec Loss 10.4119 LearningRate 0.0007 Epoch: 2 Global Step: 4770 Fp16 Grad Scale: 262144 Required: 18 hours
Training: 2022-03-25 23:56:28,305-Speed 24941.63 samples/sec Loss 10.3793 LearningRate 0.0007 Epoch: 2 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:56:38,159-Speed 24943.38 samples/sec Loss 10.3522 LearningRate 0.0007 Epoch: 2 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:56:47,896-Speed 25241.57 samples/sec Loss 10.3673 LearningRate 0.0007 Epoch: 2 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:56:57,694-Speed 25087.72 samples/sec Loss 10.2997 LearningRate 0.0007 Epoch: 2 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:57:07,452-Speed 25187.33 samples/sec Loss 10.2656 LearningRate 0.0007 Epoch: 2 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:57:17,353-Speed 24827.43 samples/sec Loss 10.2868 LearningRate 0.0007 Epoch: 2 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:57:27,118-Speed 25177.99 samples/sec Loss 10.1768 LearningRate 0.0007 Epoch: 2 Global Step: 4840 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:57:36,870-Speed 25205.31 samples/sec Loss 10.1621 LearningRate 0.0007 Epoch: 2 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:57:46,737-Speed 24911.77 samples/sec Loss 10.1530 LearningRate 0.0007 Epoch: 2 Global Step: 4860 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:57:56,654-Speed 24784.97 samples/sec Loss 10.1176 LearningRate 0.0007 Epoch: 2 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:58:06,353-Speed 25342.66 samples/sec Loss 10.1218 LearningRate 0.0007 Epoch: 2 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:58:16,162-Speed 25057.98 samples/sec Loss 10.0869 LearningRate 0.0007 Epoch: 2 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:58:25,999-Speed 24985.12 samples/sec Loss 10.0806 LearningRate 0.0007 Epoch: 2 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:58:35,824-Speed 25018.33 samples/sec Loss 10.0787 LearningRate 0.0007 Epoch: 2 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:58:45,650-Speed 25014.37 samples/sec Loss 10.0137 LearningRate 0.0007 Epoch: 2 Global Step: 4920 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:58:55,429-Speed 25134.10 samples/sec Loss 10.0020 LearningRate 0.0007 Epoch: 2 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:59:05,193-Speed 25174.40 samples/sec Loss 9.9455 LearningRate 0.0007 Epoch: 2 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:59:15,012-Speed 25030.26 samples/sec Loss 9.9494 LearningRate 0.0007 Epoch: 2 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:59:24,776-Speed 25174.85 samples/sec Loss 9.9770 LearningRate 0.0007 Epoch: 2 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:59:34,559-Speed 25124.38 samples/sec Loss 9.9730 LearningRate 0.0007 Epoch: 2 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:59:44,439-Speed 24878.23 samples/sec Loss 9.9573 LearningRate 0.0007 Epoch: 2 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-25 23:59:54,296-Speed 24935.87 samples/sec Loss 9.8760 LearningRate 0.0007 Epoch: 2 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:00:04,096-Speed 25081.86 samples/sec Loss 9.8640 LearningRate 0.0007 Epoch: 2 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:00:13,909-Speed 25049.23 samples/sec Loss 9.8768 LearningRate 0.0007 Epoch: 2 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:00:23,719-Speed 25058.36 samples/sec Loss 9.8387 LearningRate 0.0007 Epoch: 2 Global Step: 5020 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:00:33,524-Speed 25068.39 samples/sec Loss 9.7866 LearningRate 0.0007 Epoch: 2 Global Step: 5030 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:00:43,267-Speed 25227.29 samples/sec Loss 9.8399 LearningRate 0.0007 Epoch: 2 Global Step: 5040 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:00:53,114-Speed 24962.07 samples/sec Loss 9.7810 LearningRate 0.0007 Epoch: 2 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:01:02,893-Speed 25134.12 samples/sec Loss 9.7603 LearningRate 0.0007 Epoch: 2 Global Step: 5060 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:01:12,701-Speed 25062.07 samples/sec Loss 9.6907 LearningRate 0.0007 Epoch: 2 Global Step: 5070 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:01:22,496-Speed 25093.12 samples/sec Loss 9.7353 LearningRate 0.0007 Epoch: 2 Global Step: 5080 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:01:32,310-Speed 25050.01 samples/sec Loss 9.7265 LearningRate 0.0007 Epoch: 2 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:01:42,061-Speed 25206.37 samples/sec Loss 9.6970 LearningRate 0.0007 Epoch: 2 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:01:51,810-Speed 25214.50 samples/sec Loss 9.7057 LearningRate 0.0007 Epoch: 2 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:02:01,650-Speed 24977.51 samples/sec Loss 9.6234 LearningRate 0.0007 Epoch: 2 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:02:11,368-Speed 25290.98 samples/sec Loss 9.7377 LearningRate 0.0007 Epoch: 2 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:02:21,061-Speed 25357.24 samples/sec Loss 9.6508 LearningRate 0.0007 Epoch: 2 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:02:30,808-Speed 25220.80 samples/sec Loss 9.6799 LearningRate 0.0007 Epoch: 2 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:02:40,520-Speed 25311.73 samples/sec Loss 9.5638 LearningRate 0.0007 Epoch: 2 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:02:50,378-Speed 24933.31 samples/sec Loss 9.5930 LearningRate 0.0007 Epoch: 2 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:03:00,179-Speed 25078.91 samples/sec Loss 9.6003 LearningRate 0.0007 Epoch: 2 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:03:59,623-Speed 4134.44 samples/sec Loss 9.4701 LearningRate 0.0008 Epoch: 3 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:04:09,330-Speed 25322.28 samples/sec Loss 9.4427 LearningRate 0.0008 Epoch: 3 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:04:19,091-Speed 25181.88 samples/sec Loss 9.4165 LearningRate 0.0008 Epoch: 3 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:04:28,786-Speed 25352.36 samples/sec Loss 9.4297 LearningRate 0.0008 Epoch: 3 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:04:38,520-Speed 25249.93 samples/sec Loss 9.4170 LearningRate 0.0008 Epoch: 3 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:04:48,175-Speed 25459.46 samples/sec Loss 9.4249 LearningRate 0.0008 Epoch: 3 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:04:57,848-Speed 25410.10 samples/sec Loss 9.3451 LearningRate 0.0008 Epoch: 3 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:05:07,582-Speed 25249.68 samples/sec Loss 9.3547 LearningRate 0.0008 Epoch: 3 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:05:17,385-Speed 25073.18 samples/sec Loss 9.3459 LearningRate 0.0008 Epoch: 3 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:05:27,079-Speed 25360.88 samples/sec Loss 9.2893 LearningRate 0.0008 Epoch: 3 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:05:36,822-Speed 25230.88 samples/sec Loss 9.2653 LearningRate 0.0008 Epoch: 3 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:05:46,598-Speed 25147.75 samples/sec Loss 9.2566 LearningRate 0.0008 Epoch: 3 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:05:56,336-Speed 25241.67 samples/sec Loss 9.2670 LearningRate 0.0008 Epoch: 3 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:06:06,055-Speed 25289.47 samples/sec Loss 9.2176 LearningRate 0.0008 Epoch: 3 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:06:15,884-Speed 25006.69 samples/sec Loss 9.1890 LearningRate 0.0008 Epoch: 3 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:06:25,617-Speed 25252.48 samples/sec Loss 9.2091 LearningRate 0.0008 Epoch: 3 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:06:35,440-Speed 25023.01 samples/sec Loss 9.1918 LearningRate 0.0008 Epoch: 3 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:06:45,288-Speed 24958.41 samples/sec Loss 9.1846 LearningRate 0.0008 Epoch: 3 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:06:55,258-Speed 24652.77 samples/sec Loss 9.1476 LearningRate 0.0008 Epoch: 3 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:07:05,070-Speed 25057.50 samples/sec Loss 9.1563 LearningRate 0.0008 Epoch: 3 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:07:14,833-Speed 25183.64 samples/sec Loss 9.1902 LearningRate 0.0008 Epoch: 3 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:07:24,636-Speed 25071.94 samples/sec Loss 9.0847 LearningRate 0.0008 Epoch: 3 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:07:34,359-Speed 25279.70 samples/sec Loss 9.1499 LearningRate 0.0008 Epoch: 3 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:07:43,993-Speed 25513.10 samples/sec Loss 9.1415 LearningRate 0.0008 Epoch: 3 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:07:53,669-Speed 25400.29 samples/sec Loss 9.1243 LearningRate 0.0008 Epoch: 3 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:08:03,445-Speed 25144.23 samples/sec Loss 9.1109 LearningRate 0.0008 Epoch: 3 Global Step: 5440 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:08:13,225-Speed 25130.78 samples/sec Loss 9.1249 LearningRate 0.0008 Epoch: 3 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:08:22,902-Speed 25406.45 samples/sec Loss 9.1001 LearningRate 0.0008 Epoch: 3 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:08:32,610-Speed 25318.35 samples/sec Loss 9.0919 LearningRate 0.0008 Epoch: 3 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:08:42,298-Speed 25371.43 samples/sec Loss 8.9636 LearningRate 0.0008 Epoch: 3 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:08:52,133-Speed 24992.26 samples/sec Loss 8.9695 LearningRate 0.0008 Epoch: 3 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:09:01,850-Speed 25292.55 samples/sec Loss 8.9780 LearningRate 0.0008 Epoch: 3 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:09:11,548-Speed 25346.54 samples/sec Loss 8.9259 LearningRate 0.0008 Epoch: 3 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:09:21,316-Speed 25163.32 samples/sec Loss 8.9477 LearningRate 0.0008 Epoch: 3 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:09:31,056-Speed 25232.77 samples/sec Loss 8.9673 LearningRate 0.0008 Epoch: 3 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:09:40,806-Speed 25209.06 samples/sec Loss 8.9054 LearningRate 0.0008 Epoch: 3 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:09:50,600-Speed 25097.34 samples/sec Loss 8.8796 LearningRate 0.0008 Epoch: 3 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:10:00,318-Speed 25290.97 samples/sec Loss 8.8892 LearningRate 0.0008 Epoch: 3 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:10:10,064-Speed 25221.00 samples/sec Loss 8.8740 LearningRate 0.0008 Epoch: 3 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:10:19,910-Speed 24963.09 samples/sec Loss 8.8371 LearningRate 0.0008 Epoch: 3 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:10:29,696-Speed 25117.61 samples/sec Loss 8.8797 LearningRate 0.0008 Epoch: 3 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:10:39,485-Speed 25110.77 samples/sec Loss 8.8208 LearningRate 0.0008 Epoch: 3 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:10:49,333-Speed 24957.71 samples/sec Loss 8.7724 LearningRate 0.0008 Epoch: 3 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:10:59,026-Speed 25357.31 samples/sec Loss 8.8220 LearningRate 0.0008 Epoch: 3 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:11:08,760-Speed 25252.15 samples/sec Loss 8.7634 LearningRate 0.0008 Epoch: 3 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:11:18,505-Speed 25222.65 samples/sec Loss 8.7438 LearningRate 0.0008 Epoch: 3 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:11:28,134-Speed 25527.61 samples/sec Loss 8.7767 LearningRate 0.0008 Epoch: 3 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:11:37,890-Speed 25191.96 samples/sec Loss 8.8055 LearningRate 0.0008 Epoch: 3 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:11:47,610-Speed 25289.09 samples/sec Loss 8.7738 LearningRate 0.0008 Epoch: 3 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:11:57,371-Speed 25180.09 samples/sec Loss 8.6217 LearningRate 0.0008 Epoch: 3 Global Step: 5680 Fp16 Grad Scale: 262144 Required: 18 hours
Training: 2022-03-26 00:12:07,069-Speed 25344.44 samples/sec Loss 8.6296 LearningRate 0.0008 Epoch: 3 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:12:16,815-Speed 25219.41 samples/sec Loss 8.6877 LearningRate 0.0008 Epoch: 3 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:12:26,484-Speed 25420.79 samples/sec Loss 8.7370 LearningRate 0.0008 Epoch: 3 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:12:36,313-Speed 25008.18 samples/sec Loss 8.6183 LearningRate 0.0008 Epoch: 3 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:12:46,057-Speed 25224.56 samples/sec Loss 8.5889 LearningRate 0.0008 Epoch: 3 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:12:55,850-Speed 25098.72 samples/sec Loss 8.6456 LearningRate 0.0008 Epoch: 3 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:13:05,576-Speed 25270.20 samples/sec Loss 8.6471 LearningRate 0.0008 Epoch: 3 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:13:15,423-Speed 24961.15 samples/sec Loss 8.5966 LearningRate 0.0008 Epoch: 3 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:13:25,148-Speed 25274.57 samples/sec Loss 8.5759 LearningRate 0.0008 Epoch: 3 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:13:34,895-Speed 25215.66 samples/sec Loss 8.5472 LearningRate 0.0008 Epoch: 3 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:13:44,656-Speed 25182.06 samples/sec Loss 8.5257 LearningRate 0.0008 Epoch: 3 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:13:54,473-Speed 25042.20 samples/sec Loss 8.4904 LearningRate 0.0008 Epoch: 3 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:14:04,264-Speed 25103.59 samples/sec Loss 8.5452 LearningRate 0.0008 Epoch: 3 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:14:13,954-Speed 25365.60 samples/sec Loss 8.5394 LearningRate 0.0008 Epoch: 3 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:14:23,690-Speed 25247.59 samples/sec Loss 8.5146 LearningRate 0.0008 Epoch: 3 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:14:33,554-Speed 24917.01 samples/sec Loss 8.4958 LearningRate 0.0008 Epoch: 3 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:14:43,362-Speed 25061.99 samples/sec Loss 8.4932 LearningRate 0.0008 Epoch: 3 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:14:53,127-Speed 25176.04 samples/sec Loss 8.4356 LearningRate 0.0008 Epoch: 3 Global Step: 5860 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:15:02,892-Speed 25170.01 samples/sec Loss 8.4244 LearningRate 0.0008 Epoch: 3 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:15:12,778-Speed 24863.24 samples/sec Loss 8.3853 LearningRate 0.0009 Epoch: 3 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:15:22,530-Speed 25204.48 samples/sec Loss 8.3750 LearningRate 0.0009 Epoch: 3 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:15:32,286-Speed 25194.25 samples/sec Loss 8.3303 LearningRate 0.0009 Epoch: 3 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:15:42,143-Speed 24932.52 samples/sec Loss 8.3864 LearningRate 0.0009 Epoch: 3 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:15:52,057-Speed 24793.18 samples/sec Loss 8.3414 LearningRate 0.0009 Epoch: 3 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:16:01,916-Speed 24929.77 samples/sec Loss 8.4041 LearningRate 0.0009 Epoch: 3 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:16:11,687-Speed 25155.50 samples/sec Loss 8.3395 LearningRate 0.0009 Epoch: 3 Global Step: 5940 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:16:21,419-Speed 25253.53 samples/sec Loss 8.3195 LearningRate 0.0009 Epoch: 3 Global Step: 5950 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:16:31,230-Speed 25052.34 samples/sec Loss 8.3377 LearningRate 0.0009 Epoch: 3 Global Step: 5960 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:16:40,972-Speed 25230.44 samples/sec Loss 8.3686 LearningRate 0.0009 Epoch: 3 Global Step: 5970 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:16:50,721-Speed 25212.55 samples/sec Loss 8.3669 LearningRate 0.0009 Epoch: 3 Global Step: 5980 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:17:00,519-Speed 25090.97 samples/sec Loss 8.3554 LearningRate 0.0009 Epoch: 3 Global Step: 5990 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:17:10,335-Speed 25040.95 samples/sec Loss 8.2894 LearningRate 0.0009 Epoch: 3 Global Step: 6000 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:17:20,109-Speed 25147.01 samples/sec Loss 8.2254 LearningRate 0.0009 Epoch: 3 Global Step: 6010 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:17:29,819-Speed 25313.54 samples/sec Loss 8.2284 LearningRate 0.0009 Epoch: 3 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:17:39,574-Speed 25197.02 samples/sec Loss 8.1942 LearningRate 0.0009 Epoch: 3 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:17:49,402-Speed 25009.50 samples/sec Loss 8.2376 LearningRate 0.0009 Epoch: 3 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:17:59,149-Speed 25216.90 samples/sec Loss 8.2390 LearningRate 0.0009 Epoch: 3 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:18:09,007-Speed 24933.33 samples/sec Loss 8.2134 LearningRate 0.0009 Epoch: 3 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:18:18,770-Speed 25176.86 samples/sec Loss 8.1617 LearningRate 0.0009 Epoch: 3 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:18:28,554-Speed 25121.06 samples/sec Loss 8.2070 LearningRate 0.0009 Epoch: 3 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:18:38,378-Speed 25018.68 samples/sec Loss 8.2319 LearningRate 0.0009 Epoch: 3 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:18:48,197-Speed 25032.15 samples/sec Loss 8.1441 LearningRate 0.0009 Epoch: 3 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:18:57,983-Speed 25114.55 samples/sec Loss 8.1726 LearningRate 0.0009 Epoch: 3 Global Step: 6110 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:19:07,903-Speed 24778.72 samples/sec Loss 8.1756 LearningRate 0.0009 Epoch: 3 Global Step: 6120 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:19:17,787-Speed 24871.80 samples/sec Loss 8.1066 LearningRate 0.0009 Epoch: 3 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:19:27,544-Speed 25190.26 samples/sec Loss 8.0820 LearningRate 0.0009 Epoch: 3 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:19:37,423-Speed 24880.18 samples/sec Loss 8.1431 LearningRate 0.0009 Epoch: 3 Global Step: 6150 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:19:47,197-Speed 25145.45 samples/sec Loss 8.1249 LearningRate 0.0009 Epoch: 3 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:19:57,196-Speed 24584.11 samples/sec Loss 8.1091 LearningRate 0.0009 Epoch: 3 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:20:07,098-Speed 24821.25 samples/sec Loss 7.9841 LearningRate 0.0009 Epoch: 3 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:20:16,889-Speed 25112.39 samples/sec Loss 7.9398 LearningRate 0.0009 Epoch: 3 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:20:26,701-Speed 25049.36 samples/sec Loss 8.0519 LearningRate 0.0009 Epoch: 3 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:20:36,476-Speed 25145.17 samples/sec Loss 8.0810 LearningRate 0.0009 Epoch: 3 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:20:46,225-Speed 25210.99 samples/sec Loss 8.0225 LearningRate 0.0009 Epoch: 3 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:20:56,108-Speed 24876.74 samples/sec Loss 8.0601 LearningRate 0.0009 Epoch: 3 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:21:06,356-Speed 23983.17 samples/sec Loss 8.0344 LearningRate 0.0009 Epoch: 3 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:21:16,341-Speed 24615.50 samples/sec Loss 7.9839 LearningRate 0.0009 Epoch: 3 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:21:26,420-Speed 24386.81 samples/sec Loss 7.9602 LearningRate 0.0009 Epoch: 3 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:21:36,540-Speed 24289.46 samples/sec Loss 7.9311 LearningRate 0.0009 Epoch: 3 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:21:46,612-Speed 24412.34 samples/sec Loss 7.9621 LearningRate 0.0009 Epoch: 3 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:21:56,690-Speed 24389.53 samples/sec Loss 8.0333 LearningRate 0.0009 Epoch: 3 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:22:06,812-Speed 24282.23 samples/sec Loss 7.8987 LearningRate 0.0009 Epoch: 3 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:22:16,817-Speed 24566.48 samples/sec Loss 7.9394 LearningRate 0.0009 Epoch: 3 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:22:27,113-Speed 23873.23 samples/sec Loss 7.9426 LearningRate 0.0009 Epoch: 3 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:22:37,257-Speed 24229.24 samples/sec Loss 7.9326 LearningRate 0.0009 Epoch: 3 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:22:47,303-Speed 24466.20 samples/sec Loss 7.9177 LearningRate 0.0009 Epoch: 3 Global Step: 6340 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:22:57,283-Speed 24628.79 samples/sec Loss 7.8422 LearningRate 0.0009 Epoch: 3 Global Step: 6350 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:23:07,345-Speed 24428.02 samples/sec Loss 7.8939 LearningRate 0.0009 Epoch: 3 Global Step: 6360 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:23:17,382-Speed 24489.82 samples/sec Loss 7.8812 LearningRate 0.0009 Epoch: 3 Global Step: 6370 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:23:27,390-Speed 24560.68 samples/sec Loss 7.7782 LearningRate 0.0009 Epoch: 3 Global Step: 6380 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:23:37,332-Speed 24721.38 samples/sec Loss 7.8224 LearningRate 0.0009 Epoch: 3 Global Step: 6390 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:23:47,414-Speed 24381.34 samples/sec Loss 7.8082 LearningRate 0.0009 Epoch: 3 Global Step: 6400 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:23:57,609-Speed 24109.10 samples/sec Loss 7.7722 LearningRate 0.0009 Epoch: 3 Global Step: 6410 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:24:07,675-Speed 24418.09 samples/sec Loss 7.7975 LearningRate 0.0009 Epoch: 3 Global Step: 6420 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:24:17,595-Speed 24775.20 samples/sec Loss 7.7403 LearningRate 0.0009 Epoch: 3 Global Step: 6430 Fp16 Grad Scale: 65536 Required: 18 hours
Training: 2022-03-26 00:24:27,614-Speed 24533.08 samples/sec Loss 7.7146 LearningRate 0.0009 Epoch: 3 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-03-26 00:24:37,612-Speed 24583.15 samples/sec Loss 7.7962 LearningRate 0.0009 Epoch: 3 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-26 00:24:47,688-Speed 24392.39 samples/sec Loss 7.7953 LearningRate 0.0009 Epoch: 3 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-26 00:24:57,756-Speed 24413.54 samples/sec Loss 7.7941 LearningRate 0.0009 Epoch: 3 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-26 00:25:07,792-Speed 24490.69 samples/sec Loss 7.7030 LearningRate 0.0009 Epoch: 3 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-26 00:25:17,895-Speed 24330.83 samples/sec Loss 7.7171 LearningRate 0.0009 Epoch: 3 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-26 00:25:27,893-Speed 24582.19 samples/sec Loss 7.7340 LearningRate 0.0009 Epoch: 3 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-26 00:25:37,926-Speed 24498.67 samples/sec Loss
7.6852 LearningRate 0.0009 Epoch: 3 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:25:48,012-Speed 24368.77 samples/sec Loss 7.7294 LearningRate 0.0009 Epoch: 3 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:25:58,037-Speed 24518.46 samples/sec Loss 7.6826 LearningRate 0.0009 Epoch: 3 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:26:08,195-Speed 24196.59 samples/sec Loss 7.7042 LearningRate 0.0009 Epoch: 3 Global Step: 6540 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-03-26 00:26:18,226-Speed 24502.38 samples/sec Loss 7.6462 LearningRate 0.0009 Epoch: 3 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:26:28,173-Speed 24710.91 samples/sec Loss 7.6830 LearningRate 0.0009 Epoch: 3 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:26:37,959-Speed 25116.16 samples/sec Loss 7.6741 LearningRate 0.0010 Epoch: 3 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:26:47,767-Speed 25059.93 samples/sec Loss 7.6447 LearningRate 0.0010 Epoch: 3 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:26:57,484-Speed 25293.30 samples/sec Loss 7.6585 LearningRate 0.0010 Epoch: 3 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:27:07,377-Speed 24845.76 samples/sec Loss 7.6434 LearningRate 0.0010 Epoch: 3 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:27:17,217-Speed 24979.21 samples/sec Loss 7.6320 LearningRate 0.0010 Epoch: 3 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:27:27,052-Speed 24993.21 samples/sec Loss 7.6203 LearningRate 0.0010 Epoch: 3 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:27:36,867-Speed 25040.50 samples/sec Loss 7.5565 LearningRate 0.0010 Epoch: 3 Global 
Step: 6630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:27:46,696-Speed 25008.15 samples/sec Loss 7.6590 LearningRate 0.0010 Epoch: 3 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:27:56,485-Speed 25108.67 samples/sec Loss 7.6426 LearningRate 0.0010 Epoch: 3 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:28:06,338-Speed 24946.67 samples/sec Loss 7.5881 LearningRate 0.0010 Epoch: 3 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:28:16,303-Speed 24665.76 samples/sec Loss 7.5379 LearningRate 0.0010 Epoch: 3 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:28:26,037-Speed 25249.66 samples/sec Loss 7.5526 LearningRate 0.0010 Epoch: 3 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:28:35,919-Speed 24873.60 samples/sec Loss 7.5407 LearningRate 0.0010 Epoch: 3 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:28:45,676-Speed 25190.25 samples/sec Loss 7.5633 LearningRate 0.0010 Epoch: 3 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:28:55,337-Speed 25440.25 samples/sec Loss 7.5421 LearningRate 0.0010 Epoch: 3 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:29:05,106-Speed 25160.23 samples/sec Loss 7.4941 LearningRate 0.0010 Epoch: 3 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:29:14,882-Speed 25143.61 samples/sec Loss 7.4862 LearningRate 0.0010 Epoch: 3 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:29:24,768-Speed 24863.46 samples/sec Loss 7.5958 LearningRate 0.0010 Epoch: 3 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:29:34,634-Speed 24918.15 samples/sec Loss 7.4902 LearningRate 0.0010 Epoch: 3 Global Step: 6750 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-03-26 00:29:44,536-Speed 24820.67 samples/sec Loss 7.4514 LearningRate 0.0010 Epoch: 3 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:29:54,440-Speed 24817.93 samples/sec Loss 7.4421 LearningRate 0.0010 Epoch: 3 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:30:04,258-Speed 25034.29 samples/sec Loss 7.4929 LearningRate 0.0010 Epoch: 3 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:30:14,132-Speed 24893.92 samples/sec Loss 7.3748 LearningRate 0.0010 Epoch: 3 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:30:24,056-Speed 24767.21 samples/sec Loss 7.4649 LearningRate 0.0010 Epoch: 3 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:30:33,874-Speed 25035.99 samples/sec Loss 7.3780 LearningRate 0.0010 Epoch: 3 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:30:43,917-Speed 24474.72 samples/sec Loss 7.4102 LearningRate 0.0010 Epoch: 3 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:30:53,933-Speed 24538.75 samples/sec Loss 7.3705 LearningRate 0.0010 Epoch: 3 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:31:03,806-Speed 24895.82 samples/sec Loss 7.3566 LearningRate 0.0010 Epoch: 3 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:31:13,590-Speed 25121.83 samples/sec Loss 7.3351 LearningRate 0.0010 Epoch: 3 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:31:23,373-Speed 25124.41 samples/sec Loss 7.4099 LearningRate 0.0010 Epoch: 3 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:31:33,309-Speed 24734.85 samples/sec Loss 7.4066 LearningRate 0.0010 Epoch: 3 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 
00:31:43,174-Speed 24917.49 samples/sec Loss 7.4310 LearningRate 0.0010 Epoch: 3 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:31:52,994-Speed 25036.63 samples/sec Loss 7.3806 LearningRate 0.0010 Epoch: 3 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:32:02,935-Speed 24724.54 samples/sec Loss 7.3986 LearningRate 0.0010 Epoch: 3 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:32:12,860-Speed 24764.29 samples/sec Loss 7.4012 LearningRate 0.0010 Epoch: 3 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:33:13,442-Speed 4056.76 samples/sec Loss 7.3120 LearningRate 0.0010 Epoch: 4 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:33:23,238-Speed 25092.98 samples/sec Loss 7.2355 LearningRate 0.0010 Epoch: 4 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:33:33,012-Speed 25148.24 samples/sec Loss 7.2313 LearningRate 0.0010 Epoch: 4 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:33:42,883-Speed 24899.68 samples/sec Loss 7.2063 LearningRate 0.0010 Epoch: 4 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:33:52,723-Speed 24984.04 samples/sec Loss 7.2832 LearningRate 0.0010 Epoch: 4 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:34:02,424-Speed 25335.26 samples/sec Loss 7.2097 LearningRate 0.0010 Epoch: 4 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:34:12,334-Speed 24803.02 samples/sec Loss 7.2662 LearningRate 0.0010 Epoch: 4 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:34:22,272-Speed 24731.73 samples/sec Loss 7.1920 LearningRate 0.0010 Epoch: 4 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:34:32,064-Speed 25101.05 samples/sec Loss 
7.2773 LearningRate 0.0010 Epoch: 4 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:34:41,759-Speed 25350.28 samples/sec Loss 7.1750 LearningRate 0.0010 Epoch: 4 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:34:51,676-Speed 24787.00 samples/sec Loss 7.2162 LearningRate 0.0010 Epoch: 4 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:35:01,449-Speed 25148.21 samples/sec Loss 7.1763 LearningRate 0.0010 Epoch: 4 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:35:11,270-Speed 25027.46 samples/sec Loss 7.1563 LearningRate 0.0010 Epoch: 4 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:35:21,117-Speed 24960.25 samples/sec Loss 7.0961 LearningRate 0.0010 Epoch: 4 Global Step: 7050 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-03-26 00:35:30,907-Speed 25105.62 samples/sec Loss 7.1668 LearningRate 0.0010 Epoch: 4 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:35:40,790-Speed 24868.34 samples/sec Loss 7.1404 LearningRate 0.0010 Epoch: 4 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:35:50,575-Speed 25119.82 samples/sec Loss 7.1577 LearningRate 0.0010 Epoch: 4 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:36:00,459-Speed 24868.66 samples/sec Loss 7.1018 LearningRate 0.0010 Epoch: 4 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:36:10,244-Speed 25118.32 samples/sec Loss 7.1295 LearningRate 0.0010 Epoch: 4 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:36:19,995-Speed 25206.64 samples/sec Loss 7.0792 LearningRate 0.0010 Epoch: 4 Global Step: 7110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:36:29,910-Speed 24791.88 samples/sec Loss 7.0970 LearningRate 0.0010 Epoch: 4 Global 
Step: 7120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:36:39,669-Speed 25184.28 samples/sec Loss 7.0911 LearningRate 0.0010 Epoch: 4 Global Step: 7130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:36:49,428-Speed 25186.38 samples/sec Loss 7.0420 LearningRate 0.0010 Epoch: 4 Global Step: 7140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:36:59,259-Speed 25001.78 samples/sec Loss 7.0417 LearningRate 0.0010 Epoch: 4 Global Step: 7150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:37:09,000-Speed 25232.50 samples/sec Loss 7.0684 LearningRate 0.0010 Epoch: 4 Global Step: 7160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:37:18,767-Speed 25164.48 samples/sec Loss 7.0660 LearningRate 0.0010 Epoch: 4 Global Step: 7170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:37:28,517-Speed 25208.99 samples/sec Loss 7.0251 LearningRate 0.0010 Epoch: 4 Global Step: 7180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:37:38,426-Speed 24804.74 samples/sec Loss 7.0165 LearningRate 0.0010 Epoch: 4 Global Step: 7190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:37:48,191-Speed 25172.18 samples/sec Loss 6.9950 LearningRate 0.0010 Epoch: 4 Global Step: 7200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:37:57,952-Speed 25179.39 samples/sec Loss 6.9845 LearningRate 0.0010 Epoch: 4 Global Step: 7210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:38:07,878-Speed 24764.69 samples/sec Loss 7.0303 LearningRate 0.0010 Epoch: 4 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:38:17,680-Speed 25075.88 samples/sec Loss 6.9704 LearningRate 0.0010 Epoch: 4 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:38:27,402-Speed 25280.90 samples/sec Loss 6.9641 LearningRate 0.0010 Epoch: 4 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 17 
hours Training: 2022-03-26 00:38:37,183-Speed 25128.71 samples/sec Loss 6.9684 LearningRate 0.0010 Epoch: 4 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:38:46,862-Speed 25394.15 samples/sec Loss 6.9266 LearningRate 0.0010 Epoch: 4 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:38:56,651-Speed 25110.98 samples/sec Loss 6.9443 LearningRate 0.0010 Epoch: 4 Global Step: 7270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:39:06,465-Speed 25044.62 samples/sec Loss 6.9226 LearningRate 0.0010 Epoch: 4 Global Step: 7280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:39:16,338-Speed 24893.78 samples/sec Loss 6.9126 LearningRate 0.0010 Epoch: 4 Global Step: 7290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:39:26,190-Speed 24950.89 samples/sec Loss 6.8732 LearningRate 0.0010 Epoch: 4 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:39:35,963-Speed 25149.04 samples/sec Loss 6.9062 LearningRate 0.0010 Epoch: 4 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:39:45,813-Speed 24954.48 samples/sec Loss 6.8869 LearningRate 0.0010 Epoch: 4 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:39:55,536-Speed 25288.26 samples/sec Loss 6.8599 LearningRate 0.0010 Epoch: 4 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:40:05,448-Speed 24800.87 samples/sec Loss 6.8219 LearningRate 0.0010 Epoch: 4 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:40:15,201-Speed 25200.67 samples/sec Loss 6.8643 LearningRate 0.0010 Epoch: 4 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:40:24,965-Speed 25174.44 samples/sec Loss 6.8795 LearningRate 0.0010 Epoch: 4 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 
00:40:34,834-Speed 24906.00 samples/sec Loss 6.8164 LearningRate 0.0010 Epoch: 4 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:40:44,679-Speed 24965.69 samples/sec Loss 6.8160 LearningRate 0.0010 Epoch: 4 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:40:54,625-Speed 24713.68 samples/sec Loss 6.8320 LearningRate 0.0010 Epoch: 4 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:41:04,409-Speed 25121.99 samples/sec Loss 6.8038 LearningRate 0.0010 Epoch: 4 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:41:14,162-Speed 25201.41 samples/sec Loss 6.7783 LearningRate 0.0010 Epoch: 4 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:41:23,951-Speed 25109.55 samples/sec Loss 6.7702 LearningRate 0.0010 Epoch: 4 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:41:33,694-Speed 25226.67 samples/sec Loss 6.7572 LearningRate 0.0010 Epoch: 4 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:41:43,446-Speed 25202.81 samples/sec Loss 6.8119 LearningRate 0.0010 Epoch: 4 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:41:53,230-Speed 25122.74 samples/sec Loss 6.7465 LearningRate 0.0010 Epoch: 4 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:42:03,007-Speed 25138.89 samples/sec Loss 6.7089 LearningRate 0.0010 Epoch: 4 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:42:12,769-Speed 25178.52 samples/sec Loss 6.7120 LearningRate 0.0010 Epoch: 4 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:42:22,589-Speed 25029.56 samples/sec Loss 6.7161 LearningRate 0.0010 Epoch: 4 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:42:32,561-Speed 24648.26 samples/sec 
Loss 6.7144 LearningRate 0.0010 Epoch: 4 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:42:42,411-Speed 24956.95 samples/sec Loss 6.6788 LearningRate 0.0010 Epoch: 4 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:42:52,169-Speed 25186.90 samples/sec Loss 6.6832 LearningRate 0.0010 Epoch: 4 Global Step: 7510 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-03-26 00:43:01,942-Speed 25149.93 samples/sec Loss 6.6942 LearningRate 0.0010 Epoch: 4 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:43:11,739-Speed 25088.36 samples/sec Loss 6.7026 LearningRate 0.0010 Epoch: 4 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:43:21,475-Speed 25245.51 samples/sec Loss 6.6500 LearningRate 0.0010 Epoch: 4 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:43:31,307-Speed 25000.41 samples/sec Loss 6.6177 LearningRate 0.0010 Epoch: 4 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:43:40,999-Speed 25358.90 samples/sec Loss 6.6011 LearningRate 0.0010 Epoch: 4 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:43:50,806-Speed 25065.61 samples/sec Loss 6.5918 LearningRate 0.0010 Epoch: 4 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:44:00,581-Speed 25144.75 samples/sec Loss 6.5792 LearningRate 0.0010 Epoch: 4 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:44:10,320-Speed 25240.05 samples/sec Loss 6.5603 LearningRate 0.0010 Epoch: 4 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:44:20,148-Speed 25009.33 samples/sec Loss 6.5855 LearningRate 0.0010 Epoch: 4 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:44:29,887-Speed 25235.62 samples/sec Loss 6.5605 LearningRate 0.0010 Epoch: 4 
Global Step: 7610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:44:39,588-Speed 25337.60 samples/sec Loss 6.5597 LearningRate 0.0010 Epoch: 4 Global Step: 7620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:44:49,332-Speed 25227.13 samples/sec Loss 6.5594 LearningRate 0.0010 Epoch: 4 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:44:59,255-Speed 24767.62 samples/sec Loss 6.5055 LearningRate 0.0010 Epoch: 4 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:45:09,110-Speed 24940.18 samples/sec Loss 6.5126 LearningRate 0.0010 Epoch: 4 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:45:18,887-Speed 25139.64 samples/sec Loss 6.4882 LearningRate 0.0010 Epoch: 4 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:45:28,690-Speed 25071.92 samples/sec Loss 6.5267 LearningRate 0.0010 Epoch: 4 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:45:38,510-Speed 25030.03 samples/sec Loss 6.5036 LearningRate 0.0010 Epoch: 4 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:45:48,298-Speed 25114.06 samples/sec Loss 6.4935 LearningRate 0.0010 Epoch: 4 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:45:58,116-Speed 25035.11 samples/sec Loss 6.5155 LearningRate 0.0010 Epoch: 4 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:46:08,010-Speed 24841.16 samples/sec Loss 6.4691 LearningRate 0.0010 Epoch: 4 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:46:17,879-Speed 24907.08 samples/sec Loss 6.5049 LearningRate 0.0010 Epoch: 4 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:46:27,734-Speed 24939.82 samples/sec Loss 6.4672 LearningRate 0.0010 Epoch: 4 Global Step: 7730 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-03-26 00:46:37,525-Speed 25103.28 samples/sec Loss 6.4665 LearningRate 0.0010 Epoch: 4 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:46:47,286-Speed 25196.27 samples/sec Loss 6.4196 LearningRate 0.0010 Epoch: 4 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:46:57,080-Speed 25101.16 samples/sec Loss 6.3943 LearningRate 0.0010 Epoch: 4 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:47:06,828-Speed 25212.10 samples/sec Loss 6.4047 LearningRate 0.0010 Epoch: 4 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:47:16,621-Speed 25100.17 samples/sec Loss 6.3908 LearningRate 0.0010 Epoch: 4 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:47:26,390-Speed 25161.63 samples/sec Loss 6.3924 LearningRate 0.0010 Epoch: 4 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:47:36,171-Speed 25128.48 samples/sec Loss 6.3877 LearningRate 0.0010 Epoch: 4 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:47:45,939-Speed 25165.34 samples/sec Loss 6.3467 LearningRate 0.0010 Epoch: 4 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:47:55,855-Speed 24786.69 samples/sec Loss 6.3343 LearningRate 0.0010 Epoch: 4 Global Step: 7820 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-03-26 00:48:05,660-Speed 25069.68 samples/sec Loss 6.3343 LearningRate 0.0010 Epoch: 4 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:48:15,654-Speed 24595.86 samples/sec Loss 6.3095 LearningRate 0.0010 Epoch: 4 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:48:25,471-Speed 25038.37 samples/sec Loss 6.3493 LearningRate 0.0010 Epoch: 4 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 
00:48:35,363-Speed 24846.99 samples/sec Loss 6.3432 LearningRate 0.0010 Epoch: 4 Global Step: 7860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:48:45,297-Speed 24742.73 samples/sec Loss 6.2847 LearningRate 0.0010 Epoch: 4 Global Step: 7870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:48:55,309-Speed 24550.60 samples/sec Loss 6.2820 LearningRate 0.0010 Epoch: 4 Global Step: 7880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:49:05,265-Speed 24690.41 samples/sec Loss 6.2897 LearningRate 0.0010 Epoch: 4 Global Step: 7890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:49:15,089-Speed 25019.25 samples/sec Loss 6.2793 LearningRate 0.0010 Epoch: 4 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:49:24,976-Speed 24857.00 samples/sec Loss 6.2774 LearningRate 0.0010 Epoch: 4 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:49:34,888-Speed 24799.17 samples/sec Loss 6.2965 LearningRate 0.0010 Epoch: 4 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:49:44,769-Speed 24875.19 samples/sec Loss 6.2792 LearningRate 0.0010 Epoch: 4 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:49:54,964-Speed 24108.19 samples/sec Loss 6.2517 LearningRate 0.0010 Epoch: 4 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:50:05,109-Speed 24227.45 samples/sec Loss 6.2311 LearningRate 0.0010 Epoch: 4 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:50:14,977-Speed 24908.50 samples/sec Loss 6.2340 LearningRate 0.0010 Epoch: 4 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:50:24,841-Speed 24919.23 samples/sec Loss 6.1852 LearningRate 0.0010 Epoch: 4 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:50:34,751-Speed 24801.79 samples/sec 
Loss 6.1886 LearningRate 0.0010 Epoch: 4 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:50:44,624-Speed 24893.90 samples/sec Loss 6.1971 LearningRate 0.0010 Epoch: 4 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:50:54,470-Speed 24964.55 samples/sec Loss 6.2642 LearningRate 0.0010 Epoch: 4 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:51:04,292-Speed 25026.09 samples/sec Loss 6.1767 LearningRate 0.0010 Epoch: 4 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:51:14,211-Speed 24778.37 samples/sec Loss 6.1355 LearningRate 0.0010 Epoch: 4 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:51:24,099-Speed 24857.83 samples/sec Loss 6.1776 LearningRate 0.0010 Epoch: 4 Global Step: 8030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:51:34,003-Speed 24817.98 samples/sec Loss 6.1645 LearningRate 0.0010 Epoch: 4 Global Step: 8040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:51:43,930-Speed 24761.19 samples/sec Loss 6.1438 LearningRate 0.0010 Epoch: 4 Global Step: 8050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:51:53,788-Speed 24932.35 samples/sec Loss 6.1255 LearningRate 0.0010 Epoch: 4 Global Step: 8060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:52:03,716-Speed 24756.53 samples/sec Loss 6.1375 LearningRate 0.0010 Epoch: 4 Global Step: 8070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:52:13,576-Speed 24927.99 samples/sec Loss 6.1374 LearningRate 0.0010 Epoch: 4 Global Step: 8080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:52:23,401-Speed 25023.49 samples/sec Loss 6.1222 LearningRate 0.0010 Epoch: 4 Global Step: 8090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:52:33,386-Speed 24616.65 samples/sec Loss 6.0808 LearningRate 0.0010 Epoch: 4 Global 
Step: 8100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:52:43,286-Speed 24827.71 samples/sec Loss 6.1225 LearningRate 0.0010 Epoch: 4 Global Step: 8110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:52:53,175-Speed 24854.17 samples/sec Loss 6.0692 LearningRate 0.0010 Epoch: 4 Global Step: 8120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 00:53:03,023-Speed 24959.91 samples/sec Loss 6.0600 LearningRate 0.0010 Epoch: 4 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:53:13,128-Speed 24324.10 samples/sec Loss 6.0875 LearningRate 0.0010 Epoch: 4 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:53:23,029-Speed 24822.74 samples/sec Loss 6.0331 LearningRate 0.0010 Epoch: 4 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:53:32,933-Speed 24818.97 samples/sec Loss 6.0501 LearningRate 0.0010 Epoch: 4 Global Step: 8160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:53:42,852-Speed 24779.55 samples/sec Loss 6.0627 LearningRate 0.0010 Epoch: 4 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:53:52,650-Speed 25083.99 samples/sec Loss 6.0554 LearningRate 0.0010 Epoch: 4 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:54:02,632-Speed 24623.47 samples/sec Loss 6.0250 LearningRate 0.0010 Epoch: 4 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:54:12,415-Speed 25125.29 samples/sec Loss 6.0207 LearningRate 0.0010 Epoch: 4 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:54:22,376-Speed 24674.77 samples/sec Loss 6.0172 LearningRate 0.0010 Epoch: 4 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:54:32,243-Speed 24913.42 samples/sec Loss 6.0350 LearningRate 0.0010 Epoch: 4 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 
17 hours Training: 2022-03-26 00:54:42,269-Speed 24513.77 samples/sec Loss 5.9769 LearningRate 0.0010 Epoch: 4 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:54:52,140-Speed 24900.47 samples/sec Loss 5.9299 LearningRate 0.0010 Epoch: 4 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:55:01,974-Speed 24995.76 samples/sec Loss 5.9888 LearningRate 0.0010 Epoch: 4 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:55:11,813-Speed 24985.69 samples/sec Loss 5.9866 LearningRate 0.0010 Epoch: 4 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:55:21,718-Speed 24815.43 samples/sec Loss 5.9250 LearningRate 0.0010 Epoch: 4 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:55:31,636-Speed 24782.82 samples/sec Loss 5.9610 LearningRate 0.0010 Epoch: 4 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:55:41,443-Speed 25061.71 samples/sec Loss 5.9377 LearningRate 0.0010 Epoch: 4 Global Step: 8290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:55:51,237-Speed 25095.92 samples/sec Loss 5.9618 LearningRate 0.0010 Epoch: 4 Global Step: 8300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:56:01,242-Speed 24565.77 samples/sec Loss 5.9636 LearningRate 0.0010 Epoch: 4 Global Step: 8310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:56:11,066-Speed 25020.01 samples/sec Loss 5.9302 LearningRate 0.0010 Epoch: 4 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:56:20,883-Speed 25039.21 samples/sec Loss 5.9335 LearningRate 0.0010 Epoch: 4 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:56:30,772-Speed 24854.31 samples/sec Loss 5.8877 LearningRate 0.0010 Epoch: 4 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 
00:56:40,586-Speed 25043.62 samples/sec Loss 5.9028 LearningRate 0.0010 Epoch: 4 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:56:50,415-Speed 25008.99 samples/sec Loss 5.9152 LearningRate 0.0010 Epoch: 4 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:57:00,340-Speed 24763.54 samples/sec Loss 5.8824 LearningRate 0.0010 Epoch: 4 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:57:10,299-Speed 24682.13 samples/sec Loss 5.8274 LearningRate 0.0010 Epoch: 4 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:57:20,408-Speed 24318.89 samples/sec Loss 5.8898 LearningRate 0.0010 Epoch: 4 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:57:30,370-Speed 24674.75 samples/sec Loss 5.8577 LearningRate 0.0010 Epoch: 4 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:57:40,351-Speed 24623.98 samples/sec Loss 5.8445 LearningRate 0.0010 Epoch: 4 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:57:50,325-Speed 24643.27 samples/sec Loss 5.8427 LearningRate 0.0010 Epoch: 4 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:58:00,177-Speed 24950.09 samples/sec Loss 5.8272 LearningRate 0.0010 Epoch: 4 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:58:10,092-Speed 24787.69 samples/sec Loss 5.8205 LearningRate 0.0010 Epoch: 4 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:58:19,923-Speed 25004.02 samples/sec Loss 5.8164 LearningRate 0.0010 Epoch: 4 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:58:29,792-Speed 24903.38 samples/sec Loss 5.8369 LearningRate 0.0010 Epoch: 4 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:58:39,751-Speed 24679.93 samples/sec 
Loss 5.8248 LearningRate 0.0010 Epoch: 4 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:58:49,651-Speed 24827.36 samples/sec Loss 5.7754 LearningRate 0.0010 Epoch: 4 Global Step: 8480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:58:59,547-Speed 24835.21 samples/sec Loss 5.7766 LearningRate 0.0009 Epoch: 4 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:59:09,396-Speed 24956.56 samples/sec Loss 5.7966 LearningRate 0.0009 Epoch: 4 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:59:19,225-Speed 25006.18 samples/sec Loss 5.8486 LearningRate 0.0009 Epoch: 4 Global Step: 8510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:59:29,062-Speed 24985.32 samples/sec Loss 5.7734 LearningRate 0.0009 Epoch: 4 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:59:38,969-Speed 24809.09 samples/sec Loss 5.7463 LearningRate 0.0009 Epoch: 4 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:59:48,670-Speed 25336.39 samples/sec Loss 5.7454 LearningRate 0.0009 Epoch: 4 Global Step: 8540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 00:59:58,423-Speed 25200.05 samples/sec Loss 5.7176 LearningRate 0.0009 Epoch: 4 Global Step: 8550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:00:08,131-Speed 25318.19 samples/sec Loss 5.7334 LearningRate 0.0009 Epoch: 4 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:00:17,971-Speed 24978.16 samples/sec Loss 5.7634 LearningRate 0.0009 Epoch: 4 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:00:27,690-Speed 25289.73 samples/sec Loss 5.7660 LearningRate 0.0009 Epoch: 4 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:00:37,492-Speed 25075.29 samples/sec Loss 5.7667 LearningRate 0.0009 Epoch: 4 
Global Step: 8590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:00:47,238-Speed 25217.85 samples/sec Loss 5.7512 LearningRate 0.0009 Epoch: 4 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:00:56,910-Speed 25411.08 samples/sec Loss 5.7362 LearningRate 0.0009 Epoch: 4 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:01:06,688-Speed 25137.25 samples/sec Loss 5.7451 LearningRate 0.0009 Epoch: 4 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:01:16,376-Speed 25370.97 samples/sec Loss 5.7614 LearningRate 0.0009 Epoch: 4 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:01:26,107-Speed 25257.12 samples/sec Loss 5.7384 LearningRate 0.0009 Epoch: 4 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:02:26,210-Speed 4089.12 samples/sec Loss 5.6603 LearningRate 0.0009 Epoch: 5 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:02:35,990-Speed 25131.91 samples/sec Loss 5.5986 LearningRate 0.0009 Epoch: 5 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:02:45,722-Speed 25258.04 samples/sec Loss 5.5975 LearningRate 0.0009 Epoch: 5 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:02:55,417-Speed 25351.88 samples/sec Loss 5.5957 LearningRate 0.0009 Epoch: 5 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:03:05,162-Speed 25224.47 samples/sec Loss 5.6402 LearningRate 0.0009 Epoch: 5 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:03:14,953-Speed 25103.59 samples/sec Loss 5.6274 LearningRate 0.0009 Epoch: 5 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:03:24,752-Speed 25084.45 samples/sec Loss 5.6249 LearningRate 0.0009 Epoch: 5 Global Step: 8710 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-03-26 01:03:34,521-Speed 25159.61 samples/sec Loss 5.6127 LearningRate 0.0009 Epoch: 5 Global Step: 8720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:03:44,430-Speed 24804.51 samples/sec Loss 5.5775 LearningRate 0.0009 Epoch: 5 Global Step: 8730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:03:54,203-Speed 25151.51 samples/sec Loss 5.5604 LearningRate 0.0009 Epoch: 5 Global Step: 8740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:04:03,966-Speed 25174.51 samples/sec Loss 5.5650 LearningRate 0.0009 Epoch: 5 Global Step: 8750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:04:13,737-Speed 25162.23 samples/sec Loss 5.6041 LearningRate 0.0009 Epoch: 5 Global Step: 8760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:04:23,522-Speed 25118.33 samples/sec Loss 5.5819 LearningRate 0.0009 Epoch: 5 Global Step: 8770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:04:33,345-Speed 25024.02 samples/sec Loss 5.5337 LearningRate 0.0009 Epoch: 5 Global Step: 8780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:04:43,062-Speed 25295.02 samples/sec Loss 5.5784 LearningRate 0.0009 Epoch: 5 Global Step: 8790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:04:52,829-Speed 25165.70 samples/sec Loss 5.5639 LearningRate 0.0009 Epoch: 5 Global Step: 8800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:05:02,607-Speed 25136.82 samples/sec Loss 5.5570 LearningRate 0.0009 Epoch: 5 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:05:12,456-Speed 24957.60 samples/sec Loss 5.6196 LearningRate 0.0009 Epoch: 5 Global Step: 8820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:05:22,252-Speed 25089.55 samples/sec Loss 5.5806 LearningRate 0.0009 Epoch: 5 Global Step: 8830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 
01:05:31,986-Speed 25251.77 samples/sec Loss 5.5476 LearningRate 0.0009 Epoch: 5 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:05:41,767-Speed 25129.37 samples/sec Loss 5.5172 LearningRate 0.0009 Epoch: 5 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:05:51,596-Speed 25006.70 samples/sec Loss 5.5306 LearningRate 0.0009 Epoch: 5 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:06:01,398-Speed 25073.74 samples/sec Loss 5.5042 LearningRate 0.0009 Epoch: 5 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:06:11,145-Speed 25216.35 samples/sec Loss 5.5931 LearningRate 0.0009 Epoch: 5 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:06:21,017-Speed 24898.69 samples/sec Loss 5.5065 LearningRate 0.0009 Epoch: 5 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:06:30,823-Speed 25066.55 samples/sec Loss 5.5309 LearningRate 0.0009 Epoch: 5 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:06:40,578-Speed 25197.96 samples/sec Loss 5.5089 LearningRate 0.0009 Epoch: 5 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:06:50,323-Speed 25220.43 samples/sec Loss 5.5014 LearningRate 0.0009 Epoch: 5 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:07:00,149-Speed 25014.83 samples/sec Loss 5.4582 LearningRate 0.0009 Epoch: 5 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:07:09,979-Speed 25004.36 samples/sec Loss 5.4963 LearningRate 0.0009 Epoch: 5 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:07:19,774-Speed 25096.01 samples/sec Loss 5.4979 LearningRate 0.0009 Epoch: 5 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:07:29,488-Speed 25299.76 samples/sec 
Loss 5.4595 LearningRate 0.0009 Epoch: 5 Global Step: 8960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:07:39,284-Speed 25091.37 samples/sec Loss 5.4802 LearningRate 0.0009 Epoch: 5 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:07:49,109-Speed 25017.17 samples/sec Loss 5.5310 LearningRate 0.0009 Epoch: 5 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:07:58,895-Speed 25115.55 samples/sec Loss 5.5063 LearningRate 0.0009 Epoch: 5 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:08:08,653-Speed 25188.22 samples/sec Loss 5.4347 LearningRate 0.0009 Epoch: 5 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:08:18,432-Speed 25134.80 samples/sec Loss 5.4446 LearningRate 0.0009 Epoch: 5 Global Step: 9010 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-03-26 01:08:28,241-Speed 25055.26 samples/sec Loss 5.4386 LearningRate 0.0009 Epoch: 5 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:08:38,134-Speed 24845.61 samples/sec Loss 5.4723 LearningRate 0.0009 Epoch: 5 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:08:48,022-Speed 24858.07 samples/sec Loss 5.4443 LearningRate 0.0009 Epoch: 5 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:08:57,939-Speed 24785.18 samples/sec Loss 5.4305 LearningRate 0.0009 Epoch: 5 Global Step: 9050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:09:07,696-Speed 25191.02 samples/sec Loss 5.4198 LearningRate 0.0009 Epoch: 5 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:09:17,421-Speed 25275.31 samples/sec Loss 5.4088 LearningRate 0.0009 Epoch: 5 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:09:27,221-Speed 25082.68 samples/sec Loss 5.4132 LearningRate 0.0009 Epoch: 5 
Global Step: 9080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:09:36,942-Speed 25283.80 samples/sec Loss 5.4190 LearningRate 0.0009 Epoch: 5 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:09:46,770-Speed 25009.50 samples/sec Loss 5.3721 LearningRate 0.0009 Epoch: 5 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:09:56,674-Speed 24816.35 samples/sec Loss 5.3964 LearningRate 0.0009 Epoch: 5 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:10:06,409-Speed 25246.52 samples/sec Loss 5.4420 LearningRate 0.0009 Epoch: 5 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:10:16,292-Speed 24870.75 samples/sec Loss 5.4162 LearningRate 0.0009 Epoch: 5 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:10:26,150-Speed 24933.46 samples/sec Loss 5.3487 LearningRate 0.0009 Epoch: 5 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:10:35,984-Speed 24992.43 samples/sec Loss 5.3448 LearningRate 0.0009 Epoch: 5 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:10:45,908-Speed 24769.01 samples/sec Loss 5.3572 LearningRate 0.0009 Epoch: 5 Global Step: 9160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:10:55,858-Speed 24702.17 samples/sec Loss 5.3512 LearningRate 0.0009 Epoch: 5 Global Step: 9170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:11:05,637-Speed 25134.63 samples/sec Loss 5.3611 LearningRate 0.0009 Epoch: 5 Global Step: 9180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:11:15,382-Speed 25222.36 samples/sec Loss 5.3881 LearningRate 0.0009 Epoch: 5 Global Step: 9190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:11:25,219-Speed 24986.73 samples/sec Loss 5.3395 LearningRate 0.0009 Epoch: 5 Global Step: 9200 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-03-26 01:11:34,973-Speed 25198.65 samples/sec Loss 5.3534 LearningRate 0.0009 Epoch: 5 Global Step: 9210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:11:44,725-Speed 25204.23 samples/sec Loss 5.3299 LearningRate 0.0009 Epoch: 5 Global Step: 9220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:11:54,463-Speed 25240.31 samples/sec Loss 5.3527 LearningRate 0.0009 Epoch: 5 Global Step: 9230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:12:04,328-Speed 24915.90 samples/sec Loss 5.3312 LearningRate 0.0009 Epoch: 5 Global Step: 9240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:12:14,175-Speed 24961.07 samples/sec Loss 5.3054 LearningRate 0.0009 Epoch: 5 Global Step: 9250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:12:23,984-Speed 25058.40 samples/sec Loss 5.3071 LearningRate 0.0009 Epoch: 5 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:12:33,881-Speed 24835.57 samples/sec Loss 5.3231 LearningRate 0.0009 Epoch: 5 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:12:43,621-Speed 25234.48 samples/sec Loss 5.2881 LearningRate 0.0009 Epoch: 5 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:12:53,511-Speed 24854.18 samples/sec Loss 5.2966 LearningRate 0.0009 Epoch: 5 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:13:03,397-Speed 24860.94 samples/sec Loss 5.2919 LearningRate 0.0009 Epoch: 5 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:13:13,274-Speed 24886.96 samples/sec Loss 5.2931 LearningRate 0.0009 Epoch: 5 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:13:23,291-Speed 24538.93 samples/sec Loss 5.2845 LearningRate 0.0009 Epoch: 5 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 
01:13:33,180-Speed 24855.06 samples/sec Loss 5.2519 LearningRate 0.0009 Epoch: 5 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:13:43,145-Speed 24667.08 samples/sec Loss 5.2290 LearningRate 0.0009 Epoch: 5 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:13:52,885-Speed 25233.92 samples/sec Loss 5.2907 LearningRate 0.0009 Epoch: 5 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:14:02,620-Speed 25248.48 samples/sec Loss 5.2755 LearningRate 0.0009 Epoch: 5 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:14:12,409-Speed 25109.09 samples/sec Loss 5.2152 LearningRate 0.0009 Epoch: 5 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:14:22,177-Speed 25161.32 samples/sec Loss 5.2867 LearningRate 0.0009 Epoch: 5 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:14:31,973-Speed 25093.01 samples/sec Loss 5.2589 LearningRate 0.0009 Epoch: 5 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:14:41,785-Speed 25048.53 samples/sec Loss 5.2270 LearningRate 0.0009 Epoch: 5 Global Step: 9400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:14:51,491-Speed 25325.13 samples/sec Loss 5.2172 LearningRate 0.0009 Epoch: 5 Global Step: 9410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:15:01,251-Speed 25185.26 samples/sec Loss 5.2493 LearningRate 0.0009 Epoch: 5 Global Step: 9420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:15:10,954-Speed 25331.23 samples/sec Loss 5.2087 LearningRate 0.0009 Epoch: 5 Global Step: 9430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:15:20,799-Speed 24964.38 samples/sec Loss 5.1892 LearningRate 0.0009 Epoch: 5 Global Step: 9440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:15:30,532-Speed 25253.97 samples/sec Loss 
5.2065 LearningRate 0.0009 Epoch: 5 Global Step: 9450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:15:40,461-Speed 24753.83 samples/sec Loss 5.2090 LearningRate 0.0009 Epoch: 5 Global Step: 9460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:15:50,178-Speed 25295.27 samples/sec Loss 5.2405 LearningRate 0.0009 Epoch: 5 Global Step: 9470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:15:59,915-Speed 25242.13 samples/sec Loss 5.2370 LearningRate 0.0009 Epoch: 5 Global Step: 9480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:16:09,731-Speed 25040.62 samples/sec Loss 5.1699 LearningRate 0.0009 Epoch: 5 Global Step: 9490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-26 01:16:19,592-Speed 24926.35 samples/sec Loss 5.1767 LearningRate 0.0009 Epoch: 5 Global Step: 9500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:16:29,452-Speed 24928.93 samples/sec Loss 5.1738 LearningRate 0.0009 Epoch: 5 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:16:39,236-Speed 25120.20 samples/sec Loss 5.1320 LearningRate 0.0009 Epoch: 5 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:16:49,130-Speed 24844.92 samples/sec Loss 5.1637 LearningRate 0.0009 Epoch: 5 Global Step: 9530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:16:58,851-Speed 25284.66 samples/sec Loss 5.2133 LearningRate 0.0009 Epoch: 5 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:17:08,644-Speed 25100.84 samples/sec Loss 5.1445 LearningRate 0.0009 Epoch: 5 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:17:18,374-Speed 25259.46 samples/sec Loss 5.1772 LearningRate 0.0009 Epoch: 5 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:17:28,144-Speed 25158.75 samples/sec Loss 5.1412 LearningRate 0.0009 Epoch: 5 Global Step: 
9570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:17:37,906-Speed 25180.69 samples/sec Loss 5.1571 LearningRate 0.0009 Epoch: 5 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:17:47,677-Speed 25207.72 samples/sec Loss 5.1605 LearningRate 0.0009 Epoch: 5 Global Step: 9590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:17:57,645-Speed 24657.27 samples/sec Loss 5.1429 LearningRate 0.0009 Epoch: 5 Global Step: 9600 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-03-26 01:18:07,473-Speed 25009.95 samples/sec Loss 5.1366 LearningRate 0.0009 Epoch: 5 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:18:17,234-Speed 25186.45 samples/sec Loss 5.1494 LearningRate 0.0009 Epoch: 5 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:18:27,139-Speed 24814.77 samples/sec Loss 5.1418 LearningRate 0.0009 Epoch: 5 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:18:36,967-Speed 25008.72 samples/sec Loss 5.1340 LearningRate 0.0009 Epoch: 5 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:18:46,756-Speed 25107.76 samples/sec Loss 5.1376 LearningRate 0.0009 Epoch: 5 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:18:56,537-Speed 25129.98 samples/sec Loss 5.1235 LearningRate 0.0009 Epoch: 5 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:19:06,443-Speed 24812.10 samples/sec Loss 5.1167 LearningRate 0.0009 Epoch: 5 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:19:16,367-Speed 24769.76 samples/sec Loss 5.0729 LearningRate 0.0009 Epoch: 5 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:19:26,200-Speed 24993.63 samples/sec Loss 5.0992 LearningRate 0.0009 Epoch: 5 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 17 
hours Training: 2022-03-26 01:19:35,980-Speed 25134.61 samples/sec Loss 5.0672 LearningRate 0.0009 Epoch: 5 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:19:45,800-Speed 25029.06 samples/sec Loss 5.0844 LearningRate 0.0009 Epoch: 5 Global Step: 9710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:19:55,645-Speed 24967.37 samples/sec Loss 5.0856 LearningRate 0.0009 Epoch: 5 Global Step: 9720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:20:05,421-Speed 25143.95 samples/sec Loss 5.0577 LearningRate 0.0009 Epoch: 5 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:20:15,197-Speed 25143.46 samples/sec Loss 5.0915 LearningRate 0.0009 Epoch: 5 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:20:24,998-Speed 25077.05 samples/sec Loss 5.0609 LearningRate 0.0009 Epoch: 5 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:20:34,870-Speed 24899.10 samples/sec Loss 5.0537 LearningRate 0.0009 Epoch: 5 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:20:44,696-Speed 25013.55 samples/sec Loss 5.0726 LearningRate 0.0009 Epoch: 5 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:20:54,494-Speed 25087.64 samples/sec Loss 5.0727 LearningRate 0.0009 Epoch: 5 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:21:04,242-Speed 25213.01 samples/sec Loss 5.0377 LearningRate 0.0009 Epoch: 5 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:21:13,994-Speed 25204.58 samples/sec Loss 5.0286 LearningRate 0.0009 Epoch: 5 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:21:23,863-Speed 24905.12 samples/sec Loss 5.0573 LearningRate 0.0009 Epoch: 5 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 
01:21:33,620-Speed 25191.03 samples/sec Loss 5.0645 LearningRate 0.0009 Epoch: 5 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:21:43,440-Speed 25028.60 samples/sec Loss 5.0238 LearningRate 0.0009 Epoch: 5 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:21:53,201-Speed 25182.10 samples/sec Loss 4.9970 LearningRate 0.0009 Epoch: 5 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:22:02,933-Speed 25255.75 samples/sec Loss 5.0238 LearningRate 0.0009 Epoch: 5 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:22:12,791-Speed 24932.65 samples/sec Loss 5.0226 LearningRate 0.0009 Epoch: 5 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:22:22,679-Speed 24857.36 samples/sec Loss 5.0394 LearningRate 0.0009 Epoch: 5 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:22:32,498-Speed 25030.54 samples/sec Loss 5.0479 LearningRate 0.0009 Epoch: 5 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:22:42,234-Speed 25244.99 samples/sec Loss 5.0375 LearningRate 0.0009 Epoch: 5 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:22:52,052-Speed 25035.54 samples/sec Loss 5.0171 LearningRate 0.0009 Epoch: 5 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:23:01,864-Speed 25052.14 samples/sec Loss 5.0264 LearningRate 0.0009 Epoch: 5 Global Step: 9910 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-03-26 01:23:11,628-Speed 25172.40 samples/sec Loss 5.0160 LearningRate 0.0009 Epoch: 5 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:23:21,388-Speed 25183.32 samples/sec Loss 4.9695 LearningRate 0.0009 Epoch: 5 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:23:31,273-Speed 24865.98 samples/sec 
Loss 4.9663 LearningRate 0.0009 Epoch: 5 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:23:41,039-Speed 25167.27 samples/sec Loss 4.9489 LearningRate 0.0009 Epoch: 5 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:23:50,830-Speed 25106.00 samples/sec Loss 4.9552 LearningRate 0.0009 Epoch: 5 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:24:00,689-Speed 24929.81 samples/sec Loss 4.9910 LearningRate 0.0009 Epoch: 5 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:24:10,525-Speed 24987.41 samples/sec Loss 4.9541 LearningRate 0.0009 Epoch: 5 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:24:20,382-Speed 24935.51 samples/sec Loss 4.9766 LearningRate 0.0009 Epoch: 5 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:24:30,218-Speed 24992.24 samples/sec Loss 4.9503 LearningRate 0.0009 Epoch: 5 Global Step: 10000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:24:40,082-Speed 24918.70 samples/sec Loss 4.9143 LearningRate 0.0009 Epoch: 5 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:24:50,012-Speed 24752.36 samples/sec Loss 4.9498 LearningRate 0.0009 Epoch: 5 Global Step: 10020 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-03-26 01:25:00,002-Speed 24604.55 samples/sec Loss 4.9312 LearningRate 0.0009 Epoch: 5 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:25:09,780-Speed 25137.29 samples/sec Loss 4.9378 LearningRate 0.0009 Epoch: 5 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:25:19,611-Speed 25002.68 samples/sec Loss 4.9471 LearningRate 0.0009 Epoch: 5 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:25:29,435-Speed 25018.29 samples/sec Loss 4.9703 LearningRate 0.0009 
Epoch: 5 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:25:39,263-Speed 25011.41 samples/sec Loss 4.9218 LearningRate 0.0009 Epoch: 5 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:25:49,036-Speed 25149.34 samples/sec Loss 4.9557 LearningRate 0.0009 Epoch: 5 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:25:58,861-Speed 25017.22 samples/sec Loss 4.9536 LearningRate 0.0009 Epoch: 5 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:26:08,887-Speed 24514.52 samples/sec Loss 4.9330 LearningRate 0.0009 Epoch: 5 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:26:18,729-Speed 24973.93 samples/sec Loss 4.9093 LearningRate 0.0009 Epoch: 5 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:26:28,472-Speed 25225.17 samples/sec Loss 4.9694 LearningRate 0.0009 Epoch: 5 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:26:38,254-Speed 25126.16 samples/sec Loss 4.8996 LearningRate 0.0009 Epoch: 5 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:26:47,947-Speed 25357.70 samples/sec Loss 4.8953 LearningRate 0.0009 Epoch: 5 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-03-26 01:26:57,759-Speed 25048.66 samples/sec Loss 4.9043 LearningRate 0.0009 Epoch: 5 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 01:27:07,486-Speed 25269.80 samples/sec Loss 4.9247 LearningRate 0.0009 Epoch: 5 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 01:27:17,249-Speed 25173.91 samples/sec Loss 4.9135 LearningRate 0.0009 Epoch: 5 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 01:27:26,992-Speed 25233.23 samples/sec Loss 4.9217 LearningRate 0.0009 Epoch: 5 Global Step: 10180 
Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:27:36,737-Speed 25223.97 samples/sec Loss 4.8752 LearningRate 0.0009 Epoch: 5 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:27:46,591-Speed 24942.68 samples/sec Loss 4.8692 LearningRate 0.0009 Epoch: 5 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:27:56,413-Speed 25025.08 samples/sec Loss 4.8506 LearningRate 0.0009 Epoch: 5 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:28:06,229-Speed 25046.94 samples/sec Loss 4.8441 LearningRate 0.0009 Epoch: 5 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:28:16,029-Speed 25090.00 samples/sec Loss 4.8768 LearningRate 0.0009 Epoch: 5 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:28:25,785-Speed 25192.10 samples/sec Loss 4.8818 LearningRate 0.0009 Epoch: 5 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:28:35,579-Speed 25099.33 samples/sec Loss 4.9434 LearningRate 0.0009 Epoch: 5 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:28:45,616-Speed 24488.70 samples/sec Loss 4.8886 LearningRate 0.0009 Epoch: 5 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:28:55,415-Speed 25088.77 samples/sec Loss 4.8710 LearningRate 0.0009 Epoch: 5 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:29:05,210-Speed 25095.19 samples/sec Loss 4.8499 LearningRate 0.0009 Epoch: 5 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:29:14,949-Speed 25237.15 samples/sec Loss 4.8649 LearningRate 0.0009 Epoch: 5 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:29:24,793-Speed 24969.24 samples/sec Loss 4.8828 LearningRate 0.0009 Epoch: 5 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:29:34,658-Speed 24915.11 samples/sec Loss 4.8906 LearningRate 0.0009 Epoch: 5 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:29:44,394-Speed 25245.39 samples/sec Loss 4.8407 LearningRate 0.0009 Epoch: 5 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:29:54,114-Speed 25287.71 samples/sec Loss 4.8299 LearningRate 0.0009 Epoch: 5 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:30:03,920-Speed 25067.69 samples/sec Loss 4.8198 LearningRate 0.0009 Epoch: 5 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:30:13,726-Speed 25064.78 samples/sec Loss 4.8559 LearningRate 0.0009 Epoch: 5 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:30:23,485-Speed 25185.84 samples/sec Loss 4.8571 LearningRate 0.0009 Epoch: 5 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:30:33,262-Speed 25140.14 samples/sec Loss 4.8912 LearningRate 0.0009 Epoch: 5 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:31:34,307-Speed 4026.05 samples/sec Loss 4.7680 LearningRate 0.0009 Epoch: 6 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-26 01:31:44,111-Speed 25072.29 samples/sec Loss 4.7378 LearningRate 0.0009 Epoch: 6 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 17 hours
Training: 2022-03-26 01:31:54,425-Speed 23830.47 samples/sec Loss 4.7974 LearningRate 0.0009 Epoch: 6 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:32:04,176-Speed 25206.23 samples/sec Loss 4.7783 LearningRate 0.0009 Epoch: 6 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:32:13,863-Speed 25374.89 samples/sec Loss 4.7752 LearningRate 0.0009 Epoch: 6 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:32:23,704-Speed 24976.76 samples/sec Loss 4.7502 LearningRate 0.0009 Epoch: 6 Global Step: 10430 Fp16 Grad Scale: 262144 Required: 16 hours
Training: 2022-03-26 01:32:33,337-Speed 25515.39 samples/sec Loss 4.7876 LearningRate 0.0009 Epoch: 6 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:32:43,022-Speed 25377.67 samples/sec Loss 4.7738 LearningRate 0.0009 Epoch: 6 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:32:52,803-Speed 25129.03 samples/sec Loss 4.7495 LearningRate 0.0009 Epoch: 6 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:33:02,576-Speed 25149.97 samples/sec Loss 4.7629 LearningRate 0.0009 Epoch: 6 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:33:12,274-Speed 25342.84 samples/sec Loss 4.7466 LearningRate 0.0009 Epoch: 6 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:33:22,038-Speed 25174.18 samples/sec Loss 4.7813 LearningRate 0.0009 Epoch: 6 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:33:31,759-Speed 25284.12 samples/sec Loss 4.7298 LearningRate 0.0009 Epoch: 6 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:33:41,530-Speed 25157.09 samples/sec Loss 4.7483 LearningRate 0.0009 Epoch: 6 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:33:51,259-Speed 25264.79 samples/sec Loss 4.7582 LearningRate 0.0009 Epoch: 6 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:34:00,924-Speed 25430.86 samples/sec Loss 4.7469 LearningRate 0.0009 Epoch: 6 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:34:10,658-Speed 25250.57 samples/sec Loss 4.6952 LearningRate 0.0009 Epoch: 6 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:34:20,406-Speed 25213.56 samples/sec Loss 4.7146 LearningRate 0.0009 Epoch: 6 Global Step: 10550 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:34:30,176-Speed 25157.37 samples/sec Loss 4.7388 LearningRate 0.0009 Epoch: 6 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:34:39,870-Speed 25356.16 samples/sec Loss 4.7493 LearningRate 0.0009 Epoch: 6 Global Step: 10570 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:34:49,640-Speed 25158.02 samples/sec Loss 4.7240 LearningRate 0.0009 Epoch: 6 Global Step: 10580 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:34:59,345-Speed 25326.71 samples/sec Loss 4.7698 LearningRate 0.0009 Epoch: 6 Global Step: 10590 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:35:09,073-Speed 25275.17 samples/sec Loss 4.7344 LearningRate 0.0009 Epoch: 6 Global Step: 10600 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:35:18,905-Speed 24997.58 samples/sec Loss 4.7580 LearningRate 0.0009 Epoch: 6 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:35:28,773-Speed 24908.87 samples/sec Loss 4.6666 LearningRate 0.0009 Epoch: 6 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:35:38,653-Speed 24876.93 samples/sec Loss 4.6950 LearningRate 0.0009 Epoch: 6 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:35:48,388-Speed 25248.14 samples/sec Loss 4.7267 LearningRate 0.0009 Epoch: 6 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:35:58,108-Speed 25286.60 samples/sec Loss 4.6916 LearningRate 0.0009 Epoch: 6 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:36:07,963-Speed 24939.36 samples/sec Loss 4.7106 LearningRate 0.0009 Epoch: 6 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:36:17,819-Speed 24938.08 samples/sec Loss 4.7212 LearningRate 0.0009 Epoch: 6 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:36:27,581-Speed 25178.54 samples/sec Loss 4.6882 LearningRate 0.0009 Epoch: 6 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:36:37,294-Speed 25305.77 samples/sec Loss 4.7096 LearningRate 0.0009 Epoch: 6 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:36:47,079-Speed 25119.95 samples/sec Loss 4.7360 LearningRate 0.0009 Epoch: 6 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:36:56,809-Speed 25263.43 samples/sec Loss 4.6978 LearningRate 0.0009 Epoch: 6 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:37:06,637-Speed 25009.25 samples/sec Loss 4.6559 LearningRate 0.0009 Epoch: 6 Global Step: 10720 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:37:16,471-Speed 24993.20 samples/sec Loss 4.6793 LearningRate 0.0009 Epoch: 6 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:37:26,271-Speed 25081.52 samples/sec Loss 4.7196 LearningRate 0.0009 Epoch: 6 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:37:35,958-Speed 25375.37 samples/sec Loss 4.7184 LearningRate 0.0009 Epoch: 6 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:37:45,718-Speed 25182.89 samples/sec Loss 4.7031 LearningRate 0.0009 Epoch: 6 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:37:55,428-Speed 25311.61 samples/sec Loss 4.6504 LearningRate 0.0009 Epoch: 6 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:38:05,271-Speed 24972.27 samples/sec Loss 4.6502 LearningRate 0.0009 Epoch: 6 Global Step: 10780 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:38:15,038-Speed 25167.75 samples/sec Loss 4.6846 LearningRate 0.0009 Epoch: 6 Global Step: 10790 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:38:24,796-Speed 25187.45 samples/sec Loss 4.6756 LearningRate 0.0009 Epoch: 6 Global Step: 10800 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:38:34,566-Speed 25157.40 samples/sec Loss 4.6612 LearningRate 0.0009 Epoch: 6 Global Step: 10810 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:38:44,349-Speed 25125.63 samples/sec Loss 4.6709 LearningRate 0.0009 Epoch: 6 Global Step: 10820 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:38:54,109-Speed 25192.78 samples/sec Loss 4.6295 LearningRate 0.0009 Epoch: 6 Global Step: 10830 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:39:03,902-Speed 25098.96 samples/sec Loss 4.6689 LearningRate 0.0009 Epoch: 6 Global Step: 10840 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:39:13,662-Speed 25183.36 samples/sec Loss 4.6505 LearningRate 0.0009 Epoch: 6 Global Step: 10850 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:39:23,565-Speed 24820.90 samples/sec Loss 4.6203 LearningRate 0.0009 Epoch: 6 Global Step: 10860 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:39:33,552-Speed 24611.39 samples/sec Loss 4.6150 LearningRate 0.0009 Epoch: 6 Global Step: 10870 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:39:43,512-Speed 24678.28 samples/sec Loss 4.6708 LearningRate 0.0009 Epoch: 6 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:39:53,477-Speed 24667.51 samples/sec Loss 4.6387 LearningRate 0.0009 Epoch: 6 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:40:03,543-Speed 24416.83 samples/sec Loss 4.6428 LearningRate 0.0009 Epoch: 6 Global Step: 10900 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:40:13,541-Speed 24584.23 samples/sec Loss 4.6262 LearningRate 0.0009 Epoch: 6 Global Step: 10910 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:40:23,563-Speed 24533.45 samples/sec Loss 4.6274 LearningRate 0.0009 Epoch: 6 Global Step: 10920 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:40:33,543-Speed 24626.89 samples/sec Loss 4.6108 LearningRate 0.0009 Epoch: 6 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:40:43,537-Speed 24592.67 samples/sec Loss 4.6060 LearningRate 0.0009 Epoch: 6 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:40:53,606-Speed 24409.54 samples/sec Loss 4.6151 LearningRate 0.0009 Epoch: 6 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:41:03,636-Speed 24507.19 samples/sec Loss 4.5885 LearningRate 0.0009 Epoch: 6 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:41:13,578-Speed 24721.52 samples/sec Loss 4.6097 LearningRate 0.0009 Epoch: 6 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:41:23,519-Speed 24725.72 samples/sec Loss 4.5934 LearningRate 0.0009 Epoch: 6 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:41:33,461-Speed 24720.09 samples/sec Loss 4.6127 LearningRate 0.0009 Epoch: 6 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:41:43,384-Speed 24771.08 samples/sec Loss 4.6008 LearningRate 0.0009 Epoch: 6 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:41:53,328-Speed 24715.51 samples/sec Loss 4.5858 LearningRate 0.0009 Epoch: 6 Global Step: 11010 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:42:03,301-Speed 24647.46 samples/sec Loss 4.5898 LearningRate 0.0009 Epoch: 6 Global Step: 11020 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:42:13,419-Speed 24294.72 samples/sec Loss 4.6250 LearningRate 0.0009 Epoch: 6 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:42:23,442-Speed 24521.09 samples/sec Loss 4.5995 LearningRate 0.0009 Epoch: 6 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:42:33,337-Speed 24837.93 samples/sec Loss 4.5996 LearningRate 0.0009 Epoch: 6 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:42:43,376-Speed 24483.11 samples/sec Loss 4.5944 LearningRate 0.0009 Epoch: 6 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:42:53,406-Speed 24507.02 samples/sec Loss 4.5647 LearningRate 0.0009 Epoch: 6 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:43:03,370-Speed 24669.50 samples/sec Loss 4.5883 LearningRate 0.0009 Epoch: 6 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:43:13,377-Speed 24560.76 samples/sec Loss 4.5997 LearningRate 0.0009 Epoch: 6 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:43:23,353-Speed 24641.27 samples/sec Loss 4.5845 LearningRate 0.0009 Epoch: 6 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:43:33,354-Speed 24575.47 samples/sec Loss 4.5806 LearningRate 0.0009 Epoch: 6 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:43:43,311-Speed 24686.23 samples/sec Loss 4.5551 LearningRate 0.0009 Epoch: 6 Global Step: 11120 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:43:53,302-Speed 24603.59 samples/sec Loss 4.5314 LearningRate 0.0009 Epoch: 6 Global Step: 11130 Fp16 Grad Scale: 262144 Required: 16 hours
Training: 2022-03-26 01:44:03,244-Speed 24722.96 samples/sec Loss 4.5251 LearningRate 0.0009 Epoch: 6 Global Step: 11140 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:44:13,196-Speed 24697.16 samples/sec Loss 4.5154 LearningRate 0.0009 Epoch: 6 Global Step: 11150 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:44:23,152-Speed 24686.27 samples/sec Loss 4.4955 LearningRate 0.0009 Epoch: 6 Global Step: 11160 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:44:33,098-Speed 24711.80 samples/sec Loss 4.4818 LearningRate 0.0009 Epoch: 6 Global Step: 11170 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:44:43,117-Speed 24534.04 samples/sec Loss 4.5475 LearningRate 0.0009 Epoch: 6 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:44:53,120-Speed 24574.81 samples/sec Loss 4.5226 LearningRate 0.0009 Epoch: 6 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:45:03,179-Speed 24435.70 samples/sec Loss 4.5717 LearningRate 0.0009 Epoch: 6 Global Step: 11200 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:45:13,274-Speed 24347.13 samples/sec Loss 4.5935 LearningRate 0.0009 Epoch: 6 Global Step: 11210 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:45:23,276-Speed 24574.58 samples/sec Loss 4.5489 LearningRate 0.0009 Epoch: 6 Global Step: 11220 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:45:33,209-Speed 24744.91 samples/sec Loss 4.5081 LearningRate 0.0009 Epoch: 6 Global Step: 11230 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:45:43,285-Speed 24392.63 samples/sec Loss 4.5005 LearningRate 0.0009 Epoch: 6 Global Step: 11240 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:45:53,232-Speed 24709.12 samples/sec Loss 4.4818 LearningRate 0.0009 Epoch: 6 Global Step: 11250 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:46:03,224-Speed 24598.78 samples/sec Loss 4.4902 LearningRate 0.0009 Epoch: 6 Global Step: 11260 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:46:13,208-Speed 24619.99 samples/sec Loss 4.4949 LearningRate 0.0009 Epoch: 6 Global Step: 11270 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:46:23,234-Speed 24514.61 samples/sec Loss 4.4678 LearningRate 0.0009 Epoch: 6 Global Step: 11280 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:46:33,308-Speed 24401.26 samples/sec Loss 4.5010 LearningRate 0.0009 Epoch: 6 Global Step: 11290 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:46:43,300-Speed 24598.32 samples/sec Loss 4.5647 LearningRate 0.0009 Epoch: 6 Global Step: 11300 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:46:53,270-Speed 24653.06 samples/sec Loss 4.5287 LearningRate 0.0009 Epoch: 6 Global Step: 11310 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:47:03,260-Speed 24604.50 samples/sec Loss 4.4874 LearningRate 0.0009 Epoch: 6 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:47:13,305-Speed 24470.25 samples/sec Loss 4.4762 LearningRate 0.0009 Epoch: 6 Global Step: 11330 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:47:23,266-Speed 24674.96 samples/sec Loss 4.4616 LearningRate 0.0009 Epoch: 6 Global Step: 11340 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:47:33,171-Speed 24814.54 samples/sec Loss 4.4919 LearningRate 0.0009 Epoch: 6 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:47:43,321-Speed 24216.81 samples/sec Loss 4.5003 LearningRate 0.0009 Epoch: 6 Global Step: 11360 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:47:53,279-Speed 24684.75 samples/sec Loss 4.4771 LearningRate 0.0009 Epoch: 6 Global Step: 11370 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:48:03,472-Speed 24117.78 samples/sec Loss 4.4801 LearningRate 0.0009 Epoch: 6 Global Step: 11380 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:48:13,446-Speed 24650.80 samples/sec Loss 4.4642 LearningRate 0.0009 Epoch: 6 Global Step: 11390 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:48:23,469-Speed 24521.84 samples/sec Loss 4.4918 LearningRate 0.0009 Epoch: 6 Global Step: 11400 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:48:33,419-Speed 24703.02 samples/sec Loss 4.4533 LearningRate 0.0009 Epoch: 6 Global Step: 11410 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:48:43,385-Speed 24664.05 samples/sec Loss 4.4328 LearningRate 0.0009 Epoch: 6 Global Step: 11420 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:48:53,432-Speed 24462.87 samples/sec Loss 4.4663 LearningRate 0.0009 Epoch: 6 Global Step: 11430 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:49:03,498-Speed 24417.39 samples/sec Loss 4.5124 LearningRate 0.0009 Epoch: 6 Global Step: 11440 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:49:13,538-Speed 24482.53 samples/sec Loss 4.4181 LearningRate 0.0009 Epoch: 6 Global Step: 11450 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:49:23,592-Speed 24445.34 samples/sec Loss 4.4421 LearningRate 0.0009 Epoch: 6 Global Step: 11460 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:49:33,532-Speed 24726.53 samples/sec Loss 4.4743 LearningRate 0.0009 Epoch: 6 Global Step: 11470 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:49:43,519-Speed 24618.14 samples/sec Loss 4.4535 LearningRate 0.0009 Epoch: 6 Global Step: 11480 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:49:53,489-Speed 24653.54 samples/sec Loss 4.5024 LearningRate 0.0009 Epoch: 6 Global Step: 11490 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:50:03,387-Speed 24839.78 samples/sec Loss 4.4299 LearningRate 0.0009 Epoch: 6 Global Step: 11500 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:50:13,436-Speed 24458.61 samples/sec Loss 4.4297 LearningRate 0.0009 Epoch: 6 Global Step: 11510 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:50:23,354-Speed 24783.49 samples/sec Loss 4.4284 LearningRate 0.0009 Epoch: 6 Global Step: 11520 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:50:33,295-Speed 24725.92 samples/sec Loss 4.4089 LearningRate 0.0009 Epoch: 6 Global Step: 11530 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-03-26 01:50:43,346-Speed 24455.31 samples/sec Loss 4.4182 LearningRate 0.0009 Epoch: 6 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:50:53,300-Speed 24691.67 samples/sec Loss 4.4150 LearningRate 0.0009 Epoch: 6 Global Step: 11550 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:51:03,274-Speed 24642.40 samples/sec Loss 4.4560 LearningRate 0.0009 Epoch: 6 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:51:13,247-Speed 24645.91 samples/sec Loss 4.4260 LearningRate 0.0009 Epoch: 6 Global Step: 11570 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:51:23,250-Speed 24568.93 samples/sec Loss 4.4194 LearningRate 0.0009 Epoch: 6 Global Step: 11580 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:51:33,248-Speed 24586.53 samples/sec Loss 4.4719 LearningRate 0.0009 Epoch: 6 Global Step: 11590 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:51:43,237-Speed 24604.71 samples/sec Loss 4.4357 LearningRate 0.0009 Epoch: 6 Global Step: 11600 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:51:53,257-Speed 24532.02 samples/sec Loss 4.3827 LearningRate 0.0009 Epoch: 6 Global Step: 11610 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:52:03,399-Speed 24234.94 samples/sec Loss 4.4255 LearningRate 0.0009 Epoch: 6 Global Step: 11620 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:52:13,366-Speed 24658.86 samples/sec Loss 4.4067 LearningRate 0.0009 Epoch: 6 Global Step: 11630 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 01:52:23,359-Speed 24596.28 samples/sec Loss 4.3734 LearningRate 0.0009 Epoch: 6 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:52:33,322-Speed 24669.64 samples/sec Loss 4.3851 LearningRate 0.0009 Epoch: 6 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:52:43,300-Speed 24634.44 samples/sec Loss 4.3734 LearningRate 0.0009 Epoch: 6 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:52:53,286-Speed 24613.37 samples/sec Loss 4.4359 LearningRate 0.0009 Epoch: 6 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:53:03,310-Speed 24519.84 samples/sec Loss 4.3985 LearningRate 0.0009 Epoch: 6 Global Step: 11680 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:53:13,274-Speed 24667.82 samples/sec Loss 4.4060 LearningRate 0.0009 Epoch: 6 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:53:23,291-Speed 24535.82 samples/sec Loss 4.3878 LearningRate 0.0009 Epoch: 6 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:53:33,255-Speed 24668.04 samples/sec Loss 4.3930 LearningRate 0.0009 Epoch: 6 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:53:43,520-Speed 23943.92 samples/sec Loss 4.4142 LearningRate 0.0009 Epoch: 6 Global Step: 11720 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:53:53,441-Speed 24774.58 samples/sec Loss 4.3980 LearningRate 0.0009 Epoch: 6 Global Step: 11730 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:54:03,342-Speed 24826.73 samples/sec Loss 4.4004 LearningRate 0.0009 Epoch: 6 Global Step: 11740 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:54:13,364-Speed 24525.72 samples/sec Loss 4.3562 LearningRate 0.0009 Epoch: 6 Global Step: 11750 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:54:23,345-Speed 24623.48 samples/sec Loss 4.3470 LearningRate 0.0009 Epoch: 6 Global Step: 11760 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:54:33,331-Speed 24612.99 samples/sec Loss 4.3207 LearningRate 0.0008 Epoch: 6 Global Step: 11770 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:54:43,327-Speed 24595.26 samples/sec Loss 4.3726 LearningRate 0.0008 Epoch: 6 Global Step: 11780 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:54:53,312-Speed 24616.55 samples/sec Loss 4.3432 LearningRate 0.0008 Epoch: 6 Global Step: 11790 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:55:03,277-Speed 24665.23 samples/sec Loss 4.3913 LearningRate 0.0008 Epoch: 6 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:55:13,402-Speed 24274.71 samples/sec Loss 4.3846 LearningRate 0.0008 Epoch: 6 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:55:23,363-Speed 24675.09 samples/sec Loss 4.3478 LearningRate 0.0008 Epoch: 6 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:55:33,375-Speed 24556.77 samples/sec Loss 4.3446 LearningRate 0.0008 Epoch: 6 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:55:43,306-Speed 24747.80 samples/sec Loss 4.3450 LearningRate 0.0008 Epoch: 6 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:55:53,278-Speed 24656.11 samples/sec Loss 4.3350 LearningRate 0.0008 Epoch: 6 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:56:03,350-Speed 24402.45 samples/sec Loss 4.3574 LearningRate 0.0008 Epoch: 6 Global Step: 11860 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:56:13,392-Speed 24476.19 samples/sec Loss 4.3414 LearningRate 0.0008 Epoch: 6 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:56:23,386-Speed 24594.61 samples/sec Loss 4.3272 LearningRate 0.0008 Epoch: 6 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:56:33,367-Speed 24624.42 samples/sec Loss 4.3417 LearningRate 0.0008 Epoch: 6 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:56:43,414-Speed 24466.49 samples/sec Loss 4.3321 LearningRate 0.0008 Epoch: 6 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:56:53,415-Speed 24576.48 samples/sec Loss 4.3439 LearningRate 0.0008 Epoch: 6 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:57:03,435-Speed 24529.64 samples/sec Loss 4.3225 LearningRate 0.0008 Epoch: 6 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:57:13,490-Speed 24444.86 samples/sec Loss 4.3454 LearningRate 0.0008 Epoch: 6 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:57:23,465-Speed 24642.23 samples/sec Loss 4.2876 LearningRate 0.0008 Epoch: 6 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:57:33,515-Speed 24462.76 samples/sec Loss 4.3218 LearningRate 0.0008 Epoch: 6 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:57:43,351-Speed 24987.98 samples/sec Loss 4.3497 LearningRate 0.0008 Epoch: 6 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:57:53,087-Speed 25246.48 samples/sec Loss 4.3238 LearningRate 0.0008 Epoch: 6 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:58:02,875-Speed 25111.72 samples/sec Loss 4.3166 LearningRate 0.0008 Epoch: 6 Global Step: 11980 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:58:12,678-Speed 25072.54 samples/sec Loss 4.3399 LearningRate 0.0008 Epoch: 6 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:58:22,394-Speed 25303.62 samples/sec Loss 4.2892 LearningRate 0.0008 Epoch: 6 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:58:32,167-Speed 25150.77 samples/sec Loss 4.3176 LearningRate 0.0008 Epoch: 6 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:58:41,859-Speed 25359.92 samples/sec Loss 4.2784 LearningRate 0.0008 Epoch: 6 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:58:51,598-Speed 25238.86 samples/sec Loss 4.3134 LearningRate 0.0008 Epoch: 6 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:59:01,442-Speed 24968.67 samples/sec Loss 4.3390 LearningRate 0.0008 Epoch: 6 Global Step: 12040 Fp16 Grad Scale: 262144 Required: 16 hours
Training: 2022-03-26 01:59:11,291-Speed 24955.89 samples/sec Loss 4.3710 LearningRate 0.0008 Epoch: 6 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:59:21,087-Speed 25090.00 samples/sec Loss 4.3340 LearningRate 0.0008 Epoch: 6 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:59:30,814-Speed 25270.46 samples/sec Loss 4.3224 LearningRate 0.0008 Epoch: 6 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:59:40,536-Speed 25281.65 samples/sec Loss 4.3359 LearningRate 0.0008 Epoch: 6 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 01:59:50,388-Speed 24946.10 samples/sec Loss 4.3253 LearningRate 0.0008 Epoch: 6 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:00:50,326-Speed 4100.30 samples/sec Loss 4.3474 LearningRate 0.0008 Epoch: 7 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:01:00,144-Speed 25035.31 samples/sec Loss 4.2515 LearningRate 0.0008 Epoch: 7 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:01:09,818-Speed 25408.94 samples/sec Loss 4.2426 LearningRate 0.0008 Epoch: 7 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:01:19,567-Speed 25214.47 samples/sec Loss 4.2101 LearningRate 0.0008 Epoch: 7 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:01:29,294-Speed 25279.17 samples/sec Loss 4.2349 LearningRate 0.0008 Epoch: 7 Global Step: 12140 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:01:39,134-Speed 24978.73 samples/sec Loss 4.2336 LearningRate 0.0008 Epoch: 7 Global Step: 12150 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:01:48,880-Speed 25221.36 samples/sec Loss 4.2417 LearningRate 0.0008 Epoch: 7 Global Step: 12160 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:01:58,553-Speed 25415.48 samples/sec Loss 4.3043 LearningRate 0.0008 Epoch: 7 Global Step: 12170 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:02:08,351-Speed 25086.08 samples/sec Loss 4.2236 LearningRate 0.0008 Epoch: 7 Global Step: 12180 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:02:18,196-Speed 24968.88 samples/sec Loss 4.2330 LearningRate 0.0008 Epoch: 7 Global Step: 12190 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:02:27,973-Speed 25141.18 samples/sec Loss 4.2669 LearningRate 0.0008 Epoch: 7 Global Step: 12200 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:02:37,765-Speed 25105.64 samples/sec Loss 4.2055 LearningRate 0.0008 Epoch: 7 Global Step: 12210 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:02:47,660-Speed 24840.78 samples/sec Loss 4.2341 LearningRate 0.0008 Epoch: 7 Global Step: 12220 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:02:57,333-Speed 25412.04 samples/sec Loss 4.2333 LearningRate 0.0008 Epoch: 7 Global Step: 12230 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:03:07,079-Speed 25219.68 samples/sec Loss 4.2201 LearningRate 0.0008 Epoch: 7 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:03:16,888-Speed 25065.68 samples/sec Loss 4.2402 LearningRate 0.0008 Epoch: 7 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:03:26,703-Speed 25051.69 samples/sec Loss 4.2312 LearningRate 0.0008 Epoch: 7 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:03:36,427-Speed 25276.23 samples/sec Loss 4.2331 LearningRate 0.0008 Epoch: 7 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:03:46,134-Speed 25321.26 samples/sec Loss 4.1898 LearningRate 0.0008 Epoch: 7 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:03:55,870-Speed 25245.42 samples/sec Loss 4.2250 LearningRate 0.0008 Epoch: 7 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:04:05,600-Speed 25262.74 samples/sec Loss 4.2785 LearningRate 0.0008 Epoch: 7 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:04:15,316-Speed 25298.23 samples/sec Loss 4.2343 LearningRate 0.0008 Epoch: 7 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:04:25,184-Speed 24908.90 samples/sec Loss 4.2079 LearningRate 0.0008 Epoch: 7 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:04:35,023-Speed 24983.08 samples/sec Loss 4.2173 LearningRate 0.0008 Epoch: 7 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:04:44,827-Speed 25069.17 samples/sec Loss 4.2400 LearningRate 0.0008 Epoch: 7 Global Step: 12340 Fp16 Grad Scale: 262144 Required: 16 hours
Training: 2022-03-26 02:04:54,487-Speed 25444.95 samples/sec Loss 4.2424 LearningRate 0.0008 Epoch: 7 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:05:04,307-Speed 25028.63 samples/sec Loss 4.2343 LearningRate 0.0008 Epoch: 7 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:05:14,128-Speed 25029.27 samples/sec Loss 4.2103 LearningRate 0.0008 Epoch: 7 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:05:23,952-Speed 25018.93 samples/sec Loss 4.2564 LearningRate 0.0008 Epoch: 7 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:05:33,702-Speed 25215.33 samples/sec Loss 4.1886 LearningRate 0.0008 Epoch: 7 Global Step: 12390 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:05:43,450-Speed 25214.27 samples/sec Loss 4.2128 LearningRate 0.0008 Epoch: 7 Global Step: 12400 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:05:53,263-Speed 25047.21 samples/sec Loss 4.1801 LearningRate 0.0008 Epoch: 7 Global Step: 12410 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:06:02,936-Speed 25410.78 samples/sec Loss 4.1842 LearningRate 0.0008 Epoch: 7 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:06:12,733-Speed 25089.44 samples/sec Loss 4.1884 LearningRate 0.0008 Epoch: 7 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:06:22,408-Speed 25404.14 samples/sec Loss 4.2004 LearningRate 0.0008 Epoch: 7 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:06:32,228-Speed 25027.47 samples/sec Loss 4.2039 LearningRate 0.0008 Epoch: 7 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:06:41,920-Speed 25361.41 samples/sec Loss 4.2254 LearningRate 0.0008 Epoch: 7 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:06:51,793-Speed 24897.58 samples/sec Loss 4.1878 LearningRate 0.0008 Epoch: 7 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:07:01,539-Speed 25217.92 samples/sec Loss 4.1767 LearningRate 0.0008 Epoch: 7 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:07:11,259-Speed 25288.75 samples/sec Loss 4.1827 LearningRate 0.0008 Epoch: 7 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:07:21,054-Speed 25094.17 samples/sec Loss 4.2146 LearningRate 0.0008 Epoch: 7 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:07:30,831-Speed 25140.93 samples/sec Loss 4.1807 LearningRate 0.0008 Epoch: 7 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:07:40,674-Speed 24969.26 samples/sec Loss 4.1942 LearningRate 0.0008 Epoch: 7 Global Step: 12520 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:07:50,493-Speed 25031.08 samples/sec Loss 4.1608 LearningRate 0.0008 Epoch: 7 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:08:00,296-Speed 25074.91 samples/sec Loss 4.1857 LearningRate 0.0008 Epoch: 7 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:08:09,966-Speed 25416.52 samples/sec Loss 4.1629 LearningRate 0.0008 Epoch: 7 Global Step: 12550 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:08:19,772-Speed 25065.06 samples/sec Loss 4.1643 LearningRate 0.0008 Epoch: 7 Global Step: 12560 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:08:29,572-Speed 25082.46 samples/sec Loss 4.1771 LearningRate 0.0008 Epoch: 7 Global Step: 12570 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:08:39,247-Speed 25402.61 samples/sec Loss 4.1834 LearningRate 0.0008 Epoch: 7 Global Step: 12580 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:08:49,025-Speed 25136.91 samples/sec Loss 4.1487 LearningRate 0.0008 Epoch: 7 Global Step: 12590 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:08:58,730-Speed 25326.32 samples/sec Loss 4.1200 LearningRate 0.0008 Epoch: 7 Global Step: 12600 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:09:08,545-Speed 25043.87 samples/sec Loss 4.1444 LearningRate 0.0008 Epoch: 7 Global Step: 12610 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:09:18,350-Speed 25067.48 samples/sec Loss 4.1489 LearningRate 0.0008 Epoch: 7 Global Step: 12620 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:09:28,210-Speed 24928.75 samples/sec Loss 4.1848 LearningRate 0.0008 Epoch: 7 Global Step: 12630 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:09:37,976-Speed 25168.87 samples/sec Loss 4.1680 LearningRate 0.0008 Epoch: 7 Global Step: 12640 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-03-26 02:09:47,678-Speed 25331.47 samples/sec Loss 4.1205 LearningRate 0.0008 Epoch: 7 Global Step: 12650 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:09:57,692-Speed 24545.72 samples/sec Loss 4.1944 LearningRate 0.0008 Epoch: 7 Global Step: 12660 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:10:07,451-Speed 25184.38 samples/sec Loss 4.1822 LearningRate 0.0008 Epoch: 7 Global Step: 12670 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:10:17,257-Speed 25065.11 samples/sec Loss 4.1333 LearningRate 0.0008 Epoch: 7 Global Step: 12680 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:10:27,143-Speed 24862.17 samples/sec Loss 4.1132 LearningRate 0.0008 Epoch: 7 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:10:36,937-Speed 25095.54 samples/sec Loss 4.0823 LearningRate 0.0008 Epoch: 7 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:10:46,670-Speed 25251.48 samples/sec Loss 4.1427 LearningRate 0.0008 Epoch: 7 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:10:56,422-Speed 25205.56 samples/sec Loss 4.1560 LearningRate 0.0008 Epoch: 7 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:11:06,170-Speed 25213.50 samples/sec Loss 4.1280 LearningRate 0.0008 Epoch: 7 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 16 hours
Training: 2022-03-26 02:11:15,903-Speed 25254.14 samples/sec Loss 4.0761
LearningRate 0.0008 Epoch: 7 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:11:25,605-Speed 25333.63 samples/sec Loss 4.1416 LearningRate 0.0008 Epoch: 7 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:11:35,329-Speed 25274.59 samples/sec Loss 4.1264 LearningRate 0.0008 Epoch: 7 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:11:45,189-Speed 24928.38 samples/sec Loss 4.1210 LearningRate 0.0008 Epoch: 7 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:11:55,022-Speed 24995.74 samples/sec Loss 4.1616 LearningRate 0.0008 Epoch: 7 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:12:04,871-Speed 24964.30 samples/sec Loss 4.1545 LearningRate 0.0008 Epoch: 7 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:12:14,564-Speed 25356.03 samples/sec Loss 4.1203 LearningRate 0.0008 Epoch: 7 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:12:24,289-Speed 25274.66 samples/sec Loss 4.0950 LearningRate 0.0008 Epoch: 7 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:12:34,128-Speed 24982.84 samples/sec Loss 4.0852 LearningRate 0.0008 Epoch: 7 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:12:43,836-Speed 25317.55 samples/sec Loss 4.1098 LearningRate 0.0008 Epoch: 7 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:12:53,546-Speed 25312.52 samples/sec Loss 4.0961 LearningRate 0.0008 Epoch: 7 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:13:03,262-Speed 25298.01 samples/sec Loss 4.0929 LearningRate 0.0008 Epoch: 7 Global Step: 12850 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-03-26 02:13:13,005-Speed 25227.99 samples/sec Loss 4.1123 LearningRate 0.0008 Epoch: 7 
Global Step: 12860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:13:22,746-Speed 25232.19 samples/sec Loss 4.1187 LearningRate 0.0008 Epoch: 7 Global Step: 12870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:13:32,539-Speed 25100.42 samples/sec Loss 4.1135 LearningRate 0.0008 Epoch: 7 Global Step: 12880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:13:42,348-Speed 25057.38 samples/sec Loss 4.0821 LearningRate 0.0008 Epoch: 7 Global Step: 12890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:13:52,060-Speed 25307.89 samples/sec Loss 4.1389 LearningRate 0.0008 Epoch: 7 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:14:01,858-Speed 25086.98 samples/sec Loss 4.1066 LearningRate 0.0008 Epoch: 7 Global Step: 12910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:14:11,665-Speed 25065.90 samples/sec Loss 4.0640 LearningRate 0.0008 Epoch: 7 Global Step: 12920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:14:21,429-Speed 25172.95 samples/sec Loss 4.0630 LearningRate 0.0008 Epoch: 7 Global Step: 12930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:14:31,205-Speed 25147.27 samples/sec Loss 4.0808 LearningRate 0.0008 Epoch: 7 Global Step: 12940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:14:41,003-Speed 25087.18 samples/sec Loss 4.0742 LearningRate 0.0008 Epoch: 7 Global Step: 12950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:14:50,849-Speed 24964.22 samples/sec Loss 4.0925 LearningRate 0.0008 Epoch: 7 Global Step: 12960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:15:00,553-Speed 25326.19 samples/sec Loss 4.1331 LearningRate 0.0008 Epoch: 7 Global Step: 12970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:15:10,354-Speed 25078.63 samples/sec Loss 4.0794 LearningRate 0.0008 Epoch: 7 Global Step: 12980 Fp16 Grad Scale: 
65536 Required: 16 hours Training: 2022-03-26 02:15:20,155-Speed 25078.54 samples/sec Loss 4.0672 LearningRate 0.0008 Epoch: 7 Global Step: 12990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:15:29,948-Speed 25098.68 samples/sec Loss 4.0788 LearningRate 0.0008 Epoch: 7 Global Step: 13000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:15:39,595-Speed 25478.39 samples/sec Loss 4.0969 LearningRate 0.0008 Epoch: 7 Global Step: 13010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:15:49,401-Speed 25064.35 samples/sec Loss 4.0706 LearningRate 0.0008 Epoch: 7 Global Step: 13020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:15:59,121-Speed 25284.44 samples/sec Loss 4.0659 LearningRate 0.0008 Epoch: 7 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:16:08,945-Speed 25025.11 samples/sec Loss 4.0623 LearningRate 0.0008 Epoch: 7 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:16:18,649-Speed 25331.04 samples/sec Loss 4.0657 LearningRate 0.0008 Epoch: 7 Global Step: 13050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:16:28,536-Speed 24859.17 samples/sec Loss 4.0420 LearningRate 0.0008 Epoch: 7 Global Step: 13060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:16:38,398-Speed 24923.52 samples/sec Loss 4.0505 LearningRate 0.0008 Epoch: 7 Global Step: 13070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:16:48,177-Speed 25134.22 samples/sec Loss 4.0618 LearningRate 0.0008 Epoch: 7 Global Step: 13080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:16:57,940-Speed 25175.96 samples/sec Loss 4.0519 LearningRate 0.0008 Epoch: 7 Global Step: 13090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:17:07,776-Speed 24986.40 samples/sec Loss 4.0632 LearningRate 0.0008 Epoch: 7 Global Step: 13100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-03-26 02:17:17,586-Speed 25056.30 samples/sec Loss 4.0432 LearningRate 0.0008 Epoch: 7 Global Step: 13110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:17:27,514-Speed 24759.10 samples/sec Loss 4.0054 LearningRate 0.0008 Epoch: 7 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:17:37,312-Speed 25085.46 samples/sec Loss 4.0056 LearningRate 0.0008 Epoch: 7 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:17:47,122-Speed 25054.58 samples/sec Loss 4.0591 LearningRate 0.0008 Epoch: 7 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:17:56,833-Speed 25312.25 samples/sec Loss 4.0070 LearningRate 0.0008 Epoch: 7 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:18:06,673-Speed 24978.59 samples/sec Loss 4.0398 LearningRate 0.0008 Epoch: 7 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:18:16,412-Speed 25238.07 samples/sec Loss 4.0822 LearningRate 0.0008 Epoch: 7 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:18:26,143-Speed 25260.49 samples/sec Loss 4.0693 LearningRate 0.0008 Epoch: 7 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:18:35,943-Speed 25080.46 samples/sec Loss 4.0337 LearningRate 0.0008 Epoch: 7 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:18:45,770-Speed 25014.74 samples/sec Loss 4.0500 LearningRate 0.0008 Epoch: 7 Global Step: 13200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:18:55,554-Speed 25119.28 samples/sec Loss 4.0070 LearningRate 0.0008 Epoch: 7 Global Step: 13210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:19:05,338-Speed 25122.34 samples/sec Loss 4.0335 LearningRate 0.0008 Epoch: 7 Global Step: 13220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:19:15,048-Speed 
25314.54 samples/sec Loss 3.9762 LearningRate 0.0008 Epoch: 7 Global Step: 13230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:19:24,909-Speed 24927.18 samples/sec Loss 3.9996 LearningRate 0.0008 Epoch: 7 Global Step: 13240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:19:34,691-Speed 25132.98 samples/sec Loss 4.0287 LearningRate 0.0008 Epoch: 7 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:19:44,445-Speed 25198.42 samples/sec Loss 3.9750 LearningRate 0.0008 Epoch: 7 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:19:54,251-Speed 25073.94 samples/sec Loss 3.9852 LearningRate 0.0008 Epoch: 7 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:20:04,073-Speed 25025.43 samples/sec Loss 4.0146 LearningRate 0.0008 Epoch: 7 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:20:13,809-Speed 25244.31 samples/sec Loss 4.0107 LearningRate 0.0008 Epoch: 7 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:20:23,606-Speed 25090.31 samples/sec Loss 3.9922 LearningRate 0.0008 Epoch: 7 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:20:33,347-Speed 25236.32 samples/sec Loss 4.0098 LearningRate 0.0008 Epoch: 7 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:20:43,026-Speed 25394.50 samples/sec Loss 3.9826 LearningRate 0.0008 Epoch: 7 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:20:52,831-Speed 25067.93 samples/sec Loss 3.9900 LearningRate 0.0008 Epoch: 7 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:21:02,592-Speed 25180.81 samples/sec Loss 3.9894 LearningRate 0.0008 Epoch: 7 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:21:12,355-Speed 25177.62 samples/sec Loss 
3.9913 LearningRate 0.0008 Epoch: 7 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:21:22,108-Speed 25209.48 samples/sec Loss 4.0016 LearningRate 0.0008 Epoch: 7 Global Step: 13360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:21:31,874-Speed 25167.88 samples/sec Loss 3.9605 LearningRate 0.0008 Epoch: 7 Global Step: 13370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:21:41,621-Speed 25217.77 samples/sec Loss 4.0046 LearningRate 0.0008 Epoch: 7 Global Step: 13380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:21:51,372-Speed 25206.42 samples/sec Loss 3.9873 LearningRate 0.0008 Epoch: 7 Global Step: 13390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:22:01,128-Speed 25193.42 samples/sec Loss 4.0013 LearningRate 0.0008 Epoch: 7 Global Step: 13400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:22:10,812-Speed 25379.38 samples/sec Loss 3.9992 LearningRate 0.0008 Epoch: 7 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:22:20,583-Speed 25157.38 samples/sec Loss 3.9546 LearningRate 0.0008 Epoch: 7 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:22:30,270-Speed 25373.15 samples/sec Loss 3.9699 LearningRate 0.0008 Epoch: 7 Global Step: 13430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:22:39,963-Speed 25357.83 samples/sec Loss 3.9747 LearningRate 0.0008 Epoch: 7 Global Step: 13440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:22:49,777-Speed 25045.67 samples/sec Loss 3.9583 LearningRate 0.0008 Epoch: 7 Global Step: 13450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:22:59,611-Speed 24994.76 samples/sec Loss 3.9987 LearningRate 0.0008 Epoch: 7 Global Step: 13460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:23:09,414-Speed 25074.55 samples/sec Loss 3.9792 LearningRate 0.0008 Epoch: 7 
Global Step: 13470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:23:19,278-Speed 24917.46 samples/sec Loss 4.0014 LearningRate 0.0008 Epoch: 7 Global Step: 13480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:23:29,352-Speed 24397.79 samples/sec Loss 3.9543 LearningRate 0.0008 Epoch: 7 Global Step: 13490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:23:39,161-Speed 25059.60 samples/sec Loss 3.9792 LearningRate 0.0008 Epoch: 7 Global Step: 13500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:23:48,939-Speed 25137.78 samples/sec Loss 3.9795 LearningRate 0.0008 Epoch: 7 Global Step: 13510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:23:58,803-Speed 24915.73 samples/sec Loss 3.9571 LearningRate 0.0008 Epoch: 7 Global Step: 13520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:24:08,568-Speed 25170.49 samples/sec Loss 3.9429 LearningRate 0.0008 Epoch: 7 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:24:18,358-Speed 25107.65 samples/sec Loss 3.9727 LearningRate 0.0008 Epoch: 7 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:24:28,163-Speed 25067.33 samples/sec Loss 3.9798 LearningRate 0.0008 Epoch: 7 Global Step: 13550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:24:37,883-Speed 25287.79 samples/sec Loss 4.0062 LearningRate 0.0008 Epoch: 7 Global Step: 13560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:24:47,665-Speed 25127.16 samples/sec Loss 3.9492 LearningRate 0.0008 Epoch: 7 Global Step: 13570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:24:57,529-Speed 24918.85 samples/sec Loss 3.9444 LearningRate 0.0008 Epoch: 7 Global Step: 13580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:25:07,230-Speed 25338.04 samples/sec Loss 3.9328 LearningRate 0.0008 Epoch: 7 Global Step: 13590 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-03-26 02:25:17,112-Speed 24871.20 samples/sec Loss 3.9588 LearningRate 0.0008 Epoch: 7 Global Step: 13600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:25:26,866-Speed 25199.81 samples/sec Loss 3.9645 LearningRate 0.0008 Epoch: 7 Global Step: 13610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:25:36,654-Speed 25112.10 samples/sec Loss 3.9395 LearningRate 0.0008 Epoch: 7 Global Step: 13620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:25:46,514-Speed 24931.49 samples/sec Loss 3.9846 LearningRate 0.0008 Epoch: 7 Global Step: 13630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:25:56,284-Speed 25157.27 samples/sec Loss 3.9798 LearningRate 0.0008 Epoch: 7 Global Step: 13640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:26:06,049-Speed 25170.92 samples/sec Loss 3.9545 LearningRate 0.0008 Epoch: 7 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:26:15,797-Speed 25215.25 samples/sec Loss 3.9325 LearningRate 0.0008 Epoch: 7 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:26:25,613-Speed 25041.95 samples/sec Loss 3.9420 LearningRate 0.0008 Epoch: 7 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:26:35,372-Speed 25186.01 samples/sec Loss 3.9084 LearningRate 0.0008 Epoch: 7 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:26:45,233-Speed 24923.96 samples/sec Loss 3.9437 LearningRate 0.0008 Epoch: 7 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:26:55,018-Speed 25119.56 samples/sec Loss 3.9281 LearningRate 0.0008 Epoch: 7 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:27:04,831-Speed 25054.88 samples/sec Loss 3.9250 LearningRate 0.0008 Epoch: 7 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 
2022-03-26 02:27:14,575-Speed 25224.14 samples/sec Loss 3.9086 LearningRate 0.0008 Epoch: 7 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:27:24,319-Speed 25224.34 samples/sec Loss 3.9345 LearningRate 0.0008 Epoch: 7 Global Step: 13730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:27:34,149-Speed 25006.69 samples/sec Loss 3.9747 LearningRate 0.0008 Epoch: 7 Global Step: 13740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:27:43,913-Speed 25170.46 samples/sec Loss 3.9478 LearningRate 0.0008 Epoch: 7 Global Step: 13750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:27:53,674-Speed 25179.75 samples/sec Loss 3.9211 LearningRate 0.0008 Epoch: 7 Global Step: 13760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:28:03,477-Speed 25071.99 samples/sec Loss 3.9136 LearningRate 0.0008 Epoch: 7 Global Step: 13770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:28:13,267-Speed 25108.18 samples/sec Loss 3.9201 LearningRate 0.0008 Epoch: 7 Global Step: 13780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:28:23,014-Speed 25215.51 samples/sec Loss 3.9430 LearningRate 0.0008 Epoch: 7 Global Step: 13790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:28:32,757-Speed 25226.82 samples/sec Loss 3.9645 LearningRate 0.0008 Epoch: 7 Global Step: 13800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:28:42,670-Speed 24793.67 samples/sec Loss 3.9645 LearningRate 0.0008 Epoch: 7 Global Step: 13810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:28:52,440-Speed 25155.22 samples/sec Loss 3.9465 LearningRate 0.0008 Epoch: 7 Global Step: 13820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:29:52,089-Speed 4120.21 samples/sec Loss 3.9138 LearningRate 0.0008 Epoch: 8 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:30:01,806-Speed 25304.58 
samples/sec Loss 3.8754 LearningRate 0.0008 Epoch: 8 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:30:11,776-Speed 24652.69 samples/sec Loss 3.8626 LearningRate 0.0008 Epoch: 8 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:30:21,613-Speed 24986.52 samples/sec Loss 3.8203 LearningRate 0.0008 Epoch: 8 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:30:31,432-Speed 25031.82 samples/sec Loss 3.8493 LearningRate 0.0008 Epoch: 8 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:30:41,207-Speed 25145.81 samples/sec Loss 3.8583 LearningRate 0.0008 Epoch: 8 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-03-26 02:30:50,983-Speed 25142.01 samples/sec Loss 3.8407 LearningRate 0.0008 Epoch: 8 Global Step: 13890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:31:00,868-Speed 24863.91 samples/sec Loss 3.8764 LearningRate 0.0008 Epoch: 8 Global Step: 13900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:31:10,811-Speed 24720.84 samples/sec Loss 3.9215 LearningRate 0.0008 Epoch: 8 Global Step: 13910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:31:20,934-Speed 24280.11 samples/sec Loss 3.8754 LearningRate 0.0008 Epoch: 8 Global Step: 13920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:31:30,908-Speed 24644.35 samples/sec Loss 3.8833 LearningRate 0.0008 Epoch: 8 Global Step: 13930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:31:40,901-Speed 24596.81 samples/sec Loss 3.8833 LearningRate 0.0008 Epoch: 8 Global Step: 13940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:31:50,937-Speed 24490.19 samples/sec Loss 3.8605 LearningRate 0.0008 Epoch: 8 Global Step: 13950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:32:00,888-Speed 24702.17 samples/sec Loss 3.8635 LearningRate 
0.0008 Epoch: 8 Global Step: 13960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:32:10,846-Speed 24683.30 samples/sec Loss 3.8599 LearningRate 0.0008 Epoch: 8 Global Step: 13970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-26 02:32:20,827-Speed 24625.59 samples/sec Loss 3.9016 LearningRate 0.0008 Epoch: 8 Global Step: 13980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:32:30,865-Speed 24486.37 samples/sec Loss 3.8945 LearningRate 0.0008 Epoch: 8 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:32:40,887-Speed 24524.98 samples/sec Loss 3.8841 LearningRate 0.0008 Epoch: 8 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:32:50,875-Speed 24609.17 samples/sec Loss 3.8477 LearningRate 0.0008 Epoch: 8 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:33:00,788-Speed 24795.32 samples/sec Loss 3.8288 LearningRate 0.0008 Epoch: 8 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:33:10,691-Speed 24819.11 samples/sec Loss 3.8192 LearningRate 0.0008 Epoch: 8 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:33:20,671-Speed 24628.04 samples/sec Loss 3.9350 LearningRate 0.0008 Epoch: 8 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:33:30,763-Speed 24354.26 samples/sec Loss 3.9634 LearningRate 0.0008 Epoch: 8 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:33:40,788-Speed 24516.19 samples/sec Loss 3.8885 LearningRate 0.0008 Epoch: 8 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:33:50,732-Speed 24717.24 samples/sec Loss 3.8600 LearningRate 0.0008 Epoch: 8 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:34:00,687-Speed 24688.76 samples/sec Loss 3.8569 LearningRate 0.0008 Epoch: 8 Global Step: 14080 
Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:34:10,684-Speed 24586.48 samples/sec Loss 3.8499 LearningRate 0.0008 Epoch: 8 Global Step: 14090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:34:20,591-Speed 24808.97 samples/sec Loss 3.8450 LearningRate 0.0008 Epoch: 8 Global Step: 14100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:34:30,527-Speed 24737.10 samples/sec Loss 3.8493 LearningRate 0.0008 Epoch: 8 Global Step: 14110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:34:40,570-Speed 24472.88 samples/sec Loss 3.8449 LearningRate 0.0008 Epoch: 8 Global Step: 14120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:34:50,599-Speed 24508.89 samples/sec Loss 3.8731 LearningRate 0.0008 Epoch: 8 Global Step: 14130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:35:00,590-Speed 24600.57 samples/sec Loss 3.8488 LearningRate 0.0008 Epoch: 8 Global Step: 14140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:35:10,542-Speed 24697.02 samples/sec Loss 3.8270 LearningRate 0.0008 Epoch: 8 Global Step: 14150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:35:20,584-Speed 24474.04 samples/sec Loss 3.8340 LearningRate 0.0008 Epoch: 8 Global Step: 14160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:35:30,630-Speed 24465.74 samples/sec Loss 3.8367 LearningRate 0.0008 Epoch: 8 Global Step: 14170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:35:40,815-Speed 24139.84 samples/sec Loss 3.8717 LearningRate 0.0008 Epoch: 8 Global Step: 14180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:35:50,817-Speed 24577.49 samples/sec Loss 3.8553 LearningRate 0.0008 Epoch: 8 Global Step: 14190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:36:00,839-Speed 24525.57 samples/sec Loss 3.8867 LearningRate 0.0008 Epoch: 8 Global Step: 14200 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-03-26 02:36:10,832-Speed 24593.65 samples/sec Loss 3.8680 LearningRate 0.0008 Epoch: 8 Global Step: 14210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:36:20,924-Speed 24354.74 samples/sec Loss 3.8307 LearningRate 0.0008 Epoch: 8 Global Step: 14220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:36:31,049-Speed 24274.67 samples/sec Loss 3.8332 LearningRate 0.0008 Epoch: 8 Global Step: 14230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:36:41,067-Speed 24533.81 samples/sec Loss 3.8228 LearningRate 0.0008 Epoch: 8 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:36:51,094-Speed 24513.57 samples/sec Loss 3.8476 LearningRate 0.0008 Epoch: 8 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:37:01,083-Speed 24605.72 samples/sec Loss 3.8655 LearningRate 0.0008 Epoch: 8 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:37:11,072-Speed 24607.45 samples/sec Loss 3.8269 LearningRate 0.0008 Epoch: 8 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:37:21,177-Speed 24322.57 samples/sec Loss 3.8409 LearningRate 0.0008 Epoch: 8 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:37:31,311-Speed 24251.79 samples/sec Loss 3.8134 LearningRate 0.0008 Epoch: 8 Global Step: 14290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:37:41,331-Speed 24536.65 samples/sec Loss 3.8171 LearningRate 0.0008 Epoch: 8 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:37:51,293-Speed 24672.51 samples/sec Loss 3.8267 LearningRate 0.0008 Epoch: 8 Global Step: 14310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:38:01,314-Speed 24527.95 samples/sec Loss 3.8053 LearningRate 0.0008 Epoch: 8 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 
02:38:11,294-Speed 24628.11 samples/sec Loss 3.8315 LearningRate 0.0008 Epoch: 8 Global Step: 14330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:38:21,239-Speed 24713.52 samples/sec Loss 3.8011 LearningRate 0.0008 Epoch: 8 Global Step: 14340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:38:31,201-Speed 24671.86 samples/sec Loss 3.8117 LearningRate 0.0008 Epoch: 8 Global Step: 14350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:38:41,176-Speed 24642.03 samples/sec Loss 3.8597 LearningRate 0.0008 Epoch: 8 Global Step: 14360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:38:51,180-Speed 24569.03 samples/sec Loss 3.8187 LearningRate 0.0008 Epoch: 8 Global Step: 14370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:39:01,064-Speed 24867.62 samples/sec Loss 3.8027 LearningRate 0.0008 Epoch: 8 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:39:10,958-Speed 24842.19 samples/sec Loss 3.7951 LearningRate 0.0008 Epoch: 8 Global Step: 14390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:39:20,779-Speed 25025.52 samples/sec Loss 3.7770 LearningRate 0.0008 Epoch: 8 Global Step: 14400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:39:30,587-Speed 25061.07 samples/sec Loss 3.8071 LearningRate 0.0008 Epoch: 8 Global Step: 14410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:39:40,418-Speed 25007.46 samples/sec Loss 3.7893 LearningRate 0.0008 Epoch: 8 Global Step: 14420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:39:50,206-Speed 25112.12 samples/sec Loss 3.8100 LearningRate 0.0008 Epoch: 8 Global Step: 14430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:40:00,228-Speed 24523.16 samples/sec Loss 3.7826 LearningRate 0.0008 Epoch: 8 Global Step: 14440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:40:10,298-Speed 24407.26 
samples/sec Loss 3.7731 LearningRate 0.0008 Epoch: 8 Global Step: 14450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:40:20,315-Speed 24537.68 samples/sec Loss 3.7979 LearningRate 0.0008 Epoch: 8 Global Step: 14460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:40:30,308-Speed 24596.12 samples/sec Loss 3.8110 LearningRate 0.0008 Epoch: 8 Global Step: 14470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:40:40,396-Speed 24363.13 samples/sec Loss 3.8005 LearningRate 0.0008 Epoch: 8 Global Step: 14480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:40:50,403-Speed 24561.29 samples/sec Loss 3.7814 LearningRate 0.0008 Epoch: 8 Global Step: 14490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:41:00,376-Speed 24643.75 samples/sec Loss 3.7522 LearningRate 0.0008 Epoch: 8 Global Step: 14500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:41:10,205-Speed 25006.77 samples/sec Loss 3.8140 LearningRate 0.0008 Epoch: 8 Global Step: 14510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:41:19,937-Speed 25256.13 samples/sec Loss 3.8184 LearningRate 0.0008 Epoch: 8 Global Step: 14520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:41:29,772-Speed 24992.04 samples/sec Loss 3.7823 LearningRate 0.0008 Epoch: 8 Global Step: 14530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:41:39,608-Speed 24989.08 samples/sec Loss 3.7869 LearningRate 0.0008 Epoch: 8 Global Step: 14540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:41:49,335-Speed 25267.75 samples/sec Loss 3.7520 LearningRate 0.0008 Epoch: 8 Global Step: 14550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:41:59,303-Speed 24663.76 samples/sec Loss 3.7739 LearningRate 0.0008 Epoch: 8 Global Step: 14560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:42:09,059-Speed 25194.16 samples/sec Loss 3.7741 LearningRate 0.0008 
Epoch: 8 Global Step: 14570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:42:19,071-Speed 24548.39 samples/sec Loss 3.7687 LearningRate 0.0008 Epoch: 8 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:42:29,261-Speed 24125.37 samples/sec Loss 3.7813 LearningRate 0.0008 Epoch: 8 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:42:39,451-Speed 24128.55 samples/sec Loss 3.8034 LearningRate 0.0008 Epoch: 8 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:42:49,512-Speed 24430.40 samples/sec Loss 3.7719 LearningRate 0.0008 Epoch: 8 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:42:59,588-Speed 24394.57 samples/sec Loss 3.7813 LearningRate 0.0008 Epoch: 8 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:43:09,660-Speed 24403.17 samples/sec Loss 3.7635 LearningRate 0.0008 Epoch: 8 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:43:19,747-Speed 24364.23 samples/sec Loss 3.7691 LearningRate 0.0008 Epoch: 8 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:43:29,898-Speed 24212.44 samples/sec Loss 3.7563 LearningRate 0.0008 Epoch: 8 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:43:39,952-Speed 24447.44 samples/sec Loss 3.7880 LearningRate 0.0008 Epoch: 8 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:43:50,150-Speed 24100.34 samples/sec Loss 3.7867 LearningRate 0.0008 Epoch: 8 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:44:00,236-Speed 24376.77 samples/sec Loss 3.7434 LearningRate 0.0008 Epoch: 8 Global Step: 14680 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-03-26 02:44:10,309-Speed 24400.31 samples/sec Loss 3.7230 LearningRate 0.0008 Epoch: 8 Global Step: 14690 Fp16 
Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:44:20,389-Speed 24382.19 samples/sec Loss 3.7192 LearningRate 0.0008 Epoch: 8 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:44:30,465-Speed 24392.41 samples/sec Loss 3.7501 LearningRate 0.0008 Epoch: 8 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:44:40,502-Speed 24487.07 samples/sec Loss 3.7627 LearningRate 0.0008 Epoch: 8 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:44:50,588-Speed 24368.43 samples/sec Loss 3.7497 LearningRate 0.0008 Epoch: 8 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:45:00,706-Speed 24293.43 samples/sec Loss 3.7436 LearningRate 0.0008 Epoch: 8 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:45:10,798-Speed 24354.35 samples/sec Loss 3.7419 LearningRate 0.0008 Epoch: 8 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:45:20,888-Speed 24357.15 samples/sec Loss 3.7032 LearningRate 0.0008 Epoch: 8 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:45:30,950-Speed 24426.61 samples/sec Loss 3.7331 LearningRate 0.0008 Epoch: 8 Global Step: 14770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:45:40,952-Speed 24575.21 samples/sec Loss 3.7303 LearningRate 0.0008 Epoch: 8 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:45:50,785-Speed 24993.52 samples/sec Loss 3.7397 LearningRate 0.0008 Epoch: 8 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:46:00,586-Speed 25078.60 samples/sec Loss 3.7116 LearningRate 0.0008 Epoch: 8 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:46:10,334-Speed 25216.87 samples/sec Loss 3.7671 LearningRate 0.0008 Epoch: 8 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 15 hours 
Training: 2022-03-26 02:46:20,093-Speed 25183.89 samples/sec Loss 3.7466 LearningRate 0.0008 Epoch: 8 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:46:29,823-Speed 25262.61 samples/sec Loss 3.7333 LearningRate 0.0008 Epoch: 8 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:46:39,578-Speed 25194.89 samples/sec Loss 3.7083 LearningRate 0.0008 Epoch: 8 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:46:49,405-Speed 25011.12 samples/sec Loss 3.6960 LearningRate 0.0008 Epoch: 8 Global Step: 14850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:46:59,104-Speed 25341.43 samples/sec Loss 3.7007 LearningRate 0.0008 Epoch: 8 Global Step: 14860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:47:08,878-Speed 25148.12 samples/sec Loss 3.7545 LearningRate 0.0008 Epoch: 8 Global Step: 14870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:47:18,667-Speed 25109.12 samples/sec Loss 3.7929 LearningRate 0.0008 Epoch: 8 Global Step: 14880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:47:28,540-Speed 24893.96 samples/sec Loss 3.7056 LearningRate 0.0008 Epoch: 8 Global Step: 14890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:47:38,368-Speed 25006.97 samples/sec Loss 3.6920 LearningRate 0.0008 Epoch: 8 Global Step: 14900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:47:48,138-Speed 25158.31 samples/sec Loss 3.7382 LearningRate 0.0008 Epoch: 8 Global Step: 14910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:47:57,898-Speed 25183.25 samples/sec Loss 3.7458 LearningRate 0.0008 Epoch: 8 Global Step: 14920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:48:07,699-Speed 25075.97 samples/sec Loss 3.6889 LearningRate 0.0008 Epoch: 8 Global Step: 14930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:48:17,471-Speed 
25152.72 samples/sec Loss 3.6965 LearningRate 0.0008 Epoch: 8 Global Step: 14940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:48:27,190-Speed 25289.41 samples/sec Loss 3.7136 LearningRate 0.0008 Epoch: 8 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:48:36,961-Speed 25153.41 samples/sec Loss 3.7224 LearningRate 0.0008 Epoch: 8 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:48:46,755-Speed 25094.24 samples/sec Loss 3.6817 LearningRate 0.0008 Epoch: 8 Global Step: 14970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:48:56,595-Speed 24980.00 samples/sec Loss 3.6657 LearningRate 0.0008 Epoch: 8 Global Step: 14980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:49:06,441-Speed 24963.01 samples/sec Loss 3.7167 LearningRate 0.0008 Epoch: 8 Global Step: 14990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:49:16,203-Speed 25178.10 samples/sec Loss 3.6801 LearningRate 0.0008 Epoch: 8 Global Step: 15000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:49:25,921-Speed 25291.08 samples/sec Loss 3.7028 LearningRate 0.0008 Epoch: 8 Global Step: 15010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:49:35,651-Speed 25257.63 samples/sec Loss 3.6936 LearningRate 0.0008 Epoch: 8 Global Step: 15020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:49:45,382-Speed 25260.09 samples/sec Loss 3.6745 LearningRate 0.0008 Epoch: 8 Global Step: 15030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:49:55,199-Speed 25034.20 samples/sec Loss 3.6887 LearningRate 0.0008 Epoch: 8 Global Step: 15040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:50:04,976-Speed 25139.79 samples/sec Loss 3.6938 LearningRate 0.0008 Epoch: 8 Global Step: 15050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:50:14,680-Speed 25327.32 samples/sec Loss 3.6447 
LearningRate 0.0008 Epoch: 8 Global Step: 15060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:50:24,511-Speed 25002.61 samples/sec Loss 3.7039 LearningRate 0.0008 Epoch: 8 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:50:34,209-Speed 25343.38 samples/sec Loss 3.7353 LearningRate 0.0008 Epoch: 8 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:50:44,120-Speed 24797.74 samples/sec Loss 3.7074 LearningRate 0.0008 Epoch: 8 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:50:53,896-Speed 25141.83 samples/sec Loss 3.6889 LearningRate 0.0008 Epoch: 8 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:51:03,675-Speed 25134.27 samples/sec Loss 3.7079 LearningRate 0.0008 Epoch: 8 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:51:13,435-Speed 25182.60 samples/sec Loss 3.7334 LearningRate 0.0008 Epoch: 8 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:51:23,229-Speed 25102.42 samples/sec Loss 3.6983 LearningRate 0.0008 Epoch: 8 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:51:32,997-Speed 25164.22 samples/sec Loss 3.6790 LearningRate 0.0008 Epoch: 8 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:51:42,801-Speed 25070.06 samples/sec Loss 3.6835 LearningRate 0.0008 Epoch: 8 Global Step: 15150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:51:52,699-Speed 24831.19 samples/sec Loss 3.6716 LearningRate 0.0008 Epoch: 8 Global Step: 15160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:52:02,406-Speed 25323.23 samples/sec Loss 3.6744 LearningRate 0.0008 Epoch: 8 Global Step: 15170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:52:12,179-Speed 25149.93 samples/sec Loss 3.6643 LearningRate 0.0008 Epoch: 8 Global 
Step: 15180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:52:21,948-Speed 25159.69 samples/sec Loss 3.6516 LearningRate 0.0008 Epoch: 8 Global Step: 15190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:52:31,666-Speed 25291.32 samples/sec Loss 3.6752 LearningRate 0.0008 Epoch: 8 Global Step: 15200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:52:41,474-Speed 25060.83 samples/sec Loss 3.6775 LearningRate 0.0008 Epoch: 8 Global Step: 15210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:52:51,209-Speed 25247.44 samples/sec Loss 3.7118 LearningRate 0.0008 Epoch: 8 Global Step: 15220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:53:01,075-Speed 24917.95 samples/sec Loss 3.6881 LearningRate 0.0008 Epoch: 8 Global Step: 15230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:53:10,782-Speed 25320.01 samples/sec Loss 3.6385 LearningRate 0.0008 Epoch: 8 Global Step: 15240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:53:20,596-Speed 25043.85 samples/sec Loss 3.6801 LearningRate 0.0007 Epoch: 8 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:53:30,365-Speed 25164.24 samples/sec Loss 3.6955 LearningRate 0.0007 Epoch: 8 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:53:40,058-Speed 25356.96 samples/sec Loss 3.6569 LearningRate 0.0007 Epoch: 8 Global Step: 15270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:53:49,877-Speed 25032.87 samples/sec Loss 3.6764 LearningRate 0.0007 Epoch: 8 Global Step: 15280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:53:59,635-Speed 25185.90 samples/sec Loss 3.6577 LearningRate 0.0007 Epoch: 8 Global Step: 15290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:54:09,389-Speed 25200.55 samples/sec Loss 3.6485 LearningRate 0.0007 Epoch: 8 Global Step: 15300 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-03-26 02:54:19,192-Speed 25071.86 samples/sec Loss 3.6554 LearningRate 0.0007 Epoch: 8 Global Step: 15310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:54:28,834-Speed 25491.21 samples/sec Loss 3.6525 LearningRate 0.0007 Epoch: 8 Global Step: 15320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:54:38,558-Speed 25278.12 samples/sec Loss 3.6484 LearningRate 0.0007 Epoch: 8 Global Step: 15330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:54:48,302-Speed 25227.55 samples/sec Loss 3.6362 LearningRate 0.0007 Epoch: 8 Global Step: 15340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:54:58,078-Speed 25144.65 samples/sec Loss 3.6567 LearningRate 0.0007 Epoch: 8 Global Step: 15350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:55:07,843-Speed 25169.14 samples/sec Loss 3.6487 LearningRate 0.0007 Epoch: 8 Global Step: 15360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:55:17,680-Speed 24988.08 samples/sec Loss 3.6287 LearningRate 0.0007 Epoch: 8 Global Step: 15370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:55:27,382-Speed 25335.95 samples/sec Loss 3.6575 LearningRate 0.0007 Epoch: 8 Global Step: 15380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:55:37,293-Speed 24800.64 samples/sec Loss 3.6451 LearningRate 0.0007 Epoch: 8 Global Step: 15390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:55:47,151-Speed 24932.64 samples/sec Loss 3.6442 LearningRate 0.0007 Epoch: 8 Global Step: 15400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:55:57,094-Speed 24718.86 samples/sec Loss 3.6499 LearningRate 0.0007 Epoch: 8 Global Step: 15410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:56:07,007-Speed 24793.07 samples/sec Loss 3.6162 LearningRate 0.0007 Epoch: 8 Global Step: 15420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 
02:56:16,697-Speed 25366.30 samples/sec Loss 3.6191 LearningRate 0.0007 Epoch: 8 Global Step: 15430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:56:26,412-Speed 25301.62 samples/sec Loss 3.6449 LearningRate 0.0007 Epoch: 8 Global Step: 15440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:56:36,142-Speed 25260.92 samples/sec Loss 3.6266 LearningRate 0.0007 Epoch: 8 Global Step: 15450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:56:45,903-Speed 25189.23 samples/sec Loss 3.6670 LearningRate 0.0007 Epoch: 8 Global Step: 15460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:56:55,689-Speed 25119.13 samples/sec Loss 3.6624 LearningRate 0.0007 Epoch: 8 Global Step: 15470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:57:05,525-Speed 24987.87 samples/sec Loss 3.6319 LearningRate 0.0007 Epoch: 8 Global Step: 15480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:57:15,274-Speed 25211.04 samples/sec Loss 3.6439 LearningRate 0.0007 Epoch: 8 Global Step: 15490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:57:25,097-Speed 25022.45 samples/sec Loss 3.6710 LearningRate 0.0007 Epoch: 8 Global Step: 15500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:57:35,067-Speed 24653.26 samples/sec Loss 3.6369 LearningRate 0.0007 Epoch: 8 Global Step: 15510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:57:44,932-Speed 24915.54 samples/sec Loss 3.6461 LearningRate 0.0007 Epoch: 8 Global Step: 15520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:57:54,701-Speed 25158.56 samples/sec Loss 3.6745 LearningRate 0.0007 Epoch: 8 Global Step: 15530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:58:04,518-Speed 25038.50 samples/sec Loss 3.6548 LearningRate 0.0007 Epoch: 8 Global Step: 15540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:58:14,379-Speed 24924.53 
samples/sec Loss 3.6490 LearningRate 0.0007 Epoch: 8 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:59:14,502-Speed 4087.72 samples/sec Loss 3.6441 LearningRate 0.0007 Epoch: 9 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:59:24,210-Speed 25320.27 samples/sec Loss 3.5868 LearningRate 0.0007 Epoch: 9 Global Step: 15570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 02:59:33,974-Speed 25172.88 samples/sec Loss 3.5680 LearningRate 0.0007 Epoch: 9 Global Step: 15580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:59:43,811-Speed 24985.22 samples/sec Loss 3.5681 LearningRate 0.0007 Epoch: 9 Global Step: 15590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 02:59:53,651-Speed 24979.20 samples/sec Loss 3.5854 LearningRate 0.0007 Epoch: 9 Global Step: 15600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:00:03,445-Speed 25101.45 samples/sec Loss 3.6064 LearningRate 0.0007 Epoch: 9 Global Step: 15610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:00:13,277-Speed 24998.75 samples/sec Loss 3.5929 LearningRate 0.0007 Epoch: 9 Global Step: 15620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:00:23,076-Speed 25083.46 samples/sec Loss 3.6022 LearningRate 0.0007 Epoch: 9 Global Step: 15630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:00:32,845-Speed 25160.80 samples/sec Loss 3.5510 LearningRate 0.0007 Epoch: 9 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:00:42,567-Speed 25282.12 samples/sec Loss 3.6098 LearningRate 0.0007 Epoch: 9 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:00:52,399-Speed 24998.20 samples/sec Loss 3.5605 LearningRate 0.0007 Epoch: 9 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:01:02,152-Speed 25206.25 samples/sec Loss 3.5943 LearningRate 
0.0007 Epoch: 9 Global Step: 15670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:01:11,820-Speed 25424.51 samples/sec Loss 3.6076 LearningRate 0.0007 Epoch: 9 Global Step: 15680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:01:21,562-Speed 25229.99 samples/sec Loss 3.5713 LearningRate 0.0007 Epoch: 9 Global Step: 15690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:01:31,241-Speed 25396.71 samples/sec Loss 3.5960 LearningRate 0.0007 Epoch: 9 Global Step: 15700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:01:40,977-Speed 25244.95 samples/sec Loss 3.5886 LearningRate 0.0007 Epoch: 9 Global Step: 15710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:01:50,685-Speed 25320.07 samples/sec Loss 3.5696 LearningRate 0.0007 Epoch: 9 Global Step: 15720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:02:00,458-Speed 25148.55 samples/sec Loss 3.5917 LearningRate 0.0007 Epoch: 9 Global Step: 15730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:02:10,341-Speed 24870.09 samples/sec Loss 3.6016 LearningRate 0.0007 Epoch: 9 Global Step: 15740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:02:20,057-Speed 25298.53 samples/sec Loss 3.6090 LearningRate 0.0007 Epoch: 9 Global Step: 15750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:02:29,957-Speed 24825.32 samples/sec Loss 3.6045 LearningRate 0.0007 Epoch: 9 Global Step: 15760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:02:39,779-Speed 25026.66 samples/sec Loss 3.5775 LearningRate 0.0007 Epoch: 9 Global Step: 15770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:02:49,536-Speed 25189.44 samples/sec Loss 3.5943 LearningRate 0.0007 Epoch: 9 Global Step: 15780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:02:59,326-Speed 25106.57 samples/sec Loss 3.5538 LearningRate 0.0007 Epoch: 9 Global Step: 15790 Fp16 
Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:03:09,107-Speed 25129.83 samples/sec Loss 3.5863 LearningRate 0.0007 Epoch: 9 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:03:19,009-Speed 24822.40 samples/sec Loss 3.5850 LearningRate 0.0007 Epoch: 9 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:03:28,994-Speed 24617.53 samples/sec Loss 3.5695 LearningRate 0.0007 Epoch: 9 Global Step: 15820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:03:38,878-Speed 24867.27 samples/sec Loss 3.5683 LearningRate 0.0007 Epoch: 9 Global Step: 15830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:03:48,551-Speed 25409.26 samples/sec Loss 3.5756 LearningRate 0.0007 Epoch: 9 Global Step: 15840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:03:58,266-Speed 25308.00 samples/sec Loss 3.5401 LearningRate 0.0007 Epoch: 9 Global Step: 15850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:04:08,051-Speed 25119.21 samples/sec Loss 3.5696 LearningRate 0.0007 Epoch: 9 Global Step: 15860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:04:17,860-Speed 25057.85 samples/sec Loss 3.6129 LearningRate 0.0007 Epoch: 9 Global Step: 15870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:04:27,688-Speed 25011.32 samples/sec Loss 3.6185 LearningRate 0.0007 Epoch: 9 Global Step: 15880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:04:37,429-Speed 25231.30 samples/sec Loss 3.5756 LearningRate 0.0007 Epoch: 9 Global Step: 15890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:04:47,229-Speed 25079.49 samples/sec Loss 3.5473 LearningRate 0.0007 Epoch: 9 Global Step: 15900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:04:56,981-Speed 25206.56 samples/sec Loss 3.5526 LearningRate 0.0007 Epoch: 9 Global Step: 15910 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-03-26 03:05:06,736-Speed 25197.36 samples/sec Loss 3.5999 LearningRate 0.0007 Epoch: 9 Global Step: 15920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:05:16,551-Speed 25041.76 samples/sec Loss 3.5770 LearningRate 0.0007 Epoch: 9 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:05:26,315-Speed 25172.86 samples/sec Loss 3.6052 LearningRate 0.0007 Epoch: 9 Global Step: 15940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:05:36,150-Speed 24992.71 samples/sec Loss 3.5901 LearningRate 0.0007 Epoch: 9 Global Step: 15950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:05:45,905-Speed 25198.66 samples/sec Loss 3.5432 LearningRate 0.0007 Epoch: 9 Global Step: 15960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:05:55,751-Speed 24964.61 samples/sec Loss 3.5398 LearningRate 0.0007 Epoch: 9 Global Step: 15970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:06:05,673-Speed 24773.63 samples/sec Loss 3.5431 LearningRate 0.0007 Epoch: 9 Global Step: 15980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:06:15,436-Speed 25176.95 samples/sec Loss 3.5514 LearningRate 0.0007 Epoch: 9 Global Step: 15990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:06:25,240-Speed 25070.74 samples/sec Loss 3.5379 LearningRate 0.0007 Epoch: 9 Global Step: 16000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:06:35,034-Speed 25097.09 samples/sec Loss 3.5309 LearningRate 0.0007 Epoch: 9 Global Step: 16010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:06:44,881-Speed 24961.15 samples/sec Loss 3.5691 LearningRate 0.0007 Epoch: 9 Global Step: 16020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:06:54,755-Speed 24893.28 samples/sec Loss 3.5359 LearningRate 0.0007 Epoch: 9 Global Step: 16030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:07:04,587-Speed 
24999.47 samples/sec Loss 3.5690 LearningRate 0.0007 Epoch: 9 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:07:14,341-Speed 25201.69 samples/sec Loss 3.5378 LearningRate 0.0007 Epoch: 9 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:07:24,132-Speed 25104.14 samples/sec Loss 3.5177 LearningRate 0.0007 Epoch: 9 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:07:33,922-Speed 25106.00 samples/sec Loss 3.5730 LearningRate 0.0007 Epoch: 9 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:07:43,723-Speed 25078.82 samples/sec Loss 3.5489 LearningRate 0.0007 Epoch: 9 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:07:53,534-Speed 25054.51 samples/sec Loss 3.5346 LearningRate 0.0007 Epoch: 9 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:08:03,279-Speed 25222.09 samples/sec Loss 3.5339 LearningRate 0.0007 Epoch: 9 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:08:13,121-Speed 24972.74 samples/sec Loss 3.5019 LearningRate 0.0007 Epoch: 9 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:08:22,858-Speed 25248.60 samples/sec Loss 3.5400 LearningRate 0.0007 Epoch: 9 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:08:32,705-Speed 24962.02 samples/sec Loss 3.5584 LearningRate 0.0007 Epoch: 9 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:08:42,583-Speed 24883.86 samples/sec Loss 3.5348 LearningRate 0.0007 Epoch: 9 Global Step: 16140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:08:52,303-Speed 25287.61 samples/sec Loss 3.5363 LearningRate 0.0007 Epoch: 9 Global Step: 16150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:09:02,093-Speed 25110.43 samples/sec Loss 3.5282 
LearningRate 0.0007 Epoch: 9 Global Step: 16160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:09:11,920-Speed 25011.38 samples/sec Loss 3.5282 LearningRate 0.0007 Epoch: 9 Global Step: 16170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:09:21,674-Speed 25197.78 samples/sec Loss 3.5182 LearningRate 0.0007 Epoch: 9 Global Step: 16180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:09:31,571-Speed 24834.81 samples/sec Loss 3.5348 LearningRate 0.0007 Epoch: 9 Global Step: 16190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:09:41,486-Speed 24791.23 samples/sec Loss 3.5445 LearningRate 0.0007 Epoch: 9 Global Step: 16200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:09:51,179-Speed 25356.99 samples/sec Loss 3.5194 LearningRate 0.0007 Epoch: 9 Global Step: 16210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:10:00,885-Speed 25321.73 samples/sec Loss 3.5180 LearningRate 0.0007 Epoch: 9 Global Step: 16220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:10:10,661-Speed 25142.62 samples/sec Loss 3.5372 LearningRate 0.0007 Epoch: 9 Global Step: 16230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:10:20,504-Speed 24972.12 samples/sec Loss 3.5486 LearningRate 0.0007 Epoch: 9 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:10:30,314-Speed 25055.74 samples/sec Loss 3.5492 LearningRate 0.0007 Epoch: 9 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:10:40,071-Speed 25190.99 samples/sec Loss 3.5013 LearningRate 0.0007 Epoch: 9 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:10:49,872-Speed 25075.99 samples/sec Loss 3.5129 LearningRate 0.0007 Epoch: 9 Global Step: 16270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:10:59,540-Speed 25425.83 samples/sec Loss 3.5030 LearningRate 0.0007 Epoch: 9 Global 
Step: 16280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:11:09,362-Speed 25024.22 samples/sec Loss 3.5164 LearningRate 0.0007 Epoch: 9 Global Step: 16290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:11:19,175-Speed 25048.14 samples/sec Loss 3.5519 LearningRate 0.0007 Epoch: 9 Global Step: 16300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:11:28,973-Speed 25086.62 samples/sec Loss 3.5333 LearningRate 0.0007 Epoch: 9 Global Step: 16310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:11:38,666-Speed 25356.26 samples/sec Loss 3.5194 LearningRate 0.0007 Epoch: 9 Global Step: 16320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:11:48,468-Speed 25076.75 samples/sec Loss 3.5213 LearningRate 0.0007 Epoch: 9 Global Step: 16330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:11:58,151-Speed 25383.77 samples/sec Loss 3.5094 LearningRate 0.0007 Epoch: 9 Global Step: 16340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:12:07,946-Speed 25093.16 samples/sec Loss 3.5254 LearningRate 0.0007 Epoch: 9 Global Step: 16350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:12:17,733-Speed 25115.26 samples/sec Loss 3.4925 LearningRate 0.0007 Epoch: 9 Global Step: 16360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:12:27,436-Speed 25330.74 samples/sec Loss 3.4788 LearningRate 0.0007 Epoch: 9 Global Step: 16370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:12:37,176-Speed 25237.49 samples/sec Loss 3.4760 LearningRate 0.0007 Epoch: 9 Global Step: 16380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:12:46,974-Speed 25085.88 samples/sec Loss 3.5000 LearningRate 0.0007 Epoch: 9 Global Step: 16390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:12:56,936-Speed 24674.46 samples/sec Loss 3.4841 LearningRate 0.0007 Epoch: 9 Global Step: 16400 Fp16 Grad Scale: 32768 
Required: 15 hours Training: 2022-03-26 03:13:06,758-Speed 25024.22 samples/sec Loss 3.4826 LearningRate 0.0007 Epoch: 9 Global Step: 16410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:13:16,565-Speed 25063.46 samples/sec Loss 3.4806 LearningRate 0.0007 Epoch: 9 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:13:26,481-Speed 24786.46 samples/sec Loss 3.5020 LearningRate 0.0007 Epoch: 9 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:13:36,423-Speed 24723.35 samples/sec Loss 3.5087 LearningRate 0.0007 Epoch: 9 Global Step: 16440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:13:46,273-Speed 24954.28 samples/sec Loss 3.4854 LearningRate 0.0007 Epoch: 9 Global Step: 16450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:13:56,199-Speed 24766.14 samples/sec Loss 3.5074 LearningRate 0.0007 Epoch: 9 Global Step: 16460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:14:05,993-Speed 25102.01 samples/sec Loss 3.4830 LearningRate 0.0007 Epoch: 9 Global Step: 16470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:14:15,812-Speed 25037.89 samples/sec Loss 3.5128 LearningRate 0.0007 Epoch: 9 Global Step: 16480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:14:25,642-Speed 25005.62 samples/sec Loss 3.5047 LearningRate 0.0007 Epoch: 9 Global Step: 16490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:14:35,431-Speed 25108.51 samples/sec Loss 3.4812 LearningRate 0.0007 Epoch: 9 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:14:45,246-Speed 25039.92 samples/sec Loss 3.4815 LearningRate 0.0007 Epoch: 9 Global Step: 16510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:14:55,040-Speed 25097.54 samples/sec Loss 3.4829 LearningRate 0.0007 Epoch: 9 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 
03:15:04,923-Speed 24877.48 samples/sec Loss 3.4708 LearningRate 0.0007 Epoch: 9 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:15:14,684-Speed 25181.96 samples/sec Loss 3.4577 LearningRate 0.0007 Epoch: 9 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:15:24,500-Speed 25040.27 samples/sec Loss 3.5109 LearningRate 0.0007 Epoch: 9 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:15:34,323-Speed 25023.32 samples/sec Loss 3.4780 LearningRate 0.0007 Epoch: 9 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:15:44,118-Speed 25092.32 samples/sec Loss 3.4836 LearningRate 0.0007 Epoch: 9 Global Step: 16570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:15:53,980-Speed 24923.32 samples/sec Loss 3.4636 LearningRate 0.0007 Epoch: 9 Global Step: 16580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:16:03,767-Speed 25115.45 samples/sec Loss 3.4982 LearningRate 0.0007 Epoch: 9 Global Step: 16590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:16:13,500-Speed 25253.91 samples/sec Loss 3.4680 LearningRate 0.0007 Epoch: 9 Global Step: 16600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:16:23,364-Speed 24918.26 samples/sec Loss 3.4590 LearningRate 0.0007 Epoch: 9 Global Step: 16610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:16:33,366-Speed 24574.61 samples/sec Loss 3.4504 LearningRate 0.0007 Epoch: 9 Global Step: 16620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:16:43,309-Speed 24721.29 samples/sec Loss 3.4552 LearningRate 0.0007 Epoch: 9 Global Step: 16630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:16:53,201-Speed 24849.37 samples/sec Loss 3.4503 LearningRate 0.0007 Epoch: 9 Global Step: 16640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:17:02,981-Speed 25131.00 samples/sec 
Loss 3.4515 LearningRate 0.0007 Epoch: 9 Global Step: 16650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:17:12,791-Speed 25058.41 samples/sec Loss 3.4716 LearningRate 0.0007 Epoch: 9 Global Step: 16660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:17:22,526-Speed 25247.96 samples/sec Loss 3.4429 LearningRate 0.0007 Epoch: 9 Global Step: 16670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:17:32,422-Speed 24837.55 samples/sec Loss 3.4596 LearningRate 0.0007 Epoch: 9 Global Step: 16680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:17:42,351-Speed 24755.07 samples/sec Loss 3.4629 LearningRate 0.0007 Epoch: 9 Global Step: 16690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:17:52,301-Speed 24701.66 samples/sec Loss 3.4979 LearningRate 0.0007 Epoch: 9 Global Step: 16700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:18:02,052-Speed 25208.27 samples/sec Loss 3.4627 LearningRate 0.0007 Epoch: 9 Global Step: 16710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:18:11,871-Speed 25033.37 samples/sec Loss 3.4730 LearningRate 0.0007 Epoch: 9 Global Step: 16720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:18:21,706-Speed 24991.66 samples/sec Loss 3.4408 LearningRate 0.0007 Epoch: 9 Global Step: 16730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:18:31,473-Speed 25165.78 samples/sec Loss 3.4367 LearningRate 0.0007 Epoch: 9 Global Step: 16740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:18:41,338-Speed 24914.58 samples/sec Loss 3.4400 LearningRate 0.0007 Epoch: 9 Global Step: 16750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:18:51,155-Speed 25036.57 samples/sec Loss 3.4462 LearningRate 0.0007 Epoch: 9 Global Step: 16760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:19:01,020-Speed 24914.99 samples/sec Loss 3.4308 LearningRate 0.0007 Epoch: 9 
Global Step: 16770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:19:10,823-Speed 25073.46 samples/sec Loss 3.4172 LearningRate 0.0007 Epoch: 9 Global Step: 16780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:19:20,567-Speed 25227.43 samples/sec Loss 3.4444 LearningRate 0.0007 Epoch: 9 Global Step: 16790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:19:30,312-Speed 25223.20 samples/sec Loss 3.4331 LearningRate 0.0007 Epoch: 9 Global Step: 16800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:19:40,167-Speed 24942.23 samples/sec Loss 3.4567 LearningRate 0.0007 Epoch: 9 Global Step: 16810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:19:50,026-Speed 24929.52 samples/sec Loss 3.4350 LearningRate 0.0007 Epoch: 9 Global Step: 16820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:19:59,727-Speed 25335.86 samples/sec Loss 3.4625 LearningRate 0.0007 Epoch: 9 Global Step: 16830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:20:09,461-Speed 25252.64 samples/sec Loss 3.4191 LearningRate 0.0007 Epoch: 9 Global Step: 16840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:20:19,177-Speed 25298.51 samples/sec Loss 3.4326 LearningRate 0.0007 Epoch: 9 Global Step: 16850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:20:28,845-Speed 25422.98 samples/sec Loss 3.4223 LearningRate 0.0007 Epoch: 9 Global Step: 16860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-26 03:20:38,656-Speed 25052.15 samples/sec Loss 3.4457 LearningRate 0.0007 Epoch: 9 Global Step: 16870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:20:48,389-Speed 25254.31 samples/sec Loss 3.4540 LearningRate 0.0007 Epoch: 9 Global Step: 16880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:20:58,237-Speed 24958.17 samples/sec Loss 3.4429 LearningRate 0.0007 Epoch: 9 Global Step: 16890 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-03-26 03:21:08,115-Speed 24882.00 samples/sec Loss 3.3960 LearningRate 0.0007 Epoch: 9 Global Step: 16900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:21:17,854-Speed 25237.47 samples/sec Loss 3.3961 LearningRate 0.0007 Epoch: 9 Global Step: 16910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:21:27,608-Speed 25199.17 samples/sec Loss 3.4239 LearningRate 0.0007 Epoch: 9 Global Step: 16920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:21:37,397-Speed 25108.68 samples/sec Loss 3.4695 LearningRate 0.0007 Epoch: 9 Global Step: 16930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:21:47,207-Speed 25053.68 samples/sec Loss 3.4916 LearningRate 0.0007 Epoch: 9 Global Step: 16940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:21:56,990-Speed 25126.37 samples/sec Loss 3.4355 LearningRate 0.0007 Epoch: 9 Global Step: 16950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:22:06,818-Speed 25009.85 samples/sec Loss 3.4355 LearningRate 0.0007 Epoch: 9 Global Step: 16960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:22:16,628-Speed 25056.85 samples/sec Loss 3.4139 LearningRate 0.0007 Epoch: 9 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:22:26,467-Speed 24982.79 samples/sec Loss 3.4442 LearningRate 0.0007 Epoch: 9 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:22:36,383-Speed 24786.72 samples/sec Loss 3.4101 LearningRate 0.0007 Epoch: 9 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:22:46,162-Speed 25135.84 samples/sec Loss 3.4243 LearningRate 0.0007 Epoch: 9 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:22:56,144-Speed 24622.89 samples/sec Loss 3.4181 LearningRate 0.0007 Epoch: 9 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 
03:23:05,923-Speed 25136.28 samples/sec Loss 3.4280 LearningRate 0.0007 Epoch: 9 Global Step: 17020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:23:15,695-Speed 25152.91 samples/sec Loss 3.4091 LearningRate 0.0007 Epoch: 9 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:23:25,564-Speed 24902.95 samples/sec Loss 3.4289 LearningRate 0.0007 Epoch: 9 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:23:35,293-Speed 25276.12 samples/sec Loss 3.4450 LearningRate 0.0007 Epoch: 9 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:23:44,966-Speed 25410.19 samples/sec Loss 3.4087 LearningRate 0.0007 Epoch: 9 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:23:54,893-Speed 24763.58 samples/sec Loss 3.4361 LearningRate 0.0007 Epoch: 9 Global Step: 17070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:24:04,658-Speed 25178.92 samples/sec Loss 3.4022 LearningRate 0.0007 Epoch: 9 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:24:14,525-Speed 24909.85 samples/sec Loss 3.4006 LearningRate 0.0007 Epoch: 9 Global Step: 17090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:24:24,345-Speed 25031.08 samples/sec Loss 3.3976 LearningRate 0.0007 Epoch: 9 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:24:34,029-Speed 25379.81 samples/sec Loss 3.4293 LearningRate 0.0007 Epoch: 9 Global Step: 17110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:24:43,830-Speed 25078.66 samples/sec Loss 3.4243 LearningRate 0.0007 Epoch: 9 Global Step: 17120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:24:53,527-Speed 25350.30 samples/sec Loss 3.4059 LearningRate 0.0007 Epoch: 9 Global Step: 17130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:25:03,266-Speed 25235.48 samples/sec 
Loss 3.4012 LearningRate 0.0007 Epoch: 9 Global Step: 17140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:25:12,990-Speed 25278.28 samples/sec Loss 3.4173 LearningRate 0.0007 Epoch: 9 Global Step: 17150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:25:22,733-Speed 25228.29 samples/sec Loss 3.4285 LearningRate 0.0007 Epoch: 9 Global Step: 17160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:25:32,537-Speed 25069.58 samples/sec Loss 3.3880 LearningRate 0.0007 Epoch: 9 Global Step: 17170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:25:42,257-Speed 25286.28 samples/sec Loss 3.3971 LearningRate 0.0007 Epoch: 9 Global Step: 17180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:25:52,024-Speed 25168.95 samples/sec Loss 3.4017 LearningRate 0.0007 Epoch: 9 Global Step: 17190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:26:01,854-Speed 25004.53 samples/sec Loss 3.4059 LearningRate 0.0007 Epoch: 9 Global Step: 17200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:26:11,627-Speed 25148.94 samples/sec Loss 3.4134 LearningRate 0.0007 Epoch: 9 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:26:21,376-Speed 25212.39 samples/sec Loss 3.4347 LearningRate 0.0007 Epoch: 9 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:26:31,434-Speed 24437.21 samples/sec Loss 3.3914 LearningRate 0.0007 Epoch: 9 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:26:41,288-Speed 24942.42 samples/sec Loss 3.4047 LearningRate 0.0007 Epoch: 9 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:26:51,254-Speed 24665.04 samples/sec Loss 3.4236 LearningRate 0.0007 Epoch: 9 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:27:01,081-Speed 25011.22 samples/sec Loss 3.4293 LearningRate 0.0007 Epoch: 
9 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:27:10,849-Speed 25161.59 samples/sec Loss 3.4041 LearningRate 0.0007 Epoch: 9 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:27:20,597-Speed 25215.35 samples/sec Loss 3.4245 LearningRate 0.0007 Epoch: 9 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:28:20,868-Speed 4077.67 samples/sec Loss 3.4010 LearningRate 0.0007 Epoch: 10 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:28:30,592-Speed 25277.36 samples/sec Loss 3.3847 LearningRate 0.0007 Epoch: 10 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:28:40,276-Speed 25381.63 samples/sec Loss 3.3996 LearningRate 0.0007 Epoch: 10 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:28:49,969-Speed 25360.91 samples/sec Loss 3.3427 LearningRate 0.0007 Epoch: 10 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:28:59,689-Speed 25285.04 samples/sec Loss 3.3312 LearningRate 0.0007 Epoch: 10 Global Step: 17330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:29:09,393-Speed 25331.47 samples/sec Loss 3.3622 LearningRate 0.0007 Epoch: 10 Global Step: 17340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:29:19,184-Speed 25103.20 samples/sec Loss 3.3440 LearningRate 0.0007 Epoch: 10 Global Step: 17350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:29:28,970-Speed 25116.90 samples/sec Loss 3.3255 LearningRate 0.0007 Epoch: 10 Global Step: 17360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:29:38,617-Speed 25479.29 samples/sec Loss 3.3218 LearningRate 0.0007 Epoch: 10 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:29:48,546-Speed 24754.19 samples/sec Loss 3.3123 LearningRate 0.0007 Epoch: 10 Global Step: 17380 Fp16 
Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:29:58,346-Speed 25083.36 samples/sec Loss 3.3355 LearningRate 0.0007 Epoch: 10 Global Step: 17390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:30:08,163-Speed 25036.11 samples/sec Loss 3.3693 LearningRate 0.0007 Epoch: 10 Global Step: 17400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:30:17,881-Speed 25293.00 samples/sec Loss 3.3675 LearningRate 0.0007 Epoch: 10 Global Step: 17410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:30:27,532-Speed 25468.95 samples/sec Loss 3.3587 LearningRate 0.0007 Epoch: 10 Global Step: 17420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:30:37,325-Speed 25097.89 samples/sec Loss 3.3456 LearningRate 0.0007 Epoch: 10 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:30:47,159-Speed 24994.10 samples/sec Loss 3.3230 LearningRate 0.0007 Epoch: 10 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:30:56,968-Speed 25059.35 samples/sec Loss 3.3526 LearningRate 0.0007 Epoch: 10 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-03-26 03:31:06,718-Speed 25208.27 samples/sec Loss 3.3664 LearningRate 0.0007 Epoch: 10 Global Step: 17460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:31:16,422-Speed 25329.24 samples/sec Loss 3.3669 LearningRate 0.0007 Epoch: 10 Global Step: 17470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:31:26,178-Speed 25194.55 samples/sec Loss 3.4040 LearningRate 0.0007 Epoch: 10 Global Step: 17480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:31:35,917-Speed 25235.10 samples/sec Loss 3.3845 LearningRate 0.0007 Epoch: 10 Global Step: 17490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:31:45,642-Speed 25275.42 samples/sec Loss 3.3497 LearningRate 0.0007 Epoch: 10 Global Step: 17500 Fp16 Grad Scale: 65536 Required: 
15 hours Training: 2022-03-26 03:31:55,396-Speed 25199.84 samples/sec Loss 3.3584 LearningRate 0.0007 Epoch: 10 Global Step: 17510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:32:05,180-Speed 25122.52 samples/sec Loss 3.3462 LearningRate 0.0007 Epoch: 10 Global Step: 17520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:32:14,833-Speed 25461.22 samples/sec Loss 3.3728 LearningRate 0.0007 Epoch: 10 Global Step: 17530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-26 03:32:24,564-Speed 25263.00 samples/sec Loss 3.3789 LearningRate 0.0007 Epoch: 10 Global Step: 17540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:32:34,346-Speed 25125.93 samples/sec Loss 3.3753 LearningRate 0.0007 Epoch: 10 Global Step: 17550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:32:44,110-Speed 25172.08 samples/sec Loss 3.3254 LearningRate 0.0007 Epoch: 10 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:32:53,859-Speed 25214.35 samples/sec Loss 3.3272 LearningRate 0.0007 Epoch: 10 Global Step: 17570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:33:03,572-Speed 25303.90 samples/sec Loss 3.3557 LearningRate 0.0007 Epoch: 10 Global Step: 17580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:33:13,258-Speed 25377.60 samples/sec Loss 3.3413 LearningRate 0.0007 Epoch: 10 Global Step: 17590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:33:22,973-Speed 25298.31 samples/sec Loss 3.3503 LearningRate 0.0007 Epoch: 10 Global Step: 17600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:33:32,788-Speed 25043.75 samples/sec Loss 3.3338 LearningRate 0.0007 Epoch: 10 Global Step: 17610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:33:42,500-Speed 25305.63 samples/sec Loss 3.3816 LearningRate 0.0007 Epoch: 10 Global Step: 17620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 
03:33:52,211-Speed 25312.78 samples/sec Loss 3.3521 LearningRate 0.0007 Epoch: 10 Global Step: 17630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:34:01,965-Speed 25199.25 samples/sec Loss 3.3448 LearningRate 0.0007 Epoch: 10 Global Step: 17640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:34:11,643-Speed 25395.94 samples/sec Loss 3.3006 LearningRate 0.0007 Epoch: 10 Global Step: 17650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:34:21,342-Speed 25340.79 samples/sec Loss 3.3256 LearningRate 0.0007 Epoch: 10 Global Step: 17660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:34:31,117-Speed 25144.57 samples/sec Loss 3.3202 LearningRate 0.0007 Epoch: 10 Global Step: 17670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:34:40,823-Speed 25323.55 samples/sec Loss 3.3547 LearningRate 0.0007 Epoch: 10 Global Step: 17680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:34:50,525-Speed 25333.80 samples/sec Loss 3.3530 LearningRate 0.0007 Epoch: 10 Global Step: 17690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:35:00,295-Speed 25161.30 samples/sec Loss 3.3330 LearningRate 0.0007 Epoch: 10 Global Step: 17700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:35:10,045-Speed 25209.49 samples/sec Loss 3.3416 LearningRate 0.0007 Epoch: 10 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:35:19,726-Speed 25390.06 samples/sec Loss 3.3223 LearningRate 0.0007 Epoch: 10 Global Step: 17720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:35:29,432-Speed 25323.34 samples/sec Loss 3.3367 LearningRate 0.0007 Epoch: 10 Global Step: 17730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:35:39,207-Speed 25146.60 samples/sec Loss 3.3304 LearningRate 0.0007 Epoch: 10 Global Step: 17740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:35:48,932-Speed 
25274.45 samples/sec Loss 3.3243 LearningRate 0.0007 Epoch: 10 Global Step: 17750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:35:58,689-Speed 25195.80 samples/sec Loss 3.3512 LearningRate 0.0007 Epoch: 10 Global Step: 17760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:36:08,502-Speed 25049.14 samples/sec Loss 3.3303 LearningRate 0.0007 Epoch: 10 Global Step: 17770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:36:18,305-Speed 25072.42 samples/sec Loss 3.3399 LearningRate 0.0007 Epoch: 10 Global Step: 17780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:36:28,086-Speed 25131.47 samples/sec Loss 3.3393 LearningRate 0.0007 Epoch: 10 Global Step: 17790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:36:37,751-Speed 25436.66 samples/sec Loss 3.3541 LearningRate 0.0007 Epoch: 10 Global Step: 17800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:36:47,526-Speed 25148.76 samples/sec Loss 3.3173 LearningRate 0.0007 Epoch: 10 Global Step: 17810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:36:57,294-Speed 25165.65 samples/sec Loss 3.3198 LearningRate 0.0007 Epoch: 10 Global Step: 17820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:37:07,265-Speed 24649.81 samples/sec Loss 3.3063 LearningRate 0.0007 Epoch: 10 Global Step: 17830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:37:17,037-Speed 25153.02 samples/sec Loss 3.3223 LearningRate 0.0007 Epoch: 10 Global Step: 17840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:37:26,739-Speed 25334.69 samples/sec Loss 3.2830 LearningRate 0.0007 Epoch: 10 Global Step: 17850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:37:36,484-Speed 25221.43 samples/sec Loss 3.3148 LearningRate 0.0007 Epoch: 10 Global Step: 17860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:37:46,300-Speed 25041.70 samples/sec Loss 
3.3254 LearningRate 0.0007 Epoch: 10 Global Step: 17870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:37:55,975-Speed 25405.10 samples/sec Loss 3.2995 LearningRate 0.0007 Epoch: 10 Global Step: 17880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:38:05,774-Speed 25080.81 samples/sec Loss 3.3011 LearningRate 0.0007 Epoch: 10 Global Step: 17890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:38:15,577-Speed 25073.38 samples/sec Loss 3.3065 LearningRate 0.0007 Epoch: 10 Global Step: 17900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:38:25,327-Speed 25209.26 samples/sec Loss 3.2956 LearningRate 0.0007 Epoch: 10 Global Step: 17910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:38:35,090-Speed 25177.93 samples/sec Loss 3.3015 LearningRate 0.0007 Epoch: 10 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:38:44,948-Speed 24932.38 samples/sec Loss 3.3092 LearningRate 0.0007 Epoch: 10 Global Step: 17930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:38:54,848-Speed 24827.35 samples/sec Loss 3.3309 LearningRate 0.0007 Epoch: 10 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:39:04,581-Speed 25252.95 samples/sec Loss 3.2962 LearningRate 0.0007 Epoch: 10 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:39:14,303-Speed 25283.70 samples/sec Loss 3.3084 LearningRate 0.0007 Epoch: 10 Global Step: 17960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:39:24,089-Speed 25117.52 samples/sec Loss 3.2876 LearningRate 0.0007 Epoch: 10 Global Step: 17970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:39:33,855-Speed 25168.40 samples/sec Loss 3.3003 LearningRate 0.0007 Epoch: 10 Global Step: 17980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:39:43,678-Speed 25024.34 samples/sec Loss 3.2913 LearningRate 0.0007 
Epoch: 10 Global Step: 17990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:39:53,379-Speed 25338.83 samples/sec Loss 3.3282 LearningRate 0.0007 Epoch: 10 Global Step: 18000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:40:03,012-Speed 25515.05 samples/sec Loss 3.2915 LearningRate 0.0007 Epoch: 10 Global Step: 18010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:40:12,817-Speed 25069.49 samples/sec Loss 3.2798 LearningRate 0.0007 Epoch: 10 Global Step: 18020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:40:22,515-Speed 25343.93 samples/sec Loss 3.2742 LearningRate 0.0007 Epoch: 10 Global Step: 18030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:40:32,333-Speed 25033.50 samples/sec Loss 3.3120 LearningRate 0.0007 Epoch: 10 Global Step: 18040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:40:42,231-Speed 24833.97 samples/sec Loss 3.3008 LearningRate 0.0007 Epoch: 10 Global Step: 18050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:40:52,047-Speed 25039.97 samples/sec Loss 3.2849 LearningRate 0.0007 Epoch: 10 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:41:01,820-Speed 25151.55 samples/sec Loss 3.3317 LearningRate 0.0007 Epoch: 10 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:41:11,636-Speed 25044.08 samples/sec Loss 3.2941 LearningRate 0.0007 Epoch: 10 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:41:21,304-Speed 25422.89 samples/sec Loss 3.2860 LearningRate 0.0007 Epoch: 10 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:41:30,977-Speed 25411.27 samples/sec Loss 3.2750 LearningRate 0.0007 Epoch: 10 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:41:40,654-Speed 25400.02 samples/sec Loss 3.2898 LearningRate 0.0007 Epoch: 10 Global Step: 18110 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:41:50,372-Speed 25292.49 samples/sec Loss 3.2869 LearningRate 0.0007 Epoch: 10 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:42:00,061-Speed 25369.70 samples/sec Loss 3.3113 LearningRate 0.0007 Epoch: 10 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:42:09,813-Speed 25205.91 samples/sec Loss 3.2555 LearningRate 0.0007 Epoch: 10 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:42:19,573-Speed 25183.05 samples/sec Loss 3.2869 LearningRate 0.0007 Epoch: 10 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:42:29,288-Speed 25300.90 samples/sec Loss 3.2910 LearningRate 0.0007 Epoch: 10 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:42:39,023-Speed 25251.14 samples/sec Loss 3.2579 LearningRate 0.0007 Epoch: 10 Global Step: 18170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:42:48,706-Speed 25385.19 samples/sec Loss 3.2653 LearningRate 0.0007 Epoch: 10 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:42:58,478-Speed 25152.72 samples/sec Loss 3.2926 LearningRate 0.0007 Epoch: 10 Global Step: 18190 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:43:08,210-Speed 25257.13 samples/sec Loss 3.2797 LearningRate 0.0007 Epoch: 10 Global Step: 18200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:43:17,982-Speed 25151.16 samples/sec Loss 3.2819 LearningRate 0.0007 Epoch: 10 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:43:27,753-Speed 25155.48 samples/sec Loss 3.2740 LearningRate 0.0007 Epoch: 10 Global Step: 18220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:43:37,432-Speed 25394.77 samples/sec Loss 3.2923 LearningRate 0.0007 Epoch: 10 Global Step: 18230 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-03-26 03:43:47,127-Speed 25351.31 samples/sec Loss 3.2541 LearningRate 0.0007 Epoch: 10 Global Step: 18240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:43:56,832-Speed 25329.24 samples/sec Loss 3.2437 LearningRate 0.0007 Epoch: 10 Global Step: 18250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:44:06,608-Speed 25140.70 samples/sec Loss 3.2610 LearningRate 0.0007 Epoch: 10 Global Step: 18260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:44:16,364-Speed 25195.04 samples/sec Loss 3.2829 LearningRate 0.0007 Epoch: 10 Global Step: 18270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:44:26,232-Speed 24906.71 samples/sec Loss 3.2585 LearningRate 0.0007 Epoch: 10 Global Step: 18280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:44:35,936-Speed 25330.54 samples/sec Loss 3.2616 LearningRate 0.0007 Epoch: 10 Global Step: 18290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:44:45,742-Speed 25066.54 samples/sec Loss 3.2786 LearningRate 0.0007 Epoch: 10 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-03-26 03:44:55,520-Speed 25136.43 samples/sec Loss 3.3015 LearningRate 0.0007 Epoch: 10 Global Step: 18310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:45:05,323-Speed 25072.20 samples/sec Loss 3.2509 LearningRate 0.0007 Epoch: 10 Global Step: 18320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:45:15,091-Speed 25164.17 samples/sec Loss 3.2604 LearningRate 0.0007 Epoch: 10 Global Step: 18330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:45:24,881-Speed 25106.25 samples/sec Loss 3.2736 LearningRate 0.0007 Epoch: 10 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:45:34,609-Speed 25267.45 samples/sec Loss 3.2480 LearningRate 0.0007 Epoch: 10 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-03-26 03:45:44,333-Speed 25274.84 samples/sec Loss 3.2423 LearningRate 0.0007 Epoch: 10 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:45:54,176-Speed 24971.45 samples/sec Loss 3.2610 LearningRate 0.0007 Epoch: 10 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:46:03,893-Speed 25297.65 samples/sec Loss 3.2422 LearningRate 0.0007 Epoch: 10 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:46:13,622-Speed 25261.87 samples/sec Loss 3.2766 LearningRate 0.0007 Epoch: 10 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:46:23,457-Speed 24992.23 samples/sec Loss 3.2440 LearningRate 0.0007 Epoch: 10 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:46:33,243-Speed 25118.36 samples/sec Loss 3.2671 LearningRate 0.0007 Epoch: 10 Global Step: 18410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:46:43,223-Speed 24626.09 samples/sec Loss 3.2554 LearningRate 0.0007 Epoch: 10 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:46:52,955-Speed 25257.67 samples/sec Loss 3.2519 LearningRate 0.0007 Epoch: 10 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:47:02,721-Speed 25171.79 samples/sec Loss 3.2454 LearningRate 0.0007 Epoch: 10 Global Step: 18440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:47:12,370-Speed 25474.52 samples/sec Loss 3.2460 LearningRate 0.0007 Epoch: 10 Global Step: 18450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:47:22,068-Speed 25344.71 samples/sec Loss 3.2139 LearningRate 0.0007 Epoch: 10 Global Step: 18460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:47:31,790-Speed 25282.56 samples/sec Loss 3.2636 LearningRate 0.0007 Epoch: 10 Global Step: 18470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 03:47:41,636-Speed 
24963.77 samples/sec Loss 3.2546 LearningRate 0.0007 Epoch: 10 Global Step: 18480 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:47:51,378-Speed 25230.25 samples/sec Loss 3.2613 LearningRate 0.0007 Epoch: 10 Global Step: 18490 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:48:01,094-Speed 25297.20 samples/sec Loss 3.2269 LearningRate 0.0007 Epoch: 10 Global Step: 18500 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:48:10,791-Speed 25347.13 samples/sec Loss 3.2405 LearningRate 0.0007 Epoch: 10 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 03:48:20,529-Speed 25240.50 samples/sec Loss 3.2274 LearningRate 0.0007 Epoch: 10 Global Step: 18520 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:48:30,250-Speed 25283.61 samples/sec Loss 3.2248 LearningRate 0.0007 Epoch: 10 Global Step: 18530 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:48:40,123-Speed 24896.17 samples/sec Loss 3.2405 LearningRate 0.0007 Epoch: 10 Global Step: 18540 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:48:49,976-Speed 24946.18 samples/sec Loss 3.2385 LearningRate 0.0007 Epoch: 10 Global Step: 18550 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:48:59,858-Speed 24871.84 samples/sec Loss 3.2651 LearningRate 0.0007 Epoch: 10 Global Step: 18560 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:49:09,698-Speed 24977.54 samples/sec Loss 3.2324 LearningRate 0.0007 Epoch: 10 Global Step: 18570 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:49:19,478-Speed 25132.56 samples/sec Loss 3.2135 LearningRate 0.0007 Epoch: 10 Global Step: 18580 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:49:29,195-Speed 25294.78 samples/sec Loss 3.2187 LearningRate 0.0007 Epoch: 10 Global Step: 18590 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:49:38,920-Speed 25274.58 samples/sec Loss 3.2157 LearningRate 0.0007 Epoch: 10 Global Step: 18600 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:49:48,732-Speed 25048.46 samples/sec Loss 3.2152 LearningRate 0.0007 Epoch: 10 Global Step: 18610 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:49:58,549-Speed 25038.11 samples/sec Loss 3.2025 LearningRate 0.0007 Epoch: 10 Global Step: 18620 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 03:50:08,406-Speed 24935.44 samples/sec Loss 3.2187 LearningRate 0.0007 Epoch: 10 Global Step: 18630 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:50:18,170-Speed 25173.37 samples/sec Loss 3.2058 LearningRate 0.0007 Epoch: 10 Global Step: 18640 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:50:27,930-Speed 25186.67 samples/sec Loss 3.2178 LearningRate 0.0007 Epoch: 10 Global Step: 18650 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:50:37,759-Speed 25004.65 samples/sec Loss 3.2135 LearningRate 0.0007 Epoch: 10 Global Step: 18660 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:50:47,582-Speed 25022.90 samples/sec Loss 3.2414 LearningRate 0.0007 Epoch: 10 Global Step: 18670 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:50:57,316-Speed 25250.72 samples/sec Loss 3.2110 LearningRate 0.0007 Epoch: 10 Global Step: 18680 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:51:07,076-Speed 25183.88 samples/sec Loss 3.2326 LearningRate 0.0007 Epoch: 10 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:51:16,841-Speed 25171.12 samples/sec Loss 3.2331 LearningRate 0.0007 Epoch: 10 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:51:26,750-Speed 24803.21 samples/sec Loss 3.2055 LearningRate 0.0007 Epoch: 10 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:51:36,625-Speed 24892.01 samples/sec Loss 3.2150 LearningRate 0.0007 Epoch: 10 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:51:46,410-Speed 25120.94 samples/sec Loss 3.2286 LearningRate 0.0007 Epoch: 10 Global Step: 18730 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 03:51:56,127-Speed 25296.65 samples/sec Loss 3.1943 LearningRate 0.0007 Epoch: 10 Global Step: 18740 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 03:52:05,810-Speed 25385.35 samples/sec Loss 3.2159 LearningRate 0.0007 Epoch: 10 Global Step: 18750 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:52:15,508-Speed 25344.86 samples/sec Loss 3.2011 LearningRate 0.0007 Epoch: 10 Global Step: 18760 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:52:25,284-Speed 25142.00 samples/sec Loss 3.2130 LearningRate 0.0007 Epoch: 10 Global Step: 18770 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:52:34,931-Speed 25478.23 samples/sec Loss 3.2401 LearningRate 0.0007 Epoch: 10 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:52:44,630-Speed 25344.11 samples/sec Loss 3.2092 LearningRate 0.0007 Epoch: 10 Global Step: 18790 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:52:54,318-Speed 25370.99 samples/sec Loss 3.2085 LearningRate 0.0007 Epoch: 10 Global Step: 18800 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:53:04,072-Speed 25199.67 samples/sec Loss 3.2321 LearningRate 0.0007 Epoch: 10 Global Step: 18810 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:53:13,723-Speed 25470.99 samples/sec Loss 3.2268 LearningRate 0.0007 Epoch: 10 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:53:23,454-Speed 25258.84 samples/sec Loss 3.2223 LearningRate 0.0007 Epoch: 10 Global Step: 18830 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:53:33,256-Speed 25079.52 samples/sec Loss 3.2007 LearningRate 0.0007 Epoch: 10 Global Step: 18840 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:53:43,054-Speed 25087.23 samples/sec Loss 3.2156 LearningRate 0.0007 Epoch: 10 Global Step: 18850 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:53:52,787-Speed 25254.91 samples/sec Loss 3.2016 LearningRate 0.0007 Epoch: 10 Global Step: 18860 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:54:02,519-Speed 25255.35 samples/sec Loss 3.1949 LearningRate 0.0007 Epoch: 10 Global Step: 18870 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:54:12,248-Speed 25264.62 samples/sec Loss 3.2222 LearningRate 0.0007 Epoch: 10 Global Step: 18880 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:54:22,208-Speed 24678.34 samples/sec Loss 3.1911 LearningRate 0.0007 Epoch: 10 Global Step: 18890 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:54:32,096-Speed 24856.73 samples/sec Loss 3.2026 LearningRate 0.0007 Epoch: 10 Global Step: 18900 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:54:41,884-Speed 25111.30 samples/sec Loss 3.1745 LearningRate 0.0007 Epoch: 10 Global Step: 18910 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:54:51,587-Speed 25332.54 samples/sec Loss 3.2018 LearningRate 0.0007 Epoch: 10 Global Step: 18920 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:55:01,307-Speed 25287.24 samples/sec Loss 3.1996 LearningRate 0.0007 Epoch: 10 Global Step: 18930 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:55:11,077-Speed 25156.00 samples/sec Loss 3.2167 LearningRate 0.0007 Epoch: 10 Global Step: 18940 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:55:20,903-Speed 25015.19 samples/sec Loss 3.2097 LearningRate 0.0007 Epoch: 10 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 03:55:30,608-Speed 25327.31 samples/sec Loss 3.1941 LearningRate 0.0007 Epoch: 10 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 03:55:40,376-Speed 25163.00 samples/sec Loss 3.2075 LearningRate 0.0006 Epoch: 10 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 03:55:50,119-Speed 25229.55 samples/sec Loss 3.2546 LearningRate 0.0006 Epoch: 10 Global Step: 18980 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:55:59,852-Speed 25261.72 samples/sec Loss 3.2255 LearningRate 0.0006 Epoch: 10 Global Step: 18990 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:56:09,748-Speed 24836.78 samples/sec Loss 3.2369 LearningRate 0.0006 Epoch: 10 Global Step: 19000 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:56:19,544-Speed 25091.41 samples/sec Loss 3.2390 LearningRate 0.0006 Epoch: 10 Global Step: 19010 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:57:19,611-Speed 4091.57 samples/sec Loss 3.1713 LearningRate 0.0006 Epoch: 11 Global Step: 19020 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:57:29,557-Speed 24713.64 samples/sec Loss 3.1366 LearningRate 0.0006 Epoch: 11 Global Step: 19030 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:57:39,595-Speed 24485.51 samples/sec Loss 3.1363 LearningRate 0.0006 Epoch: 11 Global Step: 19040 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:57:49,505-Speed 24802.81 samples/sec Loss 3.1625 LearningRate 0.0006 Epoch: 11 Global Step: 19050 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:57:59,427-Speed 24774.46 samples/sec Loss 3.1829 LearningRate 0.0006 Epoch: 11 Global Step: 19060 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:58:09,348-Speed 24775.91 samples/sec Loss 3.1809 LearningRate 0.0006 Epoch: 11 Global Step: 19070 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:58:19,278-Speed 24750.89 samples/sec Loss 3.1666 LearningRate 0.0006 Epoch: 11 Global Step: 19080 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 03:58:29,187-Speed 24803.25 samples/sec Loss 3.1787 LearningRate 0.0006 Epoch: 11 Global Step: 19090 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 03:58:39,277-Speed 24361.04 samples/sec Loss 3.1794 LearningRate 0.0006 Epoch: 11 Global Step: 19100 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 03:58:49,212-Speed 24740.63 samples/sec Loss 3.1315 LearningRate 0.0006 Epoch: 11 Global Step: 19110 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 03:58:59,202-Speed 24608.36 samples/sec Loss 3.1539 LearningRate 0.0006 Epoch: 11 Global Step: 19120 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 03:59:09,151-Speed 24707.10 samples/sec Loss 3.1556 LearningRate 0.0006 Epoch: 11 Global Step: 19130 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 03:59:19,083-Speed 24746.58 samples/sec Loss 3.1341 LearningRate 0.0006 Epoch: 11 Global Step: 19140 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 03:59:29,012-Speed 24755.07 samples/sec Loss 3.1834 LearningRate 0.0006 Epoch: 11 Global Step: 19150 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 03:59:38,908-Speed 24838.29 samples/sec Loss 3.1834 LearningRate 0.0006 Epoch: 11 Global Step: 19160 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 03:59:48,871-Speed 24668.83 samples/sec Loss 3.1507 LearningRate 0.0006 Epoch: 11 Global Step: 19170 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 03:59:58,871-Speed 24578.50 samples/sec Loss 3.1759 LearningRate 0.0006 Epoch: 11 Global Step: 19180 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 04:00:08,855-Speed 24620.83 samples/sec Loss 3.1266 LearningRate 0.0006 Epoch: 11 Global Step: 19190 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-03-26 04:00:18,719-Speed 24916.97 samples/sec Loss 3.1274 LearningRate 0.0006 Epoch: 11 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:00:28,672-Speed 24694.99 samples/sec Loss 3.1632 LearningRate 0.0006 Epoch: 11 Global Step: 19210 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:00:38,633-Speed 24676.36 samples/sec Loss 3.1804 LearningRate 0.0006 Epoch: 11 Global Step: 19220 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:00:48,557-Speed 24766.64 samples/sec Loss 3.1676 LearningRate 0.0006 Epoch: 11 Global Step: 19230 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:00:58,507-Speed 24703.95 samples/sec Loss 3.1517 LearningRate 0.0006 Epoch: 11 Global Step: 19240 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:01:08,469-Speed 24671.87 samples/sec Loss 3.1650 LearningRate 0.0006 Epoch: 11 Global Step: 19250 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:01:18,383-Speed 24791.52 samples/sec Loss 3.1661 LearningRate 0.0006 Epoch: 11 Global Step: 19260 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:01:28,325-Speed 24723.15 samples/sec Loss 3.1791 LearningRate 0.0006 Epoch: 11 Global Step: 19270 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:01:38,252-Speed 24766.61 samples/sec Loss 3.1685 LearningRate 0.0006 Epoch: 11 Global Step: 19280 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:01:48,130-Speed 24881.38 samples/sec Loss 3.1468 LearningRate 0.0006 Epoch: 11 Global Step: 19290 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:01:58,033-Speed 24820.57 samples/sec Loss 3.1689 LearningRate 0.0006 Epoch: 11 Global Step: 19300 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:02:07,989-Speed 24685.98 samples/sec Loss 3.1376 LearningRate 0.0006 Epoch: 11 Global Step: 19310 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:02:17,970-Speed 24631.75 samples/sec Loss 3.1573 LearningRate 0.0006 Epoch: 11 Global Step: 19320 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:02:27,845-Speed 24892.24 samples/sec Loss 3.1586 LearningRate 0.0006 Epoch: 11 Global Step: 19330 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:02:37,866-Speed 24526.28 samples/sec Loss 3.1363 LearningRate 0.0006 Epoch: 11 Global Step: 19340 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:02:47,840-Speed 24648.84 samples/sec Loss 3.1909 LearningRate 0.0006 Epoch: 11 Global Step: 19350 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:02:57,910-Speed 24408.96 samples/sec Loss 3.1534 LearningRate 0.0006 Epoch: 11 Global Step: 19360 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:03:07,869-Speed 24680.19 samples/sec Loss 3.1894 LearningRate 0.0006 Epoch: 11 Global Step: 19370 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:03:17,751-Speed 24872.37 samples/sec Loss 3.1585 LearningRate 0.0006 Epoch: 11 Global Step: 19380 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:03:27,675-Speed 24767.24 samples/sec Loss 3.1625 LearningRate 0.0006 Epoch: 11 Global Step: 19390 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:03:37,639-Speed 24669.02 samples/sec Loss 3.1495 LearningRate 0.0006 Epoch: 11 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:03:47,599-Speed 24678.50 samples/sec Loss 3.1373 LearningRate 0.0006 Epoch: 11 Global Step: 19410 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:03:57,629-Speed 24505.74 samples/sec Loss 3.1564 LearningRate 0.0006 Epoch: 11 Global Step: 19420 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:04:07,517-Speed 24856.77 samples/sec Loss 3.1430 LearningRate 0.0006 Epoch: 11 Global Step: 19430 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:04:17,533-Speed 24540.42 samples/sec Loss 3.1203 LearningRate 0.0006 Epoch: 11 Global Step: 19440 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:04:27,543-Speed 24554.41 samples/sec Loss 3.1331 LearningRate 0.0006 Epoch: 11 Global Step: 19450 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:04:37,458-Speed 24790.14 samples/sec Loss 3.1638 LearningRate 0.0006 Epoch: 11 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:04:47,443-Speed 24614.56 samples/sec Loss 3.1712 LearningRate 0.0006 Epoch: 11 Global Step: 19470 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:04:57,451-Speed 24560.78 samples/sec Loss 3.1531 LearningRate 0.0006 Epoch: 11 Global Step: 19480 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:05:07,467-Speed 24538.97 samples/sec Loss 3.1083 LearningRate 0.0006 Epoch: 11 Global Step: 19490 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:05:17,480-Speed 24547.41 samples/sec Loss 3.1749 LearningRate 0.0006 Epoch: 11 Global Step: 19500 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:05:27,483-Speed 24570.59 samples/sec Loss 3.1435 LearningRate 0.0006 Epoch: 11 Global Step: 19510 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:05:37,410-Speed 24760.58 samples/sec Loss 3.1357 LearningRate 0.0006 Epoch: 11 Global Step: 19520 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:05:47,383-Speed 24645.58 samples/sec Loss 3.1498 LearningRate 0.0006 Epoch: 11 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:05:57,516-Speed 24258.27 samples/sec Loss 3.1452 LearningRate 0.0006 Epoch: 11 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:06:07,525-Speed 24557.55 samples/sec Loss 3.1043 LearningRate 0.0006 Epoch: 11 Global Step: 19550 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:06:17,484-Speed 24687.38 samples/sec Loss 3.1331 LearningRate 0.0006 Epoch: 11 Global Step: 19560 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:06:27,522-Speed 24485.42 samples/sec Loss 3.1041 LearningRate 0.0006 Epoch: 11 Global Step: 19570 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:06:37,495-Speed 24645.47 samples/sec Loss 3.1562 LearningRate 0.0006 Epoch: 11 Global Step: 19580 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:06:47,455-Speed 24678.00 samples/sec Loss 3.1271 LearningRate 0.0006 Epoch: 11 Global Step: 19590 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:06:57,406-Speed 24701.44 samples/sec Loss 3.1064 LearningRate 0.0006 Epoch: 11 Global Step: 19600 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:07:07,342-Speed 24736.89 samples/sec Loss 3.1397 LearningRate 0.0006 Epoch: 11 Global Step: 19610 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:07:17,284-Speed 24723.16 samples/sec Loss 3.1127 LearningRate 0.0006 Epoch: 11 Global Step: 19620 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:07:27,280-Speed 24590.29 samples/sec Loss 3.1217 LearningRate 0.0006 Epoch: 11 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:07:37,179-Speed 24829.35 samples/sec Loss 3.1211 LearningRate 0.0006 Epoch: 11 Global Step: 19640 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:07:47,094-Speed 24791.64 samples/sec Loss 3.1101 LearningRate 0.0006 Epoch: 11 Global Step: 19650 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:07:57,141-Speed 24468.25 samples/sec Loss 3.1072 LearningRate 0.0006 Epoch: 11 Global Step: 19660 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:08:07,105-Speed 24669.08 samples/sec Loss 3.1083 LearningRate 0.0006 Epoch: 11 Global Step: 19670 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:08:17,109-Speed 24568.53 samples/sec Loss 3.1285 LearningRate 0.0006 Epoch: 11 Global Step: 19680 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:08:27,025-Speed 24784.77 samples/sec Loss 3.1110 LearningRate 0.0006 Epoch: 11 Global Step: 19690 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:08:36,968-Speed 24721.55 samples/sec Loss 3.1002 LearningRate 0.0006 Epoch: 11 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:08:47,162-Speed 24111.49 samples/sec Loss 3.0987 LearningRate 0.0006 Epoch: 11 Global Step: 19710 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:08:57,083-Speed 24775.07 samples/sec Loss 3.1136 LearningRate 0.0006 Epoch: 11 Global Step: 19720 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:09:07,130-Speed 24461.55 samples/sec Loss 3.0880 LearningRate 0.0006 Epoch: 11 Global Step: 19730 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:09:17,192-Speed 24429.41 samples/sec Loss 3.1224 LearningRate 0.0006 Epoch: 11 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:09:27,079-Speed 24866.49 samples/sec Loss 3.1192 LearningRate 0.0006 Epoch: 11 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:09:37,009-Speed 24751.82 samples/sec Loss 3.1129 LearningRate 0.0006 Epoch: 11 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:09:46,952-Speed 24718.26 samples/sec Loss 3.1419 LearningRate 0.0006 Epoch: 11 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:09:56,918-Speed 24669.41 samples/sec Loss 3.1101 LearningRate 0.0006 Epoch: 11 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:10:06,937-Speed 24532.19 samples/sec Loss 3.1335 LearningRate 0.0006 Epoch: 11 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:10:16,797-Speed 24929.37 samples/sec Loss 3.0933 LearningRate 0.0006 Epoch: 11 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:10:26,707-Speed 24801.52 samples/sec Loss 3.1262 LearningRate 0.0006 Epoch: 11 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:10:36,777-Speed 24408.53 samples/sec Loss 3.0907 LearningRate 0.0006 Epoch: 11 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:10:46,782-Speed 24567.03 samples/sec Loss 3.0863 LearningRate 0.0006 Epoch: 11 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:10:56,785-Speed 24571.02 samples/sec Loss 3.1282 LearningRate 0.0006 Epoch: 11 Global Step: 19840 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:11:06,809-Speed 24522.02 samples/sec Loss 3.1073 LearningRate 0.0006 Epoch: 11 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:11:16,820-Speed 24550.99 samples/sec Loss 3.0919 LearningRate 0.0006 Epoch: 11 Global Step: 19860 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:11:26,735-Speed 24791.78 samples/sec Loss 3.0934 LearningRate 0.0006 Epoch: 11 Global Step: 19870 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:11:36,761-Speed 24515.46 samples/sec Loss 3.0842 LearningRate 0.0006 Epoch: 11 Global Step: 19880 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:11:46,783-Speed 24525.68 samples/sec Loss 3.0707 LearningRate 0.0006 Epoch: 11 Global Step: 19890 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:11:56,729-Speed 24718.08 samples/sec Loss 3.0747 LearningRate 0.0006 Epoch: 11 Global Step: 19900 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:12:06,725-Speed 24589.07 samples/sec Loss 3.0985 LearningRate 0.0006 Epoch: 11 Global Step: 19910 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:12:16,549-Speed 25020.21 samples/sec Loss 3.1289 LearningRate 0.0006 Epoch: 11 Global Step: 19920 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:12:26,360-Speed 25052.61 samples/sec Loss 3.1126 LearningRate 0.0006 Epoch: 11 Global Step: 19930 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:12:36,254-Speed 24842.01 samples/sec Loss 3.0736 LearningRate 0.0006 Epoch: 11 Global Step: 19940 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:12:46,253-Speed 24583.32 samples/sec Loss 3.0865 LearningRate 0.0006 Epoch: 11 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:12:56,302-Speed 24458.51 samples/sec Loss 3.0981 LearningRate 0.0006 Epoch: 11 Global Step: 19960 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:13:06,381-Speed 24386.46 samples/sec Loss 3.0791 LearningRate 0.0006 Epoch: 11 Global Step: 19970 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:13:16,481-Speed 24334.65 samples/sec Loss 3.1331 LearningRate 0.0006 Epoch: 11 Global Step: 19980 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:13:26,591-Speed 24312.46 samples/sec Loss 3.0679 LearningRate 0.0006 Epoch: 11 Global Step: 19990 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:13:36,651-Speed 24431.77 samples/sec Loss 3.0579 LearningRate 0.0006 Epoch: 11 Global Step: 20000 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:13:46,772-Speed 24286.72 samples/sec Loss 3.1125 LearningRate 0.0006 Epoch: 11 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:13:56,880-Speed 24316.64 samples/sec Loss 3.0919 LearningRate 0.0006 Epoch: 11 Global Step: 20020 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:14:06,997-Speed 24292.94 samples/sec Loss 3.0730 LearningRate 0.0006 Epoch: 11 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:14:17,086-Speed 24364.31 samples/sec Loss 3.0871 LearningRate 0.0006 Epoch: 11 Global Step: 20040 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:14:27,123-Speed 24485.14 samples/sec Loss 3.0547 LearningRate 0.0006 Epoch: 11 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:14:37,253-Speed 24264.53 samples/sec Loss 3.1004 LearningRate 0.0006 Epoch: 11 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:14:47,394-Speed 24237.90 samples/sec Loss 3.1108 LearningRate 0.0006 Epoch: 11 Global Step: 20070 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:14:57,409-Speed 24542.74 samples/sec Loss 3.1054 LearningRate 0.0006 Epoch: 11 Global Step: 20080 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:15:07,607-Speed 24104.01 samples/sec Loss 3.0705 LearningRate 0.0006 Epoch: 11 Global Step: 20090 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:15:17,680-Speed 24400.16 samples/sec Loss 3.0476 LearningRate 0.0006 Epoch: 11 Global Step: 20100 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:15:27,815-Speed 24252.44 samples/sec Loss 3.0982 LearningRate 0.0006 Epoch: 11 Global Step: 20110 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:15:37,862-Speed 24464.16 samples/sec Loss 3.0711 LearningRate 0.0006 Epoch: 11 Global Step: 20120 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:15:48,016-Speed 24206.64 samples/sec Loss 3.0659 LearningRate 0.0006 Epoch: 11 Global Step: 20130 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:15:58,137-Speed 24285.22 samples/sec Loss 3.0573 LearningRate 0.0006 Epoch: 11 Global Step: 20140 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:16:08,239-Speed 24331.39 samples/sec Loss 3.0555 LearningRate 0.0006 Epoch: 11 Global Step: 20150 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:16:18,303-Speed 24422.14 samples/sec Loss 3.0792 LearningRate 0.0006 Epoch: 11 Global Step: 20160 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:16:28,339-Speed 24491.27 samples/sec Loss 3.0608 LearningRate 0.0006 Epoch: 11 Global Step: 20170 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:16:38,468-Speed 24266.36 samples/sec Loss 3.0646 LearningRate 0.0006 Epoch: 11 Global Step: 20180 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:16:48,503-Speed 24493.68 samples/sec Loss 3.0744 LearningRate 0.0006 Epoch: 11 Global Step: 20190 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:16:58,528-Speed 24517.65 samples/sec Loss 3.0476 LearningRate 0.0006 Epoch: 11 Global Step: 20200 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:17:08,581-Speed 24448.54 samples/sec Loss 3.0500 LearningRate 0.0006 Epoch: 11 Global Step: 20210 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:17:18,626-Speed 24470.27 samples/sec Loss 3.0545 LearningRate 0.0006 Epoch: 11 Global Step: 20220 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:17:28,684-Speed 24446.02 samples/sec Loss 3.0939 LearningRate 0.0006 Epoch: 11 Global Step: 20230 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:17:38,791-Speed 24318.54 samples/sec Loss 3.0761 LearningRate 0.0006 Epoch: 11 Global Step: 20240 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:17:48,850-Speed 24433.96 samples/sec Loss 3.0278 LearningRate 0.0006 Epoch: 11 Global Step: 20250 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:17:58,983-Speed 24256.20 samples/sec Loss 3.0758 LearningRate 0.0006 Epoch: 11 Global Step: 20260 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:18:09,073-Speed 24360.61 samples/sec Loss 3.0326 LearningRate 0.0006 Epoch: 11 Global Step: 20270 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:18:19,182-Speed 24313.59 samples/sec Loss 3.0358 LearningRate 0.0006 Epoch: 11 Global Step: 20280 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:18:29,307-Speed 24276.65 samples/sec Loss 3.0735 LearningRate 0.0006 Epoch: 11 Global Step: 20290 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:18:39,361-Speed 24445.68 samples/sec Loss 3.0612 LearningRate 0.0006 Epoch: 11 Global Step: 20300 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:18:49,476-Speed 24301.07 samples/sec Loss 3.0623 LearningRate 0.0006 Epoch: 11 Global Step: 20310 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:18:59,591-Speed 24299.75 samples/sec Loss 3.0567 LearningRate 0.0006 Epoch: 11 Global Step: 20320 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:19:09,764-Speed 24161.94 samples/sec Loss 3.0488 LearningRate 0.0006 Epoch: 11 Global Step: 20330 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:19:19,925-Speed 24189.75 samples/sec Loss 3.0604 LearningRate 0.0006 Epoch: 11 Global Step: 20340 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:19:30,108-Speed 24138.17 samples/sec Loss 3.0917 LearningRate 0.0006 Epoch: 11 Global Step: 20350 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:19:40,202-Speed 24348.13 samples/sec Loss 3.0688 LearningRate 0.0006 Epoch: 11 Global Step: 20360 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:19:50,301-Speed 24339.19 samples/sec Loss 3.0525 LearningRate 0.0006 Epoch: 11 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:20:00,337-Speed 24490.44 samples/sec Loss 3.0639 LearningRate 0.0006 Epoch: 11 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:20:10,475-Speed 24244.29 samples/sec Loss 3.0527 LearningRate 0.0006 Epoch: 11 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:20:20,691-Speed 24059.31 samples/sec Loss 3.0693 LearningRate 0.0006 Epoch: 11 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:20:31,126-Speed 23555.19 samples/sec Loss 3.0483 LearningRate 0.0006 Epoch: 11 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:20:41,318-Speed 24114.41 samples/sec Loss 3.0523 LearningRate 0.0006 Epoch: 11 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:20:51,395-Speed 24393.30 samples/sec Loss 3.0465 LearningRate 0.0006 Epoch: 11 Global Step: 20430 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:21:01,509-Speed 24300.41 samples/sec Loss 3.0403 LearningRate 0.0006 Epoch: 11 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:21:11,510-Speed 24578.44 samples/sec Loss 3.0378 LearningRate 0.0006 Epoch: 11 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:21:21,538-Speed 24508.62 samples/sec Loss 3.0477 LearningRate 0.0006 Epoch: 11 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:21:31,582-Speed 24472.52 samples/sec Loss 3.1058 LearningRate 0.0006 Epoch: 11 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:21:41,736-Speed 24207.39 samples/sec Loss 3.0924 LearningRate 0.0006 Epoch: 11 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:21:51,791-Speed 24445.88 samples/sec Loss 3.0640 LearningRate 0.0006 Epoch: 11 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:22:01,887-Speed 24345.18 samples/sec Loss 3.0594 LearningRate 0.0006 Epoch: 11 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:22:12,006-Speed 24289.85 samples/sec Loss 3.0284 LearningRate 0.0006 Epoch: 11 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:22:22,060-Speed 24445.97 samples/sec Loss 3.0316 LearningRate 0.0006 Epoch: 11 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:22:32,149-Speed 24362.69 samples/sec Loss 3.0304 LearningRate 0.0006 Epoch: 11 Global Step: 20530 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:22:42,205-Speed 24441.66 samples/sec Loss 3.0370 LearningRate 0.0006 Epoch: 11 Global Step: 20540 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:22:52,303-Speed 24338.70 samples/sec Loss 3.0451 LearningRate 0.0006 Epoch: 11 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:23:02,369-Speed 24417.26 samples/sec Loss 3.0348 LearningRate 0.0006 Epoch: 11 Global Step: 20560 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:23:12,598-Speed 24029.39 samples/sec Loss 3.0221 LearningRate 0.0006 Epoch: 11 Global Step: 20570 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:23:22,764-Speed 24175.07 samples/sec Loss 3.0447 LearningRate 0.0006 Epoch: 11 Global Step: 20580 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:23:32,798-Speed 24495.30 samples/sec Loss 3.0286 LearningRate 0.0006 Epoch: 11 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:23:42,910-Speed 24306.26 samples/sec Loss 3.0303 LearningRate 0.0006 Epoch: 11 Global Step: 20600 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:23:52,983-Speed 24400.39 samples/sec Loss 3.0288 LearningRate 0.0006 Epoch: 11 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:24:03,038-Speed 24445.00 samples/sec Loss 3.0136 LearningRate 0.0006 Epoch: 11 Global Step: 20620 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:24:13,076-Speed 24491.30 samples/sec Loss 3.0277 LearningRate 0.0006 Epoch: 11 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:24:23,145-Speed 24409.87 samples/sec Loss 3.0217 LearningRate 0.0006 Epoch: 11 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:24:33,183-Speed 24483.65 samples/sec Loss 3.0388 LearningRate 0.0006 Epoch: 11 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:24:43,278-Speed 24348.51 samples/sec Loss 3.0306 LearningRate 0.0006 Epoch: 11 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:24:53,325-Speed 24462.45 samples/sec Loss 3.0454 LearningRate 0.0006 Epoch: 11 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:25:03,411-Speed 24370.62 samples/sec Loss 3.0345 LearningRate 0.0006 Epoch: 11 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:25:13,481-Speed 24406.09 samples/sec Loss 3.0257 LearningRate 0.0006 Epoch: 11 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:25:23,556-Speed 24395.48 samples/sec Loss 3.0511 LearningRate 0.0006 Epoch: 11 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:25:33,658-Speed 24329.39 samples/sec Loss 3.0511 LearningRate 0.0006 Epoch: 11 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:25:43,768-Speed 24311.17 samples/sec Loss 3.0934 LearningRate 0.0006 Epoch: 11 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:25:53,884-Speed 24295.65 samples/sec Loss 3.0764 LearningRate 0.0006 Epoch: 11 Global Step: 20730 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:26:04,020-Speed 24247.99 samples/sec Loss 3.0576 LearningRate 0.0006 Epoch: 11 Global Step: 20740 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:27:04,743-Speed 4047.35 samples/sec Loss 2.9859 LearningRate 0.0006 Epoch: 12 Global Step: 20750 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:27:14,860-Speed 24294.11 samples/sec Loss 2.9824 LearningRate 0.0006 Epoch: 12 Global Step: 20760 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:27:24,962-Speed 24331.34 samples/sec Loss 3.0063 LearningRate 0.0006 Epoch: 12 Global Step: 20770 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:27:35,074-Speed 24305.31 samples/sec Loss 2.9893 LearningRate 0.0006 Epoch: 12 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:27:45,111-Speed 24488.05 samples/sec Loss 2.9919 LearningRate 0.0006 Epoch: 12 Global Step: 20790 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:27:55,160-Speed 24461.22 samples/sec Loss 3.0102 LearningRate 0.0006 Epoch: 12 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:28:05,200-Speed 24480.81 samples/sec Loss 3.0232 LearningRate 0.0006 Epoch: 12 Global Step: 20810 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:28:15,446-Speed 23989.02 samples/sec Loss 2.9986 LearningRate 0.0006 Epoch: 12 Global Step: 20820 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:28:25,500-Speed 24448.75 samples/sec Loss 3.0042 LearningRate 0.0006 Epoch: 12 Global Step: 20830 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:28:35,571-Speed 24403.24 samples/sec Loss 3.0100 LearningRate 0.0006 Epoch: 12 Global Step: 20840 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:28:45,672-Speed 24333.81 samples/sec Loss 2.9812 LearningRate 0.0006 Epoch: 12 Global Step: 20850 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:28:55,787-Speed 24299.24 samples/sec Loss 2.9886 LearningRate 0.0006 Epoch: 12 Global Step: 20860 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:29:05,816-Speed 24506.74 samples/sec Loss 2.9964 LearningRate 0.0006 Epoch: 12 Global Step: 20870 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:29:15,975-Speed 24196.57 samples/sec Loss 2.9742 LearningRate 0.0006 Epoch: 12 Global Step: 20880 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:29:26,093-Speed 24294.09 samples/sec Loss 2.9626 LearningRate 0.0006 Epoch: 12 Global Step: 20890 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:29:36,221-Speed 24267.34 samples/sec Loss 2.9755 LearningRate 0.0006 Epoch: 12 Global Step: 20900 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:29:46,266-Speed 24468.97 samples/sec Loss 2.9944 LearningRate 0.0006 Epoch: 12 Global Step: 20910 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:29:56,404-Speed 24245.55 samples/sec Loss 3.0137 LearningRate 0.0006 Epoch: 12 Global Step: 20920 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-03-26 04:30:06,475-Speed 24404.71 samples/sec Loss 3.0002 LearningRate 0.0006 Epoch: 12 Global Step: 20930 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:30:16,575-Speed 24336.21 samples/sec Loss 3.0071 LearningRate 0.0006 Epoch: 12 Global Step: 20940 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:30:26,624-Speed 24458.66 samples/sec Loss 2.9953 LearningRate 0.0006 Epoch: 12 Global Step: 20950 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:30:36,655-Speed 24503.85 samples/sec Loss 2.9787 LearningRate 0.0006 Epoch: 12 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:30:46,813-Speed 24197.15 samples/sec Loss 3.0053 LearningRate 0.0006 Epoch: 12 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:30:56,861-Speed 24464.53 samples/sec Loss 3.0121 LearningRate 0.0006 Epoch: 12 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:31:06,973-Speed 24305.69 samples/sec Loss 2.9909 LearningRate 0.0006 Epoch: 12 Global Step: 20990 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:31:17,152-Speed 24146.95 samples/sec Loss 3.0030 LearningRate 0.0006 Epoch: 12 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:31:27,211-Speed 24441.51 samples/sec Loss 2.9760 LearningRate 0.0006 Epoch: 12 Global Step: 21010 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:31:37,305-Speed 24349.42 samples/sec Loss 2.9760 LearningRate 0.0006 Epoch: 12 Global Step: 21020 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-03-26 04:31:47,369-Speed 24421.65 samples/sec Loss 2.9848 LearningRate 0.0006 Epoch: 12 Global Step: 21030
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:31:57,428-Speed 24435.41 samples/sec Loss 2.9583 LearningRate 0.0006 Epoch: 12 Global Step: 21040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:32:07,524-Speed 24343.94 samples/sec Loss 3.0251 LearningRate 0.0006 Epoch: 12 Global Step: 21050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:32:17,689-Speed 24179.42 samples/sec Loss 3.0376 LearningRate 0.0006 Epoch: 12 Global Step: 21060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:32:27,861-Speed 24169.97 samples/sec Loss 2.9859 LearningRate 0.0006 Epoch: 12 Global Step: 21070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:32:37,901-Speed 24480.73 samples/sec Loss 2.9892 LearningRate 0.0006 Epoch: 12 Global Step: 21080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:32:48,008-Speed 24319.17 samples/sec Loss 2.9771 LearningRate 0.0006 Epoch: 12 Global Step: 21090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:32:58,158-Speed 24214.60 samples/sec Loss 2.9897 LearningRate 0.0006 Epoch: 12 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:33:08,232-Speed 24399.60 samples/sec Loss 2.9842 LearningRate 0.0006 Epoch: 12 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:33:18,298-Speed 24417.04 samples/sec Loss 2.9761 LearningRate 0.0006 Epoch: 12 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:33:28,345-Speed 24464.61 samples/sec Loss 2.9580 LearningRate 0.0006 Epoch: 12 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:33:38,393-Speed 24467.42 samples/sec Loss 2.9885 LearningRate 0.0006 Epoch: 12 Global Step: 21140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:33:48,471-Speed 24388.32 samples/sec Loss 3.0159 LearningRate 0.0006 Epoch: 12 Global Step: 21150 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-03-26 04:33:58,532-Speed 24430.08 samples/sec Loss 2.9758 LearningRate 0.0006 Epoch: 12 Global Step: 21160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-26 04:34:08,645-Speed 24304.43 samples/sec Loss 2.9787 LearningRate 0.0006 Epoch: 12 Global Step: 21170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:34:18,757-Speed 24305.44 samples/sec Loss 2.9942 LearningRate 0.0006 Epoch: 12 Global Step: 21180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:34:28,861-Speed 24334.39 samples/sec Loss 2.9462 LearningRate 0.0006 Epoch: 12 Global Step: 21190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:34:39,083-Speed 24046.69 samples/sec Loss 2.9690 LearningRate 0.0006 Epoch: 12 Global Step: 21200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:34:49,141-Speed 24435.69 samples/sec Loss 2.9921 LearningRate 0.0006 Epoch: 12 Global Step: 21210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:34:59,276-Speed 24250.43 samples/sec Loss 3.0048 LearningRate 0.0006 Epoch: 12 Global Step: 21220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:35:09,314-Speed 24486.55 samples/sec Loss 2.9900 LearningRate 0.0006 Epoch: 12 Global Step: 21230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:35:19,349-Speed 24493.37 samples/sec Loss 2.9583 LearningRate 0.0006 Epoch: 12 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:35:29,415-Speed 24418.51 samples/sec Loss 2.9648 LearningRate 0.0006 Epoch: 12 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:35:39,468-Speed 24449.81 samples/sec Loss 2.9396 LearningRate 0.0006 Epoch: 12 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:35:49,602-Speed 24254.81 samples/sec Loss 2.9920 LearningRate 0.0006 Epoch: 12 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-03-26 04:35:59,745-Speed 24234.06 samples/sec Loss 2.9439 LearningRate 0.0006 Epoch: 12 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:36:09,872-Speed 24271.51 samples/sec Loss 2.9857 LearningRate 0.0006 Epoch: 12 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:36:19,932-Speed 24432.76 samples/sec Loss 2.9595 LearningRate 0.0006 Epoch: 12 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:36:30,031-Speed 24337.29 samples/sec Loss 2.9600 LearningRate 0.0006 Epoch: 12 Global Step: 21310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:36:40,126-Speed 24350.51 samples/sec Loss 2.9542 LearningRate 0.0006 Epoch: 12 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:36:50,227-Speed 24333.72 samples/sec Loss 2.9801 LearningRate 0.0006 Epoch: 12 Global Step: 21330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 04:37:00,268-Speed 24477.10 samples/sec Loss 2.9919 LearningRate 0.0006 Epoch: 12 Global Step: 21340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:37:10,337-Speed 24410.56 samples/sec Loss 2.9514 LearningRate 0.0006 Epoch: 12 Global Step: 21350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:37:20,362-Speed 24520.91 samples/sec Loss 2.9651 LearningRate 0.0006 Epoch: 12 Global Step: 21360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:37:30,439-Speed 24389.66 samples/sec Loss 2.9295 LearningRate 0.0006 Epoch: 12 Global Step: 21370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:37:40,475-Speed 24490.74 samples/sec Loss 2.9823 LearningRate 0.0006 Epoch: 12 Global Step: 21380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:37:50,560-Speed 24372.21 samples/sec Loss 2.9661 LearningRate 0.0006 Epoch: 12 Global Step: 21390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:38:00,665-Speed 
24325.60 samples/sec Loss 2.9663 LearningRate 0.0006 Epoch: 12 Global Step: 21400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:38:10,771-Speed 24320.87 samples/sec Loss 2.9444 LearningRate 0.0006 Epoch: 12 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:38:20,830-Speed 24433.95 samples/sec Loss 2.9500 LearningRate 0.0006 Epoch: 12 Global Step: 21420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:38:30,951-Speed 24287.15 samples/sec Loss 2.9573 LearningRate 0.0006 Epoch: 12 Global Step: 21430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:38:40,949-Speed 24581.82 samples/sec Loss 2.9438 LearningRate 0.0006 Epoch: 12 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 04:38:51,161-Speed 24068.29 samples/sec Loss 2.9643 LearningRate 0.0006 Epoch: 12 Global Step: 21450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 04:39:01,298-Speed 24250.10 samples/sec Loss 2.9615 LearningRate 0.0006 Epoch: 12 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:39:11,365-Speed 24417.22 samples/sec Loss 2.9684 LearningRate 0.0006 Epoch: 12 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:39:21,466-Speed 24335.20 samples/sec Loss 2.9319 LearningRate 0.0006 Epoch: 12 Global Step: 21480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:39:31,535-Speed 24411.63 samples/sec Loss 2.9167 LearningRate 0.0006 Epoch: 12 Global Step: 21490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:39:41,596-Speed 24429.97 samples/sec Loss 2.9709 LearningRate 0.0006 Epoch: 12 Global Step: 21500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:39:51,654-Speed 24438.63 samples/sec Loss 2.9438 LearningRate 0.0006 Epoch: 12 Global Step: 21510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:40:01,785-Speed 24267.77 samples/sec Loss 
2.9749 LearningRate 0.0006 Epoch: 12 Global Step: 21520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:40:11,976-Speed 24117.18 samples/sec Loss 2.9629 LearningRate 0.0006 Epoch: 12 Global Step: 21530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:40:22,100-Speed 24277.94 samples/sec Loss 2.9474 LearningRate 0.0006 Epoch: 12 Global Step: 21540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:40:32,171-Speed 24408.52 samples/sec Loss 2.9403 LearningRate 0.0006 Epoch: 12 Global Step: 21550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:40:42,225-Speed 24446.21 samples/sec Loss 2.9509 LearningRate 0.0006 Epoch: 12 Global Step: 21560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 04:40:52,332-Speed 24319.49 samples/sec Loss 2.9862 LearningRate 0.0006 Epoch: 12 Global Step: 21570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:41:02,495-Speed 24184.74 samples/sec Loss 2.9766 LearningRate 0.0006 Epoch: 12 Global Step: 21580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:41:12,627-Speed 24257.74 samples/sec Loss 2.9669 LearningRate 0.0006 Epoch: 12 Global Step: 21590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:41:22,673-Speed 24468.57 samples/sec Loss 2.9505 LearningRate 0.0006 Epoch: 12 Global Step: 21600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:41:32,768-Speed 24346.45 samples/sec Loss 2.9295 LearningRate 0.0006 Epoch: 12 Global Step: 21610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:41:42,813-Speed 24469.70 samples/sec Loss 2.9100 LearningRate 0.0006 Epoch: 12 Global Step: 21620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:41:52,946-Speed 24256.62 samples/sec Loss 2.9462 LearningRate 0.0006 Epoch: 12 Global Step: 21630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:42:03,063-Speed 24295.32 samples/sec Loss 2.9719 LearningRate 0.0006 
Epoch: 12 Global Step: 21640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:42:13,161-Speed 24341.94 samples/sec Loss 2.9224 LearningRate 0.0006 Epoch: 12 Global Step: 21650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:42:23,318-Speed 24197.61 samples/sec Loss 2.9063 LearningRate 0.0006 Epoch: 12 Global Step: 21660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:42:33,448-Speed 24263.32 samples/sec Loss 2.9278 LearningRate 0.0006 Epoch: 12 Global Step: 21670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:42:43,553-Speed 24324.26 samples/sec Loss 2.9154 LearningRate 0.0006 Epoch: 12 Global Step: 21680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:42:53,605-Speed 24451.79 samples/sec Loss 2.9573 LearningRate 0.0006 Epoch: 12 Global Step: 21690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:43:03,716-Speed 24310.19 samples/sec Loss 2.9258 LearningRate 0.0006 Epoch: 12 Global Step: 21700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:43:13,866-Speed 24221.31 samples/sec Loss 2.9353 LearningRate 0.0006 Epoch: 12 Global Step: 21710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:43:24,021-Speed 24201.19 samples/sec Loss 2.9417 LearningRate 0.0006 Epoch: 12 Global Step: 21720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:43:34,073-Speed 24453.30 samples/sec Loss 2.9268 LearningRate 0.0006 Epoch: 12 Global Step: 21730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:43:44,151-Speed 24393.76 samples/sec Loss 2.9343 LearningRate 0.0006 Epoch: 12 Global Step: 21740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:43:54,311-Speed 24191.59 samples/sec Loss 2.9454 LearningRate 0.0006 Epoch: 12 Global Step: 21750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:44:04,355-Speed 24471.75 samples/sec Loss 2.9426 LearningRate 0.0006 Epoch: 12 Global Step: 21760 
Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:44:14,408-Speed 24449.11 samples/sec Loss 2.9026 LearningRate 0.0006 Epoch: 12 Global Step: 21770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:44:24,563-Speed 24203.73 samples/sec Loss 2.9497 LearningRate 0.0006 Epoch: 12 Global Step: 21780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:44:34,619-Speed 24443.95 samples/sec Loss 2.9319 LearningRate 0.0006 Epoch: 12 Global Step: 21790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:44:44,625-Speed 24570.61 samples/sec Loss 2.8972 LearningRate 0.0006 Epoch: 12 Global Step: 21800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:44:54,797-Speed 24162.52 samples/sec Loss 2.8992 LearningRate 0.0006 Epoch: 12 Global Step: 21810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:45:04,884-Speed 24365.10 samples/sec Loss 2.9019 LearningRate 0.0006 Epoch: 12 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:45:14,948-Speed 24423.62 samples/sec Loss 2.9118 LearningRate 0.0006 Epoch: 12 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:45:24,996-Speed 24462.07 samples/sec Loss 2.9087 LearningRate 0.0006 Epoch: 12 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:45:35,073-Speed 24391.90 samples/sec Loss 2.9090 LearningRate 0.0006 Epoch: 12 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:45:45,126-Speed 24448.96 samples/sec Loss 2.8940 LearningRate 0.0006 Epoch: 12 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:45:55,218-Speed 24356.03 samples/sec Loss 2.9031 LearningRate 0.0006 Epoch: 12 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:46:05,349-Speed 24259.91 samples/sec Loss 2.9469 LearningRate 0.0006 Epoch: 12 Global Step: 21880 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-03-26 04:46:15,395-Speed 24468.32 samples/sec Loss 2.9221 LearningRate 0.0006 Epoch: 12 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:46:25,496-Speed 24331.27 samples/sec Loss 2.8948 LearningRate 0.0006 Epoch: 12 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:46:35,539-Speed 24473.84 samples/sec Loss 2.9156 LearningRate 0.0006 Epoch: 12 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:46:45,596-Speed 24441.63 samples/sec Loss 2.9112 LearningRate 0.0006 Epoch: 12 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:46:55,677-Speed 24381.33 samples/sec Loss 2.9208 LearningRate 0.0006 Epoch: 12 Global Step: 21930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:47:05,775-Speed 24339.09 samples/sec Loss 2.9327 LearningRate 0.0006 Epoch: 12 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:47:15,841-Speed 24416.56 samples/sec Loss 2.9425 LearningRate 0.0006 Epoch: 12 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:47:25,907-Speed 24420.63 samples/sec Loss 2.9042 LearningRate 0.0006 Epoch: 12 Global Step: 21960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:47:36,058-Speed 24215.03 samples/sec Loss 2.9042 LearningRate 0.0006 Epoch: 12 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 04:47:46,113-Speed 24450.76 samples/sec Loss 2.9219 LearningRate 0.0006 Epoch: 12 Global Step: 21980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:47:56,197-Speed 24376.91 samples/sec Loss 2.8956 LearningRate 0.0006 Epoch: 12 Global Step: 21990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:48:06,306-Speed 24312.95 samples/sec Loss 2.8922 LearningRate 0.0006 Epoch: 12 Global Step: 22000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-03-26 04:48:16,369-Speed 24426.28 samples/sec Loss 2.8813 LearningRate 0.0006 Epoch: 12 Global Step: 22010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:48:26,403-Speed 24498.04 samples/sec Loss 2.8894 LearningRate 0.0006 Epoch: 12 Global Step: 22020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:48:36,503-Speed 24335.39 samples/sec Loss 2.9351 LearningRate 0.0006 Epoch: 12 Global Step: 22030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:48:46,578-Speed 24395.65 samples/sec Loss 2.9147 LearningRate 0.0006 Epoch: 12 Global Step: 22040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:48:56,724-Speed 24225.26 samples/sec Loss 2.8902 LearningRate 0.0006 Epoch: 12 Global Step: 22050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:49:06,814-Speed 24357.30 samples/sec Loss 2.8894 LearningRate 0.0006 Epoch: 12 Global Step: 22060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:49:16,890-Speed 24394.12 samples/sec Loss 2.8770 LearningRate 0.0006 Epoch: 12 Global Step: 22070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:49:26,902-Speed 24551.51 samples/sec Loss 2.8653 LearningRate 0.0006 Epoch: 12 Global Step: 22080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:49:37,053-Speed 24214.11 samples/sec Loss 2.8850 LearningRate 0.0006 Epoch: 12 Global Step: 22090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:49:47,155-Speed 24331.07 samples/sec Loss 2.8971 LearningRate 0.0006 Epoch: 12 Global Step: 22100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:49:57,490-Speed 23780.16 samples/sec Loss 2.9006 LearningRate 0.0006 Epoch: 12 Global Step: 22110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:50:07,570-Speed 24384.65 samples/sec Loss 2.9088 LearningRate 0.0006 Epoch: 12 Global Step: 22120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:50:17,766-Speed 
24106.69 samples/sec Loss 2.8975 LearningRate 0.0006 Epoch: 12 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:50:27,842-Speed 24394.55 samples/sec Loss 2.8893 LearningRate 0.0006 Epoch: 12 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:50:38,006-Speed 24181.59 samples/sec Loss 2.9115 LearningRate 0.0006 Epoch: 12 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:50:48,064-Speed 24437.26 samples/sec Loss 2.8977 LearningRate 0.0006 Epoch: 12 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:50:58,319-Speed 23967.39 samples/sec Loss 2.9000 LearningRate 0.0006 Epoch: 12 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:51:08,409-Speed 24361.61 samples/sec Loss 2.8870 LearningRate 0.0006 Epoch: 12 Global Step: 22180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 04:51:18,505-Speed 24343.37 samples/sec Loss 2.8795 LearningRate 0.0006 Epoch: 12 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:51:28,671-Speed 24179.07 samples/sec Loss 2.8898 LearningRate 0.0006 Epoch: 12 Global Step: 22200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:51:38,712-Speed 24478.35 samples/sec Loss 2.8947 LearningRate 0.0006 Epoch: 12 Global Step: 22210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:51:48,795-Speed 24376.09 samples/sec Loss 2.9184 LearningRate 0.0006 Epoch: 12 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:51:58,848-Speed 24449.38 samples/sec Loss 2.8768 LearningRate 0.0006 Epoch: 12 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:52:08,928-Speed 24383.41 samples/sec Loss 2.8678 LearningRate 0.0006 Epoch: 12 Global Step: 22240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:52:19,013-Speed 24370.33 samples/sec Loss 
2.9157 LearningRate 0.0006 Epoch: 12 Global Step: 22250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:52:29,081-Speed 24413.79 samples/sec Loss 2.9061 LearningRate 0.0006 Epoch: 12 Global Step: 22260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:52:39,184-Speed 24329.29 samples/sec Loss 2.9005 LearningRate 0.0006 Epoch: 12 Global Step: 22270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:52:49,275-Speed 24356.09 samples/sec Loss 2.9081 LearningRate 0.0006 Epoch: 12 Global Step: 22280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:52:59,324-Speed 24461.15 samples/sec Loss 2.8718 LearningRate 0.0006 Epoch: 12 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 04:53:09,410-Speed 24367.76 samples/sec Loss 2.8978 LearningRate 0.0006 Epoch: 12 Global Step: 22300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:53:19,574-Speed 24185.92 samples/sec Loss 2.8877 LearningRate 0.0006 Epoch: 12 Global Step: 22310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:53:29,705-Speed 24262.73 samples/sec Loss 2.8900 LearningRate 0.0006 Epoch: 12 Global Step: 22320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:53:39,784-Speed 24388.78 samples/sec Loss 2.9241 LearningRate 0.0006 Epoch: 12 Global Step: 22330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:53:49,860-Speed 24392.80 samples/sec Loss 2.8810 LearningRate 0.0006 Epoch: 12 Global Step: 22340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:54:00,011-Speed 24212.86 samples/sec Loss 2.8685 LearningRate 0.0006 Epoch: 12 Global Step: 22350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:54:10,062-Speed 24460.21 samples/sec Loss 2.8859 LearningRate 0.0006 Epoch: 12 Global Step: 22360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:54:20,138-Speed 24394.70 samples/sec Loss 2.8751 LearningRate 0.0006 
Epoch: 12 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:54:30,204-Speed 24419.82 samples/sec Loss 2.9083 LearningRate 0.0006 Epoch: 12 Global Step: 22380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:54:40,359-Speed 24204.43 samples/sec Loss 2.8999 LearningRate 0.0006 Epoch: 12 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:54:50,446-Speed 24365.36 samples/sec Loss 2.9186 LearningRate 0.0006 Epoch: 12 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 04:55:00,518-Speed 24405.31 samples/sec Loss 2.9153 LearningRate 0.0006 Epoch: 12 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:55:10,556-Speed 24487.09 samples/sec Loss 2.8987 LearningRate 0.0006 Epoch: 12 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:55:20,660-Speed 24327.91 samples/sec Loss 2.8886 LearningRate 0.0006 Epoch: 12 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:55:30,738-Speed 24388.24 samples/sec Loss 2.9215 LearningRate 0.0006 Epoch: 12 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:55:40,894-Speed 24201.07 samples/sec Loss 2.9287 LearningRate 0.0006 Epoch: 12 Global Step: 22450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:55:51,018-Speed 24277.31 samples/sec Loss 2.9278 LearningRate 0.0006 Epoch: 12 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:56:51,722-Speed 4048.60 samples/sec Loss 2.8908 LearningRate 0.0006 Epoch: 13 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:57:01,797-Speed 24398.27 samples/sec Loss 2.8484 LearningRate 0.0006 Epoch: 13 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:57:11,929-Speed 24261.98 samples/sec Loss 2.8220 LearningRate 0.0006 Epoch: 13 Global Step: 22490 
Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:57:21,966-Speed 24489.56 samples/sec Loss 2.8487 LearningRate 0.0006 Epoch: 13 Global Step: 22500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:57:32,021-Speed 24446.19 samples/sec Loss 2.8366 LearningRate 0.0006 Epoch: 13 Global Step: 22510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:57:42,103-Speed 24381.62 samples/sec Loss 2.8374 LearningRate 0.0006 Epoch: 13 Global Step: 22520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:57:52,180-Speed 24389.99 samples/sec Loss 2.8343 LearningRate 0.0006 Epoch: 13 Global Step: 22530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:58:02,267-Speed 24365.35 samples/sec Loss 2.8228 LearningRate 0.0006 Epoch: 13 Global Step: 22540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:58:12,394-Speed 24271.48 samples/sec Loss 2.8635 LearningRate 0.0006 Epoch: 13 Global Step: 22550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:58:22,500-Speed 24321.28 samples/sec Loss 2.8457 LearningRate 0.0006 Epoch: 13 Global Step: 22560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:58:32,695-Speed 24108.04 samples/sec Loss 2.8342 LearningRate 0.0006 Epoch: 13 Global Step: 22570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:58:42,823-Speed 24268.92 samples/sec Loss 2.8373 LearningRate 0.0006 Epoch: 13 Global Step: 22580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:58:52,883-Speed 24434.62 samples/sec Loss 2.8274 LearningRate 0.0006 Epoch: 13 Global Step: 22590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:59:02,948-Speed 24423.83 samples/sec Loss 2.8353 LearningRate 0.0006 Epoch: 13 Global Step: 22600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:59:13,043-Speed 24349.19 samples/sec Loss 2.8585 LearningRate 0.0006 Epoch: 13 Global Step: 22610 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-03-26 04:59:23,191-Speed 24220.85 samples/sec Loss 2.8325 LearningRate 0.0006 Epoch: 13 Global Step: 22620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:59:33,265-Speed 24397.95 samples/sec Loss 2.8376 LearningRate 0.0006 Epoch: 13 Global Step: 22630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:59:43,359-Speed 24354.49 samples/sec Loss 2.8723 LearningRate 0.0006 Epoch: 13 Global Step: 22640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 04:59:53,426-Speed 24414.65 samples/sec Loss 2.8422 LearningRate 0.0006 Epoch: 13 Global Step: 22650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:00:03,547-Speed 24287.03 samples/sec Loss 2.8495 LearningRate 0.0006 Epoch: 13 Global Step: 22660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:00:13,793-Speed 23986.85 samples/sec Loss 2.8460 LearningRate 0.0006 Epoch: 13 Global Step: 22670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:00:23,970-Speed 24152.81 samples/sec Loss 2.8827 LearningRate 0.0006 Epoch: 13 Global Step: 22680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:00:34,022-Speed 24450.41 samples/sec Loss 2.8695 LearningRate 0.0006 Epoch: 13 Global Step: 22690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:00:44,135-Speed 24303.62 samples/sec Loss 2.8707 LearningRate 0.0006 Epoch: 13 Global Step: 22700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:00:54,261-Speed 24274.15 samples/sec Loss 2.8750 LearningRate 0.0006 Epoch: 13 Global Step: 22710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:01:04,346-Speed 24370.80 samples/sec Loss 2.8462 LearningRate 0.0006 Epoch: 13 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 05:01:14,387-Speed 24480.30 samples/sec Loss 2.8294 LearningRate 0.0006 Epoch: 13 Global Step: 22730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-03-26 05:01:24,451-Speed 24422.79 samples/sec Loss 2.8348 LearningRate 0.0006 Epoch: 13 Global Step: 22740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:01:34,539-Speed 24365.26 samples/sec Loss 2.8415 LearningRate 0.0006 Epoch: 13 Global Step: 22750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:01:44,817-Speed 23912.26 samples/sec Loss 2.8421 LearningRate 0.0006 Epoch: 13 Global Step: 22760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:01:54,885-Speed 24414.25 samples/sec Loss 2.8537 LearningRate 0.0006 Epoch: 13 Global Step: 22770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:02:04,974-Speed 24360.80 samples/sec Loss 2.8614 LearningRate 0.0006 Epoch: 13 Global Step: 22780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:02:15,101-Speed 24272.80 samples/sec Loss 2.8599 LearningRate 0.0006 Epoch: 13 Global Step: 22790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:02:25,209-Speed 24315.12 samples/sec Loss 2.8381 LearningRate 0.0006 Epoch: 13 Global Step: 22800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:02:35,339-Speed 24264.68 samples/sec Loss 2.8419 LearningRate 0.0006 Epoch: 13 Global Step: 22810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:02:45,436-Speed 24341.08 samples/sec Loss 2.8196 LearningRate 0.0006 Epoch: 13 Global Step: 22820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:02:55,597-Speed 24190.97 samples/sec Loss 2.8222 LearningRate 0.0006 Epoch: 13 Global Step: 22830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:03:05,736-Speed 24249.25 samples/sec Loss 2.8512 LearningRate 0.0006 Epoch: 13 Global Step: 22840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:03:15,834-Speed 24339.78 samples/sec Loss 2.8723 LearningRate 0.0006 Epoch: 13 Global Step: 22850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:03:25,927-Speed 
24360.16 samples/sec Loss 2.8624 LearningRate 0.0006 Epoch: 13 Global Step: 22860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:03:35,940-Speed 24546.52 samples/sec Loss 2.8894 LearningRate 0.0006 Epoch: 13 Global Step: 22870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:03:46,078-Speed 24244.47 samples/sec Loss 2.8498 LearningRate 0.0006 Epoch: 13 Global Step: 22880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:03:56,138-Speed 24432.27 samples/sec Loss 2.8142 LearningRate 0.0006 Epoch: 13 Global Step: 22890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:04:06,201-Speed 24424.63 samples/sec Loss 2.8106 LearningRate 0.0006 Epoch: 13 Global Step: 22900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:04:16,320-Speed 24289.15 samples/sec Loss 2.8290 LearningRate 0.0006 Epoch: 13 Global Step: 22910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:04:26,368-Speed 24462.47 samples/sec Loss 2.8240 LearningRate 0.0006 Epoch: 13 Global Step: 22920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:04:36,534-Speed 24177.55 samples/sec Loss 2.8375 LearningRate 0.0006 Epoch: 13 Global Step: 22930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:04:46,650-Speed 24298.70 samples/sec Loss 2.8296 LearningRate 0.0006 Epoch: 13 Global Step: 22940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:04:56,762-Speed 24306.04 samples/sec Loss 2.8526 LearningRate 0.0006 Epoch: 13 Global Step: 22950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:05:06,890-Speed 24270.04 samples/sec Loss 2.8079 LearningRate 0.0006 Epoch: 13 Global Step: 22960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:05:16,992-Speed 24330.77 samples/sec Loss 2.8060 LearningRate 0.0006 Epoch: 13 Global Step: 22970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:05:27,123-Speed 24261.10 samples/sec Loss 
2.8418 LearningRate 0.0006 Epoch: 13 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:05:37,192-Speed 24411.82 samples/sec Loss 2.8270 LearningRate 0.0005 Epoch: 13 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:05:47,376-Speed 24134.68 samples/sec Loss 2.8263 LearningRate 0.0005 Epoch: 13 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:05:57,470-Speed 24349.63 samples/sec Loss 2.8115 LearningRate 0.0005 Epoch: 13 Global Step: 23010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:06:07,578-Speed 24319.22 samples/sec Loss 2.8223 LearningRate 0.0005 Epoch: 13 Global Step: 23020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:06:17,631-Speed 24447.53 samples/sec Loss 2.8342 LearningRate 0.0005 Epoch: 13 Global Step: 23030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:06:27,718-Speed 24366.17 samples/sec Loss 2.8140 LearningRate 0.0005 Epoch: 13 Global Step: 23040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:06:37,776-Speed 24439.43 samples/sec Loss 2.8265 LearningRate 0.0005 Epoch: 13 Global Step: 23050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:06:47,866-Speed 24360.21 samples/sec Loss 2.8255 LearningRate 0.0005 Epoch: 13 Global Step: 23060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:06:57,886-Speed 24530.11 samples/sec Loss 2.8290 LearningRate 0.0005 Epoch: 13 Global Step: 23070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:07:07,987-Speed 24336.86 samples/sec Loss 2.8078 LearningRate 0.0005 Epoch: 13 Global Step: 23080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:07:18,133-Speed 24226.07 samples/sec Loss 2.8061 LearningRate 0.0005 Epoch: 13 Global Step: 23090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:07:28,302-Speed 24170.31 samples/sec Loss 2.8332 LearningRate 0.0005 
Epoch: 13 Global Step: 23100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:07:38,425-Speed 24283.96 samples/sec Loss 2.8114 LearningRate 0.0005 Epoch: 13 Global Step: 23110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:07:48,480-Speed 24446.98 samples/sec Loss 2.8152 LearningRate 0.0005 Epoch: 13 Global Step: 23120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:07:58,565-Speed 24370.10 samples/sec Loss 2.8321 LearningRate 0.0005 Epoch: 13 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:08:08,731-Speed 24181.61 samples/sec Loss 2.8077 LearningRate 0.0005 Epoch: 13 Global Step: 23140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:08:18,819-Speed 24365.15 samples/sec Loss 2.8059 LearningRate 0.0005 Epoch: 13 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:08:28,882-Speed 24423.33 samples/sec Loss 2.8159 LearningRate 0.0005 Epoch: 13 Global Step: 23160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:08:38,926-Speed 24474.05 samples/sec Loss 2.8033 LearningRate 0.0005 Epoch: 13 Global Step: 23170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:08:49,040-Speed 24302.76 samples/sec Loss 2.8173 LearningRate 0.0005 Epoch: 13 Global Step: 23180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:08:59,112-Speed 24402.63 samples/sec Loss 2.8303 LearningRate 0.0005 Epoch: 13 Global Step: 23190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:09:09,235-Speed 24280.78 samples/sec Loss 2.8225 LearningRate 0.0005 Epoch: 13 Global Step: 23200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:09:19,345-Speed 24312.61 samples/sec Loss 2.8082 LearningRate 0.0005 Epoch: 13 Global Step: 23210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:09:29,426-Speed 24383.27 samples/sec Loss 2.8027 LearningRate 0.0005 Epoch: 13 Global Step: 23220 
Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:09:39,465-Speed 24488.86 samples/sec Loss 2.8474 LearningRate 0.0005 Epoch: 13 Global Step: 23230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:09:49,549-Speed 24373.00 samples/sec Loss 2.8269 LearningRate 0.0005 Epoch: 13 Global Step: 23240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:09:59,623-Speed 24398.74 samples/sec Loss 2.8001 LearningRate 0.0005 Epoch: 13 Global Step: 23250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:10:09,668-Speed 24468.71 samples/sec Loss 2.7818 LearningRate 0.0005 Epoch: 13 Global Step: 23260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:10:19,835-Speed 24176.31 samples/sec Loss 2.8421 LearningRate 0.0005 Epoch: 13 Global Step: 23270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:10:29,983-Speed 24218.79 samples/sec Loss 2.7797 LearningRate 0.0005 Epoch: 13 Global Step: 23280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:10:40,013-Speed 24504.66 samples/sec Loss 2.7777 LearningRate 0.0005 Epoch: 13 Global Step: 23290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:10:50,074-Speed 24431.02 samples/sec Loss 2.7758 LearningRate 0.0005 Epoch: 13 Global Step: 23300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:11:00,043-Speed 24658.60 samples/sec Loss 2.8101 LearningRate 0.0005 Epoch: 13 Global Step: 23310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:11:10,147-Speed 24325.22 samples/sec Loss 2.8052 LearningRate 0.0005 Epoch: 13 Global Step: 23320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:11:20,288-Speed 24236.99 samples/sec Loss 2.8100 LearningRate 0.0005 Epoch: 13 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:11:30,405-Speed 24293.84 samples/sec Loss 2.7710 LearningRate 0.0005 Epoch: 13 Global Step: 23340 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-03-26 05:11:40,530-Speed 24275.90 samples/sec Loss 2.8046 LearningRate 0.0005 Epoch: 13 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:11:50,689-Speed 24202.53 samples/sec Loss 2.7874 LearningRate 0.0005 Epoch: 13 Global Step: 23360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:12:00,787-Speed 24344.25 samples/sec Loss 2.8086 LearningRate 0.0005 Epoch: 13 Global Step: 23370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:12:10,861-Speed 24396.85 samples/sec Loss 2.8140 LearningRate 0.0005 Epoch: 13 Global Step: 23380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:12:20,979-Speed 24296.00 samples/sec Loss 2.7807 LearningRate 0.0005 Epoch: 13 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:12:31,051-Speed 24403.26 samples/sec Loss 2.7702 LearningRate 0.0005 Epoch: 13 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:12:41,194-Speed 24232.21 samples/sec Loss 2.7950 LearningRate 0.0005 Epoch: 13 Global Step: 23410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 05:12:51,335-Speed 24238.22 samples/sec Loss 2.7948 LearningRate 0.0005 Epoch: 13 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:13:01,435-Speed 24336.41 samples/sec Loss 2.7874 LearningRate 0.0005 Epoch: 13 Global Step: 23430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:13:11,530-Speed 24347.30 samples/sec Loss 2.7802 LearningRate 0.0005 Epoch: 13 Global Step: 23440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:13:21,736-Speed 24081.94 samples/sec Loss 2.7837 LearningRate 0.0005 Epoch: 13 Global Step: 23450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:13:31,874-Speed 24244.14 samples/sec Loss 2.7842 LearningRate 0.0005 Epoch: 13 Global Step: 23460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-03-26 05:13:41,960-Speed 24368.95 samples/sec Loss 2.7973 LearningRate 0.0005 Epoch: 13 Global Step: 23470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:13:52,061-Speed 24332.77 samples/sec Loss 2.7953 LearningRate 0.0005 Epoch: 13 Global Step: 23480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:14:02,147-Speed 24368.71 samples/sec Loss 2.8044 LearningRate 0.0005 Epoch: 13 Global Step: 23490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:14:12,291-Speed 24230.34 samples/sec Loss 2.8066 LearningRate 0.0005 Epoch: 13 Global Step: 23500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:14:22,390-Speed 24339.08 samples/sec Loss 2.7849 LearningRate 0.0005 Epoch: 13 Global Step: 23510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:14:32,506-Speed 24296.67 samples/sec Loss 2.7811 LearningRate 0.0005 Epoch: 13 Global Step: 23520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:14:42,636-Speed 24264.66 samples/sec Loss 2.7811 LearningRate 0.0005 Epoch: 13 Global Step: 23530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:14:52,680-Speed 24471.18 samples/sec Loss 2.7798 LearningRate 0.0005 Epoch: 13 Global Step: 23540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:15:02,722-Speed 24478.04 samples/sec Loss 2.7702 LearningRate 0.0005 Epoch: 13 Global Step: 23550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:15:12,838-Speed 24298.51 samples/sec Loss 2.7906 LearningRate 0.0005 Epoch: 13 Global Step: 23560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:15:22,912-Speed 24397.07 samples/sec Loss 2.8172 LearningRate 0.0005 Epoch: 13 Global Step: 23570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:15:33,120-Speed 24077.83 samples/sec Loss 2.7865 LearningRate 0.0005 Epoch: 13 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:15:43,242-Speed 
24284.89 samples/sec Loss 2.7799 LearningRate 0.0005 Epoch: 13 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:15:53,327-Speed 24369.87 samples/sec Loss 2.7806 LearningRate 0.0005 Epoch: 13 Global Step: 23600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:16:03,516-Speed 24124.92 samples/sec Loss 2.7959 LearningRate 0.0005 Epoch: 13 Global Step: 23610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:16:13,632-Speed 24297.65 samples/sec Loss 2.7721 LearningRate 0.0005 Epoch: 13 Global Step: 23620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:16:23,705-Speed 24397.90 samples/sec Loss 2.7681 LearningRate 0.0005 Epoch: 13 Global Step: 23630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:16:33,823-Speed 24292.66 samples/sec Loss 2.7570 LearningRate 0.0005 Epoch: 13 Global Step: 23640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:16:43,929-Speed 24323.36 samples/sec Loss 2.7793 LearningRate 0.0005 Epoch: 13 Global Step: 23650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:16:53,988-Speed 24434.40 samples/sec Loss 2.7991 LearningRate 0.0005 Epoch: 13 Global Step: 23660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:17:04,115-Speed 24271.63 samples/sec Loss 2.7878 LearningRate 0.0005 Epoch: 13 Global Step: 23670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:17:14,132-Speed 24537.87 samples/sec Loss 2.7824 LearningRate 0.0005 Epoch: 13 Global Step: 23680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:17:24,240-Speed 24315.97 samples/sec Loss 2.7594 LearningRate 0.0005 Epoch: 13 Global Step: 23690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:17:34,328-Speed 24364.36 samples/sec Loss 2.7759 LearningRate 0.0005 Epoch: 13 Global Step: 23700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:17:44,453-Speed 24275.75 samples/sec Loss 
2.7873 LearningRate 0.0005 Epoch: 13 Global Step: 23710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:17:54,562-Speed 24315.47 samples/sec Loss 2.7712 LearningRate 0.0005 Epoch: 13 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:18:04,667-Speed 24322.73 samples/sec Loss 2.7574 LearningRate 0.0005 Epoch: 13 Global Step: 23730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:18:14,827-Speed 24193.40 samples/sec Loss 2.7841 LearningRate 0.0005 Epoch: 13 Global Step: 23740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:18:24,985-Speed 24196.79 samples/sec Loss 2.7884 LearningRate 0.0005 Epoch: 13 Global Step: 23750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:18:35,064-Speed 24386.15 samples/sec Loss 2.7362 LearningRate 0.0005 Epoch: 13 Global Step: 23760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:18:45,145-Speed 24387.13 samples/sec Loss 2.7608 LearningRate 0.0005 Epoch: 13 Global Step: 23770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:18:55,213-Speed 24411.75 samples/sec Loss 2.7635 LearningRate 0.0005 Epoch: 13 Global Step: 23780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:19:05,239-Speed 24514.03 samples/sec Loss 2.7705 LearningRate 0.0005 Epoch: 13 Global Step: 23790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:19:15,333-Speed 24349.66 samples/sec Loss 2.7840 LearningRate 0.0005 Epoch: 13 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 05:19:25,349-Speed 24540.21 samples/sec Loss 2.7560 LearningRate 0.0005 Epoch: 13 Global Step: 23810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:19:35,383-Speed 24494.03 samples/sec Loss 2.8086 LearningRate 0.0005 Epoch: 13 Global Step: 23820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:19:45,438-Speed 24446.46 samples/sec Loss 2.7589 LearningRate 0.0005 
Epoch: 13 Global Step: 23830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:19:55,535-Speed 24342.17 samples/sec Loss 2.7494 LearningRate 0.0005 Epoch: 13 Global Step: 23840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:20:05,647-Speed 24306.34 samples/sec Loss 2.7837 LearningRate 0.0005 Epoch: 13 Global Step: 23850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:20:15,717-Speed 24407.41 samples/sec Loss 2.7428 LearningRate 0.0005 Epoch: 13 Global Step: 23860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:20:25,792-Speed 24396.89 samples/sec Loss 2.7686 LearningRate 0.0005 Epoch: 13 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:20:35,847-Speed 24449.29 samples/sec Loss 2.7445 LearningRate 0.0005 Epoch: 13 Global Step: 23880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:20:45,945-Speed 24341.43 samples/sec Loss 2.7407 LearningRate 0.0005 Epoch: 13 Global Step: 23890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:20:56,045-Speed 24336.44 samples/sec Loss 2.7516 LearningRate 0.0005 Epoch: 13 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:21:06,126-Speed 24381.13 samples/sec Loss 2.7537 LearningRate 0.0005 Epoch: 13 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:21:16,342-Speed 24058.88 samples/sec Loss 2.7631 LearningRate 0.0005 Epoch: 13 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:21:26,435-Speed 24360.94 samples/sec Loss 2.7545 LearningRate 0.0005 Epoch: 13 Global Step: 23930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:21:36,582-Speed 24222.67 samples/sec Loss 2.7649 LearningRate 0.0005 Epoch: 13 Global Step: 23940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:21:46,667-Speed 24371.51 samples/sec Loss 2.7400 LearningRate 0.0005 Epoch: 13 Global Step: 23950 
Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:21:56,876-Speed 24074.79 samples/sec Loss 2.7221 LearningRate 0.0005 Epoch: 13 Global Step: 23960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:22:06,979-Speed 24328.47 samples/sec Loss 2.7580 LearningRate 0.0005 Epoch: 13 Global Step: 23970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:22:17,112-Speed 24253.43 samples/sec Loss 2.7694 LearningRate 0.0005 Epoch: 13 Global Step: 23980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:22:27,321-Speed 24076.42 samples/sec Loss 2.7435 LearningRate 0.0005 Epoch: 13 Global Step: 23990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:22:37,376-Speed 24445.39 samples/sec Loss 2.7312 LearningRate 0.0005 Epoch: 13 Global Step: 24000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:22:47,462-Speed 24369.13 samples/sec Loss 2.7472 LearningRate 0.0005 Epoch: 13 Global Step: 24010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:22:57,682-Speed 24051.02 samples/sec Loss 2.7403 LearningRate 0.0005 Epoch: 13 Global Step: 24020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:23:07,948-Speed 23944.87 samples/sec Loss 2.7323 LearningRate 0.0005 Epoch: 13 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:23:17,817-Speed 24905.15 samples/sec Loss 2.7289 LearningRate 0.0005 Epoch: 13 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:23:27,791-Speed 24646.85 samples/sec Loss 2.7514 LearningRate 0.0005 Epoch: 13 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:23:37,594-Speed 25072.87 samples/sec Loss 2.7494 LearningRate 0.0005 Epoch: 13 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:23:47,598-Speed 24568.89 samples/sec Loss 2.7376 LearningRate 0.0005 Epoch: 13 Global Step: 24070 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-03-26 05:23:57,500-Speed 24822.60 samples/sec Loss 2.7465 LearningRate 0.0005 Epoch: 13 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:24:07,345-Speed 24965.02 samples/sec Loss 2.7600 LearningRate 0.0005 Epoch: 13 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:24:17,127-Speed 25126.85 samples/sec Loss 2.7553 LearningRate 0.0005 Epoch: 13 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:24:26,939-Speed 25050.09 samples/sec Loss 2.7687 LearningRate 0.0005 Epoch: 13 Global Step: 24110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:24:36,687-Speed 25213.97 samples/sec Loss 2.7613 LearningRate 0.0005 Epoch: 13 Global Step: 24120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:24:46,504-Speed 25038.41 samples/sec Loss 2.7639 LearningRate 0.0005 Epoch: 13 Global Step: 24130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:24:56,292-Speed 25112.63 samples/sec Loss 2.7341 LearningRate 0.0005 Epoch: 13 Global Step: 24140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:25:06,218-Speed 24761.23 samples/sec Loss 2.7630 LearningRate 0.0005 Epoch: 13 Global Step: 24150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:25:16,184-Speed 24665.31 samples/sec Loss 2.7803 LearningRate 0.0005 Epoch: 13 Global Step: 24160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:25:26,010-Speed 25011.99 samples/sec Loss 2.7879 LearningRate 0.0005 Epoch: 13 Global Step: 24170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:25:35,744-Speed 25252.80 samples/sec Loss 2.7593 LearningRate 0.0005 Epoch: 13 Global Step: 24180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:25:45,600-Speed 24939.15 samples/sec Loss 2.7623 LearningRate 0.0005 Epoch: 13 Global Step: 24190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 
2022-03-26 05:26:45,634-Speed 4093.72 samples/sec Loss 2.7564 LearningRate 0.0005 Epoch: 14 Global Step: 24200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:26:55,614-Speed 24635.58 samples/sec Loss 2.6942 LearningRate 0.0005 Epoch: 14 Global Step: 24210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:27:05,585-Speed 24649.85 samples/sec Loss 2.6881 LearningRate 0.0005 Epoch: 14 Global Step: 24220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:27:15,513-Speed 24757.11 samples/sec Loss 2.6925 LearningRate 0.0005 Epoch: 14 Global Step: 24230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:27:25,454-Speed 24727.38 samples/sec Loss 2.7048 LearningRate 0.0005 Epoch: 14 Global Step: 24240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:27:35,406-Speed 24698.15 samples/sec Loss 2.7079 LearningRate 0.0005 Epoch: 14 Global Step: 24250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:27:45,352-Speed 24711.83 samples/sec Loss 2.6933 LearningRate 0.0005 Epoch: 14 Global Step: 24260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:27:55,380-Speed 24511.26 samples/sec Loss 2.7143 LearningRate 0.0005 Epoch: 14 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:28:05,362-Speed 24630.03 samples/sec Loss 2.6985 LearningRate 0.0005 Epoch: 14 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:28:15,454-Speed 24355.38 samples/sec Loss 2.7338 LearningRate 0.0005 Epoch: 14 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:28:25,523-Speed 24409.94 samples/sec Loss 2.7478 LearningRate 0.0005 Epoch: 14 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:28:35,533-Speed 24555.06 samples/sec Loss 2.7431 LearningRate 0.0005 Epoch: 14 Global Step: 24310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:28:45,463-Speed 
24751.35 samples/sec Loss 2.7055 LearningRate 0.0005 Epoch: 14 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:28:55,351-Speed 24856.19 samples/sec Loss 2.6957 LearningRate 0.0005 Epoch: 14 Global Step: 24330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:29:05,145-Speed 25097.38 samples/sec Loss 2.7085 LearningRate 0.0005 Epoch: 14 Global Step: 24340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:29:14,968-Speed 25020.31 samples/sec Loss 2.7242 LearningRate 0.0005 Epoch: 14 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:29:24,814-Speed 24963.85 samples/sec Loss 2.7154 LearningRate 0.0005 Epoch: 14 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:29:34,661-Speed 24960.93 samples/sec Loss 2.7636 LearningRate 0.0005 Epoch: 14 Global Step: 24370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:29:44,646-Speed 24622.19 samples/sec Loss 2.7751 LearningRate 0.0005 Epoch: 14 Global Step: 24380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:29:54,478-Speed 24998.33 samples/sec Loss 2.6945 LearningRate 0.0005 Epoch: 14 Global Step: 24390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:30:04,301-Speed 25022.61 samples/sec Loss 2.7132 LearningRate 0.0005 Epoch: 14 Global Step: 24400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:30:14,081-Speed 25131.12 samples/sec Loss 2.7118 LearningRate 0.0005 Epoch: 14 Global Step: 24410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:30:23,831-Speed 25211.93 samples/sec Loss 2.7147 LearningRate 0.0005 Epoch: 14 Global Step: 24420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:30:33,655-Speed 25019.78 samples/sec Loss 2.7186 LearningRate 0.0005 Epoch: 14 Global Step: 24430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:30:43,421-Speed 25168.11 samples/sec Loss 
2.7163 LearningRate 0.0005 Epoch: 14 Global Step: 24440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:30:53,122-Speed 25340.20 samples/sec Loss 2.7368 LearningRate 0.0005 Epoch: 14 Global Step: 24450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:31:02,960-Speed 24982.04 samples/sec Loss 2.7254 LearningRate 0.0005 Epoch: 14 Global Step: 24460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:31:12,707-Speed 25217.03 samples/sec Loss 2.7254 LearningRate 0.0005 Epoch: 14 Global Step: 24470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:31:22,556-Speed 24956.02 samples/sec Loss 2.7139 LearningRate 0.0005 Epoch: 14 Global Step: 24480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-26 05:31:32,399-Speed 24970.95 samples/sec Loss 2.7184 LearningRate 0.0005 Epoch: 14 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:31:42,420-Speed 24527.05 samples/sec Loss 2.7227 LearningRate 0.0005 Epoch: 14 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:31:52,425-Speed 24568.24 samples/sec Loss 2.7024 LearningRate 0.0005 Epoch: 14 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:32:02,309-Speed 24867.23 samples/sec Loss 2.7113 LearningRate 0.0005 Epoch: 14 Global Step: 24520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:32:12,086-Speed 25139.34 samples/sec Loss 2.6929 LearningRate 0.0005 Epoch: 14 Global Step: 24530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:32:21,845-Speed 25187.58 samples/sec Loss 2.7137 LearningRate 0.0005 Epoch: 14 Global Step: 24540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:32:31,719-Speed 24892.15 samples/sec Loss 2.7135 LearningRate 0.0005 Epoch: 14 Global Step: 24550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:32:41,592-Speed 24894.09 samples/sec Loss 2.7307 LearningRate 0.0005 
Epoch: 14 Global Step: 24560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:32:51,463-Speed 24902.23 samples/sec Loss 2.7001 LearningRate 0.0005 Epoch: 14 Global Step: 24570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:33:01,300-Speed 24985.30 samples/sec Loss 2.7206 LearningRate 0.0005 Epoch: 14 Global Step: 24580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:33:11,121-Speed 25029.45 samples/sec Loss 2.7116 LearningRate 0.0005 Epoch: 14 Global Step: 24590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:33:20,882-Speed 25180.02 samples/sec Loss 2.7024 LearningRate 0.0005 Epoch: 14 Global Step: 24600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:33:30,779-Speed 24833.95 samples/sec Loss 2.7012 LearningRate 0.0005 Epoch: 14 Global Step: 24610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:33:40,532-Speed 25201.20 samples/sec Loss 2.6993 LearningRate 0.0005 Epoch: 14 Global Step: 24620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:33:50,371-Speed 24981.23 samples/sec Loss 2.7427 LearningRate 0.0005 Epoch: 14 Global Step: 24630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:34:00,266-Speed 24840.23 samples/sec Loss 2.7351 LearningRate 0.0005 Epoch: 14 Global Step: 24640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:34:10,067-Speed 25077.20 samples/sec Loss 2.6975 LearningRate 0.0005 Epoch: 14 Global Step: 24650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:34:19,939-Speed 24899.10 samples/sec Loss 2.7000 LearningRate 0.0005 Epoch: 14 Global Step: 24660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:34:29,822-Speed 24869.84 samples/sec Loss 2.7264 LearningRate 0.0005 Epoch: 14 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:34:39,627-Speed 25066.76 samples/sec Loss 2.7228 LearningRate 0.0005 Epoch: 14 Global Step: 24680 
Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:34:49,626-Speed 24587.93 samples/sec Loss 2.6740 LearningRate 0.0005 Epoch: 14 Global Step: 24690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-03-26 05:34:59,417-Speed 25103.18 samples/sec Loss 2.6843 LearningRate 0.0005 Epoch: 14 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:35:09,183-Speed 25166.98 samples/sec Loss 2.7011 LearningRate 0.0005 Epoch: 14 Global Step: 24710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:35:18,939-Speed 25194.34 samples/sec Loss 2.7261 LearningRate 0.0005 Epoch: 14 Global Step: 24720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:35:28,747-Speed 25061.75 samples/sec Loss 2.6883 LearningRate 0.0005 Epoch: 14 Global Step: 24730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:35:38,615-Speed 24906.96 samples/sec Loss 2.6872 LearningRate 0.0005 Epoch: 14 Global Step: 24740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:35:48,390-Speed 25151.73 samples/sec Loss 2.6966 LearningRate 0.0005 Epoch: 14 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:35:58,141-Speed 25204.50 samples/sec Loss 2.6841 LearningRate 0.0005 Epoch: 14 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:36:07,997-Speed 24937.90 samples/sec Loss 2.6684 LearningRate 0.0005 Epoch: 14 Global Step: 24770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:36:17,922-Speed 24766.19 samples/sec Loss 2.6855 LearningRate 0.0005 Epoch: 14 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:36:27,785-Speed 24918.03 samples/sec Loss 2.6937 LearningRate 0.0005 Epoch: 14 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:36:37,594-Speed 25057.31 samples/sec Loss 2.7297 LearningRate 0.0005 Epoch: 14 Global Step: 24800 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-03-26 05:36:47,423-Speed 25007.53 samples/sec Loss 2.7012 LearningRate 0.0005 Epoch: 14 Global Step: 24810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:36:57,351-Speed 24756.80 samples/sec Loss 2.6865 LearningRate 0.0005 Epoch: 14 Global Step: 24820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:37:07,107-Speed 25193.08 samples/sec Loss 2.6698 LearningRate 0.0005 Epoch: 14 Global Step: 24830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-26 05:37:16,932-Speed 25015.90 samples/sec Loss 2.6925 LearningRate 0.0005 Epoch: 14 Global Step: 24840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:37:26,734-Speed 25077.93 samples/sec Loss 2.6784 LearningRate 0.0005 Epoch: 14 Global Step: 24850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:37:36,521-Speed 25113.04 samples/sec Loss 2.6629 LearningRate 0.0005 Epoch: 14 Global Step: 24860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:37:46,475-Speed 24693.19 samples/sec Loss 2.6569 LearningRate 0.0005 Epoch: 14 Global Step: 24870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:37:56,355-Speed 24876.71 samples/sec Loss 2.6589 LearningRate 0.0005 Epoch: 14 Global Step: 24880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:38:06,356-Speed 24581.93 samples/sec Loss 2.6824 LearningRate 0.0005 Epoch: 14 Global Step: 24890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:38:16,198-Speed 24971.51 samples/sec Loss 2.6909 LearningRate 0.0005 Epoch: 14 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:38:26,086-Speed 24860.32 samples/sec Loss 2.7082 LearningRate 0.0005 Epoch: 14 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:38:35,977-Speed 24854.90 samples/sec Loss 2.6714 LearningRate 0.0005 Epoch: 14 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-03-26 05:38:45,764-Speed 25113.52 samples/sec Loss 2.6940 LearningRate 0.0005 Epoch: 14 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:38:55,565-Speed 25078.90 samples/sec Loss 2.6737 LearningRate 0.0005 Epoch: 14 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:39:05,375-Speed 25053.42 samples/sec Loss 2.6739 LearningRate 0.0005 Epoch: 14 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:39:15,089-Speed 25304.31 samples/sec Loss 2.6405 LearningRate 0.0005 Epoch: 14 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:39:24,940-Speed 24950.07 samples/sec Loss 2.6954 LearningRate 0.0005 Epoch: 14 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:39:34,784-Speed 24967.38 samples/sec Loss 2.6629 LearningRate 0.0005 Epoch: 14 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:39:44,580-Speed 25089.33 samples/sec Loss 2.6713 LearningRate 0.0005 Epoch: 14 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:39:54,370-Speed 25106.40 samples/sec Loss 2.7029 LearningRate 0.0005 Epoch: 14 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:40:04,258-Speed 24854.70 samples/sec Loss 2.6748 LearningRate 0.0005 Epoch: 14 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:40:14,196-Speed 24733.05 samples/sec Loss 2.6761 LearningRate 0.0005 Epoch: 14 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:40:24,105-Speed 24808.90 samples/sec Loss 2.6584 LearningRate 0.0005 Epoch: 14 Global Step: 25030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:40:33,998-Speed 24843.81 samples/sec Loss 2.6653 LearningRate 0.0005 Epoch: 14 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:40:43,787-Speed 
25107.92 samples/sec Loss 2.6829 LearningRate 0.0005 Epoch: 14 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:40:53,697-Speed 24801.24 samples/sec Loss 2.6934 LearningRate 0.0005 Epoch: 14 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:41:03,571-Speed 24893.32 samples/sec Loss 2.6800 LearningRate 0.0005 Epoch: 14 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:41:13,386-Speed 25039.46 samples/sec Loss 2.6747 LearningRate 0.0005 Epoch: 14 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:41:23,199-Speed 25047.63 samples/sec Loss 2.6494 LearningRate 0.0005 Epoch: 14 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:41:33,128-Speed 24754.47 samples/sec Loss 2.6785 LearningRate 0.0005 Epoch: 14 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-03-26 05:41:42,975-Speed 24960.67 samples/sec Loss 2.6697 LearningRate 0.0005 Epoch: 14 Global Step: 25110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:41:52,835-Speed 24926.06 samples/sec Loss 2.6589 LearningRate 0.0005 Epoch: 14 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:42:02,643-Speed 25062.17 samples/sec Loss 2.7021 LearningRate 0.0005 Epoch: 14 Global Step: 25130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:42:12,506-Speed 24919.55 samples/sec Loss 2.6772 LearningRate 0.0005 Epoch: 14 Global Step: 25140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:42:22,530-Speed 24519.38 samples/sec Loss 2.6483 LearningRate 0.0005 Epoch: 14 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:42:32,430-Speed 24828.33 samples/sec Loss 2.6607 LearningRate 0.0005 Epoch: 14 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:42:42,258-Speed 25008.45 samples/sec Loss 
2.6744 LearningRate 0.0005 Epoch: 14 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:42:52,053-Speed 25094.36 samples/sec Loss 2.6565 LearningRate 0.0005 Epoch: 14 Global Step: 25180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:43:01,889-Speed 24986.77 samples/sec Loss 2.6636 LearningRate 0.0005 Epoch: 14 Global Step: 25190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:43:11,773-Speed 24868.11 samples/sec Loss 2.6449 LearningRate 0.0005 Epoch: 14 Global Step: 25200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:43:21,637-Speed 24919.73 samples/sec Loss 2.6724 LearningRate 0.0005 Epoch: 14 Global Step: 25210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:43:31,646-Speed 24556.59 samples/sec Loss 2.6768 LearningRate 0.0005 Epoch: 14 Global Step: 25220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:43:41,722-Speed 24392.80 samples/sec Loss 2.6582 LearningRate 0.0005 Epoch: 14 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:43:51,748-Speed 24516.35 samples/sec Loss 2.6408 LearningRate 0.0005 Epoch: 14 Global Step: 25240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:44:01,803-Speed 24445.07 samples/sec Loss 2.6707 LearningRate 0.0005 Epoch: 14 Global Step: 25250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:44:11,947-Speed 24230.39 samples/sec Loss 2.6691 LearningRate 0.0005 Epoch: 14 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:44:21,967-Speed 24529.93 samples/sec Loss 2.6609 LearningRate 0.0005 Epoch: 14 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:44:31,965-Speed 24582.86 samples/sec Loss 2.6375 LearningRate 0.0005 Epoch: 14 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:44:42,011-Speed 24467.24 samples/sec Loss 2.6426 LearningRate 0.0005 
Epoch: 14 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:44:52,084-Speed 24401.07 samples/sec Loss 2.6701 LearningRate 0.0005 Epoch: 14 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:45:02,044-Speed 24676.01 samples/sec Loss 2.6711 LearningRate 0.0005 Epoch: 14 Global Step: 25310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:45:12,285-Speed 24007.55 samples/sec Loss 2.6267 LearningRate 0.0005 Epoch: 14 Global Step: 25320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:45:22,284-Speed 24582.76 samples/sec Loss 2.6528 LearningRate 0.0005 Epoch: 14 Global Step: 25330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:45:32,196-Speed 24796.47 samples/sec Loss 2.6351 LearningRate 0.0005 Epoch: 14 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:45:42,183-Speed 24608.69 samples/sec Loss 2.6516 LearningRate 0.0005 Epoch: 14 Global Step: 25350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:45:52,158-Speed 24641.44 samples/sec Loss 2.6684 LearningRate 0.0005 Epoch: 14 Global Step: 25360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:46:02,123-Speed 24665.01 samples/sec Loss 2.6415 LearningRate 0.0005 Epoch: 14 Global Step: 25370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:46:12,208-Speed 24376.52 samples/sec Loss 2.6415 LearningRate 0.0005 Epoch: 14 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:46:22,261-Speed 24448.64 samples/sec Loss 2.6562 LearningRate 0.0005 Epoch: 14 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:46:32,138-Speed 24885.62 samples/sec Loss 2.6320 LearningRate 0.0005 Epoch: 14 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:46:42,007-Speed 24902.68 samples/sec Loss 2.6499 LearningRate 0.0005 Epoch: 14 Global Step: 25410 
Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:46:52,001-Speed 24597.07 samples/sec Loss 2.7587 LearningRate 0.0005 Epoch: 14 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:47:01,953-Speed 24696.52 samples/sec Loss 2.6638 LearningRate 0.0005 Epoch: 14 Global Step: 25430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:47:11,786-Speed 24997.33 samples/sec Loss 2.6521 LearningRate 0.0005 Epoch: 14 Global Step: 25440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:47:21,747-Speed 24678.60 samples/sec Loss 2.6340 LearningRate 0.0005 Epoch: 14 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:47:31,693-Speed 24712.09 samples/sec Loss 2.6573 LearningRate 0.0005 Epoch: 14 Global Step: 25460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:47:41,598-Speed 24817.69 samples/sec Loss 2.6625 LearningRate 0.0005 Epoch: 14 Global Step: 25470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:47:51,727-Speed 24266.19 samples/sec Loss 2.6437 LearningRate 0.0005 Epoch: 14 Global Step: 25480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:48:01,707-Speed 24628.92 samples/sec Loss 2.6432 LearningRate 0.0005 Epoch: 14 Global Step: 25490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:48:11,986-Speed 23912.94 samples/sec Loss 2.6762 LearningRate 0.0005 Epoch: 14 Global Step: 25500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:48:22,068-Speed 24380.76 samples/sec Loss 2.6463 LearningRate 0.0005 Epoch: 14 Global Step: 25510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:48:32,069-Speed 24574.14 samples/sec Loss 2.6237 LearningRate 0.0005 Epoch: 14 Global Step: 25520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:48:42,109-Speed 24481.95 samples/sec Loss 2.6271 LearningRate 0.0005 Epoch: 14 Global Step: 25530 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-03-26 05:48:52,074-Speed 24671.18 samples/sec Loss 2.6399 LearningRate 0.0005 Epoch: 14 Global Step: 25540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:49:02,212-Speed 24244.51 samples/sec Loss 2.6573 LearningRate 0.0005 Epoch: 14 Global Step: 25550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:49:12,259-Speed 24464.89 samples/sec Loss 2.6373 LearningRate 0.0005 Epoch: 14 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:49:22,256-Speed 24588.17 samples/sec Loss 2.6261 LearningRate 0.0005 Epoch: 14 Global Step: 25570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:49:32,427-Speed 24166.35 samples/sec Loss 2.6395 LearningRate 0.0005 Epoch: 14 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:49:42,402-Speed 24640.12 samples/sec Loss 2.6273 LearningRate 0.0005 Epoch: 14 Global Step: 25590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:49:52,340-Speed 24733.34 samples/sec Loss 2.6600 LearningRate 0.0005 Epoch: 14 Global Step: 25600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:50:02,456-Speed 24296.96 samples/sec Loss 2.6389 LearningRate 0.0005 Epoch: 14 Global Step: 25610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:50:12,537-Speed 24382.89 samples/sec Loss 2.6212 LearningRate 0.0005 Epoch: 14 Global Step: 25620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:50:22,505-Speed 24658.45 samples/sec Loss 2.6441 LearningRate 0.0005 Epoch: 14 Global Step: 25630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:50:32,380-Speed 24888.83 samples/sec Loss 2.6456 LearningRate 0.0005 Epoch: 14 Global Step: 25640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:50:42,191-Speed 25053.76 samples/sec Loss 2.6677 LearningRate 0.0005 Epoch: 14 Global Step: 25650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 
2022-03-26 05:50:52,020-Speed 25011.83 samples/sec Loss 2.6470 LearningRate 0.0005 Epoch: 14 Global Step: 25660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:51:01,826-Speed 25067.70 samples/sec Loss 2.6291 LearningRate 0.0005 Epoch: 14 Global Step: 25670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:51:11,784-Speed 24683.66 samples/sec Loss 2.6336 LearningRate 0.0005 Epoch: 14 Global Step: 25680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:51:21,534-Speed 25210.56 samples/sec Loss 2.6826 LearningRate 0.0005 Epoch: 14 Global Step: 25690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:51:31,371-Speed 24987.15 samples/sec Loss 2.6609 LearningRate 0.0005 Epoch: 14 Global Step: 25700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:51:41,252-Speed 24875.91 samples/sec Loss 2.6286 LearningRate 0.0005 Epoch: 14 Global Step: 25710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:51:51,124-Speed 24898.48 samples/sec Loss 2.6112 LearningRate 0.0005 Epoch: 14 Global Step: 25720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:52:00,867-Speed 25229.78 samples/sec Loss 2.6420 LearningRate 0.0005 Epoch: 14 Global Step: 25730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:52:10,647-Speed 25130.61 samples/sec Loss 2.6493 LearningRate 0.0005 Epoch: 14 Global Step: 25740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:52:20,429-Speed 25128.09 samples/sec Loss 2.6337 LearningRate 0.0005 Epoch: 14 Global Step: 25750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:52:30,237-Speed 25060.94 samples/sec Loss 2.6426 LearningRate 0.0005 Epoch: 14 Global Step: 25760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:52:40,106-Speed 24904.24 samples/sec Loss 2.6300 LearningRate 0.0005 Epoch: 14 Global Step: 25770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:52:49,853-Speed 
25217.76 samples/sec Loss 2.6183 LearningRate 0.0005 Epoch: 14 Global Step: 25780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:52:59,596-Speed 25226.72 samples/sec Loss 2.6093 LearningRate 0.0005 Epoch: 14 Global Step: 25790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:53:09,453-Speed 24934.93 samples/sec Loss 2.6125 LearningRate 0.0005 Epoch: 14 Global Step: 25800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:53:19,181-Speed 25266.82 samples/sec Loss 2.6036 LearningRate 0.0005 Epoch: 14 Global Step: 25810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:53:29,054-Speed 24899.70 samples/sec Loss 2.6302 LearningRate 0.0005 Epoch: 14 Global Step: 25820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:53:38,838-Speed 25121.01 samples/sec Loss 2.6289 LearningRate 0.0005 Epoch: 14 Global Step: 25830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-03-26 05:53:48,706-Speed 24907.12 samples/sec Loss 2.6167 LearningRate 0.0005 Epoch: 14 Global Step: 25840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:53:58,393-Speed 25373.09 samples/sec Loss 2.6369 LearningRate 0.0005 Epoch: 14 Global Step: 25850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:54:08,156-Speed 25177.87 samples/sec Loss 2.6267 LearningRate 0.0005 Epoch: 14 Global Step: 25860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:54:18,001-Speed 24964.95 samples/sec Loss 2.6360 LearningRate 0.0005 Epoch: 14 Global Step: 25870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:54:27,699-Speed 25344.35 samples/sec Loss 2.6405 LearningRate 0.0005 Epoch: 14 Global Step: 25880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:54:37,664-Speed 24666.84 samples/sec Loss 2.5990 LearningRate 0.0005 Epoch: 14 Global Step: 25890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:54:47,509-Speed 24965.45 samples/sec Loss 
2.6346 LearningRate 0.0005 Epoch: 14 Global Step: 25900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:54:57,346-Speed 24986.94 samples/sec Loss 2.6609 LearningRate 0.0005 Epoch: 14 Global Step: 25910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:55:07,166-Speed 25029.72 samples/sec Loss 2.6734 LearningRate 0.0005 Epoch: 14 Global Step: 25920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:56:06,582-Speed 4136.32 samples/sec Loss 2.6628 LearningRate 0.0005 Epoch: 15 Global Step: 25930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:56:16,332-Speed 25209.83 samples/sec Loss 2.5840 LearningRate 0.0005 Epoch: 15 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:56:26,171-Speed 24989.19 samples/sec Loss 2.5836 LearningRate 0.0005 Epoch: 15 Global Step: 25950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:56:35,985-Speed 25047.20 samples/sec Loss 2.5923 LearningRate 0.0005 Epoch: 15 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:56:45,712-Speed 25268.27 samples/sec Loss 2.5880 LearningRate 0.0005 Epoch: 15 Global Step: 25970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:56:55,502-Speed 25108.25 samples/sec Loss 2.5899 LearningRate 0.0005 Epoch: 15 Global Step: 25980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:57:05,280-Speed 25138.40 samples/sec Loss 2.5763 LearningRate 0.0005 Epoch: 15 Global Step: 25990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:57:15,117-Speed 24984.41 samples/sec Loss 2.5734 LearningRate 0.0005 Epoch: 15 Global Step: 26000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:57:24,902-Speed 25124.46 samples/sec Loss 2.6070 LearningRate 0.0005 Epoch: 15 Global Step: 26010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:57:34,777-Speed 24889.64 samples/sec Loss 2.6153 LearningRate 0.0005 
Epoch: 15 Global Step: 26020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:57:44,563-Speed 25115.49 samples/sec Loss 2.5837 LearningRate 0.0005 Epoch: 15 Global Step: 26030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:57:54,329-Speed 25168.76 samples/sec Loss 2.6019 LearningRate 0.0005 Epoch: 15 Global Step: 26040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:58:04,121-Speed 25103.53 samples/sec Loss 2.6042 LearningRate 0.0005 Epoch: 15 Global Step: 26050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:58:14,050-Speed 24754.29 samples/sec Loss 2.5588 LearningRate 0.0005 Epoch: 15 Global Step: 26060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 05:58:23,999-Speed 24704.88 samples/sec Loss 2.5912 LearningRate 0.0005 Epoch: 15 Global Step: 26070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:58:33,777-Speed 25137.22 samples/sec Loss 2.5909 LearningRate 0.0005 Epoch: 15 Global Step: 26080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:58:43,637-Speed 24927.48 samples/sec Loss 2.5887 LearningRate 0.0005 Epoch: 15 Global Step: 26090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:58:53,423-Speed 25118.61 samples/sec Loss 2.5766 LearningRate 0.0005 Epoch: 15 Global Step: 26100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:59:03,312-Speed 24856.72 samples/sec Loss 2.5868 LearningRate 0.0005 Epoch: 15 Global Step: 26110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:59:13,114-Speed 25075.99 samples/sec Loss 2.5835 LearningRate 0.0005 Epoch: 15 Global Step: 26120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:59:22,837-Speed 25280.05 samples/sec Loss 2.6213 LearningRate 0.0005 Epoch: 15 Global Step: 26130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:59:32,699-Speed 24921.86 samples/sec Loss 2.6000 LearningRate 0.0005 Epoch: 15 Global Step: 26140 
Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:59:42,618-Speed 24780.78 samples/sec Loss 2.5892 LearningRate 0.0005 Epoch: 15 Global Step: 26150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 05:59:52,435-Speed 25037.29 samples/sec Loss 2.5975 LearningRate 0.0005 Epoch: 15 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:00:02,298-Speed 24920.51 samples/sec Loss 2.5750 LearningRate 0.0005 Epoch: 15 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:00:12,409-Speed 24308.93 samples/sec Loss 2.6049 LearningRate 0.0005 Epoch: 15 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:00:22,373-Speed 24668.90 samples/sec Loss 2.6113 LearningRate 0.0005 Epoch: 15 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:00:32,300-Speed 24760.25 samples/sec Loss 2.5962 LearningRate 0.0005 Epoch: 15 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:00:42,304-Speed 24567.82 samples/sec Loss 2.6086 LearningRate 0.0005 Epoch: 15 Global Step: 26210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:00:52,285-Speed 24626.67 samples/sec Loss 2.6036 LearningRate 0.0005 Epoch: 15 Global Step: 26220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:01:02,049-Speed 25174.75 samples/sec Loss 2.5998 LearningRate 0.0005 Epoch: 15 Global Step: 26230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:01:11,843-Speed 25096.52 samples/sec Loss 2.6076 LearningRate 0.0005 Epoch: 15 Global Step: 26240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:01:21,702-Speed 24930.82 samples/sec Loss 2.5972 LearningRate 0.0005 Epoch: 15 Global Step: 26250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:01:31,439-Speed 25243.77 samples/sec Loss 2.5875 LearningRate 0.0005 Epoch: 15 Global Step: 26260 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-03-26 06:01:41,279-Speed 24982.96 samples/sec Loss 2.6010 LearningRate 0.0005 Epoch: 15 Global Step: 26270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:01:51,245-Speed 24664.34 samples/sec Loss 2.5860 LearningRate 0.0005 Epoch: 15 Global Step: 26280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:02:01,012-Speed 25164.37 samples/sec Loss 2.5833 LearningRate 0.0005 Epoch: 15 Global Step: 26290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:02:10,956-Speed 24720.64 samples/sec Loss 2.6154 LearningRate 0.0005 Epoch: 15 Global Step: 26300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:02:20,809-Speed 24946.90 samples/sec Loss 2.5921 LearningRate 0.0005 Epoch: 15 Global Step: 26310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:02:30,563-Speed 25198.45 samples/sec Loss 2.5848 LearningRate 0.0005 Epoch: 15 Global Step: 26320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:02:40,277-Speed 25303.31 samples/sec Loss 2.6008 LearningRate 0.0005 Epoch: 15 Global Step: 26330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:02:50,179-Speed 24824.18 samples/sec Loss 2.5764 LearningRate 0.0005 Epoch: 15 Global Step: 26340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:03:00,142-Speed 24670.32 samples/sec Loss 2.5832 LearningRate 0.0005 Epoch: 15 Global Step: 26350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:03:09,922-Speed 25133.66 samples/sec Loss 2.5961 LearningRate 0.0005 Epoch: 15 Global Step: 26360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:03:19,699-Speed 25141.36 samples/sec Loss 2.6052 LearningRate 0.0005 Epoch: 15 Global Step: 26370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:03:29,443-Speed 25223.54 samples/sec Loss 2.5774 LearningRate 0.0005 Epoch: 15 Global Step: 26380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 
2022-03-26 06:03:39,190-Speed 25216.93 samples/sec Loss 2.5754 LearningRate 0.0005 Epoch: 15 Global Step: 26390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:03:49,049-Speed 24932.35 samples/sec Loss 2.5857 LearningRate 0.0005 Epoch: 15 Global Step: 26400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:03:58,945-Speed 24835.19 samples/sec Loss 2.5816 LearningRate 0.0005 Epoch: 15 Global Step: 26410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:04:08,829-Speed 24868.11 samples/sec Loss 2.5658 LearningRate 0.0005 Epoch: 15 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:04:18,629-Speed 25078.80 samples/sec Loss 2.5955 LearningRate 0.0005 Epoch: 15 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:04:28,475-Speed 24965.52 samples/sec Loss 2.5935 LearningRate 0.0005 Epoch: 15 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:04:38,411-Speed 24735.37 samples/sec Loss 2.5539 LearningRate 0.0005 Epoch: 15 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:04:48,224-Speed 25048.69 samples/sec Loss 2.5698 LearningRate 0.0005 Epoch: 15 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:04:57,978-Speed 25199.15 samples/sec Loss 2.5611 LearningRate 0.0005 Epoch: 15 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:05:07,759-Speed 25128.01 samples/sec Loss 2.5609 LearningRate 0.0005 Epoch: 15 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:05:17,585-Speed 25015.21 samples/sec Loss 2.5546 LearningRate 0.0005 Epoch: 15 Global Step: 26490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:05:27,434-Speed 24956.99 samples/sec Loss 2.5628 LearningRate 0.0005 Epoch: 15 Global Step: 26500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:05:37,388-Speed 
24693.42 samples/sec Loss 2.5955 LearningRate 0.0005 Epoch: 15 Global Step: 26510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:05:47,215-Speed 25010.51 samples/sec Loss 2.6105 LearningRate 0.0005 Epoch: 15 Global Step: 26520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:05:57,038-Speed 25020.46 samples/sec Loss 2.5974 LearningRate 0.0005 Epoch: 15 Global Step: 26530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:06:06,794-Speed 25195.72 samples/sec Loss 2.5887 LearningRate 0.0005 Epoch: 15 Global Step: 26540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:06:16,581-Speed 25114.20 samples/sec Loss 2.6489 LearningRate 0.0005 Epoch: 15 Global Step: 26550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:06:26,478-Speed 24833.23 samples/sec Loss 2.6007 LearningRate 0.0005 Epoch: 15 Global Step: 26560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:06:36,319-Speed 24977.77 samples/sec Loss 2.5665 LearningRate 0.0005 Epoch: 15 Global Step: 26570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:06:46,188-Speed 24904.40 samples/sec Loss 2.5658 LearningRate 0.0005 Epoch: 15 Global Step: 26580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:06:55,976-Speed 25110.85 samples/sec Loss 2.5847 LearningRate 0.0005 Epoch: 15 Global Step: 26590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:07:05,690-Speed 25302.22 samples/sec Loss 2.5812 LearningRate 0.0005 Epoch: 15 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:07:15,419-Speed 25264.20 samples/sec Loss 2.5651 LearningRate 0.0005 Epoch: 15 Global Step: 26610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-03-26 06:07:25,502-Speed 24387.09 samples/sec Loss 2.5680 LearningRate 0.0005 Epoch: 15 Global Step: 26620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:07:35,477-Speed 24642.13 samples/sec Loss 
2.6076 LearningRate 0.0005 Epoch: 15 Global Step: 26630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:07:45,511-Speed 24494.55 samples/sec Loss 2.5661 LearningRate 0.0005 Epoch: 15 Global Step: 26640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:07:55,519-Speed 24558.86 samples/sec Loss 2.5472 LearningRate 0.0005 Epoch: 15 Global Step: 26650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:08:05,529-Speed 24554.25 samples/sec Loss 2.5544 LearningRate 0.0005 Epoch: 15 Global Step: 26660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:08:15,457-Speed 24757.15 samples/sec Loss 2.5547 LearningRate 0.0005 Epoch: 15 Global Step: 26670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:08:25,361-Speed 24817.22 samples/sec Loss 2.5723 LearningRate 0.0005 Epoch: 15 Global Step: 26680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:08:35,524-Speed 24185.78 samples/sec Loss 2.5619 LearningRate 0.0005 Epoch: 15 Global Step: 26690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:08:45,638-Speed 24301.93 samples/sec Loss 2.5630 LearningRate 0.0005 Epoch: 15 Global Step: 26700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:08:55,683-Speed 24465.74 samples/sec Loss 2.5666 LearningRate 0.0005 Epoch: 15 Global Step: 26710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:09:05,644-Speed 24678.01 samples/sec Loss 2.5347 LearningRate 0.0005 Epoch: 15 Global Step: 26720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:09:15,679-Speed 24498.89 samples/sec Loss 2.5531 LearningRate 0.0005 Epoch: 15 Global Step: 26730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:09:25,638-Speed 24682.79 samples/sec Loss 2.5494 LearningRate 0.0005 Epoch: 15 Global Step: 26740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:09:35,551-Speed 24795.31 samples/sec Loss 2.5412 LearningRate 0.0005 
Epoch: 15 Global Step: 26750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:09:45,491-Speed 24727.82 samples/sec Loss 2.5705 LearningRate 0.0005 Epoch: 15 Global Step: 26760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:09:55,500-Speed 24556.02 samples/sec Loss 2.5705 LearningRate 0.0005 Epoch: 15 Global Step: 26770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:10:05,541-Speed 24479.47 samples/sec Loss 2.5586 LearningRate 0.0005 Epoch: 15 Global Step: 26780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:10:15,584-Speed 24473.44 samples/sec Loss 2.5495 LearningRate 0.0005 Epoch: 15 Global Step: 26790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:10:25,525-Speed 24725.27 samples/sec Loss 2.5407 LearningRate 0.0005 Epoch: 15 Global Step: 26800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:10:35,651-Speed 24273.39 samples/sec Loss 2.5442 LearningRate 0.0005 Epoch: 15 Global Step: 26810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:10:45,761-Speed 24311.58 samples/sec Loss 2.5512 LearningRate 0.0005 Epoch: 15 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:10:55,795-Speed 24496.00 samples/sec Loss 2.5431 LearningRate 0.0005 Epoch: 15 Global Step: 26830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:11:05,828-Speed 24509.50 samples/sec Loss 2.5904 LearningRate 0.0005 Epoch: 15 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:11:15,811-Speed 24622.75 samples/sec Loss 2.5845 LearningRate 0.0005 Epoch: 15 Global Step: 26850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:11:25,748-Speed 24737.28 samples/sec Loss 2.5765 LearningRate 0.0005 Epoch: 15 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:11:35,746-Speed 24584.30 samples/sec Loss 2.5961 LearningRate 0.0005 Epoch: 15 Global Step: 26870 
Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:11:45,730-Speed 24619.07 samples/sec Loss 2.5543 LearningRate 0.0005 Epoch: 15 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:11:55,818-Speed 24364.15 samples/sec Loss 2.5467 LearningRate 0.0005 Epoch: 15 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:12:05,802-Speed 24617.19 samples/sec Loss 2.5360 LearningRate 0.0005 Epoch: 15 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:12:15,778-Speed 24640.45 samples/sec Loss 2.5203 LearningRate 0.0005 Epoch: 15 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-03-26 06:12:25,788-Speed 24553.34 samples/sec Loss 2.5656 LearningRate 0.0005 Epoch: 15 Global Step: 26920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:12:35,796-Speed 24559.73 samples/sec Loss 2.5597 LearningRate 0.0005 Epoch: 15 Global Step: 26930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:12:45,788-Speed 24604.50 samples/sec Loss 2.5408 LearningRate 0.0005 Epoch: 15 Global Step: 26940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:12:55,745-Speed 24683.92 samples/sec Loss 2.5311 LearningRate 0.0005 Epoch: 15 Global Step: 26950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:13:05,734-Speed 24605.46 samples/sec Loss 2.5476 LearningRate 0.0005 Epoch: 15 Global Step: 26960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:13:15,749-Speed 24542.30 samples/sec Loss 2.5397 LearningRate 0.0005 Epoch: 15 Global Step: 26970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:13:25,927-Speed 24147.79 samples/sec Loss 2.5304 LearningRate 0.0005 Epoch: 15 Global Step: 26980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:13:35,987-Speed 24432.28 samples/sec Loss 2.5293 LearningRate 0.0005 Epoch: 15 Global Step: 26990 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-03-26 06:13:46,013-Speed 24515.33 samples/sec Loss 2.5522 LearningRate 0.0005 Epoch: 15 Global Step: 27000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:13:56,049-Speed 24489.87 samples/sec Loss 2.5784 LearningRate 0.0005 Epoch: 15 Global Step: 27010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:14:06,126-Speed 24394.23 samples/sec Loss 2.5696 LearningRate 0.0005 Epoch: 15 Global Step: 27020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:14:16,133-Speed 24569.02 samples/sec Loss 2.5473 LearningRate 0.0005 Epoch: 15 Global Step: 27030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:14:26,151-Speed 24538.69 samples/sec Loss 2.5394 LearningRate 0.0005 Epoch: 15 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:14:36,102-Speed 24699.45 samples/sec Loss 2.5639 LearningRate 0.0005 Epoch: 15 Global Step: 27050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:14:46,058-Speed 24688.17 samples/sec Loss 2.5183 LearningRate 0.0005 Epoch: 15 Global Step: 27060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:14:56,135-Speed 24392.27 samples/sec Loss 2.5361 LearningRate 0.0005 Epoch: 15 Global Step: 27070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:15:06,218-Speed 24376.61 samples/sec Loss 2.5390 LearningRate 0.0005 Epoch: 15 Global Step: 27080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:15:16,250-Speed 24498.95 samples/sec Loss 2.5530 LearningRate 0.0005 Epoch: 15 Global Step: 27090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:15:26,228-Speed 24633.02 samples/sec Loss 2.5382 LearningRate 0.0005 Epoch: 15 Global Step: 27100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:15:36,270-Speed 24476.18 samples/sec Loss 2.5266 LearningRate 0.0005 Epoch: 15 Global Step: 27110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 
2022-03-26 06:15:46,386-Speed 24295.05 samples/sec Loss 2.5196 LearningRate 0.0005 Epoch: 15 Global Step: 27120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:15:56,419-Speed 24499.24 samples/sec Loss 2.5392 LearningRate 0.0005 Epoch: 15 Global Step: 27130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:16:06,373-Speed 24693.00 samples/sec Loss 2.5087 LearningRate 0.0005 Epoch: 15 Global Step: 27140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:16:16,361-Speed 24606.75 samples/sec Loss 2.5463 LearningRate 0.0005 Epoch: 15 Global Step: 27150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:16:26,412-Speed 24455.17 samples/sec Loss 2.5579 LearningRate 0.0005 Epoch: 15 Global Step: 27160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:16:36,509-Speed 24342.73 samples/sec Loss 2.5295 LearningRate 0.0005 Epoch: 15 Global Step: 27170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:16:46,560-Speed 24454.15 samples/sec Loss 2.5528 LearningRate 0.0005 Epoch: 15 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:16:56,524-Speed 24669.50 samples/sec Loss 2.5119 LearningRate 0.0005 Epoch: 15 Global Step: 27190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:17:06,701-Speed 24149.85 samples/sec Loss 2.5214 LearningRate 0.0005 Epoch: 15 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:17:16,779-Speed 24389.40 samples/sec Loss 2.5504 LearningRate 0.0005 Epoch: 15 Global Step: 27210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:17:26,787-Speed 24560.21 samples/sec Loss 2.5249 LearningRate 0.0005 Epoch: 15 Global Step: 27220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:17:36,851-Speed 24421.00 samples/sec Loss 2.5057 LearningRate 0.0005 Epoch: 15 Global Step: 27230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:17:46,868-Speed 
24538.24 samples/sec Loss 2.5348 LearningRate 0.0005 Epoch: 15 Global Step: 27240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:17:56,887-Speed 24531.84 samples/sec Loss 2.5256 LearningRate 0.0005 Epoch: 15 Global Step: 27250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:18:06,856-Speed 24664.91 samples/sec Loss 2.5483 LearningRate 0.0005 Epoch: 15 Global Step: 27260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:18:16,891-Speed 24492.34 samples/sec Loss 2.5316 LearningRate 0.0005 Epoch: 15 Global Step: 27270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:18:26,922-Speed 24504.02 samples/sec Loss 2.5145 LearningRate 0.0005 Epoch: 15 Global Step: 27280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:18:37,022-Speed 24334.15 samples/sec Loss 2.5041 LearningRate 0.0005 Epoch: 15 Global Step: 27290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:18:47,127-Speed 24322.87 samples/sec Loss 2.5244 LearningRate 0.0005 Epoch: 15 Global Step: 27300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:18:57,138-Speed 24557.72 samples/sec Loss 2.5318 LearningRate 0.0005 Epoch: 15 Global Step: 27310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:19:07,158-Speed 24530.15 samples/sec Loss 2.5131 LearningRate 0.0005 Epoch: 15 Global Step: 27320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:19:17,132-Speed 24642.28 samples/sec Loss 2.5236 LearningRate 0.0005 Epoch: 15 Global Step: 27330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:19:27,288-Speed 24201.52 samples/sec Loss 2.5167 LearningRate 0.0005 Epoch: 15 Global Step: 27340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:19:37,323-Speed 24493.06 samples/sec Loss 2.5231 LearningRate 0.0005 Epoch: 15 Global Step: 27350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:19:47,302-Speed 24630.61 samples/sec Loss 
2.5089 LearningRate 0.0005 Epoch: 15 Global Step: 27360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:19:57,336-Speed 24497.67 samples/sec Loss 2.5165 LearningRate 0.0005 Epoch: 15 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:20:07,350-Speed 24543.70 samples/sec Loss 2.5128 LearningRate 0.0005 Epoch: 15 Global Step: 27380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:20:17,329-Speed 24631.88 samples/sec Loss 2.5373 LearningRate 0.0004 Epoch: 15 Global Step: 27390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:20:27,325-Speed 24594.50 samples/sec Loss 2.5126 LearningRate 0.0004 Epoch: 15 Global Step: 27400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:20:37,391-Speed 24418.93 samples/sec Loss 2.5310 LearningRate 0.0004 Epoch: 15 Global Step: 27410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:20:47,385-Speed 24593.46 samples/sec Loss 2.5269 LearningRate 0.0004 Epoch: 15 Global Step: 27420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:20:57,431-Speed 24465.74 samples/sec Loss 2.5957 LearningRate 0.0004 Epoch: 15 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:21:07,545-Speed 24302.40 samples/sec Loss 2.6208 LearningRate 0.0004 Epoch: 15 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:21:17,558-Speed 24546.18 samples/sec Loss 2.5146 LearningRate 0.0004 Epoch: 15 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:21:27,628-Speed 24408.35 samples/sec Loss 2.4942 LearningRate 0.0004 Epoch: 15 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:21:37,629-Speed 24576.48 samples/sec Loss 2.5041 LearningRate 0.0004 Epoch: 15 Global Step: 27470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:21:47,611-Speed 24622.38 samples/sec Loss 2.5022 LearningRate 0.0004 
Epoch: 15 Global Step: 27480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:21:57,458-Speed 24959.67 samples/sec Loss 2.5127 LearningRate 0.0004 Epoch: 15 Global Step: 27490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:22:07,220-Speed 25179.60 samples/sec Loss 2.4902 LearningRate 0.0004 Epoch: 15 Global Step: 27500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:22:17,002-Speed 25124.60 samples/sec Loss 2.5103 LearningRate 0.0004 Epoch: 15 Global Step: 27510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:22:26,783-Speed 25129.33 samples/sec Loss 2.5264 LearningRate 0.0004 Epoch: 15 Global Step: 27520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:22:36,558-Speed 25146.26 samples/sec Loss 2.5383 LearningRate 0.0004 Epoch: 15 Global Step: 27530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:22:46,276-Speed 25291.95 samples/sec Loss 2.5319 LearningRate 0.0004 Epoch: 15 Global Step: 27540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:22:56,148-Speed 24895.62 samples/sec Loss 2.5396 LearningRate 0.0004 Epoch: 15 Global Step: 27550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:23:05,894-Speed 25219.70 samples/sec Loss 2.5327 LearningRate 0.0004 Epoch: 15 Global Step: 27560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:23:15,729-Speed 24991.90 samples/sec Loss 2.5231 LearningRate 0.0004 Epoch: 15 Global Step: 27570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:23:25,424-Speed 25350.50 samples/sec Loss 2.5173 LearningRate 0.0004 Epoch: 15 Global Step: 27580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:23:35,185-Speed 25182.49 samples/sec Loss 2.5123 LearningRate 0.0004 Epoch: 15 Global Step: 27590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:23:44,911-Speed 25271.08 samples/sec Loss 2.5093 LearningRate 0.0004 Epoch: 15 Global Step: 27600 
Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:23:54,650-Speed 25238.23 samples/sec Loss 2.5060 LearningRate 0.0004 Epoch: 15 Global Step: 27610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:24:04,445-Speed 25094.70 samples/sec Loss 2.5299 LearningRate 0.0004 Epoch: 15 Global Step: 27620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:24:14,328-Speed 24872.81 samples/sec Loss 2.5341 LearningRate 0.0004 Epoch: 15 Global Step: 27630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:24:24,113-Speed 25116.48 samples/sec Loss 2.5436 LearningRate 0.0004 Epoch: 15 Global Step: 27640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:24:33,918-Speed 25068.21 samples/sec Loss 2.5325 LearningRate 0.0004 Epoch: 15 Global Step: 27650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:25:33,874-Speed 4099.15 samples/sec Loss 2.4976 LearningRate 0.0004 Epoch: 16 Global Step: 27660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:25:43,613-Speed 25237.95 samples/sec Loss 2.4591 LearningRate 0.0004 Epoch: 16 Global Step: 27670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:25:53,336-Speed 25284.15 samples/sec Loss 2.4822 LearningRate 0.0004 Epoch: 16 Global Step: 27680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:26:03,068-Speed 25256.01 samples/sec Loss 2.4693 LearningRate 0.0004 Epoch: 16 Global Step: 27690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:26:12,908-Speed 24978.59 samples/sec Loss 2.4737 LearningRate 0.0004 Epoch: 16 Global Step: 27700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:26:22,673-Speed 25169.77 samples/sec Loss 2.4759 LearningRate 0.0004 Epoch: 16 Global Step: 27710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:26:32,507-Speed 24993.79 samples/sec Loss 2.4937 LearningRate 0.0004 Epoch: 16 Global Step: 27720 Fp16 Grad Scale: 65536 Required: 
12 hours Training: 2022-03-26 06:26:42,344-Speed 24985.68 samples/sec Loss 2.4836 LearningRate 0.0004 Epoch: 16 Global Step: 27730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:26:52,101-Speed 25191.44 samples/sec Loss 2.5108 LearningRate 0.0004 Epoch: 16 Global Step: 27740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:27:01,950-Speed 24956.35 samples/sec Loss 2.4789 LearningRate 0.0004 Epoch: 16 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:27:11,772-Speed 25025.13 samples/sec Loss 2.4870 LearningRate 0.0004 Epoch: 16 Global Step: 27760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:27:21,580-Speed 25059.13 samples/sec Loss 2.4858 LearningRate 0.0004 Epoch: 16 Global Step: 27770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:27:31,488-Speed 24818.15 samples/sec Loss 2.4824 LearningRate 0.0004 Epoch: 16 Global Step: 27780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:27:41,309-Speed 25027.33 samples/sec Loss 2.4695 LearningRate 0.0004 Epoch: 16 Global Step: 27790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:27:51,041-Speed 25256.77 samples/sec Loss 2.5035 LearningRate 0.0004 Epoch: 16 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:28:00,856-Speed 25049.20 samples/sec Loss 2.4733 LearningRate 0.0004 Epoch: 16 Global Step: 27810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:28:10,681-Speed 25015.53 samples/sec Loss 2.4821 LearningRate 0.0004 Epoch: 16 Global Step: 27820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:28:20,536-Speed 24948.38 samples/sec Loss 2.4696 LearningRate 0.0004 Epoch: 16 Global Step: 27830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:28:30,264-Speed 25264.97 samples/sec Loss 2.4592 LearningRate 0.0004 Epoch: 16 Global Step: 27840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 
06:28:40,103-Speed 24984.27 samples/sec Loss 2.4878 LearningRate 0.0004 Epoch: 16 Global Step: 27850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:28:49,852-Speed 25211.68 samples/sec Loss 2.4989 LearningRate 0.0004 Epoch: 16 Global Step: 27860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:28:59,649-Speed 25087.55 samples/sec Loss 2.4698 LearningRate 0.0004 Epoch: 16 Global Step: 27870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:29:09,423-Speed 25151.81 samples/sec Loss 2.4811 LearningRate 0.0004 Epoch: 16 Global Step: 27880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:29:19,328-Speed 24817.81 samples/sec Loss 2.4966 LearningRate 0.0004 Epoch: 16 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:29:29,335-Speed 24569.00 samples/sec Loss 2.4885 LearningRate 0.0004 Epoch: 16 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:29:39,380-Speed 24470.03 samples/sec Loss 2.5056 LearningRate 0.0004 Epoch: 16 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:29:49,426-Speed 24468.61 samples/sec Loss 2.4892 LearningRate 0.0004 Epoch: 16 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:29:59,338-Speed 24798.10 samples/sec Loss 2.4814 LearningRate 0.0004 Epoch: 16 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:30:09,179-Speed 24977.38 samples/sec Loss 2.4770 LearningRate 0.0004 Epoch: 16 Global Step: 27940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:30:18,934-Speed 25196.86 samples/sec Loss 2.5021 LearningRate 0.0004 Epoch: 16 Global Step: 27950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:30:28,645-Speed 25311.15 samples/sec Loss 2.5087 LearningRate 0.0004 Epoch: 16 Global Step: 27960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:30:38,531-Speed 24862.60 
samples/sec Loss 2.4922 LearningRate 0.0004 Epoch: 16 Global Step: 27970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:30:48,358-Speed 25019.80 samples/sec Loss 2.4857 LearningRate 0.0004 Epoch: 16 Global Step: 27980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:30:58,113-Speed 25196.54 samples/sec Loss 2.4761 LearningRate 0.0004 Epoch: 16 Global Step: 27990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:31:07,896-Speed 25123.36 samples/sec Loss 2.4781 LearningRate 0.0004 Epoch: 16 Global Step: 28000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:31:17,706-Speed 25057.93 samples/sec Loss 2.4693 LearningRate 0.0004 Epoch: 16 Global Step: 28010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:31:27,405-Speed 25344.19 samples/sec Loss 2.4857 LearningRate 0.0004 Epoch: 16 Global Step: 28020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:31:37,226-Speed 25025.65 samples/sec Loss 2.4951 LearningRate 0.0004 Epoch: 16 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:31:46,946-Speed 25291.79 samples/sec Loss 2.5407 LearningRate 0.0004 Epoch: 16 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:31:56,672-Speed 25273.41 samples/sec Loss 2.4787 LearningRate 0.0004 Epoch: 16 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:32:06,504-Speed 25001.15 samples/sec Loss 2.4834 LearningRate 0.0004 Epoch: 16 Global Step: 28060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:32:16,422-Speed 24785.36 samples/sec Loss 2.4543 LearningRate 0.0004 Epoch: 16 Global Step: 28070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:32:26,396-Speed 24648.09 samples/sec Loss 2.4687 LearningRate 0.0004 Epoch: 16 Global Step: 28080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:32:36,378-Speed 24627.19 samples/sec Loss 2.4688 
LearningRate 0.0004 Epoch: 16 Global Step: 28090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:32:46,361-Speed 24629.43 samples/sec Loss 2.4938 LearningRate 0.0004 Epoch: 16 Global Step: 28100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:32:56,463-Speed 24333.19 samples/sec Loss 2.4625 LearningRate 0.0004 Epoch: 16 Global Step: 28110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:33:06,461-Speed 24587.29 samples/sec Loss 2.4648 LearningRate 0.0004 Epoch: 16 Global Step: 28120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:33:16,372-Speed 24799.51 samples/sec Loss 2.4645 LearningRate 0.0004 Epoch: 16 Global Step: 28130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:33:26,403-Speed 24504.43 samples/sec Loss 2.4872 LearningRate 0.0004 Epoch: 16 Global Step: 28140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:33:36,361-Speed 24683.31 samples/sec Loss 2.4677 LearningRate 0.0004 Epoch: 16 Global Step: 28150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-26 06:33:46,126-Speed 25180.28 samples/sec Loss 2.4641 LearningRate 0.0004 Epoch: 16 Global Step: 28160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:33:55,961-Speed 24991.49 samples/sec Loss 2.4431 LearningRate 0.0004 Epoch: 16 Global Step: 28170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:34:05,720-Speed 25188.43 samples/sec Loss 2.4665 LearningRate 0.0004 Epoch: 16 Global Step: 28180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:34:15,550-Speed 25005.89 samples/sec Loss 2.4825 LearningRate 0.0004 Epoch: 16 Global Step: 28190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:34:25,481-Speed 24748.24 samples/sec Loss 2.4716 LearningRate 0.0004 Epoch: 16 Global Step: 28200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:34:35,370-Speed 24856.35 samples/sec Loss 2.4468 LearningRate 0.0004 Epoch: 16 
Global Step: 28210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:34:45,099-Speed 25269.19 samples/sec Loss 2.4795 LearningRate 0.0004 Epoch: 16 Global Step: 28220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:34:54,870-Speed 25156.62 samples/sec Loss 2.4934 LearningRate 0.0004 Epoch: 16 Global Step: 28230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:35:04,645-Speed 25145.42 samples/sec Loss 2.4904 LearningRate 0.0004 Epoch: 16 Global Step: 28240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:35:14,421-Speed 25142.84 samples/sec Loss 2.4551 LearningRate 0.0004 Epoch: 16 Global Step: 28250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:35:24,213-Speed 25118.20 samples/sec Loss 2.4747 LearningRate 0.0004 Epoch: 16 Global Step: 28260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:35:34,029-Speed 25039.60 samples/sec Loss 2.4722 LearningRate 0.0004 Epoch: 16 Global Step: 28270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:35:43,810-Speed 25129.90 samples/sec Loss 2.4607 LearningRate 0.0004 Epoch: 16 Global Step: 28280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:35:53,634-Speed 25019.41 samples/sec Loss 2.4563 LearningRate 0.0004 Epoch: 16 Global Step: 28290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:36:03,372-Speed 25240.95 samples/sec Loss 2.5356 LearningRate 0.0004 Epoch: 16 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:36:13,046-Speed 25407.87 samples/sec Loss 2.4844 LearningRate 0.0004 Epoch: 16 Global Step: 28310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:36:22,815-Speed 25162.14 samples/sec Loss 2.4680 LearningRate 0.0004 Epoch: 16 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:36:32,585-Speed 25156.30 samples/sec Loss 2.4467 LearningRate 0.0004 Epoch: 16 Global Step: 28330 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-03-26 06:36:42,468-Speed 24872.07 samples/sec Loss 2.5125 LearningRate 0.0004 Epoch: 16 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:36:52,223-Speed 25201.83 samples/sec Loss 2.4568 LearningRate 0.0004 Epoch: 16 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:37:01,993-Speed 25157.03 samples/sec Loss 2.4514 LearningRate 0.0004 Epoch: 16 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:37:11,764-Speed 25161.67 samples/sec Loss 2.4371 LearningRate 0.0004 Epoch: 16 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:37:21,532-Speed 25161.39 samples/sec Loss 2.4575 LearningRate 0.0004 Epoch: 16 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-26 06:37:31,273-Speed 25234.07 samples/sec Loss 2.4532 LearningRate 0.0004 Epoch: 16 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:37:41,035-Speed 25177.19 samples/sec Loss 2.4473 LearningRate 0.0004 Epoch: 16 Global Step: 28400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:37:50,754-Speed 25289.48 samples/sec Loss 2.4342 LearningRate 0.0004 Epoch: 16 Global Step: 28410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:38:00,528-Speed 25148.75 samples/sec Loss 2.4380 LearningRate 0.0004 Epoch: 16 Global Step: 28420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:38:10,224-Speed 25351.05 samples/sec Loss 2.4584 LearningRate 0.0004 Epoch: 16 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:38:19,994-Speed 25157.95 samples/sec Loss 2.4495 LearningRate 0.0004 Epoch: 16 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:38:29,776-Speed 25124.75 samples/sec Loss 2.4495 LearningRate 0.0004 Epoch: 16 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-03-26 06:38:39,543-Speed 25165.94 samples/sec Loss 2.4397 LearningRate 0.0004 Epoch: 16 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-03-26 06:38:49,315-Speed 25153.29 samples/sec Loss 2.4261 LearningRate 0.0004 Epoch: 16 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:38:59,078-Speed 25178.91 samples/sec Loss 2.4464 LearningRate 0.0004 Epoch: 16 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:39:08,822-Speed 25226.45 samples/sec Loss 2.4645 LearningRate 0.0004 Epoch: 16 Global Step: 28490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:39:18,578-Speed 25194.57 samples/sec Loss 2.4436 LearningRate 0.0004 Epoch: 16 Global Step: 28500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:39:28,382-Speed 25071.98 samples/sec Loss 2.4278 LearningRate 0.0004 Epoch: 16 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:39:38,138-Speed 25199.10 samples/sec Loss 2.4541 LearningRate 0.0004 Epoch: 16 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:39:47,866-Speed 25266.64 samples/sec Loss 2.4383 LearningRate 0.0004 Epoch: 16 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:39:57,667-Speed 25077.53 samples/sec Loss 2.4335 LearningRate 0.0004 Epoch: 16 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:40:07,586-Speed 24779.37 samples/sec Loss 2.4193 LearningRate 0.0004 Epoch: 16 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:40:17,366-Speed 25131.36 samples/sec Loss 2.4330 LearningRate 0.0004 Epoch: 16 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:40:27,100-Speed 25250.36 samples/sec Loss 2.4376 LearningRate 0.0004 Epoch: 16 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 
06:40:36,871-Speed 25156.48 samples/sec Loss 2.4410 LearningRate 0.0004 Epoch: 16 Global Step: 28580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:40:46,627-Speed 25192.75 samples/sec Loss 2.4319 LearningRate 0.0004 Epoch: 16 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:40:56,405-Speed 25137.32 samples/sec Loss 2.4419 LearningRate 0.0004 Epoch: 16 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:41:06,175-Speed 25156.57 samples/sec Loss 2.4254 LearningRate 0.0004 Epoch: 16 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:41:15,995-Speed 25030.38 samples/sec Loss 2.4423 LearningRate 0.0004 Epoch: 16 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:41:25,714-Speed 25289.19 samples/sec Loss 2.4473 LearningRate 0.0004 Epoch: 16 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:41:35,450-Speed 25247.24 samples/sec Loss 2.4403 LearningRate 0.0004 Epoch: 16 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:41:45,241-Speed 25109.80 samples/sec Loss 2.4490 LearningRate 0.0004 Epoch: 16 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:41:55,096-Speed 24940.17 samples/sec Loss 2.4318 LearningRate 0.0004 Epoch: 16 Global Step: 28660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:42:04,823-Speed 25270.70 samples/sec Loss 2.4411 LearningRate 0.0004 Epoch: 16 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:42:14,641-Speed 25033.15 samples/sec Loss 2.4489 LearningRate 0.0004 Epoch: 16 Global Step: 28680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:42:24,344-Speed 25332.66 samples/sec Loss 2.4612 LearningRate 0.0004 Epoch: 16 Global Step: 28690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:42:34,161-Speed 25039.35 
samples/sec Loss 2.4602 LearningRate 0.0004 Epoch: 16 Global Step: 28700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:42:43,937-Speed 25141.91 samples/sec Loss 2.4245 LearningRate 0.0004 Epoch: 16 Global Step: 28710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:42:53,686-Speed 25213.13 samples/sec Loss 2.4318 LearningRate 0.0004 Epoch: 16 Global Step: 28720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:43:03,384-Speed 25344.52 samples/sec Loss 2.4303 LearningRate 0.0004 Epoch: 16 Global Step: 28730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:43:13,132-Speed 25216.57 samples/sec Loss 2.4358 LearningRate 0.0004 Epoch: 16 Global Step: 28740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:43:22,908-Speed 25141.93 samples/sec Loss 2.4393 LearningRate 0.0004 Epoch: 16 Global Step: 28750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:43:32,672-Speed 25172.13 samples/sec Loss 2.4237 LearningRate 0.0004 Epoch: 16 Global Step: 28760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:43:42,397-Speed 25273.68 samples/sec Loss 2.4345 LearningRate 0.0004 Epoch: 16 Global Step: 28770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:43:52,267-Speed 24902.71 samples/sec Loss 2.4670 LearningRate 0.0004 Epoch: 16 Global Step: 28780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:44:02,135-Speed 24907.48 samples/sec Loss 2.4311 LearningRate 0.0004 Epoch: 16 Global Step: 28790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:44:11,853-Speed 25294.16 samples/sec Loss 2.4250 LearningRate 0.0004 Epoch: 16 Global Step: 28800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:44:21,604-Speed 25206.35 samples/sec Loss 2.4181 LearningRate 0.0004 Epoch: 16 Global Step: 28810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 06:44:31,453-Speed 24957.02 samples/sec Loss 2.4294 
LearningRate 0.0004 Epoch: 16 Global Step: 28820 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:44:41,249-Speed 25091.31 samples/sec Loss 2.4304 LearningRate 0.0004 Epoch: 16 Global Step: 28830 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:44:51,035-Speed 25118.25 samples/sec Loss 2.4392 LearningRate 0.0004 Epoch: 16 Global Step: 28840 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:45:00,799-Speed 25172.36 samples/sec Loss 2.4145 LearningRate 0.0004 Epoch: 16 Global Step: 28850 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:45:10,593-Speed 25095.46 samples/sec Loss 2.4282 LearningRate 0.0004 Epoch: 16 Global Step: 28860 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:45:20,479-Speed 24864.30 samples/sec Loss 2.4323 LearningRate 0.0004 Epoch: 16 Global Step: 28870 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:45:30,485-Speed 24565.28 samples/sec Loss 2.4246 LearningRate 0.0004 Epoch: 16 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:45:40,439-Speed 24694.04 samples/sec Loss 2.4288 LearningRate 0.0004 Epoch: 16 Global Step: 28890 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:45:50,345-Speed 24814.57 samples/sec Loss 2.4057 LearningRate 0.0004 Epoch: 16 Global Step: 28900 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:46:00,310-Speed 24664.67 samples/sec Loss 2.4292 LearningRate 0.0004 Epoch: 16 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:46:10,257-Speed 24710.72 samples/sec Loss 2.4229 LearningRate 0.0004 Epoch: 16 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:46:20,201-Speed 24718.70 samples/sec Loss 2.4212 LearningRate 0.0004 Epoch: 16 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:46:30,002-Speed 25078.54 samples/sec Loss 2.4135 LearningRate 0.0004 Epoch: 16 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:46:39,796-Speed 25104.05 samples/sec Loss 2.4205 LearningRate 0.0004 Epoch: 16 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:46:49,574-Speed 25138.04 samples/sec Loss 2.4205 LearningRate 0.0004 Epoch: 16 Global Step: 28960 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:46:59,273-Speed 25340.84 samples/sec Loss 2.4392 LearningRate 0.0004 Epoch: 16 Global Step: 28970 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:47:08,987-Speed 25302.01 samples/sec Loss 2.4160 LearningRate 0.0004 Epoch: 16 Global Step: 28980 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:47:18,697-Speed 25315.32 samples/sec Loss 2.4143 LearningRate 0.0004 Epoch: 16 Global Step: 28990 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:47:28,394-Speed 25349.11 samples/sec Loss 2.4057 LearningRate 0.0004 Epoch: 16 Global Step: 29000 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:47:38,129-Speed 25250.19 samples/sec Loss 2.4181 LearningRate 0.0004 Epoch: 16 Global Step: 29010 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:47:47,929-Speed 25080.64 samples/sec Loss 2.4056 LearningRate 0.0004 Epoch: 16 Global Step: 29020 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:47:57,638-Speed 25318.55 samples/sec Loss 2.4099 LearningRate 0.0004 Epoch: 16 Global Step: 29030 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:48:07,420-Speed 25124.17 samples/sec Loss 2.4054 LearningRate 0.0004 Epoch: 16 Global Step: 29040 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:48:17,321-Speed 24827.43 samples/sec Loss 2.4008 LearningRate 0.0004 Epoch: 16 Global Step: 29050 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:48:27,152-Speed 25003.38 samples/sec Loss 2.4028 LearningRate 0.0004 Epoch: 16 Global Step: 29060 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:48:36,878-Speed 25271.36 samples/sec Loss 2.4113 LearningRate 0.0004 Epoch: 16 Global Step: 29070 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-03-26 06:48:46,594-Speed 25296.93 samples/sec Loss 2.4366 LearningRate 0.0004 Epoch: 16 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:48:56,368-Speed 25147.34 samples/sec Loss 2.4258 LearningRate 0.0004 Epoch: 16 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:49:06,226-Speed 24933.65 samples/sec Loss 2.4383 LearningRate 0.0004 Epoch: 16 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:49:16,090-Speed 24917.61 samples/sec Loss 2.4264 LearningRate 0.0004 Epoch: 16 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:49:25,865-Speed 25145.35 samples/sec Loss 2.4045 LearningRate 0.0004 Epoch: 16 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:49:35,555-Speed 25366.10 samples/sec Loss 2.4331 LearningRate 0.0004 Epoch: 16 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:49:45,280-Speed 25274.04 samples/sec Loss 2.4302 LearningRate 0.0004 Epoch: 16 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:49:55,318-Speed 24485.65 samples/sec Loss 2.4158 LearningRate 0.0004 Epoch: 16 Global Step: 29150 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:50:05,122-Speed 25072.43 samples/sec Loss 2.4249 LearningRate 0.0004 Epoch: 16 Global Step: 29160 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:50:14,845-Speed 25277.83 samples/sec Loss 2.4169 LearningRate 0.0004 Epoch: 16 Global Step: 29170 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:50:24,523-Speed 25397.65 samples/sec Loss 2.3961 LearningRate 0.0004 Epoch: 16 Global Step: 29180 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:50:34,229-Speed 25324.37 samples/sec Loss 2.4177 LearningRate 0.0004 Epoch: 16 Global Step: 29190 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:50:44,113-Speed 24867.81 samples/sec Loss 2.4159 LearningRate 0.0004 Epoch: 16 Global Step: 29200 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:50:53,903-Speed 25106.29 samples/sec Loss 2.4242 LearningRate 0.0004 Epoch: 16 Global Step: 29210 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:51:03,723-Speed 25029.86 samples/sec Loss 2.4029 LearningRate 0.0004 Epoch: 16 Global Step: 29220 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:51:13,440-Speed 25293.32 samples/sec Loss 2.4238 LearningRate 0.0004 Epoch: 16 Global Step: 29230 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:51:23,177-Speed 25244.50 samples/sec Loss 2.4265 LearningRate 0.0004 Epoch: 16 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:51:32,934-Speed 25195.83 samples/sec Loss 2.4154 LearningRate 0.0004 Epoch: 16 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:51:42,761-Speed 25013.66 samples/sec Loss 2.3963 LearningRate 0.0004 Epoch: 16 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:51:52,471-Speed 25312.35 samples/sec Loss 2.3893 LearningRate 0.0004 Epoch: 16 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:52:02,250-Speed 25133.75 samples/sec Loss 2.4088 LearningRate 0.0004 Epoch: 16 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:52:11,978-Speed 25269.15 samples/sec Loss 2.4265 LearningRate 0.0004 Epoch: 16 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:52:21,779-Speed 25079.59 samples/sec Loss 2.4141 LearningRate 0.0004 Epoch: 16 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:52:31,593-Speed 25042.63 samples/sec Loss 2.4065 LearningRate 0.0004 Epoch: 16 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:52:41,309-Speed 25299.23 samples/sec Loss 2.4120 LearningRate 0.0004 Epoch: 16 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:52:51,072-Speed 25175.60 samples/sec Loss 2.3980 LearningRate 0.0004 Epoch: 16 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:53:00,787-Speed 25301.34 samples/sec Loss 2.4094 LearningRate 0.0004 Epoch: 16 Global Step: 29340 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:53:10,523-Speed 25246.80 samples/sec Loss 2.4026 LearningRate 0.0004 Epoch: 16 Global Step: 29350 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:53:20,348-Speed 25017.48 samples/sec Loss 2.4113 LearningRate 0.0004 Epoch: 16 Global Step: 29360 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:53:30,116-Speed 25163.67 samples/sec Loss 2.4156 LearningRate 0.0004 Epoch: 16 Global Step: 29370 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:53:39,796-Speed 25390.43 samples/sec Loss 2.4393 LearningRate 0.0004 Epoch: 16 Global Step: 29380 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:54:39,729-Speed 4100.69 samples/sec Loss 2.4503 LearningRate 0.0004 Epoch: 17 Global Step: 29390 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:54:49,503-Speed 25149.01 samples/sec Loss 2.3802 LearningRate 0.0004 Epoch: 17 Global Step: 29400 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:54:59,220-Speed 25293.83 samples/sec Loss 2.3769 LearningRate 0.0004 Epoch: 17 Global Step: 29410 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:55:08,911-Speed 25362.66 samples/sec Loss 2.4014 LearningRate 0.0004 Epoch: 17 Global Step: 29420 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:55:18,750-Speed 24982.06 samples/sec Loss 2.4158 LearningRate 0.0004 Epoch: 17 Global Step: 29430 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:55:28,500-Speed 25211.13 samples/sec Loss 2.3934 LearningRate 0.0004 Epoch: 17 Global Step: 29440 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:55:38,243-Speed 25227.00 samples/sec Loss 2.3708 LearningRate 0.0004 Epoch: 17 Global Step: 29450 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:55:48,068-Speed 25018.64 samples/sec Loss 2.3767 LearningRate 0.0004 Epoch: 17 Global Step: 29460 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:55:57,825-Speed 25191.91 samples/sec Loss 2.3449 LearningRate 0.0004 Epoch: 17 Global Step: 29470 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:56:07,545-Speed 25286.38 samples/sec Loss 2.3650 LearningRate 0.0004 Epoch: 17 Global Step: 29480 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:56:17,248-Speed 25336.70 samples/sec Loss 2.3733 LearningRate 0.0004 Epoch: 17 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:56:26,914-Speed 25426.92 samples/sec Loss 2.3787 LearningRate 0.0004 Epoch: 17 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:56:36,707-Speed 25100.94 samples/sec Loss 2.3735 LearningRate 0.0004 Epoch: 17 Global Step: 29510 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:56:46,494-Speed 25114.86 samples/sec Loss 2.3847 LearningRate 0.0004 Epoch: 17 Global Step: 29520 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:56:56,239-Speed 25224.99 samples/sec Loss 2.3554 LearningRate 0.0004 Epoch: 17 Global Step: 29530 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:57:05,977-Speed 25238.56 samples/sec Loss 2.3972 LearningRate 0.0004 Epoch: 17 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:57:15,723-Speed 25221.84 samples/sec Loss 2.3791 LearningRate 0.0004 Epoch: 17 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:57:25,587-Speed 24921.29 samples/sec Loss 2.3726 LearningRate 0.0004 Epoch: 17 Global Step: 29560 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:57:35,279-Speed 25360.11 samples/sec Loss 2.3677 LearningRate 0.0004 Epoch: 17 Global Step: 29570 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:57:45,057-Speed 25138.82 samples/sec Loss 2.3840 LearningRate 0.0004 Epoch: 17 Global Step: 29580 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:57:54,859-Speed 25074.96 samples/sec Loss 2.3712 LearningRate 0.0004 Epoch: 17 Global Step: 29590 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:58:04,569-Speed 25312.48 samples/sec Loss 2.4153 LearningRate 0.0004 Epoch: 17 Global Step: 29600 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:58:14,300-Speed 25257.50 samples/sec Loss 2.4927 LearningRate 0.0004 Epoch: 17 Global Step: 29610 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:58:24,020-Speed 25289.09 samples/sec Loss 2.4426 LearningRate 0.0004 Epoch: 17 Global Step: 29620 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:58:33,756-Speed 25248.22 samples/sec Loss 2.3818 LearningRate 0.0004 Epoch: 17 Global Step: 29630 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:58:43,540-Speed 25122.31 samples/sec Loss 2.3676 LearningRate 0.0004 Epoch: 17 Global Step: 29640 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:58:53,252-Speed 25309.22 samples/sec Loss 2.3609 LearningRate 0.0004 Epoch: 17 Global Step: 29650 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:59:02,990-Speed 25240.43 samples/sec Loss 2.3778 LearningRate 0.0004 Epoch: 17 Global Step: 29660 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:59:12,800-Speed 25053.18 samples/sec Loss 2.3862 LearningRate 0.0004 Epoch: 17 Global Step: 29670 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 06:59:22,653-Speed 24947.51 samples/sec Loss 2.4062 LearningRate 0.0004 Epoch: 17 Global Step: 29680 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:59:32,419-Speed 25171.44 samples/sec Loss 2.3968 LearningRate 0.0004 Epoch: 17 Global Step: 29690 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:59:42,195-Speed 25141.23 samples/sec Loss 2.3803 LearningRate 0.0004 Epoch: 17 Global Step: 29700 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 06:59:51,933-Speed 25240.71 samples/sec Loss 2.3871 LearningRate 0.0004 Epoch: 17 Global Step: 29710 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:00:01,638-Speed 25325.34 samples/sec Loss 2.3795 LearningRate 0.0004 Epoch: 17 Global Step: 29720 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:00:11,389-Speed 25207.26 samples/sec Loss 2.4096 LearningRate 0.0004 Epoch: 17 Global Step: 29730 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:00:21,192-Speed 25072.92 samples/sec Loss 2.3869 LearningRate 0.0004 Epoch: 17 Global Step: 29740 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:00:30,969-Speed 25138.57 samples/sec Loss 2.3791 LearningRate 0.0004 Epoch: 17 Global Step: 29750 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:00:40,781-Speed 25049.83 samples/sec Loss 2.3851 LearningRate 0.0004 Epoch: 17 Global Step: 29760 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:00:50,485-Speed 25330.85 samples/sec Loss 2.3677 LearningRate 0.0004 Epoch: 17 Global Step: 29770 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:01:00,250-Speed 25172.25 samples/sec Loss 2.3891 LearningRate 0.0004 Epoch: 17 Global Step: 29780 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-03-26 07:01:09,905-Speed 25459.36 samples/sec Loss 2.3908 LearningRate 0.0004 Epoch: 17 Global Step: 29790 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:01:19,619-Speed 25302.57 samples/sec Loss 2.3671 LearningRate 0.0004 Epoch: 17 Global Step: 29800 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:01:29,418-Speed 25083.04 samples/sec Loss 2.3820 LearningRate 0.0004 Epoch: 17 Global Step: 29810 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:01:39,181-Speed 25174.72 samples/sec Loss 2.3741 LearningRate 0.0004 Epoch: 17 Global Step: 29820 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:01:48,905-Speed 25278.42 samples/sec Loss 2.3858 LearningRate 0.0004 Epoch: 17 Global Step: 29830 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:01:58,583-Speed 25397.37 samples/sec Loss 2.4747 LearningRate 0.0004 Epoch: 17 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:02:08,366-Speed 25126.20 samples/sec Loss 2.4165 LearningRate 0.0004 Epoch: 17 Global Step: 29850 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:02:18,116-Speed 25210.02 samples/sec Loss 2.3894 LearningRate 0.0004 Epoch: 17 Global Step: 29860 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:02:27,828-Speed 25307.93 samples/sec Loss 2.3766 LearningRate 0.0004 Epoch: 17 Global Step: 29870 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:02:37,502-Speed 25411.15 samples/sec Loss 2.3640 LearningRate 0.0004 Epoch: 17 Global Step: 29880 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:02:47,261-Speed 25185.33 samples/sec Loss 2.3750 LearningRate 0.0004 Epoch: 17 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:02:56,999-Speed 25242.32 samples/sec Loss 2.3416 LearningRate 0.0004 Epoch: 17 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:03:06,781-Speed 25128.02 samples/sec Loss 2.3550 LearningRate 0.0004 Epoch: 17 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:03:16,468-Speed 25372.45 samples/sec Loss 2.3560 LearningRate 0.0004 Epoch: 17 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:03:26,216-Speed 25217.52 samples/sec Loss 2.3729 LearningRate 0.0004 Epoch: 17 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:03:35,998-Speed 25127.96 samples/sec Loss 2.3658 LearningRate 0.0004 Epoch: 17 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:03:45,658-Speed 25442.56 samples/sec Loss 2.3532 LearningRate 0.0004 Epoch: 17 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:03:55,400-Speed 25234.72 samples/sec Loss 2.3420 LearningRate 0.0004 Epoch: 17 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:04:05,163-Speed 25175.54 samples/sec Loss 2.3742 LearningRate 0.0004 Epoch: 17 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:04:14,926-Speed 25175.62 samples/sec Loss 2.3778 LearningRate 0.0004 Epoch: 17 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:04:24,625-Speed 25342.94 samples/sec Loss 2.3855 LearningRate 0.0004 Epoch: 17 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:04:34,353-Speed 25269.36 samples/sec Loss 2.4152 LearningRate 0.0004 Epoch: 17 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:04:44,059-Speed 25321.81 samples/sec Loss 2.3813 LearningRate 0.0004 Epoch: 17 Global Step: 30010 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:04:53,763-Speed 25329.30 samples/sec Loss 2.3651 LearningRate 0.0004 Epoch: 17 Global Step: 30020 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:05:03,493-Speed 25263.90 samples/sec Loss 2.3489 LearningRate 0.0004 Epoch: 17 Global Step: 30030 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:05:13,256-Speed 25179.97 samples/sec Loss 2.3567 LearningRate 0.0004 Epoch: 17 Global Step: 30040 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:05:22,935-Speed 25397.27 samples/sec Loss 2.3742 LearningRate 0.0004 Epoch: 17 Global Step: 30050 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:05:32,715-Speed 25131.54 samples/sec Loss 2.3545 LearningRate 0.0004 Epoch: 17 Global Step: 30060 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:05:42,466-Speed 25208.68 samples/sec Loss 2.3492 LearningRate 0.0004 Epoch: 17 Global Step: 30070 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:05:52,221-Speed 25196.70 samples/sec Loss 2.3715 LearningRate 0.0004 Epoch: 17 Global Step: 30080 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:06:02,001-Speed 25131.72 samples/sec Loss 2.3562 LearningRate 0.0004 Epoch: 17 Global Step: 30090 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:06:11,723-Speed 25284.50 samples/sec Loss 2.3601 LearningRate 0.0004 Epoch: 17 Global Step: 30100 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:06:21,439-Speed 25303.37 samples/sec Loss 2.3404 LearningRate 0.0004 Epoch: 17 Global Step: 30110 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:06:31,178-Speed 25237.89 samples/sec Loss 2.3582 LearningRate 0.0004 Epoch: 17 Global Step: 30120 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:06:40,927-Speed 25212.45 samples/sec Loss 2.3665 LearningRate 0.0004 Epoch: 17 Global Step: 30130 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:06:50,722-Speed 25097.37 samples/sec Loss 2.3438 LearningRate 0.0004 Epoch: 17 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:07:00,412-Speed 25366.55 samples/sec Loss 2.3616 LearningRate 0.0004 Epoch: 17 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:07:10,213-Speed 25079.28 samples/sec Loss 2.3618 LearningRate 0.0004 Epoch: 17 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:07:20,043-Speed 25004.06 samples/sec Loss 2.3566 LearningRate 0.0004 Epoch: 17 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:07:29,924-Speed 24880.52 samples/sec Loss 2.3632 LearningRate 0.0004 Epoch: 17 Global Step: 30180 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:07:39,685-Speed 25179.85 samples/sec Loss 2.3675 LearningRate 0.0004 Epoch: 17 Global Step: 30190 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:07:49,443-Speed 25190.53 samples/sec Loss 2.3551 LearningRate 0.0004 Epoch: 17 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:07:59,227-Speed 25129.70 samples/sec Loss 2.3483 LearningRate 0.0004 Epoch: 17 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:08:08,992-Speed 25169.69 samples/sec Loss 2.3607 LearningRate 0.0004 Epoch: 17 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:08:18,726-Speed 25252.03 samples/sec Loss 2.3389 LearningRate 0.0004 Epoch: 17 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:08:28,421-Speed 25354.27 samples/sec Loss 2.3445 LearningRate 0.0004 Epoch: 17 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:08:38,187-Speed 25168.03 samples/sec Loss 2.3298 LearningRate 0.0004 Epoch: 17 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:08:48,034-Speed 24963.13 samples/sec Loss 2.3256 LearningRate 0.0004 Epoch: 17 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:08:58,019-Speed 24615.18 samples/sec Loss 2.3617 LearningRate 0.0004 Epoch: 17 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:09:08,069-Speed 24457.66 samples/sec Loss 2.3551 LearningRate 0.0004 Epoch: 17 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:09:18,045-Speed 24638.56 samples/sec Loss 2.3505 LearningRate 0.0004 Epoch: 17 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:09:28,143-Speed 24342.49 samples/sec Loss 2.3499 LearningRate 0.0004 Epoch: 17 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:09:38,182-Speed 24484.56 samples/sec Loss 2.3392 LearningRate 0.0004 Epoch: 17 Global Step: 30310 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:09:48,281-Speed 24336.46 samples/sec Loss 2.3246 LearningRate 0.0004 Epoch: 17 Global Step: 30320 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:09:58,386-Speed 24323.01 samples/sec Loss 2.3215 LearningRate 0.0004 Epoch: 17 Global Step: 30330 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:10:08,580-Speed 24115.67 samples/sec Loss 2.3538 LearningRate 0.0004 Epoch: 17 Global Step: 30340 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:10:18,815-Speed 24012.94 samples/sec Loss 2.3456 LearningRate 0.0004 Epoch: 17 Global Step: 30350 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:10:28,882-Speed 24418.31 samples/sec Loss 2.3163 LearningRate 0.0004 Epoch: 17 Global Step: 30360 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:10:38,840-Speed 24684.32 samples/sec Loss 2.3269 LearningRate 0.0004 Epoch: 17 Global Step: 30370 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:10:48,919-Speed 24383.84 samples/sec Loss 2.3405 LearningRate 0.0004 Epoch: 17 Global Step: 30380 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:10:58,944-Speed 24521.15 samples/sec Loss 2.3281 LearningRate 0.0004 Epoch: 17 Global Step: 30390 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:11:09,112-Speed 24180.19 samples/sec Loss 2.3446 LearningRate 0.0004 Epoch: 17 Global Step: 30400 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:11:19,134-Speed 24524.14 samples/sec Loss 2.3860 LearningRate 0.0004 Epoch: 17 Global Step: 30410 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:11:29,122-Speed 24607.99 samples/sec Loss 2.3743 LearningRate 0.0004 Epoch: 17 Global Step: 30420 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:11:39,177-Speed 24445.20 samples/sec Loss 2.3253 LearningRate 0.0004 Epoch: 17 Global Step: 30430 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:11:49,186-Speed 24558.75 samples/sec Loss 2.3560 LearningRate 0.0004 Epoch: 17 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:11:59,113-Speed 24758.97 samples/sec Loss 2.3537 LearningRate 0.0004 Epoch: 17 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:12:09,151-Speed 24486.17 samples/sec Loss 2.3289 LearningRate 0.0004 Epoch: 17 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:12:19,070-Speed 24779.15 samples/sec Loss 2.3521 LearningRate 0.0004 Epoch: 17 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:12:29,154-Speed 24376.27 samples/sec Loss 2.3299 LearningRate 0.0004 Epoch: 17 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:12:39,111-Speed 24686.89 samples/sec Loss 2.4597 LearningRate 0.0004 Epoch: 17 Global Step: 30490 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:12:49,300-Speed 24125.30 samples/sec Loss 2.3501 LearningRate 0.0004 Epoch: 17 Global Step: 30500 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:12:59,466-Speed 24177.37 samples/sec Loss 2.3340 LearningRate 0.0004 Epoch: 17 Global Step: 30510 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:13:09,494-Speed 24510.93 samples/sec Loss 2.3235 LearningRate 0.0004 Epoch: 17 Global Step: 30520 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:13:19,439-Speed 24714.95 samples/sec Loss 2.3107 LearningRate 0.0004 Epoch: 17 Global Step: 30530 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:13:29,476-Speed 24488.08 samples/sec Loss 2.3313 LearningRate 0.0004 Epoch: 17 Global Step: 30540 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:13:39,449-Speed 24645.98 samples/sec Loss 2.3517 LearningRate 0.0004 Epoch: 17 Global Step: 30550 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:13:49,411-Speed 24675.19 samples/sec Loss 2.3786 LearningRate 0.0004 Epoch: 17 Global Step: 30560 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:13:59,339-Speed 24757.02 samples/sec Loss 2.3485 LearningRate 0.0004 Epoch: 17 Global Step: 30570 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:14:09,274-Speed 24740.23 samples/sec Loss 2.3175 LearningRate 0.0004 Epoch: 17 Global Step: 30580 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:14:19,352-Speed 24388.51 samples/sec Loss 2.3332 LearningRate 0.0004 Epoch: 17 Global Step: 30590 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:14:29,362-Speed 24555.67 samples/sec Loss 2.3159 LearningRate 0.0004 Epoch: 17 Global Step: 30600 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:14:39,346-Speed 24619.22 samples/sec Loss 2.3281 LearningRate 0.0004 Epoch: 17 Global Step: 30610 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:14:49,430-Speed 24376.08 samples/sec Loss 2.3580 LearningRate 0.0004 Epoch: 17 Global Step: 30620 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:14:59,449-Speed 24531.13 samples/sec Loss 2.3165 LearningRate 0.0004 Epoch: 17 Global Step: 30630 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:15:09,445-Speed 24592.24 samples/sec Loss 2.3192 LearningRate 0.0004 Epoch: 17 Global Step: 30640 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:15:19,422-Speed 24637.01 samples/sec Loss 2.3335 LearningRate 0.0004 Epoch: 17 Global Step: 30650 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:15:29,382-Speed 24677.80 samples/sec Loss 2.3141 LearningRate 0.0004 Epoch: 17 Global Step: 30660 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:15:39,345-Speed 24673.18 samples/sec Loss 2.3034 LearningRate 0.0004 Epoch: 17 Global Step: 30670 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:15:49,395-Speed 24458.14 samples/sec Loss 2.2976 LearningRate 0.0004 Epoch: 17 Global Step: 30680 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:15:59,463-Speed 24413.41 samples/sec Loss 2.3111 LearningRate 0.0004 Epoch: 17 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-03-26 07:16:09,359-Speed 24844.56 samples/sec Loss 2.3116 LearningRate 0.0004 Epoch: 17 Global Step: 30700 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:16:19,452-Speed 24353.89 samples/sec Loss 2.2950 LearningRate 0.0004 Epoch: 17 Global Step: 30710 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:16:29,613-Speed 24188.91 samples/sec Loss 2.3117 LearningRate 0.0004 Epoch: 17 Global Step: 30720 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:16:39,595-Speed 24624.84 samples/sec Loss 2.2996 LearningRate 0.0004 Epoch: 17 Global Step: 30730 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:16:49,759-Speed 24184.69 samples/sec Loss 2.3491 LearningRate 0.0004 Epoch: 17 Global Step: 30740 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:16:59,776-Speed 24537.60 samples/sec Loss 2.3441 LearningRate 0.0004 Epoch: 17 Global Step: 30750 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:17:10,043-Speed 23943.14 samples/sec Loss 2.3434 LearningRate 0.0004 Epoch: 17 Global Step: 30760 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:17:20,086-Speed 24475.51 samples/sec Loss 2.2887 LearningRate 0.0004 Epoch: 17 Global Step: 30770 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:17:30,263-Speed 24151.63 samples/sec Loss 2.2841 LearningRate 0.0004 Epoch: 17 Global Step: 30780 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:17:40,287-Speed 24524.12 samples/sec Loss 2.3114 LearningRate 0.0004 Epoch: 17 Global Step: 30790 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:17:50,232-Speed 24715.09 samples/sec Loss 2.2999 LearningRate 0.0004 Epoch: 17 Global Step: 30800 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:18:00,257-Speed 24518.80 samples/sec Loss 2.3071 LearningRate 0.0004 Epoch: 17 Global Step: 30810 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:18:10,237-Speed 24628.99 samples/sec Loss 2.3350 LearningRate 0.0004 Epoch: 17 Global Step: 30820 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:18:20,208-Speed 24650.65 samples/sec Loss 2.3064 LearningRate 0.0004 Epoch: 17 Global Step: 30830 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:18:30,155-Speed 24714.28 samples/sec Loss 2.2959 LearningRate 0.0004 Epoch: 17 Global Step: 30840 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:18:40,147-Speed 24600.49 samples/sec Loss 2.3112 LearningRate 0.0004 Epoch: 17 Global Step: 30850 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:18:50,069-Speed 24773.12 samples/sec Loss 2.3295 LearningRate 0.0004 Epoch: 17 Global Step: 30860 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:19:00,006-Speed 24735.22 samples/sec Loss 2.4019 LearningRate 0.0004 Epoch: 17 Global Step: 30870 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:19:09,986-Speed 24627.49 samples/sec Loss 2.3159 LearningRate 0.0004 Epoch: 17 Global Step: 30880 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:19:20,026-Speed 24481.82 samples/sec Loss 2.3041 LearningRate 0.0004 Epoch: 17 Global Step: 30890 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:19:30,030-Speed 24575.25 samples/sec Loss 2.3284 LearningRate 0.0004 Epoch: 17 Global Step: 30900 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:19:40,042-Speed 24555.57 samples/sec Loss 2.3210 LearningRate 0.0004 Epoch: 17 Global Step: 30910 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:19:49,961-Speed 24779.02 samples/sec Loss 2.3384 LearningRate 0.0004 Epoch: 17 Global Step: 30920 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:20:00,063-Speed 24331.63 samples/sec Loss 2.3283 LearningRate 0.0004 Epoch: 17 Global Step: 30930 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:20:09,994-Speed 24749.42 samples/sec Loss 2.3133 LearningRate 0.0004 Epoch: 17 Global Step: 30940 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:20:20,021-Speed 24515.19 samples/sec Loss 2.3149 LearningRate 0.0004 Epoch: 17 Global Step: 30950 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:20:29,991-Speed 24652.43 samples/sec Loss 2.3263 LearningRate 0.0004 Epoch: 17 Global Step: 30960 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:20:39,898-Speed 24809.37 samples/sec Loss 2.3186 LearningRate 0.0004 Epoch: 17 Global Step: 30970 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:20:49,834-Speed 24736.13 samples/sec Loss 2.3181 LearningRate 0.0004 Epoch: 17 Global Step: 30980 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:20:59,717-Speed 24870.79 samples/sec Loss 2.3132 LearningRate 0.0004 Epoch: 17 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:21:09,570-Speed 24947.59 samples/sec Loss 2.3023 LearningRate 0.0004 Epoch: 17 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:21:19,448-Speed 24882.77 samples/sec Loss 2.3133 LearningRate 0.0004 Epoch: 17 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:21:29,243-Speed 25094.63 samples/sec Loss 2.3122 LearningRate 0.0004 Epoch: 17 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:21:39,011-Speed 25162.82 samples/sec Loss 2.3285 LearningRate 0.0004 Epoch: 17 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:21:48,692-Speed 25389.66 samples/sec Loss 2.3229 LearningRate 0.0004 Epoch: 17 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:21:58,423-Speed 25258.26 samples/sec Loss 2.3173 LearningRate 0.0004 Epoch: 17 Global Step: 31050 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:22:08,302-Speed 24881.10 samples/sec Loss 2.2929 LearningRate 0.0004 Epoch: 17 Global Step: 31060 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:22:18,079-Speed 25137.80 samples/sec Loss 2.3490 LearningRate 0.0004 Epoch: 17 Global Step: 31070 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:22:27,873-Speed 25098.48 samples/sec Loss 2.3349 LearningRate 0.0004 Epoch: 17 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:22:37,733-Speed 24927.56 samples/sec Loss 2.3291 LearningRate 0.0004 Epoch: 17 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:22:47,504-Speed 25154.70 samples/sec Loss 2.3468 LearningRate 0.0004 Epoch: 17 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:22:57,272-Speed 25165.88 samples/sec Loss 2.3312 LearningRate 0.0004 Epoch: 17 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:23:57,530-Speed 4078.48 samples/sec Loss 2.2830 LearningRate 0.0004 Epoch: 18 Global Step: 31120 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:24:07,350-Speed 25031.85 samples/sec Loss 2.2754 LearningRate 0.0004 Epoch: 18 Global Step: 31130 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:24:17,102-Speed 25204.97 samples/sec Loss 2.2783 LearningRate 0.0004 Epoch: 18 Global Step: 31140 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:24:26,846-Speed 25222.61 samples/sec Loss 2.2922 LearningRate 0.0004 Epoch: 18 Global Step: 31150 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:24:36,624-Speed 25138.10 samples/sec Loss 2.2497 LearningRate 0.0004 Epoch: 18 Global Step: 31160 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:24:46,337-Speed 25307.07 samples/sec Loss 2.2556 LearningRate 0.0004 Epoch: 18 Global Step: 31170 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:24:56,074-Speed 25242.79 samples/sec Loss 2.2685 LearningRate 0.0004 Epoch: 18 Global Step: 31180 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:25:05,819-Speed 25223.10 samples/sec Loss 2.2858 LearningRate 0.0004 Epoch: 18 Global Step: 31190 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:25:15,666-Speed 24962.03 samples/sec Loss 2.2648 LearningRate 0.0004 Epoch: 18 Global Step: 31200 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:25:25,476-Speed 25053.19 samples/sec Loss 2.2820 LearningRate 0.0004 Epoch: 18 Global Step: 31210 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:25:35,343-Speed 24910.98 samples/sec Loss 2.2609 LearningRate 0.0004 Epoch: 18 Global Step: 31220 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:25:45,271-Speed 24756.95 samples/sec Loss 2.2817 LearningRate 0.0004 Epoch: 18 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:25:55,180-Speed 24806.16 samples/sec Loss 2.2854 LearningRate 0.0004 Epoch: 18 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:26:05,117-Speed 24735.19 samples/sec Loss 2.2855 LearningRate 0.0004 Epoch: 18 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:26:15,007-Speed 24850.25 samples/sec Loss 2.2772 LearningRate 0.0004 Epoch: 18 Global Step: 31260 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:26:24,868-Speed 24927.12 samples/sec Loss 2.2771 LearningRate 0.0004 Epoch: 18 Global Step: 31270 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:26:34,582-Speed 25302.13 samples/sec Loss 2.2945 LearningRate 0.0004 Epoch: 18 Global Step: 31280 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:26:44,470-Speed 24857.53 samples/sec Loss 2.2760 LearningRate 0.0004 Epoch: 18 Global Step: 31290 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:26:54,199-Speed 25264.81 samples/sec Loss 2.2859 LearningRate 0.0004 Epoch: 18 Global Step: 31300 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:27:04,013-Speed 25044.15 samples/sec Loss 2.2912 LearningRate 0.0004 Epoch: 18 Global Step: 31310 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:27:13,759-Speed 25219.94 samples/sec Loss 2.2769 LearningRate 0.0004 Epoch: 18 Global Step: 31320 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-03-26 07:27:23,495-Speed 25245.14 samples/sec Loss 2.3055 LearningRate 0.0004 Epoch: 18 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-03-26 07:27:33,339-Speed 24968.80 samples/sec Loss 2.2923 LearningRate 0.0004 Epoch: 18 Global Step: 31340 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:27:43,307-Speed 24657.51 samples/sec Loss 2.2936 LearningRate 0.0004 Epoch: 18 Global Step: 31350 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:27:53,230-Speed 24769.88 samples/sec Loss 2.2818 LearningRate 0.0004 Epoch: 18 Global Step: 31360 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-03-26 07:28:03,191-Speed 24676.19 samples/sec Loss 2.2888 LearningRate 0.0004 Epoch: 18 Global Step: 31370 Fp16 Grad Scale: 32768 Required: 11
hours Training: 2022-03-26 07:28:13,143-Speed 24695.31 samples/sec Loss 2.2690 LearningRate 0.0004 Epoch: 18 Global Step: 31380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:28:23,132-Speed 24607.18 samples/sec Loss 2.2761 LearningRate 0.0004 Epoch: 18 Global Step: 31390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:28:33,145-Speed 24547.99 samples/sec Loss 2.2776 LearningRate 0.0004 Epoch: 18 Global Step: 31400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:28:43,066-Speed 24771.85 samples/sec Loss 2.2725 LearningRate 0.0004 Epoch: 18 Global Step: 31410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:28:53,087-Speed 24527.40 samples/sec Loss 2.2694 LearningRate 0.0004 Epoch: 18 Global Step: 31420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:29:02,975-Speed 24857.90 samples/sec Loss 2.3129 LearningRate 0.0004 Epoch: 18 Global Step: 31430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:29:12,883-Speed 24808.65 samples/sec Loss 2.2736 LearningRate 0.0004 Epoch: 18 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:29:22,852-Speed 24655.48 samples/sec Loss 2.2830 LearningRate 0.0004 Epoch: 18 Global Step: 31450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:29:32,735-Speed 24868.88 samples/sec Loss 2.2961 LearningRate 0.0004 Epoch: 18 Global Step: 31460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:29:42,503-Speed 25162.32 samples/sec Loss 2.3036 LearningRate 0.0004 Epoch: 18 Global Step: 31470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:29:52,202-Speed 25341.65 samples/sec Loss 2.2589 LearningRate 0.0004 Epoch: 18 Global Step: 31480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:30:01,875-Speed 25411.39 samples/sec Loss 2.2630 LearningRate 0.0004 Epoch: 18 Global Step: 31490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 
07:30:11,614-Speed 25240.67 samples/sec Loss 2.2709 LearningRate 0.0004 Epoch: 18 Global Step: 31500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:30:21,353-Speed 25238.84 samples/sec Loss 2.2846 LearningRate 0.0004 Epoch: 18 Global Step: 31510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:30:31,159-Speed 25064.49 samples/sec Loss 2.2735 LearningRate 0.0004 Epoch: 18 Global Step: 31520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:30:40,997-Speed 24984.38 samples/sec Loss 2.2679 LearningRate 0.0004 Epoch: 18 Global Step: 31530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:30:50,864-Speed 24909.41 samples/sec Loss 2.2939 LearningRate 0.0004 Epoch: 18 Global Step: 31540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:31:00,692-Speed 25008.11 samples/sec Loss 2.2796 LearningRate 0.0004 Epoch: 18 Global Step: 31550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:31:10,369-Speed 25400.38 samples/sec Loss 2.2582 LearningRate 0.0004 Epoch: 18 Global Step: 31560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:31:20,060-Speed 25362.65 samples/sec Loss 2.2561 LearningRate 0.0004 Epoch: 18 Global Step: 31570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:31:29,846-Speed 25118.18 samples/sec Loss 2.2749 LearningRate 0.0004 Epoch: 18 Global Step: 31580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:31:39,603-Speed 25191.09 samples/sec Loss 2.2780 LearningRate 0.0004 Epoch: 18 Global Step: 31590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:31:49,354-Speed 25207.61 samples/sec Loss 2.2838 LearningRate 0.0004 Epoch: 18 Global Step: 31600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:31:59,022-Speed 25424.14 samples/sec Loss 2.2802 LearningRate 0.0004 Epoch: 18 Global Step: 31610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:32:08,747-Speed 25275.21 
samples/sec Loss 2.2732 LearningRate 0.0004 Epoch: 18 Global Step: 31620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:32:18,437-Speed 25363.66 samples/sec Loss 2.2656 LearningRate 0.0004 Epoch: 18 Global Step: 31630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:32:28,116-Speed 25394.52 samples/sec Loss 2.2711 LearningRate 0.0004 Epoch: 18 Global Step: 31640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:32:37,806-Speed 25364.61 samples/sec Loss 2.3369 LearningRate 0.0004 Epoch: 18 Global Step: 31650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:32:47,653-Speed 24961.86 samples/sec Loss 2.3132 LearningRate 0.0004 Epoch: 18 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:32:57,340-Speed 25371.77 samples/sec Loss 2.2890 LearningRate 0.0004 Epoch: 18 Global Step: 31670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:33:07,039-Speed 25341.14 samples/sec Loss 2.2634 LearningRate 0.0004 Epoch: 18 Global Step: 31680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:33:16,776-Speed 25242.83 samples/sec Loss 2.2720 LearningRate 0.0004 Epoch: 18 Global Step: 31690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:33:26,472-Speed 25351.19 samples/sec Loss 2.2714 LearningRate 0.0004 Epoch: 18 Global Step: 31700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:33:36,177-Speed 25326.83 samples/sec Loss 2.2627 LearningRate 0.0004 Epoch: 18 Global Step: 31710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:33:46,010-Speed 24997.34 samples/sec Loss 2.2647 LearningRate 0.0004 Epoch: 18 Global Step: 31720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:33:55,724-Speed 25302.37 samples/sec Loss 2.2743 LearningRate 0.0004 Epoch: 18 Global Step: 31730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:34:05,400-Speed 25408.33 samples/sec Loss 2.2592 
LearningRate 0.0004 Epoch: 18 Global Step: 31740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:34:15,081-Speed 25389.20 samples/sec Loss 2.2468 LearningRate 0.0004 Epoch: 18 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:34:24,782-Speed 25336.00 samples/sec Loss 2.2484 LearningRate 0.0004 Epoch: 18 Global Step: 31760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:34:34,510-Speed 25265.34 samples/sec Loss 2.2599 LearningRate 0.0004 Epoch: 18 Global Step: 31770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:34:44,293-Speed 25127.29 samples/sec Loss 2.2887 LearningRate 0.0004 Epoch: 18 Global Step: 31780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:34:54,022-Speed 25263.29 samples/sec Loss 2.2519 LearningRate 0.0004 Epoch: 18 Global Step: 31790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:35:03,853-Speed 25003.79 samples/sec Loss 2.2703 LearningRate 0.0004 Epoch: 18 Global Step: 31800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:35:13,681-Speed 25008.67 samples/sec Loss 2.2451 LearningRate 0.0004 Epoch: 18 Global Step: 31810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:35:23,449-Speed 25162.44 samples/sec Loss 2.2664 LearningRate 0.0004 Epoch: 18 Global Step: 31820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:35:33,337-Speed 24857.46 samples/sec Loss 2.2812 LearningRate 0.0004 Epoch: 18 Global Step: 31830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:35:43,329-Speed 24598.61 samples/sec Loss 2.2730 LearningRate 0.0004 Epoch: 18 Global Step: 31840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:35:53,203-Speed 24894.83 samples/sec Loss 2.2630 LearningRate 0.0004 Epoch: 18 Global Step: 31850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:36:03,044-Speed 24975.78 samples/sec Loss 2.2605 LearningRate 0.0004 Epoch: 18 
Global Step: 31860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:36:12,849-Speed 25064.93 samples/sec Loss 2.2575 LearningRate 0.0004 Epoch: 18 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:36:22,624-Speed 25145.13 samples/sec Loss 2.2641 LearningRate 0.0004 Epoch: 18 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-26 07:36:32,364-Speed 25235.62 samples/sec Loss 2.2340 LearningRate 0.0004 Epoch: 18 Global Step: 31890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:36:42,111-Speed 25216.84 samples/sec Loss 2.2466 LearningRate 0.0004 Epoch: 18 Global Step: 31900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:36:51,826-Speed 25301.05 samples/sec Loss 2.2627 LearningRate 0.0004 Epoch: 18 Global Step: 31910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-26 07:37:01,589-Speed 25174.02 samples/sec Loss 2.3177 LearningRate 0.0004 Epoch: 18 Global Step: 31920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 07:37:11,497-Speed 24808.69 samples/sec Loss 2.2947 LearningRate 0.0004 Epoch: 18 Global Step: 31930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 07:37:21,200-Speed 25331.44 samples/sec Loss 2.2764 LearningRate 0.0004 Epoch: 18 Global Step: 31940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 07:37:31,054-Speed 24943.58 samples/sec Loss 2.2860 LearningRate 0.0004 Epoch: 18 Global Step: 31950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 07:37:40,866-Speed 25052.90 samples/sec Loss 2.2709 LearningRate 0.0004 Epoch: 18 Global Step: 31960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 07:37:50,618-Speed 25204.52 samples/sec Loss 2.2246 LearningRate 0.0004 Epoch: 18 Global Step: 31970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 07:38:00,359-Speed 25231.15 samples/sec Loss 2.2322 LearningRate 0.0004 Epoch: 18 Global Step: 31980 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-03-26 07:38:10,164-Speed 25067.80 samples/sec Loss 2.2317 LearningRate 0.0004 Epoch: 18 Global Step: 31990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:38:20,014-Speed 24952.76 samples/sec Loss 2.2299 LearningRate 0.0004 Epoch: 18 Global Step: 32000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:38:29,842-Speed 25007.42 samples/sec Loss 2.2397 LearningRate 0.0004 Epoch: 18 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:38:39,818-Speed 24638.58 samples/sec Loss 2.2653 LearningRate 0.0004 Epoch: 18 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:38:49,577-Speed 25186.64 samples/sec Loss 2.2773 LearningRate 0.0004 Epoch: 18 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:38:59,368-Speed 25104.13 samples/sec Loss 2.2288 LearningRate 0.0004 Epoch: 18 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:39:09,168-Speed 25080.19 samples/sec Loss 2.2177 LearningRate 0.0004 Epoch: 18 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:39:18,933-Speed 25171.48 samples/sec Loss 2.2312 LearningRate 0.0004 Epoch: 18 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:39:28,755-Speed 25026.60 samples/sec Loss 2.2271 LearningRate 0.0004 Epoch: 18 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:39:38,676-Speed 24774.41 samples/sec Loss 2.2669 LearningRate 0.0004 Epoch: 18 Global Step: 32080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:39:48,553-Speed 24886.70 samples/sec Loss 2.2665 LearningRate 0.0004 Epoch: 18 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 07:39:58,410-Speed 24934.76 samples/sec Loss 2.2446 LearningRate 0.0004 Epoch: 18 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-03-26 07:40:08,165-Speed 25196.41 samples/sec Loss 2.2569 LearningRate 0.0004 Epoch: 18 Global Step: 32110 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:40:17,866-Speed 25336.30 samples/sec Loss 2.2621 LearningRate 0.0004 Epoch: 18 Global Step: 32120 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:40:27,652-Speed 25114.78 samples/sec Loss 2.2534 LearningRate 0.0004 Epoch: 18 Global Step: 32130 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:40:37,430-Speed 25138.06 samples/sec Loss 2.2353 LearningRate 0.0004 Epoch: 18 Global Step: 32140 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:40:47,230-Speed 25080.31 samples/sec Loss 2.2297 LearningRate 0.0004 Epoch: 18 Global Step: 32150 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:40:56,968-Speed 25240.84 samples/sec Loss 2.2421 LearningRate 0.0004 Epoch: 18 Global Step: 32160 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:41:06,748-Speed 25132.84 samples/sec Loss 2.2332 LearningRate 0.0004 Epoch: 18 Global Step: 32170 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:41:16,543-Speed 25093.28 samples/sec Loss 2.2594 LearningRate 0.0004 Epoch: 18 Global Step: 32180 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:41:26,310-Speed 25166.54 samples/sec Loss 2.2545 LearningRate 0.0004 Epoch: 18 Global Step: 32190 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:41:36,067-Speed 25191.09 samples/sec Loss 2.2541 LearningRate 0.0004 Epoch: 18 Global Step: 32200 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:41:46,187-Speed 24288.11 samples/sec Loss 2.2264 LearningRate 0.0004 Epoch: 18 Global Step: 32210 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:41:56,029-Speed 24972.80 samples/sec Loss 2.2311 LearningRate 0.0004 Epoch: 18 Global Step: 32220 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:42:05,747-Speed 25293.97 samples/sec Loss 2.2594 LearningRate 0.0004 Epoch: 18 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:42:15,557-Speed 25053.57 samples/sec Loss 2.2421 LearningRate 0.0004 Epoch: 18 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:42:25,399-Speed 24974.73 samples/sec Loss 2.2386 LearningRate 0.0004 Epoch: 18 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:42:35,215-Speed 25038.14 samples/sec Loss 2.2492 LearningRate 0.0004 Epoch: 18 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:42:45,012-Speed 25090.04 samples/sec Loss 2.2527 LearningRate 0.0004 Epoch: 18 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:42:54,767-Speed 25197.62 samples/sec Loss 2.3107 LearningRate 0.0004 Epoch: 18 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:43:04,540-Speed 25150.56 samples/sec Loss 2.2709 LearningRate 0.0004 Epoch: 18 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:43:14,297-Speed 25191.25 samples/sec Loss 2.2331 LearningRate 0.0004 Epoch: 18 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:43:24,146-Speed 24955.57 samples/sec Loss 2.2402 LearningRate 0.0004 Epoch: 18 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:43:33,921-Speed 25146.39 samples/sec Loss 2.2258 LearningRate 0.0003 Epoch: 18 Global Step: 32320 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:43:43,780-Speed 24930.01 samples/sec Loss 2.2334 LearningRate 0.0003 Epoch: 18 Global Step: 32330 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:43:53,574-Speed 25096.22 samples/sec Loss 2.2382 LearningRate 0.0003 Epoch: 18 Global Step: 32340 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:44:03,360-Speed 25118.34 samples/sec Loss 2.2190 LearningRate 0.0003 Epoch: 18 Global Step: 32350 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:44:13,081-Speed 25281.55 samples/sec Loss 2.2070 LearningRate 0.0003 Epoch: 18 Global Step: 32360 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:44:22,793-Speed 25308.39 samples/sec Loss 2.2030 LearningRate 0.0003 Epoch: 18 Global Step: 32370 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:44:32,587-Speed 25095.07 samples/sec Loss 2.2161 LearningRate 0.0003 Epoch: 18 Global Step: 32380 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:44:42,338-Speed 25209.45 samples/sec Loss 2.2499 LearningRate 0.0003 Epoch: 18 Global Step: 32390 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:44:52,151-Speed 25047.59 samples/sec Loss 2.2346 LearningRate 0.0003 Epoch: 18 Global Step: 32400 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:45:02,005-Speed 24944.04 samples/sec Loss 2.2348 LearningRate 0.0003 Epoch: 18 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:45:11,820-Speed 25041.82 samples/sec Loss 2.2245 LearningRate 0.0003 Epoch: 18 Global Step: 32420 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:45:21,520-Speed 25340.75 samples/sec Loss 2.2321 LearningRate 0.0003 Epoch: 18 Global Step: 32430 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:45:31,394-Speed 24893.34 samples/sec Loss 2.2343 LearningRate 0.0003 Epoch: 18 Global Step: 32440 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:45:41,432-Speed 24486.09 samples/sec Loss 2.2366 LearningRate 0.0003 Epoch: 18 Global Step: 32450 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:45:51,536-Speed 24325.98 samples/sec Loss 2.2136 LearningRate 0.0003 Epoch: 18 Global Step: 32460 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:46:01,624-Speed 24367.43 samples/sec Loss 2.2389 LearningRate 0.0003 Epoch: 18 Global Step: 32470 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:46:11,802-Speed 24152.37 samples/sec Loss 2.2584 LearningRate 0.0003 Epoch: 18 Global Step: 32480 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:46:21,901-Speed 24337.27 samples/sec Loss 2.2407 LearningRate 0.0003 Epoch: 18 Global Step: 32490 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:46:32,057-Speed 24201.58 samples/sec Loss 2.2257 LearningRate 0.0003 Epoch: 18 Global Step: 32500 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:46:42,153-Speed 24343.72 samples/sec Loss 2.2074 LearningRate 0.0003 Epoch: 18 Global Step: 32510 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:46:52,167-Speed 24544.65 samples/sec Loss 2.2084 LearningRate 0.0003 Epoch: 18 Global Step: 32520 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:47:01,937-Speed 25156.34 samples/sec Loss 2.2580 LearningRate 0.0003 Epoch: 18 Global Step: 32530 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:47:11,685-Speed 25214.39 samples/sec Loss 2.2405 LearningRate 0.0003 Epoch: 18 Global Step: 32540 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:47:21,442-Speed 25192.76 samples/sec Loss 2.2231 LearningRate 0.0003 Epoch: 18 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:47:31,158-Speed 25296.21 samples/sec Loss 2.2114 LearningRate 0.0003 Epoch: 18 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:47:40,947-Speed 25109.35 samples/sec Loss 2.1928 LearningRate 0.0003 Epoch: 18 Global Step: 32570 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:47:50,704-Speed 25190.91 samples/sec Loss 2.2122 LearningRate 0.0003 Epoch: 18 Global Step: 32580 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:48:00,427-Speed 25280.19 samples/sec Loss 2.2461 LearningRate 0.0003 Epoch: 18 Global Step: 32590 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:48:10,279-Speed 24948.17 samples/sec Loss 2.2413 LearningRate 0.0003 Epoch: 18 Global Step: 32600 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:48:20,021-Speed 25230.34 samples/sec Loss 2.2342 LearningRate 0.0003 Epoch: 18 Global Step: 32610 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:48:29,805-Speed 25124.23 samples/sec Loss 2.2189 LearningRate 0.0003 Epoch: 18 Global Step: 32620 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:48:39,596-Speed 25102.53 samples/sec Loss 2.2219 LearningRate 0.0003 Epoch: 18 Global Step: 32630 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:48:49,361-Speed 25171.67 samples/sec Loss 2.2166 LearningRate 0.0003 Epoch: 18 Global Step: 32640 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:48:59,207-Speed 24963.51 samples/sec Loss 2.2087 LearningRate 0.0003 Epoch: 18 Global Step: 32650 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:49:09,046-Speed 24981.79 samples/sec Loss 2.2136 LearningRate 0.0003 Epoch: 18 Global Step: 32660 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:49:18,818-Speed 25152.62 samples/sec Loss 2.2022 LearningRate 0.0003 Epoch: 18 Global Step: 32670 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:49:28,615-Speed 25089.39 samples/sec Loss 2.2099 LearningRate 0.0003 Epoch: 18 Global Step: 32680 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:49:38,589-Speed 24642.83 samples/sec Loss 2.2288 LearningRate 0.0003 Epoch: 18 Global Step: 32690 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:49:48,339-Speed 25208.70 samples/sec Loss 2.2183 LearningRate 0.0003 Epoch: 18 Global Step: 32700 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:49:58,075-Speed 25247.24 samples/sec Loss 2.2290 LearningRate 0.0003 Epoch: 18 Global Step: 32710 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:50:07,922-Speed 24958.82 samples/sec Loss 2.2280 LearningRate 0.0003 Epoch: 18 Global Step: 32720 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:50:17,707-Speed 25120.56 samples/sec Loss 2.2171 LearningRate 0.0003 Epoch: 18 Global Step: 32730 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:50:27,440-Speed 25252.19 samples/sec Loss 2.2232 LearningRate 0.0003 Epoch: 18 Global Step: 32740 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:50:37,208-Speed 25162.20 samples/sec Loss 2.2245 LearningRate 0.0003 Epoch: 18 Global Step: 32750 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:50:46,929-Speed 25285.43 samples/sec Loss 2.2344 LearningRate 0.0003 Epoch: 18 Global Step: 32760 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:50:56,621-Speed 25359.77 samples/sec Loss 2.2155 LearningRate 0.0003 Epoch: 18 Global Step: 32770 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:51:06,404-Speed 25123.18 samples/sec Loss 2.2335 LearningRate 0.0003 Epoch: 18 Global Step: 32780 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:51:16,114-Speed 25313.36 samples/sec Loss 2.2329 LearningRate 0.0003 Epoch: 18 Global Step: 32790 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:51:25,965-Speed 24951.49 samples/sec Loss 2.2243 LearningRate 0.0003 Epoch: 18 Global Step: 32800 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:51:35,776-Speed 25051.60 samples/sec Loss 2.2434 LearningRate 0.0003 Epoch: 18 Global Step: 32810 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:51:45,511-Speed 25252.78 samples/sec Loss 2.2560 LearningRate 0.0003 Epoch: 18 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:51:55,313-Speed 25073.84 samples/sec Loss 2.2174 LearningRate 0.0003 Epoch: 18 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:52:54,455-Speed 4155.53 samples/sec Loss 2.2084 LearningRate 0.0003 Epoch: 19 Global Step: 32840 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:53:04,298-Speed 24970.91 samples/sec Loss 2.1768 LearningRate 0.0003 Epoch: 19 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:53:14,377-Speed 24386.73 samples/sec Loss 2.1892 LearningRate 0.0003 Epoch: 19 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:53:24,254-Speed 24885.19 samples/sec Loss 2.1956 LearningRate 0.0003 Epoch: 19 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:53:34,119-Speed 24916.00 samples/sec Loss 2.1751 LearningRate 0.0003 Epoch: 19 Global Step: 32880 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:53:44,006-Speed 24860.29 samples/sec Loss 2.1754 LearningRate 0.0003 Epoch: 19 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:53:53,969-Speed 24668.54 samples/sec Loss 2.1890 LearningRate 0.0003 Epoch: 19 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:54:03,856-Speed 24859.69 samples/sec Loss 2.1869 LearningRate 0.0003 Epoch: 19 Global Step: 32910 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-03-26 07:54:13,842-Speed 24613.05 samples/sec Loss 2.2137 LearningRate 0.0003 Epoch: 19 Global Step: 32920 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:54:23,814-Speed 24648.29 samples/sec Loss 2.1923 LearningRate 0.0003 Epoch: 19 Global Step: 32930 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:54:33,761-Speed 24710.11 samples/sec Loss 2.2051 LearningRate 0.0003 Epoch: 19 Global Step: 32940 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:54:43,574-Speed 25046.23 samples/sec Loss 2.2021 LearningRate 0.0003 Epoch: 19 Global Step: 32950 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:54:53,314-Speed 25235.36 samples/sec Loss 2.2014 LearningRate 0.0003 Epoch: 19 Global Step: 32960 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:55:03,180-Speed 24918.26 samples/sec Loss 2.1867 LearningRate 0.0003 Epoch: 19 Global Step: 32970 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:55:13,216-Speed 24527.05 samples/sec Loss 2.1707 LearningRate 0.0003 Epoch: 19 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:55:23,035-Speed 25031.44 samples/sec Loss 2.1495 LearningRate 0.0003 Epoch: 19 Global Step: 32990 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:55:32,860-Speed 25047.89 samples/sec Loss 2.1818 LearningRate 0.0003 Epoch: 19 Global Step: 33000 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:55:42,707-Speed 24959.65 samples/sec Loss 2.2648 LearningRate 0.0003 Epoch: 19 Global Step: 33010 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:55:52,514-Speed 25062.77 samples/sec Loss 2.2271 LearningRate 0.0003 Epoch: 19 Global Step: 33020 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:56:02,400-Speed 24861.90 samples/sec Loss 2.1789 LearningRate 0.0003 Epoch: 19 Global Step: 33030 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:56:12,367-Speed 24660.93 samples/sec Loss 2.1775 LearningRate 0.0003 Epoch: 19 Global Step: 33040 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:56:22,555-Speed 24123.73 samples/sec Loss 2.1951 LearningRate 0.0003 Epoch: 19 Global Step: 33050 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:56:32,825-Speed 23931.69 samples/sec Loss 2.2083 LearningRate 0.0003 Epoch: 19 Global Step: 33060 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:56:43,135-Speed 23840.24 samples/sec Loss 2.1897 LearningRate 0.0003 Epoch: 19 Global Step: 33070 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:56:53,597-Speed 23490.79 samples/sec Loss 2.2032 LearningRate 0.0003 Epoch: 19 Global Step: 33080 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:57:03,943-Speed 23758.39 samples/sec Loss 2.2287 LearningRate 0.0003 Epoch: 19 Global Step: 33090 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:57:14,327-Speed 23670.95 samples/sec Loss 2.2025 LearningRate 0.0003 Epoch: 19 Global Step: 33100 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:57:24,497-Speed 24166.80 samples/sec Loss 2.2003 LearningRate 0.0003 Epoch: 19 Global Step: 33110 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:57:34,453-Speed 24687.36 samples/sec Loss 2.1761 LearningRate 0.0003 Epoch: 19 Global Step: 33120 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:57:44,402-Speed 24705.48 samples/sec Loss 2.1964 LearningRate 0.0003 Epoch: 19 Global Step: 33130 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:57:54,334-Speed 24745.85 samples/sec Loss 2.1875 LearningRate 0.0003 Epoch: 19 Global Step: 33140 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:58:04,385-Speed 24457.13 samples/sec Loss 2.1921 LearningRate 0.0003 Epoch: 19 Global Step: 33150 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:58:14,425-Speed 24479.96 samples/sec Loss 2.2102 LearningRate 0.0003 Epoch: 19 Global Step: 33160 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:58:24,693-Speed 23938.48 samples/sec Loss 2.1864 LearningRate 0.0003 Epoch: 19 Global Step: 33170 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:58:34,844-Speed 24211.48 samples/sec Loss 2.1867 LearningRate 0.0003 Epoch: 19 Global Step: 33180 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:58:44,969-Speed 24275.70 samples/sec Loss 2.1938 LearningRate 0.0003 Epoch: 19 Global Step: 33190 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:58:55,063-Speed 24351.61 samples/sec Loss 2.1906 LearningRate 0.0003 Epoch: 19 Global Step: 33200 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:59:05,151-Speed 24365.16 samples/sec Loss 2.1992 LearningRate 0.0003 Epoch: 19 Global Step: 33210 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:59:15,218-Speed 24421.41 samples/sec Loss 2.2027 LearningRate 0.0003 Epoch: 19 Global Step: 33220 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 07:59:25,374-Speed 24201.62 samples/sec Loss 2.2057 LearningRate 0.0003 Epoch: 19 Global Step: 33230 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:59:35,335-Speed 24718.81 samples/sec Loss 2.1982 LearningRate 0.0003 Epoch: 19 Global Step: 33240 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:59:45,212-Speed 24886.00 samples/sec Loss 2.1867 LearningRate 0.0003 Epoch: 19 Global Step: 33250 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 07:59:55,115-Speed 24820.18 samples/sec Loss 2.1691 LearningRate 0.0003 Epoch: 19 Global Step: 33260 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:00:04,900-Speed 25156.19 samples/sec Loss 2.1822 LearningRate 0.0003 Epoch: 19 Global Step: 33270 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:00:14,837-Speed 25079.65 samples/sec Loss 2.1750 LearningRate 0.0003 Epoch: 19 Global Step: 33280 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:00:24,766-Speed 24757.83 samples/sec Loss 2.1863 LearningRate 0.0003 Epoch: 19 Global Step: 33290 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:00:34,700-Speed 24740.71 samples/sec Loss 2.2046 LearningRate 0.0003 Epoch: 19 Global Step: 33300 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:00:44,687-Speed 24631.15 samples/sec Loss 2.1950 LearningRate 0.0003 Epoch: 19 Global Step: 33310 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:00:54,817-Speed 24263.72 samples/sec Loss 2.1722 LearningRate 0.0003 Epoch: 19 Global Step: 33320 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:01:05,035-Speed 24053.87 samples/sec Loss 2.1935 LearningRate 0.0003 Epoch: 19 Global Step: 33330 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:01:15,301-Speed 23940.55 samples/sec Loss 2.1864 LearningRate 0.0003 Epoch: 19 Global Step: 33340 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:01:25,750-Speed 23523.07 samples/sec Loss 2.1956 LearningRate 0.0003 Epoch: 19 Global Step: 33350 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:01:36,132-Speed 23674.85 samples/sec Loss 2.1906 LearningRate 0.0003 Epoch: 19 Global Step: 33360 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-03-26 08:01:46,352-Speed 24051.02 samples/sec Loss 2.1795 LearningRate 0.0003 Epoch: 19 Global Step: 33370 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 08:01:56,727-Speed 23689.70 samples/sec Loss 2.1941 LearningRate 0.0003 Epoch: 19 Global Step: 33380 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 08:02:06,993-Speed 23941.08 samples/sec Loss 2.1833 LearningRate 0.0003 Epoch: 19 Global Step: 33390 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 08:02:17,288-Speed 23876.31 samples/sec Loss 2.1698 LearningRate 0.0003 Epoch: 19 Global Step: 33400 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 08:02:27,520-Speed 24022.48 samples/sec Loss 2.1653 LearningRate 0.0003 Epoch: 19 Global Step: 33410 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 08:02:37,864-Speed 23760.89 samples/sec Loss 2.1766 LearningRate 0.0003 Epoch: 19 Global Step: 33420 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 08:02:48,206-Speed 23767.45 samples/sec Loss 2.1904 LearningRate 0.0003 Epoch: 19 Global Step: 33430 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-03-26 08:02:58,520-Speed 23830.20 samples/sec Loss 2.1616 LearningRate 0.0003 Epoch: 19 Global Step: 33440 Fp16 Grad
Scale: 32768 Required: 10 hours Training: 2022-03-26 08:03:08,793-Speed 23928.52 samples/sec Loss 2.1811 LearningRate 0.0003 Epoch: 19 Global Step: 33450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:03:19,053-Speed 23956.54 samples/sec Loss 2.2003 LearningRate 0.0003 Epoch: 19 Global Step: 33460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:03:29,657-Speed 23177.76 samples/sec Loss 2.1731 LearningRate 0.0003 Epoch: 19 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:03:40,315-Speed 23061.63 samples/sec Loss 2.1963 LearningRate 0.0003 Epoch: 19 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:03:51,099-Speed 22792.53 samples/sec Loss 2.1852 LearningRate 0.0003 Epoch: 19 Global Step: 33490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:04:01,821-Speed 22923.05 samples/sec Loss 2.1592 LearningRate 0.0003 Epoch: 19 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:04:12,136-Speed 23830.37 samples/sec Loss 2.1702 LearningRate 0.0003 Epoch: 19 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:04:22,441-Speed 23852.48 samples/sec Loss 2.1738 LearningRate 0.0003 Epoch: 19 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:04:32,711-Speed 23931.96 samples/sec Loss 2.1463 LearningRate 0.0003 Epoch: 19 Global Step: 33530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:04:42,911-Speed 24097.30 samples/sec Loss 2.1686 LearningRate 0.0003 Epoch: 19 Global Step: 33540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:04:53,089-Speed 24150.23 samples/sec Loss 2.1698 LearningRate 0.0003 Epoch: 19 Global Step: 33550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:05:03,262-Speed 24160.53 samples/sec Loss 2.1701 LearningRate 0.0003 Epoch: 19 Global Step: 33560 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-03-26 08:05:13,587-Speed 23804.95 samples/sec Loss 2.1690 LearningRate 0.0003 Epoch: 19 Global Step: 33570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:05:23,827-Speed 24003.06 samples/sec Loss 2.1555 LearningRate 0.0003 Epoch: 19 Global Step: 33580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:05:34,053-Speed 24036.07 samples/sec Loss 2.1521 LearningRate 0.0003 Epoch: 19 Global Step: 33590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:05:44,229-Speed 24155.36 samples/sec Loss 2.1598 LearningRate 0.0003 Epoch: 19 Global Step: 33600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:05:54,312-Speed 24377.46 samples/sec Loss 2.1696 LearningRate 0.0003 Epoch: 19 Global Step: 33610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:06:04,618-Speed 23848.39 samples/sec Loss 2.1498 LearningRate 0.0003 Epoch: 19 Global Step: 33620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:06:14,852-Speed 24018.18 samples/sec Loss 2.1568 LearningRate 0.0003 Epoch: 19 Global Step: 33630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:06:25,093-Speed 24002.62 samples/sec Loss 2.1920 LearningRate 0.0003 Epoch: 19 Global Step: 33640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:06:35,360-Speed 23939.33 samples/sec Loss 2.1926 LearningRate 0.0003 Epoch: 19 Global Step: 33650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:06:45,583-Speed 24045.18 samples/sec Loss 2.1420 LearningRate 0.0003 Epoch: 19 Global Step: 33660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:06:55,859-Speed 23917.99 samples/sec Loss 2.1564 LearningRate 0.0003 Epoch: 19 Global Step: 33670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:07:06,150-Speed 23885.77 samples/sec Loss 2.1550 LearningRate 0.0003 Epoch: 19 Global Step: 33680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 
08:07:16,342-Speed 24116.13 samples/sec Loss 2.1613 LearningRate 0.0003 Epoch: 19 Global Step: 33690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:07:26,480-Speed 24241.91 samples/sec Loss 2.1659 LearningRate 0.0003 Epoch: 19 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:07:36,758-Speed 23916.09 samples/sec Loss 2.1858 LearningRate 0.0003 Epoch: 19 Global Step: 33710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:07:46,976-Speed 24053.67 samples/sec Loss 2.1742 LearningRate 0.0003 Epoch: 19 Global Step: 33720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:07:57,138-Speed 24188.65 samples/sec Loss 2.1274 LearningRate 0.0003 Epoch: 19 Global Step: 33730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:08:07,306-Speed 24172.46 samples/sec Loss 2.1395 LearningRate 0.0003 Epoch: 19 Global Step: 33740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:08:17,440-Speed 24259.62 samples/sec Loss 2.1697 LearningRate 0.0003 Epoch: 19 Global Step: 33750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:08:27,602-Speed 24186.14 samples/sec Loss 2.1914 LearningRate 0.0003 Epoch: 19 Global Step: 33760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:08:37,808-Speed 24084.72 samples/sec Loss 2.1603 LearningRate 0.0003 Epoch: 19 Global Step: 33770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:08:48,020-Speed 24069.61 samples/sec Loss 2.1396 LearningRate 0.0003 Epoch: 19 Global Step: 33780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:08:58,295-Speed 23921.72 samples/sec Loss 2.1544 LearningRate 0.0003 Epoch: 19 Global Step: 33790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:09:08,570-Speed 23921.95 samples/sec Loss 2.1715 LearningRate 0.0003 Epoch: 19 Global Step: 33800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:09:18,985-Speed 23602.48 
samples/sec Loss 2.1269 LearningRate 0.0003 Epoch: 19 Global Step: 33810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-26 08:09:29,381-Speed 23643.17 samples/sec Loss 2.1452 LearningRate 0.0003 Epoch: 19 Global Step: 33820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:09:39,596-Speed 24062.76 samples/sec Loss 2.1443 LearningRate 0.0003 Epoch: 19 Global Step: 33830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:09:49,858-Speed 23953.87 samples/sec Loss 2.1358 LearningRate 0.0003 Epoch: 19 Global Step: 33840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:10:00,118-Speed 23956.13 samples/sec Loss 2.1395 LearningRate 0.0003 Epoch: 19 Global Step: 33850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:10:10,477-Speed 23729.69 samples/sec Loss 2.1359 LearningRate 0.0003 Epoch: 19 Global Step: 33860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:10:20,817-Speed 23771.86 samples/sec Loss 2.1520 LearningRate 0.0003 Epoch: 19 Global Step: 33870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:10:31,094-Speed 23916.29 samples/sec Loss 2.1617 LearningRate 0.0003 Epoch: 19 Global Step: 33880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:10:41,415-Speed 23814.95 samples/sec Loss 2.1621 LearningRate 0.0003 Epoch: 19 Global Step: 33890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:10:51,727-Speed 23835.44 samples/sec Loss 2.1552 LearningRate 0.0003 Epoch: 19 Global Step: 33900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:11:01,945-Speed 24055.10 samples/sec Loss 2.1547 LearningRate 0.0003 Epoch: 19 Global Step: 33910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:11:12,377-Speed 23561.70 samples/sec Loss 2.1309 LearningRate 0.0003 Epoch: 19 Global Step: 33920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:11:22,624-Speed 23988.10 samples/sec Loss 2.1577 
LearningRate 0.0003 Epoch: 19 Global Step: 33930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:11:32,864-Speed 24000.95 samples/sec Loss 2.1417 LearningRate 0.0003 Epoch: 19 Global Step: 33940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:11:43,183-Speed 23819.75 samples/sec Loss 2.1324 LearningRate 0.0003 Epoch: 19 Global Step: 33950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:11:53,504-Speed 23814.23 samples/sec Loss 2.1271 LearningRate 0.0003 Epoch: 19 Global Step: 33960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:12:03,884-Speed 23678.83 samples/sec Loss 2.1310 LearningRate 0.0003 Epoch: 19 Global Step: 33970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:12:14,208-Speed 23807.01 samples/sec Loss 2.1435 LearningRate 0.0003 Epoch: 19 Global Step: 33980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:12:24,591-Speed 23673.62 samples/sec Loss 2.1453 LearningRate 0.0003 Epoch: 19 Global Step: 33990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:12:34,956-Speed 23711.30 samples/sec Loss 2.1611 LearningRate 0.0003 Epoch: 19 Global Step: 34000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:12:45,311-Speed 23736.92 samples/sec Loss 2.1832 LearningRate 0.0003 Epoch: 19 Global Step: 34010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:12:55,731-Speed 23588.27 samples/sec Loss 2.1806 LearningRate 0.0003 Epoch: 19 Global Step: 34020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:13:06,066-Speed 23782.72 samples/sec Loss 2.1586 LearningRate 0.0003 Epoch: 19 Global Step: 34030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:13:16,491-Speed 23577.62 samples/sec Loss 2.1583 LearningRate 0.0003 Epoch: 19 Global Step: 34040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:13:26,747-Speed 23967.47 samples/sec Loss 2.1666 LearningRate 0.0003 Epoch: 19 
Global Step: 34050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:13:37,120-Speed 23698.16 samples/sec Loss 2.1269 LearningRate 0.0003 Epoch: 19 Global Step: 34060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:13:47,429-Speed 23843.76 samples/sec Loss 2.1506 LearningRate 0.0003 Epoch: 19 Global Step: 34070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:13:57,612-Speed 24140.79 samples/sec Loss 2.1325 LearningRate 0.0003 Epoch: 19 Global Step: 34080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:14:07,715-Speed 24328.09 samples/sec Loss 2.1531 LearningRate 0.0003 Epoch: 19 Global Step: 34090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:14:17,931-Speed 24060.26 samples/sec Loss 2.1706 LearningRate 0.0003 Epoch: 19 Global Step: 34100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:14:28,292-Speed 23722.94 samples/sec Loss 2.1522 LearningRate 0.0003 Epoch: 19 Global Step: 34110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:14:38,700-Speed 23614.36 samples/sec Loss 2.1345 LearningRate 0.0003 Epoch: 19 Global Step: 34120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:14:48,999-Speed 23868.10 samples/sec Loss 2.1348 LearningRate 0.0003 Epoch: 19 Global Step: 34130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:14:59,423-Speed 23579.09 samples/sec Loss 2.1451 LearningRate 0.0003 Epoch: 19 Global Step: 34140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:15:09,721-Speed 23868.23 samples/sec Loss 2.1500 LearningRate 0.0003 Epoch: 19 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:15:20,067-Speed 23756.61 samples/sec Loss 2.1564 LearningRate 0.0003 Epoch: 19 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:15:30,477-Speed 23613.05 samples/sec Loss 2.1409 LearningRate 0.0003 Epoch: 19 Global Step: 34170 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-03-26 08:15:40,793-Speed 23826.02 samples/sec Loss 2.1728 LearningRate 0.0003 Epoch: 19 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:15:51,157-Speed 23714.85 samples/sec Loss 2.1595 LearningRate 0.0003 Epoch: 19 Global Step: 34190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:16:01,604-Speed 23528.44 samples/sec Loss 2.1319 LearningRate 0.0003 Epoch: 19 Global Step: 34200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:16:12,049-Speed 23532.15 samples/sec Loss 2.1189 LearningRate 0.0003 Epoch: 19 Global Step: 34210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:16:22,413-Speed 23718.65 samples/sec Loss 2.1216 LearningRate 0.0003 Epoch: 19 Global Step: 34220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:16:32,854-Speed 23541.29 samples/sec Loss 2.1476 LearningRate 0.0003 Epoch: 19 Global Step: 34230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:16:43,203-Speed 23750.43 samples/sec Loss 2.1378 LearningRate 0.0003 Epoch: 19 Global Step: 34240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:16:53,597-Speed 23646.72 samples/sec Loss 2.1515 LearningRate 0.0003 Epoch: 19 Global Step: 34250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:17:03,935-Speed 23774.64 samples/sec Loss 2.1474 LearningRate 0.0003 Epoch: 19 Global Step: 34260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:17:14,414-Speed 23457.56 samples/sec Loss 2.1128 LearningRate 0.0003 Epoch: 19 Global Step: 34270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:17:24,757-Speed 23763.11 samples/sec Loss 2.1198 LearningRate 0.0003 Epoch: 19 Global Step: 34280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:17:34,945-Speed 24123.93 samples/sec Loss 2.1340 LearningRate 0.0003 Epoch: 19 Global Step: 34290 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-03-26 08:17:45,167-Speed 24047.26 samples/sec Loss 2.1291 LearningRate 0.0003 Epoch: 19 Global Step: 34300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:17:55,558-Speed 23653.73 samples/sec Loss 2.1341 LearningRate 0.0003 Epoch: 19 Global Step: 34310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:18:05,886-Speed 23807.19 samples/sec Loss 2.1368 LearningRate 0.0003 Epoch: 19 Global Step: 34320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:18:16,217-Speed 23790.60 samples/sec Loss 2.1404 LearningRate 0.0003 Epoch: 19 Global Step: 34330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:18:26,514-Speed 23872.23 samples/sec Loss 2.1511 LearningRate 0.0003 Epoch: 19 Global Step: 34340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:18:36,808-Speed 23876.64 samples/sec Loss 2.1447 LearningRate 0.0003 Epoch: 19 Global Step: 34350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:18:47,192-Speed 23675.41 samples/sec Loss 2.1474 LearningRate 0.0003 Epoch: 19 Global Step: 34360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:18:57,502-Speed 23840.79 samples/sec Loss 2.1071 LearningRate 0.0003 Epoch: 19 Global Step: 34370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:19:07,791-Speed 23888.15 samples/sec Loss 2.1611 LearningRate 0.0003 Epoch: 19 Global Step: 34380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:19:18,072-Speed 23908.65 samples/sec Loss 2.1820 LearningRate 0.0003 Epoch: 19 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:19:28,377-Speed 23852.52 samples/sec Loss 2.1686 LearningRate 0.0003 Epoch: 19 Global Step: 34400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:19:38,717-Speed 23769.99 samples/sec Loss 2.2115 LearningRate 0.0003 Epoch: 19 Global Step: 34410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 
08:19:48,985-Speed 23939.08 samples/sec Loss 2.1346 LearningRate 0.0003 Epoch: 19 Global Step: 34420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:19:59,270-Speed 23896.72 samples/sec Loss 2.1229 LearningRate 0.0003 Epoch: 19 Global Step: 34430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:20:09,532-Speed 23952.76 samples/sec Loss 2.1267 LearningRate 0.0003 Epoch: 19 Global Step: 34440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:20:19,811-Speed 23911.81 samples/sec Loss 2.1339 LearningRate 0.0003 Epoch: 19 Global Step: 34450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:20:30,134-Speed 23810.32 samples/sec Loss 2.1166 LearningRate 0.0003 Epoch: 19 Global Step: 34460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:20:40,491-Speed 23737.28 samples/sec Loss 2.1234 LearningRate 0.0003 Epoch: 19 Global Step: 34470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:20:50,827-Speed 23781.06 samples/sec Loss 2.1218 LearningRate 0.0003 Epoch: 19 Global Step: 34480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:21:01,159-Speed 23790.79 samples/sec Loss 2.1144 LearningRate 0.0003 Epoch: 19 Global Step: 34490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:21:11,514-Speed 23736.46 samples/sec Loss 2.1203 LearningRate 0.0003 Epoch: 19 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:21:21,812-Speed 23870.00 samples/sec Loss 2.1447 LearningRate 0.0003 Epoch: 19 Global Step: 34510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:21:32,202-Speed 23656.20 samples/sec Loss 2.1355 LearningRate 0.0003 Epoch: 19 Global Step: 34520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:21:42,515-Speed 23833.59 samples/sec Loss 2.1384 LearningRate 0.0003 Epoch: 19 Global Step: 34530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:21:52,822-Speed 23846.60 
samples/sec Loss 2.1108 LearningRate 0.0003 Epoch: 19 Global Step: 34540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:22:03,202-Speed 23679.06 samples/sec Loss 2.1298 LearningRate 0.0003 Epoch: 19 Global Step: 34550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:22:13,589-Speed 23662.39 samples/sec Loss 2.1388 LearningRate 0.0003 Epoch: 19 Global Step: 34560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:23:13,869-Speed 4077.11 samples/sec Loss 2.1554 LearningRate 0.0003 Epoch: 20 Global Step: 34570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:23:23,990-Speed 24283.98 samples/sec Loss 2.1092 LearningRate 0.0003 Epoch: 20 Global Step: 34580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:23:34,124-Speed 24256.55 samples/sec Loss 2.0875 LearningRate 0.0003 Epoch: 20 Global Step: 34590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:23:44,252-Speed 24267.66 samples/sec Loss 2.0993 LearningRate 0.0003 Epoch: 20 Global Step: 34600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:23:54,490-Speed 24007.94 samples/sec Loss 2.1105 LearningRate 0.0003 Epoch: 20 Global Step: 34610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:24:04,791-Speed 23860.94 samples/sec Loss 2.0890 LearningRate 0.0003 Epoch: 20 Global Step: 34620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:24:14,987-Speed 24111.66 samples/sec Loss 2.0888 LearningRate 0.0003 Epoch: 20 Global Step: 34630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:24:25,185-Speed 24101.98 samples/sec Loss 2.0839 LearningRate 0.0003 Epoch: 20 Global Step: 34640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:24:35,212-Speed 24514.59 samples/sec Loss 2.1078 LearningRate 0.0003 Epoch: 20 Global Step: 34650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:24:45,301-Speed 24362.50 samples/sec Loss 2.1324 
LearningRate 0.0003 Epoch: 20 Global Step: 34660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:24:55,562-Speed 23954.13 samples/sec Loss 2.1259 LearningRate 0.0003 Epoch: 20 Global Step: 34670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:25:05,867-Speed 23852.79 samples/sec Loss 2.0840 LearningRate 0.0003 Epoch: 20 Global Step: 34680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:25:16,042-Speed 24156.90 samples/sec Loss 2.1094 LearningRate 0.0003 Epoch: 20 Global Step: 34690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:25:26,214-Speed 24163.40 samples/sec Loss 2.1047 LearningRate 0.0003 Epoch: 20 Global Step: 34700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:25:36,493-Speed 23913.49 samples/sec Loss 2.1330 LearningRate 0.0003 Epoch: 20 Global Step: 34710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:25:46,690-Speed 24103.46 samples/sec Loss 2.0937 LearningRate 0.0003 Epoch: 20 Global Step: 34720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:25:57,039-Speed 23757.41 samples/sec Loss 2.1037 LearningRate 0.0003 Epoch: 20 Global Step: 34730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:26:07,224-Speed 24132.59 samples/sec Loss 2.1321 LearningRate 0.0003 Epoch: 20 Global Step: 34740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:26:17,227-Speed 24572.05 samples/sec Loss 2.1320 LearningRate 0.0003 Epoch: 20 Global Step: 34750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:26:27,291-Speed 24422.08 samples/sec Loss 2.1223 LearningRate 0.0003 Epoch: 20 Global Step: 34760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:26:36,990-Speed 25343.29 samples/sec Loss 2.1033 LearningRate 0.0003 Epoch: 20 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:26:46,712-Speed 25282.54 samples/sec Loss 2.0886 LearningRate 0.0003 Epoch: 20 
Global Step: 34780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:26:56,445-Speed 25259.27 samples/sec Loss 2.1236 LearningRate 0.0003 Epoch: 20 Global Step: 34790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:27:06,186-Speed 25235.27 samples/sec Loss 2.1543 LearningRate 0.0003 Epoch: 20 Global Step: 34800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:27:16,090-Speed 24816.86 samples/sec Loss 2.1217 LearningRate 0.0003 Epoch: 20 Global Step: 34810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:27:25,931-Speed 24976.55 samples/sec Loss 2.1179 LearningRate 0.0003 Epoch: 20 Global Step: 34820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:27:35,791-Speed 24930.72 samples/sec Loss 2.1146 LearningRate 0.0003 Epoch: 20 Global Step: 34830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:27:45,712-Speed 24775.40 samples/sec Loss 2.1041 LearningRate 0.0003 Epoch: 20 Global Step: 34840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:27:55,404-Speed 25360.06 samples/sec Loss 2.0841 LearningRate 0.0003 Epoch: 20 Global Step: 34850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:28:05,250-Speed 24963.17 samples/sec Loss 2.0795 LearningRate 0.0003 Epoch: 20 Global Step: 34860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:28:14,999-Speed 25213.20 samples/sec Loss 2.0841 LearningRate 0.0003 Epoch: 20 Global Step: 34870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:28:24,642-Speed 25488.20 samples/sec Loss 2.1096 LearningRate 0.0003 Epoch: 20 Global Step: 34880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:28:34,334-Speed 25359.91 samples/sec Loss 2.1166 LearningRate 0.0003 Epoch: 20 Global Step: 34890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:28:44,122-Speed 25112.45 samples/sec Loss 2.1297 LearningRate 0.0003 Epoch: 20 Global Step: 34900 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-03-26 08:28:53,943-Speed 25027.36 samples/sec Loss 2.0974 LearningRate 0.0003 Epoch: 20 Global Step: 34910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:29:03,697-Speed 25199.74 samples/sec Loss 2.1170 LearningRate 0.0003 Epoch: 20 Global Step: 34920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:29:13,483-Speed 25123.15 samples/sec Loss 2.1097 LearningRate 0.0003 Epoch: 20 Global Step: 34930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:29:23,298-Speed 25041.15 samples/sec Loss 2.1575 LearningRate 0.0003 Epoch: 20 Global Step: 34940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:29:33,025-Speed 25270.08 samples/sec Loss 2.0977 LearningRate 0.0003 Epoch: 20 Global Step: 34950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:29:42,705-Speed 25392.45 samples/sec Loss 2.1042 LearningRate 0.0003 Epoch: 20 Global Step: 34960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:29:52,386-Speed 25389.42 samples/sec Loss 2.0955 LearningRate 0.0003 Epoch: 20 Global Step: 34970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:30:02,114-Speed 25265.89 samples/sec Loss 2.1014 LearningRate 0.0003 Epoch: 20 Global Step: 34980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:30:11,827-Speed 25305.11 samples/sec Loss 2.1108 LearningRate 0.0003 Epoch: 20 Global Step: 34990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:30:21,613-Speed 25117.10 samples/sec Loss 2.1071 LearningRate 0.0003 Epoch: 20 Global Step: 35000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:30:31,332-Speed 25293.23 samples/sec Loss 2.1075 LearningRate 0.0003 Epoch: 20 Global Step: 35010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:30:41,010-Speed 25396.66 samples/sec Loss 2.0848 LearningRate 0.0003 Epoch: 20 Global Step: 35020 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-03-26 08:30:50,719-Speed 25316.22 samples/sec Loss 2.1011 LearningRate 0.0003 Epoch: 20 Global Step: 35030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:31:00,679-Speed 24676.86 samples/sec Loss 2.1075 LearningRate 0.0003 Epoch: 20 Global Step: 35040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:31:10,473-Speed 25096.73 samples/sec Loss 2.1625 LearningRate 0.0003 Epoch: 20 Global Step: 35050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:31:20,202-Speed 25261.83 samples/sec Loss 2.1603 LearningRate 0.0003 Epoch: 20 Global Step: 35060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:31:29,901-Speed 25342.57 samples/sec Loss 2.1140 LearningRate 0.0003 Epoch: 20 Global Step: 35070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:31:39,692-Speed 25104.22 samples/sec Loss 2.1023 LearningRate 0.0003 Epoch: 20 Global Step: 35080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:31:49,462-Speed 25159.54 samples/sec Loss 2.0854 LearningRate 0.0003 Epoch: 20 Global Step: 35090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:31:59,229-Speed 25166.94 samples/sec Loss 2.1018 LearningRate 0.0003 Epoch: 20 Global Step: 35100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:32:08,946-Speed 25294.79 samples/sec Loss 2.1066 LearningRate 0.0003 Epoch: 20 Global Step: 35110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:32:18,654-Speed 25317.52 samples/sec Loss 2.1188 LearningRate 0.0003 Epoch: 20 Global Step: 35120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:32:28,358-Speed 25331.13 samples/sec Loss 2.1257 LearningRate 0.0003 Epoch: 20 Global Step: 35130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:32:38,071-Speed 25307.70 samples/sec Loss 2.0764 LearningRate 0.0003 Epoch: 20 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 
08:32:47,816-Speed 25225.26 samples/sec Loss 2.0710 LearningRate 0.0003 Epoch: 20 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:32:57,530-Speed 25301.13 samples/sec Loss 2.0838 LearningRate 0.0003 Epoch: 20 Global Step: 35160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:33:07,313-Speed 25126.64 samples/sec Loss 2.0852 LearningRate 0.0003 Epoch: 20 Global Step: 35170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:33:17,052-Speed 25240.74 samples/sec Loss 2.0976 LearningRate 0.0003 Epoch: 20 Global Step: 35180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:33:26,808-Speed 25194.42 samples/sec Loss 2.0972 LearningRate 0.0003 Epoch: 20 Global Step: 35190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:33:36,569-Speed 25180.46 samples/sec Loss 2.0906 LearningRate 0.0003 Epoch: 20 Global Step: 35200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:33:46,355-Speed 25116.20 samples/sec Loss 2.1047 LearningRate 0.0003 Epoch: 20 Global Step: 35210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:33:56,053-Speed 25343.65 samples/sec Loss 2.0916 LearningRate 0.0003 Epoch: 20 Global Step: 35220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:34:05,758-Speed 25326.99 samples/sec Loss 2.1071 LearningRate 0.0003 Epoch: 20 Global Step: 35230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:34:15,516-Speed 25189.55 samples/sec Loss 2.0894 LearningRate 0.0003 Epoch: 20 Global Step: 35240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:34:25,222-Speed 25325.53 samples/sec Loss 2.0941 LearningRate 0.0003 Epoch: 20 Global Step: 35250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:34:35,106-Speed 24866.61 samples/sec Loss 2.0808 LearningRate 0.0003 Epoch: 20 Global Step: 35260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:34:44,876-Speed 25157.64 
samples/sec Loss 2.1146 LearningRate 0.0003 Epoch: 20 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:34:54,657-Speed 25130.86 samples/sec Loss 2.0928 LearningRate 0.0003 Epoch: 20 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:35:04,598-Speed 24727.41 samples/sec Loss 2.0840 LearningRate 0.0003 Epoch: 20 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:35:14,351-Speed 25201.23 samples/sec Loss 2.0661 LearningRate 0.0003 Epoch: 20 Global Step: 35300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:35:24,064-Speed 25303.53 samples/sec Loss 2.0603 LearningRate 0.0003 Epoch: 20 Global Step: 35310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:35:33,816-Speed 25206.45 samples/sec Loss 2.0995 LearningRate 0.0003 Epoch: 20 Global Step: 35320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:35:43,569-Speed 25203.86 samples/sec Loss 2.1039 LearningRate 0.0003 Epoch: 20 Global Step: 35330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:35:53,287-Speed 25290.30 samples/sec Loss 2.1026 LearningRate 0.0003 Epoch: 20 Global Step: 35340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:36:03,084-Speed 25090.16 samples/sec Loss 2.0791 LearningRate 0.0003 Epoch: 20 Global Step: 35350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:36:12,788-Speed 25327.83 samples/sec Loss 2.0610 LearningRate 0.0003 Epoch: 20 Global Step: 35360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:36:22,491-Speed 25330.73 samples/sec Loss 2.0707 LearningRate 0.0003 Epoch: 20 Global Step: 35370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:36:32,230-Speed 25237.48 samples/sec Loss 2.0788 LearningRate 0.0003 Epoch: 20 Global Step: 35380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:36:41,935-Speed 25327.83 samples/sec Loss 2.0896 
LearningRate 0.0003 Epoch: 20 Global Step: 35390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:36:51,738-Speed 25072.51 samples/sec Loss 2.0718 LearningRate 0.0003 Epoch: 20 Global Step: 35400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:37:01,489-Speed 25205.31 samples/sec Loss 2.0707 LearningRate 0.0003 Epoch: 20 Global Step: 35410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-26 08:37:11,221-Speed 25255.11 samples/sec Loss 2.0883 LearningRate 0.0003 Epoch: 20 Global Step: 35420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:37:20,995-Speed 25148.15 samples/sec Loss 2.0946 LearningRate 0.0003 Epoch: 20 Global Step: 35430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:37:30,762-Speed 25164.93 samples/sec Loss 2.0641 LearningRate 0.0003 Epoch: 20 Global Step: 35440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:37:40,515-Speed 25201.78 samples/sec Loss 2.0806 LearningRate 0.0003 Epoch: 20 Global Step: 35450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:37:50,218-Speed 25331.35 samples/sec Loss 2.0787 LearningRate 0.0003 Epoch: 20 Global Step: 35460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:37:59,940-Speed 25282.28 samples/sec Loss 2.0657 LearningRate 0.0003 Epoch: 20 Global Step: 35470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:38:09,743-Speed 25076.49 samples/sec Loss 2.0626 LearningRate 0.0003 Epoch: 20 Global Step: 35480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:38:19,576-Speed 24995.73 samples/sec Loss 2.0819 LearningRate 0.0003 Epoch: 20 Global Step: 35490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:38:29,382-Speed 25064.51 samples/sec Loss 2.0669 LearningRate 0.0003 Epoch: 20 Global Step: 35500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-26 08:38:39,082-Speed 25340.42 samples/sec Loss 2.0714 LearningRate 0.0003 Epoch: 20 
Global Step: 35510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:38:49,031-Speed 24704.66 samples/sec Loss 2.0728 LearningRate 0.0003 Epoch: 20 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:38:58,851-Speed 25027.76 samples/sec Loss 2.0561 LearningRate 0.0003 Epoch: 20 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:39:08,685-Speed 24992.89 samples/sec Loss 2.0633 LearningRate 0.0003 Epoch: 20 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:39:18,379-Speed 25356.68 samples/sec Loss 2.0709 LearningRate 0.0003 Epoch: 20 Global Step: 35550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:39:28,146-Speed 25163.89 samples/sec Loss 2.0843 LearningRate 0.0003 Epoch: 20 Global Step: 35560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:39:37,875-Speed 25265.45 samples/sec Loss 2.0771 LearningRate 0.0003 Epoch: 20 Global Step: 35570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:39:47,601-Speed 25271.43 samples/sec Loss 2.0813 LearningRate 0.0003 Epoch: 20 Global Step: 35580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:39:57,406-Speed 25067.16 samples/sec Loss 2.0808 LearningRate 0.0003 Epoch: 20 Global Step: 35590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:40:07,184-Speed 25138.24 samples/sec Loss 2.0750 LearningRate 0.0003 Epoch: 20 Global Step: 35600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:40:16,966-Speed 25127.89 samples/sec Loss 2.0861 LearningRate 0.0003 Epoch: 20 Global Step: 35610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:40:26,708-Speed 25230.03 samples/sec Loss 2.0446 LearningRate 0.0003 Epoch: 20 Global Step: 35620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:40:36,445-Speed 25242.03 samples/sec Loss 2.0574 LearningRate 0.0003 Epoch: 20 Global Step: 35630 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-03-26 08:40:46,198-Speed 25201.11 samples/sec Loss 2.0866 LearningRate 0.0003 Epoch: 20 Global Step: 35640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:40:56,036-Speed 24984.59 samples/sec Loss 2.0684 LearningRate 0.0003 Epoch: 20 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:41:05,739-Speed 25332.97 samples/sec Loss 2.0685 LearningRate 0.0003 Epoch: 20 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:41:15,572-Speed 24996.60 samples/sec Loss 2.0708 LearningRate 0.0003 Epoch: 20 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:41:25,277-Speed 25326.73 samples/sec Loss 2.0767 LearningRate 0.0003 Epoch: 20 Global Step: 35680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:41:35,034-Speed 25193.58 samples/sec Loss 2.1068 LearningRate 0.0003 Epoch: 20 Global Step: 35690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:41:44,818-Speed 25123.81 samples/sec Loss 2.0662 LearningRate 0.0003 Epoch: 20 Global Step: 35700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:41:54,513-Speed 25352.01 samples/sec Loss 2.0510 LearningRate 0.0003 Epoch: 20 Global Step: 35710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:42:04,292-Speed 25136.86 samples/sec Loss 2.0493 LearningRate 0.0003 Epoch: 20 Global Step: 35720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:42:14,006-Speed 25303.75 samples/sec Loss 2.0561 LearningRate 0.0003 Epoch: 20 Global Step: 35730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:42:23,754-Speed 25214.18 samples/sec Loss 2.0491 LearningRate 0.0003 Epoch: 20 Global Step: 35740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:42:33,567-Speed 25046.02 samples/sec Loss 2.0405 LearningRate 0.0003 Epoch: 20 Global Step: 35750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 
08:42:43,355-Speed 25114.25 samples/sec Loss 2.0643 LearningRate 0.0003 Epoch: 20 Global Step: 35760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:42:53,046-Speed 25363.57 samples/sec Loss 2.0654 LearningRate 0.0003 Epoch: 20 Global Step: 35770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:43:02,812-Speed 25169.18 samples/sec Loss 2.0594 LearningRate 0.0003 Epoch: 20 Global Step: 35780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:43:12,492-Speed 25395.60 samples/sec Loss 2.0571 LearningRate 0.0003 Epoch: 20 Global Step: 35790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:43:22,260-Speed 25163.76 samples/sec Loss 2.0458 LearningRate 0.0003 Epoch: 20 Global Step: 35800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:43:32,012-Speed 25207.91 samples/sec Loss 2.0829 LearningRate 0.0003 Epoch: 20 Global Step: 35810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:43:41,742-Speed 25261.06 samples/sec Loss 2.0388 LearningRate 0.0003 Epoch: 20 Global Step: 35820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:43:51,570-Speed 25009.13 samples/sec Loss 2.0420 LearningRate 0.0003 Epoch: 20 Global Step: 35830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:44:01,382-Speed 25051.96 samples/sec Loss 2.0509 LearningRate 0.0003 Epoch: 20 Global Step: 35840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:44:11,191-Speed 25057.86 samples/sec Loss 2.0454 LearningRate 0.0003 Epoch: 20 Global Step: 35850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:44:20,962-Speed 25154.09 samples/sec Loss 2.0605 LearningRate 0.0003 Epoch: 20 Global Step: 35860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:44:30,683-Speed 25283.70 samples/sec Loss 2.0823 LearningRate 0.0003 Epoch: 20 Global Step: 35870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:44:40,438-Speed 25199.40 samples/sec 
Loss 2.0795 LearningRate 0.0003 Epoch: 20 Global Step: 35880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:44:50,160-Speed 25280.26 samples/sec Loss 2.0650 LearningRate 0.0003 Epoch: 20 Global Step: 35890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:44:59,880-Speed 25288.03 samples/sec Loss 2.0554 LearningRate 0.0003 Epoch: 20 Global Step: 35900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:45:09,636-Speed 25192.64 samples/sec Loss 2.0529 LearningRate 0.0003 Epoch: 20 Global Step: 35910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:45:19,429-Speed 25099.17 samples/sec Loss 2.0538 LearningRate 0.0003 Epoch: 20 Global Step: 35920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:45:29,211-Speed 25127.71 samples/sec Loss 2.0540 LearningRate 0.0003 Epoch: 20 Global Step: 35930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:45:39,066-Speed 24943.66 samples/sec Loss 2.0815 LearningRate 0.0003 Epoch: 20 Global Step: 35940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:45:48,795-Speed 25265.31 samples/sec Loss 2.0513 LearningRate 0.0003 Epoch: 20 Global Step: 35950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:45:58,673-Speed 24882.30 samples/sec Loss 2.0335 LearningRate 0.0003 Epoch: 20 Global Step: 35960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:46:08,563-Speed 24853.00 samples/sec Loss 2.0362 LearningRate 0.0003 Epoch: 20 Global Step: 35970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:46:18,491-Speed 24757.97 samples/sec Loss 2.0545 LearningRate 0.0003 Epoch: 20 Global Step: 35980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:46:28,211-Speed 25285.92 samples/sec Loss 2.0465 LearningRate 0.0003 Epoch: 20 Global Step: 35990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:46:37,969-Speed 25189.74 samples/sec Loss 2.0496 LearningRate 0.0003 Epoch: 20 
Global Step: 36000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:46:47,757-Speed 25112.11 samples/sec Loss 2.0545 LearningRate 0.0003 Epoch: 20 Global Step: 36010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:46:57,515-Speed 25190.77 samples/sec Loss 2.0658 LearningRate 0.0003 Epoch: 20 Global Step: 36020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:47:07,279-Speed 25174.95 samples/sec Loss 2.0595 LearningRate 0.0003 Epoch: 20 Global Step: 36030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:47:16,988-Speed 25315.75 samples/sec Loss 2.0415 LearningRate 0.0003 Epoch: 20 Global Step: 36040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:47:26,686-Speed 25345.34 samples/sec Loss 2.0471 LearningRate 0.0003 Epoch: 20 Global Step: 36050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:47:36,462-Speed 25140.39 samples/sec Loss 2.0415 LearningRate 0.0003 Epoch: 20 Global Step: 36060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:47:46,162-Speed 25341.06 samples/sec Loss 2.0658 LearningRate 0.0003 Epoch: 20 Global Step: 36070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:47:55,892-Speed 25259.81 samples/sec Loss 2.0457 LearningRate 0.0003 Epoch: 20 Global Step: 36080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:48:05,620-Speed 25267.90 samples/sec Loss 2.0694 LearningRate 0.0003 Epoch: 20 Global Step: 36090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:48:15,394-Speed 25147.44 samples/sec Loss 2.0816 LearningRate 0.0003 Epoch: 20 Global Step: 36100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:48:25,284-Speed 24853.16 samples/sec Loss 2.0746 LearningRate 0.0003 Epoch: 20 Global Step: 36110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:48:35,122-Speed 24983.46 samples/sec Loss 2.0610 LearningRate 0.0003 Epoch: 20 Global Step: 36120 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-03-26 08:48:44,929-Speed 25065.95 samples/sec Loss 2.0201 LearningRate 0.0003 Epoch: 20 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:48:54,773-Speed 24967.31 samples/sec Loss 2.0404 LearningRate 0.0003 Epoch: 20 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:49:04,503-Speed 25261.38 samples/sec Loss 2.0461 LearningRate 0.0003 Epoch: 20 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:49:14,257-Speed 25200.80 samples/sec Loss 2.0490 LearningRate 0.0003 Epoch: 20 Global Step: 36160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:49:28,933-Speed 16746.54 samples/sec Loss 2.0672 LearningRate 0.0003 Epoch: 20 Global Step: 36170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:49:38,701-Speed 25162.29 samples/sec Loss 2.0603 LearningRate 0.0003 Epoch: 20 Global Step: 36180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:49:48,480-Speed 25134.59 samples/sec Loss 2.0658 LearningRate 0.0003 Epoch: 20 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:49:58,316-Speed 24990.57 samples/sec Loss 2.0686 LearningRate 0.0003 Epoch: 20 Global Step: 36200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:50:08,148-Speed 24998.08 samples/sec Loss 2.0366 LearningRate 0.0003 Epoch: 20 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:50:17,957-Speed 25059.87 samples/sec Loss 2.0484 LearningRate 0.0003 Epoch: 20 Global Step: 36220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:50:27,707-Speed 25208.35 samples/sec Loss 2.0412 LearningRate 0.0003 Epoch: 20 Global Step: 36230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:50:37,391-Speed 25383.28 samples/sec Loss 2.0499 LearningRate 0.0003 Epoch: 20 Global Step: 36240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 
08:50:47,215-Speed 25019.47 samples/sec Loss 2.0522 LearningRate 0.0003 Epoch: 20 Global Step: 36250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:50:56,945-Speed 25262.71 samples/sec Loss 2.0409 LearningRate 0.0003 Epoch: 20 Global Step: 36260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:51:06,692-Speed 25216.17 samples/sec Loss 2.0684 LearningRate 0.0003 Epoch: 20 Global Step: 36270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:51:16,503-Speed 25052.39 samples/sec Loss 2.0721 LearningRate 0.0003 Epoch: 20 Global Step: 36280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:51:26,322-Speed 25032.76 samples/sec Loss 2.0744 LearningRate 0.0003 Epoch: 20 Global Step: 36290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:52:26,282-Speed 4098.86 samples/sec Loss 2.0421 LearningRate 0.0003 Epoch: 21 Global Step: 36300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:52:35,970-Speed 25370.06 samples/sec Loss 2.0375 LearningRate 0.0003 Epoch: 21 Global Step: 36310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:52:45,720-Speed 25210.71 samples/sec Loss 2.0483 LearningRate 0.0003 Epoch: 21 Global Step: 36320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:52:55,448-Speed 25265.19 samples/sec Loss 2.0159 LearningRate 0.0003 Epoch: 21 Global Step: 36330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:53:05,171-Speed 25282.79 samples/sec Loss 2.0003 LearningRate 0.0003 Epoch: 21 Global Step: 36340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:53:14,904-Speed 25252.93 samples/sec Loss 2.0138 LearningRate 0.0003 Epoch: 21 Global Step: 36350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:53:24,673-Speed 25161.31 samples/sec Loss 2.0388 LearningRate 0.0003 Epoch: 21 Global Step: 36360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:53:34,487-Speed 25048.03 samples/sec Loss 
2.0245 LearningRate 0.0003 Epoch: 21 Global Step: 36370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:53:44,281-Speed 25095.20 samples/sec Loss 2.0384 LearningRate 0.0003 Epoch: 21 Global Step: 36380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:53:54,043-Speed 25178.92 samples/sec Loss 2.0288 LearningRate 0.0003 Epoch: 21 Global Step: 36390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:54:03,775-Speed 25255.05 samples/sec Loss 2.0245 LearningRate 0.0003 Epoch: 21 Global Step: 36400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:54:13,505-Speed 25262.31 samples/sec Loss 2.0130 LearningRate 0.0003 Epoch: 21 Global Step: 36410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:54:23,253-Speed 25213.76 samples/sec Loss 2.0141 LearningRate 0.0003 Epoch: 21 Global Step: 36420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:54:33,002-Speed 25214.53 samples/sec Loss 2.0190 LearningRate 0.0003 Epoch: 21 Global Step: 36430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:54:42,992-Speed 24603.65 samples/sec Loss 2.0403 LearningRate 0.0003 Epoch: 21 Global Step: 36440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:54:52,865-Speed 24897.76 samples/sec Loss 2.0273 LearningRate 0.0003 Epoch: 21 Global Step: 36450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:55:02,620-Speed 25204.72 samples/sec Loss 2.0211 LearningRate 0.0003 Epoch: 21 Global Step: 36460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:55:12,350-Speed 25262.23 samples/sec Loss 2.0313 LearningRate 0.0003 Epoch: 21 Global Step: 36470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:55:22,212-Speed 24924.20 samples/sec Loss 2.0652 LearningRate 0.0003 Epoch: 21 Global Step: 36480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:55:32,220-Speed 24559.36 samples/sec Loss 2.0237 LearningRate 0.0003 Epoch: 21 Global 
Step: 36490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:55:42,336-Speed 24297.51 samples/sec Loss 2.0112 LearningRate 0.0003 Epoch: 21 Global Step: 36500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:55:52,346-Speed 24557.71 samples/sec Loss 2.0408 LearningRate 0.0003 Epoch: 21 Global Step: 36510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:56:02,428-Speed 24380.10 samples/sec Loss 2.0353 LearningRate 0.0003 Epoch: 21 Global Step: 36520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:56:12,412-Speed 24617.13 samples/sec Loss 2.0189 LearningRate 0.0003 Epoch: 21 Global Step: 36530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:56:22,559-Speed 24224.85 samples/sec Loss 2.0258 LearningRate 0.0003 Epoch: 21 Global Step: 36540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:56:32,597-Speed 24485.90 samples/sec Loss 2.0419 LearningRate 0.0003 Epoch: 21 Global Step: 36550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:56:42,552-Speed 24692.28 samples/sec Loss 2.0325 LearningRate 0.0003 Epoch: 21 Global Step: 36560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:56:52,590-Speed 24488.25 samples/sec Loss 2.0309 LearningRate 0.0003 Epoch: 21 Global Step: 36570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:57:02,701-Speed 24308.78 samples/sec Loss 2.0162 LearningRate 0.0003 Epoch: 21 Global Step: 36580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:57:12,734-Speed 24502.24 samples/sec Loss 2.0432 LearningRate 0.0003 Epoch: 21 Global Step: 36590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:57:22,871-Speed 24244.93 samples/sec Loss 2.0407 LearningRate 0.0003 Epoch: 21 Global Step: 36600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:57:32,964-Speed 24355.72 samples/sec Loss 2.0402 LearningRate 0.0003 Epoch: 21 Global Step: 36610 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-03-26 08:57:43,012-Speed 24463.52 samples/sec Loss 2.0361 LearningRate 0.0003 Epoch: 21 Global Step: 36620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:57:52,950-Speed 24731.03 samples/sec Loss 2.0328 LearningRate 0.0003 Epoch: 21 Global Step: 36630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:58:02,971-Speed 24527.80 samples/sec Loss 2.0157 LearningRate 0.0003 Epoch: 21 Global Step: 36640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:58:13,124-Speed 24209.47 samples/sec Loss 2.0054 LearningRate 0.0003 Epoch: 21 Global Step: 36650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:58:23,223-Speed 24337.96 samples/sec Loss 2.0362 LearningRate 0.0003 Epoch: 21 Global Step: 36660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:58:33,178-Speed 24688.78 samples/sec Loss 2.0386 LearningRate 0.0003 Epoch: 21 Global Step: 36670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:58:43,346-Speed 24174.57 samples/sec Loss 2.0218 LearningRate 0.0003 Epoch: 21 Global Step: 36680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:58:53,536-Speed 24121.58 samples/sec Loss 2.0167 LearningRate 0.0003 Epoch: 21 Global Step: 36690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:59:03,770-Speed 24014.79 samples/sec Loss 2.0338 LearningRate 0.0003 Epoch: 21 Global Step: 36700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 08:59:13,963-Speed 24114.97 samples/sec Loss 2.0371 LearningRate 0.0003 Epoch: 21 Global Step: 36710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:59:23,998-Speed 24494.91 samples/sec Loss 2.0196 LearningRate 0.0003 Epoch: 21 Global Step: 36720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:59:34,010-Speed 24549.98 samples/sec Loss 2.0175 LearningRate 0.0003 Epoch: 21 Global Step: 36730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 
08:59:44,006-Speed 24589.83 samples/sec Loss 2.0222 LearningRate 0.0003 Epoch: 21 Global Step: 36740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 08:59:53,999-Speed 24595.40 samples/sec Loss 2.0194 LearningRate 0.0003 Epoch: 21 Global Step: 36750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:00:04,082-Speed 24375.92 samples/sec Loss 2.0338 LearningRate 0.0003 Epoch: 21 Global Step: 36760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:00:14,144-Speed 24429.03 samples/sec Loss 2.0035 LearningRate 0.0003 Epoch: 21 Global Step: 36770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:00:24,124-Speed 24626.41 samples/sec Loss 2.0011 LearningRate 0.0003 Epoch: 21 Global Step: 36780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:00:34,112-Speed 24612.63 samples/sec Loss 2.0084 LearningRate 0.0003 Epoch: 21 Global Step: 36790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:00:44,168-Speed 24441.98 samples/sec Loss 2.0169 LearningRate 0.0003 Epoch: 21 Global Step: 36800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:00:54,395-Speed 24033.52 samples/sec Loss 2.0139 LearningRate 0.0003 Epoch: 21 Global Step: 36810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:01:04,214-Speed 25031.89 samples/sec Loss 2.0290 LearningRate 0.0003 Epoch: 21 Global Step: 36820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:01:13,913-Speed 25343.11 samples/sec Loss 2.0156 LearningRate 0.0003 Epoch: 21 Global Step: 36830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:01:23,687-Speed 25146.69 samples/sec Loss 2.0222 LearningRate 0.0003 Epoch: 21 Global Step: 36840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:01:33,468-Speed 25131.10 samples/sec Loss 2.0080 LearningRate 0.0003 Epoch: 21 Global Step: 36850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:01:43,200-Speed 25257.09 samples/sec 
Loss 2.0141 LearningRate 0.0003 Epoch: 21 Global Step: 36860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:01:52,887-Speed 25373.50 samples/sec Loss 2.0072 LearningRate 0.0003 Epoch: 21 Global Step: 36870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:02:02,603-Speed 25302.69 samples/sec Loss 1.9984 LearningRate 0.0003 Epoch: 21 Global Step: 36880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:02:12,301-Speed 25351.57 samples/sec Loss 2.0114 LearningRate 0.0003 Epoch: 21 Global Step: 36890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:02:22,062-Speed 25182.24 samples/sec Loss 2.0384 LearningRate 0.0003 Epoch: 21 Global Step: 36900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:02:31,905-Speed 24970.71 samples/sec Loss 2.0183 LearningRate 0.0003 Epoch: 21 Global Step: 36910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:02:41,790-Speed 24868.46 samples/sec Loss 2.0223 LearningRate 0.0003 Epoch: 21 Global Step: 36920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:02:51,526-Speed 25246.05 samples/sec Loss 2.0032 LearningRate 0.0003 Epoch: 21 Global Step: 36930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:03:01,217-Speed 25368.05 samples/sec Loss 2.0065 LearningRate 0.0003 Epoch: 21 Global Step: 36940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:03:10,934-Speed 25296.64 samples/sec Loss 2.0127 LearningRate 0.0003 Epoch: 21 Global Step: 36950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:03:20,641-Speed 25321.54 samples/sec Loss 2.0202 LearningRate 0.0003 Epoch: 21 Global Step: 36960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:03:30,348-Speed 25322.05 samples/sec Loss 2.0113 LearningRate 0.0003 Epoch: 21 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:03:40,154-Speed 25067.79 samples/sec Loss 2.0091 LearningRate 0.0003 Epoch: 21 
Global Step: 36980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:03:49,877-Speed 25278.49 samples/sec Loss 2.0139 LearningRate 0.0003 Epoch: 21 Global Step: 36990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-26 09:03:59,563-Speed 25377.53 samples/sec Loss 1.9990 LearningRate 0.0003 Epoch: 21 Global Step: 37000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:04:09,269-Speed 25323.53 samples/sec Loss 2.0106 LearningRate 0.0003 Epoch: 21 Global Step: 37010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:04:18,988-Speed 25289.47 samples/sec Loss 2.0321 LearningRate 0.0003 Epoch: 21 Global Step: 37020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:04:28,763-Speed 25144.15 samples/sec Loss 1.9917 LearningRate 0.0003 Epoch: 21 Global Step: 37030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:04:38,534-Speed 25154.68 samples/sec Loss 2.0029 LearningRate 0.0003 Epoch: 21 Global Step: 37040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:04:48,314-Speed 25133.06 samples/sec Loss 1.9664 LearningRate 0.0003 Epoch: 21 Global Step: 37050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:04:58,057-Speed 25226.86 samples/sec Loss 1.9774 LearningRate 0.0003 Epoch: 21 Global Step: 37060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:05:07,805-Speed 25215.03 samples/sec Loss 1.9796 LearningRate 0.0003 Epoch: 21 Global Step: 37070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:05:17,591-Speed 25118.78 samples/sec Loss 1.9974 LearningRate 0.0003 Epoch: 21 Global Step: 37080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:05:27,298-Speed 25323.13 samples/sec Loss 1.9936 LearningRate 0.0003 Epoch: 21 Global Step: 37090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-26 09:05:37,051-Speed 25201.55 samples/sec Loss 1.9958 LearningRate 0.0003 Epoch: 21 Global Step: 37100 Fp16 Grad Scale: 65536 
Required: 9 hours
Training: 2022-03-26 09:05:46,789-Speed 25238.96 samples/sec Loss 1.9765 LearningRate 0.0003 Epoch: 21 Global Step: 37110 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:05:56,517-Speed 25267.40 samples/sec Loss 1.9888 LearningRate 0.0003 Epoch: 21 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:06:06,275-Speed 25190.34 samples/sec Loss 1.9841 LearningRate 0.0003 Epoch: 21 Global Step: 37130 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:06:16,015-Speed 25234.18 samples/sec Loss 1.9982 LearningRate 0.0003 Epoch: 21 Global Step: 37140 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:06:25,711-Speed 25351.83 samples/sec Loss 1.9805 LearningRate 0.0003 Epoch: 21 Global Step: 37150 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:06:35,504-Speed 25099.31 samples/sec Loss 1.9774 LearningRate 0.0003 Epoch: 21 Global Step: 37160 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:06:45,336-Speed 25000.00 samples/sec Loss 1.9989 LearningRate 0.0003 Epoch: 21 Global Step: 37170 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:06:55,062-Speed 25270.81 samples/sec Loss 2.0107 LearningRate 0.0003 Epoch: 21 Global Step: 37180 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:07:04,800-Speed 25239.43 samples/sec Loss 1.9926 LearningRate 0.0003 Epoch: 21 Global Step: 37190 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:07:14,515-Speed 25303.07 samples/sec Loss 2.0089 LearningRate 0.0003 Epoch: 21 Global Step: 37200 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:07:24,230-Speed 25301.57 samples/sec Loss 1.9904 LearningRate 0.0003 Epoch: 21 Global Step: 37210 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:07:33,958-Speed 25266.91 samples/sec Loss 2.0088 LearningRate 0.0003 Epoch: 21 Global Step: 37220 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:07:43,813-Speed 24940.18 samples/sec Loss 1.9836 LearningRate 0.0003 Epoch: 21 Global Step: 37230 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:07:53,596-Speed 25125.70 samples/sec Loss 1.9933 LearningRate 0.0003 Epoch: 21 Global Step: 37240 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:08:03,405-Speed 25057.06 samples/sec Loss 1.9889 LearningRate 0.0003 Epoch: 21 Global Step: 37250 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:08:13,181-Speed 25141.86 samples/sec Loss 2.0077 LearningRate 0.0003 Epoch: 21 Global Step: 37260 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:08:22,943-Speed 25179.21 samples/sec Loss 1.9921 LearningRate 0.0003 Epoch: 21 Global Step: 37270 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:08:32,644-Speed 25334.96 samples/sec Loss 1.9694 LearningRate 0.0003 Epoch: 21 Global Step: 37280 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:08:42,398-Speed 25199.32 samples/sec Loss 1.9914 LearningRate 0.0003 Epoch: 21 Global Step: 37290 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:08:52,129-Speed 25262.93 samples/sec Loss 1.9904 LearningRate 0.0003 Epoch: 21 Global Step: 37300 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:09:01,877-Speed 25216.53 samples/sec Loss 1.9856 LearningRate 0.0003 Epoch: 21 Global Step: 37310 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:09:11,623-Speed 25220.50 samples/sec Loss 1.9921 LearningRate 0.0003 Epoch: 21 Global Step: 37320 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:09:21,396-Speed 25151.99 samples/sec Loss 2.0018 LearningRate 0.0003 Epoch: 21 Global Step: 37330 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:09:31,266-Speed 24904.02 samples/sec Loss 2.0103 LearningRate 0.0003 Epoch: 21 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:09:41,095-Speed 25004.83 samples/sec Loss 2.0132 LearningRate 0.0003 Epoch: 21 Global Step: 37350 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:09:50,957-Speed 24925.18 samples/sec Loss 1.9738 LearningRate 0.0003 Epoch: 21 Global Step: 37360 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:10:00,722-Speed 25172.42 samples/sec Loss 1.9861 LearningRate 0.0003 Epoch: 21 Global Step: 37370 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:10:10,448-Speed 25273.33 samples/sec Loss 1.9936 LearningRate 0.0003 Epoch: 21 Global Step: 37380 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:10:20,132-Speed 25381.43 samples/sec Loss 1.9948 LearningRate 0.0003 Epoch: 21 Global Step: 37390 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:10:29,908-Speed 25141.74 samples/sec Loss 1.9921 LearningRate 0.0003 Epoch: 21 Global Step: 37400 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:10:39,708-Speed 25081.08 samples/sec Loss 1.9814 LearningRate 0.0003 Epoch: 21 Global Step: 37410 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:10:49,403-Speed 25354.52 samples/sec Loss 1.9775 LearningRate 0.0003 Epoch: 21 Global Step: 37420 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:10:59,173-Speed 25158.14 samples/sec Loss 1.9945 LearningRate 0.0003 Epoch: 21 Global Step: 37430 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:11:08,874-Speed 25337.66 samples/sec Loss 1.9652 LearningRate 0.0003 Epoch: 21 Global Step: 37440 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:11:18,594-Speed 25285.82 samples/sec Loss 1.9799 LearningRate 0.0003 Epoch: 21 Global Step: 37450 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:11:28,467-Speed 24896.84 samples/sec Loss 1.9862 LearningRate 0.0003 Epoch: 21 Global Step: 37460 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:11:38,311-Speed 24969.18 samples/sec Loss 1.9918 LearningRate 0.0003 Epoch: 21 Global Step: 37470 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:11:48,075-Speed 25173.24 samples/sec Loss 1.9860 LearningRate 0.0003 Epoch: 21 Global Step: 37480 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:11:57,917-Speed 24972.49 samples/sec Loss 1.9706 LearningRate 0.0003 Epoch: 21 Global Step: 37490 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:12:07,663-Speed 25219.65 samples/sec Loss 1.9575 LearningRate 0.0003 Epoch: 21 Global Step: 37500 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:12:17,431-Speed 25164.47 samples/sec Loss 1.9839 LearningRate 0.0003 Epoch: 21 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:12:27,187-Speed 25192.25 samples/sec Loss 1.9677 LearningRate 0.0003 Epoch: 21 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:12:36,927-Speed 25237.22 samples/sec Loss 1.9718 LearningRate 0.0003 Epoch: 21 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:12:46,595-Speed 25422.11 samples/sec Loss 1.9848 LearningRate 0.0003 Epoch: 21 Global Step: 37540 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:12:56,257-Speed 25440.59 samples/sec Loss 1.9651 LearningRate 0.0003 Epoch: 21 Global Step: 37550 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:13:06,026-Speed 25162.18 samples/sec Loss 1.9670 LearningRate 0.0003 Epoch: 21 Global Step: 37560 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:13:15,778-Speed 25203.95 samples/sec Loss 1.9908 LearningRate 0.0003 Epoch: 21 Global Step: 37570 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:13:25,496-Speed 25292.49 samples/sec Loss 1.9886 LearningRate 0.0003 Epoch: 21 Global Step: 37580 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:13:35,229-Speed 25252.21 samples/sec Loss 1.9706 LearningRate 0.0003 Epoch: 21 Global Step: 37590 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:13:45,003-Speed 25147.94 samples/sec Loss 1.9953 LearningRate 0.0003 Epoch: 21 Global Step: 37600 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:13:54,767-Speed 25175.15 samples/sec Loss 1.9691 LearningRate 0.0003 Epoch: 21 Global Step: 37610 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:14:04,528-Speed 25181.05 samples/sec Loss 1.9790 LearningRate 0.0003 Epoch: 21 Global Step: 37620 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:14:14,247-Speed 25290.09 samples/sec Loss 1.9899 LearningRate 0.0003 Epoch: 21 Global Step: 37630 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:14:24,015-Speed 25164.91 samples/sec Loss 1.9946 LearningRate 0.0003 Epoch: 21 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:14:33,707-Speed 25361.18 samples/sec Loss 1.9820 LearningRate 0.0003 Epoch: 21 Global Step: 37650 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:14:43,491-Speed 25124.94 samples/sec Loss 1.9823 LearningRate 0.0003 Epoch: 21 Global Step: 37660 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:14:53,223-Speed 25256.23 samples/sec Loss 1.9921 LearningRate 0.0003 Epoch: 21 Global Step: 37670 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:15:03,029-Speed 25065.92 samples/sec Loss 1.9721 LearningRate 0.0003 Epoch: 21 Global Step: 37680 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:15:12,797-Speed 25160.31 samples/sec Loss 1.9820 LearningRate 0.0003 Epoch: 21 Global Step: 37690 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:15:22,569-Speed 25154.29 samples/sec Loss 1.9648 LearningRate 0.0003 Epoch: 21 Global Step: 37700 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:15:32,410-Speed 24977.12 samples/sec Loss 1.9920 LearningRate 0.0003 Epoch: 21 Global Step: 37710 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:15:42,113-Speed 25331.15 samples/sec Loss 2.0188 LearningRate 0.0003 Epoch: 21 Global Step: 37720 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:15:51,777-Speed 25436.01 samples/sec Loss 2.0006 LearningRate 0.0003 Epoch: 21 Global Step: 37730 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:16:01,510-Speed 25251.49 samples/sec Loss 1.9626 LearningRate 0.0003 Epoch: 21 Global Step: 37740 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:16:11,244-Speed 25251.02 samples/sec Loss 1.9622 LearningRate 0.0003 Epoch: 21 Global Step: 37750 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:16:20,967-Speed 25279.80 samples/sec Loss 1.9586 LearningRate 0.0003 Epoch: 21 Global Step: 37760 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:16:30,778-Speed 25053.05 samples/sec Loss 1.9644 LearningRate 0.0003 Epoch: 21 Global Step: 37770 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:16:40,580-Speed 25076.82 samples/sec Loss 1.9666 LearningRate 0.0003 Epoch: 21 Global Step: 37780 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:16:50,437-Speed 24937.41 samples/sec Loss 1.9743 LearningRate 0.0003 Epoch: 21 Global Step: 37790 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:17:00,224-Speed 25112.95 samples/sec Loss 1.9576 LearningRate 0.0003 Epoch: 21 Global Step: 37800 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:17:10,081-Speed 24937.39 samples/sec Loss 1.9682 LearningRate 0.0003 Epoch: 21 Global Step: 37810 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:17:19,857-Speed 25142.23 samples/sec Loss 1.9778 LearningRate 0.0003 Epoch: 21 Global Step: 37820 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:17:29,578-Speed 25287.21 samples/sec Loss 1.9769 LearningRate 0.0003 Epoch: 21 Global Step: 37830 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:17:39,374-Speed 25091.14 samples/sec Loss 1.9869 LearningRate 0.0003 Epoch: 21 Global Step: 37840 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:17:49,122-Speed 25213.10 samples/sec Loss 1.9642 LearningRate 0.0003 Epoch: 21 Global Step: 37850 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:17:58,830-Speed 25318.66 samples/sec Loss 1.9739 LearningRate 0.0003 Epoch: 21 Global Step: 37860 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:18:08,617-Speed 25115.00 samples/sec Loss 1.9688 LearningRate 0.0003 Epoch: 21 Global Step: 37870 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:18:18,328-Speed 25310.66 samples/sec Loss 1.9533 LearningRate 0.0003 Epoch: 21 Global Step: 37880 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:18:28,074-Speed 25220.52 samples/sec Loss 1.9930 LearningRate 0.0003 Epoch: 21 Global Step: 37890 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:18:37,892-Speed 25032.56 samples/sec Loss 1.9572 LearningRate 0.0003 Epoch: 21 Global Step: 37900 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:18:47,611-Speed 25291.37 samples/sec Loss 1.9589 LearningRate 0.0003 Epoch: 21 Global Step: 37910 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:18:57,350-Speed 25238.80 samples/sec Loss 1.9800 LearningRate 0.0003 Epoch: 21 Global Step: 37920 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:19:07,070-Speed 25286.66 samples/sec Loss 1.9808 LearningRate 0.0003 Epoch: 21 Global Step: 37930 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:19:16,769-Speed 25341.16 samples/sec Loss 1.9779 LearningRate 0.0003 Epoch: 21 Global Step: 37940 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:19:26,560-Speed 25104.58 samples/sec Loss 1.9951 LearningRate 0.0003 Epoch: 21 Global Step: 37950 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:19:36,304-Speed 25228.29 samples/sec Loss 1.9838 LearningRate 0.0003 Epoch: 21 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:19:46,031-Speed 25269.94 samples/sec Loss 1.9506 LearningRate 0.0003 Epoch: 21 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:19:55,706-Speed 25407.72 samples/sec Loss 1.9765 LearningRate 0.0003 Epoch: 21 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:20:05,498-Speed 25100.96 samples/sec Loss 1.9846 LearningRate 0.0003 Epoch: 21 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:20:15,244-Speed 25219.72 samples/sec Loss 1.9739 LearningRate 0.0003 Epoch: 21 Global Step: 38000 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:20:24,997-Speed 25202.78 samples/sec Loss 1.9811 LearningRate 0.0003 Epoch: 21 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:20:34,831-Speed 24995.54 samples/sec Loss 1.9992 LearningRate 0.0002 Epoch: 21 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:21:33,695-Speed 4175.13 samples/sec Loss 1.9528 LearningRate 0.0002 Epoch: 22 Global Step: 38030 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:21:43,872-Speed 24151.72 samples/sec Loss 1.9513 LearningRate 0.0002 Epoch: 22 Global Step: 38040 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:21:53,626-Speed 25200.17 samples/sec Loss 1.9722 LearningRate 0.0002 Epoch: 22 Global Step: 38050 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:22:03,394-Speed 25164.25 samples/sec Loss 1.9267 LearningRate 0.0002 Epoch: 22 Global Step: 38060 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:22:13,152-Speed 25189.20 samples/sec Loss 1.9408 LearningRate 0.0002 Epoch: 22 Global Step: 38070 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:22:22,909-Speed 25191.32 samples/sec Loss 1.9375 LearningRate 0.0002 Epoch: 22 Global Step: 38080 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:22:32,733-Speed 25020.14 samples/sec Loss 1.9119 LearningRate 0.0002 Epoch: 22 Global Step: 38090 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:22:42,502-Speed 25160.86 samples/sec Loss 1.9430 LearningRate 0.0002 Epoch: 22 Global Step: 38100 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:22:52,358-Speed 24937.15 samples/sec Loss 1.9440 LearningRate 0.0002 Epoch: 22 Global Step: 38110 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:23:02,084-Speed 25270.87 samples/sec Loss 1.9516 LearningRate 0.0002 Epoch: 22 Global Step: 38120 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:23:11,881-Speed 25089.57 samples/sec Loss 1.9654 LearningRate 0.0002 Epoch: 22 Global Step: 38130 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:23:21,607-Speed 25273.61 samples/sec Loss 1.9494 LearningRate 0.0002 Epoch: 22 Global Step: 38140 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:23:31,471-Speed 24919.95 samples/sec Loss 1.9553 LearningRate 0.0002 Epoch: 22 Global Step: 38150 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:23:41,175-Speed 25330.47 samples/sec Loss 1.9276 LearningRate 0.0002 Epoch: 22 Global Step: 38160 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:23:50,918-Speed 25228.52 samples/sec Loss 1.9509 LearningRate 0.0002 Epoch: 22 Global Step: 38170 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:24:00,719-Speed 25080.09 samples/sec Loss 1.9645 LearningRate 0.0002 Epoch: 22 Global Step: 38180 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:24:10,535-Speed 25038.83 samples/sec Loss 1.9492 LearningRate 0.0002 Epoch: 22 Global Step: 38190 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:24:20,312-Speed 25140.87 samples/sec Loss 1.9388 LearningRate 0.0002 Epoch: 22 Global Step: 38200 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:24:30,137-Speed 25019.48 samples/sec Loss 1.9251 LearningRate 0.0002 Epoch: 22 Global Step: 38210 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:24:39,819-Speed 25385.84 samples/sec Loss 1.9271 LearningRate 0.0002 Epoch: 22 Global Step: 38220 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:24:49,506-Speed 25375.22 samples/sec Loss 1.9571 LearningRate 0.0002 Epoch: 22 Global Step: 38230 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:24:59,219-Speed 25307.92 samples/sec Loss 1.9212 LearningRate 0.0002 Epoch: 22 Global Step: 38240 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:25:09,003-Speed 25122.48 samples/sec Loss 1.9594 LearningRate 0.0002 Epoch: 22 Global Step: 38250 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:25:18,824-Speed 25028.23 samples/sec Loss 1.9347 LearningRate 0.0002 Epoch: 22 Global Step: 38260 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:25:28,750-Speed 24760.73 samples/sec Loss 1.9277 LearningRate 0.0002 Epoch: 22 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:25:38,622-Speed 24900.54 samples/sec Loss 1.9601 LearningRate 0.0002 Epoch: 22 Global Step: 38280 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:25:48,358-Speed 25245.99 samples/sec Loss 1.9541 LearningRate 0.0002 Epoch: 22 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:25:58,265-Speed 24809.32 samples/sec Loss 1.9555 LearningRate 0.0002 Epoch: 22 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:26:08,039-Speed 25147.22 samples/sec Loss 1.9619 LearningRate 0.0002 Epoch: 22 Global Step: 38310 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:26:17,883-Speed 24968.61 samples/sec Loss 1.9403 LearningRate 0.0002 Epoch: 22 Global Step: 38320 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:26:27,706-Speed 25022.63 samples/sec Loss 1.9452 LearningRate 0.0002 Epoch: 22 Global Step: 38330 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:26:37,560-Speed 24945.37 samples/sec Loss 1.9478 LearningRate 0.0002 Epoch: 22 Global Step: 38340 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:26:47,244-Speed 25380.80 samples/sec Loss 1.9515 LearningRate 0.0002 Epoch: 22 Global Step: 38350 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:26:57,000-Speed 25194.62 samples/sec Loss 1.9475 LearningRate 0.0002 Epoch: 22 Global Step: 38360 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:27:06,853-Speed 24946.84 samples/sec Loss 1.9490 LearningRate 0.0002 Epoch: 22 Global Step: 38370 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:27:16,575-Speed 25283.69 samples/sec Loss 1.9645 LearningRate 0.0002 Epoch: 22 Global Step: 38380 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:27:26,327-Speed 25202.82 samples/sec Loss 1.9421 LearningRate 0.0002 Epoch: 22 Global Step: 38390 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:27:36,115-Speed 25112.56 samples/sec Loss 1.9242 LearningRate 0.0002 Epoch: 22 Global Step: 38400 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:27:45,956-Speed 24976.01 samples/sec Loss 1.9469 LearningRate 0.0002 Epoch: 22 Global Step: 38410 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:27:55,654-Speed 25343.85 samples/sec Loss 1.9405 LearningRate 0.0002 Epoch: 22 Global Step: 38420 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:28:05,530-Speed 24889.64 samples/sec Loss 1.9438 LearningRate 0.0002 Epoch: 22 Global Step: 38430 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:28:15,301-Speed 25154.34 samples/sec Loss 1.9484 LearningRate 0.0002 Epoch: 22 Global Step: 38440 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:28:25,157-Speed 24946.18 samples/sec Loss 1.9301 LearningRate 0.0002 Epoch: 22 Global Step: 38450 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:28:34,957-Speed 25081.33 samples/sec Loss 1.9278 LearningRate 0.0002 Epoch: 22 Global Step: 38460 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:28:44,729-Speed 25151.68 samples/sec Loss 1.9352 LearningRate 0.0002 Epoch: 22 Global Step: 38470 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:28:54,592-Speed 24921.38 samples/sec Loss 1.9415 LearningRate 0.0002 Epoch: 22 Global Step: 38480 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:29:04,491-Speed 24829.69 samples/sec Loss 1.9622 LearningRate 0.0002 Epoch: 22 Global Step: 38490 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:29:14,228-Speed 25242.04 samples/sec Loss 1.9460 LearningRate 0.0002 Epoch: 22 Global Step: 38500 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:29:24,038-Speed 25055.24 samples/sec Loss 1.9419 LearningRate 0.0002 Epoch: 22 Global Step: 38510 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:29:33,861-Speed 25023.53 samples/sec Loss 1.9391 LearningRate 0.0002 Epoch: 22 Global Step: 38520 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:29:43,655-Speed 25096.64 samples/sec Loss 1.9363 LearningRate 0.0002 Epoch: 22 Global Step: 38530 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:29:53,528-Speed 24893.99 samples/sec Loss 1.9588 LearningRate 0.0002 Epoch: 22 Global Step: 38540 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:30:03,298-Speed 25158.39 samples/sec Loss 1.9757 LearningRate 0.0002 Epoch: 22 Global Step: 38550 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:30:13,114-Speed 25042.68 samples/sec Loss 1.9422 LearningRate 0.0002 Epoch: 22 Global Step: 38560 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:30:22,824-Speed 25310.88 samples/sec Loss 1.9132 LearningRate 0.0002 Epoch: 22 Global Step: 38570 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:30:32,769-Speed 24716.88 samples/sec Loss 1.9326 LearningRate 0.0002 Epoch: 22 Global Step: 38580 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:30:42,517-Speed 25213.86 samples/sec Loss 1.9304 LearningRate 0.0002 Epoch: 22 Global Step: 38590 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:30:52,385-Speed 24908.29 samples/sec Loss 1.9351 LearningRate 0.0002 Epoch: 22 Global Step: 38600 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:31:02,209-Speed 25021.29 samples/sec Loss 1.9299 LearningRate 0.0002 Epoch: 22 Global Step: 38610 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:31:12,181-Speed 24650.36 samples/sec Loss 1.9531 LearningRate 0.0002 Epoch: 22 Global Step: 38620 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:31:21,902-Speed 25287.03 samples/sec Loss 1.9231 LearningRate 0.0002 Epoch: 22 Global Step: 38630 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:31:31,597-Speed 25354.82 samples/sec Loss 1.9183 LearningRate 0.0002 Epoch: 22 Global Step: 38640 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:31:41,457-Speed 24929.57 samples/sec Loss 1.9279 LearningRate 0.0002 Epoch: 22 Global Step: 38650 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:31:51,181-Speed 25279.01 samples/sec Loss 1.9393 LearningRate 0.0002 Epoch: 22 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:32:00,914-Speed 25254.41 samples/sec Loss 1.9280 LearningRate 0.0002 Epoch: 22 Global Step: 38670 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:32:10,767-Speed 24945.94 samples/sec Loss 1.9504 LearningRate 0.0002 Epoch: 22 Global Step: 38680 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:32:20,560-Speed 25097.76 samples/sec Loss 1.9381 LearningRate 0.0002 Epoch: 22 Global Step: 38690 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:32:30,429-Speed 24906.50 samples/sec Loss 1.9255 LearningRate 0.0002 Epoch: 22 Global Step: 38700 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:32:40,223-Speed 25093.97 samples/sec Loss 1.9446 LearningRate 0.0002 Epoch: 22 Global Step: 38710 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:32:50,080-Speed 24937.84 samples/sec Loss 1.9580 LearningRate 0.0002 Epoch: 22 Global Step: 38720 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:32:59,966-Speed 24860.25 samples/sec Loss 1.9499 LearningRate 0.0002 Epoch: 22 Global Step: 38730 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:33:09,758-Speed 25101.89 samples/sec Loss 1.9183 LearningRate 0.0002 Epoch: 22 Global Step: 38740 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:33:19,581-Speed 25022.53 samples/sec Loss 1.9128 LearningRate 0.0002 Epoch: 22 Global Step: 38750 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:33:29,304-Speed 25279.92 samples/sec Loss 1.9129 LearningRate 0.0002 Epoch: 22 Global Step: 38760 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:33:39,030-Speed 25272.59 samples/sec Loss 1.9185 LearningRate 0.0002 Epoch: 22 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:33:48,832-Speed 25073.83 samples/sec Loss 1.9202 LearningRate 0.0002 Epoch: 22 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:33:58,727-Speed 24839.83 samples/sec Loss 1.9282 LearningRate 0.0002 Epoch: 22 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:34:08,463-Speed 25246.78 samples/sec Loss 1.9165 LearningRate 0.0002 Epoch: 22 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:34:18,305-Speed 24975.48 samples/sec Loss 1.9270 LearningRate 0.0002 Epoch: 22 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:34:28,077-Speed 25152.68 samples/sec Loss 1.9223 LearningRate 0.0002 Epoch: 22 Global Step: 38820 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:34:37,848-Speed 25154.43 samples/sec Loss 1.9409 LearningRate 0.0002 Epoch: 22 Global Step: 38830 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:34:47,560-Speed 25309.41 samples/sec Loss 1.9218 LearningRate 0.0002 Epoch: 22 Global Step: 38840 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:34:57,275-Speed 25302.19 samples/sec Loss 1.9194 LearningRate 0.0002 Epoch: 22 Global Step: 38850 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:35:07,045-Speed 25155.67 samples/sec Loss 1.9263 LearningRate 0.0002 Epoch: 22 Global Step: 38860 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:35:16,855-Speed 25056.93 samples/sec Loss 1.9425 LearningRate 0.0002 Epoch: 22 Global Step: 38870 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:35:26,642-Speed 25114.38 samples/sec Loss 1.9268 LearningRate 0.0002 Epoch: 22 Global Step: 38880 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:35:36,458-Speed 25040.57 samples/sec Loss 1.9178 LearningRate 0.0002 Epoch: 22 Global Step: 38890 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:35:46,267-Speed 25060.04 samples/sec Loss 1.9242 LearningRate 0.0002 Epoch: 22 Global Step: 38900 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:35:56,152-Speed 24863.33 samples/sec Loss 1.9276 LearningRate 0.0002 Epoch: 22 Global Step: 38910 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:36:05,973-Speed 25026.03 samples/sec Loss 1.9243 LearningRate 0.0002 Epoch: 22 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:36:15,744-Speed 25154.94 samples/sec Loss 1.9106 LearningRate 0.0002 Epoch: 22 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:36:25,489-Speed 25224.19 samples/sec Loss 1.8922 LearningRate 0.0002 Epoch: 22 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:36:35,212-Speed 25279.30 samples/sec Loss 1.8996 LearningRate 0.0002 Epoch: 22 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:36:44,949-Speed 25243.49 samples/sec Loss 1.9050 LearningRate 0.0002 Epoch: 22 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:36:54,699-Speed 25208.39 samples/sec Loss 1.9411 LearningRate 0.0002 Epoch: 22 Global Step: 38970 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:37:04,455-Speed 25199.97 samples/sec Loss 1.9176 LearningRate 0.0002 Epoch: 22 Global Step: 38980 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:37:14,231-Speed 25142.51 samples/sec Loss 1.9389 LearningRate 0.0002 Epoch: 22 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-03-26 09:37:23,999-Speed 25162.23 samples/sec Loss 1.9154 LearningRate 0.0002 Epoch: 22 Global Step: 39000 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:37:33,728-Speed 25262.07 samples/sec Loss 1.9000 LearningRate 0.0002 Epoch: 22 Global Step: 39010 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-03-26 09:37:43,444-Speed 25296.69 samples/sec Loss 1.9079 LearningRate 0.0002 Epoch: 22 Global Step: 39020 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:37:53,373-Speed 24756.31 samples/sec Loss 1.9207 LearningRate 0.0002 Epoch: 22 Global Step: 39030 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:38:03,496-Speed 24287.23 samples/sec Loss 1.9137 LearningRate 0.0002 Epoch: 22 Global Step: 39040 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:38:13,390-Speed 24841.46 samples/sec Loss 1.9076 LearningRate 0.0002 Epoch: 22 Global Step: 39050 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:38:23,243-Speed 24943.50 samples/sec Loss 1.9339 LearningRate 0.0002 Epoch: 22 Global Step: 39060 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:38:33,104-Speed 24926.10 samples/sec Loss 1.9242 LearningRate 0.0002 Epoch: 22 Global Step: 39070 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:38:43,072-Speed 24657.05 samples/sec Loss 1.9227 LearningRate 0.0002 Epoch: 22 Global Step: 39080 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:38:52,793-Speed 25286.21 samples/sec Loss 1.9074 LearningRate 0.0002 Epoch: 22 Global Step: 39090 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:39:02,488-Speed 25350.59 samples/sec Loss 1.9009 LearningRate 0.0002 Epoch: 22 Global Step: 39100 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-26 09:39:12,397-Speed 24805.62 samples/sec Loss 1.9007 LearningRate 0.0002 Epoch: 22 Global Step: 39110 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-26 09:39:22,146-Speed 25216.96 samples/sec Loss 1.9141 LearningRate 0.0002 Epoch: 22 Global Step: 39120 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:39:31,965-Speed 25034.63 samples/sec Loss 1.9036 LearningRate 0.0002 Epoch: 22 Global Step: 39130 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:39:41,905-Speed 24727.91 samples/sec Loss 1.9003 LearningRate 0.0002 Epoch: 22 Global Step: 39140 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:39:51,696-Speed 25102.93 samples/sec Loss 1.8990 LearningRate 0.0002 Epoch: 22 Global Step: 39150 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:40:01,505-Speed 25058.72 samples/sec Loss 1.9427 LearningRate 0.0002 Epoch: 22 Global Step: 39160 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:40:11,366-Speed 24924.38 samples/sec Loss 1.9160 LearningRate 0.0002 Epoch: 22 Global Step: 39170 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:40:21,131-Speed 25172.06 samples/sec Loss 1.9182 LearningRate 0.0002 Epoch: 22 Global Step: 39180 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:40:30,904-Speed 25150.89 samples/sec Loss 1.9155 LearningRate 0.0002 Epoch: 22 Global Step: 39190 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:40:40,616-Speed 25308.32 samples/sec Loss 1.9127 LearningRate 0.0002 Epoch: 22 Global Step: 39200 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:40:50,487-Speed 24899.28 samples/sec Loss 1.9095 LearningRate 0.0002 Epoch: 22 Global Step: 39210 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:41:00,356-Speed 24912.78 samples/sec Loss 1.9244 LearningRate 0.0002 Epoch: 22 Global Step: 39220 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:41:10,282-Speed 24762.09 samples/sec Loss 1.9146 LearningRate 0.0002 Epoch: 22 Global Step: 39230 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:41:20,113-Speed 25002.73 samples/sec Loss 1.9021 LearningRate 0.0002 Epoch: 22 Global Step: 39240 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:41:29,810-Speed 25347.66 samples/sec Loss 1.8928 LearningRate 0.0002 Epoch: 22 Global Step: 39250 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:41:39,543-Speed 25253.43 samples/sec Loss 1.8772 LearningRate 0.0002 Epoch: 22 Global Step: 39260 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:41:49,274-Speed 25258.01 samples/sec Loss 1.8944 LearningRate 0.0002 Epoch: 22 Global Step: 39270 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:41:59,091-Speed 25037.87 samples/sec Loss 1.9123 LearningRate 0.0002 Epoch: 22 Global Step: 39280 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:42:08,939-Speed 24958.15 samples/sec Loss 1.9054 LearningRate 0.0002 Epoch: 22 Global Step: 39290 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:42:18,714-Speed 25146.28 samples/sec Loss 1.9287 LearningRate 0.0002 Epoch: 22 Global Step: 39300 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:42:28,422-Speed 25317.40 samples/sec Loss 1.9049 LearningRate 0.0002 Epoch: 22 Global Step: 39310 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:42:38,263-Speed 24979.24 samples/sec Loss 1.8913 LearningRate 0.0002 Epoch: 22 Global Step: 39320 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:42:47,974-Speed 25309.07 samples/sec Loss 1.8888 LearningRate 0.0002 Epoch: 22 Global Step: 39330 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:42:57,851-Speed 24885.15 samples/sec Loss 1.9054 LearningRate 0.0002 Epoch: 22 Global Step: 39340 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:43:07,582-Speed 25261.79 samples/sec Loss 1.8868 LearningRate 0.0002 Epoch: 22 Global Step: 39350 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:43:17,291-Speed 25315.93 samples/sec Loss 1.8930 LearningRate 0.0002 Epoch: 22 Global Step: 39360 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-03-26 09:43:27,028-Speed 25240.35 samples/sec Loss 1.8948 LearningRate 0.0002 Epoch: 22 Global Step: 39370 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:43:36,795-Speed 25166.66 samples/sec Loss 1.9054 LearningRate 0.0002 Epoch: 22 Global Step: 39380 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:43:46,561-Speed 25170.84 samples/sec Loss 1.8889 LearningRate 0.0002 Epoch: 22 Global Step: 39390 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:43:56,515-Speed 24698.69 samples/sec Loss 1.9078 LearningRate 0.0002 Epoch: 22 Global Step: 39400 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:44:06,214-Speed 25341.36 samples/sec Loss 1.9311 LearningRate 0.0002 Epoch: 22 Global Step: 39410 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:44:15,903-Speed 25370.88 samples/sec Loss 1.8994 LearningRate 0.0002 Epoch: 22 Global Step: 39420 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:44:25,610-Speed 25323.12 samples/sec Loss 1.8621 LearningRate 0.0002 Epoch: 22 Global Step: 39430 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:44:35,362-Speed 25204.28 samples/sec Loss 1.8831 LearningRate 0.0002 Epoch: 22 Global Step: 39440 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:44:45,093-Speed 25258.43 samples/sec Loss 1.9166 LearningRate 0.0002 Epoch: 22 Global Step: 39450 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:44:54,829-Speed 25246.06 samples/sec Loss 1.8940 LearningRate 0.0002 Epoch: 22 Global Step: 39460 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:45:04,591-Speed 25178.87 samples/sec Loss 1.9021 LearningRate 0.0002 Epoch: 22 Global Step: 39470 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-26 09:45:14,306-Speed 25300.28 samples/sec Loss 1.8932 LearningRate 0.0002 Epoch: 22 Global Step: 39480 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:45:24,181-Speed 24891.65 samples/sec Loss 1.9092 LearningRate 0.0002 Epoch: 22 Global Step: 39490 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:45:34,076-Speed 24839.00 samples/sec Loss 1.9102 LearningRate 0.0002 Epoch: 22 Global Step: 39500 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:45:43,935-Speed 24931.31 samples/sec Loss 1.9103 LearningRate 0.0002 Epoch: 22 Global Step: 39510 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:45:53,702-Speed 25166.48 samples/sec Loss 1.9368 LearningRate 0.0002 Epoch: 22 Global Step: 39520 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:46:03,494-Speed 25102.35 samples/sec Loss 1.8893 LearningRate 0.0002 Epoch: 22 Global Step: 39530 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:46:13,297-Speed 25070.76 samples/sec Loss 1.8845 LearningRate 0.0002 Epoch: 22 Global Step: 39540 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:46:22,999-Speed 25333.99 samples/sec Loss 1.8900 LearningRate 0.0002 Epoch: 22 Global Step: 39550 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:46:32,669-Speed 25419.53 samples/sec Loss 1.9028 LearningRate 0.0002 Epoch: 22 Global Step: 39560 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:46:42,387-Speed 25292.81 samples/sec Loss 1.8967 LearningRate 0.0002 Epoch: 22 Global Step: 39570 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:46:52,167-Speed 25131.04 samples/sec Loss 1.9012 LearningRate 0.0002 Epoch: 22 Global Step: 39580 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-03-26 09:47:02,034-Speed 24911.98 samples/sec Loss 1.9024 LearningRate 0.0002 Epoch: 22 Global Step: 39590 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:47:11,754-Speed 25289.26 samples/sec Loss 1.9147 LearningRate 0.0002 Epoch: 22 Global Step: 39600 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:47:21,559-Speed 25069.05 samples/sec Loss 1.9007 LearningRate 0.0002 Epoch: 22 Global Step: 39610 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:47:31,333-Speed 25146.34 samples/sec Loss 1.8860 LearningRate 0.0002 Epoch: 22 Global Step: 39620 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:47:41,186-Speed 24945.89 samples/sec Loss 1.8932 LearningRate 0.0002 Epoch: 22 Global Step: 39630 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:47:51,189-Speed 24573.52 samples/sec Loss 1.8836 LearningRate 0.0002 Epoch: 22 Global Step: 39640 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:48:01,137-Speed 24707.84 samples/sec Loss 1.8820 LearningRate 0.0002 Epoch: 22 Global Step: 39650 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:48:11,108-Speed 24648.83 samples/sec Loss 1.8959 LearningRate 0.0002 Epoch: 22 Global Step: 39660 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26 09:48:21,024-Speed 24788.62 samples/sec Loss 1.9007 LearningRate 0.0002 Epoch: 22 Global Step: 39670 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-03-26
09:48:31,006-Speed 24622.04 samples/sec Loss 1.8920 LearningRate 0.0002 Epoch: 22 Global Step: 39680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:48:40,949-Speed 24720.01 samples/sec Loss 1.8983 LearningRate 0.0002 Epoch: 22 Global Step: 39690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 09:48:50,754-Speed 25070.39 samples/sec Loss 1.9092 LearningRate 0.0002 Epoch: 22 Global Step: 39700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 09:49:00,512-Speed 25188.97 samples/sec Loss 1.9050 LearningRate 0.0002 Epoch: 22 Global Step: 39710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 09:49:10,230-Speed 25293.07 samples/sec Loss 1.9142 LearningRate 0.0002 Epoch: 22 Global Step: 39720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:49:20,120-Speed 24851.20 samples/sec Loss 1.9108 LearningRate 0.0002 Epoch: 22 Global Step: 39730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:49:30,037-Speed 24786.43 samples/sec Loss 1.9039 LearningRate 0.0002 Epoch: 22 Global Step: 39740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:49:39,954-Speed 24784.70 samples/sec Loss 1.8955 LearningRate 0.0002 Epoch: 22 Global Step: 39750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:50:39,013-Speed 4161.34 samples/sec Loss 1.8816 LearningRate 0.0002 Epoch: 23 Global Step: 39760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:50:48,909-Speed 24840.35 samples/sec Loss 1.8625 LearningRate 0.0002 Epoch: 23 Global Step: 39770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:50:58,746-Speed 24988.70 samples/sec Loss 1.8682 LearningRate 0.0002 Epoch: 23 Global Step: 39780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:51:08,513-Speed 25164.91 samples/sec Loss 1.8815 LearningRate 0.0002 Epoch: 23 Global Step: 39790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:51:18,422-Speed 24805.99 samples/sec Loss 
1.8805 LearningRate 0.0002 Epoch: 23 Global Step: 39800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:51:28,283-Speed 24926.25 samples/sec Loss 1.8912 LearningRate 0.0002 Epoch: 23 Global Step: 39810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:51:38,056-Speed 25148.61 samples/sec Loss 1.8926 LearningRate 0.0002 Epoch: 23 Global Step: 39820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:51:47,758-Speed 25335.50 samples/sec Loss 1.8639 LearningRate 0.0002 Epoch: 23 Global Step: 39830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:51:57,500-Speed 25228.72 samples/sec Loss 1.8722 LearningRate 0.0002 Epoch: 23 Global Step: 39840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:52:07,190-Speed 25365.37 samples/sec Loss 1.8694 LearningRate 0.0002 Epoch: 23 Global Step: 39850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:52:16,975-Speed 25119.89 samples/sec Loss 1.8809 LearningRate 0.0002 Epoch: 23 Global Step: 39860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:52:26,743-Speed 25163.29 samples/sec Loss 1.8600 LearningRate 0.0002 Epoch: 23 Global Step: 39870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:52:36,437-Speed 25354.21 samples/sec Loss 1.8867 LearningRate 0.0002 Epoch: 23 Global Step: 39880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:52:46,197-Speed 25184.34 samples/sec Loss 1.8658 LearningRate 0.0002 Epoch: 23 Global Step: 39890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:52:55,990-Speed 25105.36 samples/sec Loss 1.8606 LearningRate 0.0002 Epoch: 23 Global Step: 39900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:53:05,779-Speed 25110.81 samples/sec Loss 1.8570 LearningRate 0.0002 Epoch: 23 Global Step: 39910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:53:15,567-Speed 25112.37 samples/sec Loss 1.8655 LearningRate 0.0002 Epoch: 23 Global 
Step: 39920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 09:53:25,315-Speed 25216.03 samples/sec Loss 1.8715 LearningRate 0.0002 Epoch: 23 Global Step: 39930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:53:35,091-Speed 25142.44 samples/sec Loss 1.8473 LearningRate 0.0002 Epoch: 23 Global Step: 39940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:53:44,807-Speed 25297.51 samples/sec Loss 1.8782 LearningRate 0.0002 Epoch: 23 Global Step: 39950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:53:54,598-Speed 25103.81 samples/sec Loss 1.8676 LearningRate 0.0002 Epoch: 23 Global Step: 39960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:54:04,305-Speed 25323.01 samples/sec Loss 1.8768 LearningRate 0.0002 Epoch: 23 Global Step: 39970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:54:14,134-Speed 25004.72 samples/sec Loss 1.8811 LearningRate 0.0002 Epoch: 23 Global Step: 39980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:54:23,832-Speed 25345.22 samples/sec Loss 1.8821 LearningRate 0.0002 Epoch: 23 Global Step: 39990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:54:33,584-Speed 25202.57 samples/sec Loss 1.8777 LearningRate 0.0002 Epoch: 23 Global Step: 40000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:54:43,391-Speed 25064.49 samples/sec Loss 1.8845 LearningRate 0.0002 Epoch: 23 Global Step: 40010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:54:53,184-Speed 25098.09 samples/sec Loss 1.8797 LearningRate 0.0002 Epoch: 23 Global Step: 40020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:55:02,940-Speed 25196.32 samples/sec Loss 1.9045 LearningRate 0.0002 Epoch: 23 Global Step: 40030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:55:12,646-Speed 25325.42 samples/sec Loss 1.8886 LearningRate 0.0002 Epoch: 23 Global Step: 40040 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-03-26 09:55:22,320-Speed 25408.10 samples/sec Loss 1.8742 LearningRate 0.0002 Epoch: 23 Global Step: 40050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:55:32,027-Speed 25321.34 samples/sec Loss 1.8554 LearningRate 0.0002 Epoch: 23 Global Step: 40060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:55:41,830-Speed 25074.46 samples/sec Loss 1.8656 LearningRate 0.0002 Epoch: 23 Global Step: 40070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:55:51,573-Speed 25232.12 samples/sec Loss 1.8592 LearningRate 0.0002 Epoch: 23 Global Step: 40080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:56:01,391-Speed 25035.46 samples/sec Loss 1.8782 LearningRate 0.0002 Epoch: 23 Global Step: 40090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:56:11,184-Speed 25099.82 samples/sec Loss 1.8751 LearningRate 0.0002 Epoch: 23 Global Step: 40100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:56:20,903-Speed 25296.17 samples/sec Loss 1.8568 LearningRate 0.0002 Epoch: 23 Global Step: 40110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:56:30,584-Speed 25387.15 samples/sec Loss 1.8774 LearningRate 0.0002 Epoch: 23 Global Step: 40120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:56:40,320-Speed 25247.46 samples/sec Loss 1.8962 LearningRate 0.0002 Epoch: 23 Global Step: 40130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 09:56:50,105-Speed 25117.26 samples/sec Loss 1.8678 LearningRate 0.0002 Epoch: 23 Global Step: 40140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:56:59,843-Speed 25241.04 samples/sec Loss 1.8570 LearningRate 0.0002 Epoch: 23 Global Step: 40150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:57:09,573-Speed 25262.85 samples/sec Loss 1.8808 LearningRate 0.0002 Epoch: 23 Global Step: 40160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 
09:57:19,392-Speed 25030.98 samples/sec Loss 1.8806 LearningRate 0.0002 Epoch: 23 Global Step: 40170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:57:29,129-Speed 25244.17 samples/sec Loss 1.8646 LearningRate 0.0002 Epoch: 23 Global Step: 40180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:57:38,956-Speed 25013.45 samples/sec Loss 1.8667 LearningRate 0.0002 Epoch: 23 Global Step: 40190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:57:48,764-Speed 25060.93 samples/sec Loss 1.8797 LearningRate 0.0002 Epoch: 23 Global Step: 40200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:57:58,629-Speed 24912.97 samples/sec Loss 1.8742 LearningRate 0.0002 Epoch: 23 Global Step: 40210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:58:08,396-Speed 25165.85 samples/sec Loss 1.8850 LearningRate 0.0002 Epoch: 23 Global Step: 40220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:58:18,204-Speed 25061.83 samples/sec Loss 1.8613 LearningRate 0.0002 Epoch: 23 Global Step: 40230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:58:28,008-Speed 25070.92 samples/sec Loss 1.8713 LearningRate 0.0002 Epoch: 23 Global Step: 40240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:58:37,761-Speed 25199.90 samples/sec Loss 1.8741 LearningRate 0.0002 Epoch: 23 Global Step: 40250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:58:47,550-Speed 25108.28 samples/sec Loss 1.8645 LearningRate 0.0002 Epoch: 23 Global Step: 40260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:58:57,265-Speed 25301.71 samples/sec Loss 1.8647 LearningRate 0.0002 Epoch: 23 Global Step: 40270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:59:06,996-Speed 25260.56 samples/sec Loss 1.8530 LearningRate 0.0002 Epoch: 23 Global Step: 40280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:59:16,728-Speed 25256.05 samples/sec 
Loss 1.8745 LearningRate 0.0002 Epoch: 23 Global Step: 40290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:59:26,458-Speed 25262.27 samples/sec Loss 1.8890 LearningRate 0.0002 Epoch: 23 Global Step: 40300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:59:36,185-Speed 25268.52 samples/sec Loss 1.8916 LearningRate 0.0002 Epoch: 23 Global Step: 40310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:59:45,993-Speed 25059.10 samples/sec Loss 1.8656 LearningRate 0.0002 Epoch: 23 Global Step: 40320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 09:59:55,741-Speed 25215.42 samples/sec Loss 1.8499 LearningRate 0.0002 Epoch: 23 Global Step: 40330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:00:05,480-Speed 25237.35 samples/sec Loss 1.8500 LearningRate 0.0002 Epoch: 23 Global Step: 40340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:00:15,194-Speed 25304.37 samples/sec Loss 1.8708 LearningRate 0.0002 Epoch: 23 Global Step: 40350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:00:24,994-Speed 25079.79 samples/sec Loss 1.8691 LearningRate 0.0002 Epoch: 23 Global Step: 40360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:00:34,808-Speed 25044.16 samples/sec Loss 1.8623 LearningRate 0.0002 Epoch: 23 Global Step: 40370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:00:44,550-Speed 25231.56 samples/sec Loss 1.8717 LearningRate 0.0002 Epoch: 23 Global Step: 40380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:00:54,285-Speed 25246.88 samples/sec Loss 1.8640 LearningRate 0.0002 Epoch: 23 Global Step: 40390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:01:04,019-Speed 25251.84 samples/sec Loss 1.8518 LearningRate 0.0002 Epoch: 23 Global Step: 40400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:01:13,795-Speed 25143.37 samples/sec Loss 1.8624 LearningRate 0.0002 Epoch: 23 
Global Step: 40410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:01:23,562-Speed 25164.73 samples/sec Loss 1.8408 LearningRate 0.0002 Epoch: 23 Global Step: 40420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:01:33,301-Speed 25240.24 samples/sec Loss 1.8661 LearningRate 0.0002 Epoch: 23 Global Step: 40430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:01:43,033-Speed 25254.68 samples/sec Loss 1.8596 LearningRate 0.0002 Epoch: 23 Global Step: 40440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:01:52,769-Speed 25244.39 samples/sec Loss 1.8490 LearningRate 0.0002 Epoch: 23 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:02:02,523-Speed 25199.24 samples/sec Loss 1.8613 LearningRate 0.0002 Epoch: 23 Global Step: 40460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:02:12,262-Speed 25240.09 samples/sec Loss 1.8493 LearningRate 0.0002 Epoch: 23 Global Step: 40470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:02:22,010-Speed 25212.84 samples/sec Loss 1.8494 LearningRate 0.0002 Epoch: 23 Global Step: 40480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:02:31,781-Speed 25155.02 samples/sec Loss 1.8551 LearningRate 0.0002 Epoch: 23 Global Step: 40490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:02:41,566-Speed 25118.46 samples/sec Loss 1.8684 LearningRate 0.0002 Epoch: 23 Global Step: 40500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:02:51,403-Speed 24986.24 samples/sec Loss 1.8605 LearningRate 0.0002 Epoch: 23 Global Step: 40510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:03:01,165-Speed 25177.45 samples/sec Loss 1.8458 LearningRate 0.0002 Epoch: 23 Global Step: 40520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:03:10,923-Speed 25188.20 samples/sec Loss 1.8534 LearningRate 0.0002 Epoch: 23 Global Step: 40530 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-03-26 10:03:20,639-Speed 25297.43 samples/sec Loss 1.8806 LearningRate 0.0002 Epoch: 23 Global Step: 40540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:03:30,462-Speed 25022.85 samples/sec Loss 1.8554 LearningRate 0.0002 Epoch: 23 Global Step: 40550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:03:40,329-Speed 24911.87 samples/sec Loss 1.8435 LearningRate 0.0002 Epoch: 23 Global Step: 40560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:03:50,015-Speed 25374.69 samples/sec Loss 1.8644 LearningRate 0.0002 Epoch: 23 Global Step: 40570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:03:59,854-Speed 24982.00 samples/sec Loss 1.8428 LearningRate 0.0002 Epoch: 23 Global Step: 40580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:04:09,586-Speed 25258.69 samples/sec Loss 1.8763 LearningRate 0.0002 Epoch: 23 Global Step: 40590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:04:19,358-Speed 25153.30 samples/sec Loss 1.8576 LearningRate 0.0002 Epoch: 23 Global Step: 40600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:04:29,074-Speed 25296.94 samples/sec Loss 1.8506 LearningRate 0.0002 Epoch: 23 Global Step: 40610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:04:38,808-Speed 25252.84 samples/sec Loss 1.8511 LearningRate 0.0002 Epoch: 23 Global Step: 40620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:04:48,499-Speed 25361.28 samples/sec Loss 1.8663 LearningRate 0.0002 Epoch: 23 Global Step: 40630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:04:58,261-Speed 25182.49 samples/sec Loss 1.8570 LearningRate 0.0002 Epoch: 23 Global Step: 40640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:05:08,011-Speed 25211.43 samples/sec Loss 1.8682 LearningRate 0.0002 Epoch: 23 Global Step: 40650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 
10:05:17,769-Speed 25188.73 samples/sec Loss 1.8687 LearningRate 0.0002 Epoch: 23 Global Step: 40660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:05:27,514-Speed 25221.47 samples/sec Loss 1.8394 LearningRate 0.0002 Epoch: 23 Global Step: 40670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:05:37,377-Speed 24921.47 samples/sec Loss 1.8274 LearningRate 0.0002 Epoch: 23 Global Step: 40680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:05:47,115-Speed 25240.75 samples/sec Loss 1.8445 LearningRate 0.0002 Epoch: 23 Global Step: 40690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:05:56,835-Speed 25286.93 samples/sec Loss 1.8650 LearningRate 0.0002 Epoch: 23 Global Step: 40700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:06:06,587-Speed 25204.40 samples/sec Loss 1.8333 LearningRate 0.0002 Epoch: 23 Global Step: 40710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:06:16,320-Speed 25252.14 samples/sec Loss 1.8611 LearningRate 0.0002 Epoch: 23 Global Step: 40720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:06:26,079-Speed 25184.21 samples/sec Loss 1.8416 LearningRate 0.0002 Epoch: 23 Global Step: 40730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:06:35,836-Speed 25193.44 samples/sec Loss 1.8378 LearningRate 0.0002 Epoch: 23 Global Step: 40740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:06:45,731-Speed 24838.26 samples/sec Loss 1.8167 LearningRate 0.0002 Epoch: 23 Global Step: 40750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:06:55,553-Speed 25025.09 samples/sec Loss 1.8330 LearningRate 0.0002 Epoch: 23 Global Step: 40760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:07:05,346-Speed 25097.85 samples/sec Loss 1.8540 LearningRate 0.0002 Epoch: 23 Global Step: 40770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:07:15,054-Speed 25318.58 samples/sec 
Loss 1.8576 LearningRate 0.0002 Epoch: 23 Global Step: 40780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:07:24,860-Speed 25064.02 samples/sec Loss 1.8460 LearningRate 0.0002 Epoch: 23 Global Step: 40790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:07:34,692-Speed 25000.38 samples/sec Loss 1.8389 LearningRate 0.0002 Epoch: 23 Global Step: 40800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:07:44,421-Speed 25263.44 samples/sec Loss 1.8325 LearningRate 0.0002 Epoch: 23 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:07:54,134-Speed 25305.73 samples/sec Loss 1.8377 LearningRate 0.0002 Epoch: 23 Global Step: 40820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:08:03,899-Speed 25170.34 samples/sec Loss 1.8615 LearningRate 0.0002 Epoch: 23 Global Step: 40830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:08:13,633-Speed 25251.99 samples/sec Loss 1.8373 LearningRate 0.0002 Epoch: 23 Global Step: 40840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:08:23,410-Speed 25137.15 samples/sec Loss 1.8459 LearningRate 0.0002 Epoch: 23 Global Step: 40850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:08:33,399-Speed 24606.84 samples/sec Loss 1.8551 LearningRate 0.0002 Epoch: 23 Global Step: 40860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:08:43,188-Speed 25107.99 samples/sec Loss 1.8474 LearningRate 0.0002 Epoch: 23 Global Step: 40870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:08:52,911-Speed 25280.60 samples/sec Loss 1.8551 LearningRate 0.0002 Epoch: 23 Global Step: 40880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:09:02,813-Speed 24821.46 samples/sec Loss 1.8461 LearningRate 0.0002 Epoch: 23 Global Step: 40890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:09:12,650-Speed 24983.68 samples/sec Loss 1.8491 LearningRate 0.0002 Epoch: 23 
Global Step: 40900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:09:22,602-Speed 24697.61 samples/sec Loss 1.8467 LearningRate 0.0002 Epoch: 23 Global Step: 40910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:09:32,460-Speed 24933.38 samples/sec Loss 1.8326 LearningRate 0.0002 Epoch: 23 Global Step: 40920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:09:42,191-Speed 25261.63 samples/sec Loss 1.8457 LearningRate 0.0002 Epoch: 23 Global Step: 40930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:09:51,988-Speed 25087.13 samples/sec Loss 1.8468 LearningRate 0.0002 Epoch: 23 Global Step: 40940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:10:01,772-Speed 25122.46 samples/sec Loss 1.8331 LearningRate 0.0002 Epoch: 23 Global Step: 40950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:10:11,651-Speed 24879.53 samples/sec Loss 1.8201 LearningRate 0.0002 Epoch: 23 Global Step: 40960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:10:21,371-Speed 25286.78 samples/sec Loss 1.8343 LearningRate 0.0002 Epoch: 23 Global Step: 40970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:10:31,202-Speed 25001.56 samples/sec Loss 1.8451 LearningRate 0.0002 Epoch: 23 Global Step: 40980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:10:41,010-Speed 25057.95 samples/sec Loss 1.8401 LearningRate 0.0002 Epoch: 23 Global Step: 40990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:10:50,777-Speed 25173.56 samples/sec Loss 1.8308 LearningRate 0.0002 Epoch: 23 Global Step: 41000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:11:00,616-Speed 24981.17 samples/sec Loss 1.8302 LearningRate 0.0002 Epoch: 23 Global Step: 41010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:11:10,456-Speed 24977.15 samples/sec Loss 1.8266 LearningRate 0.0002 Epoch: 23 Global Step: 41020 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-03-26 10:11:20,345-Speed 24854.43 samples/sec Loss 1.8450 LearningRate 0.0002 Epoch: 23 Global Step: 41030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:11:30,345-Speed 24578.85 samples/sec Loss 1.8387 LearningRate 0.0002 Epoch: 23 Global Step: 41040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:11:40,267-Speed 24771.46 samples/sec Loss 1.8495 LearningRate 0.0002 Epoch: 23 Global Step: 41050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:11:50,000-Speed 25253.53 samples/sec Loss 1.8484 LearningRate 0.0002 Epoch: 23 Global Step: 41060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:11:59,944-Speed 24716.48 samples/sec Loss 1.8368 LearningRate 0.0002 Epoch: 23 Global Step: 41070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:12:09,657-Speed 25306.62 samples/sec Loss 1.8302 LearningRate 0.0002 Epoch: 23 Global Step: 41080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:12:19,446-Speed 25107.69 samples/sec Loss 1.8424 LearningRate 0.0002 Epoch: 23 Global Step: 41090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:12:29,298-Speed 24949.54 samples/sec Loss 1.8319 LearningRate 0.0002 Epoch: 23 Global Step: 41100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:12:39,070-Speed 25153.66 samples/sec Loss 1.8073 LearningRate 0.0002 Epoch: 23 Global Step: 41110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:12:49,011-Speed 24728.54 samples/sec Loss 1.8115 LearningRate 0.0002 Epoch: 23 Global Step: 41120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:12:58,749-Speed 25245.62 samples/sec Loss 1.8135 LearningRate 0.0002 Epoch: 23 Global Step: 41130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:13:08,452-Speed 25338.47 samples/sec Loss 1.8352 LearningRate 0.0002 Epoch: 23 Global Step: 41140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 
10:13:18,297-Speed 24966.47 samples/sec Loss 1.8246 LearningRate 0.0002 Epoch: 23 Global Step: 41150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:13:28,037-Speed 25240.14 samples/sec Loss 1.8201 LearningRate 0.0002 Epoch: 23 Global Step: 41160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:13:37,821-Speed 25121.11 samples/sec Loss 1.8238 LearningRate 0.0002 Epoch: 23 Global Step: 41170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:13:47,503-Speed 25388.65 samples/sec Loss 1.8306 LearningRate 0.0002 Epoch: 23 Global Step: 41180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:13:57,283-Speed 25136.00 samples/sec Loss 1.8294 LearningRate 0.0002 Epoch: 23 Global Step: 41190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:14:07,062-Speed 25134.70 samples/sec Loss 1.8404 LearningRate 0.0002 Epoch: 23 Global Step: 41200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:14:16,845-Speed 25125.94 samples/sec Loss 1.8390 LearningRate 0.0002 Epoch: 23 Global Step: 41210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:14:26,603-Speed 25187.27 samples/sec Loss 1.8129 LearningRate 0.0002 Epoch: 23 Global Step: 41220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:14:36,488-Speed 24872.54 samples/sec Loss 1.8208 LearningRate 0.0002 Epoch: 23 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:14:46,219-Speed 25256.93 samples/sec Loss 1.8175 LearningRate 0.0002 Epoch: 23 Global Step: 41240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:14:55,991-Speed 25154.62 samples/sec Loss 1.8315 LearningRate 0.0002 Epoch: 23 Global Step: 41250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:15:05,736-Speed 25220.44 samples/sec Loss 1.8344 LearningRate 0.0002 Epoch: 23 Global Step: 41260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:15:15,469-Speed 25252.99 samples/sec 
Loss 1.8202 LearningRate 0.0002 Epoch: 23 Global Step: 41270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:15:25,285-Speed 25040.35 samples/sec Loss 1.8211 LearningRate 0.0002 Epoch: 23 Global Step: 41280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:15:35,268-Speed 24620.58 samples/sec Loss 1.8242 LearningRate 0.0002 Epoch: 23 Global Step: 41290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:15:45,314-Speed 24467.62 samples/sec Loss 1.8269 LearningRate 0.0002 Epoch: 23 Global Step: 41300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:15:55,179-Speed 24913.28 samples/sec Loss 1.8313 LearningRate 0.0002 Epoch: 23 Global Step: 41310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:16:04,994-Speed 25043.19 samples/sec Loss 1.8472 LearningRate 0.0002 Epoch: 23 Global Step: 41320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:16:14,801-Speed 25062.04 samples/sec Loss 1.8327 LearningRate 0.0002 Epoch: 23 Global Step: 41330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:16:24,573-Speed 25153.07 samples/sec Loss 1.8351 LearningRate 0.0002 Epoch: 23 Global Step: 41340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:16:34,322-Speed 25209.86 samples/sec Loss 1.8382 LearningRate 0.0002 Epoch: 23 Global Step: 41350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:16:44,193-Speed 24901.24 samples/sec Loss 1.8323 LearningRate 0.0002 Epoch: 23 Global Step: 41360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:16:54,014-Speed 25028.07 samples/sec Loss 1.8338 LearningRate 0.0002 Epoch: 23 Global Step: 41370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:17:03,863-Speed 24957.73 samples/sec Loss 1.8335 LearningRate 0.0002 Epoch: 23 Global Step: 41380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:17:13,636-Speed 25149.01 samples/sec Loss 1.8307 LearningRate 0.0002 Epoch: 23 
Global Step: 41390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:17:23,578-Speed 24731.44 samples/sec Loss 1.8201 LearningRate 0.0002 Epoch: 23 Global Step: 41400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:17:33,426-Speed 24961.39 samples/sec Loss 1.8349 LearningRate 0.0002 Epoch: 23 Global Step: 41410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:17:43,304-Speed 24883.66 samples/sec Loss 1.8325 LearningRate 0.0002 Epoch: 23 Global Step: 41420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:17:53,138-Speed 24994.31 samples/sec Loss 1.8162 LearningRate 0.0002 Epoch: 23 Global Step: 41430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:18:02,989-Speed 24950.75 samples/sec Loss 1.8275 LearningRate 0.0002 Epoch: 23 Global Step: 41440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:18:12,802-Speed 25047.83 samples/sec Loss 1.8300 LearningRate 0.0002 Epoch: 23 Global Step: 41450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:18:22,602-Speed 25081.40 samples/sec Loss 1.8100 LearningRate 0.0002 Epoch: 23 Global Step: 41460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:18:32,357-Speed 25198.54 samples/sec Loss 1.8288 LearningRate 0.0002 Epoch: 23 Global Step: 41470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:18:42,181-Speed 25019.15 samples/sec Loss 1.8159 LearningRate 0.0002 Epoch: 23 Global Step: 41480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:19:43,135-Speed 4032.18 samples/sec Loss 1.8087 LearningRate 0.0002 Epoch: 24 Global Step: 41490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:19:52,961-Speed 25013.41 samples/sec Loss 1.8088 LearningRate 0.0002 Epoch: 24 Global Step: 41500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:20:02,829-Speed 24908.98 samples/sec Loss 1.8191 LearningRate 0.0002 Epoch: 24 Global Step: 41510 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-03-26 10:20:12,621-Speed 25099.67 samples/sec Loss 1.8117 LearningRate 0.0002 Epoch: 24 Global Step: 41520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:20:22,474-Speed 24946.74 samples/sec Loss 1.8293 LearningRate 0.0002 Epoch: 24 Global Step: 41530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:20:32,335-Speed 24927.54 samples/sec Loss 1.8380 LearningRate 0.0002 Epoch: 24 Global Step: 41540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:20:42,195-Speed 24931.17 samples/sec Loss 1.8017 LearningRate 0.0002 Epoch: 24 Global Step: 41550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:20:52,143-Speed 24709.48 samples/sec Loss 1.7987 LearningRate 0.0002 Epoch: 24 Global Step: 41560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:21:01,988-Speed 24968.77 samples/sec Loss 1.7898 LearningRate 0.0002 Epoch: 24 Global Step: 41570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:21:11,761-Speed 25152.75 samples/sec Loss 1.8086 LearningRate 0.0002 Epoch: 24 Global Step: 41580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:21:21,630-Speed 24904.85 samples/sec Loss 1.8203 LearningRate 0.0002 Epoch: 24 Global Step: 41590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:21:31,514-Speed 24868.16 samples/sec Loss 1.7947 LearningRate 0.0002 Epoch: 24 Global Step: 41600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:21:41,329-Speed 25042.97 samples/sec Loss 1.8154 LearningRate 0.0002 Epoch: 24 Global Step: 41610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:21:51,078-Speed 25211.54 samples/sec Loss 1.8022 LearningRate 0.0002 Epoch: 24 Global Step: 41620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:22:00,916-Speed 24985.29 samples/sec Loss 1.7988 LearningRate 0.0002 Epoch: 24 Global Step: 41630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 
10:22:10,792-Speed 24887.00 samples/sec Loss 1.8180 LearningRate 0.0002 Epoch: 24 Global Step: 41640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:22:20,714-Speed 24772.46 samples/sec Loss 1.8184 LearningRate 0.0002 Epoch: 24 Global Step: 41650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:22:30,476-Speed 25180.49 samples/sec Loss 1.8073 LearningRate 0.0002 Epoch: 24 Global Step: 41660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:22:40,447-Speed 24650.84 samples/sec Loss 1.8108 LearningRate 0.0002 Epoch: 24 Global Step: 41670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:22:50,363-Speed 24786.56 samples/sec Loss 1.8133 LearningRate 0.0002 Epoch: 24 Global Step: 41680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:23:00,276-Speed 24793.33 samples/sec Loss 1.8404 LearningRate 0.0002 Epoch: 24 Global Step: 41690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:23:10,109-Speed 24998.95 samples/sec Loss 1.8135 LearningRate 0.0002 Epoch: 24 Global Step: 41700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:23:19,898-Speed 25109.44 samples/sec Loss 1.7990 LearningRate 0.0002 Epoch: 24 Global Step: 41710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:23:29,825-Speed 24760.99 samples/sec Loss 1.7855 LearningRate 0.0002 Epoch: 24 Global Step: 41720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:23:39,702-Speed 24885.83 samples/sec Loss 1.7743 LearningRate 0.0002 Epoch: 24 Global Step: 41730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:23:49,542-Speed 24979.98 samples/sec Loss 1.8007 LearningRate 0.0002 Epoch: 24 Global Step: 41740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:23:59,476-Speed 24749.48 samples/sec Loss 1.7996 LearningRate 0.0002 Epoch: 24 Global Step: 41750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:24:09,324-Speed 24957.48 samples/sec 
Loss 1.8026 LearningRate 0.0002 Epoch: 24 Global Step: 41760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:24:19,119-Speed 25094.61 samples/sec Loss 1.8178 LearningRate 0.0002 Epoch: 24 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:24:28,940-Speed 25027.80 samples/sec Loss 1.8317 LearningRate 0.0002 Epoch: 24 Global Step: 41780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:24:38,917-Speed 24635.14 samples/sec Loss 1.7987 LearningRate 0.0002 Epoch: 24 Global Step: 41790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:24:48,778-Speed 24925.12 samples/sec Loss 1.8089 LearningRate 0.0002 Epoch: 24 Global Step: 41800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:24:58,534-Speed 25195.14 samples/sec Loss 1.8080 LearningRate 0.0002 Epoch: 24 Global Step: 41810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:25:08,345-Speed 25052.40 samples/sec Loss 1.8073 LearningRate 0.0002 Epoch: 24 Global Step: 41820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:25:18,169-Speed 25020.50 samples/sec Loss 1.8271 LearningRate 0.0002 Epoch: 24 Global Step: 41830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:25:28,193-Speed 24521.53 samples/sec Loss 1.8039 LearningRate 0.0002 Epoch: 24 Global Step: 41840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:25:37,931-Speed 25239.97 samples/sec Loss 1.8124 LearningRate 0.0002 Epoch: 24 Global Step: 41850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:25:47,710-Speed 25134.71 samples/sec Loss 1.8508 LearningRate 0.0002 Epoch: 24 Global Step: 41860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:25:57,486-Speed 25142.82 samples/sec Loss 1.8186 LearningRate 0.0002 Epoch: 24 Global Step: 41870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:26:07,357-Speed 24907.54 samples/sec Loss 1.8081 LearningRate 0.0002 Epoch: 24 
Global Step: 41880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:26:17,103-Speed 25217.80 samples/sec Loss 1.7998 LearningRate 0.0002 Epoch: 24 Global Step: 41890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:26:26,892-Speed 25110.44 samples/sec Loss 1.8012 LearningRate 0.0002 Epoch: 24 Global Step: 41900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:26:36,830-Speed 24732.10 samples/sec Loss 1.8106 LearningRate 0.0002 Epoch: 24 Global Step: 41910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:26:46,668-Speed 24992.11 samples/sec Loss 1.8037 LearningRate 0.0002 Epoch: 24 Global Step: 41920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:26:56,569-Speed 24826.94 samples/sec Loss 1.7841 LearningRate 0.0002 Epoch: 24 Global Step: 41930 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:27:06,345-Speed 25142.22 samples/sec Loss 1.7924 LearningRate 0.0002 Epoch: 24 Global Step: 41940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:27:16,138-Speed 25099.98 samples/sec Loss 1.7833 LearningRate 0.0002 Epoch: 24 Global Step: 41950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:27:25,920-Speed 25127.18 samples/sec Loss 1.8067 LearningRate 0.0002 Epoch: 24 Global Step: 41960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:27:35,811-Speed 24853.72 samples/sec Loss 1.7806 LearningRate 0.0002 Epoch: 24 Global Step: 41970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-26 10:27:45,684-Speed 24896.84 samples/sec Loss 1.7914 LearningRate 0.0002 Epoch: 24 Global Step: 41980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:27:55,647-Speed 24672.21 samples/sec Loss 1.8130 LearningRate 0.0002 Epoch: 24 Global Step: 41990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:28:05,434-Speed 25115.06 samples/sec Loss 1.7954 LearningRate 0.0002 Epoch: 24 Global Step: 42000 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-03-26 10:28:15,291-Speed 24933.85 samples/sec Loss 1.8092 LearningRate 0.0002 Epoch: 24 Global Step: 42010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:28:25,149-Speed 24934.81 samples/sec Loss 1.8024 LearningRate 0.0002 Epoch: 24 Global Step: 42020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:28:35,028-Speed 24881.28 samples/sec Loss 1.7928 LearningRate 0.0002 Epoch: 24 Global Step: 42030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:28:44,873-Speed 24965.24 samples/sec Loss 1.8078 LearningRate 0.0002 Epoch: 24 Global Step: 42040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:28:54,577-Speed 25331.52 samples/sec Loss 1.7798 LearningRate 0.0002 Epoch: 24 Global Step: 42050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:29:04,389-Speed 25051.10 samples/sec Loss 1.7896 LearningRate 0.0002 Epoch: 24 Global Step: 42060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:29:14,201-Speed 25052.47 samples/sec Loss 1.8013 LearningRate 0.0002 Epoch: 24 Global Step: 42070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:29:24,045-Speed 24974.44 samples/sec Loss 1.7983 LearningRate 0.0002 Epoch: 24 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:29:33,887-Speed 24976.22 samples/sec Loss 1.7928 LearningRate 0.0002 Epoch: 24 Global Step: 42090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:29:43,823-Speed 24736.44 samples/sec Loss 1.8037 LearningRate 0.0002 Epoch: 24 Global Step: 42100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:29:53,745-Speed 24774.49 samples/sec Loss 1.7958 LearningRate 0.0002 Epoch: 24 Global Step: 42110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:30:03,517-Speed 25152.30 samples/sec Loss 1.8115 LearningRate 0.0002 Epoch: 24 Global Step: 42120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 
10:30:13,291-Speed 25146.34 samples/sec Loss 1.7803 LearningRate 0.0002 Epoch: 24 Global Step: 42130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:30:23,026-Speed 25245.22 samples/sec Loss 1.8075 LearningRate 0.0002 Epoch: 24 Global Step: 42140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:30:32,784-Speed 25190.08 samples/sec Loss 1.8037 LearningRate 0.0002 Epoch: 24 Global Step: 42150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:30:42,664-Speed 24879.36 samples/sec Loss 1.8112 LearningRate 0.0002 Epoch: 24 Global Step: 42160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:30:52,398-Speed 25250.71 samples/sec Loss 1.7968 LearningRate 0.0002 Epoch: 24 Global Step: 42170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:31:02,191-Speed 25099.33 samples/sec Loss 1.7849 LearningRate 0.0002 Epoch: 24 Global Step: 42180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:31:11,937-Speed 25219.82 samples/sec Loss 1.7659 LearningRate 0.0002 Epoch: 24 Global Step: 42190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:31:21,714-Speed 25140.08 samples/sec Loss 1.7931 LearningRate 0.0002 Epoch: 24 Global Step: 42200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:31:31,529-Speed 25042.29 samples/sec Loss 1.7918 LearningRate 0.0002 Epoch: 24 Global Step: 42210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:31:41,423-Speed 24841.57 samples/sec Loss 1.7836 LearningRate 0.0002 Epoch: 24 Global Step: 42220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:31:51,235-Speed 25051.88 samples/sec Loss 1.7816 LearningRate 0.0002 Epoch: 24 Global Step: 42230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:32:01,095-Speed 24927.45 samples/sec Loss 1.8196 LearningRate 0.0002 Epoch: 24 Global Step: 42240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:32:10,917-Speed 25023.37 samples/sec 
Loss 1.7939 LearningRate 0.0002 Epoch: 24 Global Step: 42250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:32:20,691-Speed 25149.23 samples/sec Loss 1.7642 LearningRate 0.0002 Epoch: 24 Global Step: 42260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:32:30,406-Speed 25298.68 samples/sec Loss 1.7667 LearningRate 0.0002 Epoch: 24 Global Step: 42270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:32:40,245-Speed 24980.96 samples/sec Loss 1.7792 LearningRate 0.0002 Epoch: 24 Global Step: 42280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:32:50,024-Speed 25134.78 samples/sec Loss 1.7827 LearningRate 0.0002 Epoch: 24 Global Step: 42290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:32:59,744-Speed 25287.00 samples/sec Loss 1.7823 LearningRate 0.0002 Epoch: 24 Global Step: 42300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:33:09,513-Speed 25162.21 samples/sec Loss 1.7887 LearningRate 0.0002 Epoch: 24 Global Step: 42310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:33:19,321-Speed 25061.44 samples/sec Loss 1.7873 LearningRate 0.0002 Epoch: 24 Global Step: 42320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:33:29,188-Speed 24910.26 samples/sec Loss 1.7984 LearningRate 0.0002 Epoch: 24 Global Step: 42330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:33:39,069-Speed 24882.96 samples/sec Loss 1.7755 LearningRate 0.0002 Epoch: 24 Global Step: 42340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:33:48,928-Speed 24930.01 samples/sec Loss 1.7932 LearningRate 0.0002 Epoch: 24 Global Step: 42350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:33:58,710-Speed 25131.93 samples/sec Loss 1.7971 LearningRate 0.0002 Epoch: 24 Global Step: 42360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:34:08,492-Speed 25125.29 samples/sec Loss 1.7969 LearningRate 0.0002 Epoch: 24 
Global Step: 42370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:34:18,236-Speed 25226.39 samples/sec Loss 1.7779 LearningRate 0.0002 Epoch: 24 Global Step: 42380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:34:28,130-Speed 24843.64 samples/sec Loss 1.7696 LearningRate 0.0002 Epoch: 24 Global Step: 42390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:34:37,950-Speed 25036.17 samples/sec Loss 1.7617 LearningRate 0.0002 Epoch: 24 Global Step: 42400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:34:47,714-Speed 25172.08 samples/sec Loss 1.7676 LearningRate 0.0002 Epoch: 24 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-26 10:34:57,502-Speed 25118.85 samples/sec Loss 1.7839 LearningRate 0.0002 Epoch: 24 Global Step: 42420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:35:07,296-Speed 25098.63 samples/sec Loss 1.7908 LearningRate 0.0002 Epoch: 24 Global Step: 42430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:35:17,153-Speed 24933.85 samples/sec Loss 1.7740 LearningRate 0.0002 Epoch: 24 Global Step: 42440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:35:27,018-Speed 24914.70 samples/sec Loss 1.7760 LearningRate 0.0002 Epoch: 24 Global Step: 42450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:35:36,917-Speed 24829.43 samples/sec Loss 1.8010 LearningRate 0.0002 Epoch: 24 Global Step: 42460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:35:46,826-Speed 24804.63 samples/sec Loss 1.7851 LearningRate 0.0002 Epoch: 24 Global Step: 42470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:35:56,723-Speed 24836.97 samples/sec Loss 1.7980 LearningRate 0.0002 Epoch: 24 Global Step: 42480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:36:06,575-Speed 24949.39 samples/sec Loss 1.7699 LearningRate 0.0002 Epoch: 24 Global Step: 42490 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-03-26 10:36:16,373-Speed 25085.07 samples/sec Loss 1.7628 LearningRate 0.0002 Epoch: 24 Global Step: 42500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:36:26,232-Speed 24931.46 samples/sec Loss 1.7694 LearningRate 0.0002 Epoch: 24 Global Step: 42510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:36:36,020-Speed 25118.79 samples/sec Loss 1.7732 LearningRate 0.0002 Epoch: 24 Global Step: 42520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:36:45,834-Speed 25045.71 samples/sec Loss 1.7610 LearningRate 0.0002 Epoch: 24 Global Step: 42530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:36:55,729-Speed 24839.39 samples/sec Loss 1.7696 LearningRate 0.0002 Epoch: 24 Global Step: 42540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-26 10:37:05,927-Speed 24103.61 samples/sec Loss 1.7769 LearningRate 0.0002 Epoch: 24 Global Step: 42550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:37:15,951-Speed 24519.27 samples/sec Loss 1.7771 LearningRate 0.0002 Epoch: 24 Global Step: 42560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:37:25,860-Speed 24806.44 samples/sec Loss 1.7990 LearningRate 0.0002 Epoch: 24 Global Step: 42570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:37:35,870-Speed 24555.06 samples/sec Loss 1.7826 LearningRate 0.0002 Epoch: 24 Global Step: 42580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:37:45,998-Speed 24267.27 samples/sec Loss 1.7693 LearningRate 0.0002 Epoch: 24 Global Step: 42590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:37:55,945-Speed 24709.09 samples/sec Loss 1.7671 LearningRate 0.0002 Epoch: 24 Global Step: 42600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:38:05,929-Speed 24620.78 samples/sec Loss 1.7539 LearningRate 0.0002 Epoch: 24 Global Step: 42610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 
10:38:15,987-Speed 24436.95 samples/sec Loss 1.7694 LearningRate 0.0002 Epoch: 24 Global Step: 42620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:38:26,065-Speed 24387.91 samples/sec Loss 1.7660 LearningRate 0.0002 Epoch: 24 Global Step: 42630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:38:36,091-Speed 24516.44 samples/sec Loss 1.7669 LearningRate 0.0002 Epoch: 24 Global Step: 42640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:38:45,960-Speed 24906.76 samples/sec Loss 1.7758 LearningRate 0.0002 Epoch: 24 Global Step: 42650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:38:55,865-Speed 24816.09 samples/sec Loss 1.7715 LearningRate 0.0002 Epoch: 24 Global Step: 42660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:39:05,769-Speed 24819.58 samples/sec Loss 1.7848 LearningRate 0.0002 Epoch: 24 Global Step: 42670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:39:15,610-Speed 24977.35 samples/sec Loss 1.7905 LearningRate 0.0002 Epoch: 24 Global Step: 42680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:39:25,411-Speed 25078.08 samples/sec Loss 1.7757 LearningRate 0.0002 Epoch: 24 Global Step: 42690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:39:35,216-Speed 25069.98 samples/sec Loss 1.7834 LearningRate 0.0002 Epoch: 24 Global Step: 42700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:39:45,043-Speed 25012.01 samples/sec Loss 1.7866 LearningRate 0.0002 Epoch: 24 Global Step: 42710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:39:54,817-Speed 25148.94 samples/sec Loss 1.7760 LearningRate 0.0002 Epoch: 24 Global Step: 42720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:40:04,640-Speed 25021.71 samples/sec Loss 1.7899 LearningRate 0.0002 Epoch: 24 Global Step: 42730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:40:14,510-Speed 24904.29 samples/sec 
Loss 1.7645 LearningRate 0.0002 Epoch: 24 Global Step: 42740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:40:24,302-Speed 25103.04 samples/sec Loss 1.7672 LearningRate 0.0002 Epoch: 24 Global Step: 42750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:40:34,070-Speed 25164.93 samples/sec Loss 1.7691 LearningRate 0.0002 Epoch: 24 Global Step: 42760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:40:43,840-Speed 25159.60 samples/sec Loss 1.7627 LearningRate 0.0002 Epoch: 24 Global Step: 42770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:40:53,591-Speed 25205.61 samples/sec Loss 1.7534 LearningRate 0.0002 Epoch: 24 Global Step: 42780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:41:03,311-Speed 25285.98 samples/sec Loss 1.7538 LearningRate 0.0002 Epoch: 24 Global Step: 42790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:41:13,070-Speed 25188.10 samples/sec Loss 1.7639 LearningRate 0.0002 Epoch: 24 Global Step: 42800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:41:22,851-Speed 25128.95 samples/sec Loss 1.7787 LearningRate 0.0002 Epoch: 24 Global Step: 42810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:41:32,621-Speed 25157.13 samples/sec Loss 1.7719 LearningRate 0.0002 Epoch: 24 Global Step: 42820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:41:42,495-Speed 24892.25 samples/sec Loss 1.7732 LearningRate 0.0002 Epoch: 24 Global Step: 42830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:41:52,301-Speed 25066.77 samples/sec Loss 1.7732 LearningRate 0.0002 Epoch: 24 Global Step: 42840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:42:02,083-Speed 25125.66 samples/sec Loss 1.7663 LearningRate 0.0002 Epoch: 24 Global Step: 42850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:42:11,836-Speed 25203.64 samples/sec Loss 1.7710 LearningRate 0.0002 Epoch: 24 
Global Step: 42860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:42:21,517-Speed 25388.17 samples/sec Loss 1.7699 LearningRate 0.0002 Epoch: 24 Global Step: 42870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:42:31,334-Speed 25050.75 samples/sec Loss 1.7678 LearningRate 0.0002 Epoch: 24 Global Step: 42880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:42:41,177-Speed 24973.90 samples/sec Loss 1.7523 LearningRate 0.0002 Epoch: 24 Global Step: 42890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:42:50,992-Speed 25041.53 samples/sec Loss 1.7543 LearningRate 0.0002 Epoch: 24 Global Step: 42900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 10:43:00,798-Speed 25066.93 samples/sec Loss 1.7624 LearningRate 0.0002 Epoch: 24 Global Step: 42910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:43:10,725-Speed 24760.07 samples/sec Loss 1.7545 LearningRate 0.0002 Epoch: 24 Global Step: 42920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:43:20,599-Speed 24892.64 samples/sec Loss 1.7638 LearningRate 0.0002 Epoch: 24 Global Step: 42930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:43:30,471-Speed 24898.33 samples/sec Loss 1.7844 LearningRate 0.0002 Epoch: 24 Global Step: 42940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:43:40,247-Speed 25141.65 samples/sec Loss 1.7564 LearningRate 0.0002 Epoch: 24 Global Step: 42950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:43:50,018-Speed 25159.74 samples/sec Loss 1.7824 LearningRate 0.0002 Epoch: 24 Global Step: 42960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:43:59,766-Speed 25216.34 samples/sec Loss 1.7607 LearningRate 0.0002 Epoch: 24 Global Step: 42970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:44:09,572-Speed 25064.68 samples/sec Loss 1.7560 LearningRate 0.0002 Epoch: 24 Global Step: 42980 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-26 10:44:19,315-Speed 25227.75 samples/sec Loss 1.7702 LearningRate 0.0002 Epoch: 24 Global Step: 42990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:44:29,043-Speed 25265.52 samples/sec Loss 1.7707 LearningRate 0.0002 Epoch: 24 Global Step: 43000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:44:38,806-Speed 25176.93 samples/sec Loss 1.7702 LearningRate 0.0002 Epoch: 24 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 10:44:48,664-Speed 24935.55 samples/sec Loss 1.7653 LearningRate 0.0002 Epoch: 24 Global Step: 43020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:44:58,382-Speed 25291.80 samples/sec Loss 1.7659 LearningRate 0.0002 Epoch: 24 Global Step: 43030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:45:08,213-Speed 25002.06 samples/sec Loss 1.7639 LearningRate 0.0002 Epoch: 24 Global Step: 43040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:45:18,175-Speed 24673.68 samples/sec Loss 1.8043 LearningRate 0.0002 Epoch: 24 Global Step: 43050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:45:27,999-Speed 25019.76 samples/sec Loss 1.7683 LearningRate 0.0002 Epoch: 24 Global Step: 43060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:45:37,690-Speed 25362.15 samples/sec Loss 1.7623 LearningRate 0.0002 Epoch: 24 Global Step: 43070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:45:47,515-Speed 25016.83 samples/sec Loss 1.7470 LearningRate 0.0002 Epoch: 24 Global Step: 43080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:45:57,283-Speed 25162.26 samples/sec Loss 1.7452 LearningRate 0.0002 Epoch: 24 Global Step: 43090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:46:07,132-Speed 24962.27 samples/sec Loss 1.7597 LearningRate 0.0002 Epoch: 24 Global Step: 43100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 
10:46:17,071-Speed 24728.61 samples/sec Loss 1.7675 LearningRate 0.0002 Epoch: 24 Global Step: 43110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:46:26,836-Speed 25169.73 samples/sec Loss 1.7837 LearningRate 0.0002 Epoch: 24 Global Step: 43120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:46:36,662-Speed 25018.56 samples/sec Loss 1.7686 LearningRate 0.0002 Epoch: 24 Global Step: 43130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:46:46,665-Speed 24577.95 samples/sec Loss 1.7602 LearningRate 0.0002 Epoch: 24 Global Step: 43140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:46:56,409-Speed 25224.33 samples/sec Loss 1.7577 LearningRate 0.0002 Epoch: 24 Global Step: 43150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:47:06,188-Speed 25135.47 samples/sec Loss 1.7399 LearningRate 0.0002 Epoch: 24 Global Step: 43160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:47:16,043-Speed 24942.90 samples/sec Loss 1.7605 LearningRate 0.0002 Epoch: 24 Global Step: 43170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:47:25,963-Speed 24776.52 samples/sec Loss 1.7646 LearningRate 0.0002 Epoch: 24 Global Step: 43180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:47:35,799-Speed 24988.58 samples/sec Loss 1.7615 LearningRate 0.0002 Epoch: 24 Global Step: 43190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:47:45,662-Speed 24921.62 samples/sec Loss 1.7789 LearningRate 0.0002 Epoch: 24 Global Step: 43200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:48:45,049-Speed 4138.40 samples/sec Loss 1.7710 LearningRate 0.0002 Epoch: 25 Global Step: 43210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:48:55,074-Speed 24517.22 samples/sec Loss 1.7387 LearningRate 0.0002 Epoch: 25 Global Step: 43220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:49:05,226-Speed 24221.79 samples/sec Loss 
1.7415 LearningRate 0.0002 Epoch: 25 Global Step: 43230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:49:15,321-Speed 24347.98 samples/sec Loss 1.7497 LearningRate 0.0002 Epoch: 25 Global Step: 43240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:49:25,413-Speed 24354.73 samples/sec Loss 1.7403 LearningRate 0.0002 Epoch: 25 Global Step: 43250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:49:35,562-Speed 24218.19 samples/sec Loss 1.7490 LearningRate 0.0002 Epoch: 25 Global Step: 43260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:49:45,856-Speed 23877.48 samples/sec Loss 1.7445 LearningRate 0.0002 Epoch: 25 Global Step: 43270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:49:55,996-Speed 24241.35 samples/sec Loss 1.7444 LearningRate 0.0002 Epoch: 25 Global Step: 43280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:50:06,167-Speed 24165.39 samples/sec Loss 1.7416 LearningRate 0.0002 Epoch: 25 Global Step: 43290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:50:15,940-Speed 25158.57 samples/sec Loss 1.7476 LearningRate 0.0002 Epoch: 25 Global Step: 43300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:50:25,872-Speed 24749.58 samples/sec Loss 1.7364 LearningRate 0.0002 Epoch: 25 Global Step: 43310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:50:35,777-Speed 24820.44 samples/sec Loss 1.7313 LearningRate 0.0002 Epoch: 25 Global Step: 43320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:50:45,686-Speed 24806.00 samples/sec Loss 1.7480 LearningRate 0.0002 Epoch: 25 Global Step: 43330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:50:55,434-Speed 25213.34 samples/sec Loss 1.7428 LearningRate 0.0002 Epoch: 25 Global Step: 43340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:51:05,246-Speed 25049.68 samples/sec Loss 1.7430 LearningRate 0.0002 Epoch: 25 Global 
Step: 43350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:51:15,068-Speed 25026.99 samples/sec Loss 1.7398 LearningRate 0.0002 Epoch: 25 Global Step: 43360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:51:24,885-Speed 25038.72 samples/sec Loss 1.7418 LearningRate 0.0002 Epoch: 25 Global Step: 43370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:51:34,635-Speed 25208.37 samples/sec Loss 1.7255 LearningRate 0.0002 Epoch: 25 Global Step: 43380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:51:44,501-Speed 24915.18 samples/sec Loss 1.7467 LearningRate 0.0002 Epoch: 25 Global Step: 43390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:51:54,362-Speed 24923.99 samples/sec Loss 1.7589 LearningRate 0.0002 Epoch: 25 Global Step: 43400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:52:04,149-Speed 25114.49 samples/sec Loss 1.7499 LearningRate 0.0002 Epoch: 25 Global Step: 43410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:52:13,978-Speed 25008.16 samples/sec Loss 1.7594 LearningRate 0.0002 Epoch: 25 Global Step: 43420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:52:23,869-Speed 24848.31 samples/sec Loss 1.7464 LearningRate 0.0002 Epoch: 25 Global Step: 43430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:52:33,693-Speed 25019.45 samples/sec Loss 1.7341 LearningRate 0.0002 Epoch: 25 Global Step: 43440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:52:43,464-Speed 25154.42 samples/sec Loss 1.7320 LearningRate 0.0002 Epoch: 25 Global Step: 43450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:52:53,257-Speed 25099.61 samples/sec Loss 1.7315 LearningRate 0.0002 Epoch: 25 Global Step: 43460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:53:03,087-Speed 25004.98 samples/sec Loss 1.7295 LearningRate 0.0002 Epoch: 25 Global Step: 43470 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-26 10:53:13,031-Speed 24716.92 samples/sec Loss 1.7374 LearningRate 0.0002 Epoch: 25 Global Step: 43480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:53:22,842-Speed 25051.68 samples/sec Loss 1.7422 LearningRate 0.0002 Epoch: 25 Global Step: 43490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:53:32,618-Speed 25144.38 samples/sec Loss 1.7361 LearningRate 0.0002 Epoch: 25 Global Step: 43500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:53:42,434-Speed 25040.34 samples/sec Loss 1.7462 LearningRate 0.0002 Epoch: 25 Global Step: 43510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:53:52,145-Speed 25311.40 samples/sec Loss 1.7570 LearningRate 0.0002 Epoch: 25 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 10:54:01,961-Speed 25045.73 samples/sec Loss 1.7485 LearningRate 0.0002 Epoch: 25 Global Step: 43530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:54:11,826-Speed 24915.88 samples/sec Loss 1.7462 LearningRate 0.0002 Epoch: 25 Global Step: 43540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:54:21,695-Speed 24905.79 samples/sec Loss 1.7400 LearningRate 0.0002 Epoch: 25 Global Step: 43550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:54:31,620-Speed 24763.64 samples/sec Loss 1.7404 LearningRate 0.0002 Epoch: 25 Global Step: 43560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:54:41,518-Speed 24831.86 samples/sec Loss 1.7468 LearningRate 0.0002 Epoch: 25 Global Step: 43570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:54:51,309-Speed 25103.78 samples/sec Loss 1.7339 LearningRate 0.0002 Epoch: 25 Global Step: 43580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:55:01,078-Speed 25158.50 samples/sec Loss 1.7356 LearningRate 0.0002 Epoch: 25 Global Step: 43590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 
10:55:10,892-Speed 25046.21 samples/sec Loss 1.7364 LearningRate 0.0002 Epoch: 25 Global Step: 43600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:55:20,676-Speed 25120.57 samples/sec Loss 1.7379 LearningRate 0.0002 Epoch: 25 Global Step: 43610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:55:30,549-Speed 24895.93 samples/sec Loss 1.7468 LearningRate 0.0002 Epoch: 25 Global Step: 43620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:55:40,326-Speed 25142.49 samples/sec Loss 1.7335 LearningRate 0.0002 Epoch: 25 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 10:55:50,234-Speed 24806.38 samples/sec Loss 1.7291 LearningRate 0.0002 Epoch: 25 Global Step: 43640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:55:59,993-Speed 25187.74 samples/sec Loss 1.7293 LearningRate 0.0002 Epoch: 25 Global Step: 43650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:56:09,878-Speed 24863.15 samples/sec Loss 1.7439 LearningRate 0.0002 Epoch: 25 Global Step: 43660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:56:19,678-Speed 25081.04 samples/sec Loss 1.7325 LearningRate 0.0002 Epoch: 25 Global Step: 43670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:56:29,537-Speed 24931.88 samples/sec Loss 1.7374 LearningRate 0.0002 Epoch: 25 Global Step: 43680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:56:39,446-Speed 24807.95 samples/sec Loss 1.7336 LearningRate 0.0002 Epoch: 25 Global Step: 43690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:56:49,262-Speed 25037.29 samples/sec Loss 1.7436 LearningRate 0.0002 Epoch: 25 Global Step: 43700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:56:59,145-Speed 24870.77 samples/sec Loss 1.7511 LearningRate 0.0002 Epoch: 25 Global Step: 43710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:57:08,919-Speed 25147.80 samples/sec 
Loss 1.7479 LearningRate 0.0002 Epoch: 25 Global Step: 43720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:57:18,673-Speed 25199.09 samples/sec Loss 1.7311 LearningRate 0.0002 Epoch: 25 Global Step: 43730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:57:28,480-Speed 25063.79 samples/sec Loss 1.7551 LearningRate 0.0002 Epoch: 25 Global Step: 43740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:57:38,376-Speed 24840.10 samples/sec Loss 1.7392 LearningRate 0.0002 Epoch: 25 Global Step: 43750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:57:48,233-Speed 24934.75 samples/sec Loss 1.7447 LearningRate 0.0002 Epoch: 25 Global Step: 43760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:57:58,009-Speed 25141.66 samples/sec Loss 1.7322 LearningRate 0.0002 Epoch: 25 Global Step: 43770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:58:07,727-Speed 25298.71 samples/sec Loss 1.7395 LearningRate 0.0002 Epoch: 25 Global Step: 43780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:58:17,474-Speed 25218.58 samples/sec Loss 1.7322 LearningRate 0.0002 Epoch: 25 Global Step: 43790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:58:27,237-Speed 25185.26 samples/sec Loss 1.7562 LearningRate 0.0002 Epoch: 25 Global Step: 43800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:58:37,051-Speed 25045.91 samples/sec Loss 1.7332 LearningRate 0.0002 Epoch: 25 Global Step: 43810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:58:46,852-Speed 25078.67 samples/sec Loss 1.7263 LearningRate 0.0002 Epoch: 25 Global Step: 43820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:58:56,732-Speed 24879.84 samples/sec Loss 1.7247 LearningRate 0.0002 Epoch: 25 Global Step: 43830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:59:06,503-Speed 25153.63 samples/sec Loss 1.7384 LearningRate 0.0002 Epoch: 25 
Global Step: 43840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:59:16,303-Speed 25082.79 samples/sec Loss 1.7342 LearningRate 0.0002 Epoch: 25 Global Step: 43850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:59:26,140-Speed 24989.40 samples/sec Loss 1.7252 LearningRate 0.0002 Epoch: 25 Global Step: 43860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:59:36,051-Speed 24800.20 samples/sec Loss 1.7361 LearningRate 0.0002 Epoch: 25 Global Step: 43870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:59:45,951-Speed 24826.58 samples/sec Loss 1.7342 LearningRate 0.0002 Epoch: 25 Global Step: 43880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 10:59:55,727-Speed 25143.14 samples/sec Loss 1.7310 LearningRate 0.0002 Epoch: 25 Global Step: 43890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:00:05,622-Speed 24841.96 samples/sec Loss 1.7314 LearningRate 0.0002 Epoch: 25 Global Step: 43900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:00:15,486-Speed 24917.67 samples/sec Loss 1.7345 LearningRate 0.0002 Epoch: 25 Global Step: 43910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:00:25,303-Speed 25038.34 samples/sec Loss 1.7392 LearningRate 0.0002 Epoch: 25 Global Step: 43920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:00:35,067-Speed 25173.20 samples/sec Loss 1.7708 LearningRate 0.0002 Epoch: 25 Global Step: 43930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:00:44,834-Speed 25164.76 samples/sec Loss 1.7444 LearningRate 0.0002 Epoch: 25 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 11:00:54,594-Speed 25183.35 samples/sec Loss 1.7359 LearningRate 0.0002 Epoch: 25 Global Step: 43950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:01:04,408-Speed 25045.52 samples/sec Loss 1.7287 LearningRate 0.0002 Epoch: 25 Global Step: 43960 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-26 11:01:14,168-Speed 25182.78 samples/sec Loss 1.7170 LearningRate 0.0002 Epoch: 25 Global Step: 43970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:01:23,907-Speed 25245.27 samples/sec Loss 1.7138 LearningRate 0.0002 Epoch: 25 Global Step: 43980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:01:33,666-Speed 25188.21 samples/sec Loss 1.7157 LearningRate 0.0002 Epoch: 25 Global Step: 43990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:01:43,456-Speed 25106.76 samples/sec Loss 1.7356 LearningRate 0.0002 Epoch: 25 Global Step: 44000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:01:53,289-Speed 24997.08 samples/sec Loss 1.7257 LearningRate 0.0002 Epoch: 25 Global Step: 44010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:02:03,192-Speed 24819.73 samples/sec Loss 1.7250 LearningRate 0.0002 Epoch: 25 Global Step: 44020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:02:13,052-Speed 24929.04 samples/sec Loss 1.7284 LearningRate 0.0002 Epoch: 25 Global Step: 44030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:02:22,846-Speed 25094.51 samples/sec Loss 1.7154 LearningRate 0.0002 Epoch: 25 Global Step: 44040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:02:32,727-Speed 24876.26 samples/sec Loss 1.7168 LearningRate 0.0002 Epoch: 25 Global Step: 44050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:02:42,545-Speed 25036.65 samples/sec Loss 1.7129 LearningRate 0.0002 Epoch: 25 Global Step: 44060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:02:52,334-Speed 25106.41 samples/sec Loss 1.6997 LearningRate 0.0002 Epoch: 25 Global Step: 44070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:03:02,206-Speed 24906.19 samples/sec Loss 1.7075 LearningRate 0.0002 Epoch: 25 Global Step: 44080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 
11:03:12,023-Speed 25037.10 samples/sec Loss 1.7230 LearningRate 0.0002 Epoch: 25 Global Step: 44090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:03:21,775-Speed 25202.89 samples/sec Loss 1.7290 LearningRate 0.0002 Epoch: 25 Global Step: 44100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:03:31,535-Speed 25191.29 samples/sec Loss 1.7243 LearningRate 0.0002 Epoch: 25 Global Step: 44110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:03:41,433-Speed 24833.49 samples/sec Loss 1.7156 LearningRate 0.0002 Epoch: 25 Global Step: 44120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:03:51,401-Speed 24663.71 samples/sec Loss 1.7090 LearningRate 0.0002 Epoch: 25 Global Step: 44130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:04:01,313-Speed 24796.51 samples/sec Loss 1.7176 LearningRate 0.0002 Epoch: 25 Global Step: 44140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:04:11,186-Speed 24896.07 samples/sec Loss 1.7172 LearningRate 0.0002 Epoch: 25 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 11:04:21,097-Speed 24798.92 samples/sec Loss 1.7160 LearningRate 0.0002 Epoch: 25 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 11:04:30,895-Speed 25085.24 samples/sec Loss 1.7267 LearningRate 0.0002 Epoch: 25 Global Step: 44170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:04:40,794-Speed 24831.30 samples/sec Loss 1.7204 LearningRate 0.0002 Epoch: 25 Global Step: 44180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:04:50,552-Speed 25187.70 samples/sec Loss 1.7196 LearningRate 0.0002 Epoch: 25 Global Step: 44190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:05:00,292-Speed 25234.80 samples/sec Loss 1.7141 LearningRate 0.0002 Epoch: 25 Global Step: 44200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:05:10,101-Speed 25057.78 samples/sec 
Loss 1.7203 LearningRate 0.0002 Epoch: 25 Global Step: 44210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:05:19,878-Speed 25140.64 samples/sec Loss 1.7315 LearningRate 0.0002 Epoch: 25 Global Step: 44220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:05:29,772-Speed 24844.63 samples/sec Loss 1.7247 LearningRate 0.0002 Epoch: 25 Global Step: 44230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:05:39,575-Speed 25075.61 samples/sec Loss 1.7141 LearningRate 0.0002 Epoch: 25 Global Step: 44240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:05:49,488-Speed 24793.29 samples/sec Loss 1.7113 LearningRate 0.0002 Epoch: 25 Global Step: 44250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:05:59,335-Speed 24967.63 samples/sec Loss 1.7091 LearningRate 0.0002 Epoch: 25 Global Step: 44260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:06:09,234-Speed 24832.19 samples/sec Loss 1.7169 LearningRate 0.0002 Epoch: 25 Global Step: 44270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:06:19,123-Speed 24856.38 samples/sec Loss 1.7172 LearningRate 0.0002 Epoch: 25 Global Step: 44280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:06:28,867-Speed 25223.34 samples/sec Loss 1.7129 LearningRate 0.0002 Epoch: 25 Global Step: 44290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:06:38,733-Speed 24912.72 samples/sec Loss 1.7156 LearningRate 0.0002 Epoch: 25 Global Step: 44300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:06:48,644-Speed 24799.15 samples/sec Loss 1.7144 LearningRate 0.0002 Epoch: 25 Global Step: 44310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:06:58,516-Speed 24898.86 samples/sec Loss 1.7185 LearningRate 0.0002 Epoch: 25 Global Step: 44320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:07:08,373-Speed 24934.19 samples/sec Loss 1.7064 LearningRate 0.0002 Epoch: 25 
Global Step: 44330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:07:18,109-Speed 25246.97 samples/sec Loss 1.7206 LearningRate 0.0002 Epoch: 25 Global Step: 44340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:07:28,010-Speed 24825.09 samples/sec Loss 1.7189 LearningRate 0.0002 Epoch: 25 Global Step: 44350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:07:38,038-Speed 24511.16 samples/sec Loss 1.7155 LearningRate 0.0002 Epoch: 25 Global Step: 44360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:07:47,831-Speed 25101.53 samples/sec Loss 1.7014 LearningRate 0.0002 Epoch: 25 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 11:07:57,745-Speed 24791.55 samples/sec Loss 1.7131 LearningRate 0.0002 Epoch: 25 Global Step: 44380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:08:07,816-Speed 24405.19 samples/sec Loss 1.7205 LearningRate 0.0002 Epoch: 25 Global Step: 44390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:08:17,900-Speed 24372.96 samples/sec Loss 1.7108 LearningRate 0.0002 Epoch: 25 Global Step: 44400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:08:28,009-Speed 24315.13 samples/sec Loss 1.7249 LearningRate 0.0002 Epoch: 25 Global Step: 44410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:08:38,122-Speed 24303.83 samples/sec Loss 1.7232 LearningRate 0.0002 Epoch: 25 Global Step: 44420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:08:48,203-Speed 24381.36 samples/sec Loss 1.7110 LearningRate 0.0002 Epoch: 25 Global Step: 44430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:08:58,290-Speed 24369.73 samples/sec Loss 1.7059 LearningRate 0.0002 Epoch: 25 Global Step: 44440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:09:08,404-Speed 24302.30 samples/sec Loss 1.7093 LearningRate 0.0002 Epoch: 25 Global Step: 44450 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-26 11:09:18,493-Speed 24361.82 samples/sec Loss 1.6872 LearningRate 0.0002 Epoch: 25 Global Step: 44460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:09:28,597-Speed 24323.64 samples/sec Loss 1.6922 LearningRate 0.0002 Epoch: 25 Global Step: 44470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:09:38,675-Speed 24390.96 samples/sec Loss 1.7059 LearningRate 0.0002 Epoch: 25 Global Step: 44480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:09:48,806-Speed 24262.91 samples/sec Loss 1.7182 LearningRate 0.0002 Epoch: 25 Global Step: 44490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:09:58,936-Speed 24263.71 samples/sec Loss 1.6978 LearningRate 0.0002 Epoch: 25 Global Step: 44500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:10:09,039-Speed 24328.31 samples/sec Loss 1.7075 LearningRate 0.0002 Epoch: 25 Global Step: 44510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:10:19,124-Speed 24373.84 samples/sec Loss 1.7175 LearningRate 0.0002 Epoch: 25 Global Step: 44520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:10:29,215-Speed 24356.44 samples/sec Loss 1.7110 LearningRate 0.0002 Epoch: 25 Global Step: 44530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:10:39,339-Speed 24278.16 samples/sec Loss 1.7098 LearningRate 0.0002 Epoch: 25 Global Step: 44540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:10:49,489-Speed 24216.26 samples/sec Loss 1.6981 LearningRate 0.0002 Epoch: 25 Global Step: 44550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:10:59,711-Speed 24046.13 samples/sec Loss 1.7010 LearningRate 0.0002 Epoch: 25 Global Step: 44560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:11:09,812-Speed 24331.85 samples/sec Loss 1.7219 LearningRate 0.0002 Epoch: 25 Global Step: 44570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 
11:11:19,960-Speed 24220.93 samples/sec Loss 1.7057 LearningRate 0.0002 Epoch: 25 Global Step: 44580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 11:11:30,064-Speed 24331.09 samples/sec Loss 1.6970 LearningRate 0.0002 Epoch: 25 Global Step: 44590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:11:40,190-Speed 24274.89 samples/sec Loss 1.7073 LearningRate 0.0002 Epoch: 25 Global Step: 44600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:11:50,107-Speed 24784.99 samples/sec Loss 1.7033 LearningRate 0.0002 Epoch: 25 Global Step: 44610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:11:59,916-Speed 25059.66 samples/sec Loss 1.7029 LearningRate 0.0002 Epoch: 25 Global Step: 44620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:12:09,682-Speed 25168.62 samples/sec Loss 1.7033 LearningRate 0.0002 Epoch: 25 Global Step: 44630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:12:19,427-Speed 25221.55 samples/sec Loss 1.6998 LearningRate 0.0002 Epoch: 25 Global Step: 44640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:12:29,227-Speed 25078.61 samples/sec Loss 1.7030 LearningRate 0.0002 Epoch: 25 Global Step: 44650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:12:38,967-Speed 25237.28 samples/sec Loss 1.7107 LearningRate 0.0002 Epoch: 25 Global Step: 44660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:12:48,879-Speed 24798.17 samples/sec Loss 1.7109 LearningRate 0.0002 Epoch: 25 Global Step: 44670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:12:58,686-Speed 25061.40 samples/sec Loss 1.7034 LearningRate 0.0002 Epoch: 25 Global Step: 44680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:13:08,494-Speed 25060.88 samples/sec Loss 1.6905 LearningRate 0.0002 Epoch: 25 Global Step: 44690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:13:18,296-Speed 25074.99 samples/sec 
Loss 1.6956 LearningRate 0.0002 Epoch: 25 Global Step: 44700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:13:28,055-Speed 25186.50 samples/sec Loss 1.6844 LearningRate 0.0002 Epoch: 25 Global Step: 44710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:13:37,813-Speed 25187.73 samples/sec Loss 1.6969 LearningRate 0.0002 Epoch: 25 Global Step: 44720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:13:47,615-Speed 25077.75 samples/sec Loss 1.6916 LearningRate 0.0002 Epoch: 25 Global Step: 44730 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:13:57,365-Speed 25207.47 samples/sec Loss 1.6875 LearningRate 0.0002 Epoch: 25 Global Step: 44740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:14:07,205-Speed 24980.02 samples/sec Loss 1.6968 LearningRate 0.0002 Epoch: 25 Global Step: 44750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:14:17,042-Speed 24987.72 samples/sec Loss 1.6937 LearningRate 0.0002 Epoch: 25 Global Step: 44760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:14:26,811-Speed 25160.55 samples/sec Loss 1.6970 LearningRate 0.0002 Epoch: 25 Global Step: 44770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:14:36,546-Speed 25250.51 samples/sec Loss 1.6948 LearningRate 0.0002 Epoch: 25 Global Step: 44780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:14:46,374-Speed 25011.35 samples/sec Loss 1.6799 LearningRate 0.0002 Epoch: 25 Global Step: 44790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:14:56,129-Speed 25195.44 samples/sec Loss 1.6898 LearningRate 0.0002 Epoch: 25 Global Step: 44800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:15:05,898-Speed 25161.32 samples/sec Loss 1.7023 LearningRate 0.0002 Epoch: 25 Global Step: 44810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:15:15,650-Speed 25204.32 samples/sec Loss 1.6946 LearningRate 0.0002 Epoch: 25 
Global Step: 44820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:15:25,468-Speed 25039.23 samples/sec Loss 1.6984 LearningRate 0.0002 Epoch: 25 Global Step: 44830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:15:35,226-Speed 25196.17 samples/sec Loss 1.7074 LearningRate 0.0002 Epoch: 25 Global Step: 44840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:15:45,002-Speed 25142.02 samples/sec Loss 1.6890 LearningRate 0.0002 Epoch: 25 Global Step: 44850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:15:54,833-Speed 25001.31 samples/sec Loss 1.6842 LearningRate 0.0002 Epoch: 25 Global Step: 44860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:16:04,586-Speed 25202.74 samples/sec Loss 1.6940 LearningRate 0.0002 Epoch: 25 Global Step: 44870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:16:14,315-Speed 25263.08 samples/sec Loss 1.6948 LearningRate 0.0002 Epoch: 25 Global Step: 44880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:16:24,041-Speed 25272.75 samples/sec Loss 1.6996 LearningRate 0.0002 Epoch: 25 Global Step: 44890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:16:33,715-Speed 25408.56 samples/sec Loss 1.7020 LearningRate 0.0002 Epoch: 25 Global Step: 44900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:16:43,499-Speed 25122.38 samples/sec Loss 1.7183 LearningRate 0.0002 Epoch: 25 Global Step: 44910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:16:53,366-Speed 24910.03 samples/sec Loss 1.7138 LearningRate 0.0002 Epoch: 25 Global Step: 44920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:17:03,072-Speed 25331.48 samples/sec Loss 1.6925 LearningRate 0.0002 Epoch: 25 Global Step: 44930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:18:03,268-Speed 4082.79 samples/sec Loss 1.6847 LearningRate 0.0002 Epoch: 26 Global Step: 44940 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-26 11:18:13,051-Speed 25124.77 samples/sec Loss 1.6737 LearningRate 0.0002 Epoch: 26 Global Step: 44950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:18:22,749-Speed 25343.41 samples/sec Loss 1.6880 LearningRate 0.0002 Epoch: 26 Global Step: 44960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:18:32,469-Speed 25287.61 samples/sec Loss 1.6894 LearningRate 0.0002 Epoch: 26 Global Step: 44970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:18:42,246-Speed 25141.26 samples/sec Loss 1.6993 LearningRate 0.0002 Epoch: 26 Global Step: 44980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:18:51,923-Speed 25400.56 samples/sec Loss 1.6904 LearningRate 0.0002 Epoch: 26 Global Step: 44990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:19:01,652-Speed 25262.47 samples/sec Loss 1.6859 LearningRate 0.0002 Epoch: 26 Global Step: 45000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:19:11,381-Speed 25266.68 samples/sec Loss 1.6863 LearningRate 0.0002 Epoch: 26 Global Step: 45010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:19:21,191-Speed 25053.30 samples/sec Loss 1.6910 LearningRate 0.0002 Epoch: 26 Global Step: 45020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:19:31,068-Speed 24885.78 samples/sec Loss 1.7008 LearningRate 0.0001 Epoch: 26 Global Step: 45030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:19:40,889-Speed 25024.59 samples/sec Loss 1.6758 LearningRate 0.0001 Epoch: 26 Global Step: 45040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:19:50,721-Speed 24999.36 samples/sec Loss 1.6794 LearningRate 0.0001 Epoch: 26 Global Step: 45050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:20:00,574-Speed 24949.63 samples/sec Loss 1.6895 LearningRate 0.0001 Epoch: 26 Global Step: 45060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 
11:20:10,293-Speed 25290.35 samples/sec Loss 1.6738 LearningRate 0.0001 Epoch: 26 Global Step: 45070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:20:20,027-Speed 25251.55 samples/sec Loss 1.6837 LearningRate 0.0001 Epoch: 26 Global Step: 45080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:20:29,792-Speed 25170.41 samples/sec Loss 1.6908 LearningRate 0.0001 Epoch: 26 Global Step: 45090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:20:39,528-Speed 25247.11 samples/sec Loss 1.6799 LearningRate 0.0001 Epoch: 26 Global Step: 45100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-26 11:20:49,319-Speed 25105.09 samples/sec Loss 1.6691 LearningRate 0.0001 Epoch: 26 Global Step: 45110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:20:59,053-Speed 25248.88 samples/sec Loss 1.6888 LearningRate 0.0001 Epoch: 26 Global Step: 45120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:21:08,804-Speed 25206.34 samples/sec Loss 1.6888 LearningRate 0.0001 Epoch: 26 Global Step: 45130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:21:18,504-Speed 25339.68 samples/sec Loss 1.6843 LearningRate 0.0001 Epoch: 26 Global Step: 45140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:21:28,246-Speed 25229.98 samples/sec Loss 1.6822 LearningRate 0.0001 Epoch: 26 Global Step: 45150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:21:37,981-Speed 25246.88 samples/sec Loss 1.6833 LearningRate 0.0001 Epoch: 26 Global Step: 45160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:21:47,708-Speed 25266.61 samples/sec Loss 1.6959 LearningRate 0.0001 Epoch: 26 Global Step: 45170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:21:57,426-Speed 25299.49 samples/sec Loss 1.6967 LearningRate 0.0001 Epoch: 26 Global Step: 45180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:22:07,161-Speed 25249.05 samples/sec 
Loss 1.6975 LearningRate 0.0001 Epoch: 26 Global Step: 45190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:22:16,983-Speed 25024.49 samples/sec Loss 1.6921 LearningRate 0.0001 Epoch: 26 Global Step: 45200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:22:26,744-Speed 25181.93 samples/sec Loss 1.6585 LearningRate 0.0001 Epoch: 26 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-26 11:22:36,439-Speed 25350.87 samples/sec Loss 1.6578 LearningRate 0.0001 Epoch: 26 Global Step: 45220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:22:46,228-Speed 25107.47 samples/sec Loss 1.6826 LearningRate 0.0001 Epoch: 26 Global Step: 45230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:22:55,927-Speed 25341.03 samples/sec Loss 1.6725 LearningRate 0.0001 Epoch: 26 Global Step: 45240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:23:05,631-Speed 25328.79 samples/sec Loss 1.6711 LearningRate 0.0001 Epoch: 26 Global Step: 45250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:23:15,328-Speed 25348.16 samples/sec Loss 1.6821 LearningRate 0.0001 Epoch: 26 Global Step: 45260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:23:25,035-Speed 25318.75 samples/sec Loss 1.6921 LearningRate 0.0001 Epoch: 26 Global Step: 45270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:23:34,733-Speed 25343.97 samples/sec Loss 1.6941 LearningRate 0.0001 Epoch: 26 Global Step: 45280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:23:44,575-Speed 24975.44 samples/sec Loss 1.6774 LearningRate 0.0001 Epoch: 26 Global Step: 45290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:23:54,333-Speed 25187.35 samples/sec Loss 1.6912 LearningRate 0.0001 Epoch: 26 Global Step: 45300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:24:04,034-Speed 25336.30 samples/sec Loss 1.6938 LearningRate 0.0001 Epoch: 26 
Global Step: 45310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:24:13,805-Speed 25155.29 samples/sec Loss 1.6889 LearningRate 0.0001 Epoch: 26 Global Step: 45320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:24:23,524-Speed 25288.62 samples/sec Loss 1.6818 LearningRate 0.0001 Epoch: 26 Global Step: 45330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:24:33,295-Speed 25153.15 samples/sec Loss 1.6850 LearningRate 0.0001 Epoch: 26 Global Step: 45340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:24:43,051-Speed 25193.36 samples/sec Loss 1.6688 LearningRate 0.0001 Epoch: 26 Global Step: 45350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:24:52,868-Speed 25039.02 samples/sec Loss 1.6820 LearningRate 0.0001 Epoch: 26 Global Step: 45360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:25:02,608-Speed 25234.16 samples/sec Loss 1.6829 LearningRate 0.0001 Epoch: 26 Global Step: 45370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:25:12,355-Speed 25216.43 samples/sec Loss 1.6759 LearningRate 0.0001 Epoch: 26 Global Step: 45380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:25:22,053-Speed 25343.78 samples/sec Loss 1.6786 LearningRate 0.0001 Epoch: 26 Global Step: 45390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:25:31,734-Speed 25387.98 samples/sec Loss 1.6681 LearningRate 0.0001 Epoch: 26 Global Step: 45400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:25:41,452-Speed 25290.64 samples/sec Loss 1.6965 LearningRate 0.0001 Epoch: 26 Global Step: 45410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:25:51,168-Speed 25299.19 samples/sec Loss 1.6729 LearningRate 0.0001 Epoch: 26 Global Step: 45420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-26 11:26:00,885-Speed 25293.62 samples/sec Loss 1.6782 LearningRate 0.0001 Epoch: 26 Global Step: 45430 Fp16 Grad Scale: 32768 
Required: 7 hours
Training: 2022-03-26 11:26:10,686-Speed 25078.33 samples/sec Loss 1.6820 LearningRate 0.0001 Epoch: 26 Global Step: 45440 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:26:20,410-Speed 25276.30 samples/sec Loss 1.6791 LearningRate 0.0001 Epoch: 26 Global Step: 45450 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:26:30,281-Speed 24902.61 samples/sec Loss 1.6851 LearningRate 0.0001 Epoch: 26 Global Step: 45460 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:26:39,942-Speed 25439.99 samples/sec Loss 1.6792 LearningRate 0.0001 Epoch: 26 Global Step: 45470 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:26:49,688-Speed 25218.44 samples/sec Loss 1.6777 LearningRate 0.0001 Epoch: 26 Global Step: 45480 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:26:59,397-Speed 25316.26 samples/sec Loss 1.6789 LearningRate 0.0001 Epoch: 26 Global Step: 45490 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:27:09,122-Speed 25272.21 samples/sec Loss 1.6655 LearningRate 0.0001 Epoch: 26 Global Step: 45500 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:27:18,893-Speed 25155.67 samples/sec Loss 1.6717 LearningRate 0.0001 Epoch: 26 Global Step: 45510 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:27:28,690-Speed 25087.96 samples/sec Loss 1.6756 LearningRate 0.0001 Epoch: 26 Global Step: 45520 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:27:38,484-Speed 25095.46 samples/sec Loss 1.6727 LearningRate 0.0001 Epoch: 26 Global Step: 45530 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:27:48,476-Speed 24597.75 samples/sec Loss 1.6674 LearningRate 0.0001 Epoch: 26 Global Step: 45540 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:27:58,527-Speed 24455.78 samples/sec Loss 1.6683 LearningRate 0.0001 Epoch: 26 Global Step: 45550 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:28:08,564-Speed 24487.17 samples/sec Loss 1.6612 LearningRate 0.0001 Epoch: 26 Global Step: 45560 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:28:18,587-Speed 24522.39 samples/sec Loss 1.6666 LearningRate 0.0001 Epoch: 26 Global Step: 45570 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:28:28,573-Speed 24613.14 samples/sec Loss 1.6689 LearningRate 0.0001 Epoch: 26 Global Step: 45580 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:28:38,638-Speed 24419.11 samples/sec Loss 1.6763 LearningRate 0.0001 Epoch: 26 Global Step: 45590 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:28:48,704-Speed 24417.34 samples/sec Loss 1.6827 LearningRate 0.0001 Epoch: 26 Global Step: 45600 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:28:58,711-Speed 24562.71 samples/sec Loss 1.6732 LearningRate 0.0001 Epoch: 26 Global Step: 45610 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:29:08,766-Speed 24443.30 samples/sec Loss 1.6852 LearningRate 0.0001 Epoch: 26 Global Step: 45620 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-03-26 11:29:18,747-Speed 24624.42 samples/sec Loss 1.6661 LearningRate 0.0001 Epoch: 26 Global Step: 45630 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:29:28,799-Speed 24452.82 samples/sec Loss 1.6700 LearningRate 0.0001 Epoch: 26 Global Step: 45640 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:29:38,855-Speed 24441.25 samples/sec Loss 1.6678 LearningRate 0.0001 Epoch: 26 Global Step: 45650 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:29:48,817-Speed 24670.87 samples/sec Loss 1.6582 LearningRate 0.0001 Epoch: 26 Global Step: 45660 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:29:58,850-Speed 24497.90 samples/sec Loss 1.6739 LearningRate 0.0001 Epoch: 26 Global Step: 45670 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:30:08,879-Speed 24507.25 samples/sec Loss 1.6606 LearningRate 0.0001 Epoch: 26 Global Step: 45680 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:30:18,918-Speed 24484.24 samples/sec Loss 1.6559 LearningRate 0.0001 Epoch: 26 Global Step: 45690 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:30:28,887-Speed 24654.61 samples/sec Loss 1.6685 LearningRate 0.0001 Epoch: 26 Global Step: 45700 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:30:38,871-Speed 24617.60 samples/sec Loss 1.6639 LearningRate 0.0001 Epoch: 26 Global Step: 45710 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:30:48,890-Speed 24529.94 samples/sec Loss 1.6717 LearningRate 0.0001 Epoch: 26 Global Step: 45720 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:30:58,859-Speed 24655.55 samples/sec Loss 1.6609 LearningRate 0.0001 Epoch: 26 Global Step: 45730 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:31:08,864-Speed 24565.14 samples/sec Loss 1.6712 LearningRate 0.0001 Epoch: 26 Global Step: 45740 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:31:18,833-Speed 24654.98 samples/sec Loss 1.6635 LearningRate 0.0001 Epoch: 26 Global Step: 45750 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:31:28,830-Speed 24586.50 samples/sec Loss 1.6671 LearningRate 0.0001 Epoch: 26 Global Step: 45760 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:31:38,881-Speed 24460.13 samples/sec Loss 1.6704 LearningRate 0.0001 Epoch: 26 Global Step: 45770 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:31:48,861-Speed 24627.37 samples/sec Loss 1.6606 LearningRate 0.0001 Epoch: 26 Global Step: 45780 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:31:58,837-Speed 24636.58 samples/sec Loss 1.6656 LearningRate 0.0001 Epoch: 26 Global Step: 45790 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:32:08,879-Speed 24475.32 samples/sec Loss 1.6651 LearningRate 0.0001 Epoch: 26 Global Step: 45800 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:32:18,992-Speed 24305.08 samples/sec Loss 1.6505 LearningRate 0.0001 Epoch: 26 Global Step: 45810 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:32:29,039-Speed 24465.13 samples/sec Loss 1.6567 LearningRate 0.0001 Epoch: 26 Global Step: 45820 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:32:39,028-Speed 24606.28 samples/sec Loss 1.6673 LearningRate 0.0001 Epoch: 26 Global Step: 45830 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:32:49,131-Speed 24331.63 samples/sec Loss 1.6728 LearningRate 0.0001 Epoch: 26 Global Step: 45840 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:32:59,123-Speed 24604.24 samples/sec Loss 1.6493 LearningRate 0.0001 Epoch: 26 Global Step: 45850 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:33:08,934-Speed 25053.24 samples/sec Loss 1.6548 LearningRate 0.0001 Epoch: 26 Global Step: 45860 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:33:18,654-Speed 25288.00 samples/sec Loss 1.6662 LearningRate 0.0001 Epoch: 26 Global Step: 45870 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:33:28,392-Speed 25239.45 samples/sec Loss 1.6586 LearningRate 0.0001 Epoch: 26 Global Step: 45880 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:33:38,075-Speed 25384.67 samples/sec Loss 1.6646 LearningRate 0.0001 Epoch: 26 Global Step: 45890 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:33:47,841-Speed 25169.80 samples/sec Loss 1.6657 LearningRate 0.0001 Epoch: 26 Global Step: 45900 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:33:57,598-Speed 25191.74 samples/sec Loss 1.6729 LearningRate 0.0001 Epoch: 26 Global Step: 45910 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:34:07,415-Speed 25037.52 samples/sec Loss 1.6682 LearningRate 0.0001 Epoch: 26 Global Step: 45920 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:34:17,187-Speed 25152.93 samples/sec Loss 1.6727 LearningRate 0.0001 Epoch: 26 Global Step: 45930 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-03-26 11:34:27,002-Speed 25042.92 samples/sec Loss 1.6502 LearningRate 0.0001 Epoch: 26 Global Step: 45940 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:34:36,780-Speed 25137.57 samples/sec Loss 1.6642 LearningRate 0.0001 Epoch: 26 Global Step: 45950 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:34:46,719-Speed 24729.93 samples/sec Loss 1.6589 LearningRate 0.0001 Epoch: 26 Global Step: 45960 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:34:56,917-Speed 24102.90 samples/sec Loss 1.6577 LearningRate 0.0001 Epoch: 26 Global Step: 45970 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:35:06,819-Speed 24823.00 samples/sec Loss 1.6622 LearningRate 0.0001 Epoch: 26 Global Step: 45980 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:35:16,720-Speed 24824.85 samples/sec Loss 1.6406 LearningRate 0.0001 Epoch: 26 Global Step: 45990 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:35:26,713-Speed 24595.32 samples/sec Loss 1.6548 LearningRate 0.0001 Epoch: 26 Global Step: 46000 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:35:36,745-Speed 24502.94 samples/sec Loss 1.6588 LearningRate 0.0001 Epoch: 26 Global Step: 46010 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:35:46,658-Speed 24793.17 samples/sec Loss 1.6578 LearningRate 0.0001 Epoch: 26 Global Step: 46020 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:35:56,605-Speed 24712.47 samples/sec Loss 1.6690 LearningRate 0.0001 Epoch: 26 Global Step: 46030 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:36:06,615-Speed 24555.12 samples/sec Loss 1.6490 LearningRate 0.0001 Epoch: 26 Global Step: 46040 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-03-26 11:36:16,594-Speed 24629.13 samples/sec Loss 1.6601 LearningRate 0.0001 Epoch: 26 Global Step: 46050 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-03-26 11:36:26,495-Speed 24827.02 samples/sec Loss 1.6671 LearningRate 0.0001 Epoch: 26 Global Step: 46060 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-03-26 11:36:36,374-Speed 24882.09 samples/sec Loss 1.6491 LearningRate 0.0001 Epoch: 26 Global Step: 46070 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-03-26 11:36:46,357-Speed 24626.09 samples/sec Loss 1.6465 LearningRate 0.0001 Epoch: 26 Global Step: 46080 Fp16 Grad Scale: 16384 Required: 7 hours
Training: 2022-03-26 11:36:56,347-Speed 24604.31 samples/sec Loss 1.6625 LearningRate 0.0001 Epoch: 26 Global Step: 46090 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:37:06,272-Speed 24764.77 samples/sec Loss 1.6570 LearningRate 0.0001 Epoch: 26 Global Step: 46100 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:37:16,190-Speed 24782.53 samples/sec Loss 1.6464 LearningRate 0.0001 Epoch: 26 Global Step: 46110 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:37:26,163-Speed 24645.75 samples/sec Loss 1.6462 LearningRate 0.0001 Epoch: 26 Global Step: 46120 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:37:36,125-Speed 24671.91 samples/sec Loss 1.6526 LearningRate 0.0001 Epoch: 26 Global Step: 46130 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:37:46,076-Speed 24702.00 samples/sec Loss 1.6496 LearningRate 0.0001 Epoch: 26 Global Step: 46140 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:37:56,019-Speed 24719.39 samples/sec Loss 1.6567 LearningRate 0.0001 Epoch: 26 Global Step: 46150 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:38:05,897-Speed 24882.59 samples/sec Loss 1.6555 LearningRate 0.0001 Epoch: 26 Global Step: 46160 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:38:15,887-Speed 24602.23 samples/sec Loss 1.6552 LearningRate 0.0001 Epoch: 26 Global Step: 46170 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:38:25,764-Speed 24884.78 samples/sec Loss 1.6425 LearningRate 0.0001 Epoch: 26 Global Step: 46180 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:38:35,640-Speed 24888.76 samples/sec Loss 1.6439 LearningRate 0.0001 Epoch: 26 Global Step: 46190 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:38:45,541-Speed 24825.39 samples/sec Loss 1.6604 LearningRate 0.0001 Epoch: 26 Global Step: 46200 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:38:55,608-Speed 24416.42 samples/sec Loss 1.6685 LearningRate 0.0001 Epoch: 26 Global Step: 46210 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:39:05,537-Speed 24754.09 samples/sec Loss 1.6423 LearningRate 0.0001 Epoch: 26 Global Step: 46220 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:39:15,500-Speed 24668.82 samples/sec Loss 1.6413 LearningRate 0.0001 Epoch: 26 Global Step: 46230 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:39:25,521-Speed 24529.83 samples/sec Loss 1.6553 LearningRate 0.0001 Epoch: 26 Global Step: 46240 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:39:35,386-Speed 24914.40 samples/sec Loss 1.6470 LearningRate 0.0001 Epoch: 26 Global Step: 46250 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:39:45,331-Speed 24715.82 samples/sec Loss 1.6613 LearningRate 0.0001 Epoch: 26 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-26 11:39:55,273-Speed 24722.80 samples/sec Loss 1.6483 LearningRate 0.0001 Epoch: 26 Global Step: 46270 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:40:05,283-Speed 24555.64 samples/sec Loss 1.6534 LearningRate 0.0001 Epoch: 26 Global Step: 46280 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:40:15,176-Speed 24845.00 samples/sec Loss 1.6445 LearningRate 0.0001 Epoch: 26 Global Step: 46290 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:40:25,041-Speed 24916.03 samples/sec Loss 1.6504 LearningRate 0.0001 Epoch: 26 Global Step: 46300 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:40:35,096-Speed 24444.91 samples/sec Loss 1.6529 LearningRate 0.0001 Epoch: 26 Global Step: 46310 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:40:44,980-Speed 24869.00 samples/sec Loss 1.6403 LearningRate 0.0001 Epoch: 26 Global Step: 46320 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:40:54,979-Speed 24582.59 samples/sec Loss 1.6453 LearningRate 0.0001 Epoch: 26 Global Step: 46330 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:41:04,856-Speed 24884.14 samples/sec Loss 1.6447 LearningRate 0.0001 Epoch: 26 Global Step: 46340 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:41:14,732-Speed 24888.30 samples/sec Loss 1.6401 LearningRate 0.0001 Epoch: 26 Global Step: 46350 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:41:24,654-Speed 24770.33 samples/sec Loss 1.6419 LearningRate 0.0001 Epoch: 26 Global Step: 46360 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:41:34,541-Speed 24860.98 samples/sec Loss 1.6314 LearningRate 0.0001 Epoch: 26 Global Step: 46370 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:41:44,459-Speed 24783.99 samples/sec Loss 1.6532 LearningRate 0.0001 Epoch: 26 Global Step: 46380 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:41:54,442-Speed 24619.81 samples/sec Loss 1.6425 LearningRate 0.0001 Epoch: 26 Global Step: 46390 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:42:04,419-Speed 24635.88 samples/sec Loss 1.6590 LearningRate 0.0001 Epoch: 26 Global Step: 46400 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:42:14,363-Speed 24718.03 samples/sec Loss 1.6391 LearningRate 0.0001 Epoch: 26 Global Step: 46410 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:42:24,280-Speed 24784.77 samples/sec Loss 1.6376 LearningRate 0.0001 Epoch: 26 Global Step: 46420 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:42:34,200-Speed 24777.08 samples/sec Loss 1.6395 LearningRate 0.0001 Epoch: 26 Global Step: 46430 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:42:44,153-Speed 24693.47 samples/sec Loss 1.6514 LearningRate 0.0001 Epoch: 26 Global Step: 46440 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:42:54,063-Speed 24802.58 samples/sec Loss 1.6383 LearningRate 0.0001 Epoch: 26 Global Step: 46450 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:43:03,994-Speed 24754.99 samples/sec Loss 1.6485 LearningRate 0.0001 Epoch: 26 Global Step: 46460 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:43:14,019-Speed 24519.13 samples/sec Loss 1.6515 LearningRate 0.0001 Epoch: 26 Global Step: 46470 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:43:23,963-Speed 24718.20 samples/sec Loss 1.6493 LearningRate 0.0001 Epoch: 26 Global Step: 46480 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:43:34,148-Speed 24131.98 samples/sec Loss 1.6437 LearningRate 0.0001 Epoch: 26 Global Step: 46490 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:43:43,954-Speed 25065.33 samples/sec Loss 1.6405 LearningRate 0.0001 Epoch: 26 Global Step: 46500 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:43:53,692-Speed 25240.35 samples/sec Loss 1.6482 LearningRate 0.0001 Epoch: 26 Global Step: 46510 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:44:03,505-Speed 25047.32 samples/sec Loss 1.6383 LearningRate 0.0001 Epoch: 26 Global Step: 46520 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:44:13,210-Speed 25326.99 samples/sec Loss 1.6352 LearningRate 0.0001 Epoch: 26 Global Step: 46530 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:44:22,961-Speed 25206.34 samples/sec Loss 1.6328 LearningRate 0.0001 Epoch: 26 Global Step: 46540 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:44:32,729-Speed 25164.03 samples/sec Loss 1.6320 LearningRate 0.0001 Epoch: 26 Global Step: 46550 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:44:42,464-Speed 25245.43 samples/sec Loss 1.6433 LearningRate 0.0001 Epoch: 26 Global Step: 46560 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:44:52,290-Speed 25017.37 samples/sec Loss 1.6536 LearningRate 0.0001 Epoch: 26 Global Step: 46570 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-26 11:45:02,254-Speed 24669.51 samples/sec Loss 1.6515 LearningRate 0.0001 Epoch: 26 Global Step: 46580 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:45:12,198-Speed 24715.96 samples/sec Loss 1.6417 LearningRate 0.0001 Epoch: 26 Global Step: 46590 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:45:22,131-Speed 24745.88 samples/sec Loss 1.6458 LearningRate 0.0001 Epoch: 26 Global Step: 46600 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:45:32,023-Speed 24847.04 samples/sec Loss 1.6464 LearningRate 0.0001 Epoch: 26 Global Step: 46610 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:45:41,945-Speed 24774.54 samples/sec Loss 1.6530 LearningRate 0.0001 Epoch: 26 Global Step: 46620 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:45:51,888-Speed 24720.90 samples/sec Loss 1.6449 LearningRate 0.0001 Epoch: 26 Global Step: 46630 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:46:01,814-Speed 24761.91 samples/sec Loss 1.6481 LearningRate 0.0001 Epoch: 26 Global Step: 46640 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:46:11,724-Speed 24803.05 samples/sec Loss 1.6426 LearningRate 0.0001 Epoch: 26 Global Step: 46650 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:46:21,658-Speed 24740.54 samples/sec Loss 1.6314 LearningRate 0.0001 Epoch: 26 Global Step: 46660 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:47:21,087-Speed 4135.45 samples/sec Loss 1.6455 LearningRate 0.0001 Epoch: 27 Global Step: 46670 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:47:30,807-Speed 25289.16 samples/sec Loss 1.6234 LearningRate 0.0001 Epoch: 27 Global Step: 46680 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:47:40,672-Speed 24913.97 samples/sec Loss 1.6218 LearningRate 0.0001 Epoch: 27 Global Step: 46690 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:47:50,821-Speed 24217.37 samples/sec Loss 1.6430 LearningRate 0.0001 Epoch: 27 Global Step: 46700 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:48:00,782-Speed 24675.54 samples/sec Loss 1.6321 LearningRate 0.0001 Epoch: 27 Global Step: 46710 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:48:10,702-Speed 24778.33 samples/sec Loss 1.6279 LearningRate 0.0001 Epoch: 27 Global Step: 46720 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:48:20,724-Speed 24524.67 samples/sec Loss 1.6175 LearningRate 0.0001 Epoch: 27 Global Step: 46730 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:48:30,735-Speed 24551.72 samples/sec Loss 1.6221 LearningRate 0.0001 Epoch: 27 Global Step: 46740 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:48:40,768-Speed 24499.66 samples/sec Loss 1.6227 LearningRate 0.0001 Epoch: 27 Global Step: 46750 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:48:50,822-Speed 24445.66 samples/sec Loss 1.6339 LearningRate 0.0001 Epoch: 27 Global Step: 46760 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:49:00,865-Speed 24473.81 samples/sec Loss 1.6465 LearningRate 0.0001 Epoch: 27 Global Step: 46770 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:49:10,974-Speed 24316.40 samples/sec Loss 1.6339 LearningRate 0.0001 Epoch: 27 Global Step: 46780 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:49:20,881-Speed 24811.15 samples/sec Loss 1.6295 LearningRate 0.0001 Epoch: 27 Global Step: 46790 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:49:30,971-Speed 24357.66 samples/sec Loss 1.6297 LearningRate 0.0001 Epoch: 27 Global Step: 46800 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:49:41,130-Speed 24198.46 samples/sec Loss 1.6328 LearningRate 0.0001 Epoch: 27 Global Step: 46810 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:49:51,038-Speed 24807.00 samples/sec Loss 1.6321 LearningRate 0.0001 Epoch: 27 Global Step: 46820 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:50:00,998-Speed 24678.26 samples/sec Loss 1.6310 LearningRate 0.0001 Epoch: 27 Global Step: 46830 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:50:11,098-Speed 24339.20 samples/sec Loss 1.6321 LearningRate 0.0001 Epoch: 27 Global Step: 46840 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:50:21,205-Speed 24319.39 samples/sec Loss 1.6208 LearningRate 0.0001 Epoch: 27 Global Step: 46850 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:50:31,332-Speed 24269.76 samples/sec Loss 1.6340 LearningRate 0.0001 Epoch: 27 Global Step: 46860 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:50:41,330-Speed 24584.32 samples/sec Loss 1.6302 LearningRate 0.0001 Epoch: 27 Global Step: 46870 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:50:51,451-Speed 24287.60 samples/sec Loss 1.6497 LearningRate 0.0001 Epoch: 27 Global Step: 46880 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:51:01,427-Speed 24639.43 samples/sec Loss 1.6224 LearningRate 0.0001 Epoch: 27 Global Step: 46890 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:51:11,539-Speed 24306.62 samples/sec Loss 1.6228 LearningRate 0.0001 Epoch: 27 Global Step: 46900 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:51:21,579-Speed 24481.09 samples/sec Loss 1.6169 LearningRate 0.0001 Epoch: 27 Global Step: 46910 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:51:31,518-Speed 24728.87 samples/sec Loss 1.6330 LearningRate 0.0001 Epoch: 27 Global Step: 46920 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:51:41,673-Speed 24204.34 samples/sec Loss 1.6180 LearningRate 0.0001 Epoch: 27 Global Step: 46930 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:51:51,737-Speed 24422.64 samples/sec Loss 1.6419 LearningRate 0.0001 Epoch: 27 Global Step: 46940 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:52:01,962-Speed 24037.60 samples/sec Loss 1.6300 LearningRate 0.0001 Epoch: 27 Global Step: 46950 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:52:12,037-Speed 24394.46 samples/sec Loss 1.6179 LearningRate 0.0001 Epoch: 27 Global Step: 46960 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:52:22,126-Speed 24363.68 samples/sec Loss 1.6330 LearningRate 0.0001 Epoch: 27 Global Step: 46970 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:52:32,020-Speed 24842.27 samples/sec Loss 1.6225 LearningRate 0.0001 Epoch: 27 Global Step: 46980 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:52:42,088-Speed 24415.21 samples/sec Loss 1.6277 LearningRate 0.0001 Epoch: 27 Global Step: 46990 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:52:52,085-Speed 24587.94 samples/sec Loss 1.6313 LearningRate 0.0001 Epoch: 27 Global Step: 47000 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:53:02,145-Speed 24433.10 samples/sec Loss 1.6434 LearningRate 0.0001 Epoch: 27 Global Step: 47010 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:53:12,133-Speed 24609.70 samples/sec Loss 1.6239 LearningRate 0.0001 Epoch: 27 Global Step: 47020 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:53:22,289-Speed 24202.50 samples/sec Loss 1.6245 LearningRate 0.0001 Epoch: 27 Global Step: 47030 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:53:32,404-Speed 24299.62 samples/sec Loss 1.6355 LearningRate 0.0001 Epoch: 27 Global Step: 47040 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:53:42,387-Speed 24623.01 samples/sec Loss 1.6188 LearningRate 0.0001 Epoch: 27 Global Step: 47050 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:53:52,261-Speed 24892.31 samples/sec Loss 1.6227 LearningRate 0.0001 Epoch: 27 Global Step: 47060 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:54:02,197-Speed 24736.37 samples/sec Loss 1.6387 LearningRate 0.0001 Epoch: 27 Global Step: 47070 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:54:12,153-Speed 24689.91 samples/sec Loss 1.6109 LearningRate 0.0001 Epoch: 27 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-26 11:54:22,146-Speed 24599.21 samples/sec Loss 1.6169 LearningRate 0.0001 Epoch: 27 Global Step: 47090 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:54:32,204-Speed 24436.11 samples/sec Loss 1.6302 LearningRate 0.0001 Epoch: 27 Global Step: 47100 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:54:42,273-Speed 24411.30 samples/sec Loss 1.6311 LearningRate 0.0001 Epoch: 27 Global Step: 47110 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:54:52,384-Speed 24310.12 samples/sec Loss 1.6270 LearningRate 0.0001 Epoch: 27 Global Step: 47120 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:55:02,423-Speed 24485.35 samples/sec Loss 1.6323 LearningRate 0.0001 Epoch: 27 Global Step: 47130 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:55:12,431-Speed 24559.98 samples/sec Loss 1.6146 LearningRate 0.0001 Epoch: 27 Global Step: 47140 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:55:22,411-Speed 24628.62 samples/sec Loss 1.6207 LearningRate 0.0001 Epoch: 27 Global Step: 47150 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:55:32,489-Speed 24388.33 samples/sec Loss 1.6256 LearningRate 0.0001 Epoch: 27 Global Step: 47160 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:55:42,504-Speed 24541.79 samples/sec Loss 1.6145 LearningRate 0.0001 Epoch: 27 Global Step: 47170 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:55:52,484-Speed 24631.29 samples/sec Loss 1.6196 LearningRate 0.0001 Epoch: 27 Global Step: 47180 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:56:02,596-Speed 24306.02 samples/sec Loss 1.6165 LearningRate 0.0001 Epoch: 27 Global Step: 47190 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:56:12,655-Speed 24435.84 samples/sec Loss 1.6109 LearningRate 0.0001 Epoch: 27 Global Step: 47200 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:56:22,703-Speed 24461.14 samples/sec Loss 1.6252 LearningRate 0.0001 Epoch: 27 Global Step: 47210 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-03-26 11:56:32,656-Speed 24695.71 samples/sec Loss 1.6059 LearningRate 0.0001 Epoch: 27 Global Step: 47220 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:56:42,588-Speed 24747.31 samples/sec Loss 1.6324 LearningRate 0.0001 Epoch: 27 Global Step: 47230 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:56:52,585-Speed 24587.02 samples/sec Loss 1.6173 LearningRate 0.0001 Epoch: 27 Global Step: 47240 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:57:02,657-Speed 24402.00 samples/sec Loss 1.6283 LearningRate 0.0001 Epoch: 27 Global Step: 47250 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:57:12,400-Speed 25227.13 samples/sec Loss 1.6239 LearningRate 0.0001 Epoch: 27 Global Step: 47260 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:57:22,283-Speed 24869.66 samples/sec Loss 1.6068 LearningRate 0.0001 Epoch: 27 Global Step: 47270 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:57:32,044-Speed 25181.69 samples/sec Loss 1.6157 LearningRate 0.0001 Epoch: 27 Global Step: 47280 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:57:41,890-Speed 24963.61 samples/sec Loss 1.6222 LearningRate 0.0001 Epoch: 27 Global Step: 47290 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:57:51,758-Speed 24908.45 samples/sec Loss 1.6128 LearningRate 0.0001 Epoch: 27 Global Step: 47300 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:58:01,494-Speed 25246.25 samples/sec Loss 1.6147 LearningRate 0.0001 Epoch: 27 Global Step: 47310 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:58:11,269-Speed 25145.03 samples/sec Loss 1.6149 LearningRate 0.0001 Epoch: 27 Global Step: 47320 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:58:21,050-Speed 25137.26 samples/sec Loss 1.6210 LearningRate 0.0001 Epoch: 27 Global Step: 47330 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:58:30,820-Speed 25160.18 samples/sec Loss 1.6144 LearningRate 0.0001 Epoch: 27 Global Step: 47340 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:58:40,615-Speed 25094.49 samples/sec Loss 1.6184 LearningRate 0.0001 Epoch: 27 Global Step: 47350 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:58:50,376-Speed 25181.01 samples/sec Loss 1.6072 LearningRate 0.0001 Epoch: 27 Global Step: 47360 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:59:00,232-Speed 24937.70 samples/sec Loss 1.6125 LearningRate 0.0001 Epoch: 27 Global Step: 47370 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:59:10,029-Speed 25087.84 samples/sec Loss 1.6128 LearningRate 0.0001 Epoch: 27 Global Step: 47380 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:59:19,863-Speed 24994.12 samples/sec Loss 1.6040 LearningRate 0.0001 Epoch: 27 Global Step: 47390 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:59:29,726-Speed 24920.52 samples/sec Loss 1.6186 LearningRate 0.0001 Epoch: 27 Global Step: 47400 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:59:39,471-Speed 25223.06 samples/sec Loss 1.6127 LearningRate 0.0001 Epoch: 27 Global Step: 47410 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:59:49,271-Speed 25080.34 samples/sec Loss 1.6178 LearningRate 0.0001 Epoch: 27 Global Step: 47420 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 11:59:58,978-Speed 25323.54 samples/sec Loss 1.6139 LearningRate 0.0001 Epoch: 27 Global Step: 47430 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:00:08,744-Speed 25166.57 samples/sec Loss 1.6040 LearningRate 0.0001 Epoch: 27 Global Step: 47440 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:00:18,565-Speed 25028.00 samples/sec Loss 1.5972 LearningRate 0.0001 Epoch: 27 Global Step: 47450 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:00:28,517-Speed 24698.51 samples/sec Loss 1.6131 LearningRate 0.0001 Epoch: 27 Global Step: 47460 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:00:38,436-Speed 24779.97 samples/sec Loss 1.6099 LearningRate 0.0001 Epoch: 27 Global Step: 47470 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:00:48,255-Speed 25034.40 samples/sec Loss 1.6105 LearningRate 0.0001 Epoch: 27 Global Step: 47480 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:00:58,233-Speed 24631.92 samples/sec Loss 1.6110 LearningRate 0.0001 Epoch: 27 Global Step: 47490 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:01:08,091-Speed 24936.12 samples/sec Loss 1.6014 LearningRate 0.0001 Epoch: 27 Global Step: 47500 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:01:17,892-Speed 25076.94 samples/sec Loss 1.6172 LearningRate 0.0001 Epoch: 27 Global Step: 47510 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:01:27,846-Speed 24693.46 samples/sec Loss 1.6013 LearningRate 0.0001 Epoch: 27 Global Step: 47520 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:01:37,708-Speed 24924.77 samples/sec Loss 1.5993 LearningRate 0.0001 Epoch: 27 Global Step: 47530 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:01:47,497-Speed 25108.86 samples/sec Loss 1.6109 LearningRate 0.0001 Epoch: 27 Global Step: 47540 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:01:57,302-Speed 25066.96 samples/sec Loss 1.6069 LearningRate 0.0001 Epoch: 27 Global Step: 47550 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:02:07,098-Speed 25093.25 samples/sec Loss 1.6064 LearningRate 0.0001 Epoch: 27 Global Step: 47560 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:02:16,970-Speed 24897.24 samples/sec Loss 1.6161 LearningRate 0.0001 Epoch: 27 Global Step: 47570 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:02:26,930-Speed 24676.37 samples/sec Loss 1.6174 LearningRate 0.0001 Epoch: 27 Global Step: 47580 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:02:36,805-Speed 24890.09 samples/sec Loss 1.6178 LearningRate 0.0001 Epoch: 27 Global Step: 47590 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:02:46,689-Speed 24866.50 samples/sec Loss 1.5927 LearningRate 0.0001 Epoch: 27 Global Step: 47600 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:02:56,600-Speed 24802.07 samples/sec Loss 1.5951 LearningRate 0.0001 Epoch: 27 Global Step: 47610 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:03:06,419-Speed 25030.93 samples/sec Loss 1.6154 LearningRate 0.0001 Epoch: 27 Global Step: 47620 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-26 12:03:16,149-Speed 25262.41 samples/sec Loss 1.6009 LearningRate 0.0001 Epoch: 27 Global Step: 47630 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-26 12:03:25,944-Speed 25093.86 samples/sec Loss 1.6104 LearningRate 0.0001 Epoch: 27 Global Step: 47640 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:03:35,704-Speed 25183.16 samples/sec Loss 1.6104 LearningRate 0.0001 Epoch: 27 Global Step: 47650 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:03:45,503-Speed 25084.83 samples/sec Loss 1.6129 LearningRate 0.0001 Epoch: 27 Global Step: 47660 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:03:55,327-Speed 25017.67 samples/sec Loss 1.6027 LearningRate 0.0001 Epoch: 27 Global Step: 47670 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:04:05,042-Speed 25301.14 samples/sec Loss 1.6194 LearningRate 0.0001 Epoch: 27 Global Step: 47680 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:04:14,899-Speed 24934.56 samples/sec Loss 1.6036 LearningRate 0.0001 Epoch: 27 Global Step: 47690 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:04:24,677-Speed 25137.73 samples/sec Loss 1.5942 LearningRate 0.0001 Epoch: 27 Global Step: 47700 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:04:34,416-Speed 25238.31 samples/sec Loss 1.6061 LearningRate 0.0001 Epoch: 27 Global Step: 47710 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:04:44,211-Speed 25092.83 samples/sec Loss 1.5987 LearningRate 0.0001 Epoch: 27 Global Step: 47720 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:04:54,051-Speed 24981.23 samples/sec Loss 1.5993 LearningRate 0.0001 Epoch: 27 Global Step: 47730 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:05:03,880-Speed 25005.43 samples/sec Loss 1.5906 LearningRate 0.0001 Epoch: 27 Global Step: 47740 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:05:13,692-Speed 25050.95 samples/sec Loss 1.5939 LearningRate 0.0001 Epoch: 27 Global Step: 47750 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:05:23,419-Speed 25269.02 samples/sec Loss 1.5983 LearningRate 0.0001 Epoch: 27 Global Step: 47760 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:05:33,138-Speed 25287.72 samples/sec Loss 1.6038 LearningRate 0.0001 Epoch: 27 Global Step: 47770 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:05:42,952-Speed 25046.53 samples/sec Loss 1.6066 LearningRate 0.0001 Epoch: 27 Global Step: 47780 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:05:52,663-Speed 25316.54 samples/sec Loss 1.6017 LearningRate 0.0001 Epoch: 27 Global Step: 47790 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:06:02,355-Speed 25359.86 samples/sec Loss 1.5992 LearningRate 0.0001 Epoch: 27 Global Step: 47800 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:06:12,049-Speed 25356.30 samples/sec Loss 1.6006 LearningRate 0.0001 Epoch: 27 Global Step: 47810 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:06:21,762-Speed 25305.95 samples/sec Loss 1.5984 LearningRate 0.0001 Epoch: 27 Global Step: 47820 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:06:31,570-Speed 25060.53 samples/sec Loss 1.6062 LearningRate 0.0001 Epoch: 27 Global Step: 47830 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:06:41,294-Speed 25275.23 samples/sec Loss 1.6037 LearningRate 0.0001 Epoch: 27 Global Step: 47840 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:06:51,120-Speed 25016.30 samples/sec Loss 1.5947 LearningRate 0.0001 Epoch: 27 Global Step: 47850 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:07:00,913-Speed 25098.68 samples/sec Loss 1.5821 LearningRate 0.0001 Epoch: 27 Global Step: 47860 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:07:10,642-Speed 25263.29 samples/sec Loss 1.5925 LearningRate 0.0001 Epoch: 27 Global Step: 47870 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:07:20,419-Speed 25139.82 samples/sec Loss 1.6015 LearningRate 0.0001 Epoch: 27 Global Step: 47880 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:07:30,162-Speed 25227.40 samples/sec Loss 1.5987 LearningRate 0.0001 Epoch: 27 Global Step: 47890 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:07:39,910-Speed 25215.68 samples/sec Loss 1.5988 LearningRate 0.0001 Epoch: 27 Global Step: 47900 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:07:49,627-Speed 25293.74 samples/sec Loss 1.5901 LearningRate 0.0001 Epoch: 27 Global Step: 47910 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:07:59,360-Speed 25255.77 samples/sec Loss 1.5953 LearningRate 0.0001 Epoch: 27 Global Step: 47920 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:08:09,160-Speed 25080.86 samples/sec Loss 1.6057 LearningRate 0.0001 Epoch: 27 Global Step: 47930 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:08:18,956-Speed 25089.43 samples/sec Loss 1.5954 LearningRate 0.0001 Epoch: 27 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-03-26 12:08:28,811-Speed 24941.95 samples/sec Loss 1.5924 LearningRate 0.0001 Epoch: 27 Global Step: 47950 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:08:38,583-Speed 25154.84 samples/sec Loss 1.6050 LearningRate 0.0001 Epoch: 27 Global Step: 47960 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:08:48,434-Speed 24949.14 samples/sec Loss 1.5996 LearningRate 0.0001 Epoch: 27 Global Step: 47970 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:08:58,226-Speed 25105.74 samples/sec Loss 1.5954 LearningRate 0.0001 Epoch: 27 Global Step: 47980 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:09:08,017-Speed 25104.23 samples/sec Loss 1.5762 LearningRate 0.0001 Epoch: 27 Global Step: 47990 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:09:17,764-Speed 25217.71 samples/sec Loss 1.5998 LearningRate 0.0001 Epoch: 27 Global Step: 48000 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 
12:09:27,478-Speed 25303.43 samples/sec Loss 1.5921 LearningRate 0.0001 Epoch: 27 Global Step: 48010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:09:37,282-Speed 25071.56 samples/sec Loss 1.5853 LearningRate 0.0001 Epoch: 27 Global Step: 48020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:09:46,997-Speed 25300.50 samples/sec Loss 1.6017 LearningRate 0.0001 Epoch: 27 Global Step: 48030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:09:56,737-Speed 25235.61 samples/sec Loss 1.5902 LearningRate 0.0001 Epoch: 27 Global Step: 48040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:10:06,455-Speed 25293.67 samples/sec Loss 1.5922 LearningRate 0.0001 Epoch: 27 Global Step: 48050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:10:16,227-Speed 25153.88 samples/sec Loss 1.5967 LearningRate 0.0001 Epoch: 27 Global Step: 48060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:10:25,975-Speed 25216.13 samples/sec Loss 1.5870 LearningRate 0.0001 Epoch: 27 Global Step: 48070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:10:35,821-Speed 24965.19 samples/sec Loss 1.5875 LearningRate 0.0001 Epoch: 27 Global Step: 48080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:10:45,631-Speed 25055.80 samples/sec Loss 1.5873 LearningRate 0.0001 Epoch: 27 Global Step: 48090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:10:55,322-Speed 25363.73 samples/sec Loss 1.5996 LearningRate 0.0001 Epoch: 27 Global Step: 48100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:11:05,096-Speed 25148.59 samples/sec Loss 1.5984 LearningRate 0.0001 Epoch: 27 Global Step: 48110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:11:14,882-Speed 25116.36 samples/sec Loss 1.5971 LearningRate 0.0001 Epoch: 27 Global Step: 48120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:11:24,702-Speed 25034.36 samples/sec 
Loss 1.5841 LearningRate 0.0001 Epoch: 27 Global Step: 48130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:11:34,428-Speed 25280.22 samples/sec Loss 1.5975 LearningRate 0.0001 Epoch: 27 Global Step: 48140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:11:44,170-Speed 25229.99 samples/sec Loss 1.5818 LearningRate 0.0001 Epoch: 27 Global Step: 48150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:11:53,894-Speed 25278.41 samples/sec Loss 1.5814 LearningRate 0.0001 Epoch: 27 Global Step: 48160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:12:03,703-Speed 25061.95 samples/sec Loss 1.5917 LearningRate 0.0001 Epoch: 27 Global Step: 48170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:12:13,417-Speed 25304.04 samples/sec Loss 1.6077 LearningRate 0.0001 Epoch: 27 Global Step: 48180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:12:23,175-Speed 25189.06 samples/sec Loss 1.5959 LearningRate 0.0001 Epoch: 27 Global Step: 48190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:12:32,959-Speed 25119.66 samples/sec Loss 1.5947 LearningRate 0.0001 Epoch: 27 Global Step: 48200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:12:42,748-Speed 25109.89 samples/sec Loss 1.5964 LearningRate 0.0001 Epoch: 27 Global Step: 48210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:12:52,480-Speed 25255.53 samples/sec Loss 1.5865 LearningRate 0.0001 Epoch: 27 Global Step: 48220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:13:02,199-Speed 25292.27 samples/sec Loss 1.5895 LearningRate 0.0001 Epoch: 27 Global Step: 48230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:13:11,995-Speed 25088.99 samples/sec Loss 1.5900 LearningRate 0.0001 Epoch: 27 Global Step: 48240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:13:21,688-Speed 25359.16 samples/sec Loss 1.5901 LearningRate 0.0001 Epoch: 27 
Global Step: 48250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:13:31,421-Speed 25251.98 samples/sec Loss 1.5906 LearningRate 0.0001 Epoch: 27 Global Step: 48260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:13:41,168-Speed 25218.64 samples/sec Loss 1.5954 LearningRate 0.0001 Epoch: 27 Global Step: 48270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:13:50,884-Speed 25296.34 samples/sec Loss 1.5987 LearningRate 0.0001 Epoch: 27 Global Step: 48280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:14:00,727-Speed 24970.91 samples/sec Loss 1.5903 LearningRate 0.0001 Epoch: 27 Global Step: 48290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:14:10,660-Speed 24746.10 samples/sec Loss 1.5795 LearningRate 0.0001 Epoch: 27 Global Step: 48300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:14:20,478-Speed 25033.69 samples/sec Loss 1.5871 LearningRate 0.0001 Epoch: 27 Global Step: 48310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:14:30,284-Speed 25068.32 samples/sec Loss 1.5840 LearningRate 0.0001 Epoch: 27 Global Step: 48320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:14:40,081-Speed 25087.17 samples/sec Loss 1.5903 LearningRate 0.0001 Epoch: 27 Global Step: 48330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:14:49,826-Speed 25223.33 samples/sec Loss 1.5731 LearningRate 0.0001 Epoch: 27 Global Step: 48340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:14:59,658-Speed 25001.29 samples/sec Loss 1.5970 LearningRate 0.0001 Epoch: 27 Global Step: 48350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-26 12:15:09,508-Speed 24952.39 samples/sec Loss 1.5935 LearningRate 0.0001 Epoch: 27 Global Step: 48360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:15:19,221-Speed 25307.54 samples/sec Loss 1.6015 LearningRate 0.0001 Epoch: 27 Global Step: 48370 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-03-26 12:15:28,985-Speed 25175.27 samples/sec Loss 1.6010 LearningRate 0.0001 Epoch: 27 Global Step: 48380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:15:38,787-Speed 25076.03 samples/sec Loss 1.6009 LearningRate 0.0001 Epoch: 27 Global Step: 48390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:16:37,813-Speed 4163.70 samples/sec Loss 1.5899 LearningRate 0.0001 Epoch: 28 Global Step: 48400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:16:47,604-Speed 25103.76 samples/sec Loss 1.5760 LearningRate 0.0001 Epoch: 28 Global Step: 48410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:16:57,372-Speed 25164.28 samples/sec Loss 1.5611 LearningRate 0.0001 Epoch: 28 Global Step: 48420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:17:07,039-Speed 25426.76 samples/sec Loss 1.5867 LearningRate 0.0001 Epoch: 28 Global Step: 48430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:17:16,760-Speed 25284.99 samples/sec Loss 1.5773 LearningRate 0.0001 Epoch: 28 Global Step: 48440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:17:26,453-Speed 25357.53 samples/sec Loss 1.5700 LearningRate 0.0001 Epoch: 28 Global Step: 48450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:17:36,176-Speed 25277.68 samples/sec Loss 1.5691 LearningRate 0.0001 Epoch: 28 Global Step: 48460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:17:45,890-Speed 25304.46 samples/sec Loss 1.5778 LearningRate 0.0001 Epoch: 28 Global Step: 48470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:17:55,574-Speed 25382.40 samples/sec Loss 1.5822 LearningRate 0.0001 Epoch: 28 Global Step: 48480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:18:05,329-Speed 25195.49 samples/sec Loss 1.5712 LearningRate 0.0001 Epoch: 28 Global Step: 48490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 
12:18:15,047-Speed 25292.17 samples/sec Loss 1.5610 LearningRate 0.0001 Epoch: 28 Global Step: 48500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:18:24,776-Speed 25264.72 samples/sec Loss 1.5714 LearningRate 0.0001 Epoch: 28 Global Step: 48510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:18:34,468-Speed 25360.17 samples/sec Loss 1.5714 LearningRate 0.0001 Epoch: 28 Global Step: 48520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:18:44,207-Speed 25239.10 samples/sec Loss 1.5732 LearningRate 0.0001 Epoch: 28 Global Step: 48530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:18:53,947-Speed 25234.43 samples/sec Loss 1.5613 LearningRate 0.0001 Epoch: 28 Global Step: 48540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:19:03,829-Speed 24872.19 samples/sec Loss 1.5816 LearningRate 0.0001 Epoch: 28 Global Step: 48550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:19:13,545-Speed 25300.92 samples/sec Loss 1.5626 LearningRate 0.0001 Epoch: 28 Global Step: 48560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:19:23,368-Speed 25020.70 samples/sec Loss 1.5757 LearningRate 0.0001 Epoch: 28 Global Step: 48570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:19:33,188-Speed 25031.71 samples/sec Loss 1.5847 LearningRate 0.0001 Epoch: 28 Global Step: 48580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:19:42,914-Speed 25271.32 samples/sec Loss 1.5802 LearningRate 0.0001 Epoch: 28 Global Step: 48590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:19:52,680-Speed 25169.55 samples/sec Loss 1.5911 LearningRate 0.0001 Epoch: 28 Global Step: 48600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:20:02,735-Speed 24444.00 samples/sec Loss 1.5778 LearningRate 0.0001 Epoch: 28 Global Step: 48610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:20:12,827-Speed 24356.29 samples/sec 
Loss 1.5782 LearningRate 0.0001 Epoch: 28 Global Step: 48620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:20:22,915-Speed 24362.78 samples/sec Loss 1.5726 LearningRate 0.0001 Epoch: 28 Global Step: 48630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:20:32,991-Speed 24394.75 samples/sec Loss 1.5865 LearningRate 0.0001 Epoch: 28 Global Step: 48640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:20:43,056-Speed 24426.46 samples/sec Loss 1.5720 LearningRate 0.0001 Epoch: 28 Global Step: 48650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:20:53,136-Speed 24383.58 samples/sec Loss 1.5807 LearningRate 0.0001 Epoch: 28 Global Step: 48660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:21:02,935-Speed 25083.15 samples/sec Loss 1.5741 LearningRate 0.0001 Epoch: 28 Global Step: 48670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:21:12,700-Speed 25171.02 samples/sec Loss 1.5639 LearningRate 0.0001 Epoch: 28 Global Step: 48680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:21:22,470-Speed 25162.10 samples/sec Loss 1.5785 LearningRate 0.0001 Epoch: 28 Global Step: 48690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:21:32,196-Speed 25270.75 samples/sec Loss 1.5625 LearningRate 0.0001 Epoch: 28 Global Step: 48700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:21:41,880-Speed 25382.62 samples/sec Loss 1.5660 LearningRate 0.0001 Epoch: 28 Global Step: 48710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:21:51,577-Speed 25347.06 samples/sec Loss 1.5696 LearningRate 0.0001 Epoch: 28 Global Step: 48720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:22:01,421-Speed 24968.83 samples/sec Loss 1.5799 LearningRate 0.0001 Epoch: 28 Global Step: 48730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:22:11,305-Speed 24873.76 samples/sec Loss 1.5751 LearningRate 0.0001 Epoch: 28 
Global Step: 48740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:22:21,145-Speed 24980.03 samples/sec Loss 1.5772 LearningRate 0.0001 Epoch: 28 Global Step: 48750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:22:31,043-Speed 24831.20 samples/sec Loss 1.5645 LearningRate 0.0001 Epoch: 28 Global Step: 48760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:22:40,962-Speed 24779.56 samples/sec Loss 1.5722 LearningRate 0.0001 Epoch: 28 Global Step: 48770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:22:50,771-Speed 25059.42 samples/sec Loss 1.5793 LearningRate 0.0001 Epoch: 28 Global Step: 48780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:23:00,673-Speed 24823.42 samples/sec Loss 1.5667 LearningRate 0.0001 Epoch: 28 Global Step: 48790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:23:10,491-Speed 25034.18 samples/sec Loss 1.5698 LearningRate 0.0001 Epoch: 28 Global Step: 48800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:23:20,223-Speed 25257.69 samples/sec Loss 1.5820 LearningRate 0.0001 Epoch: 28 Global Step: 48810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:23:29,993-Speed 25156.84 samples/sec Loss 1.5805 LearningRate 0.0001 Epoch: 28 Global Step: 48820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:23:39,707-Speed 25303.06 samples/sec Loss 1.5752 LearningRate 0.0001 Epoch: 28 Global Step: 48830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:23:49,545-Speed 24985.06 samples/sec Loss 1.5763 LearningRate 0.0001 Epoch: 28 Global Step: 48840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:23:59,418-Speed 24895.35 samples/sec Loss 1.5717 LearningRate 0.0001 Epoch: 28 Global Step: 48850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:24:09,222-Speed 25069.42 samples/sec Loss 1.5688 LearningRate 0.0001 Epoch: 28 Global Step: 48860 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-03-26 12:24:18,931-Speed 25316.62 samples/sec Loss 1.5724 LearningRate 0.0001 Epoch: 28 Global Step: 48870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:24:28,624-Speed 25358.29 samples/sec Loss 1.5480 LearningRate 0.0001 Epoch: 28 Global Step: 48880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:24:38,371-Speed 25216.98 samples/sec Loss 1.5682 LearningRate 0.0001 Epoch: 28 Global Step: 48890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:24:48,044-Speed 25411.27 samples/sec Loss 1.5672 LearningRate 0.0001 Epoch: 28 Global Step: 48900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:24:57,770-Speed 25271.06 samples/sec Loss 1.5702 LearningRate 0.0001 Epoch: 28 Global Step: 48910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:25:07,500-Speed 25261.76 samples/sec Loss 1.5734 LearningRate 0.0001 Epoch: 28 Global Step: 48920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:25:17,220-Speed 25288.79 samples/sec Loss 1.5780 LearningRate 0.0001 Epoch: 28 Global Step: 48930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:25:26,911-Speed 25364.84 samples/sec Loss 1.5689 LearningRate 0.0001 Epoch: 28 Global Step: 48940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:25:36,697-Speed 25117.66 samples/sec Loss 1.5684 LearningRate 0.0001 Epoch: 28 Global Step: 48950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:25:46,417-Speed 25285.86 samples/sec Loss 1.5619 LearningRate 0.0001 Epoch: 28 Global Step: 48960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:25:56,168-Speed 25207.61 samples/sec Loss 1.5554 LearningRate 0.0001 Epoch: 28 Global Step: 48970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:26:05,888-Speed 25285.95 samples/sec Loss 1.5667 LearningRate 0.0001 Epoch: 28 Global Step: 48980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 
12:26:15,737-Speed 24955.73 samples/sec Loss 1.5777 LearningRate 0.0001 Epoch: 28 Global Step: 48990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:26:25,426-Speed 25369.03 samples/sec Loss 1.5663 LearningRate 0.0001 Epoch: 28 Global Step: 49000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:26:35,149-Speed 25278.05 samples/sec Loss 1.5489 LearningRate 0.0001 Epoch: 28 Global Step: 49010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:26:44,907-Speed 25189.68 samples/sec Loss 1.5740 LearningRate 0.0001 Epoch: 28 Global Step: 49020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:26:54,639-Speed 25253.64 samples/sec Loss 1.5656 LearningRate 0.0001 Epoch: 28 Global Step: 49030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:27:04,310-Speed 25417.43 samples/sec Loss 1.5712 LearningRate 0.0001 Epoch: 28 Global Step: 49040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:27:14,075-Speed 25171.18 samples/sec Loss 1.5815 LearningRate 0.0001 Epoch: 28 Global Step: 49050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:27:23,819-Speed 25225.89 samples/sec Loss 1.5700 LearningRate 0.0001 Epoch: 28 Global Step: 49060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:27:33,653-Speed 24995.46 samples/sec Loss 1.5780 LearningRate 0.0001 Epoch: 28 Global Step: 49070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:27:43,412-Speed 25186.13 samples/sec Loss 1.5680 LearningRate 0.0001 Epoch: 28 Global Step: 49080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:27:53,198-Speed 25115.78 samples/sec Loss 1.5736 LearningRate 0.0001 Epoch: 28 Global Step: 49090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:28:02,890-Speed 25360.08 samples/sec Loss 1.5651 LearningRate 0.0001 Epoch: 28 Global Step: 49100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:28:12,647-Speed 25194.14 samples/sec 
Loss 1.5608 LearningRate 0.0001 Epoch: 28 Global Step: 49110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:28:22,364-Speed 25294.53 samples/sec Loss 1.5727 LearningRate 0.0001 Epoch: 28 Global Step: 49120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:28:32,137-Speed 25149.16 samples/sec Loss 1.5650 LearningRate 0.0001 Epoch: 28 Global Step: 49130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:28:41,947-Speed 25057.78 samples/sec Loss 1.5694 LearningRate 0.0001 Epoch: 28 Global Step: 49140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:28:51,649-Speed 25336.82 samples/sec Loss 1.5641 LearningRate 0.0001 Epoch: 28 Global Step: 49150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:29:01,412-Speed 25175.33 samples/sec Loss 1.5786 LearningRate 0.0001 Epoch: 28 Global Step: 49160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:29:11,206-Speed 25097.96 samples/sec Loss 1.5552 LearningRate 0.0001 Epoch: 28 Global Step: 49170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:29:20,921-Speed 25300.96 samples/sec Loss 1.5559 LearningRate 0.0001 Epoch: 28 Global Step: 49180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:29:30,695-Speed 25149.44 samples/sec Loss 1.5668 LearningRate 0.0001 Epoch: 28 Global Step: 49190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:29:40,413-Speed 25293.93 samples/sec Loss 1.5679 LearningRate 0.0001 Epoch: 28 Global Step: 49200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:29:50,181-Speed 25162.38 samples/sec Loss 1.5552 LearningRate 0.0001 Epoch: 28 Global Step: 49210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:29:59,968-Speed 25114.17 samples/sec Loss 1.5554 LearningRate 0.0001 Epoch: 28 Global Step: 49220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:30:09,712-Speed 25227.19 samples/sec Loss 1.5587 LearningRate 0.0001 Epoch: 28 
Global Step: 49230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:30:19,573-Speed 24923.38 samples/sec Loss 1.5565 LearningRate 0.0001 Epoch: 28 Global Step: 49240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:30:29,329-Speed 25196.49 samples/sec Loss 1.5540 LearningRate 0.0001 Epoch: 28 Global Step: 49250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:30:39,067-Speed 25239.41 samples/sec Loss 1.5499 LearningRate 0.0001 Epoch: 28 Global Step: 49260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:30:48,774-Speed 25320.76 samples/sec Loss 1.5586 LearningRate 0.0001 Epoch: 28 Global Step: 49270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:30:58,641-Speed 24912.50 samples/sec Loss 1.5567 LearningRate 0.0001 Epoch: 28 Global Step: 49280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:31:08,386-Speed 25222.00 samples/sec Loss 1.5441 LearningRate 0.0001 Epoch: 28 Global Step: 49290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:31:18,168-Speed 25127.77 samples/sec Loss 1.5499 LearningRate 0.0001 Epoch: 28 Global Step: 49300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:31:27,939-Speed 25163.42 samples/sec Loss 1.5515 LearningRate 0.0001 Epoch: 28 Global Step: 49310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:31:37,619-Speed 25391.44 samples/sec Loss 1.5516 LearningRate 0.0001 Epoch: 28 Global Step: 49320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:31:47,416-Speed 25088.82 samples/sec Loss 1.5567 LearningRate 0.0001 Epoch: 28 Global Step: 49330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:31:57,117-Speed 25337.03 samples/sec Loss 1.5520 LearningRate 0.0001 Epoch: 28 Global Step: 49340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:32:06,803-Speed 25375.70 samples/sec Loss 1.5529 LearningRate 0.0001 Epoch: 28 Global Step: 49350 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-03-26 12:32:16,508-Speed 25327.28 samples/sec Loss 1.5510 LearningRate 0.0001 Epoch: 28 Global Step: 49360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:32:26,311-Speed 25073.71 samples/sec Loss 1.5313 LearningRate 0.0001 Epoch: 28 Global Step: 49370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:32:36,064-Speed 25203.99 samples/sec Loss 1.5433 LearningRate 0.0001 Epoch: 28 Global Step: 49380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:32:45,849-Speed 25118.26 samples/sec Loss 1.5561 LearningRate 0.0001 Epoch: 28 Global Step: 49390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:32:55,585-Speed 25245.47 samples/sec Loss 1.5522 LearningRate 0.0001 Epoch: 28 Global Step: 49400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:33:05,571-Speed 24615.02 samples/sec Loss 1.5480 LearningRate 0.0001 Epoch: 28 Global Step: 49410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-26 12:33:15,618-Speed 24464.57 samples/sec Loss 1.5506 LearningRate 0.0001 Epoch: 28 Global Step: 49420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:33:25,595-Speed 24636.27 samples/sec Loss 1.5663 LearningRate 0.0001 Epoch: 28 Global Step: 49430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:33:35,505-Speed 24801.39 samples/sec Loss 1.5458 LearningRate 0.0001 Epoch: 28 Global Step: 49440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:33:45,522-Speed 24537.86 samples/sec Loss 1.5548 LearningRate 0.0001 Epoch: 28 Global Step: 49450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:33:55,497-Speed 24639.69 samples/sec Loss 1.5564 LearningRate 0.0001 Epoch: 28 Global Step: 49460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:34:05,495-Speed 24584.90 samples/sec Loss 1.5579 LearningRate 0.0001 Epoch: 28 Global Step: 49470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 
12:34:15,545-Speed 24458.43 samples/sec Loss 1.5528 LearningRate 0.0001 Epoch: 28 Global Step: 49480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:34:25,563-Speed 24534.00 samples/sec Loss 1.5555 LearningRate 0.0001 Epoch: 28 Global Step: 49490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:34:35,581-Speed 24535.46 samples/sec Loss 1.5523 LearningRate 0.0001 Epoch: 28 Global Step: 49500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:34:45,690-Speed 24314.61 samples/sec Loss 1.5497 LearningRate 0.0001 Epoch: 28 Global Step: 49510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:34:55,699-Speed 24558.64 samples/sec Loss 1.5486 LearningRate 0.0001 Epoch: 28 Global Step: 49520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:35:05,712-Speed 24546.85 samples/sec Loss 1.5445 LearningRate 0.0001 Epoch: 28 Global Step: 49530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:35:15,748-Speed 24491.79 samples/sec Loss 1.5503 LearningRate 0.0001 Epoch: 28 Global Step: 49540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:35:25,865-Speed 24295.41 samples/sec Loss 1.5555 LearningRate 0.0001 Epoch: 28 Global Step: 49550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:35:36,037-Speed 24162.71 samples/sec Loss 1.5453 LearningRate 0.0001 Epoch: 28 Global Step: 49560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:35:46,018-Speed 24626.68 samples/sec Loss 1.5551 LearningRate 0.0001 Epoch: 28 Global Step: 49570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:35:56,016-Speed 24584.38 samples/sec Loss 1.5461 LearningRate 0.0001 Epoch: 28 Global Step: 49580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:36:06,166-Speed 24217.18 samples/sec Loss 1.5473 LearningRate 0.0001 Epoch: 28 Global Step: 49590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-26 12:36:16,190-Speed 24521.34 samples/sec 
Loss 1.5530 LearningRate 0.0001 Epoch: 28 Global Step: 49600 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:36:26,173-Speed 24620.78 samples/sec Loss 1.5509 LearningRate 0.0001 Epoch: 28 Global Step: 49610 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:36:36,286-Speed 24305.03 samples/sec Loss 1.5430 LearningRate 0.0001 Epoch: 28 Global Step: 49620 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-03-26 12:36:46,507-Speed 24045.26 samples/sec Loss 1.5331 LearningRate 0.0001 Epoch: 28 Global Step: 49630 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:36:56,504-Speed 24589.67 samples/sec Loss 1.5468 LearningRate 0.0001 Epoch: 28 Global Step: 49640 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:37:06,576-Speed 24403.09 samples/sec Loss 1.5348 LearningRate 0.0001 Epoch: 28 Global Step: 49650 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:37:16,625-Speed 24461.21 samples/sec Loss 1.5546 LearningRate 0.0001 Epoch: 28 Global Step: 49660 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:37:26,699-Speed 24405.01 samples/sec Loss 1.5486 LearningRate 0.0001 Epoch: 28 Global Step: 49670 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:37:36,743-Speed 24473.54 samples/sec Loss 1.5455 LearningRate 0.0001 Epoch: 28 Global Step: 49680 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:37:46,710-Speed 24659.08 samples/sec Loss 1.5468 LearningRate 0.0001 Epoch: 28 Global Step: 49690 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:37:56,828-Speed 24295.88 samples/sec Loss 1.5448 LearningRate 0.0001 Epoch: 28 Global Step: 49700 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:38:06,844-Speed 24540.00 samples/sec Loss 1.5518 LearningRate 0.0001 Epoch: 28 Global Step: 49710 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:38:16,944-Speed 24334.45 samples/sec Loss 1.5416 LearningRate 0.0001 Epoch: 28 Global Step: 49720 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-26 12:38:26,996-Speed 24453.07 samples/sec Loss 1.5311 LearningRate 0.0001 Epoch: 28 Global Step: 49730 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:38:37,030-Speed 24497.31 samples/sec Loss 1.5385 LearningRate 0.0001 Epoch: 28 Global Step: 49740 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:38:46,998-Speed 24657.09 samples/sec Loss 1.5460 LearningRate 0.0001 Epoch: 28 Global Step: 49750 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:38:57,050-Speed 24454.47 samples/sec Loss 1.5562 LearningRate 0.0001 Epoch: 28 Global Step: 49760 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:39:07,156-Speed 24320.42 samples/sec Loss 1.5407 LearningRate 0.0001 Epoch: 28 Global Step: 49770 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:39:17,137-Speed 24625.72 samples/sec Loss 1.5367 LearningRate 0.0001 Epoch: 28 Global Step: 49780 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:39:27,144-Speed 24566.15 samples/sec Loss 1.5350 LearningRate 0.0001 Epoch: 28 Global Step: 49790 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:39:37,136-Speed 24597.41 samples/sec Loss 1.5380 LearningRate 0.0001 Epoch: 28 Global Step: 49800 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:39:47,173-Speed 24490.40 samples/sec Loss 1.5445 LearningRate 0.0001 Epoch: 28 Global Step: 49810 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:39:57,195-Speed 24525.40 samples/sec Loss 1.5258 LearningRate 0.0001 Epoch: 28 Global Step: 49820 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:40:07,170-Speed 24639.64 samples/sec Loss 1.5395 LearningRate 0.0001 Epoch: 28 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-26 12:40:17,329-Speed 24195.43 samples/sec Loss 1.5360 LearningRate 0.0001 Epoch: 28 Global Step: 49840 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:40:27,354-Speed 24518.52 samples/sec Loss 1.5465 LearningRate 0.0001 Epoch: 28 Global Step: 49850 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:40:37,623-Speed 23936.20 samples/sec Loss 1.5403 LearningRate 0.0001 Epoch: 28 Global Step: 49860 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:40:47,701-Speed 24387.38 samples/sec Loss 1.5424 LearningRate 0.0001 Epoch: 28 Global Step: 49870 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:40:57,789-Speed 24366.67 samples/sec Loss 1.5474 LearningRate 0.0001 Epoch: 28 Global Step: 49880 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:41:07,875-Speed 24368.94 samples/sec Loss 1.5294 LearningRate 0.0001 Epoch: 28 Global Step: 49890 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:41:17,901-Speed 24518.64 samples/sec Loss 1.5329 LearningRate 0.0001 Epoch: 28 Global Step: 49900 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:41:27,896-Speed 24593.02 samples/sec Loss 1.5482 LearningRate 0.0001 Epoch: 28 Global Step: 49910 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:41:37,961-Speed 24420.60 samples/sec Loss 1.5459 LearningRate 0.0001 Epoch: 28 Global Step: 49920 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:41:48,008-Speed 24463.51 samples/sec Loss 1.5384 LearningRate 0.0001 Epoch: 28 Global Step: 49930 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:41:58,221-Speed 24066.25 samples/sec Loss 1.5360 LearningRate 0.0001 Epoch: 28 Global Step: 49940 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:42:08,253-Speed 24500.61 samples/sec Loss 1.5427 LearningRate 0.0001 Epoch: 28 Global Step: 49950 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:42:18,286-Speed 24498.51 samples/sec Loss 1.5409 LearningRate 0.0001 Epoch: 28 Global Step: 49960 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:42:28,414-Speed 24269.40 samples/sec Loss 1.5388 LearningRate 0.0001 Epoch: 28 Global Step: 49970 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:42:38,420-Speed 24567.26 samples/sec Loss 1.5388 LearningRate 0.0001 Epoch: 28 Global Step: 49980 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:42:48,594-Speed 24161.41 samples/sec Loss 1.5361 LearningRate 0.0001 Epoch: 28 Global Step: 49990 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:42:58,668-Speed 24403.67 samples/sec Loss 1.5462 LearningRate 0.0001 Epoch: 28 Global Step: 50000 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:43:08,671-Speed 24573.02 samples/sec Loss 1.5464 LearningRate 0.0001 Epoch: 28 Global Step: 50010 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:43:18,776-Speed 24325.24 samples/sec Loss 1.5319 LearningRate 0.0001 Epoch: 28 Global Step: 50020 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:43:28,828-Speed 24452.77 samples/sec Loss 1.5335 LearningRate 0.0001 Epoch: 28 Global Step: 50030 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:43:38,812-Speed 24618.55 samples/sec Loss 1.5496 LearningRate 0.0001 Epoch: 28 Global Step: 50040 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:43:48,770-Speed 24680.51 samples/sec Loss 1.5328 LearningRate 0.0001 Epoch: 28 Global Step: 50050 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:43:58,910-Speed 24240.51 samples/sec Loss 1.5416 LearningRate 0.0001 Epoch: 28 Global Step: 50060 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:44:08,891-Speed 24629.07 samples/sec Loss 1.5377 LearningRate 0.0001 Epoch: 28 Global Step: 50070 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:44:19,002-Speed 24310.41 samples/sec Loss 1.5222 LearningRate 0.0001 Epoch: 28 Global Step: 50080 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:44:29,087-Speed 24371.20 samples/sec Loss 1.5476 LearningRate 0.0001 Epoch: 28 Global Step: 50090 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:44:39,098-Speed 24552.23 samples/sec Loss 1.5396 LearningRate 0.0001 Epoch: 28 Global Step: 50100 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:44:49,284-Speed 24137.81 samples/sec Loss 1.5568 LearningRate 0.0001 Epoch: 28 Global Step: 50110 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:44:59,253-Speed 24655.78 samples/sec Loss 1.5450 LearningRate 0.0001 Epoch: 28 Global Step: 50120 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:45:58,998-Speed 4113.58 samples/sec Loss 1.5272 LearningRate 0.0001 Epoch: 29 Global Step: 50130 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:46:09,063-Speed 24421.63 samples/sec Loss 1.5333 LearningRate 0.0001 Epoch: 29 Global Step: 50140 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-26 12:46:19,015-Speed 24695.60 samples/sec Loss 1.5133 LearningRate 0.0001 Epoch: 29 Global Step: 50150 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:46:28,954-Speed 24733.01 samples/sec Loss 1.5270 LearningRate 0.0001 Epoch: 29 Global Step: 50160 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:46:38,953-Speed 24580.85 samples/sec Loss 1.5468 LearningRate 0.0001 Epoch: 29 Global Step: 50170 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:46:49,007-Speed 24448.24 samples/sec Loss 1.5252 LearningRate 0.0001 Epoch: 29 Global Step: 50180 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:46:59,086-Speed 24387.81 samples/sec Loss 1.5286 LearningRate 0.0001 Epoch: 29 Global Step: 50190 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:47:09,093-Speed 24561.79 samples/sec Loss 1.5280 LearningRate 0.0001 Epoch: 29 Global Step: 50200 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:47:19,153-Speed 24434.06 samples/sec Loss 1.5183 LearningRate 0.0001 Epoch: 29 Global Step: 50210 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:47:29,247-Speed 24351.11 samples/sec Loss 1.5151 LearningRate 0.0001 Epoch: 29 Global Step: 50220 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:47:39,315-Speed 24411.51 samples/sec Loss 1.5147 LearningRate 0.0001 Epoch: 29 Global Step: 50230 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:47:49,304-Speed 24606.45 samples/sec Loss 1.5285 LearningRate 0.0001 Epoch: 29 Global Step: 50240 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:47:59,273-Speed 24656.19 samples/sec Loss 1.5256 LearningRate 0.0001 Epoch: 29 Global Step: 50250 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:48:09,314-Speed 24478.93 samples/sec Loss 1.5314 LearningRate 0.0001 Epoch: 29 Global Step: 50260 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:48:19,439-Speed 24275.20 samples/sec Loss 1.5275 LearningRate 0.0001 Epoch: 29 Global Step: 50270 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:48:29,439-Speed 24579.62 samples/sec Loss 1.5160 LearningRate 0.0001 Epoch: 29 Global Step: 50280 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:48:39,422-Speed 24621.59 samples/sec Loss 1.5210 LearningRate 0.0001 Epoch: 29 Global Step: 50290 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:48:49,487-Speed 24420.67 samples/sec Loss 1.5334 LearningRate 0.0001 Epoch: 29 Global Step: 50300 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:48:59,574-Speed 24365.87 samples/sec Loss 1.5243 LearningRate 0.0001 Epoch: 29 Global Step: 50310 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:49:09,763-Speed 24123.90 samples/sec Loss 1.5244 LearningRate 0.0001 Epoch: 29 Global Step: 50320 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:49:19,751-Speed 24608.26 samples/sec Loss 1.5380 LearningRate 0.0001 Epoch: 29 Global Step: 50330 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:49:29,714-Speed 24671.17 samples/sec Loss 1.5292 LearningRate 0.0001 Epoch: 29 Global Step: 50340 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:49:39,763-Speed 24460.40 samples/sec Loss 1.5302 LearningRate 0.0001 Epoch: 29 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-26 12:49:49,812-Speed 24460.92 samples/sec Loss 1.5271 LearningRate 0.0001 Epoch: 29 Global Step: 50360 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:49:59,967-Speed 24206.04 samples/sec Loss 1.5296 LearningRate 0.0001 Epoch: 29 Global Step: 50370 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:50:09,943-Speed 24638.27 samples/sec Loss 1.5226 LearningRate 0.0001 Epoch: 29 Global Step: 50380 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:50:20,276-Speed 23786.57 samples/sec Loss 1.5195 LearningRate 0.0001 Epoch: 29 Global Step: 50390 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:50:30,235-Speed 24683.45 samples/sec Loss 1.5283 LearningRate 0.0001 Epoch: 29 Global Step: 50400 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:50:40,293-Speed 24436.72 samples/sec Loss 1.5216 LearningRate 0.0001 Epoch: 29 Global Step: 50410 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:50:50,385-Speed 24356.11 samples/sec Loss 1.5232 LearningRate 0.0001 Epoch: 29 Global Step: 50420 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:51:00,423-Speed 24486.08 samples/sec Loss 1.5328 LearningRate 0.0001 Epoch: 29 Global Step: 50430 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:51:10,435-Speed 24549.78 samples/sec Loss 1.5264 LearningRate 0.0001 Epoch: 29 Global Step: 50440 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:51:20,413-Speed 24631.88 samples/sec Loss 1.5222 LearningRate 0.0001 Epoch: 29 Global Step: 50450 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:51:30,371-Speed 24682.17 samples/sec Loss 1.5186 LearningRate 0.0001 Epoch: 29 Global Step: 50460 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:51:40,449-Speed 24390.69 samples/sec Loss 1.5371 LearningRate 0.0001 Epoch: 29 Global Step: 50470 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:51:50,588-Speed 24240.78 samples/sec Loss 1.5249 LearningRate 0.0001 Epoch: 29 Global Step: 50480 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:52:00,650-Speed 24426.65 samples/sec Loss 1.5182 LearningRate 0.0001 Epoch: 29 Global Step: 50490 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:52:10,809-Speed 24196.66 samples/sec Loss 1.5323 LearningRate 0.0001 Epoch: 29 Global Step: 50500 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:52:21,032-Speed 24043.42 samples/sec Loss 1.5233 LearningRate 0.0001 Epoch: 29 Global Step: 50510 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:52:31,167-Speed 24252.42 samples/sec Loss 1.5223 LearningRate 0.0001 Epoch: 29 Global Step: 50520 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:52:41,400-Speed 24020.64 samples/sec Loss 1.5171 LearningRate 0.0001 Epoch: 29 Global Step: 50530 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:52:51,603-Speed 24089.15 samples/sec Loss 1.5280 LearningRate 0.0001 Epoch: 29 Global Step: 50540 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:53:01,715-Speed 24314.42 samples/sec Loss 1.5264 LearningRate 0.0001 Epoch: 29 Global Step: 50550 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:53:11,730-Speed 24541.97 samples/sec Loss 1.5308 LearningRate 0.0001 Epoch: 29 Global Step: 50560 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:53:21,765-Speed 24493.00 samples/sec Loss 1.5295 LearningRate 0.0001 Epoch: 29 Global Step: 50570 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:53:31,960-Speed 24108.99 samples/sec Loss 1.5322 LearningRate 0.0001 Epoch: 29 Global Step: 50580 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:53:42,014-Speed 24446.16 samples/sec Loss 1.5137 LearningRate 0.0001 Epoch: 29 Global Step: 50590 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:53:52,192-Speed 24150.39 samples/sec Loss 1.5222 LearningRate 0.0001 Epoch: 29 Global Step: 50600 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:54:02,242-Speed 24455.32 samples/sec Loss 1.5213 LearningRate 0.0001 Epoch: 29 Global Step: 50610 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:54:12,362-Speed 24289.71 samples/sec Loss 1.5238 LearningRate 0.0001 Epoch: 29 Global Step: 50620 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:54:22,505-Speed 24232.85 samples/sec Loss 1.5173 LearningRate 0.0001 Epoch: 29 Global Step: 50630 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:54:32,555-Speed 24460.92 samples/sec Loss 1.5099 LearningRate 0.0001 Epoch: 29 Global Step: 50640 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:54:42,537-Speed 24625.30 samples/sec Loss 1.5167 LearningRate 0.0001 Epoch: 29 Global Step: 50650 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:54:52,455-Speed 24784.34 samples/sec Loss 1.5131 LearningRate 0.0001 Epoch: 29 Global Step: 50660 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:55:02,464-Speed 24559.50 samples/sec Loss 1.5132 LearningRate 0.0001 Epoch: 29 Global Step: 50670 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:55:12,407-Speed 24718.63 samples/sec Loss 1.5222 LearningRate 0.0001 Epoch: 29 Global Step: 50680 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:55:22,285-Speed 24885.38 samples/sec Loss 1.5212 LearningRate 0.0001 Epoch: 29 Global Step: 50690 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:55:32,120-Speed 25000.31 samples/sec Loss 1.5092 LearningRate 0.0001 Epoch: 29 Global Step: 50700 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:55:41,906-Speed 25117.48 samples/sec Loss 1.5285 LearningRate 0.0001 Epoch: 29 Global Step: 50710 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:55:51,799-Speed 24847.17 samples/sec Loss 1.5154 LearningRate 0.0001 Epoch: 29 Global Step: 50720 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:56:01,678-Speed 24879.63 samples/sec Loss 1.5224 LearningRate 0.0001 Epoch: 29 Global Step: 50730 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:56:11,486-Speed 25061.29 samples/sec Loss 1.5138 LearningRate 0.0001 Epoch: 29 Global Step: 50740 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:56:21,253-Speed 25170.53 samples/sec Loss 1.5110 LearningRate 0.0001 Epoch: 29 Global Step: 50750 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:56:30,975-Speed 25283.64 samples/sec Loss 1.5142 LearningRate 0.0001 Epoch: 29 Global Step: 50760 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:56:40,819-Speed 24968.93 samples/sec Loss 1.5149 LearningRate 0.0001 Epoch: 29 Global Step: 50770 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:56:50,643-Speed 25021.97 samples/sec Loss 1.5228 LearningRate 0.0001 Epoch: 29 Global Step: 50780 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:57:00,415-Speed 25152.33 samples/sec Loss 1.5178 LearningRate 0.0001 Epoch: 29 Global Step: 50790 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:57:10,255-Speed 24978.95 samples/sec Loss 1.5142 LearningRate 0.0001 Epoch: 29 Global Step: 50800 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:57:19,989-Speed 25249.87 samples/sec Loss 1.5180 LearningRate 0.0001 Epoch: 29 Global Step: 50810 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:57:29,771-Speed 25127.23 samples/sec Loss 1.5213 LearningRate 0.0001 Epoch: 29 Global Step: 50820 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:57:39,553-Speed 25125.57 samples/sec Loss 1.5203 LearningRate 0.0001 Epoch: 29 Global Step: 50830 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:57:49,393-Speed 24979.15 samples/sec Loss 1.4959 LearningRate 0.0001 Epoch: 29 Global Step: 50840 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:57:59,201-Speed 25061.35 samples/sec Loss 1.5105 LearningRate 0.0001 Epoch: 29 Global Step: 50850 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:58:08,961-Speed 25184.49 samples/sec Loss 1.5167 LearningRate 0.0001 Epoch: 29 Global Step: 50860 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:58:18,830-Speed 24906.77 samples/sec Loss 1.5176 LearningRate 0.0001 Epoch: 29 Global Step: 50870 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:58:28,580-Speed 25212.13 samples/sec Loss 1.5100 LearningRate 0.0001 Epoch: 29 Global Step: 50880 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:58:38,410-Speed 25005.55 samples/sec Loss 1.5110 LearningRate 0.0001 Epoch: 29 Global Step: 50890 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:58:48,226-Speed 25040.69 samples/sec Loss 1.5156 LearningRate 0.0001 Epoch: 29 Global Step: 50900 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:58:57,928-Speed 25335.47 samples/sec Loss 1.5069 LearningRate 0.0001 Epoch: 29 Global Step: 50910 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:59:07,763-Speed 24990.96 samples/sec Loss 1.5173 LearningRate 0.0001 Epoch: 29 Global Step: 50920 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:59:17,469-Speed 25324.58 samples/sec Loss 1.5060 LearningRate 0.0001 Epoch: 29 Global Step: 50930 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:59:27,239-Speed 25161.82 samples/sec Loss 1.4971 LearningRate 0.0001 Epoch: 29 Global Step: 50940 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:59:36,968-Speed 25263.19 samples/sec Loss 1.5175 LearningRate 0.0001 Epoch: 29 Global Step: 50950 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 12:59:46,701-Speed 25255.57 samples/sec Loss 1.5021 LearningRate 0.0001 Epoch: 29 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-26 12:59:56,446-Speed 25222.96 samples/sec Loss 1.5133 LearningRate 0.0001 Epoch: 29 Global Step: 50970 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:00:06,204-Speed 25189.56 samples/sec Loss 1.5151 LearningRate 0.0001 Epoch: 29 Global Step: 50980 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:00:16,040-Speed 24990.18 samples/sec Loss 1.5121 LearningRate 0.0001 Epoch: 29 Global Step: 50990 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:00:25,760-Speed 25286.61 samples/sec Loss 1.5164 LearningRate 0.0001 Epoch: 29 Global Step: 51000 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:00:35,572-Speed 25051.28 samples/sec Loss 1.5074 LearningRate 0.0001 Epoch: 29 Global Step: 51010 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:00:45,446-Speed 24891.21 samples/sec Loss 1.5056 LearningRate 0.0001 Epoch: 29 Global Step: 51020 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:00:55,214-Speed 25166.19 samples/sec Loss 1.4991 LearningRate 0.0001 Epoch: 29 Global Step: 51030 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:01:05,119-Speed 24814.22 samples/sec Loss 1.5142 LearningRate 0.0001 Epoch: 29 Global Step: 51040 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:01:14,937-Speed 25033.06 samples/sec Loss 1.5079 LearningRate 0.0001 Epoch: 29 Global Step: 51050 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:01:24,729-Speed 25105.65 samples/sec Loss 1.5149 LearningRate 0.0001 Epoch: 29 Global Step: 51060 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:01:34,568-Speed 24984.52 samples/sec Loss 1.5048 LearningRate 0.0001 Epoch: 29 Global Step: 51070 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:01:44,329-Speed 25180.36 samples/sec Loss 1.5026 LearningRate 0.0001 Epoch: 29 Global Step: 51080 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:01:54,086-Speed 25196.19 samples/sec Loss 1.4920 LearningRate 0.0001 Epoch: 29 Global Step: 51090 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:02:03,823-Speed 25246.25 samples/sec Loss 1.5023 LearningRate 0.0001 Epoch: 29 Global Step: 51100 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:02:13,568-Speed 25219.77 samples/sec Loss 1.5102 LearningRate 0.0001 Epoch: 29 Global Step: 51110 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:02:23,328-Speed 25182.93 samples/sec Loss 1.5040 LearningRate 0.0001 Epoch: 29 Global Step: 51120 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:02:33,073-Speed 25223.16 samples/sec Loss 1.5072 LearningRate 0.0001 Epoch: 29 Global Step: 51130 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:02:43,020-Speed 24710.71 samples/sec Loss 1.4996 LearningRate 0.0001 Epoch: 29 Global Step: 51140 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:02:52,740-Speed 25285.82 samples/sec Loss 1.5129 LearningRate 0.0001 Epoch: 29 Global Step: 51150 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:03:02,509-Speed 25162.29 samples/sec Loss 1.4999 LearningRate 0.0001 Epoch: 29 Global Step: 51160 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:03:12,243-Speed 25251.30 samples/sec Loss 1.5122 LearningRate 0.0001 Epoch: 29 Global Step: 51170 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:03:21,936-Speed 25358.55 samples/sec Loss 1.4987 LearningRate 0.0001 Epoch: 29 Global Step: 51180 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:03:31,693-Speed 25190.87 samples/sec Loss 1.4953 LearningRate 0.0001 Epoch: 29 Global Step: 51190 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:03:41,531-Speed 24983.48 samples/sec Loss 1.5076 LearningRate 0.0001 Epoch: 29 Global Step: 51200 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:03:51,310-Speed 25136.33 samples/sec Loss 1.5077 LearningRate 0.0001 Epoch: 29 Global Step: 51210 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:04:01,158-Speed 24959.34 samples/sec Loss 1.5118 LearningRate 0.0001 Epoch: 29 Global Step: 51220 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:04:10,973-Speed 25041.66 samples/sec Loss 1.5043 LearningRate 0.0001 Epoch: 29 Global Step: 51230 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:04:20,819-Speed 24965.95 samples/sec Loss 1.4994 LearningRate 0.0001 Epoch: 29 Global Step: 51240 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:04:30,520-Speed 25336.55 samples/sec Loss 1.5016 LearningRate 0.0001 Epoch: 29 Global Step: 51250 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:04:40,236-Speed 25296.90 samples/sec Loss 1.5056 LearningRate 0.0001 Epoch: 29 Global Step: 51260 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:04:49,957-Speed 25286.65 samples/sec Loss 1.4975 LearningRate 0.0001 Epoch: 29 Global Step: 51270 Fp16 Grad Scale: 16384 Required: 5 hours
Training: 2022-03-26 13:04:59,706-Speed 25210.51 samples/sec Loss 1.5046 LearningRate 0.0001 Epoch: 29 Global Step: 51280 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:05:09,484-Speed 25140.68 samples/sec Loss 1.5044 LearningRate 0.0001 Epoch: 29 Global Step: 51290 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:05:19,229-Speed 25222.81 samples/sec Loss 1.4994 LearningRate 0.0001 Epoch: 29 Global Step: 51300 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:05:28,983-Speed 25202.44 samples/sec Loss 1.5090 LearningRate 0.0001 Epoch: 29 Global Step: 51310 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:05:38,730-Speed 25218.04 samples/sec Loss 1.4949 LearningRate 0.0001 Epoch: 29 Global Step: 51320 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:05:48,471-Speed 25232.42 samples/sec Loss 1.4935 LearningRate 0.0001 Epoch: 29 Global Step: 51330 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:05:58,213-Speed 25230.55 samples/sec Loss 1.4966 LearningRate 0.0001 Epoch: 29 Global Step: 51340 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:06:08,016-Speed 25074.18 samples/sec Loss 1.5045 LearningRate 0.0001 Epoch: 29 Global Step: 51350 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:06:17,790-Speed 25148.26 samples/sec Loss 1.5005 LearningRate 0.0001 Epoch: 29 Global Step: 51360 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:06:27,508-Speed 25290.61 samples/sec Loss 1.4918 LearningRate 0.0001 Epoch: 29 Global Step: 51370 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:06:37,263-Speed 25195.70 samples/sec Loss 1.4906 LearningRate 0.0001 Epoch: 29 Global Step: 51380 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:06:47,103-Speed 24980.38 samples/sec Loss 1.4885 LearningRate 0.0001 Epoch: 29 Global Step: 51390 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:06:56,851-Speed 25214.67 samples/sec Loss 1.4989 LearningRate 0.0001 Epoch: 29 Global Step: 51400 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:07:06,717-Speed 24914.87 samples/sec Loss 1.4918 LearningRate 0.0001 Epoch: 29 Global Step: 51410 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:07:16,448-Speed 25259.63 samples/sec Loss 1.4984 LearningRate 0.0001 Epoch: 29 Global Step: 51420 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:07:26,215-Speed 25167.27 samples/sec Loss 1.4984 LearningRate 0.0001 Epoch: 29 Global Step: 51430 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:07:35,991-Speed 25145.22 samples/sec Loss 1.4846 LearningRate 0.0001 Epoch: 29 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:07:45,735-Speed 25226.33 samples/sec Loss 1.4898 LearningRate 0.0001 Epoch: 29 Global Step: 51450 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:07:55,510-Speed 25143.32 samples/sec Loss 1.5113 LearningRate 0.0001 Epoch: 29 Global Step: 51460 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:08:05,182-Speed 25414.70 samples/sec Loss 1.4948 LearningRate 0.0001 Epoch: 29 Global Step: 51470 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:08:14,862-Speed 25389.24 samples/sec Loss 1.4896 LearningRate 0.0001 Epoch: 29 Global Step: 51480 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:08:24,594-Speed 25260.20 samples/sec Loss 1.4930 LearningRate 0.0001 Epoch: 29 Global Step: 51490 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:08:34,468-Speed 24893.13 samples/sec Loss 1.4890 LearningRate 0.0001 Epoch: 29 Global Step: 51500 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:08:44,237-Speed 25159.74 samples/sec Loss 1.5067 LearningRate 0.0001 Epoch: 29 Global Step: 51510 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:08:53,948-Speed 25311.17 samples/sec Loss 1.5017 LearningRate 0.0001 Epoch: 29 Global Step: 51520 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:09:03,802-Speed 24942.34 samples/sec Loss 1.4941 LearningRate 0.0001 Epoch: 29 Global Step: 51530 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:09:13,484-Speed 25388.32 samples/sec Loss 1.4908 LearningRate 0.0001 Epoch: 29 Global Step: 51540 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:09:23,261-Speed 25138.57 samples/sec Loss 1.4987 LearningRate 0.0001 Epoch: 29 Global Step: 51550 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:09:32,987-Speed 25272.63 samples/sec Loss 1.5005 LearningRate 0.0001 Epoch: 29 Global Step: 51560 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:09:42,731-Speed 25224.72 samples/sec Loss 1.4867 LearningRate 0.0001 Epoch: 29 Global Step: 51570 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:09:52,522-Speed 25103.69 samples/sec Loss 1.4951 LearningRate 0.0001 Epoch: 29 Global Step: 51580 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:10:02,269-Speed 25217.63 samples/sec Loss 1.5002 LearningRate 0.0001 Epoch: 29 Global Step: 51590 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:10:12,036-Speed 25166.78 samples/sec Loss 1.4840 LearningRate 0.0001 Epoch: 29 Global Step: 51600 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:10:21,797-Speed 25180.11 samples/sec Loss 1.4946 LearningRate 0.0001 Epoch: 29 Global Step: 51610 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:10:31,559-Speed 25179.14 samples/sec Loss 1.4979 LearningRate 0.0001 Epoch: 29 Global Step: 51620 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:10:41,304-Speed 25222.21 samples/sec Loss 1.4928 LearningRate 0.0001 Epoch: 29 Global Step: 51630 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:10:51,065-Speed 25179.87 samples/sec Loss 1.4932 LearningRate 0.0001 Epoch: 29 Global Step: 51640 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:11:00,823-Speed 25191.04 samples/sec Loss 1.4829 LearningRate 0.0001 Epoch: 29 Global Step: 51650 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:11:10,562-Speed 25238.63 samples/sec Loss 1.5028 LearningRate 0.0001 Epoch: 29 Global Step: 51660 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:11:20,270-Speed 25317.56 samples/sec Loss 1.4865 LearningRate 0.0001 Epoch: 29 Global Step: 51670 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:11:30,139-Speed 24906.51 samples/sec Loss 1.4915 LearningRate 0.0001 Epoch: 29 Global Step: 51680 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-03-26 13:11:39,855-Speed 25296.98 samples/sec Loss 1.4997 LearningRate 0.0001 Epoch: 29 Global Step: 51690 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:11:49,638-Speed 25126.15 samples/sec Loss 1.4902 LearningRate 0.0001 Epoch: 29 Global Step: 51700 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:11:59,485-Speed 24966.98 samples/sec Loss 1.5029 LearningRate 0.0001 Epoch: 29 Global Step: 51710 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:12:09,240-Speed 25200.29 samples/sec Loss 1.5078 LearningRate 0.0001 Epoch: 29 Global Step: 51720 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:12:18,986-Speed 25220.87 samples/sec Loss 1.4870 LearningRate 0.0001 Epoch: 29 Global Step: 51730 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:12:28,859-Speed 24897.13 samples/sec Loss 1.4939 LearningRate 0.0001 Epoch: 29 Global Step: 51740 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:12:38,660-Speed 25079.58 samples/sec Loss 1.4944 LearningRate 0.0001 Epoch: 29 Global Step: 51750 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:12:48,533-Speed 24893.22 samples/sec Loss 1.4965 LearningRate 0.0001 Epoch: 29 Global Step: 51760 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:12:58,300-Speed 25167.33 samples/sec Loss 1.4864 LearningRate 0.0001 Epoch: 29 Global Step: 51770 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:13:08,056-Speed 25192.34 samples/sec Loss 1.5059 LearningRate 0.0001 Epoch: 29 Global Step: 51780 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:13:17,734-Speed 25396.70 samples/sec Loss 1.4820 LearningRate 0.0001 Epoch: 29 Global Step: 51790 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:13:27,497-Speed 25176.74 samples/sec Loss 1.4955 LearningRate 0.0001 Epoch: 29 Global Step: 51800 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:13:37,273-Speed 25144.26 samples/sec Loss 1.4918 LearningRate 0.0001 Epoch: 29 Global Step: 51810 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:13:47,022-Speed 25212.54 samples/sec Loss 1.5001 LearningRate 0.0001 Epoch: 29 Global Step: 51820 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:13:56,786-Speed 25173.00 samples/sec Loss 1.4929 LearningRate 0.0001 Epoch: 29 Global Step: 51830 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:14:06,521-Speed 25252.25 samples/sec Loss 1.4877 LearningRate 0.0001 Epoch: 29 Global Step: 51840 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:15:06,020-Speed 4130.55 samples/sec Loss 1.4964 LearningRate 0.0001 Epoch: 30 Global Step: 51850 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:15:15,719-Speed 25342.26 samples/sec Loss 1.4801 LearningRate 0.0001 Epoch: 30 Global Step: 51860 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:15:25,535-Speed 25042.81 samples/sec Loss 1.4875 LearningRate 0.0001 Epoch: 30 Global Step: 51870 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:15:35,346-Speed 25056.32 samples/sec Loss 1.4809 LearningRate 0.0001 Epoch: 30 Global Step: 51880 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:15:45,046-Speed 25342.62 samples/sec Loss 1.4813 LearningRate 0.0001 Epoch: 30 Global Step: 51890 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:15:54,880-Speed 24993.17 samples/sec Loss 1.4706 LearningRate 0.0001 Epoch: 30 Global Step: 51900 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:16:04,571-Speed 25363.33 samples/sec Loss 1.4827 LearningRate 0.0001 Epoch: 30 Global Step: 51910 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:16:14,287-Speed 25297.78 samples/sec Loss 1.4851 LearningRate 0.0001 Epoch: 30 Global Step: 51920 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:16:23,995-Speed 25317.10 samples/sec Loss 1.4832 LearningRate 0.0001 Epoch: 30 Global Step: 51930 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:16:33,697-Speed 25340.49 samples/sec Loss 1.4806 LearningRate 0.0001 Epoch: 30 Global Step: 51940 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:16:43,499-Speed 25085.27 samples/sec Loss 1.4880 LearningRate 0.0001 Epoch: 30 Global Step: 51950 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:16:53,291-Speed 25100.06 samples/sec Loss 1.4738 LearningRate 0.0001 Epoch: 30 Global Step: 51960 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:17:03,058-Speed 25165.54 samples/sec Loss 1.4821 LearningRate 0.0001 Epoch: 30 Global Step: 51970 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:17:12,891-Speed 24999.32 samples/sec Loss 1.4746 LearningRate 0.0001 Epoch: 30 Global Step: 51980 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:17:22,544-Speed 25461.95 samples/sec Loss 1.4768 LearningRate 0.0001 Epoch: 30 Global Step: 51990 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:17:32,282-Speed 25240.71 samples/sec Loss 1.4677 LearningRate 0.0001 Epoch: 30 Global Step: 52000 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:17:41,996-Speed 25306.96 samples/sec Loss 1.4789 LearningRate 0.0001 Epoch: 30 Global Step: 52010 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:17:51,653-Speed 25453.96 samples/sec Loss 1.4841 LearningRate 0.0001 Epoch: 30 Global Step: 52020 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:18:01,388-Speed 25247.09 samples/sec Loss 1.4809 LearningRate 0.0001 Epoch: 30 Global Step: 52030 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:18:11,177-Speed 25111.70 samples/sec Loss 1.4800 LearningRate 0.0001 Epoch: 30 Global Step: 52040 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:18:20,967-Speed 25107.30 samples/sec Loss 1.4802 LearningRate 0.0001 Epoch: 30 Global Step: 52050 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:18:30,796-Speed 25006.83 samples/sec Loss 1.4761 LearningRate 0.0001 Epoch: 30 Global Step: 52060 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:18:40,506-Speed 25312.10 samples/sec Loss 1.4846 LearningRate 0.0001 Epoch: 30 Global Step: 52070 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:18:50,292-Speed 25119.26 samples/sec Loss 1.4773 LearningRate 0.0001 Epoch: 30 Global Step: 52080 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:19:00,022-Speed 25260.70 samples/sec Loss 1.4855 LearningRate 0.0001 Epoch: 30 Global Step: 52090 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:19:09,760-Speed 25239.78 samples/sec Loss 1.4795 LearningRate 0.0001 Epoch: 30 Global Step: 52100 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:19:19,582-Speed 25024.48 samples/sec Loss 1.4786 LearningRate 0.0001 Epoch: 30 Global Step: 52110 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:19:29,408-Speed 25013.61 samples/sec Loss 1.4883 LearningRate 0.0001 Epoch: 30 Global Step: 52120 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:19:39,186-Speed 25144.15 samples/sec Loss 1.4782 LearningRate 0.0001 Epoch: 30 Global Step: 52130 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:19:48,998-Speed 25051.23 samples/sec Loss 1.4762 LearningRate 0.0001 Epoch: 30 Global Step: 52140 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:19:58,690-Speed 25361.28 samples/sec Loss 1.4848 LearningRate 0.0001 Epoch: 30 Global Step: 52150 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:20:08,481-Speed 25106.82 samples/sec Loss 1.4727 LearningRate 0.0001 Epoch: 30 Global Step: 52160 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-03-26 13:20:18,230-Speed 25211.61 samples/sec Loss 1.4711 LearningRate 0.0001 Epoch: 30
Global Step: 52170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:20:27,979-Speed 25212.22 samples/sec Loss 1.4771 LearningRate 0.0001 Epoch: 30 Global Step: 52180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:20:37,755-Speed 25144.87 samples/sec Loss 1.4765 LearningRate 0.0001 Epoch: 30 Global Step: 52190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:20:47,623-Speed 24910.76 samples/sec Loss 1.4793 LearningRate 0.0001 Epoch: 30 Global Step: 52200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:20:57,349-Speed 25269.81 samples/sec Loss 1.4888 LearningRate 0.0001 Epoch: 30 Global Step: 52210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:21:07,097-Speed 25213.85 samples/sec Loss 1.4818 LearningRate 0.0001 Epoch: 30 Global Step: 52220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:21:16,790-Speed 25360.73 samples/sec Loss 1.4811 LearningRate 0.0001 Epoch: 30 Global Step: 52230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:21:26,550-Speed 25185.85 samples/sec Loss 1.4822 LearningRate 0.0001 Epoch: 30 Global Step: 52240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:21:36,319-Speed 25158.50 samples/sec Loss 1.4848 LearningRate 0.0001 Epoch: 30 Global Step: 52250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:21:45,981-Speed 25439.59 samples/sec Loss 1.4755 LearningRate 0.0001 Epoch: 30 Global Step: 52260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:21:55,774-Speed 25098.97 samples/sec Loss 1.4713 LearningRate 0.0001 Epoch: 30 Global Step: 52270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:22:05,582-Speed 25060.60 samples/sec Loss 1.4772 LearningRate 0.0001 Epoch: 30 Global Step: 52280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:22:15,371-Speed 25108.68 samples/sec Loss 1.4705 LearningRate 0.0001 Epoch: 30 Global Step: 52290 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-03-26 13:22:25,127-Speed 25196.31 samples/sec Loss 1.4783 LearningRate 0.0001 Epoch: 30 Global Step: 52300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:22:34,934-Speed 25064.36 samples/sec Loss 1.4838 LearningRate 0.0001 Epoch: 30 Global Step: 52310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:22:44,726-Speed 25106.64 samples/sec Loss 1.4773 LearningRate 0.0001 Epoch: 30 Global Step: 52320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:22:54,478-Speed 25204.95 samples/sec Loss 1.4702 LearningRate 0.0001 Epoch: 30 Global Step: 52330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:23:04,202-Speed 25280.13 samples/sec Loss 1.4810 LearningRate 0.0001 Epoch: 30 Global Step: 52340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:23:14,025-Speed 25023.19 samples/sec Loss 1.4584 LearningRate 0.0001 Epoch: 30 Global Step: 52350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:23:23,825-Speed 25080.37 samples/sec Loss 1.4776 LearningRate 0.0001 Epoch: 30 Global Step: 52360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:23:33,572-Speed 25218.06 samples/sec Loss 1.4712 LearningRate 0.0001 Epoch: 30 Global Step: 52370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:23:43,277-Speed 25327.34 samples/sec Loss 1.4781 LearningRate 0.0001 Epoch: 30 Global Step: 52380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:23:53,163-Speed 24866.93 samples/sec Loss 1.4770 LearningRate 0.0001 Epoch: 30 Global Step: 52390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:24:02,869-Speed 25332.30 samples/sec Loss 1.4707 LearningRate 0.0001 Epoch: 30 Global Step: 52400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:24:12,610-Speed 25234.74 samples/sec Loss 1.4776 LearningRate 0.0001 Epoch: 30 Global Step: 52410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 
13:24:22,331-Speed 25284.36 samples/sec Loss 1.4743 LearningRate 0.0001 Epoch: 30 Global Step: 52420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:24:32,106-Speed 25146.34 samples/sec Loss 1.4766 LearningRate 0.0001 Epoch: 30 Global Step: 52430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:24:41,898-Speed 25104.48 samples/sec Loss 1.4702 LearningRate 0.0001 Epoch: 30 Global Step: 52440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:24:51,658-Speed 25182.25 samples/sec Loss 1.4698 LearningRate 0.0001 Epoch: 30 Global Step: 52450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:25:01,422-Speed 25173.84 samples/sec Loss 1.4682 LearningRate 0.0001 Epoch: 30 Global Step: 52460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:25:11,167-Speed 25223.80 samples/sec Loss 1.4727 LearningRate 0.0001 Epoch: 30 Global Step: 52470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:25:20,907-Speed 25235.58 samples/sec Loss 1.4769 LearningRate 0.0001 Epoch: 30 Global Step: 52480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:25:30,708-Speed 25076.60 samples/sec Loss 1.4778 LearningRate 0.0001 Epoch: 30 Global Step: 52490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:25:40,432-Speed 25276.50 samples/sec Loss 1.4759 LearningRate 0.0001 Epoch: 30 Global Step: 52500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:25:50,232-Speed 25084.82 samples/sec Loss 1.4740 LearningRate 0.0001 Epoch: 30 Global Step: 52510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:25:59,947-Speed 25301.04 samples/sec Loss 1.4709 LearningRate 0.0001 Epoch: 30 Global Step: 52520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:26:09,691-Speed 25225.17 samples/sec Loss 1.4751 LearningRate 0.0001 Epoch: 30 Global Step: 52530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:26:19,512-Speed 25026.11 samples/sec 
Loss 1.4786 LearningRate 0.0001 Epoch: 30 Global Step: 52540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:26:29,219-Speed 25319.95 samples/sec Loss 1.4680 LearningRate 0.0001 Epoch: 30 Global Step: 52550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:26:39,026-Speed 25065.49 samples/sec Loss 1.4556 LearningRate 0.0001 Epoch: 30 Global Step: 52560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:26:48,749-Speed 25278.05 samples/sec Loss 1.4670 LearningRate 0.0001 Epoch: 30 Global Step: 52570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:26:58,458-Speed 25315.22 samples/sec Loss 1.4729 LearningRate 0.0001 Epoch: 30 Global Step: 52580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:27:08,312-Speed 24944.34 samples/sec Loss 1.4632 LearningRate 0.0001 Epoch: 30 Global Step: 52590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:27:18,139-Speed 25012.19 samples/sec Loss 1.4694 LearningRate 0.0001 Epoch: 30 Global Step: 52600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:27:28,078-Speed 24730.57 samples/sec Loss 1.4717 LearningRate 0.0001 Epoch: 30 Global Step: 52610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:27:37,818-Speed 25234.61 samples/sec Loss 1.4666 LearningRate 0.0001 Epoch: 30 Global Step: 52620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:27:47,898-Speed 24386.03 samples/sec Loss 1.4756 LearningRate 0.0001 Epoch: 30 Global Step: 52630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:27:57,989-Speed 24358.89 samples/sec Loss 1.4681 LearningRate 0.0001 Epoch: 30 Global Step: 52640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:28:08,060-Speed 24407.59 samples/sec Loss 1.4674 LearningRate 0.0001 Epoch: 30 Global Step: 52650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:28:18,178-Speed 24291.51 samples/sec Loss 1.4704 LearningRate 0.0001 Epoch: 30 
Global Step: 52660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:28:28,291-Speed 24305.34 samples/sec Loss 1.4666 LearningRate 0.0001 Epoch: 30 Global Step: 52670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:28:38,426-Speed 24250.89 samples/sec Loss 1.4616 LearningRate 0.0001 Epoch: 30 Global Step: 52680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:28:48,499-Speed 24402.22 samples/sec Loss 1.4639 LearningRate 0.0001 Epoch: 30 Global Step: 52690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:28:58,592-Speed 24352.61 samples/sec Loss 1.4684 LearningRate 0.0001 Epoch: 30 Global Step: 52700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:29:08,757-Speed 24182.51 samples/sec Loss 1.4547 LearningRate 0.0001 Epoch: 30 Global Step: 52710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:29:18,824-Speed 24414.60 samples/sec Loss 1.4483 LearningRate 0.0001 Epoch: 30 Global Step: 52720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:29:28,960-Speed 24250.14 samples/sec Loss 1.4706 LearningRate 0.0001 Epoch: 30 Global Step: 52730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:29:39,037-Speed 24390.24 samples/sec Loss 1.4690 LearningRate 0.0001 Epoch: 30 Global Step: 52740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:29:49,120-Speed 24377.19 samples/sec Loss 1.4537 LearningRate 0.0001 Epoch: 30 Global Step: 52750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:29:59,195-Speed 24396.03 samples/sec Loss 1.4597 LearningRate 0.0001 Epoch: 30 Global Step: 52760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:30:09,266-Speed 24406.77 samples/sec Loss 1.4662 LearningRate 0.0001 Epoch: 30 Global Step: 52770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:30:19,324-Speed 24438.26 samples/sec Loss 1.4683 LearningRate 0.0001 Epoch: 30 Global Step: 52780 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-03-26 13:30:29,420-Speed 24346.40 samples/sec Loss 1.4660 LearningRate 0.0001 Epoch: 30 Global Step: 52790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:30:39,507-Speed 24365.57 samples/sec Loss 1.4654 LearningRate 0.0001 Epoch: 30 Global Step: 52800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:30:49,546-Speed 24485.59 samples/sec Loss 1.4505 LearningRate 0.0001 Epoch: 30 Global Step: 52810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:30:59,595-Speed 24459.15 samples/sec Loss 1.4610 LearningRate 0.0001 Epoch: 30 Global Step: 52820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:31:09,664-Speed 24409.02 samples/sec Loss 1.4621 LearningRate 0.0001 Epoch: 30 Global Step: 52830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:31:19,785-Speed 24285.54 samples/sec Loss 1.4640 LearningRate 0.0001 Epoch: 30 Global Step: 52840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:31:29,840-Speed 24445.29 samples/sec Loss 1.4691 LearningRate 0.0001 Epoch: 30 Global Step: 52850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:31:39,928-Speed 24366.00 samples/sec Loss 1.4530 LearningRate 0.0001 Epoch: 30 Global Step: 52860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:31:50,025-Speed 24342.11 samples/sec Loss 1.4652 LearningRate 0.0001 Epoch: 30 Global Step: 52870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:32:00,069-Speed 24472.04 samples/sec Loss 1.4621 LearningRate 0.0001 Epoch: 30 Global Step: 52880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:32:10,171-Speed 24331.45 samples/sec Loss 1.4533 LearningRate 0.0001 Epoch: 30 Global Step: 52890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:32:20,253-Speed 24378.57 samples/sec Loss 1.4637 LearningRate 0.0001 Epoch: 30 Global Step: 52900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 
13:32:30,288-Speed 24494.40 samples/sec Loss 1.4604 LearningRate 0.0001 Epoch: 30 Global Step: 52910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:32:40,390-Speed 24331.84 samples/sec Loss 1.4665 LearningRate 0.0001 Epoch: 30 Global Step: 52920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:32:50,477-Speed 24366.90 samples/sec Loss 1.4672 LearningRate 0.0001 Epoch: 30 Global Step: 52930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:33:00,553-Speed 24398.79 samples/sec Loss 1.4733 LearningRate 0.0001 Epoch: 30 Global Step: 52940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:33:10,658-Speed 24324.05 samples/sec Loss 1.4546 LearningRate 0.0001 Epoch: 30 Global Step: 52950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:33:20,738-Speed 24386.56 samples/sec Loss 1.4608 LearningRate 0.0001 Epoch: 30 Global Step: 52960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:33:30,839-Speed 24332.76 samples/sec Loss 1.4540 LearningRate 0.0001 Epoch: 30 Global Step: 52970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:33:40,929-Speed 24364.86 samples/sec Loss 1.4548 LearningRate 0.0001 Epoch: 30 Global Step: 52980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:33:51,122-Speed 24120.13 samples/sec Loss 1.4603 LearningRate 0.0001 Epoch: 30 Global Step: 52990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:34:01,254-Speed 24258.44 samples/sec Loss 1.4591 LearningRate 0.0001 Epoch: 30 Global Step: 53000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:34:11,332-Speed 24388.60 samples/sec Loss 1.4556 LearningRate 0.0001 Epoch: 30 Global Step: 53010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:34:21,549-Speed 24058.61 samples/sec Loss 1.4522 LearningRate 0.0001 Epoch: 30 Global Step: 53020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-26 13:34:31,656-Speed 24318.28 samples/sec 
Loss 1.4656 LearningRate 0.0001 Epoch: 30 Global Step: 53030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:34:41,704-Speed 24461.43 samples/sec Loss 1.4644 LearningRate 0.0001 Epoch: 30 Global Step: 53040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:34:51,825-Speed 24286.21 samples/sec Loss 1.4592 LearningRate 0.0001 Epoch: 30 Global Step: 53050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:35:01,725-Speed 24828.30 samples/sec Loss 1.4575 LearningRate 0.0001 Epoch: 30 Global Step: 53060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:35:11,585-Speed 24927.15 samples/sec Loss 1.4518 LearningRate 0.0001 Epoch: 30 Global Step: 53070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:35:21,471-Speed 24864.17 samples/sec Loss 1.4560 LearningRate 0.0001 Epoch: 30 Global Step: 53080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:35:31,315-Speed 24970.94 samples/sec Loss 1.4552 LearningRate 0.0001 Epoch: 30 Global Step: 53090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:35:41,087-Speed 25151.34 samples/sec Loss 1.4529 LearningRate 0.0001 Epoch: 30 Global Step: 53100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:35:50,868-Speed 25132.49 samples/sec Loss 1.4614 LearningRate 0.0001 Epoch: 30 Global Step: 53110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:36:00,630-Speed 25179.70 samples/sec Loss 1.4515 LearningRate 0.0001 Epoch: 30 Global Step: 53120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:36:10,332-Speed 25332.55 samples/sec Loss 1.4636 LearningRate 0.0001 Epoch: 30 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-26 13:36:20,271-Speed 24731.06 samples/sec Loss 1.4468 LearningRate 0.0001 Epoch: 30 Global Step: 53140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:36:30,046-Speed 25147.30 samples/sec Loss 1.4524 LearningRate 0.0001 Epoch: 30 
Global Step: 53150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:36:39,876-Speed 25002.51 samples/sec Loss 1.4557 LearningRate 0.0001 Epoch: 30 Global Step: 53160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:36:49,732-Speed 24937.50 samples/sec Loss 1.4627 LearningRate 0.0001 Epoch: 30 Global Step: 53170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-26 13:36:59,516-Speed 25122.34 samples/sec Loss 1.4487 LearningRate 0.0001 Epoch: 30 Global Step: 53180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:37:09,311-Speed 25095.26 samples/sec Loss 1.4473 LearningRate 0.0001 Epoch: 30 Global Step: 53190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:37:19,129-Speed 25034.13 samples/sec Loss 1.4555 LearningRate 0.0001 Epoch: 30 Global Step: 53200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:37:28,929-Speed 25084.64 samples/sec Loss 1.4498 LearningRate 0.0001 Epoch: 30 Global Step: 53210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:37:38,616-Speed 25372.53 samples/sec Loss 1.4518 LearningRate 0.0001 Epoch: 30 Global Step: 53220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:37:48,332-Speed 25296.45 samples/sec Loss 1.4496 LearningRate 0.0001 Epoch: 30 Global Step: 53230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:37:58,211-Speed 24881.67 samples/sec Loss 1.4595 LearningRate 0.0001 Epoch: 30 Global Step: 53240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:38:08,110-Speed 24831.03 samples/sec Loss 1.4471 LearningRate 0.0001 Epoch: 30 Global Step: 53250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:38:17,971-Speed 24925.89 samples/sec Loss 1.4463 LearningRate 0.0001 Epoch: 30 Global Step: 53260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:38:27,737-Speed 25167.54 samples/sec Loss 1.4545 LearningRate 0.0001 Epoch: 30 Global Step: 53270 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-03-26 13:38:37,549-Speed 25052.11 samples/sec Loss 1.4489 LearningRate 0.0001 Epoch: 30 Global Step: 53280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:38:47,471-Speed 24771.78 samples/sec Loss 1.4549 LearningRate 0.0001 Epoch: 30 Global Step: 53290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:38:57,384-Speed 24796.65 samples/sec Loss 1.4423 LearningRate 0.0001 Epoch: 30 Global Step: 53300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:39:07,299-Speed 24788.55 samples/sec Loss 1.4447 LearningRate 0.0001 Epoch: 30 Global Step: 53310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:39:17,180-Speed 24880.96 samples/sec Loss 1.4573 LearningRate 0.0001 Epoch: 30 Global Step: 53320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:39:26,948-Speed 25163.90 samples/sec Loss 1.4583 LearningRate 0.0001 Epoch: 30 Global Step: 53330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:39:36,753-Speed 25068.52 samples/sec Loss 1.4538 LearningRate 0.0001 Epoch: 30 Global Step: 53340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:39:46,559-Speed 25066.61 samples/sec Loss 1.4441 LearningRate 0.0001 Epoch: 30 Global Step: 53350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:39:56,361-Speed 25074.44 samples/sec Loss 1.4470 LearningRate 0.0001 Epoch: 30 Global Step: 53360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:40:06,170-Speed 25057.67 samples/sec Loss 1.4592 LearningRate 0.0001 Epoch: 30 Global Step: 53370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:40:16,087-Speed 24785.44 samples/sec Loss 1.4500 LearningRate 0.0001 Epoch: 30 Global Step: 53380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:40:25,828-Speed 25233.48 samples/sec Loss 1.4450 LearningRate 0.0001 Epoch: 30 Global Step: 53390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 
13:40:35,633-Speed 25074.04 samples/sec Loss 1.4612 LearningRate 0.0001 Epoch: 30 Global Step: 53400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:40:45,547-Speed 24792.22 samples/sec Loss 1.4474 LearningRate 0.0001 Epoch: 30 Global Step: 53410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:40:55,422-Speed 24888.49 samples/sec Loss 1.4485 LearningRate 0.0001 Epoch: 30 Global Step: 53420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:41:05,161-Speed 25237.38 samples/sec Loss 1.4385 LearningRate 0.0001 Epoch: 30 Global Step: 53430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:41:14,922-Speed 25181.94 samples/sec Loss 1.4519 LearningRate 0.0001 Epoch: 30 Global Step: 53440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:41:24,646-Speed 25277.18 samples/sec Loss 1.4483 LearningRate 0.0001 Epoch: 30 Global Step: 53450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:41:34,389-Speed 25230.03 samples/sec Loss 1.4453 LearningRate 0.0001 Epoch: 30 Global Step: 53460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:41:44,174-Speed 25117.06 samples/sec Loss 1.4481 LearningRate 0.0001 Epoch: 30 Global Step: 53470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:41:53,969-Speed 25093.95 samples/sec Loss 1.4524 LearningRate 0.0001 Epoch: 30 Global Step: 53480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:42:03,734-Speed 25172.72 samples/sec Loss 1.4419 LearningRate 0.0001 Epoch: 30 Global Step: 53490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:42:13,500-Speed 25165.52 samples/sec Loss 1.4450 LearningRate 0.0001 Epoch: 30 Global Step: 53500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:42:23,283-Speed 25126.96 samples/sec Loss 1.4432 LearningRate 0.0001 Epoch: 30 Global Step: 53510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:42:33,055-Speed 25152.58 samples/sec 
Loss 1.4438 LearningRate 0.0001 Epoch: 30 Global Step: 53520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:42:42,843-Speed 25112.54 samples/sec Loss 1.4593 LearningRate 0.0001 Epoch: 30 Global Step: 53530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:42:52,617-Speed 25146.07 samples/sec Loss 1.4535 LearningRate 0.0001 Epoch: 30 Global Step: 53540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:43:02,346-Speed 25270.57 samples/sec Loss 1.4591 LearningRate 0.0001 Epoch: 30 Global Step: 53550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:43:12,115-Speed 25159.92 samples/sec Loss 1.4644 LearningRate 0.0001 Epoch: 30 Global Step: 53560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:43:22,000-Speed 24871.07 samples/sec Loss 1.4646 LearningRate 0.0001 Epoch: 30 Global Step: 53570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:44:20,973-Speed 4167.43 samples/sec Loss 1.4470 LearningRate 0.0001 Epoch: 31 Global Step: 53580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:44:30,811-Speed 24985.54 samples/sec Loss 1.4410 LearningRate 0.0001 Epoch: 31 Global Step: 53590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:44:40,584-Speed 25154.31 samples/sec Loss 1.4436 LearningRate 0.0001 Epoch: 31 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-26 13:44:50,337-Speed 25203.82 samples/sec Loss 1.4445 LearningRate 0.0001 Epoch: 31 Global Step: 53610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:45:00,093-Speed 25193.43 samples/sec Loss 1.4370 LearningRate 0.0001 Epoch: 31 Global Step: 53620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:45:09,996-Speed 24819.00 samples/sec Loss 1.4382 LearningRate 0.0001 Epoch: 31 Global Step: 53630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:45:19,845-Speed 24959.07 samples/sec Loss 1.4356 LearningRate 0.0001 Epoch: 31 
Global Step: 53640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:45:29,710-Speed 24922.22 samples/sec Loss 1.4372 LearningRate 0.0001 Epoch: 31 Global Step: 53650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:45:39,572-Speed 24925.40 samples/sec Loss 1.4342 LearningRate 0.0001 Epoch: 31 Global Step: 53660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:45:49,404-Speed 25003.54 samples/sec Loss 1.4462 LearningRate 0.0001 Epoch: 31 Global Step: 53670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:45:59,094-Speed 25365.12 samples/sec Loss 1.4342 LearningRate 0.0001 Epoch: 31 Global Step: 53680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:46:08,941-Speed 24963.97 samples/sec Loss 1.4398 LearningRate 0.0001 Epoch: 31 Global Step: 53690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:46:18,704-Speed 25175.57 samples/sec Loss 1.4416 LearningRate 0.0001 Epoch: 31 Global Step: 53700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:46:28,495-Speed 25105.82 samples/sec Loss 1.4419 LearningRate 0.0001 Epoch: 31 Global Step: 53710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:46:38,279-Speed 25121.33 samples/sec Loss 1.4336 LearningRate 0.0001 Epoch: 31 Global Step: 53720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:46:48,083-Speed 25072.63 samples/sec Loss 1.4318 LearningRate 0.0001 Epoch: 31 Global Step: 53730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:46:58,099-Speed 24542.09 samples/sec Loss 1.4446 LearningRate 0.0001 Epoch: 31 Global Step: 53740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:47:08,151-Speed 24451.06 samples/sec Loss 1.4346 LearningRate 0.0001 Epoch: 31 Global Step: 53750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:47:18,143-Speed 24599.93 samples/sec Loss 1.4385 LearningRate 0.0001 Epoch: 31 Global Step: 53760 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-03-26 13:47:28,195-Speed 24452.10 samples/sec Loss 1.4476 LearningRate 0.0001 Epoch: 31 Global Step: 53770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:47:38,267-Speed 24402.65 samples/sec Loss 1.4289 LearningRate 0.0001 Epoch: 31 Global Step: 53780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:47:48,341-Speed 24399.26 samples/sec Loss 1.4341 LearningRate 0.0001 Epoch: 31 Global Step: 53790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:47:58,482-Speed 24238.15 samples/sec Loss 1.4306 LearningRate 0.0001 Epoch: 31 Global Step: 53800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:48:08,572-Speed 24360.47 samples/sec Loss 1.4392 LearningRate 0.0001 Epoch: 31 Global Step: 53810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:48:18,601-Speed 24509.53 samples/sec Loss 1.4452 LearningRate 0.0001 Epoch: 31 Global Step: 53820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:48:28,669-Speed 24413.10 samples/sec Loss 1.4474 LearningRate 0.0001 Epoch: 31 Global Step: 53830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:48:38,689-Speed 24530.29 samples/sec Loss 1.4385 LearningRate 0.0001 Epoch: 31 Global Step: 53840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:48:48,858-Speed 24170.34 samples/sec Loss 1.4325 LearningRate 0.0001 Epoch: 31 Global Step: 53850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:48:58,822-Speed 24669.45 samples/sec Loss 1.4548 LearningRate 0.0001 Epoch: 31 Global Step: 53860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:49:08,897-Speed 24396.55 samples/sec Loss 1.4370 LearningRate 0.0001 Epoch: 31 Global Step: 53870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:49:18,995-Speed 24339.47 samples/sec Loss 1.4524 LearningRate 0.0001 Epoch: 31 Global Step: 53880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 
13:49:29,094-Speed 24339.60 samples/sec Loss 1.4393 LearningRate 0.0001 Epoch: 31 Global Step: 53890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:49:39,163-Speed 24410.67 samples/sec Loss 1.4351 LearningRate 0.0001 Epoch: 31 Global Step: 53900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:49:49,185-Speed 24523.83 samples/sec Loss 1.4326 LearningRate 0.0001 Epoch: 31 Global Step: 53910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:49:59,121-Speed 24747.25 samples/sec Loss 1.4443 LearningRate 0.0001 Epoch: 31 Global Step: 53920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:50:09,310-Speed 24123.93 samples/sec Loss 1.4491 LearningRate 0.0001 Epoch: 31 Global Step: 53930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:50:19,388-Speed 24390.44 samples/sec Loss 1.4442 LearningRate 0.0001 Epoch: 31 Global Step: 53940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:50:29,387-Speed 24581.73 samples/sec Loss 1.4415 LearningRate 0.0001 Epoch: 31 Global Step: 53950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 13:50:39,423-Speed 24490.71 samples/sec Loss 1.4295 LearningRate 0.0001 Epoch: 31 Global Step: 53960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:50:49,542-Speed 24291.14 samples/sec Loss 1.4456 LearningRate 0.0001 Epoch: 31 Global Step: 53970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:50:59,559-Speed 24538.80 samples/sec Loss 1.4272 LearningRate 0.0001 Epoch: 31 Global Step: 53980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:51:09,560-Speed 24577.02 samples/sec Loss 1.4347 LearningRate 0.0001 Epoch: 31 Global Step: 53990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:51:19,740-Speed 24146.38 samples/sec Loss 1.4358 LearningRate 0.0001 Epoch: 31 Global Step: 54000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:51:29,843-Speed 24327.20 samples/sec 
Loss 1.4329 LearningRate 0.0001 Epoch: 31 Global Step: 54010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:51:39,961-Speed 24294.13 samples/sec Loss 1.4487 LearningRate 0.0001 Epoch: 31 Global Step: 54020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:51:49,932-Speed 24650.92 samples/sec Loss 1.4388 LearningRate 0.0001 Epoch: 31 Global Step: 54030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:52:00,129-Speed 24106.54 samples/sec Loss 1.4410 LearningRate 0.0001 Epoch: 31 Global Step: 54040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:52:10,265-Speed 24248.95 samples/sec Loss 1.4315 LearningRate 0.0001 Epoch: 31 Global Step: 54050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:52:20,282-Speed 24537.38 samples/sec Loss 1.4364 LearningRate 0.0001 Epoch: 31 Global Step: 54060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:52:30,319-Speed 24490.45 samples/sec Loss 1.4395 LearningRate 0.0001 Epoch: 31 Global Step: 54070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:52:40,369-Speed 24459.07 samples/sec Loss 1.4380 LearningRate 0.0001 Epoch: 31 Global Step: 54080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:52:50,497-Speed 24268.93 samples/sec Loss 1.4376 LearningRate 0.0001 Epoch: 31 Global Step: 54090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:53:00,501-Speed 24567.97 samples/sec Loss 1.4396 LearningRate 0.0001 Epoch: 31 Global Step: 54100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:53:10,464-Speed 24671.15 samples/sec Loss 1.4259 LearningRate 0.0001 Epoch: 31 Global Step: 54110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:53:20,528-Speed 24422.29 samples/sec Loss 1.4400 LearningRate 0.0001 Epoch: 31 Global Step: 54120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:53:30,556-Speed 24510.82 samples/sec Loss 1.4275 LearningRate 0.0001 Epoch: 31 
Global Step: 54130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:53:40,647-Speed 24358.27 samples/sec Loss 1.4255 LearningRate 0.0001 Epoch: 31 Global Step: 54140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:53:50,698-Speed 24456.06 samples/sec Loss 1.4249 LearningRate 0.0001 Epoch: 31 Global Step: 54150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:54:00,720-Speed 24524.83 samples/sec Loss 1.4376 LearningRate 0.0001 Epoch: 31 Global Step: 54160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:54:10,775-Speed 24453.93 samples/sec Loss 1.4295 LearningRate 0.0001 Epoch: 31 Global Step: 54170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:54:20,828-Speed 24450.16 samples/sec Loss 1.4314 LearningRate 0.0001 Epoch: 31 Global Step: 54180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:54:30,894-Speed 24417.86 samples/sec Loss 1.4275 LearningRate 0.0001 Epoch: 31 Global Step: 54190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:54:40,901-Speed 24561.72 samples/sec Loss 1.4308 LearningRate 0.0001 Epoch: 31 Global Step: 54200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:54:50,938-Speed 24490.36 samples/sec Loss 1.4307 LearningRate 0.0001 Epoch: 31 Global Step: 54210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:55:01,042-Speed 24325.90 samples/sec Loss 1.4321 LearningRate 0.0001 Epoch: 31 Global Step: 54220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:55:11,202-Speed 24190.34 samples/sec Loss 1.4281 LearningRate 0.0001 Epoch: 31 Global Step: 54230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:55:21,365-Speed 24184.36 samples/sec Loss 1.4318 LearningRate 0.0001 Epoch: 31 Global Step: 54240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:55:31,523-Speed 24198.15 samples/sec Loss 1.4403 LearningRate 0.0001 Epoch: 31 Global Step: 54250 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-03-26 13:55:41,573-Speed 24455.79 samples/sec Loss 1.4386 LearningRate 0.0001 Epoch: 31 Global Step: 54260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:55:51,687-Speed 24301.77 samples/sec Loss 1.4270 LearningRate 0.0001 Epoch: 31 Global Step: 54270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:56:01,685-Speed 24583.24 samples/sec Loss 1.4330 LearningRate 0.0001 Epoch: 31 Global Step: 54280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:56:11,668-Speed 24620.11 samples/sec Loss 1.4365 LearningRate 0.0001 Epoch: 31 Global Step: 54290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:56:21,780-Speed 24307.10 samples/sec Loss 1.4286 LearningRate 0.0001 Epoch: 31 Global Step: 54300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:56:31,789-Speed 24557.30 samples/sec Loss 1.4387 LearningRate 0.0001 Epoch: 31 Global Step: 54310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:56:41,814-Speed 24520.77 samples/sec Loss 1.4350 LearningRate 0.0001 Epoch: 31 Global Step: 54320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:56:51,845-Speed 24501.87 samples/sec Loss 1.4262 LearningRate 0.0001 Epoch: 31 Global Step: 54330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:57:01,885-Speed 24481.66 samples/sec Loss 1.4290 LearningRate 0.0001 Epoch: 31 Global Step: 54340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:57:11,976-Speed 24357.16 samples/sec Loss 1.4248 LearningRate 0.0001 Epoch: 31 Global Step: 54350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:57:22,030-Speed 24447.40 samples/sec Loss 1.4268 LearningRate 0.0001 Epoch: 31 Global Step: 54360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:57:32,029-Speed 24580.49 samples/sec Loss 1.4260 LearningRate 0.0001 Epoch: 31 Global Step: 54370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 
13:57:42,034-Speed 24567.66 samples/sec Loss 1.4312 LearningRate 0.0001 Epoch: 31 Global Step: 54380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:57:52,150-Speed 24297.64 samples/sec Loss 1.4258 LearningRate 0.0001 Epoch: 31 Global Step: 54390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:58:02,155-Speed 24567.09 samples/sec Loss 1.4264 LearningRate 0.0001 Epoch: 31 Global Step: 54400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:58:12,148-Speed 24596.38 samples/sec Loss 1.4307 LearningRate 0.0001 Epoch: 31 Global Step: 54410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:58:22,264-Speed 24298.58 samples/sec Loss 1.4279 LearningRate 0.0001 Epoch: 31 Global Step: 54420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:58:32,318-Speed 24448.52 samples/sec Loss 1.4202 LearningRate 0.0001 Epoch: 31 Global Step: 54430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:58:42,402-Speed 24373.83 samples/sec Loss 1.4115 LearningRate 0.0001 Epoch: 31 Global Step: 54440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:58:52,515-Speed 24305.99 samples/sec Loss 1.4198 LearningRate 0.0001 Epoch: 31 Global Step: 54450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:59:02,536-Speed 24528.30 samples/sec Loss 1.4185 LearningRate 0.0001 Epoch: 31 Global Step: 54460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-26 13:59:12,535-Speed 24581.19 samples/sec Loss 1.4271 LearningRate 0.0001 Epoch: 31 Global Step: 54470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:59:22,251-Speed 25298.11 samples/sec Loss 1.4166 LearningRate 0.0001 Epoch: 31 Global Step: 54480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:59:32,079-Speed 25009.58 samples/sec Loss 1.4083 LearningRate 0.0001 Epoch: 31 Global Step: 54490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:59:41,820-Speed 25233.25 samples/sec 
Loss 1.4167 LearningRate 0.0001 Epoch: 31 Global Step: 54500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 13:59:51,625-Speed 25068.08 samples/sec Loss 1.4208 LearningRate 0.0001 Epoch: 31 Global Step: 54510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:00:01,486-Speed 24926.81 samples/sec Loss 1.4237 LearningRate 0.0001 Epoch: 31 Global Step: 54520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:00:11,277-Speed 25106.10 samples/sec Loss 1.4249 LearningRate 0.0001 Epoch: 31 Global Step: 54530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:00:21,071-Speed 25095.17 samples/sec Loss 1.4242 LearningRate 0.0001 Epoch: 31 Global Step: 54540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:00:30,856-Speed 25120.02 samples/sec Loss 1.4186 LearningRate 0.0001 Epoch: 31 Global Step: 54550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:00:40,685-Speed 25007.95 samples/sec Loss 1.4190 LearningRate 0.0001 Epoch: 31 Global Step: 54560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:00:50,486-Speed 25079.99 samples/sec Loss 1.4251 LearningRate 0.0001 Epoch: 31 Global Step: 54570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:01:00,245-Speed 25184.80 samples/sec Loss 1.4196 LearningRate 0.0001 Epoch: 31 Global Step: 54580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:01:10,179-Speed 24744.17 samples/sec Loss 1.4201 LearningRate 0.0001 Epoch: 31 Global Step: 54590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:01:19,967-Speed 25110.18 samples/sec Loss 1.4136 LearningRate 0.0001 Epoch: 31 Global Step: 54600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:01:29,863-Speed 24844.15 samples/sec Loss 1.4318 LearningRate 0.0001 Epoch: 31 Global Step: 54610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:01:39,641-Speed 25136.60 samples/sec Loss 1.4370 LearningRate 0.0001 Epoch: 31 
Global Step: 54620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:01:49,502-Speed 24926.37 samples/sec Loss 1.4156 LearningRate 0.0001 Epoch: 31 Global Step: 54630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:01:59,308-Speed 25064.19 samples/sec Loss 1.4201 LearningRate 0.0001 Epoch: 31 Global Step: 54640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:02:09,126-Speed 25041.28 samples/sec Loss 1.4202 LearningRate 0.0001 Epoch: 31 Global Step: 54650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:02:18,886-Speed 25184.28 samples/sec Loss 1.4140 LearningRate 0.0001 Epoch: 31 Global Step: 54660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:02:28,591-Speed 25329.35 samples/sec Loss 1.4288 LearningRate 0.0001 Epoch: 31 Global Step: 54670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:02:38,249-Speed 25449.30 samples/sec Loss 1.4229 LearningRate 0.0001 Epoch: 31 Global Step: 54680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:02:47,985-Speed 25248.88 samples/sec Loss 1.4177 LearningRate 0.0001 Epoch: 31 Global Step: 54690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:02:57,812-Speed 25012.64 samples/sec Loss 1.4217 LearningRate 0.0001 Epoch: 31 Global Step: 54700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:03:07,572-Speed 25183.38 samples/sec Loss 1.4206 LearningRate 0.0001 Epoch: 31 Global Step: 54710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:03:17,331-Speed 25188.12 samples/sec Loss 1.4082 LearningRate 0.0001 Epoch: 31 Global Step: 54720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:03:27,064-Speed 25255.26 samples/sec Loss 1.4164 LearningRate 0.0001 Epoch: 31 Global Step: 54730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:03:36,869-Speed 25071.01 samples/sec Loss 1.4189 LearningRate 0.0001 Epoch: 31 Global Step: 54740 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-03-26 14:03:46,687-Speed 25037.28 samples/sec Loss 1.4239 LearningRate 0.0001 Epoch: 31 Global Step: 54750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:03:56,470-Speed 25125.25 samples/sec Loss 1.4158 LearningRate 0.0001 Epoch: 31 Global Step: 54760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:04:06,215-Speed 25222.65 samples/sec Loss 1.4162 LearningRate 0.0001 Epoch: 31 Global Step: 54770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:04:15,950-Speed 25246.92 samples/sec Loss 1.4173 LearningRate 0.0001 Epoch: 31 Global Step: 54780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:04:25,658-Speed 25319.58 samples/sec Loss 1.4093 LearningRate 0.0001 Epoch: 31 Global Step: 54790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:04:35,405-Speed 25218.84 samples/sec Loss 1.4065 LearningRate 0.0001 Epoch: 31 Global Step: 54800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:04:45,316-Speed 24803.10 samples/sec Loss 1.4137 LearningRate 0.0001 Epoch: 31 Global Step: 54810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:04:55,050-Speed 25253.72 samples/sec Loss 1.4120 LearningRate 0.0001 Epoch: 31 Global Step: 54820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:05:04,814-Speed 25172.37 samples/sec Loss 1.4144 LearningRate 0.0001 Epoch: 31 Global Step: 54830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:05:14,553-Speed 25242.80 samples/sec Loss 1.4151 LearningRate 0.0001 Epoch: 31 Global Step: 54840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:05:24,285-Speed 25258.99 samples/sec Loss 1.4147 LearningRate 0.0001 Epoch: 31 Global Step: 54850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:05:34,024-Speed 25238.49 samples/sec Loss 1.4257 LearningRate 0.0001 Epoch: 31 Global Step: 54860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 
14:05:43,806-Speed 25126.45 samples/sec Loss 1.4140 LearningRate 0.0001 Epoch: 31 Global Step: 54870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:05:53,512-Speed 25325.47 samples/sec Loss 1.4232 LearningRate 0.0001 Epoch: 31 Global Step: 54880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:06:03,415-Speed 24820.51 samples/sec Loss 1.4187 LearningRate 0.0001 Epoch: 31 Global Step: 54890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:06:13,098-Speed 25384.40 samples/sec Loss 1.4064 LearningRate 0.0001 Epoch: 31 Global Step: 54900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:06:22,902-Speed 25070.78 samples/sec Loss 1.4277 LearningRate 0.0001 Epoch: 31 Global Step: 54910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:06:32,699-Speed 25088.38 samples/sec Loss 1.4046 LearningRate 0.0001 Epoch: 31 Global Step: 54920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:06:42,460-Speed 25189.22 samples/sec Loss 1.4241 LearningRate 0.0001 Epoch: 31 Global Step: 54930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:06:52,393-Speed 24746.00 samples/sec Loss 1.4119 LearningRate 0.0001 Epoch: 31 Global Step: 54940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:07:02,264-Speed 24900.02 samples/sec Loss 1.4011 LearningRate 0.0001 Epoch: 31 Global Step: 54950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:07:11,925-Speed 25442.64 samples/sec Loss 1.3991 LearningRate 0.0001 Epoch: 31 Global Step: 54960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:07:21,654-Speed 25265.02 samples/sec Loss 1.4089 LearningRate 0.0001 Epoch: 31 Global Step: 54970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:07:31,451-Speed 25088.33 samples/sec Loss 1.4078 LearningRate 0.0001 Epoch: 31 Global Step: 54980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:07:41,175-Speed 25278.87 samples/sec 
Loss 1.4090 LearningRate 0.0001 Epoch: 31 Global Step: 54990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:07:50,921-Speed 25220.07 samples/sec Loss 1.4120 LearningRate 0.0001 Epoch: 31 Global Step: 55000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:08:00,690-Speed 25160.74 samples/sec Loss 1.4043 LearningRate 0.0001 Epoch: 31 Global Step: 55010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:08:10,534-Speed 24967.28 samples/sec Loss 1.4231 LearningRate 0.0001 Epoch: 31 Global Step: 55020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:08:20,359-Speed 25015.92 samples/sec Loss 1.4148 LearningRate 0.0001 Epoch: 31 Global Step: 55030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:08:30,198-Speed 24982.71 samples/sec Loss 1.4186 LearningRate 0.0001 Epoch: 31 Global Step: 55040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:08:39,902-Speed 25329.16 samples/sec Loss 1.4136 LearningRate 0.0001 Epoch: 31 Global Step: 55050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:08:49,692-Speed 25108.14 samples/sec Loss 1.4137 LearningRate 0.0001 Epoch: 31 Global Step: 55060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:08:59,456-Speed 25174.99 samples/sec Loss 1.4190 LearningRate 0.0001 Epoch: 31 Global Step: 55070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:09:09,214-Speed 25187.41 samples/sec Loss 1.4052 LearningRate 0.0001 Epoch: 31 Global Step: 55080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:09:18,978-Speed 25174.24 samples/sec Loss 1.4058 LearningRate 0.0001 Epoch: 31 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-26 14:09:28,695-Speed 25297.23 samples/sec Loss 1.4128 LearningRate 0.0001 Epoch: 31 Global Step: 55100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:09:38,446-Speed 25209.88 samples/sec Loss 1.4171 LearningRate 0.0001 Epoch: 31 
Global Step: 55110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:09:48,211-Speed 25173.44 samples/sec Loss 1.3989 LearningRate 0.0001 Epoch: 31 Global Step: 55120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:09:58,115-Speed 24817.85 samples/sec Loss 1.4084 LearningRate 0.0001 Epoch: 31 Global Step: 55130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:10:07,951-Speed 24989.06 samples/sec Loss 1.4104 LearningRate 0.0001 Epoch: 31 Global Step: 55140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:10:17,803-Speed 24949.45 samples/sec Loss 1.4145 LearningRate 0.0001 Epoch: 31 Global Step: 55150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:10:27,598-Speed 25094.16 samples/sec Loss 1.4126 LearningRate 0.0001 Epoch: 31 Global Step: 55160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:10:37,316-Speed 25294.28 samples/sec Loss 1.4089 LearningRate 0.0001 Epoch: 31 Global Step: 55170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:10:47,036-Speed 25288.04 samples/sec Loss 1.4088 LearningRate 0.0001 Epoch: 31 Global Step: 55180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:10:56,809-Speed 25148.55 samples/sec Loss 1.4201 LearningRate 0.0001 Epoch: 31 Global Step: 55190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:11:06,608-Speed 25091.20 samples/sec Loss 1.4126 LearningRate 0.0001 Epoch: 31 Global Step: 55200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:11:16,314-Speed 25324.59 samples/sec Loss 1.3962 LearningRate 0.0000 Epoch: 31 Global Step: 55210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:11:25,986-Speed 25413.08 samples/sec Loss 1.4090 LearningRate 0.0000 Epoch: 31 Global Step: 55220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:11:35,728-Speed 25229.97 samples/sec Loss 1.4146 LearningRate 0.0000 Epoch: 31 Global Step: 55230 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-03-26 14:11:45,444-Speed 25297.60 samples/sec Loss 1.4079 LearningRate 0.0000 Epoch: 31 Global Step: 55240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:11:55,201-Speed 25192.26 samples/sec Loss 1.4070 LearningRate 0.0000 Epoch: 31 Global Step: 55250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:12:05,067-Speed 24913.69 samples/sec Loss 1.4197 LearningRate 0.0000 Epoch: 31 Global Step: 55260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:12:14,895-Speed 25009.03 samples/sec Loss 1.4137 LearningRate 0.0000 Epoch: 31 Global Step: 55270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:12:24,604-Speed 25315.40 samples/sec Loss 1.4118 LearningRate 0.0000 Epoch: 31 Global Step: 55280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:12:34,411-Speed 25062.99 samples/sec Loss 1.4244 LearningRate 0.0000 Epoch: 31 Global Step: 55290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:12:44,099-Speed 25370.65 samples/sec Loss 1.4049 LearningRate 0.0000 Epoch: 31 Global Step: 55300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:13:43,195-Speed 4158.73 samples/sec Loss 1.4113 LearningRate 0.0000 Epoch: 32 Global Step: 55310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:13:53,057-Speed 24924.48 samples/sec Loss 1.3985 LearningRate 0.0000 Epoch: 32 Global Step: 55320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:14:02,960-Speed 24820.61 samples/sec Loss 1.4182 LearningRate 0.0000 Epoch: 32 Global Step: 55330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:14:12,947-Speed 24612.17 samples/sec Loss 1.3976 LearningRate 0.0000 Epoch: 32 Global Step: 55340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:14:22,854-Speed 24807.86 samples/sec Loss 1.4093 LearningRate 0.0000 Epoch: 32 Global Step: 55350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 
14:14:32,777-Speed 24769.02 samples/sec Loss 1.3937 LearningRate 0.0000 Epoch: 32 Global Step: 55360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:14:42,636-Speed 24932.34 samples/sec Loss 1.4034 LearningRate 0.0000 Epoch: 32 Global Step: 55370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:14:52,574-Speed 24732.47 samples/sec Loss 1.3991 LearningRate 0.0000 Epoch: 32 Global Step: 55380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:15:02,505-Speed 24750.56 samples/sec Loss 1.4051 LearningRate 0.0000 Epoch: 32 Global Step: 55390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:15:12,504-Speed 24580.38 samples/sec Loss 1.3998 LearningRate 0.0000 Epoch: 32 Global Step: 55400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:15:22,460-Speed 24686.46 samples/sec Loss 1.4012 LearningRate 0.0000 Epoch: 32 Global Step: 55410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:15:32,262-Speed 25077.05 samples/sec Loss 1.3935 LearningRate 0.0000 Epoch: 32 Global Step: 55420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:15:42,045-Speed 25123.86 samples/sec Loss 1.4041 LearningRate 0.0000 Epoch: 32 Global Step: 55430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:15:51,808-Speed 25177.03 samples/sec Loss 1.4034 LearningRate 0.0000 Epoch: 32 Global Step: 55440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:16:01,577-Speed 25158.35 samples/sec Loss 1.4019 LearningRate 0.0000 Epoch: 32 Global Step: 55450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:16:11,388-Speed 25054.23 samples/sec Loss 1.3957 LearningRate 0.0000 Epoch: 32 Global Step: 55460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:16:21,124-Speed 25246.74 samples/sec Loss 1.4014 LearningRate 0.0000 Epoch: 32 Global Step: 55470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:16:30,945-Speed 25026.18 samples/sec 
Loss 1.3966 LearningRate 0.0000 Epoch: 32 Global Step: 55480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:16:40,666-Speed 25285.34 samples/sec Loss 1.3986 LearningRate 0.0000 Epoch: 32 Global Step: 55490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:16:50,426-Speed 25184.22 samples/sec Loss 1.4006 LearningRate 0.0000 Epoch: 32 Global Step: 55500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:17:00,277-Speed 24954.55 samples/sec Loss 1.4015 LearningRate 0.0000 Epoch: 32 Global Step: 55510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-26 14:17:09,975-Speed 25344.54 samples/sec Loss 1.3987 LearningRate 0.0000 Epoch: 32 Global Step: 55520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:17:19,748-Speed 25149.53 samples/sec Loss 1.4072 LearningRate 0.0000 Epoch: 32 Global Step: 55530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:17:29,470-Speed 25284.54 samples/sec Loss 1.3991 LearningRate 0.0000 Epoch: 32 Global Step: 55540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:17:39,304-Speed 24993.36 samples/sec Loss 1.3950 LearningRate 0.0000 Epoch: 32 Global Step: 55550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:17:49,202-Speed 24831.92 samples/sec Loss 1.4003 LearningRate 0.0000 Epoch: 32 Global Step: 55560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:17:59,064-Speed 24925.24 samples/sec Loss 1.4077 LearningRate 0.0000 Epoch: 32 Global Step: 55570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:18:08,957-Speed 24843.71 samples/sec Loss 1.4019 LearningRate 0.0000 Epoch: 32 Global Step: 55580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:18:18,833-Speed 24890.06 samples/sec Loss 1.4011 LearningRate 0.0000 Epoch: 32 Global Step: 55590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:18:28,601-Speed 25162.16 samples/sec Loss 1.4056 LearningRate 0.0000 Epoch: 32 
Global Step: 55600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:18:38,462-Speed 24927.57 samples/sec Loss 1.4085 LearningRate 0.0000 Epoch: 32 Global Step: 55610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:18:48,171-Speed 25316.58 samples/sec Loss 1.4091 LearningRate 0.0000 Epoch: 32 Global Step: 55620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:18:57,919-Speed 25212.55 samples/sec Loss 1.3927 LearningRate 0.0000 Epoch: 32 Global Step: 55630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:19:07,664-Speed 25224.50 samples/sec Loss 1.4013 LearningRate 0.0000 Epoch: 32 Global Step: 55640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:19:17,563-Speed 24828.78 samples/sec Loss 1.3976 LearningRate 0.0000 Epoch: 32 Global Step: 55650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:19:27,342-Speed 25134.08 samples/sec Loss 1.3994 LearningRate 0.0000 Epoch: 32 Global Step: 55660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:19:37,090-Speed 25215.47 samples/sec Loss 1.3956 LearningRate 0.0000 Epoch: 32 Global Step: 55670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:19:46,849-Speed 25186.70 samples/sec Loss 1.4009 LearningRate 0.0000 Epoch: 32 Global Step: 55680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:19:56,640-Speed 25103.42 samples/sec Loss 1.3994 LearningRate 0.0000 Epoch: 32 Global Step: 55690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:20:06,398-Speed 25191.64 samples/sec Loss 1.3892 LearningRate 0.0000 Epoch: 32 Global Step: 55700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:20:16,214-Speed 25042.70 samples/sec Loss 1.4066 LearningRate 0.0000 Epoch: 32 Global Step: 55710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:20:25,957-Speed 25226.95 samples/sec Loss 1.3994 LearningRate 0.0000 Epoch: 32 Global Step: 55720 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-03-26 14:20:35,829-Speed 24899.26 samples/sec Loss 1.4010 LearningRate 0.0000 Epoch: 32 Global Step: 55730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:20:45,709-Speed 24879.56 samples/sec Loss 1.4005 LearningRate 0.0000 Epoch: 32 Global Step: 55740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:20:55,538-Speed 25004.72 samples/sec Loss 1.3948 LearningRate 0.0000 Epoch: 32 Global Step: 55750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:21:05,299-Speed 25183.12 samples/sec Loss 1.3979 LearningRate 0.0000 Epoch: 32 Global Step: 55760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:21:15,098-Speed 25084.79 samples/sec Loss 1.3970 LearningRate 0.0000 Epoch: 32 Global Step: 55770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:21:24,976-Speed 24881.39 samples/sec Loss 1.4043 LearningRate 0.0000 Epoch: 32 Global Step: 55780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:21:34,836-Speed 24928.52 samples/sec Loss 1.4001 LearningRate 0.0000 Epoch: 32 Global Step: 55790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:21:44,570-Speed 25252.05 samples/sec Loss 1.4024 LearningRate 0.0000 Epoch: 32 Global Step: 55800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:21:54,351-Speed 25131.37 samples/sec Loss 1.4050 LearningRate 0.0000 Epoch: 32 Global Step: 55810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:22:04,135-Speed 25123.24 samples/sec Loss 1.3950 LearningRate 0.0000 Epoch: 32 Global Step: 55820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:22:13,922-Speed 25115.55 samples/sec Loss 1.3907 LearningRate 0.0000 Epoch: 32 Global Step: 55830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 14:22:23,700-Speed 25138.02 samples/sec Loss 1.3922 LearningRate 0.0000 Epoch: 32 Global Step: 55840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-26 
14:22:33,539-Speed 24980.65 samples/sec Loss 1.3875 LearningRate 0.0000 Epoch: 32 Global Step: 55850 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:22:43,307-Speed 25163.66 samples/sec Loss 1.3926 LearningRate 0.0000 Epoch: 32 Global Step: 55860 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:22:53,199-Speed 24849.39 samples/sec Loss 1.3878 LearningRate 0.0000 Epoch: 32 Global Step: 55870 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:23:03,010-Speed 25054.05 samples/sec Loss 1.3827 LearningRate 0.0000 Epoch: 32 Global Step: 55880 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:23:12,754-Speed 25225.15 samples/sec Loss 1.4008 LearningRate 0.0000 Epoch: 32 Global Step: 55890 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:23:22,473-Speed 25290.13 samples/sec Loss 1.3986 LearningRate 0.0000 Epoch: 32 Global Step: 55900 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:23:32,209-Speed 25244.50 samples/sec Loss 1.3933 LearningRate 0.0000 Epoch: 32 Global Step: 55910 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:23:41,938-Speed 25265.20 samples/sec Loss 1.3938 LearningRate 0.0000 Epoch: 32 Global Step: 55920 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:23:51,689-Speed 25207.08 samples/sec Loss 1.3926 LearningRate 0.0000 Epoch: 32 Global Step: 55930 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:24:01,560-Speed 24905.49 samples/sec Loss 1.3864 LearningRate 0.0000 Epoch: 32 Global Step: 55940 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:24:11,356-Speed 25092.96 samples/sec Loss 1.4017 LearningRate 0.0000 Epoch: 32 Global Step: 55950 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:24:21,144-Speed 25111.77 samples/sec Loss 1.4050 LearningRate 0.0000 Epoch: 32 Global Step: 55960 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:24:30,890-Speed 25221.54 samples/sec Loss 1.3943 LearningRate 0.0000 Epoch: 32 Global Step: 55970 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:24:40,652-Speed 25178.10 samples/sec Loss 1.3901 LearningRate 0.0000 Epoch: 32 Global Step: 55980 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:24:50,427-Speed 25145.82 samples/sec Loss 1.4003 LearningRate 0.0000 Epoch: 32 Global Step: 55990 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:25:00,295-Speed 24907.24 samples/sec Loss 1.3968 LearningRate 0.0000 Epoch: 32 Global Step: 56000 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:25:10,165-Speed 24904.41 samples/sec Loss 1.4048 LearningRate 0.0000 Epoch: 32 Global Step: 56010 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:25:19,921-Speed 25196.05 samples/sec Loss 1.3886 LearningRate 0.0000 Epoch: 32 Global Step: 56020 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:25:29,690-Speed 25159.58 samples/sec Loss 1.3908 LearningRate 0.0000 Epoch: 32 Global Step: 56030 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:25:39,456-Speed 25169.21 samples/sec Loss 1.3949 LearningRate 0.0000 Epoch: 32 Global Step: 56040 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:25:49,221-Speed 25171.11 samples/sec Loss 1.4000 LearningRate 0.0000 Epoch: 32 Global Step: 56050 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:25:58,896-Speed 25404.33 samples/sec Loss 1.3860 LearningRate 0.0000 Epoch: 32 Global Step: 56060 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:26:08,569-Speed 25410.03 samples/sec Loss 1.3919 LearningRate 0.0000 Epoch: 32 Global Step: 56070 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:26:18,358-Speed 25109.01 samples/sec Loss 1.3931 LearningRate 0.0000 Epoch: 32 Global Step: 56080 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:26:28,164-Speed 25064.75 samples/sec Loss 1.3942 LearningRate 0.0000 Epoch: 32 Global Step: 56090 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:26:37,885-Speed 25283.71 samples/sec Loss 1.3841 LearningRate 0.0000 Epoch: 32 Global Step: 56100 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:26:47,665-Speed 25138.83 samples/sec Loss 1.3848 LearningRate 0.0000 Epoch: 32 Global Step: 56110 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:26:57,647-Speed 24623.42 samples/sec Loss 1.4011 LearningRate 0.0000 Epoch: 32 Global Step: 56120 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:27:07,390-Speed 25227.70 samples/sec Loss 1.3823 LearningRate 0.0000 Epoch: 32 Global Step: 56130 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:27:17,157-Speed 25164.99 samples/sec Loss 1.3910 LearningRate 0.0000 Epoch: 32 Global Step: 56140 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:27:26,943-Speed 25114.92 samples/sec Loss 1.3857 LearningRate 0.0000 Epoch: 32 Global Step: 56150 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:27:36,719-Speed 25143.87 samples/sec Loss 1.3812 LearningRate 0.0000 Epoch: 32 Global Step: 56160 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:27:46,470-Speed 25207.23 samples/sec Loss 1.3836 LearningRate 0.0000 Epoch: 32 Global Step: 56170 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:27:56,413-Speed 24721.54 samples/sec Loss 1.3850 LearningRate 0.0000 Epoch: 32 Global Step: 56180 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:28:06,184-Speed 25154.23 samples/sec Loss 1.3864 LearningRate 0.0000 Epoch: 32 Global Step: 56190 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:28:15,885-Speed 25338.97 samples/sec Loss 1.3805 LearningRate 0.0000 Epoch: 32 Global Step: 56200 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:28:25,620-Speed 25248.67 samples/sec Loss 1.3842 LearningRate 0.0000 Epoch: 32 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:28:35,421-Speed 25079.36 samples/sec Loss 1.3810 LearningRate 0.0000 Epoch: 32 Global Step: 56220 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:28:45,155-Speed 25251.81 samples/sec Loss 1.3886 LearningRate 0.0000 Epoch: 32 Global Step: 56230 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:28:54,938-Speed 25124.38 samples/sec Loss 1.3868 LearningRate 0.0000 Epoch: 32 Global Step: 56240 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:29:04,647-Speed 25316.86 samples/sec Loss 1.3895 LearningRate 0.0000 Epoch: 32 Global Step: 56250 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:29:14,373-Speed 25273.41 samples/sec Loss 1.3858 LearningRate 0.0000 Epoch: 32 Global Step: 56260 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:29:24,166-Speed 25104.45 samples/sec Loss 1.3873 LearningRate 0.0000 Epoch: 32 Global Step: 56270 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:29:33,932-Speed 25168.07 samples/sec Loss 1.3787 LearningRate 0.0000 Epoch: 32 Global Step: 56280 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:29:43,752-Speed 25028.56 samples/sec Loss 1.3922 LearningRate 0.0000 Epoch: 32 Global Step: 56290 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:29:53,473-Speed 25285.51 samples/sec Loss 1.3874 LearningRate 0.0000 Epoch: 32 Global Step: 56300 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:30:03,361-Speed 24857.37 samples/sec Loss 1.3779 LearningRate 0.0000 Epoch: 32 Global Step: 56310 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:30:13,168-Speed 25065.27 samples/sec Loss 1.3818 LearningRate 0.0000 Epoch: 32 Global Step: 56320 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:30:23,058-Speed 24852.94 samples/sec Loss 1.3777 LearningRate 0.0000 Epoch: 32 Global Step: 56330 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:30:32,804-Speed 25221.54 samples/sec Loss 1.3881 LearningRate 0.0000 Epoch: 32 Global Step: 56340 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:30:42,597-Speed 25100.23 samples/sec Loss 1.3904 LearningRate 0.0000 Epoch: 32 Global Step: 56350 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:30:52,359-Speed 25177.73 samples/sec Loss 1.3917 LearningRate 0.0000 Epoch: 32 Global Step: 56360 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:31:02,151-Speed 25106.31 samples/sec Loss 1.3860 LearningRate 0.0000 Epoch: 32 Global Step: 56370 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:31:11,875-Speed 25275.84 samples/sec Loss 1.3859 LearningRate 0.0000 Epoch: 32 Global Step: 56380 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:31:21,712-Speed 24993.66 samples/sec Loss 1.3814 LearningRate 0.0000 Epoch: 32 Global Step: 56390 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:31:31,494-Speed 25126.80 samples/sec Loss 1.3814 LearningRate 0.0000 Epoch: 32 Global Step: 56400 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:31:41,255-Speed 25181.90 samples/sec Loss 1.3770 LearningRate 0.0000 Epoch: 32 Global Step: 56410 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:31:51,089-Speed 24993.33 samples/sec Loss 1.3757 LearningRate 0.0000 Epoch: 32 Global Step: 56420 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:32:00,894-Speed 25067.07 samples/sec Loss 1.3849 LearningRate 0.0000 Epoch: 32 Global Step: 56430 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:32:10,617-Speed 25280.58 samples/sec Loss 1.3794 LearningRate 0.0000 Epoch: 32 Global Step: 56440 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:32:20,341-Speed 25277.16 samples/sec Loss 1.3877 LearningRate 0.0000 Epoch: 32 Global Step: 56450 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:32:30,114-Speed 25147.54 samples/sec Loss 1.3787 LearningRate 0.0000 Epoch: 32 Global Step: 56460 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:32:39,835-Speed 25283.86 samples/sec Loss 1.3883 LearningRate 0.0000 Epoch: 32 Global Step: 56470 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:32:49,645-Speed 25057.71 samples/sec Loss 1.3749 LearningRate 0.0000 Epoch: 32 Global Step: 56480 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:32:59,345-Speed 25338.21 samples/sec Loss 1.3839 LearningRate 0.0000 Epoch: 32 Global Step: 56490 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:33:09,040-Speed 25362.34 samples/sec Loss 1.3777 LearningRate 0.0000 Epoch: 32 Global Step: 56500 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:33:18,818-Speed 25137.69 samples/sec Loss 1.3810 LearningRate 0.0000 Epoch: 32 Global Step: 56510 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:33:28,522-Speed 25329.07 samples/sec Loss 1.3806 LearningRate 0.0000 Epoch: 32 Global Step: 56520 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:33:38,237-Speed 25308.61 samples/sec Loss 1.3764 LearningRate 0.0000 Epoch: 32 Global Step: 56530 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:33:47,995-Speed 25188.17 samples/sec Loss 1.3771 LearningRate 0.0000 Epoch: 32 Global Step: 56540 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:33:57,736-Speed 25234.75 samples/sec Loss 1.3854 LearningRate 0.0000 Epoch: 32 Global Step: 56550 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:34:07,524-Speed 25112.27 samples/sec Loss 1.3862 LearningRate 0.0000 Epoch: 32 Global Step: 56560 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:34:17,324-Speed 25080.39 samples/sec Loss 1.3787 LearningRate 0.0000 Epoch: 32 Global Step: 56570 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:34:27,007-Speed 25381.21 samples/sec Loss 1.3888 LearningRate 0.0000 Epoch: 32 Global Step: 56580 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:34:36,855-Speed 24958.93 samples/sec Loss 1.3893 LearningRate 0.0000 Epoch: 32 Global Step: 56590 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:34:46,652-Speed 25090.77 samples/sec Loss 1.3746 LearningRate 0.0000 Epoch: 32 Global Step: 56600 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:34:56,410-Speed 25189.70 samples/sec Loss 1.3734 LearningRate 0.0000 Epoch: 32 Global Step: 56610 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:35:06,327-Speed 24784.34 samples/sec Loss 1.3763 LearningRate 0.0000 Epoch: 32 Global Step: 56620 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-03-26 14:35:16,185-Speed 24935.49 samples/sec Loss 1.3842 LearningRate 0.0000 Epoch: 32 Global Step: 56630 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:35:26,021-Speed 24989.87 samples/sec Loss 1.3834 LearningRate 0.0000 Epoch: 32 Global Step: 56640 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:35:35,830-Speed 25058.74 samples/sec Loss 1.3764 LearningRate 0.0000 Epoch: 32 Global Step: 56650 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:35:45,669-Speed 24982.46 samples/sec Loss 1.3856 LearningRate 0.0000 Epoch: 32 Global Step: 56660 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:35:55,471-Speed 25074.18 samples/sec Loss 1.3786 LearningRate 0.0000 Epoch: 32 Global Step: 56670 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:36:05,307-Speed 24989.49 samples/sec Loss 1.3742 LearningRate 0.0000 Epoch: 32 Global Step: 56680 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:36:15,138-Speed 25020.35 samples/sec Loss 1.3793 LearningRate 0.0000 Epoch: 32 Global Step: 56690 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:36:24,996-Speed 24931.19 samples/sec Loss 1.3675 LearningRate 0.0000 Epoch: 32 Global Step: 56700 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:36:34,695-Speed 25343.42 samples/sec Loss 1.3817 LearningRate 0.0000 Epoch: 32 Global Step: 56710 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-03-26 14:36:44,453-Speed 25188.44 samples/sec Loss 1.3697 LearningRate 0.0000 Epoch: 32 Global Step: 56720 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:36:54,349-Speed 24837.46 samples/sec Loss 1.3762 LearningRate 0.0000 Epoch: 32 Global Step: 56730 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:37:04,180-Speed 25002.39 samples/sec Loss 1.3801 LearningRate 0.0000 Epoch: 32 Global Step: 56740 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:37:13,992-Speed 25050.10 samples/sec Loss 1.3703 LearningRate 0.0000 Epoch: 32 Global Step: 56750 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:37:23,778-Speed 25116.24 samples/sec Loss 1.3812 LearningRate 0.0000 Epoch: 32 Global Step: 56760 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:37:33,661-Speed 24871.13 samples/sec Loss 1.3713 LearningRate 0.0000 Epoch: 32 Global Step: 56770 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:37:43,528-Speed 24909.72 samples/sec Loss 1.3864 LearningRate 0.0000 Epoch: 32 Global Step: 56780 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:37:53,276-Speed 25212.73 samples/sec Loss 1.3796 LearningRate 0.0000 Epoch: 32 Global Step: 56790 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:38:03,048-Speed 25152.65 samples/sec Loss 1.3833 LearningRate 0.0000 Epoch: 32 Global Step: 56800 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:38:12,808-Speed 25184.87 samples/sec Loss 1.3799 LearningRate 0.0000 Epoch: 32 Global Step: 56810 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:38:22,591-Speed 25130.93 samples/sec Loss 1.3824 LearningRate 0.0000 Epoch: 32 Global Step: 56820 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:38:32,328-Speed 25243.99 samples/sec Loss 1.3719 LearningRate 0.0000 Epoch: 32 Global Step: 56830 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:38:42,119-Speed 25105.56 samples/sec Loss 1.3764 LearningRate 0.0000 Epoch: 32 Global Step: 56840 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:38:51,943-Speed 25018.73 samples/sec Loss 1.3747 LearningRate 0.0000 Epoch: 32 Global Step: 56850 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:39:01,651-Speed 25320.49 samples/sec Loss 1.3763 LearningRate 0.0000 Epoch: 32 Global Step: 56860 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:39:11,500-Speed 24954.83 samples/sec Loss 1.3732 LearningRate 0.0000 Epoch: 32 Global Step: 56870 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:39:21,205-Speed 25331.48 samples/sec Loss 1.3768 LearningRate 0.0000 Epoch: 32 Global Step: 56880 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:39:30,923-Speed 25291.86 samples/sec Loss 1.3759 LearningRate 0.0000 Epoch: 32 Global Step: 56890 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:39:40,719-Speed 25092.27 samples/sec Loss 1.3815 LearningRate 0.0000 Epoch: 32 Global Step: 56900 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:39:50,506-Speed 25115.02 samples/sec Loss 1.3802 LearningRate 0.0000 Epoch: 32 Global Step: 56910 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:40:00,279-Speed 25152.31 samples/sec Loss 1.3789 LearningRate 0.0000 Epoch: 32 Global Step: 56920 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:40:10,169-Speed 24851.25 samples/sec Loss 1.3734 LearningRate 0.0000 Epoch: 32 Global Step: 56930 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:40:19,953-Speed 25122.44 samples/sec Loss 1.3722 LearningRate 0.0000 Epoch: 32 Global Step: 56940 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:40:29,959-Speed 24563.43 samples/sec Loss 1.3745 LearningRate 0.0000 Epoch: 32 Global Step: 56950 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:40:39,831-Speed 24899.48 samples/sec Loss 1.3679 LearningRate 0.0000 Epoch: 32 Global Step: 56960 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:40:49,725-Speed 24843.27 samples/sec Loss 1.3723 LearningRate 0.0000 Epoch: 32 Global Step: 56970 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:40:59,608-Speed 24870.38 samples/sec Loss 1.3818 LearningRate 0.0000 Epoch: 32 Global Step: 56980 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:41:09,403-Speed 25094.20 samples/sec Loss 1.3766 LearningRate 0.0000 Epoch: 32 Global Step: 56990 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:41:19,178-Speed 25146.97 samples/sec Loss 1.3772 LearningRate 0.0000 Epoch: 32 Global Step: 57000 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:41:28,997-Speed 25036.02 samples/sec Loss 1.3886 LearningRate 0.0000 Epoch: 32 Global Step: 57010 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:41:38,765-Speed 25162.44 samples/sec Loss 1.3743 LearningRate 0.0000 Epoch: 32 Global Step: 57020 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:41:48,565-Speed 25080.97 samples/sec Loss 1.3833 LearningRate 0.0000 Epoch: 32 Global Step: 57030 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:42:47,872-Speed 4144.01 samples/sec Loss 1.3745 LearningRate 0.0000 Epoch: 33 Global Step: 57040 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:42:57,593-Speed 25283.73 samples/sec Loss 1.3729 LearningRate 0.0000 Epoch: 33 Global Step: 57050 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:43:07,349-Speed 25196.45 samples/sec Loss 1.3652 LearningRate 0.0000 Epoch: 33 Global Step: 57060 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:43:17,110-Speed 25180.81 samples/sec Loss 1.3704 LearningRate 0.0000 Epoch: 33 Global Step: 57070 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:43:26,907-Speed 25087.23 samples/sec Loss 1.3774 LearningRate 0.0000 Epoch: 33 Global Step: 57080 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:43:36,716-Speed 25058.52 samples/sec Loss 1.3660 LearningRate 0.0000 Epoch: 33 Global Step: 57090 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:43:46,429-Speed 25305.16 samples/sec Loss 1.3629 LearningRate 0.0000 Epoch: 33 Global Step: 57100 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:43:56,223-Speed 25095.29 samples/sec Loss 1.3696 LearningRate 0.0000 Epoch: 33 Global Step: 57110 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:44:05,989-Speed 25167.96 samples/sec Loss 1.3643 LearningRate 0.0000 Epoch: 33 Global Step: 57120 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:44:15,762-Speed 25152.32 samples/sec Loss 1.3653 LearningRate 0.0000 Epoch: 33 Global Step: 57130 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:44:25,515-Speed 25199.75 samples/sec Loss 1.3636 LearningRate 0.0000 Epoch: 33 Global Step: 57140 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:44:35,315-Speed 25082.52 samples/sec Loss 1.3612 LearningRate 0.0000 Epoch: 33 Global Step: 57150 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:44:45,150-Speed 24990.18 samples/sec Loss 1.3716 LearningRate 0.0000 Epoch: 33 Global Step: 57160 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:44:54,908-Speed 25189.57 samples/sec Loss 1.3780 LearningRate 0.0000 Epoch: 33 Global Step: 57170 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:45:04,750-Speed 24973.76 samples/sec Loss 1.3748 LearningRate 0.0000 Epoch: 33 Global Step: 57180 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:45:14,623-Speed 24894.57 samples/sec Loss 1.3666 LearningRate 0.0000 Epoch: 33 Global Step: 57190 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:45:24,383-Speed 25189.66 samples/sec Loss 1.3666 LearningRate 0.0000 Epoch: 33 Global Step: 57200 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:45:34,125-Speed 25230.85 samples/sec Loss 1.3593 LearningRate 0.0000 Epoch: 33 Global Step: 57210 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:45:43,849-Speed 25276.60 samples/sec Loss 1.3671 LearningRate 0.0000 Epoch: 33 Global Step: 57220 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:45:53,607-Speed 25187.22 samples/sec Loss 1.3596 LearningRate 0.0000 Epoch: 33 Global Step: 57230 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:46:03,394-Speed 25116.49 samples/sec Loss 1.3736 LearningRate 0.0000 Epoch: 33 Global Step: 57240 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:46:13,195-Speed 25077.80 samples/sec Loss 1.3684 LearningRate 0.0000 Epoch: 33 Global Step: 57250 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:46:23,124-Speed 24753.94 samples/sec Loss 1.3728 LearningRate 0.0000 Epoch: 33 Global Step: 57260 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:46:32,818-Speed 25357.43 samples/sec Loss 1.3634 LearningRate 0.0000 Epoch: 33 Global Step: 57270 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:46:42,565-Speed 25216.17 samples/sec Loss 1.3617 LearningRate 0.0000 Epoch: 33 Global Step: 57280 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:46:52,418-Speed 24946.29 samples/sec Loss 1.3571 LearningRate 0.0000 Epoch: 33 Global Step: 57290 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:47:02,182-Speed 25174.91 samples/sec Loss 1.3562 LearningRate 0.0000 Epoch: 33 Global Step: 57300 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:47:11,950-Speed 25163.11 samples/sec Loss 1.3723 LearningRate 0.0000 Epoch: 33 Global Step: 57310 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:47:21,856-Speed 24809.23 samples/sec Loss 1.3709 LearningRate 0.0000 Epoch: 33 Global Step: 57320 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:47:31,690-Speed 24995.49 samples/sec Loss 1.3807 LearningRate 0.0000 Epoch: 33 Global Step: 57330 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-03-26 14:47:41,455-Speed 25169.71 samples/sec Loss 1.3695 LearningRate 0.0000 Epoch: 33 Global Step: 57340 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:47:51,298-Speed 24970.33 samples/sec Loss 1.3725 LearningRate 0.0000 Epoch: 33 Global Step: 57350 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:48:01,122-Speed 25020.18 samples/sec Loss 1.3651 LearningRate 0.0000 Epoch: 33 Global Step: 57360 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:48:10,831-Speed 25323.69 samples/sec Loss 1.3704 LearningRate 0.0000 Epoch: 33 Global Step: 57370 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:48:20,509-Speed 25394.41 samples/sec Loss 1.3685 LearningRate 0.0000 Epoch: 33 Global Step: 57380 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:48:30,292-Speed 25123.43 samples/sec Loss 1.3676 LearningRate 0.0000 Epoch: 33 Global Step: 57390 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:48:40,029-Speed 25251.30 samples/sec Loss 1.3705 LearningRate 0.0000 Epoch: 33 Global Step: 57400 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:48:49,842-Speed 25048.74 samples/sec Loss 1.3648 LearningRate 0.0000 Epoch: 33 Global Step: 57410 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:48:59,625-Speed 25122.51 samples/sec Loss 1.3698 LearningRate 0.0000 Epoch: 33 Global Step: 57420 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:49:09,519-Speed 24841.95 samples/sec Loss 1.3659 LearningRate 0.0000 Epoch: 33 Global Step: 57430 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:49:19,246-Speed 25267.49 samples/sec Loss 1.3614 LearningRate 0.0000 Epoch: 33 Global Step: 57440 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:49:29,103-Speed 24934.31 samples/sec Loss 1.3645 LearningRate 0.0000 Epoch: 33 Global Step: 57450 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:49:38,891-Speed 25111.16 samples/sec Loss 1.3648 LearningRate 0.0000 Epoch: 33 Global Step: 57460 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:49:48,675-Speed 25122.03 samples/sec Loss 1.3650 LearningRate 0.0000 Epoch: 33 Global Step: 57470 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:49:58,398-Speed 25279.18 samples/sec Loss 1.3727 LearningRate 0.0000 Epoch: 33 Global Step: 57480 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:50:08,132-Speed 25251.55 samples/sec Loss 1.3763 LearningRate 0.0000 Epoch: 33 Global Step: 57490 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:50:17,827-Speed 25351.94 samples/sec Loss 1.3600 LearningRate 0.0000 Epoch: 33 Global Step: 57500 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:50:27,617-Speed 25105.69 samples/sec Loss 1.3709 LearningRate 0.0000 Epoch: 33 Global Step: 57510 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:50:37,301-Speed 25381.85 samples/sec Loss 1.3684 LearningRate 0.0000 Epoch: 33 Global Step: 57520 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:50:47,058-Speed 25193.34 samples/sec Loss 1.3639 LearningRate 0.0000 Epoch: 33 Global Step: 57530 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:50:56,817-Speed 25185.06 samples/sec Loss 1.3641 LearningRate 0.0000 Epoch: 33 Global Step: 57540 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:51:06,598-Speed 25128.55 samples/sec Loss 1.3626 LearningRate 0.0000 Epoch: 33 Global Step: 57550 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:51:16,297-Speed 25343.20 samples/sec Loss 1.3586 LearningRate 0.0000 Epoch: 33 Global Step: 57560 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:51:26,099-Speed 25075.83 samples/sec Loss 1.3688 LearningRate 0.0000 Epoch: 33 Global Step: 57570 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:51:35,854-Speed 25198.00 samples/sec Loss 1.3598 LearningRate 0.0000 Epoch: 33 Global Step: 57580 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:51:45,700-Speed 24964.56 samples/sec Loss 1.3579 LearningRate 0.0000 Epoch: 33 Global Step: 57590 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:51:55,557-Speed 24936.00 samples/sec Loss 1.3564 LearningRate 0.0000 Epoch: 33 Global Step: 57600 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:52:05,328-Speed 25155.21 samples/sec Loss 1.3584 LearningRate 0.0000 Epoch: 33 Global Step: 57610 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:52:15,130-Speed 25075.66 samples/sec Loss 1.3702 LearningRate 0.0000 Epoch: 33 Global Step: 57620 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:52:24,904-Speed 25146.86 samples/sec Loss 1.3629 LearningRate 0.0000 Epoch: 33 Global Step: 57630 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:52:34,604-Speed 25341.43 samples/sec Loss 1.3662 LearningRate 0.0000 Epoch: 33 Global Step: 57640 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:52:44,491-Speed 24862.47 samples/sec Loss 1.3584 LearningRate 0.0000 Epoch: 33 Global Step: 57650 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:52:54,422-Speed 24749.60 samples/sec Loss 1.3646 LearningRate 0.0000 Epoch: 33 Global Step: 57660 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:53:04,173-Speed 25205.47 samples/sec Loss 1.3615 LearningRate 0.0000 Epoch: 33 Global Step: 57670 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:53:13,844-Speed 25415.97 samples/sec Loss 1.3645 LearningRate 0.0000 Epoch: 33 Global Step: 57680 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:53:23,569-Speed 25275.12 samples/sec Loss 1.3640 LearningRate 0.0000 Epoch: 33 Global Step: 57690 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:53:33,353-Speed 25122.57 samples/sec Loss 1.3643 LearningRate 0.0000 Epoch: 33 Global Step: 57700 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:53:43,171-Speed 25034.02 samples/sec Loss 1.3629 LearningRate 0.0000 Epoch: 33 Global Step: 57710 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:53:52,952-Speed 25128.11 samples/sec Loss 1.3580 LearningRate 0.0000 Epoch: 33 Global Step: 57720 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:54:02,769-Speed 25037.58 samples/sec Loss 1.3628 LearningRate 0.0000 Epoch: 33 Global Step: 57730 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 14:54:12,598-Speed 25008.00 samples/sec Loss 1.3667 LearningRate 0.0000 Epoch: 33 Global Step: 57740 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:54:22,340-Speed 25237.08 samples/sec Loss 1.3557 LearningRate 0.0000 Epoch: 33 Global Step: 57750 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:54:32,086-Speed 25219.02 samples/sec Loss 1.3566 LearningRate 0.0000 Epoch: 33 Global Step: 57760 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:54:41,858-Speed 25158.87 samples/sec Loss 1.3594 LearningRate 0.0000 Epoch: 33 Global Step: 57770 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:54:51,741-Speed 24871.63 samples/sec Loss 1.3511 LearningRate 0.0000 Epoch: 33 Global Step: 57780 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:55:01,601-Speed 24929.15 samples/sec Loss 1.3487 LearningRate 0.0000 Epoch: 33 Global Step: 57790 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:55:11,374-Speed 25151.91 samples/sec Loss 1.3741 LearningRate 0.0000 Epoch: 33 Global Step: 57800 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:55:21,131-Speed 25190.67 samples/sec Loss 1.3644 LearningRate 0.0000 Epoch: 33 Global Step: 57810 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:55:30,872-Speed 25236.72 samples/sec Loss 1.3610 LearningRate 0.0000 Epoch: 33 Global Step: 57820 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:55:40,684-Speed 25048.86 samples/sec Loss 1.3651 LearningRate 0.0000 Epoch: 33 Global Step: 57830 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:55:50,432-Speed 25215.44 samples/sec Loss 1.3576 LearningRate 0.0000 Epoch: 33 Global Step: 57840 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:56:00,288-Speed 24945.99 samples/sec Loss 1.3575 LearningRate 0.0000 Epoch: 33 Global Step: 57850 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:56:10,094-Speed 25065.97 samples/sec Loss 1.3553 LearningRate 0.0000 Epoch: 33 Global Step: 57860 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:56:20,016-Speed 24772.24 samples/sec Loss 1.3626 LearningRate 0.0000 Epoch: 33 Global Step: 57870 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:56:29,841-Speed 25017.60 samples/sec Loss 1.3530 LearningRate 0.0000 Epoch: 33 Global Step: 57880 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:56:39,632-Speed 25110.07 samples/sec Loss 1.3568 LearningRate 0.0000 Epoch: 33 Global Step: 57890 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:56:49,348-Speed 25298.75 samples/sec Loss 1.3459 LearningRate 0.0000 Epoch: 33 Global Step: 57900 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:56:59,149-Speed 25080.28 samples/sec Loss 1.3593 LearningRate 0.0000 Epoch: 33 Global Step: 57910 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:57:08,896-Speed 25217.04 samples/sec Loss 1.3448 LearningRate 0.0000 Epoch: 33 Global Step: 57920 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:57:18,695-Speed 25090.79 samples/sec Loss 1.3535 LearningRate 0.0000 Epoch: 33 Global Step: 57930 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:57:28,455-Speed 25182.17 samples/sec Loss 1.3479 LearningRate 0.0000 Epoch: 33 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-03-26 14:57:38,169-Speed 25302.23 samples/sec Loss 1.3584 LearningRate 0.0000 Epoch: 33 Global Step: 57950 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:57:47,942-Speed 25150.79 samples/sec Loss 1.3479 LearningRate 0.0000 Epoch: 33 Global Step: 57960 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:57:57,690-Speed 25215.58 samples/sec Loss 1.3505 LearningRate 0.0000 Epoch: 33 Global Step: 57970 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:58:07,519-Speed 25006.84 samples/sec Loss 1.3627 LearningRate 0.0000 Epoch: 33 Global Step: 57980 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:58:17,308-Speed 25109.73 samples/sec Loss 1.3495 LearningRate 0.0000 Epoch: 33 Global Step: 57990 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:58:27,095-Speed 25112.58 samples/sec Loss 1.3493 LearningRate 0.0000 Epoch: 33 Global Step: 58000 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:58:36,869-Speed 25147.23 samples/sec Loss 1.3547 LearningRate 0.0000 Epoch: 33 Global Step: 58010 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:58:46,612-Speed 25228.37 samples/sec Loss 1.3571 LearningRate 0.0000 Epoch: 33 Global Step: 58020 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:58:56,336-Speed 25279.04 samples/sec Loss 1.3471 LearningRate 0.0000 Epoch: 33 Global Step: 58030 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:59:06,023-Speed 25372.89 samples/sec Loss 1.3424 LearningRate 0.0000 Epoch: 33 Global Step: 58040 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:59:15,819-Speed 25092.53 samples/sec Loss 1.3529 LearningRate 0.0000 Epoch: 33 Global Step: 58050 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:59:25,647-Speed 25008.93 samples/sec Loss 1.3577 LearningRate 0.0000 Epoch: 33 Global Step: 58060 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:59:35,398-Speed 25206.21 samples/sec Loss 1.3407 LearningRate 0.0000 Epoch: 33 Global Step: 58070 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:59:45,264-Speed 24914.52 samples/sec Loss 1.3512 LearningRate 0.0000 Epoch: 33 Global Step: 58080 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 14:59:55,046-Speed 25125.34 samples/sec Loss 1.3606 LearningRate 0.0000 Epoch: 33 Global Step: 58090 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:00:04,822-Speed 25141.13 samples/sec Loss 1.3499 LearningRate 0.0000 Epoch: 33 Global Step: 58100 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:00:14,677-Speed 24941.51 samples/sec Loss 1.3526 LearningRate 0.0000 Epoch: 33 Global Step: 58110 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:00:24,447-Speed 25159.31 samples/sec Loss 1.3568 LearningRate 0.0000 Epoch: 33 Global Step: 58120 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:00:34,254-Speed 25061.11 samples/sec Loss 1.3467 LearningRate 0.0000 Epoch: 33 Global Step: 58130 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:00:43,992-Speed 25241.95 samples/sec Loss 1.3500 LearningRate 0.0000 Epoch: 33 Global Step: 58140 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:00:53,881-Speed 24854.20 samples/sec Loss 1.3621 LearningRate 0.0000 Epoch: 33 Global Step: 58150 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:01:03,700-Speed 25040.00 samples/sec Loss 1.3619 LearningRate 0.0000 Epoch: 33 Global Step: 58160 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:01:13,439-Speed 25240.26 samples/sec Loss 1.3535 LearningRate 0.0000 Epoch: 33 Global Step: 58170 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:01:23,309-Speed 24904.57 samples/sec Loss 1.3490 LearningRate 0.0000 Epoch: 33 Global Step: 58180 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:01:33,167-Speed 24933.62 samples/sec Loss 1.3411 LearningRate 0.0000 Epoch: 33 Global Step: 58190 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:01:42,959-Speed 25104.99 samples/sec Loss 1.3468 LearningRate 0.0000 Epoch: 33 Global Step: 58200 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:01:52,703-Speed 25225.85 samples/sec Loss 1.3527 LearningRate 0.0000 Epoch: 33 Global Step: 58210 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:02:02,418-Speed 25305.38 samples/sec Loss 1.3470 LearningRate 0.0000 Epoch: 33 Global Step: 58220 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:02:12,151-Speed 25254.83 samples/sec Loss 1.3560 LearningRate 0.0000 Epoch: 33 Global Step: 58230 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:02:21,944-Speed 25097.34 samples/sec Loss 1.3554 LearningRate 0.0000 Epoch: 33 Global Step: 58240 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:02:31,630-Speed 25375.53 samples/sec Loss 1.3462 LearningRate 0.0000 Epoch: 33 Global Step: 58250 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:02:41,402-Speed 25152.77 samples/sec Loss 1.3548 LearningRate 0.0000 Epoch: 33 Global Step: 58260 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:02:51,136-Speed 25251.96 samples/sec Loss 1.3495 LearningRate 0.0000 Epoch: 33 Global Step: 58270 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:03:00,876-Speed 25235.32 samples/sec Loss 1.3522 LearningRate 0.0000 Epoch: 33 Global Step: 58280 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:03:10,653-Speed 25147.10 samples/sec Loss 1.3536 LearningRate 0.0000 Epoch: 33 Global Step: 58290 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:03:20,347-Speed 25357.14 samples/sec Loss 1.3478 LearningRate 0.0000 Epoch: 33 Global Step: 58300 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:03:30,190-Speed 24972.26 samples/sec Loss 1.3541 LearningRate 0.0000 Epoch: 33 Global Step: 58310 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:03:39,959-Speed 25160.46 samples/sec Loss 1.3456 LearningRate 0.0000 Epoch: 33 Global Step: 58320 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:03:49,695-Speed 25246.15 samples/sec Loss 1.3534 LearningRate 0.0000 Epoch: 33 Global Step: 58330 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:03:59,480-Speed 25120.11 samples/sec Loss 1.3532 LearningRate 0.0000 Epoch: 33 Global Step: 58340 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:04:09,275-Speed 25094.41 samples/sec Loss 1.3532 LearningRate 0.0000 Epoch: 33 Global Step: 58350 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:04:19,110-Speed 24991.14 samples/sec Loss 1.3451 LearningRate 0.0000 Epoch: 33 Global Step: 58360 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:04:28,869-Speed 25186.49 samples/sec Loss 1.3446 LearningRate 0.0000 Epoch: 33 Global Step: 58370 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:04:38,547-Speed 25395.88 samples/sec Loss 1.3545 LearningRate 0.0000 Epoch: 33 Global Step: 58380 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:04:48,349-Speed 25076.39 samples/sec Loss 1.3576 LearningRate 0.0000 Epoch: 33 Global Step: 58390 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:04:58,116-Speed 25167.35 samples/sec Loss 1.3491 LearningRate 0.0000 Epoch: 33 Global Step: 58400 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:05:07,889-Speed 25151.98 samples/sec Loss 1.3468 LearningRate 0.0000 Epoch: 33 Global Step: 58410 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:05:17,784-Speed 24839.70 samples/sec
Loss 1.3478 LearningRate 0.0000 Epoch: 33 Global Step: 58420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:05:27,717-Speed 24745.84 samples/sec Loss 1.3565 LearningRate 0.0000 Epoch: 33 Global Step: 58430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:05:37,771-Speed 24446.22 samples/sec Loss 1.3533 LearningRate 0.0000 Epoch: 33 Global Step: 58440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:05:47,826-Speed 24451.57 samples/sec Loss 1.3363 LearningRate 0.0000 Epoch: 33 Global Step: 58450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:05:57,848-Speed 24525.60 samples/sec Loss 1.3449 LearningRate 0.0000 Epoch: 33 Global Step: 58460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:06:07,958-Speed 24313.99 samples/sec Loss 1.3422 LearningRate 0.0000 Epoch: 33 Global Step: 58470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:06:17,929-Speed 24653.17 samples/sec Loss 1.3436 LearningRate 0.0000 Epoch: 33 Global Step: 58480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:06:28,180-Speed 23975.74 samples/sec Loss 1.3444 LearningRate 0.0000 Epoch: 33 Global Step: 58490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:06:38,213-Speed 24500.03 samples/sec Loss 1.3543 LearningRate 0.0000 Epoch: 33 Global Step: 58500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:06:48,260-Speed 24464.18 samples/sec Loss 1.3541 LearningRate 0.0000 Epoch: 33 Global Step: 58510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:06:58,119-Speed 24929.21 samples/sec Loss 1.3493 LearningRate 0.0000 Epoch: 33 Global Step: 58520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:07:07,906-Speed 25115.76 samples/sec Loss 1.3502 LearningRate 0.0000 Epoch: 33 Global Step: 58530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:07:17,700-Speed 25097.42 samples/sec Loss 1.3425 LearningRate 0.0000 Epoch: 33 
Global Step: 58540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:07:27,484-Speed 25126.64 samples/sec Loss 1.3489 LearningRate 0.0000 Epoch: 33 Global Step: 58550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:07:37,284-Speed 25081.57 samples/sec Loss 1.3445 LearningRate 0.0000 Epoch: 33 Global Step: 58560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:07:46,997-Speed 25307.28 samples/sec Loss 1.3401 LearningRate 0.0000 Epoch: 33 Global Step: 58570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:07:56,682-Speed 25378.17 samples/sec Loss 1.3419 LearningRate 0.0000 Epoch: 33 Global Step: 58580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:08:06,478-Speed 25093.73 samples/sec Loss 1.3435 LearningRate 0.0000 Epoch: 33 Global Step: 58590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:08:16,368-Speed 24852.42 samples/sec Loss 1.3426 LearningRate 0.0000 Epoch: 33 Global Step: 58600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:08:26,185-Speed 25037.75 samples/sec Loss 1.3461 LearningRate 0.0000 Epoch: 33 Global Step: 58610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:08:35,967-Speed 25128.35 samples/sec Loss 1.3407 LearningRate 0.0000 Epoch: 33 Global Step: 58620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:08:45,890-Speed 24775.83 samples/sec Loss 1.3468 LearningRate 0.0000 Epoch: 33 Global Step: 58630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:08:55,714-Speed 25019.54 samples/sec Loss 1.3596 LearningRate 0.0000 Epoch: 33 Global Step: 58640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:09:05,558-Speed 24968.04 samples/sec Loss 1.3410 LearningRate 0.0000 Epoch: 33 Global Step: 58650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:09:15,292-Speed 25249.82 samples/sec Loss 1.3444 LearningRate 0.0000 Epoch: 33 Global Step: 58660 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-03-26 15:09:25,095-Speed 25074.21 samples/sec Loss 1.3483 LearningRate 0.0000 Epoch: 33 Global Step: 58670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:09:34,984-Speed 24854.99 samples/sec Loss 1.3380 LearningRate 0.0000 Epoch: 33 Global Step: 58680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:09:44,840-Speed 24940.13 samples/sec Loss 1.3391 LearningRate 0.0000 Epoch: 33 Global Step: 58690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:09:54,724-Speed 24866.91 samples/sec Loss 1.3559 LearningRate 0.0000 Epoch: 33 Global Step: 58700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:10:04,526-Speed 25077.08 samples/sec Loss 1.3518 LearningRate 0.0000 Epoch: 33 Global Step: 58710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:10:14,400-Speed 24892.06 samples/sec Loss 1.3567 LearningRate 0.0000 Epoch: 33 Global Step: 58720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:10:24,253-Speed 24947.39 samples/sec Loss 1.3516 LearningRate 0.0000 Epoch: 33 Global Step: 58730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:10:34,024-Speed 25155.44 samples/sec Loss 1.3497 LearningRate 0.0000 Epoch: 33 Global Step: 58740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:10:44,114-Speed 24360.69 samples/sec Loss 1.3410 LearningRate 0.0000 Epoch: 33 Global Step: 58750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:10:54,051-Speed 24735.59 samples/sec Loss 1.3487 LearningRate 0.0000 Epoch: 33 Global Step: 58760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:11:53,444-Speed 4138.01 samples/sec Loss 1.3388 LearningRate 0.0000 Epoch: 34 Global Step: 58770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:12:03,199-Speed 25194.83 samples/sec Loss 1.3407 LearningRate 0.0000 Epoch: 34 Global Step: 58780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 
15:12:12,948-Speed 25213.56 samples/sec Loss 1.3337 LearningRate 0.0000 Epoch: 34 Global Step: 58790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:12:22,679-Speed 25258.69 samples/sec Loss 1.3392 LearningRate 0.0000 Epoch: 34 Global Step: 58800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:12:32,383-Speed 25330.04 samples/sec Loss 1.3354 LearningRate 0.0000 Epoch: 34 Global Step: 58810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:12:42,286-Speed 24818.82 samples/sec Loss 1.3438 LearningRate 0.0000 Epoch: 34 Global Step: 58820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:12:52,026-Speed 25235.77 samples/sec Loss 1.3403 LearningRate 0.0000 Epoch: 34 Global Step: 58830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:13:01,861-Speed 24991.26 samples/sec Loss 1.3418 LearningRate 0.0000 Epoch: 34 Global Step: 58840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:13:11,790-Speed 24754.40 samples/sec Loss 1.3427 LearningRate 0.0000 Epoch: 34 Global Step: 58850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:13:21,747-Speed 24686.25 samples/sec Loss 1.3410 LearningRate 0.0000 Epoch: 34 Global Step: 58860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:13:31,709-Speed 24672.53 samples/sec Loss 1.3364 LearningRate 0.0000 Epoch: 34 Global Step: 58870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:13:41,607-Speed 24831.64 samples/sec Loss 1.3369 LearningRate 0.0000 Epoch: 34 Global Step: 58880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:13:51,531-Speed 24769.43 samples/sec Loss 1.3305 LearningRate 0.0000 Epoch: 34 Global Step: 58890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:14:01,432-Speed 24825.63 samples/sec Loss 1.3413 LearningRate 0.0000 Epoch: 34 Global Step: 58900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:14:11,398-Speed 24662.66 samples/sec 
Loss 1.3428 LearningRate 0.0000 Epoch: 34 Global Step: 58910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:14:21,361-Speed 24670.33 samples/sec Loss 1.3438 LearningRate 0.0000 Epoch: 34 Global Step: 58920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:14:31,286-Speed 24765.71 samples/sec Loss 1.3478 LearningRate 0.0000 Epoch: 34 Global Step: 58930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:14:41,221-Speed 24742.60 samples/sec Loss 1.3371 LearningRate 0.0000 Epoch: 34 Global Step: 58940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:14:51,164-Speed 24720.43 samples/sec Loss 1.3429 LearningRate 0.0000 Epoch: 34 Global Step: 58950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:15:01,088-Speed 24767.26 samples/sec Loss 1.3336 LearningRate 0.0000 Epoch: 34 Global Step: 58960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:15:10,981-Speed 24845.98 samples/sec Loss 1.3326 LearningRate 0.0000 Epoch: 34 Global Step: 58970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:15:20,931-Speed 24703.20 samples/sec Loss 1.3309 LearningRate 0.0000 Epoch: 34 Global Step: 58980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:15:30,891-Speed 24678.01 samples/sec Loss 1.3489 LearningRate 0.0000 Epoch: 34 Global Step: 58990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:15:40,880-Speed 24605.73 samples/sec Loss 1.3304 LearningRate 0.0000 Epoch: 34 Global Step: 59000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:15:50,787-Speed 24809.64 samples/sec Loss 1.3320 LearningRate 0.0000 Epoch: 34 Global Step: 59010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:16:00,714-Speed 24761.17 samples/sec Loss 1.3345 LearningRate 0.0000 Epoch: 34 Global Step: 59020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:16:10,619-Speed 24812.96 samples/sec Loss 1.3379 LearningRate 0.0000 Epoch: 34 
Global Step: 59030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:16:20,545-Speed 24762.97 samples/sec Loss 1.3183 LearningRate 0.0000 Epoch: 34 Global Step: 59040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:16:30,478-Speed 24744.84 samples/sec Loss 1.3328 LearningRate 0.0000 Epoch: 34 Global Step: 59050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:16:40,355-Speed 24893.62 samples/sec Loss 1.3431 LearningRate 0.0000 Epoch: 34 Global Step: 59060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:16:50,294-Speed 24732.00 samples/sec Loss 1.3443 LearningRate 0.0000 Epoch: 34 Global Step: 59070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:17:00,155-Speed 24925.81 samples/sec Loss 1.3442 LearningRate 0.0000 Epoch: 34 Global Step: 59080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:17:10,141-Speed 24614.27 samples/sec Loss 1.3458 LearningRate 0.0000 Epoch: 34 Global Step: 59090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:17:20,130-Speed 24604.07 samples/sec Loss 1.3387 LearningRate 0.0000 Epoch: 34 Global Step: 59100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:17:30,215-Speed 24372.40 samples/sec Loss 1.3338 LearningRate 0.0000 Epoch: 34 Global Step: 59110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:17:40,156-Speed 24724.57 samples/sec Loss 1.3453 LearningRate 0.0000 Epoch: 34 Global Step: 59120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:17:50,106-Speed 24702.94 samples/sec Loss 1.3403 LearningRate 0.0000 Epoch: 34 Global Step: 59130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:18:00,002-Speed 24835.90 samples/sec Loss 1.3358 LearningRate 0.0000 Epoch: 34 Global Step: 59140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:18:09,924-Speed 24775.39 samples/sec Loss 1.3388 LearningRate 0.0000 Epoch: 34 Global Step: 59150 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-03-26 15:18:19,785-Speed 24924.15 samples/sec Loss 1.3320 LearningRate 0.0000 Epoch: 34 Global Step: 59160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:18:29,677-Speed 24847.23 samples/sec Loss 1.3310 LearningRate 0.0000 Epoch: 34 Global Step: 59170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:18:39,588-Speed 24799.74 samples/sec Loss 1.3373 LearningRate 0.0000 Epoch: 34 Global Step: 59180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:18:49,473-Speed 24865.78 samples/sec Loss 1.3339 LearningRate 0.0000 Epoch: 34 Global Step: 59190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:18:59,373-Speed 24828.59 samples/sec Loss 1.3384 LearningRate 0.0000 Epoch: 34 Global Step: 59200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:19:09,318-Speed 24715.37 samples/sec Loss 1.3386 LearningRate 0.0000 Epoch: 34 Global Step: 59210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:19:19,326-Speed 24557.94 samples/sec Loss 1.3327 LearningRate 0.0000 Epoch: 34 Global Step: 59220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:19:29,203-Speed 24887.75 samples/sec Loss 1.3376 LearningRate 0.0000 Epoch: 34 Global Step: 59230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:19:39,127-Speed 24766.82 samples/sec Loss 1.3296 LearningRate 0.0000 Epoch: 34 Global Step: 59240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:19:49,091-Speed 24666.10 samples/sec Loss 1.3411 LearningRate 0.0000 Epoch: 34 Global Step: 59250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:19:58,936-Speed 24967.88 samples/sec Loss 1.3424 LearningRate 0.0000 Epoch: 34 Global Step: 59260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:20:08,804-Speed 24911.64 samples/sec Loss 1.3457 LearningRate 0.0000 Epoch: 34 Global Step: 59270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 
15:20:18,585-Speed 25130.86 samples/sec Loss 1.3325 LearningRate 0.0000 Epoch: 34 Global Step: 59280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:20:28,309-Speed 25274.95 samples/sec Loss 1.3402 LearningRate 0.0000 Epoch: 34 Global Step: 59290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:20:38,277-Speed 24657.62 samples/sec Loss 1.3360 LearningRate 0.0000 Epoch: 34 Global Step: 59300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:20:48,263-Speed 24613.43 samples/sec Loss 1.3361 LearningRate 0.0000 Epoch: 34 Global Step: 59310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:20:58,243-Speed 24628.18 samples/sec Loss 1.3412 LearningRate 0.0000 Epoch: 34 Global Step: 59320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:21:08,268-Speed 24521.46 samples/sec Loss 1.3363 LearningRate 0.0000 Epoch: 34 Global Step: 59330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:21:18,281-Speed 24548.03 samples/sec Loss 1.3263 LearningRate 0.0000 Epoch: 34 Global Step: 59340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:21:28,276-Speed 24590.41 samples/sec Loss 1.3377 LearningRate 0.0000 Epoch: 34 Global Step: 59350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:21:38,286-Speed 24557.38 samples/sec Loss 1.3349 LearningRate 0.0000 Epoch: 34 Global Step: 59360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:21:48,285-Speed 24582.01 samples/sec Loss 1.3294 LearningRate 0.0000 Epoch: 34 Global Step: 59370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:21:58,306-Speed 24531.71 samples/sec Loss 1.3470 LearningRate 0.0000 Epoch: 34 Global Step: 59380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:22:08,452-Speed 24225.44 samples/sec Loss 1.3275 LearningRate 0.0000 Epoch: 34 Global Step: 59390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:22:18,432-Speed 24630.12 samples/sec 
Loss 1.3343 LearningRate 0.0000 Epoch: 34 Global Step: 59400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:22:28,565-Speed 24256.67 samples/sec Loss 1.3395 LearningRate 0.0000 Epoch: 34 Global Step: 59410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:22:38,545-Speed 24633.42 samples/sec Loss 1.3360 LearningRate 0.0000 Epoch: 34 Global Step: 59420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:22:48,590-Speed 24470.23 samples/sec Loss 1.3396 LearningRate 0.0000 Epoch: 34 Global Step: 59430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:22:58,629-Speed 24490.60 samples/sec Loss 1.3299 LearningRate 0.0000 Epoch: 34 Global Step: 59440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:23:08,723-Speed 24350.91 samples/sec Loss 1.3321 LearningRate 0.0000 Epoch: 34 Global Step: 59450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:23:18,839-Speed 24298.49 samples/sec Loss 1.3287 LearningRate 0.0000 Epoch: 34 Global Step: 59460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:23:28,947-Speed 24322.79 samples/sec Loss 1.3416 LearningRate 0.0000 Epoch: 34 Global Step: 59470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:23:39,021-Speed 24398.01 samples/sec Loss 1.3515 LearningRate 0.0000 Epoch: 34 Global Step: 59480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:23:49,140-Speed 24293.73 samples/sec Loss 1.3362 LearningRate 0.0000 Epoch: 34 Global Step: 59490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:23:59,182-Speed 24476.13 samples/sec Loss 1.3258 LearningRate 0.0000 Epoch: 34 Global Step: 59500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:24:09,465-Speed 23903.01 samples/sec Loss 1.3223 LearningRate 0.0000 Epoch: 34 Global Step: 59510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:24:19,510-Speed 24468.74 samples/sec Loss 1.3283 LearningRate 0.0000 Epoch: 34 
Global Step: 59520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:24:29,546-Speed 24491.60 samples/sec Loss 1.3273 LearningRate 0.0000 Epoch: 34 Global Step: 59530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:24:39,517-Speed 24652.56 samples/sec Loss 1.3412 LearningRate 0.0000 Epoch: 34 Global Step: 59540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:24:49,566-Speed 24459.18 samples/sec Loss 1.3297 LearningRate 0.0000 Epoch: 34 Global Step: 59550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:24:59,777-Speed 24076.62 samples/sec Loss 1.3317 LearningRate 0.0000 Epoch: 34 Global Step: 59560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:25:09,780-Speed 24576.66 samples/sec Loss 1.3322 LearningRate 0.0000 Epoch: 34 Global Step: 59570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:25:19,784-Speed 24568.52 samples/sec Loss 1.3271 LearningRate 0.0000 Epoch: 34 Global Step: 59580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:25:29,765-Speed 24628.30 samples/sec Loss 1.3356 LearningRate 0.0000 Epoch: 34 Global Step: 59590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:25:39,787-Speed 24525.45 samples/sec Loss 1.3293 LearningRate 0.0000 Epoch: 34 Global Step: 59600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:25:49,788-Speed 24576.44 samples/sec Loss 1.3258 LearningRate 0.0000 Epoch: 34 Global Step: 59610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:25:59,886-Speed 24340.09 samples/sec Loss 1.3247 LearningRate 0.0000 Epoch: 34 Global Step: 59620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:26:09,897-Speed 24552.37 samples/sec Loss 1.3268 LearningRate 0.0000 Epoch: 34 Global Step: 59630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:26:19,960-Speed 24427.43 samples/sec Loss 1.3327 LearningRate 0.0000 Epoch: 34 Global Step: 59640 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-03-26 15:26:30,009-Speed 24458.19 samples/sec Loss 1.3249 LearningRate 0.0000 Epoch: 34 Global Step: 59650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:26:40,033-Speed 24522.10 samples/sec Loss 1.3268 LearningRate 0.0000 Epoch: 34 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:26:50,150-Speed 24293.74 samples/sec Loss 1.3235 LearningRate 0.0000 Epoch: 34 Global Step: 59670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:27:00,193-Speed 24476.26 samples/sec Loss 1.3184 LearningRate 0.0000 Epoch: 34 Global Step: 59680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:27:10,272-Speed 24387.05 samples/sec Loss 1.3291 LearningRate 0.0000 Epoch: 34 Global Step: 59690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:27:20,326-Speed 24445.97 samples/sec Loss 1.3278 LearningRate 0.0000 Epoch: 34 Global Step: 59700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:27:30,336-Speed 24556.68 samples/sec Loss 1.3328 LearningRate 0.0000 Epoch: 34 Global Step: 59710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:27:40,359-Speed 24521.06 samples/sec Loss 1.3197 LearningRate 0.0000 Epoch: 34 Global Step: 59720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:27:50,340-Speed 24626.89 samples/sec Loss 1.3233 LearningRate 0.0000 Epoch: 34 Global Step: 59730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:28:00,230-Speed 24852.18 samples/sec Loss 1.3286 LearningRate 0.0000 Epoch: 34 Global Step: 59740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:28:10,007-Speed 25139.04 samples/sec Loss 1.3176 LearningRate 0.0000 Epoch: 34 Global Step: 59750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:28:20,017-Speed 24555.39 samples/sec Loss 1.3261 LearningRate 0.0000 Epoch: 34 Global Step: 59760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 
15:28:29,867-Speed 24952.55 samples/sec Loss 1.3302 LearningRate 0.0000 Epoch: 34 Global Step: 59770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:28:39,657-Speed 25106.71 samples/sec Loss 1.3320 LearningRate 0.0000 Epoch: 34 Global Step: 59780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:28:49,547-Speed 24851.62 samples/sec Loss 1.3260 LearningRate 0.0000 Epoch: 34 Global Step: 59790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:28:59,375-Speed 25009.22 samples/sec Loss 1.3161 LearningRate 0.0000 Epoch: 34 Global Step: 59800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:29:09,339-Speed 24667.39 samples/sec Loss 1.3294 LearningRate 0.0000 Epoch: 34 Global Step: 59810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:29:19,267-Speed 24757.08 samples/sec Loss 1.3096 LearningRate 0.0000 Epoch: 34 Global Step: 59820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:29:29,157-Speed 24859.70 samples/sec Loss 1.3254 LearningRate 0.0000 Epoch: 34 Global Step: 59830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:29:39,100-Speed 24720.65 samples/sec Loss 1.3267 LearningRate 0.0000 Epoch: 34 Global Step: 59840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:29:48,999-Speed 24829.24 samples/sec Loss 1.3287 LearningRate 0.0000 Epoch: 34 Global Step: 59850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:29:58,975-Speed 24639.14 samples/sec Loss 1.3279 LearningRate 0.0000 Epoch: 34 Global Step: 59860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:30:08,938-Speed 24670.51 samples/sec Loss 1.3195 LearningRate 0.0000 Epoch: 34 Global Step: 59870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:30:18,826-Speed 24856.95 samples/sec Loss 1.3222 LearningRate 0.0000 Epoch: 34 Global Step: 59880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:30:28,750-Speed 24767.24 samples/sec 
Loss 1.3266 LearningRate 0.0000 Epoch: 34 Global Step: 59890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:30:38,644-Speed 24843.52 samples/sec Loss 1.3250 LearningRate 0.0000 Epoch: 34 Global Step: 59900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:30:48,626-Speed 24623.32 samples/sec Loss 1.3258 LearningRate 0.0000 Epoch: 34 Global Step: 59910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:30:58,519-Speed 24846.45 samples/sec Loss 1.3295 LearningRate 0.0000 Epoch: 34 Global Step: 59920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:31:08,457-Speed 24732.42 samples/sec Loss 1.3292 LearningRate 0.0000 Epoch: 34 Global Step: 59930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:31:18,336-Speed 24879.54 samples/sec Loss 1.3192 LearningRate 0.0000 Epoch: 34 Global Step: 59940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:31:28,341-Speed 24568.87 samples/sec Loss 1.3254 LearningRate 0.0000 Epoch: 34 Global Step: 59950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:31:38,283-Speed 24724.54 samples/sec Loss 1.3184 LearningRate 0.0000 Epoch: 34 Global Step: 59960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:31:48,237-Speed 24698.26 samples/sec Loss 1.3190 LearningRate 0.0000 Epoch: 34 Global Step: 59970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:31:58,112-Speed 24897.37 samples/sec Loss 1.3209 LearningRate 0.0000 Epoch: 34 Global Step: 59980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:32:08,021-Speed 24804.94 samples/sec Loss 1.3271 LearningRate 0.0000 Epoch: 34 Global Step: 59990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-26 15:32:17,965-Speed 24720.18 samples/sec Loss 1.3254 LearningRate 0.0000 Epoch: 34 Global Step: 60000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-26 15:32:27,892-Speed 24760.12 samples/sec Loss 1.3250 LearningRate 0.0000 Epoch: 34 
Global Step: 60010 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:32:37,887-Speed 24592.42 samples/sec Loss 1.3219 LearningRate 0.0000 Epoch: 34 Global Step: 60020 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:32:47,824-Speed 24734.06 samples/sec Loss 1.3177 LearningRate 0.0000 Epoch: 34 Global Step: 60030 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:32:57,738-Speed 24794.86 samples/sec Loss 1.3291 LearningRate 0.0000 Epoch: 34 Global Step: 60040 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:33:07,821-Speed 24377.60 samples/sec Loss 1.3262 LearningRate 0.0000 Epoch: 34 Global Step: 60050 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:33:17,990-Speed 24170.40 samples/sec Loss 1.3108 LearningRate 0.0000 Epoch: 34 Global Step: 60060 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:33:28,167-Speed 24150.98 samples/sec Loss 1.3239 LearningRate 0.0000 Epoch: 34 Global Step: 60070 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:33:38,132-Speed 24667.31 samples/sec Loss 1.3284 LearningRate 0.0000 Epoch: 34 Global Step: 60080 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:33:48,024-Speed 24845.67 samples/sec Loss 1.3227 LearningRate 0.0000 Epoch: 34 Global Step: 60090 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:33:58,029-Speed 24568.25 samples/sec Loss 1.3098 LearningRate 0.0000 Epoch: 34 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-03-26 15:34:08,169-Speed 24238.56 samples/sec Loss 1.3353 LearningRate 0.0000 Epoch: 34 Global Step: 60110 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:34:18,204-Speed 24494.30 samples/sec Loss 1.3174 LearningRate 0.0000 Epoch: 34 Global Step: 60120 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:34:28,351-Speed 24225.14 samples/sec Loss 1.3219 LearningRate 0.0000 Epoch: 34 Global Step: 60130 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:34:38,412-Speed 24429.68 samples/sec Loss 1.3144 LearningRate 0.0000 Epoch: 34 Global Step: 60140 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:34:48,475-Speed 24426.38 samples/sec Loss 1.3126 LearningRate 0.0000 Epoch: 34 Global Step: 60150 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:34:58,558-Speed 24383.61 samples/sec Loss 1.3205 LearningRate 0.0000 Epoch: 34 Global Step: 60160 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:35:08,654-Speed 24345.10 samples/sec Loss 1.3261 LearningRate 0.0000 Epoch: 34 Global Step: 60170 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:35:18,826-Speed 24163.28 samples/sec Loss 1.3134 LearningRate 0.0000 Epoch: 34 Global Step: 60180 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-03-26 15:35:28,854-Speed 24512.12 samples/sec Loss 1.3185 LearningRate 0.0000 Epoch: 34 Global Step: 60190 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:35:38,785-Speed 24749.01 samples/sec Loss 1.3269 LearningRate 0.0000 Epoch: 34 Global Step: 60200 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:35:48,635-Speed 24955.21 samples/sec Loss 1.3140 LearningRate 0.0000 Epoch: 34 Global Step: 60210 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:35:58,555-Speed 24778.78 samples/sec Loss 1.3320 LearningRate 0.0000 Epoch: 34 Global Step: 60220 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:36:08,461-Speed 24813.14 samples/sec Loss 1.3108 LearningRate 0.0000 Epoch: 34 Global Step: 60230 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:36:18,363-Speed 24824.35 samples/sec Loss 1.3244 LearningRate 0.0000 Epoch: 34 Global Step: 60240 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:36:28,303-Speed 24726.75 samples/sec Loss 1.3265 LearningRate 0.0000 Epoch: 34 Global Step: 60250 Fp16 Grad Scale: 16384 Required: 3 hours
Training: 2022-03-26 15:36:38,327-Speed 24521.80 samples/sec Loss 1.3252 LearningRate 0.0000 Epoch: 34 Global Step: 60260 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:36:48,295-Speed 24667.76 samples/sec Loss 1.3258 LearningRate 0.0000 Epoch: 34 Global Step: 60270 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:36:58,254-Speed 24679.03 samples/sec Loss 1.3296 LearningRate 0.0000 Epoch: 34 Global Step: 60280 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:37:08,212-Speed 24683.05 samples/sec Loss 1.3207 LearningRate 0.0000 Epoch: 34 Global Step: 60290 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:37:18,133-Speed 24775.98 samples/sec Loss 1.3221 LearningRate 0.0000 Epoch: 34 Global Step: 60300 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:37:28,153-Speed 24530.73 samples/sec Loss 1.3217 LearningRate 0.0000 Epoch: 34 Global Step: 60310 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:37:38,185-Speed 24504.63 samples/sec Loss 1.3259 LearningRate 0.0000 Epoch: 34 Global Step: 60320 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:37:48,210-Speed 24517.77 samples/sec Loss 1.3228 LearningRate 0.0000 Epoch: 34 Global Step: 60330 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:37:58,147-Speed 24736.71 samples/sec Loss 1.3101 LearningRate 0.0000 Epoch: 34 Global Step: 60340 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:38:08,090-Speed 24724.56 samples/sec Loss 1.3250 LearningRate 0.0000 Epoch: 34 Global Step: 60350 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:38:18,011-Speed 24773.85 samples/sec Loss 1.3140 LearningRate 0.0000 Epoch: 34 Global Step: 60360 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:38:27,907-Speed 24837.70 samples/sec Loss 1.3293 LearningRate 0.0000 Epoch: 34 Global Step: 60370 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:38:37,821-Speed 24799.22 samples/sec Loss 1.3170 LearningRate 0.0000 Epoch: 34 Global Step: 60380 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:38:47,708-Speed 24859.56 samples/sec Loss 1.3130 LearningRate 0.0000 Epoch: 34 Global Step: 60390 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:38:57,627-Speed 24783.20 samples/sec Loss 1.3203 LearningRate 0.0000 Epoch: 34 Global Step: 60400 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:39:07,632-Speed 24567.61 samples/sec Loss 1.3168 LearningRate 0.0000 Epoch: 34 Global Step: 60410 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:39:17,539-Speed 24813.12 samples/sec Loss 1.3224 LearningRate 0.0000 Epoch: 34 Global Step: 60420 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:39:27,487-Speed 24708.26 samples/sec Loss 1.3204 LearningRate 0.0000 Epoch: 34 Global Step: 60430 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:39:37,415-Speed 24758.39 samples/sec Loss 1.3154 LearningRate 0.0000 Epoch: 34 Global Step: 60440 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:39:47,322-Speed 24811.76 samples/sec Loss 1.3193 LearningRate 0.0000 Epoch: 34 Global Step: 60450 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:39:57,234-Speed 24796.74 samples/sec Loss 1.3241 LearningRate 0.0000 Epoch: 34 Global Step: 60460 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:40:07,224-Speed 24605.38 samples/sec Loss 1.3258 LearningRate 0.0000 Epoch: 34 Global Step: 60470 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:40:17,153-Speed 24756.36 samples/sec Loss 1.3321 LearningRate 0.0000 Epoch: 34 Global Step: 60480 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:40:27,115-Speed 24672.88 samples/sec Loss 1.3112 LearningRate 0.0000 Epoch: 34 Global Step: 60490 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:41:26,120-Speed 4165.12 samples/sec Loss 1.3190 LearningRate 0.0000 Epoch: 35 Global Step: 60500 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:41:35,987-Speed 24910.85 samples/sec Loss 1.3141 LearningRate 0.0000 Epoch: 35 Global Step: 60510 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:41:45,899-Speed 24797.93 samples/sec Loss 1.3173 LearningRate 0.0000 Epoch: 35 Global Step: 60520 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:41:55,813-Speed 24791.89 samples/sec Loss 1.3117 LearningRate 0.0000 Epoch: 35 Global Step: 60530 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:42:05,800-Speed 24611.80 samples/sec Loss 1.3116 LearningRate 0.0000 Epoch: 35 Global Step: 60540 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:42:15,762-Speed 24673.52 samples/sec Loss 1.3220 LearningRate 0.0000 Epoch: 35 Global Step: 60550 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:42:25,720-Speed 24683.44 samples/sec Loss 1.3140 LearningRate 0.0000 Epoch: 35 Global Step: 60560 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:42:35,786-Speed 24418.64 samples/sec Loss 1.3143 LearningRate 0.0000 Epoch: 35 Global Step: 60570 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:42:45,743-Speed 24690.71 samples/sec Loss 1.3115 LearningRate 0.0000 Epoch: 35 Global Step: 60580 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:42:55,710-Speed 24660.69 samples/sec Loss 1.3185 LearningRate 0.0000 Epoch: 35 Global Step: 60590 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:43:05,633-Speed 24770.66 samples/sec Loss 1.3183 LearningRate 0.0000 Epoch: 35 Global Step: 60600 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:43:15,559-Speed 24763.32 samples/sec Loss 1.3036 LearningRate 0.0000 Epoch: 35 Global Step: 60610 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:43:25,521-Speed 24671.32 samples/sec Loss 1.3283 LearningRate 0.0000 Epoch: 35 Global Step: 60620 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:43:35,407-Speed 24863.14 samples/sec Loss 1.3119 LearningRate 0.0000 Epoch: 35 Global Step: 60630 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:43:45,380-Speed 24644.92 samples/sec Loss 1.3199 LearningRate 0.0000 Epoch: 35 Global Step: 60640 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:43:55,396-Speed 24539.15 samples/sec Loss 1.3160 LearningRate 0.0000 Epoch: 35 Global Step: 60650 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:44:05,305-Speed 24809.54 samples/sec Loss 1.3184 LearningRate 0.0000 Epoch: 35 Global Step: 60660 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:44:15,204-Speed 24831.44 samples/sec Loss 1.3100 LearningRate 0.0000 Epoch: 35 Global Step: 60670 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:44:25,080-Speed 24887.03 samples/sec Loss 1.3168 LearningRate 0.0000 Epoch: 35 Global Step: 60680 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:44:35,002-Speed 24771.85 samples/sec Loss 1.3081 LearningRate 0.0000 Epoch: 35 Global Step: 60690 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:44:44,919-Speed 24786.88 samples/sec Loss 1.3102 LearningRate 0.0000 Epoch: 35 Global Step: 60700 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:44:54,896-Speed 24634.16 samples/sec Loss 1.3160 LearningRate 0.0000 Epoch: 35 Global Step: 60710 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:45:04,842-Speed 24713.24 samples/sec Loss 1.3079 LearningRate 0.0000 Epoch: 35 Global Step: 60720 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:45:14,782-Speed 24728.16 samples/sec Loss 1.3058 LearningRate 0.0000 Epoch: 35 Global Step: 60730 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:45:24,731-Speed 24705.35 samples/sec Loss 1.3133 LearningRate 0.0000 Epoch: 35 Global Step: 60740 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:45:34,675-Speed 24717.31 samples/sec Loss 1.3052 LearningRate 0.0000 Epoch: 35 Global Step: 60750 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:45:44,665-Speed 24604.77 samples/sec Loss 1.3123 LearningRate 0.0000 Epoch: 35 Global Step: 60760 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:45:54,672-Speed 24563.14 samples/sec Loss 1.3098 LearningRate 0.0000 Epoch: 35 Global Step: 60770 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:46:04,606-Speed 24741.84 samples/sec Loss 1.3142 LearningRate 0.0000 Epoch: 35 Global Step: 60780 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:46:14,506-Speed 24827.89 samples/sec Loss 1.3240 LearningRate 0.0000 Epoch: 35 Global Step: 60790 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:46:24,428-Speed 24772.11 samples/sec Loss 1.3140 LearningRate 0.0000 Epoch: 35 Global Step: 60800 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:46:34,331-Speed 24820.94 samples/sec Loss 1.3169 LearningRate 0.0000 Epoch: 35 Global Step: 60810 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:46:44,472-Speed 24237.04 samples/sec Loss 1.3167 LearningRate 0.0000 Epoch: 35 Global Step: 60820 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:46:54,490-Speed 24537.49 samples/sec Loss 1.3117 LearningRate 0.0000 Epoch: 35 Global Step: 60830 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:47:04,471-Speed 24624.54 samples/sec Loss 1.3118 LearningRate 0.0000 Epoch: 35 Global Step: 60840 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:47:14,345-Speed 24899.68 samples/sec Loss 1.3107 LearningRate 0.0000 Epoch: 35 Global Step: 60850 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:47:24,283-Speed 24733.15 samples/sec Loss 1.3089 LearningRate 0.0000 Epoch: 35 Global Step: 60860 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:47:34,202-Speed 24778.91 samples/sec Loss 1.3084 LearningRate 0.0000 Epoch: 35 Global Step: 60870 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:47:44,164-Speed 24674.19 samples/sec Loss 1.3121 LearningRate 0.0000 Epoch: 35 Global Step: 60880 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:47:54,100-Speed 24739.12 samples/sec Loss 1.3092 LearningRate 0.0000 Epoch: 35 Global Step: 60890 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:48:04,036-Speed 24737.24 samples/sec Loss 1.3071 LearningRate 0.0000 Epoch: 35 Global Step: 60900 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:48:13,973-Speed 24735.23 samples/sec Loss 1.3217 LearningRate 0.0000 Epoch: 35 Global Step: 60910 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:48:23,907-Speed 24742.85 samples/sec Loss 1.3133 LearningRate 0.0000 Epoch: 35 Global Step: 60920 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:48:33,889-Speed 24624.95 samples/sec Loss 1.3103 LearningRate 0.0000 Epoch: 35 Global Step: 60930 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:48:43,870-Speed 24626.01 samples/sec Loss 1.3168 LearningRate 0.0000 Epoch: 35 Global Step: 60940 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:48:53,832-Speed 24672.79 samples/sec Loss 1.3149 LearningRate 0.0000 Epoch: 35 Global Step: 60950 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:49:03,797-Speed 24666.15 samples/sec Loss 1.3186 LearningRate 0.0000 Epoch: 35 Global Step: 60960 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:49:13,724-Speed 24760.69 samples/sec Loss 1.3121 LearningRate 0.0000 Epoch: 35 Global Step: 60970 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:49:23,655-Speed 24747.97 samples/sec Loss 1.3095 LearningRate 0.0000 Epoch: 35 Global Step: 60980 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:49:33,593-Speed 24733.09 samples/sec Loss 1.3205 LearningRate 0.0000 Epoch: 35 Global Step: 60990 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:49:43,497-Speed 24816.92 samples/sec Loss 1.3163 LearningRate 0.0000 Epoch: 35 Global Step: 61000 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:49:53,375-Speed 24881.81 samples/sec Loss 1.3151 LearningRate 0.0000 Epoch: 35 Global Step: 61010 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:50:03,401-Speed 24514.88 samples/sec Loss 1.3141 LearningRate 0.0000 Epoch: 35 Global Step: 61020 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:50:13,416-Speed 24545.23 samples/sec Loss 1.3128 LearningRate 0.0000 Epoch: 35 Global Step: 61030 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:50:23,195-Speed 25140.76 samples/sec Loss 1.3105 LearningRate 0.0000 Epoch: 35 Global Step: 61040 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:50:32,916-Speed 25284.58 samples/sec Loss 1.3076 LearningRate 0.0000 Epoch: 35 Global Step: 61050 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:50:42,717-Speed 25078.46 samples/sec Loss 1.3087 LearningRate 0.0000 Epoch: 35 Global Step: 61060 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:50:52,410-Speed 25358.56 samples/sec Loss 1.3058 LearningRate 0.0000 Epoch: 35 Global Step: 61070 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:51:02,150-Speed 25234.20 samples/sec Loss 1.3125 LearningRate 0.0000 Epoch: 35 Global Step: 61080 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:51:11,994-Speed 24970.29 samples/sec Loss 1.3070 LearningRate 0.0000 Epoch: 35 Global Step: 61090 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:51:21,781-Speed 25113.43 samples/sec Loss 1.3138 LearningRate 0.0000 Epoch: 35 Global Step: 61100 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:51:31,727-Speed 24711.37 samples/sec Loss 1.3094 LearningRate 0.0000 Epoch: 35 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-03-26 15:51:41,435-Speed 25319.79 samples/sec Loss 1.3042 LearningRate 0.0000 Epoch: 35 Global Step: 61120 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:51:51,141-Speed 25325.72 samples/sec Loss 1.3088 LearningRate 0.0000 Epoch: 35 Global Step: 61130 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:52:00,894-Speed 25204.39 samples/sec Loss 1.3041 LearningRate 0.0000 Epoch: 35 Global Step: 61140 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:52:10,583-Speed 25369.36 samples/sec Loss 1.3071 LearningRate 0.0000 Epoch: 35 Global Step: 61150 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:52:20,317-Speed 25250.87 samples/sec Loss 1.3108 LearningRate 0.0000 Epoch: 35 Global Step: 61160 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:52:30,075-Speed 25190.35 samples/sec Loss 1.3104 LearningRate 0.0000 Epoch: 35 Global Step: 61170 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:52:39,981-Speed 24812.59 samples/sec Loss 1.3122 LearningRate 0.0000 Epoch: 35 Global Step: 61180 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:52:49,827-Speed 24964.50 samples/sec Loss 1.3071 LearningRate 0.0000 Epoch: 35 Global Step: 61190 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:52:59,599-Speed 25152.52 samples/sec Loss 1.3098 LearningRate 0.0000 Epoch: 35 Global Step: 61200 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:53:09,357-Speed 25189.27 samples/sec Loss 1.3097 LearningRate 0.0000 Epoch: 35 Global Step: 61210 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:53:19,185-Speed 25009.16 samples/sec Loss 1.3149 LearningRate 0.0000 Epoch: 35 Global Step: 61220 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:53:28,972-Speed 25115.19 samples/sec Loss 1.2970 LearningRate 0.0000 Epoch: 35 Global Step: 61230 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:53:38,741-Speed 25160.57 samples/sec Loss 1.3111 LearningRate 0.0000 Epoch: 35 Global Step: 61240 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:53:48,668-Speed 24760.48 samples/sec Loss 1.3220 LearningRate 0.0000 Epoch: 35 Global Step: 61250 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:53:58,458-Speed 25107.18 samples/sec Loss 1.3102 LearningRate 0.0000 Epoch: 35 Global Step: 61260 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:54:08,315-Speed 24937.21 samples/sec Loss 1.2977 LearningRate 0.0000 Epoch: 35 Global Step: 61270 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:54:18,249-Speed 24742.41 samples/sec Loss 1.3084 LearningRate 0.0000 Epoch: 35 Global Step: 61280 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:54:28,094-Speed 24968.99 samples/sec Loss 1.3074 LearningRate 0.0000 Epoch: 35 Global Step: 61290 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:54:37,843-Speed 25211.69 samples/sec Loss 1.2996 LearningRate 0.0000 Epoch: 35 Global Step: 61300 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:54:47,605-Speed 25180.14 samples/sec Loss 1.3091 LearningRate 0.0000 Epoch: 35 Global Step: 61310 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:54:57,391-Speed 25117.12 samples/sec Loss 1.3036 LearningRate 0.0000 Epoch: 35 Global Step: 61320 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:55:07,228-Speed 24985.47 samples/sec Loss 1.3023 LearningRate 0.0000 Epoch: 35 Global Step: 61330 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:55:17,054-Speed 25016.41 samples/sec Loss 1.3014 LearningRate 0.0000 Epoch: 35 Global Step: 61340 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:55:26,814-Speed 25183.05 samples/sec Loss 1.3021 LearningRate 0.0000 Epoch: 35 Global Step: 61350 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:55:36,679-Speed 24914.76 samples/sec Loss 1.3067 LearningRate 0.0000 Epoch: 35 Global Step: 61360 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:55:46,374-Speed 25352.48 samples/sec Loss 1.3113 LearningRate 0.0000 Epoch: 35 Global Step: 61370 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:55:56,190-Speed 25040.13 samples/sec Loss 1.3032 LearningRate 0.0000 Epoch: 35 Global Step: 61380 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:56:06,071-Speed 24873.39 samples/sec Loss 1.3022 LearningRate 0.0000 Epoch: 35 Global Step: 61390 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:56:15,881-Speed 25055.68 samples/sec Loss 1.3110 LearningRate 0.0000 Epoch: 35 Global Step: 61400 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:56:25,674-Speed 25098.61 samples/sec Loss 1.2952 LearningRate 0.0000 Epoch: 35 Global Step: 61410 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:56:35,346-Speed 25412.32 samples/sec Loss 1.3068 LearningRate 0.0000 Epoch: 35 Global Step: 61420 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:56:45,102-Speed 25200.67 samples/sec Loss 1.3032 LearningRate 0.0000 Epoch: 35 Global Step: 61430 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:56:54,965-Speed 24920.69 samples/sec Loss 1.3043 LearningRate 0.0000 Epoch: 35 Global Step: 61440 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:57:04,807-Speed 24971.58 samples/sec Loss 1.2975 LearningRate 0.0000 Epoch: 35 Global Step: 61450 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:57:14,574-Speed 25165.36 samples/sec Loss 1.2965 LearningRate 0.0000 Epoch: 35 Global Step: 61460 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:57:24,380-Speed 25073.86 samples/sec Loss 1.2974 LearningRate 0.0000 Epoch: 35 Global Step: 61470 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:57:34,197-Speed 25039.07 samples/sec Loss 1.2943 LearningRate 0.0000 Epoch: 35 Global Step: 61480 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:57:43,964-Speed 25166.42 samples/sec Loss 1.3026 LearningRate 0.0000 Epoch: 35 Global Step: 61490 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:57:53,820-Speed 24939.26 samples/sec Loss 1.3090 LearningRate 0.0000 Epoch: 35 Global Step: 61500 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:58:03,586-Speed 25167.82 samples/sec Loss 1.3069 LearningRate 0.0000 Epoch: 35 Global Step: 61510 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 15:58:13,349-Speed 25176.93 samples/sec Loss 1.3039 LearningRate 0.0000 Epoch: 35 Global Step: 61520 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:58:23,189-Speed 24979.33 samples/sec Loss 1.3050 LearningRate 0.0000 Epoch: 35 Global Step: 61530 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:58:33,121-Speed 24747.74 samples/sec Loss 1.3037 LearningRate 0.0000 Epoch: 35 Global Step: 61540 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:58:42,946-Speed 25016.99 samples/sec Loss 1.3102 LearningRate 0.0000 Epoch: 35 Global Step: 61550 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:58:52,808-Speed 24924.93 samples/sec Loss 1.2987 LearningRate 0.0000 Epoch: 35 Global Step: 61560 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:59:02,574-Speed 25169.75 samples/sec Loss 1.3024 LearningRate 0.0000 Epoch: 35 Global Step: 61570 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:59:12,370-Speed 25090.66 samples/sec Loss 1.3105 LearningRate 0.0000 Epoch: 35 Global Step: 61580 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:59:22,123-Speed 25201.51 samples/sec Loss 1.3051 LearningRate 0.0000 Epoch: 35 Global Step: 61590 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:59:31,925-Speed 25073.33 samples/sec Loss 1.2978 LearningRate 0.0000 Epoch: 35 Global Step: 61600 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:59:41,716-Speed 25104.44 samples/sec Loss 1.2987 LearningRate 0.0000 Epoch: 35 Global Step: 61610 Fp16 Grad Scale: 16384 Required: 2 hours
Training: 2022-03-26 15:59:51,522-Speed 25073.43 samples/sec Loss 1.2997 LearningRate 0.0000 Epoch: 35 Global Step: 61620 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:00:01,367-Speed 24972.34 samples/sec Loss 1.3017 LearningRate 0.0000 Epoch: 35 Global Step: 61630 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:00:11,298-Speed 24758.00 samples/sec Loss 1.2963 LearningRate 0.0000 Epoch: 35 Global Step: 61640 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:00:21,149-Speed 24952.70 samples/sec Loss 1.3088 LearningRate 0.0000 Epoch: 35 Global Step: 61650 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:00:31,169-Speed 24528.08 samples/sec Loss 1.3039 LearningRate 0.0000 Epoch: 35 Global Step: 61660 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:00:41,382-Speed 24067.72 samples/sec Loss 1.2999 LearningRate 0.0000 Epoch: 35 Global Step: 61670 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:00:51,556-Speed 24157.54 samples/sec Loss 1.3053 LearningRate 0.0000 Epoch: 35 Global Step: 61680 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:01:01,669-Speed 24312.92 samples/sec Loss 1.2983 LearningRate 0.0000 Epoch: 35 Global Step: 61690 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:01:11,665-Speed 24590.04 samples/sec Loss 1.2970 LearningRate 0.0000 Epoch: 35 Global Step: 61700 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:01:21,652-Speed 24611.33 samples/sec Loss 1.2961 LearningRate 0.0000 Epoch: 35 Global Step: 61710 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:01:31,669-Speed 24536.05 samples/sec Loss 1.3084 LearningRate 0.0000 Epoch: 35 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-03-26 16:01:41,676-Speed 24560.91 samples/sec Loss 1.3048 LearningRate 0.0000 Epoch: 35 Global Step: 61730 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:01:51,706-Speed 24506.66 samples/sec Loss 1.2988 LearningRate 0.0000 Epoch: 35 Global Step: 61740 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:02:01,757-Speed 24454.47 samples/sec Loss 1.3035 LearningRate 0.0000 Epoch: 35 Global Step: 61750 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:02:11,830-Speed 24399.99 samples/sec Loss 1.3060 LearningRate 0.0000 Epoch: 35 Global Step: 61760 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:02:22,026-Speed 24108.11 samples/sec Loss 1.3043 LearningRate 0.0000 Epoch: 35 Global Step: 61770 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:02:32,255-Speed 24029.70 samples/sec Loss 1.2960 LearningRate 0.0000 Epoch: 35 Global Step: 61780 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:02:42,345-Speed 24359.28 samples/sec Loss 1.3026 LearningRate 0.0000 Epoch: 35 Global Step: 61790 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:02:52,362-Speed 24537.35 samples/sec Loss 1.2906 LearningRate 0.0000 Epoch: 35 Global Step: 61800 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:03:02,341-Speed 24632.17 samples/sec Loss 1.3028 LearningRate 0.0000 Epoch: 35 Global Step: 61810 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:03:12,325-Speed 24618.51 samples/sec Loss 1.2982 LearningRate 0.0000 Epoch: 35 Global Step: 61820 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:03:22,383-Speed 24438.39 samples/sec Loss 1.2976 LearningRate 0.0000 Epoch: 35 Global Step: 61830 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:03:32,308-Speed 24764.29 samples/sec Loss 1.2916 LearningRate 0.0000 Epoch: 35 Global Step: 61840 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:03:42,256-Speed 24711.72 samples/sec Loss 1.2973 LearningRate 0.0000 Epoch: 35 Global Step: 61850 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:03:52,261-Speed 24566.08 samples/sec Loss 1.2981 LearningRate 0.0000 Epoch: 35 Global Step: 61860 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:04:02,356-Speed 24346.96 samples/sec Loss 1.3052 LearningRate 0.0000 Epoch: 35 Global Step: 61870 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:04:12,427-Speed 24412.59 samples/sec Loss 1.2976 LearningRate 0.0000 Epoch: 35 Global Step: 61880 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:04:22,303-Speed 24888.00 samples/sec Loss 1.2915 LearningRate 0.0000 Epoch: 35 Global Step: 61890 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:04:32,430-Speed 24269.63 samples/sec Loss 1.2891 LearningRate 0.0000 Epoch: 35 Global Step: 61900 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:04:42,535-Speed 24324.51 samples/sec Loss 1.2982 LearningRate 0.0000 Epoch: 35 Global Step: 61910 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:04:52,653-Speed 24292.67 samples/sec Loss 1.2913 LearningRate 0.0000 Epoch: 35 Global Step: 61920 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:05:02,520-Speed 24910.58 samples/sec Loss 1.3071 LearningRate 0.0000 Epoch: 35 Global Step: 61930 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:05:12,483-Speed 24672.65 samples/sec Loss 1.2994 LearningRate 0.0000 Epoch: 35 Global Step: 61940 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:05:22,433-Speed 24702.06 samples/sec Loss 1.3090 LearningRate 0.0000 Epoch: 35 Global Step: 61950 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:05:32,402-Speed 24656.07 samples/sec Loss 1.2969 LearningRate 0.0000 Epoch: 35 Global Step: 61960 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:05:42,423-Speed 24527.44 samples/sec Loss 1.2967 LearningRate 0.0000 Epoch: 35 Global Step: 61970 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:05:52,494-Speed 24406.75 samples/sec Loss 1.3048 LearningRate 0.0000 Epoch: 35 Global Step: 61980 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:06:02,337-Speed 24971.22 samples/sec Loss 1.2980 LearningRate 0.0000 Epoch: 35 Global Step: 61990 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:06:12,164-Speed 25015.06 samples/sec Loss 1.2949 LearningRate 0.0000 Epoch: 35 Global Step: 62000 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:06:21,975-Speed 25054.07 samples/sec Loss 1.3081 LearningRate 0.0000 Epoch: 35 Global Step: 62010 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:06:31,844-Speed 24904.99 samples/sec Loss 1.3000 LearningRate 0.0000 Epoch: 35 Global Step: 62020 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:06:41,643-Speed 25093.03 samples/sec Loss 1.2968 LearningRate 0.0000 Epoch: 35 Global Step: 62030 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:06:51,494-Speed 24956.11 samples/sec Loss 1.2868 LearningRate 0.0000 Epoch: 35 Global Step: 62040 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:07:01,282-Speed 25110.11 samples/sec Loss 1.2977 LearningRate 0.0000 Epoch: 35 Global Step: 62050 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:07:11,004-Speed 25284.40 samples/sec Loss 1.3011 LearningRate 0.0000 Epoch: 35 Global Step: 62060 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:07:20,822-Speed 25044.18 samples/sec Loss 1.2922 LearningRate 0.0000 Epoch: 35 Global Step: 62070 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:07:30,696-Speed 24892.68 samples/sec Loss 1.2991 LearningRate 0.0000 Epoch: 35 Global Step: 62080 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:07:40,504-Speed 25059.65 samples/sec Loss 1.2936 LearningRate 0.0000 Epoch: 35 Global Step: 62090 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:07:50,396-Speed 24846.90 samples/sec Loss 1.3032 LearningRate 0.0000 Epoch: 35 Global Step: 62100 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:08:00,140-Speed 25226.78 samples/sec Loss 1.2957 LearningRate 0.0000 Epoch: 35 Global Step: 62110 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:08:09,945-Speed 25075.52 samples/sec Loss 1.3014 LearningRate 0.0000 Epoch: 35 Global Step: 62120 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:08:19,736-Speed 25102.82 samples/sec Loss 1.3012 LearningRate 0.0000 Epoch: 35 Global Step: 62130 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:08:29,790-Speed 24448.73 samples/sec Loss 1.3011 LearningRate 0.0000 Epoch: 35 Global Step: 62140 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:08:39,830-Speed 24480.68 samples/sec Loss 1.3142 LearningRate 0.0000 Epoch: 35 Global Step: 62150 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:08:49,840-Speed 24551.99 samples/sec Loss 1.2976 LearningRate 0.0000 Epoch: 35 Global Step: 62160 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:08:59,795-Speed 24691.81 samples/sec Loss 1.2999 LearningRate 0.0000 Epoch: 35 Global Step: 62170 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:09:09,782-Speed 24610.37 samples/sec Loss 1.2984 LearningRate 0.0000 Epoch: 35 Global Step: 62180 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:09:19,741-Speed 24680.76 samples/sec Loss 1.2921 LearningRate 0.0000 Epoch: 35 Global Step: 62190 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:09:29,640-Speed 24828.91 samples/sec Loss 1.2937 LearningRate 0.0000 Epoch: 35 Global Step: 62200 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:09:39,624-Speed 24617.56 samples/sec Loss 1.3062 LearningRate 0.0000 Epoch: 35 Global Step: 62210 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:10:38,999-Speed 4139.23 samples/sec Loss 1.3031 LearningRate 0.0000 Epoch: 36 Global Step: 62220 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:10:48,683-Speed 25381.41 samples/sec Loss 1.2951 LearningRate 0.0000 Epoch: 36 Global Step: 62230 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:10:58,455-Speed 25153.17 samples/sec Loss 1.2955 LearningRate 0.0000 Epoch: 36 Global Step: 62240 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:11:08,169-Speed 25303.91 samples/sec Loss 1.2930 LearningRate 0.0000 Epoch: 36 Global Step: 62250 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:11:17,973-Speed 25071.49 samples/sec Loss 1.3002 LearningRate 0.0000 Epoch: 36 Global Step: 62260 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:11:27,710-Speed 25242.24 samples/sec Loss 1.2955 LearningRate 0.0000 Epoch: 36 Global Step: 62270 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:11:37,409-Speed 25343.13 samples/sec Loss 1.2915 LearningRate 0.0000 Epoch: 36 Global Step: 62280 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:11:47,252-Speed 24970.63 samples/sec Loss 1.2920 LearningRate 0.0000 Epoch: 36 Global Step: 62290 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:11:56,917-Speed 25432.26 samples/sec Loss 1.2873 LearningRate 0.0000 Epoch: 36 Global Step: 62300 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:12:06,653-Speed 25246.78 samples/sec Loss 1.2877 LearningRate 0.0000 Epoch: 36 Global Step: 62310 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:12:16,395-Speed 25239.02 samples/sec Loss 1.2892 LearningRate 0.0000 Epoch: 36 Global Step: 62320 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:12:26,070-Speed 25405.64 samples/sec Loss 1.2987 LearningRate 0.0000 Epoch: 36 Global Step: 62330 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:12:35,779-Speed 25314.85 samples/sec Loss 1.2876 LearningRate 0.0000 Epoch: 36 Global Step: 62340 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:12:45,648-Speed 24905.98 samples/sec Loss 1.2886 LearningRate 0.0000 Epoch: 36 Global Step: 62350 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:12:55,472-Speed 25023.91 samples/sec Loss 1.2860 LearningRate 0.0000 Epoch: 36 Global Step: 62360 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:13:05,277-Speed 25069.24 samples/sec Loss 1.2867 LearningRate 0.0000 Epoch: 36 Global Step: 62370 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:13:15,048-Speed 25154.85 samples/sec Loss 1.2959 LearningRate 0.0000 Epoch: 36 Global Step: 62380 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:13:24,747-Speed 25343.96 samples/sec Loss 1.2897 LearningRate 0.0000 Epoch: 36 Global Step: 62390 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:13:34,464-Speed 25294.55 samples/sec Loss 1.3055 LearningRate 0.0000 Epoch: 36 Global Step: 62400 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:13:44,311-Speed 24961.39 samples/sec Loss 1.2900 LearningRate 0.0000 Epoch: 36 Global Step: 62410 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:13:54,092-Speed 25129.06 samples/sec Loss 1.2973 LearningRate 0.0000 Epoch: 36 Global Step: 62420 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:14:03,875-Speed 25125.13 samples/sec Loss 1.2913 LearningRate 0.0000 Epoch: 36 Global Step: 62430 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:14:13,593-Speed 25291.85 samples/sec Loss 1.3022 LearningRate 0.0000 Epoch: 36 Global Step: 62440 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:14:23,324-Speed 25259.92 samples/sec Loss 1.2933 LearningRate 0.0000 Epoch: 36 Global Step: 62450 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-03-26 16:14:33,048-Speed 25275.02 samples/sec Loss 1.2955 LearningRate 0.0000 Epoch: 36 Global
Step: 62460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:14:42,834-Speed 25117.63 samples/sec Loss 1.2884 LearningRate 0.0000 Epoch: 36 Global Step: 62470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:14:52,689-Speed 24943.01 samples/sec Loss 1.2902 LearningRate 0.0000 Epoch: 36 Global Step: 62480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:15:02,587-Speed 24831.83 samples/sec Loss 1.2978 LearningRate 0.0000 Epoch: 36 Global Step: 62490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:15:12,310-Speed 25281.32 samples/sec Loss 1.2874 LearningRate 0.0000 Epoch: 36 Global Step: 62500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:15:22,019-Speed 25314.31 samples/sec Loss 1.2877 LearningRate 0.0000 Epoch: 36 Global Step: 62510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:15:31,843-Speed 25020.30 samples/sec Loss 1.2897 LearningRate 0.0000 Epoch: 36 Global Step: 62520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:15:41,533-Speed 25368.11 samples/sec Loss 1.2976 LearningRate 0.0000 Epoch: 36 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-26 16:15:51,264-Speed 25258.72 samples/sec Loss 1.2939 LearningRate 0.0000 Epoch: 36 Global Step: 62540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:16:01,004-Speed 25235.15 samples/sec Loss 1.2862 LearningRate 0.0000 Epoch: 36 Global Step: 62550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:16:10,851-Speed 24969.49 samples/sec Loss 1.3011 LearningRate 0.0000 Epoch: 36 Global Step: 62560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:16:20,617-Speed 25168.43 samples/sec Loss 1.2986 LearningRate 0.0000 Epoch: 36 Global Step: 62570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:16:30,489-Speed 24898.34 samples/sec Loss 1.2938 LearningRate 0.0000 Epoch: 36 Global Step: 62580 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-03-26 16:16:40,322-Speed 24994.11 samples/sec Loss 1.3002 LearningRate 0.0000 Epoch: 36 Global Step: 62590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:16:50,037-Speed 25301.47 samples/sec Loss 1.3044 LearningRate 0.0000 Epoch: 36 Global Step: 62600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:16:59,882-Speed 24967.15 samples/sec Loss 1.2883 LearningRate 0.0000 Epoch: 36 Global Step: 62610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:17:09,765-Speed 24870.32 samples/sec Loss 1.2904 LearningRate 0.0000 Epoch: 36 Global Step: 62620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:17:19,747-Speed 24627.00 samples/sec Loss 1.2970 LearningRate 0.0000 Epoch: 36 Global Step: 62630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:17:29,681-Speed 24742.95 samples/sec Loss 1.2985 LearningRate 0.0000 Epoch: 36 Global Step: 62640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:17:39,401-Speed 25287.16 samples/sec Loss 1.2954 LearningRate 0.0000 Epoch: 36 Global Step: 62650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:17:49,081-Speed 25390.99 samples/sec Loss 1.2941 LearningRate 0.0000 Epoch: 36 Global Step: 62660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:17:58,912-Speed 25001.98 samples/sec Loss 1.2918 LearningRate 0.0000 Epoch: 36 Global Step: 62670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:18:08,775-Speed 24920.98 samples/sec Loss 1.2970 LearningRate 0.0000 Epoch: 36 Global Step: 62680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:18:18,566-Speed 25102.60 samples/sec Loss 1.2870 LearningRate 0.0000 Epoch: 36 Global Step: 62690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:18:28,347-Speed 25132.05 samples/sec Loss 1.2862 LearningRate 0.0000 Epoch: 36 Global Step: 62700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 
16:18:38,082-Speed 25257.23 samples/sec Loss 1.2897 LearningRate 0.0000 Epoch: 36 Global Step: 62710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:18:47,757-Speed 25404.09 samples/sec Loss 1.2929 LearningRate 0.0000 Epoch: 36 Global Step: 62720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:18:57,444-Speed 25377.02 samples/sec Loss 1.2964 LearningRate 0.0000 Epoch: 36 Global Step: 62730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:19:07,136-Speed 25358.12 samples/sec Loss 1.2937 LearningRate 0.0000 Epoch: 36 Global Step: 62740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:19:16,815-Speed 25397.36 samples/sec Loss 1.2872 LearningRate 0.0000 Epoch: 36 Global Step: 62750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:19:26,602-Speed 25114.35 samples/sec Loss 1.2936 LearningRate 0.0000 Epoch: 36 Global Step: 62760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:19:36,281-Speed 25394.51 samples/sec Loss 1.2916 LearningRate 0.0000 Epoch: 36 Global Step: 62770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:19:46,117-Speed 24987.73 samples/sec Loss 1.2891 LearningRate 0.0000 Epoch: 36 Global Step: 62780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:19:55,826-Speed 25319.06 samples/sec Loss 1.2935 LearningRate 0.0000 Epoch: 36 Global Step: 62790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:20:05,587-Speed 25180.69 samples/sec Loss 1.2970 LearningRate 0.0000 Epoch: 36 Global Step: 62800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:20:15,342-Speed 25194.62 samples/sec Loss 1.3005 LearningRate 0.0000 Epoch: 36 Global Step: 62810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:20:25,149-Speed 25064.54 samples/sec Loss 1.2828 LearningRate 0.0000 Epoch: 36 Global Step: 62820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:20:34,977-Speed 25010.14 samples/sec 
Loss 1.2934 LearningRate 0.0000 Epoch: 36 Global Step: 62830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:20:44,864-Speed 24859.10 samples/sec Loss 1.2949 LearningRate 0.0000 Epoch: 36 Global Step: 62840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:20:54,678-Speed 25046.31 samples/sec Loss 1.2930 LearningRate 0.0000 Epoch: 36 Global Step: 62850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:21:04,455-Speed 25139.43 samples/sec Loss 1.2952 LearningRate 0.0000 Epoch: 36 Global Step: 62860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:21:14,265-Speed 25056.68 samples/sec Loss 1.2923 LearningRate 0.0000 Epoch: 36 Global Step: 62870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:21:23,989-Speed 25284.99 samples/sec Loss 1.2929 LearningRate 0.0000 Epoch: 36 Global Step: 62880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:21:33,787-Speed 25084.14 samples/sec Loss 1.2882 LearningRate 0.0000 Epoch: 36 Global Step: 62890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:21:43,535-Speed 25214.85 samples/sec Loss 1.2853 LearningRate 0.0000 Epoch: 36 Global Step: 62900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:21:53,279-Speed 25226.88 samples/sec Loss 1.2972 LearningRate 0.0000 Epoch: 36 Global Step: 62910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:22:03,020-Speed 25230.80 samples/sec Loss 1.2932 LearningRate 0.0000 Epoch: 36 Global Step: 62920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:22:12,788-Speed 25164.68 samples/sec Loss 1.2937 LearningRate 0.0000 Epoch: 36 Global Step: 62930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:22:22,510-Speed 25283.06 samples/sec Loss 1.2819 LearningRate 0.0000 Epoch: 36 Global Step: 62940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:22:32,259-Speed 25210.66 samples/sec Loss 1.2938 LearningRate 0.0000 Epoch: 36 
Global Step: 62950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:22:42,034-Speed 25145.61 samples/sec Loss 1.2847 LearningRate 0.0000 Epoch: 36 Global Step: 62960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:22:51,944-Speed 24810.12 samples/sec Loss 1.2849 LearningRate 0.0000 Epoch: 36 Global Step: 62970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:23:01,799-Speed 24940.18 samples/sec Loss 1.2989 LearningRate 0.0000 Epoch: 36 Global Step: 62980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:23:11,601-Speed 25075.29 samples/sec Loss 1.2901 LearningRate 0.0000 Epoch: 36 Global Step: 62990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:23:21,397-Speed 25092.27 samples/sec Loss 1.2870 LearningRate 0.0000 Epoch: 36 Global Step: 63000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:23:31,107-Speed 25312.79 samples/sec Loss 1.2952 LearningRate 0.0000 Epoch: 36 Global Step: 63010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:23:40,881-Speed 25147.24 samples/sec Loss 1.2971 LearningRate 0.0000 Epoch: 36 Global Step: 63020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:23:50,588-Speed 25322.31 samples/sec Loss 1.2939 LearningRate 0.0000 Epoch: 36 Global Step: 63030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:24:00,382-Speed 25102.59 samples/sec Loss 1.2851 LearningRate 0.0000 Epoch: 36 Global Step: 63040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:24:10,156-Speed 25146.74 samples/sec Loss 1.2875 LearningRate 0.0000 Epoch: 36 Global Step: 63050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:24:19,912-Speed 25195.90 samples/sec Loss 1.2794 LearningRate 0.0000 Epoch: 36 Global Step: 63060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:24:29,735-Speed 25023.44 samples/sec Loss 1.2832 LearningRate 0.0000 Epoch: 36 Global Step: 63070 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-03-26 16:24:39,461-Speed 25276.11 samples/sec Loss 1.2786 LearningRate 0.0000 Epoch: 36 Global Step: 63080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:24:49,381-Speed 24778.19 samples/sec Loss 1.2789 LearningRate 0.0000 Epoch: 36 Global Step: 63090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:24:59,285-Speed 24818.92 samples/sec Loss 1.2895 LearningRate 0.0000 Epoch: 36 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:25:09,192-Speed 24809.39 samples/sec Loss 1.2875 LearningRate 0.0000 Epoch: 36 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:25:19,124-Speed 24748.36 samples/sec Loss 1.2786 LearningRate 0.0000 Epoch: 36 Global Step: 63120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:25:29,114-Speed 24602.50 samples/sec Loss 1.2889 LearningRate 0.0000 Epoch: 36 Global Step: 63130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:25:39,063-Speed 24705.92 samples/sec Loss 1.2876 LearningRate 0.0000 Epoch: 36 Global Step: 63140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:25:49,023-Speed 24685.11 samples/sec Loss 1.2875 LearningRate 0.0000 Epoch: 36 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:25:59,083-Speed 24430.17 samples/sec Loss 1.2779 LearningRate 0.0000 Epoch: 36 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:26:09,000-Speed 24784.38 samples/sec Loss 1.2817 LearningRate 0.0000 Epoch: 36 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:26:18,959-Speed 24685.20 samples/sec Loss 1.2839 LearningRate 0.0000 Epoch: 36 Global Step: 63180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:26:28,853-Speed 24843.34 samples/sec Loss 1.2883 LearningRate 0.0000 Epoch: 36 Global Step: 63190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 
16:26:38,789-Speed 24736.97 samples/sec Loss 1.2804 LearningRate 0.0000 Epoch: 36 Global Step: 63200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:26:48,663-Speed 24893.63 samples/sec Loss 1.2849 LearningRate 0.0000 Epoch: 36 Global Step: 63210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:26:58,567-Speed 24817.94 samples/sec Loss 1.2949 LearningRate 0.0000 Epoch: 36 Global Step: 63220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:27:08,486-Speed 24785.40 samples/sec Loss 1.2817 LearningRate 0.0000 Epoch: 36 Global Step: 63230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:27:18,411-Speed 24766.49 samples/sec Loss 1.2701 LearningRate 0.0000 Epoch: 36 Global Step: 63240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:27:28,286-Speed 24890.46 samples/sec Loss 1.2891 LearningRate 0.0000 Epoch: 36 Global Step: 63250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:27:38,251-Speed 24665.64 samples/sec Loss 1.2933 LearningRate 0.0000 Epoch: 36 Global Step: 63260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:27:48,215-Speed 24666.93 samples/sec Loss 1.2872 LearningRate 0.0000 Epoch: 36 Global Step: 63270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:27:58,014-Speed 25084.44 samples/sec Loss 1.2856 LearningRate 0.0000 Epoch: 36 Global Step: 63280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:28:07,757-Speed 25227.75 samples/sec Loss 1.2827 LearningRate 0.0000 Epoch: 36 Global Step: 63290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:28:17,561-Speed 25072.12 samples/sec Loss 1.2837 LearningRate 0.0000 Epoch: 36 Global Step: 63300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:28:27,203-Speed 25492.27 samples/sec Loss 1.2852 LearningRate 0.0000 Epoch: 36 Global Step: 63310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:28:37,010-Speed 25063.14 samples/sec 
Loss 1.2870 LearningRate 0.0000 Epoch: 36 Global Step: 63320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:28:46,803-Speed 25098.45 samples/sec Loss 1.2932 LearningRate 0.0000 Epoch: 36 Global Step: 63330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:28:56,541-Speed 25240.94 samples/sec Loss 1.2898 LearningRate 0.0000 Epoch: 36 Global Step: 63340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:29:06,370-Speed 25007.06 samples/sec Loss 1.2819 LearningRate 0.0000 Epoch: 36 Global Step: 63350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:29:16,144-Speed 25148.52 samples/sec Loss 1.2853 LearningRate 0.0000 Epoch: 36 Global Step: 63360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:29:25,955-Speed 25054.12 samples/sec Loss 1.2845 LearningRate 0.0000 Epoch: 36 Global Step: 63370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:29:35,798-Speed 24968.85 samples/sec Loss 1.2809 LearningRate 0.0000 Epoch: 36 Global Step: 63380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:29:45,723-Speed 24764.93 samples/sec Loss 1.2836 LearningRate 0.0000 Epoch: 36 Global Step: 63390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:29:55,642-Speed 24778.51 samples/sec Loss 1.2884 LearningRate 0.0000 Epoch: 36 Global Step: 63400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:30:05,589-Speed 24718.75 samples/sec Loss 1.2859 LearningRate 0.0000 Epoch: 36 Global Step: 63410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:30:15,472-Speed 24874.26 samples/sec Loss 1.2800 LearningRate 0.0000 Epoch: 36 Global Step: 63420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:30:25,439-Speed 24660.73 samples/sec Loss 1.2842 LearningRate 0.0000 Epoch: 36 Global Step: 63430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:30:35,237-Speed 25085.52 samples/sec Loss 1.2852 LearningRate 0.0000 Epoch: 36 
Global Step: 63440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:30:45,080-Speed 24970.23 samples/sec Loss 1.2830 LearningRate 0.0000 Epoch: 36 Global Step: 63450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:30:54,807-Speed 25266.57 samples/sec Loss 1.2904 LearningRate 0.0000 Epoch: 36 Global Step: 63460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:31:04,507-Speed 25338.17 samples/sec Loss 1.2784 LearningRate 0.0000 Epoch: 36 Global Step: 63470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:31:14,219-Speed 25309.61 samples/sec Loss 1.2761 LearningRate 0.0000 Epoch: 36 Global Step: 63480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:31:24,024-Speed 25067.41 samples/sec Loss 1.2928 LearningRate 0.0000 Epoch: 36 Global Step: 63490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:31:33,756-Speed 25255.85 samples/sec Loss 1.2720 LearningRate 0.0000 Epoch: 36 Global Step: 63500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:31:43,533-Speed 25139.36 samples/sec Loss 1.2893 LearningRate 0.0000 Epoch: 36 Global Step: 63510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:31:53,373-Speed 24977.50 samples/sec Loss 1.2876 LearningRate 0.0000 Epoch: 36 Global Step: 63520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:32:03,088-Speed 25299.32 samples/sec Loss 1.2895 LearningRate 0.0000 Epoch: 36 Global Step: 63530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:32:12,841-Speed 25201.37 samples/sec Loss 1.2741 LearningRate 0.0000 Epoch: 36 Global Step: 63540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:32:22,631-Speed 25104.86 samples/sec Loss 1.2818 LearningRate 0.0000 Epoch: 36 Global Step: 63550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:32:32,461-Speed 25005.13 samples/sec Loss 1.2781 LearningRate 0.0000 Epoch: 36 Global Step: 63560 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-03-26 16:32:42,306-Speed 24963.61 samples/sec Loss 1.2850 LearningRate 0.0000 Epoch: 36 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:32:51,995-Speed 25368.30 samples/sec Loss 1.2859 LearningRate 0.0000 Epoch: 36 Global Step: 63580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:33:01,672-Speed 25397.61 samples/sec Loss 1.2889 LearningRate 0.0000 Epoch: 36 Global Step: 63590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:33:11,442-Speed 25158.05 samples/sec Loss 1.2806 LearningRate 0.0000 Epoch: 36 Global Step: 63600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:33:21,142-Speed 25337.83 samples/sec Loss 1.2806 LearningRate 0.0000 Epoch: 36 Global Step: 63610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:33:30,963-Speed 25027.01 samples/sec Loss 1.2862 LearningRate 0.0000 Epoch: 36 Global Step: 63620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:33:40,639-Speed 25402.20 samples/sec Loss 1.2804 LearningRate 0.0000 Epoch: 36 Global Step: 63630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:33:50,482-Speed 24971.37 samples/sec Loss 1.2757 LearningRate 0.0000 Epoch: 36 Global Step: 63640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:34:00,324-Speed 24972.05 samples/sec Loss 1.2806 LearningRate 0.0000 Epoch: 36 Global Step: 63650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:34:10,115-Speed 25111.36 samples/sec Loss 1.2834 LearningRate 0.0000 Epoch: 36 Global Step: 63660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:34:19,911-Speed 25089.36 samples/sec Loss 1.2716 LearningRate 0.0000 Epoch: 36 Global Step: 63670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:34:29,676-Speed 25170.44 samples/sec Loss 1.2702 LearningRate 0.0000 Epoch: 36 Global Step: 63680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 
16:34:39,547-Speed 24898.75 samples/sec Loss 1.2838 LearningRate 0.0000 Epoch: 36 Global Step: 63690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:34:49,416-Speed 24906.72 samples/sec Loss 1.2723 LearningRate 0.0000 Epoch: 36 Global Step: 63700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:34:59,159-Speed 25225.67 samples/sec Loss 1.2783 LearningRate 0.0000 Epoch: 36 Global Step: 63710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:35:08,881-Speed 25282.01 samples/sec Loss 1.2773 LearningRate 0.0000 Epoch: 36 Global Step: 63720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:35:18,755-Speed 24892.36 samples/sec Loss 1.2762 LearningRate 0.0000 Epoch: 36 Global Step: 63730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:35:28,541-Speed 25115.84 samples/sec Loss 1.2885 LearningRate 0.0000 Epoch: 36 Global Step: 63740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:35:38,255-Speed 25303.22 samples/sec Loss 1.2871 LearningRate 0.0000 Epoch: 36 Global Step: 63750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:35:47,994-Speed 25236.71 samples/sec Loss 1.2887 LearningRate 0.0000 Epoch: 36 Global Step: 63760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:35:57,769-Speed 25146.15 samples/sec Loss 1.2893 LearningRate 0.0000 Epoch: 36 Global Step: 63770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:36:07,526-Speed 25191.49 samples/sec Loss 1.2830 LearningRate 0.0000 Epoch: 36 Global Step: 63780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-26 16:36:17,388-Speed 24922.66 samples/sec Loss 1.2821 LearningRate 0.0000 Epoch: 36 Global Step: 63790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:36:27,132-Speed 25226.41 samples/sec Loss 1.2809 LearningRate 0.0000 Epoch: 36 Global Step: 63800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-26 16:36:36,807-Speed 25405.52 samples/sec 
Loss 1.2813 LearningRate 0.0000 Epoch: 36 Global Step: 63810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:36:46,615-Speed 25058.27 samples/sec Loss 1.2776 LearningRate 0.0000 Epoch: 36 Global Step: 63820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:36:56,386-Speed 25157.01 samples/sec Loss 1.2873 LearningRate 0.0000 Epoch: 36 Global Step: 63830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:37:06,183-Speed 25087.81 samples/sec Loss 1.2869 LearningRate 0.0000 Epoch: 36 Global Step: 63840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:37:15,982-Speed 25083.87 samples/sec Loss 1.2819 LearningRate 0.0000 Epoch: 36 Global Step: 63850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:37:25,868-Speed 24863.80 samples/sec Loss 1.2870 LearningRate 0.0000 Epoch: 36 Global Step: 63860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:37:35,796-Speed 24757.23 samples/sec Loss 1.2766 LearningRate 0.0000 Epoch: 36 Global Step: 63870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:37:45,673-Speed 24887.09 samples/sec Loss 1.2846 LearningRate 0.0000 Epoch: 36 Global Step: 63880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:37:55,500-Speed 25011.81 samples/sec Loss 1.2841 LearningRate 0.0000 Epoch: 36 Global Step: 63890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:38:05,243-Speed 25227.76 samples/sec Loss 1.2776 LearningRate 0.0000 Epoch: 36 Global Step: 63900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:38:15,072-Speed 25022.85 samples/sec Loss 1.2773 LearningRate 0.0000 Epoch: 36 Global Step: 63910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:38:24,825-Speed 25201.10 samples/sec Loss 1.2855 LearningRate 0.0000 Epoch: 36 Global Step: 63920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:38:34,641-Speed 25040.28 samples/sec Loss 1.2819 LearningRate 0.0000 Epoch: 36 
Global Step: 63930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:38:44,469-Speed 25010.86 samples/sec Loss 1.2859 LearningRate 0.0000 Epoch: 36 Global Step: 63940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:39:44,163-Speed 4117.07 samples/sec Loss 1.2865 LearningRate 0.0000 Epoch: 37 Global Step: 63950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:39:53,924-Speed 25180.39 samples/sec Loss 1.2930 LearningRate 0.0000 Epoch: 37 Global Step: 63960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:40:03,714-Speed 25106.94 samples/sec Loss 1.2803 LearningRate 0.0000 Epoch: 37 Global Step: 63970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:40:13,681-Speed 24661.78 samples/sec Loss 1.2751 LearningRate 0.0000 Epoch: 37 Global Step: 63980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:40:23,470-Speed 25108.38 samples/sec Loss 1.2831 LearningRate 0.0000 Epoch: 37 Global Step: 63990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:40:33,291-Speed 25033.36 samples/sec Loss 1.2804 LearningRate 0.0000 Epoch: 37 Global Step: 64000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:40:43,198-Speed 24815.85 samples/sec Loss 1.2725 LearningRate 0.0000 Epoch: 37 Global Step: 64010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:40:52,934-Speed 25245.15 samples/sec Loss 1.2748 LearningRate 0.0000 Epoch: 37 Global Step: 64020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:41:02,629-Speed 25353.94 samples/sec Loss 1.2855 LearningRate 0.0000 Epoch: 37 Global Step: 64030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:41:12,408-Speed 25142.46 samples/sec Loss 1.2784 LearningRate 0.0000 Epoch: 37 Global Step: 64040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:41:22,148-Speed 25233.14 samples/sec Loss 1.2737 LearningRate 0.0000 Epoch: 37 Global Step: 64050 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-03-26 16:41:31,930-Speed 25136.79 samples/sec Loss 1.2770 LearningRate 0.0000 Epoch: 37 Global Step: 64060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:41:41,768-Speed 24989.63 samples/sec Loss 1.2751 LearningRate 0.0000 Epoch: 37 Global Step: 64070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:41:51,641-Speed 24898.06 samples/sec Loss 1.2771 LearningRate 0.0000 Epoch: 37 Global Step: 64080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:42:01,367-Speed 25271.00 samples/sec Loss 1.2708 LearningRate 0.0000 Epoch: 37 Global Step: 64090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:42:11,149-Speed 25125.88 samples/sec Loss 1.2781 LearningRate 0.0000 Epoch: 37 Global Step: 64100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:42:20,973-Speed 25021.27 samples/sec Loss 1.2787 LearningRate 0.0000 Epoch: 37 Global Step: 64110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:42:30,674-Speed 25335.34 samples/sec Loss 1.2822 LearningRate 0.0000 Epoch: 37 Global Step: 64120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:42:40,565-Speed 24848.38 samples/sec Loss 1.2706 LearningRate 0.0000 Epoch: 37 Global Step: 64130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:42:50,402-Speed 24985.71 samples/sec Loss 1.2877 LearningRate 0.0000 Epoch: 37 Global Step: 64140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:43:00,232-Speed 25006.97 samples/sec Loss 1.2794 LearningRate 0.0000 Epoch: 37 Global Step: 64150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:43:09,957-Speed 25273.03 samples/sec Loss 1.2783 LearningRate 0.0000 Epoch: 37 Global Step: 64160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:43:19,798-Speed 24974.74 samples/sec Loss 1.2813 LearningRate 0.0000 Epoch: 37 Global Step: 64170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 
16:43:29,555-Speed 25191.61 samples/sec Loss 1.2683 LearningRate 0.0000 Epoch: 37 Global Step: 64180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:43:39,262-Speed 25327.48 samples/sec Loss 1.2743 LearningRate 0.0000 Epoch: 37 Global Step: 64190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:43:49,086-Speed 25032.84 samples/sec Loss 1.2791 LearningRate 0.0000 Epoch: 37 Global Step: 64200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:43:58,873-Speed 25112.65 samples/sec Loss 1.2822 LearningRate 0.0000 Epoch: 37 Global Step: 64210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:44:08,599-Speed 25279.62 samples/sec Loss 1.2717 LearningRate 0.0000 Epoch: 37 Global Step: 64220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:44:18,375-Speed 25141.59 samples/sec Loss 1.2782 LearningRate 0.0000 Epoch: 37 Global Step: 64230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:44:28,192-Speed 25046.48 samples/sec Loss 1.2732 LearningRate 0.0000 Epoch: 37 Global Step: 64240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:44:38,235-Speed 24473.09 samples/sec Loss 1.2796 LearningRate 0.0000 Epoch: 37 Global Step: 64250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:44:48,394-Speed 24192.86 samples/sec Loss 1.2859 LearningRate 0.0000 Epoch: 37 Global Step: 64260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:44:58,427-Speed 24500.04 samples/sec Loss 1.2754 LearningRate 0.0000 Epoch: 37 Global Step: 64270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:45:08,499-Speed 24404.46 samples/sec Loss 1.2699 LearningRate 0.0000 Epoch: 37 Global Step: 64280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:45:18,603-Speed 24325.76 samples/sec Loss 1.2789 LearningRate 0.0000 Epoch: 37 Global Step: 64290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:45:28,691-Speed 24364.58 samples/sec 
Loss 1.2675 LearningRate 0.0000 Epoch: 37 Global Step: 64300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:45:38,749-Speed 24441.66 samples/sec Loss 1.2842 LearningRate 0.0000 Epoch: 37 Global Step: 64310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:45:48,814-Speed 24420.20 samples/sec Loss 1.2837 LearningRate 0.0000 Epoch: 37 Global Step: 64320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:45:59,076-Speed 23952.30 samples/sec Loss 1.2874 LearningRate 0.0000 Epoch: 37 Global Step: 64330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:46:09,124-Speed 24460.87 samples/sec Loss 1.2733 LearningRate 0.0000 Epoch: 37 Global Step: 64340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:46:19,367-Speed 23995.22 samples/sec Loss 1.2762 LearningRate 0.0000 Epoch: 37 Global Step: 64350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:46:29,474-Speed 24321.04 samples/sec Loss 1.2736 LearningRate 0.0000 Epoch: 37 Global Step: 64360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:46:39,577-Speed 24329.56 samples/sec Loss 1.2762 LearningRate 0.0000 Epoch: 37 Global Step: 64370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:46:49,728-Speed 24213.61 samples/sec Loss 1.2863 LearningRate 0.0000 Epoch: 37 Global Step: 64380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:46:59,836-Speed 24314.16 samples/sec Loss 1.2768 LearningRate 0.0000 Epoch: 37 Global Step: 64390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:47:09,868-Speed 24501.14 samples/sec Loss 1.2902 LearningRate 0.0000 Epoch: 37 Global Step: 64400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:47:19,963-Speed 24348.12 samples/sec Loss 1.2788 LearningRate 0.0000 Epoch: 37 Global Step: 64410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:47:30,172-Speed 24076.18 samples/sec Loss 1.2764 LearningRate 0.0000 Epoch: 37 
Global Step: 64420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:47:40,305-Speed 24257.83 samples/sec Loss 1.2757 LearningRate 0.0000 Epoch: 37 Global Step: 64430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:47:50,337-Speed 24500.73 samples/sec Loss 1.2721 LearningRate 0.0000 Epoch: 37 Global Step: 64440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:48:00,404-Speed 24414.82 samples/sec Loss 1.2706 LearningRate 0.0000 Epoch: 37 Global Step: 64450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:48:10,640-Speed 24011.68 samples/sec Loss 1.2844 LearningRate 0.0000 Epoch: 37 Global Step: 64460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:48:20,764-Speed 24279.25 samples/sec Loss 1.2767 LearningRate 0.0000 Epoch: 37 Global Step: 64470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:48:30,886-Speed 24283.45 samples/sec Loss 1.2816 LearningRate 0.0000 Epoch: 37 Global Step: 64480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:48:40,983-Speed 24342.01 samples/sec Loss 1.2716 LearningRate 0.0000 Epoch: 37 Global Step: 64490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:48:51,113-Speed 24265.47 samples/sec Loss 1.2750 LearningRate 0.0000 Epoch: 37 Global Step: 64500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:49:01,237-Speed 24280.10 samples/sec Loss 1.2764 LearningRate 0.0000 Epoch: 37 Global Step: 64510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:49:11,429-Speed 24116.16 samples/sec Loss 1.2634 LearningRate 0.0000 Epoch: 37 Global Step: 64520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:49:21,693-Speed 23947.39 samples/sec Loss 1.2801 LearningRate 0.0000 Epoch: 37 Global Step: 64530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:49:31,848-Speed 24202.95 samples/sec Loss 1.2695 LearningRate 0.0000 Epoch: 37 Global Step: 64540 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-03-26 16:49:41,924-Speed 24392.89 samples/sec Loss 1.2733 LearningRate 0.0000 Epoch: 37 Global Step: 64550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:49:52,024-Speed 24337.65 samples/sec Loss 1.2724 LearningRate 0.0000 Epoch: 37 Global Step: 64560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:50:02,132-Speed 24318.91 samples/sec Loss 1.2776 LearningRate 0.0000 Epoch: 37 Global Step: 64570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:50:12,304-Speed 24165.29 samples/sec Loss 1.2735 LearningRate 0.0000 Epoch: 37 Global Step: 64580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:50:22,431-Speed 24274.73 samples/sec Loss 1.2807 LearningRate 0.0000 Epoch: 37 Global Step: 64590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:50:32,614-Speed 24139.01 samples/sec Loss 1.2744 LearningRate 0.0000 Epoch: 37 Global Step: 64600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:50:42,716-Speed 24330.26 samples/sec Loss 1.2813 LearningRate 0.0000 Epoch: 37 Global Step: 64610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:50:52,851-Speed 24254.31 samples/sec Loss 1.2792 LearningRate 0.0000 Epoch: 37 Global Step: 64620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:51:02,969-Speed 24290.85 samples/sec Loss 1.2709 LearningRate 0.0000 Epoch: 37 Global Step: 64630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:51:13,121-Speed 24210.97 samples/sec Loss 1.2751 LearningRate 0.0000 Epoch: 37 Global Step: 64640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:51:23,205-Speed 24373.22 samples/sec Loss 1.2733 LearningRate 0.0000 Epoch: 37 Global Step: 64650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:51:33,354-Speed 24220.40 samples/sec Loss 1.2758 LearningRate 0.0000 Epoch: 37 Global Step: 64660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 
16:51:43,501-Speed 24224.84 samples/sec Loss 1.2768 LearningRate 0.0000 Epoch: 37 Global Step: 64670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:51:53,593-Speed 24356.83 samples/sec Loss 1.2852 LearningRate 0.0000 Epoch: 37 Global Step: 64680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:52:03,685-Speed 24357.30 samples/sec Loss 1.2705 LearningRate 0.0000 Epoch: 37 Global Step: 64690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:52:13,801-Speed 24296.98 samples/sec Loss 1.2850 LearningRate 0.0000 Epoch: 37 Global Step: 64700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:52:23,895-Speed 24349.04 samples/sec Loss 1.2713 LearningRate 0.0000 Epoch: 37 Global Step: 64710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:52:34,164-Speed 23935.84 samples/sec Loss 1.2833 LearningRate 0.0000 Epoch: 37 Global Step: 64720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:52:44,316-Speed 24212.27 samples/sec Loss 1.2744 LearningRate 0.0000 Epoch: 37 Global Step: 64730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:52:54,438-Speed 24281.03 samples/sec Loss 1.2769 LearningRate 0.0000 Epoch: 37 Global Step: 64740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:53:04,548-Speed 24313.37 samples/sec Loss 1.2832 LearningRate 0.0000 Epoch: 37 Global Step: 64750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:53:14,683-Speed 24252.02 samples/sec Loss 1.2746 LearningRate 0.0000 Epoch: 37 Global Step: 64760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:53:24,844-Speed 24190.54 samples/sec Loss 1.2723 LearningRate 0.0000 Epoch: 37 Global Step: 64770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:53:35,106-Speed 23951.96 samples/sec Loss 1.2694 LearningRate 0.0000 Epoch: 37 Global Step: 64780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:53:45,283-Speed 24152.27 samples/sec 
Loss 1.2602 LearningRate 0.0000 Epoch: 37 Global Step: 64790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:53:55,536-Speed 23973.07 samples/sec Loss 1.2606 LearningRate 0.0000 Epoch: 37 Global Step: 64800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:54:05,725-Speed 24124.73 samples/sec Loss 1.2700 LearningRate 0.0000 Epoch: 37 Global Step: 64810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:54:15,979-Speed 23968.49 samples/sec Loss 1.2711 LearningRate 0.0000 Epoch: 37 Global Step: 64820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:54:26,046-Speed 24417.28 samples/sec Loss 1.2683 LearningRate 0.0000 Epoch: 37 Global Step: 64830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:54:36,164-Speed 24291.50 samples/sec Loss 1.2729 LearningRate 0.0000 Epoch: 37 Global Step: 64840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:54:46,335-Speed 24165.83 samples/sec Loss 1.2721 LearningRate 0.0000 Epoch: 37 Global Step: 64850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:54:56,499-Speed 24183.21 samples/sec Loss 1.2679 LearningRate 0.0000 Epoch: 37 Global Step: 64860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:55:06,623-Speed 24277.94 samples/sec Loss 1.2739 LearningRate 0.0000 Epoch: 37 Global Step: 64870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:55:16,803-Speed 24142.89 samples/sec Loss 1.2618 LearningRate 0.0000 Epoch: 37 Global Step: 64880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:55:27,002-Speed 24101.09 samples/sec Loss 1.2759 LearningRate 0.0000 Epoch: 37 Global Step: 64890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:55:37,088-Speed 24368.18 samples/sec Loss 1.2687 LearningRate 0.0000 Epoch: 37 Global Step: 64900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:55:47,360-Speed 23927.02 samples/sec Loss 1.2652 LearningRate 0.0000 Epoch: 37 
Global Step: 64910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:55:57,465-Speed 24324.73 samples/sec Loss 1.2707 LearningRate 0.0000 Epoch: 37 Global Step: 64920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:56:07,552-Speed 24365.89 samples/sec Loss 1.2735 LearningRate 0.0000 Epoch: 37 Global Step: 64930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:56:17,803-Speed 23977.04 samples/sec Loss 1.2817 LearningRate 0.0000 Epoch: 37 Global Step: 64940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:56:27,954-Speed 24212.13 samples/sec Loss 1.2755 LearningRate 0.0000 Epoch: 37 Global Step: 64950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:56:38,098-Speed 24229.42 samples/sec Loss 1.2721 LearningRate 0.0000 Epoch: 37 Global Step: 64960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:56:48,269-Speed 24165.36 samples/sec Loss 1.2675 LearningRate 0.0000 Epoch: 37 Global Step: 64970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:56:58,397-Speed 24269.50 samples/sec Loss 1.2741 LearningRate 0.0000 Epoch: 37 Global Step: 64980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:57:08,489-Speed 24354.83 samples/sec Loss 1.2787 LearningRate 0.0000 Epoch: 37 Global Step: 64990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:57:18,588-Speed 24336.89 samples/sec Loss 1.2799 LearningRate 0.0000 Epoch: 37 Global Step: 65000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:57:28,658-Speed 24408.07 samples/sec Loss 1.2688 LearningRate 0.0000 Epoch: 37 Global Step: 65010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:57:38,819-Speed 24190.06 samples/sec Loss 1.2655 LearningRate 0.0000 Epoch: 37 Global Step: 65020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 16:57:48,986-Speed 24176.56 samples/sec Loss 1.2776 LearningRate 0.0000 Epoch: 37 Global Step: 65030 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-03-26 16:57:59,210-Speed 24040.58 samples/sec Loss 1.2708 LearningRate 0.0000 Epoch: 37 Global Step: 65040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:58:09,368-Speed 24196.21 samples/sec Loss 1.2560 LearningRate 0.0000 Epoch: 37 Global Step: 65050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:58:19,554-Speed 24132.94 samples/sec Loss 1.2692 LearningRate 0.0000 Epoch: 37 Global Step: 65060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:58:29,669-Speed 24303.22 samples/sec Loss 1.2724 LearningRate 0.0000 Epoch: 37 Global Step: 65070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:58:39,779-Speed 24312.94 samples/sec Loss 1.2769 LearningRate 0.0000 Epoch: 37 Global Step: 65080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:58:50,028-Speed 23987.87 samples/sec Loss 1.2751 LearningRate 0.0000 Epoch: 37 Global Step: 65090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:59:00,189-Speed 24191.95 samples/sec Loss 1.2710 LearningRate 0.0000 Epoch: 37 Global Step: 65100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:59:10,360-Speed 24164.36 samples/sec Loss 1.2787 LearningRate 0.0000 Epoch: 37 Global Step: 65110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:59:20,535-Speed 24156.96 samples/sec Loss 1.2710 LearningRate 0.0000 Epoch: 37 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:59:30,689-Speed 24205.09 samples/sec Loss 1.2704 LearningRate 0.0000 Epoch: 37 Global Step: 65130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:59:40,765-Speed 24395.66 samples/sec Loss 1.2706 LearningRate 0.0000 Epoch: 37 Global Step: 65140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 16:59:50,863-Speed 24341.31 samples/sec Loss 1.2770 LearningRate 0.0000 Epoch: 37 Global Step: 65150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 
17:00:00,944-Speed 24379.38 samples/sec Loss 1.2681 LearningRate 0.0000 Epoch: 37 Global Step: 65160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:00:11,161-Speed 24057.69 samples/sec Loss 1.2699 LearningRate 0.0000 Epoch: 37 Global Step: 65170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:00:21,356-Speed 24107.21 samples/sec Loss 1.2735 LearningRate 0.0000 Epoch: 37 Global Step: 65180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:00:31,562-Speed 24083.13 samples/sec Loss 1.2716 LearningRate 0.0000 Epoch: 37 Global Step: 65190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:00:41,663-Speed 24334.92 samples/sec Loss 1.2731 LearningRate 0.0000 Epoch: 37 Global Step: 65200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:00:51,719-Speed 24439.96 samples/sec Loss 1.2709 LearningRate 0.0000 Epoch: 37 Global Step: 65210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:01:01,822-Speed 24327.46 samples/sec Loss 1.2701 LearningRate 0.0000 Epoch: 37 Global Step: 65220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:01:11,907-Speed 24371.27 samples/sec Loss 1.2626 LearningRate 0.0000 Epoch: 37 Global Step: 65230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:01:22,044-Speed 24247.60 samples/sec Loss 1.2683 LearningRate 0.0000 Epoch: 37 Global Step: 65240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:01:32,119-Speed 24394.27 samples/sec Loss 1.2604 LearningRate 0.0000 Epoch: 37 Global Step: 65250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:01:42,279-Speed 24191.35 samples/sec Loss 1.2683 LearningRate 0.0000 Epoch: 37 Global Step: 65260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:01:52,346-Speed 24419.68 samples/sec Loss 1.2674 LearningRate 0.0000 Epoch: 37 Global Step: 65270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:02:02,424-Speed 24388.58 samples/sec 
Loss 1.2779 LearningRate 0.0000 Epoch: 37 Global Step: 65280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:02:12,553-Speed 24264.32 samples/sec Loss 1.2771 LearningRate 0.0000 Epoch: 37 Global Step: 65290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:02:22,793-Speed 24007.11 samples/sec Loss 1.2792 LearningRate 0.0000 Epoch: 37 Global Step: 65300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:02:32,911-Speed 24299.52 samples/sec Loss 1.2667 LearningRate 0.0000 Epoch: 37 Global Step: 65310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:02:43,144-Speed 24017.57 samples/sec Loss 1.2771 LearningRate 0.0000 Epoch: 37 Global Step: 65320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:02:53,376-Speed 24023.03 samples/sec Loss 1.2703 LearningRate 0.0000 Epoch: 37 Global Step: 65330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:03:03,478-Speed 24327.93 samples/sec Loss 1.2733 LearningRate 0.0000 Epoch: 37 Global Step: 65340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:03:13,692-Speed 24064.63 samples/sec Loss 1.2615 LearningRate 0.0000 Epoch: 37 Global Step: 65350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:03:23,892-Speed 24096.19 samples/sec Loss 1.2602 LearningRate 0.0000 Epoch: 37 Global Step: 65360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:03:34,062-Speed 24170.79 samples/sec Loss 1.2748 LearningRate 0.0000 Epoch: 37 Global Step: 65370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:03:44,140-Speed 24388.88 samples/sec Loss 1.2721 LearningRate 0.0000 Epoch: 37 Global Step: 65380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:03:54,208-Speed 24412.07 samples/sec Loss 1.2670 LearningRate 0.0000 Epoch: 37 Global Step: 65390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:04:04,300-Speed 24354.61 samples/sec Loss 1.2563 LearningRate 0.0000 Epoch: 37 
Global Step: 65400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:04:14,343-Speed 24472.73 samples/sec Loss 1.2603 LearningRate 0.0000 Epoch: 37 Global Step: 65410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:04:24,489-Speed 24225.27 samples/sec Loss 1.2797 LearningRate 0.0000 Epoch: 37 Global Step: 65420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:04:34,660-Speed 24172.40 samples/sec Loss 1.2663 LearningRate 0.0000 Epoch: 37 Global Step: 65430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:04:44,807-Speed 24222.75 samples/sec Loss 1.2737 LearningRate 0.0000 Epoch: 37 Global Step: 65440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:04:55,034-Speed 24032.77 samples/sec Loss 1.2778 LearningRate 0.0000 Epoch: 37 Global Step: 65450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:05:05,136-Speed 24328.95 samples/sec Loss 1.2753 LearningRate 0.0000 Epoch: 37 Global Step: 65460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:05:15,273-Speed 24253.42 samples/sec Loss 1.2725 LearningRate 0.0000 Epoch: 37 Global Step: 65470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:05:25,452-Speed 24151.91 samples/sec Loss 1.2756 LearningRate 0.0000 Epoch: 37 Global Step: 65480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:05:35,547-Speed 24353.91 samples/sec Loss 1.2801 LearningRate 0.0000 Epoch: 37 Global Step: 65490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:05:45,811-Speed 23945.90 samples/sec Loss 1.2773 LearningRate 0.0000 Epoch: 37 Global Step: 65500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:05:55,937-Speed 24273.42 samples/sec Loss 1.2525 LearningRate 0.0000 Epoch: 37 Global Step: 65510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:06:06,093-Speed 24200.37 samples/sec Loss 1.2795 LearningRate 0.0000 Epoch: 37 Global Step: 65520 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-03-26 17:06:16,263-Speed 24168.64 samples/sec Loss 1.2663 LearningRate 0.0000 Epoch: 37 Global Step: 65530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:06:26,356-Speed 24352.04 samples/sec Loss 1.2537 LearningRate 0.0000 Epoch: 37 Global Step: 65540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:06:36,531-Speed 24157.14 samples/sec Loss 1.2602 LearningRate 0.0000 Epoch: 37 Global Step: 65550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:06:46,597-Speed 24415.76 samples/sec Loss 1.2647 LearningRate 0.0000 Epoch: 37 Global Step: 65560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:06:56,786-Speed 24122.57 samples/sec Loss 1.2728 LearningRate 0.0000 Epoch: 37 Global Step: 65570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:07:06,888-Speed 24331.53 samples/sec Loss 1.2706 LearningRate 0.0000 Epoch: 37 Global Step: 65580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:07:16,969-Speed 24382.69 samples/sec Loss 1.2735 LearningRate 0.0000 Epoch: 37 Global Step: 65590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:07:27,057-Speed 24364.52 samples/sec Loss 1.2705 LearningRate 0.0000 Epoch: 37 Global Step: 65600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:07:37,234-Speed 24151.66 samples/sec Loss 1.2679 LearningRate 0.0000 Epoch: 37 Global Step: 65610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:07:47,310-Speed 24403.15 samples/sec Loss 1.2702 LearningRate 0.0000 Epoch: 37 Global Step: 65620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:07:57,410-Speed 24336.77 samples/sec Loss 1.2824 LearningRate 0.0000 Epoch: 37 Global Step: 65630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:08:07,521-Speed 24310.08 samples/sec Loss 1.2691 LearningRate 0.0000 Epoch: 37 Global Step: 65640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 
17:08:17,615-Speed 24356.12 samples/sec Loss 1.2828 LearningRate 0.0000 Epoch: 37 Global Step: 65650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:08:27,777-Speed 24187.60 samples/sec Loss 1.2753 LearningRate 0.0000 Epoch: 37 Global Step: 65660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:08:37,929-Speed 24210.70 samples/sec Loss 1.2759 LearningRate 0.0000 Epoch: 37 Global Step: 65670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:09:37,579-Speed 4120.19 samples/sec Loss 1.2674 LearningRate 0.0000 Epoch: 38 Global Step: 65680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:09:47,433-Speed 24943.76 samples/sec Loss 1.2749 LearningRate 0.0000 Epoch: 38 Global Step: 65690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:09:57,313-Speed 24875.96 samples/sec Loss 1.2799 LearningRate 0.0000 Epoch: 38 Global Step: 65700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:10:07,216-Speed 24822.92 samples/sec Loss 1.2665 LearningRate 0.0000 Epoch: 38 Global Step: 65710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:10:17,173-Speed 24690.95 samples/sec Loss 1.2523 LearningRate 0.0000 Epoch: 38 Global Step: 65720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:10:27,151-Speed 24633.16 samples/sec Loss 1.2672 LearningRate 0.0000 Epoch: 38 Global Step: 65730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:10:37,138-Speed 24618.66 samples/sec Loss 1.2695 LearningRate 0.0000 Epoch: 38 Global Step: 65740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:10:47,099-Speed 24675.63 samples/sec Loss 1.2646 LearningRate 0.0000 Epoch: 38 Global Step: 65750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:10:57,025-Speed 24761.63 samples/sec Loss 1.2685 LearningRate 0.0000 Epoch: 38 Global Step: 65760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:11:07,107-Speed 24378.18 samples/sec Loss 
1.2656 LearningRate 0.0000 Epoch: 38 Global Step: 65770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:11:17,076-Speed 24655.75 samples/sec Loss 1.2578 LearningRate 0.0000 Epoch: 38 Global Step: 65780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:11:27,093-Speed 24539.45 samples/sec Loss 1.2676 LearningRate 0.0000 Epoch: 38 Global Step: 65790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:11:37,004-Speed 24799.65 samples/sec Loss 1.2678 LearningRate 0.0000 Epoch: 38 Global Step: 65800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:11:46,964-Speed 24676.63 samples/sec Loss 1.2720 LearningRate 0.0000 Epoch: 38 Global Step: 65810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:11:56,934-Speed 24651.91 samples/sec Loss 1.2568 LearningRate 0.0000 Epoch: 38 Global Step: 65820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:12:06,977-Speed 24481.39 samples/sec Loss 1.2624 LearningRate 0.0000 Epoch: 38 Global Step: 65830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:12:16,998-Speed 24527.50 samples/sec Loss 1.2709 LearningRate 0.0000 Epoch: 38 Global Step: 65840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:12:26,957-Speed 24679.62 samples/sec Loss 1.2626 LearningRate 0.0000 Epoch: 38 Global Step: 65850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:12:36,980-Speed 24532.02 samples/sec Loss 1.2710 LearningRate 0.0000 Epoch: 38 Global Step: 65860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:12:47,003-Speed 24522.93 samples/sec Loss 1.2651 LearningRate 0.0000 Epoch: 38 Global Step: 65870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:12:57,009-Speed 24567.37 samples/sec Loss 1.2695 LearningRate 0.0000 Epoch: 38 Global Step: 65880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:13:07,070-Speed 24429.51 samples/sec Loss 1.2672 LearningRate 0.0000 Epoch: 38 Global 
Step: 65890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:13:17,107-Speed 24490.09 samples/sec Loss 1.2654 LearningRate 0.0000 Epoch: 38 Global Step: 65900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:13:27,077-Speed 24652.85 samples/sec Loss 1.2700 LearningRate 0.0000 Epoch: 38 Global Step: 65910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:13:37,217-Speed 24238.61 samples/sec Loss 1.2734 LearningRate 0.0000 Epoch: 38 Global Step: 65920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:13:47,216-Speed 24581.92 samples/sec Loss 1.2711 LearningRate 0.0000 Epoch: 38 Global Step: 65930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:13:57,268-Speed 24450.94 samples/sec Loss 1.2677 LearningRate 0.0000 Epoch: 38 Global Step: 65940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:14:07,167-Speed 24829.59 samples/sec Loss 1.2627 LearningRate 0.0000 Epoch: 38 Global Step: 65950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:14:17,137-Speed 24652.95 samples/sec Loss 1.2609 LearningRate 0.0000 Epoch: 38 Global Step: 65960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:14:27,051-Speed 24790.67 samples/sec Loss 1.2615 LearningRate 0.0000 Epoch: 38 Global Step: 65970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:14:37,014-Speed 24670.55 samples/sec Loss 1.2712 LearningRate 0.0000 Epoch: 38 Global Step: 65980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:14:46,962-Speed 24709.75 samples/sec Loss 1.2751 LearningRate 0.0000 Epoch: 38 Global Step: 65990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:14:57,055-Speed 24358.23 samples/sec Loss 1.2638 LearningRate 0.0000 Epoch: 38 Global Step: 66000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:15:07,056-Speed 24576.14 samples/sec Loss 1.2697 LearningRate 0.0000 Epoch: 38 Global Step: 66010 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-03-26 17:15:17,025-Speed 24655.41 samples/sec Loss 1.2618 LearningRate 0.0000 Epoch: 38 Global Step: 66020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:15:27,000-Speed 24640.50 samples/sec Loss 1.2730 LearningRate 0.0000 Epoch: 38 Global Step: 66030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:15:36,991-Speed 24601.73 samples/sec Loss 1.2727 LearningRate 0.0000 Epoch: 38 Global Step: 66040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:15:46,863-Speed 24899.90 samples/sec Loss 1.2661 LearningRate 0.0000 Epoch: 38 Global Step: 66050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:15:56,857-Speed 24593.29 samples/sec Loss 1.2724 LearningRate 0.0000 Epoch: 38 Global Step: 66060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:16:06,831-Speed 24644.57 samples/sec Loss 1.2655 LearningRate 0.0000 Epoch: 38 Global Step: 66070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:16:16,747-Speed 24793.78 samples/sec Loss 1.2638 LearningRate 0.0000 Epoch: 38 Global Step: 66080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:16:26,658-Speed 24799.37 samples/sec Loss 1.2661 LearningRate 0.0000 Epoch: 38 Global Step: 66090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:16:36,649-Speed 24610.81 samples/sec Loss 1.2712 LearningRate 0.0000 Epoch: 38 Global Step: 66100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:16:46,661-Speed 24551.73 samples/sec Loss 1.2663 LearningRate 0.0000 Epoch: 38 Global Step: 66110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:16:56,640-Speed 24630.54 samples/sec Loss 1.2605 LearningRate 0.0000 Epoch: 38 Global Step: 66120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:17:06,559-Speed 24780.19 samples/sec Loss 1.2709 LearningRate 0.0000 Epoch: 38 Global Step: 66130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 
17:17:16,483-Speed 24767.44 samples/sec Loss 1.2687 LearningRate 0.0000 Epoch: 38 Global Step: 66140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:17:26,417-Speed 24740.91 samples/sec Loss 1.2720 LearningRate 0.0000 Epoch: 38 Global Step: 66150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:17:36,103-Speed 25377.71 samples/sec Loss 1.2738 LearningRate 0.0000 Epoch: 38 Global Step: 66160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:17:45,957-Speed 24942.34 samples/sec Loss 1.2613 LearningRate 0.0000 Epoch: 38 Global Step: 66170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:17:55,841-Speed 24867.46 samples/sec Loss 1.2692 LearningRate 0.0000 Epoch: 38 Global Step: 66180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:18:05,760-Speed 24782.42 samples/sec Loss 1.2674 LearningRate 0.0000 Epoch: 38 Global Step: 66190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:18:15,648-Speed 24857.73 samples/sec Loss 1.2645 LearningRate 0.0000 Epoch: 38 Global Step: 66200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:18:25,591-Speed 24718.29 samples/sec Loss 1.2632 LearningRate 0.0000 Epoch: 38 Global Step: 66210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:18:35,489-Speed 24832.69 samples/sec Loss 1.2681 LearningRate 0.0000 Epoch: 38 Global Step: 66220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:18:45,408-Speed 24781.79 samples/sec Loss 1.2647 LearningRate 0.0000 Epoch: 38 Global Step: 66230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:18:55,304-Speed 24838.96 samples/sec Loss 1.2595 LearningRate 0.0000 Epoch: 38 Global Step: 66240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:19:05,205-Speed 24824.40 samples/sec Loss 1.2701 LearningRate 0.0000 Epoch: 38 Global Step: 66250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-26 17:19:15,065-Speed 24929.50 samples/sec 
Loss 1.2657 LearningRate 0.0000 Epoch: 38 Global Step: 66260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:19:24,906-Speed 24973.72 samples/sec Loss 1.2802 LearningRate 0.0000 Epoch: 38 Global Step: 66270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:19:34,897-Speed 24601.97 samples/sec Loss 1.2747 LearningRate 0.0000 Epoch: 38 Global Step: 66280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:19:44,702-Speed 25070.64 samples/sec Loss 1.2665 LearningRate 0.0000 Epoch: 38 Global Step: 66290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:19:54,517-Speed 25040.78 samples/sec Loss 1.2737 LearningRate 0.0000 Epoch: 38 Global Step: 66300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:20:04,478-Speed 24675.58 samples/sec Loss 1.2680 LearningRate 0.0000 Epoch: 38 Global Step: 66310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:20:14,223-Speed 25222.25 samples/sec Loss 1.2584 LearningRate 0.0000 Epoch: 38 Global Step: 66320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:20:24,106-Speed 24869.05 samples/sec Loss 1.2617 LearningRate 0.0000 Epoch: 38 Global Step: 66330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:20:33,881-Speed 25146.69 samples/sec Loss 1.2660 LearningRate 0.0000 Epoch: 38 Global Step: 66340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:20:43,713-Speed 24996.87 samples/sec Loss 1.2692 LearningRate 0.0000 Epoch: 38 Global Step: 66350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:20:53,517-Speed 25070.32 samples/sec Loss 1.2661 LearningRate 0.0000 Epoch: 38 Global Step: 66360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:21:03,312-Speed 25093.51 samples/sec Loss 1.2614 LearningRate 0.0000 Epoch: 38 Global Step: 66370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:21:13,278-Speed 24661.35 samples/sec Loss 1.2722 LearningRate 0.0000 Epoch: 38 
Global Step: 66380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:21:23,247-Speed 24654.69 samples/sec Loss 1.2670 LearningRate 0.0000 Epoch: 38 Global Step: 66390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:21:33,156-Speed 24804.07 samples/sec Loss 1.2705 LearningRate 0.0000 Epoch: 38 Global Step: 66400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:21:43,093-Speed 24736.72 samples/sec Loss 1.2682 LearningRate 0.0000 Epoch: 38 Global Step: 66410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:21:53,057-Speed 24665.28 samples/sec Loss 1.2644 LearningRate 0.0000 Epoch: 38 Global Step: 66420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:22:02,864-Speed 25063.72 samples/sec Loss 1.2631 LearningRate 0.0000 Epoch: 38 Global Step: 66430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:22:12,759-Speed 24839.57 samples/sec Loss 1.2626 LearningRate 0.0000 Epoch: 38 Global Step: 66440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:22:22,645-Speed 24862.67 samples/sec Loss 1.2672 LearningRate 0.0000 Epoch: 38 Global Step: 66450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:22:32,538-Speed 24844.03 samples/sec Loss 1.2648 LearningRate 0.0000 Epoch: 38 Global Step: 66460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:22:42,592-Speed 24447.23 samples/sec Loss 1.2701 LearningRate 0.0000 Epoch: 38 Global Step: 66470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:22:52,454-Speed 24921.96 samples/sec Loss 1.2621 LearningRate 0.0000 Epoch: 38 Global Step: 66480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:23:02,269-Speed 25043.96 samples/sec Loss 1.2640 LearningRate 0.0000 Epoch: 38 Global Step: 66490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:23:12,044-Speed 25144.20 samples/sec Loss 1.2653 LearningRate 0.0000 Epoch: 38 Global Step: 66500 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-03-26 17:23:22,001-Speed 24683.99 samples/sec Loss 1.2601 LearningRate 0.0000 Epoch: 38 Global Step: 66510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:23:31,976-Speed 24641.97 samples/sec Loss 1.2668 LearningRate 0.0000 Epoch: 38 Global Step: 66520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:23:41,817-Speed 24976.38 samples/sec Loss 1.2530 LearningRate 0.0000 Epoch: 38 Global Step: 66530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:23:51,662-Speed 24965.36 samples/sec Loss 1.2673 LearningRate 0.0000 Epoch: 38 Global Step: 66540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:24:01,468-Speed 25063.10 samples/sec Loss 1.2514 LearningRate 0.0000 Epoch: 38 Global Step: 66550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:24:11,301-Speed 24995.66 samples/sec Loss 1.2578 LearningRate 0.0000 Epoch: 38 Global Step: 66560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:24:21,144-Speed 24972.57 samples/sec Loss 1.2707 LearningRate 0.0000 Epoch: 38 Global Step: 66570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:24:31,262-Speed 24292.05 samples/sec Loss 1.2585 LearningRate 0.0000 Epoch: 38 Global Step: 66580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:24:41,069-Speed 25060.43 samples/sec Loss 1.2666 LearningRate 0.0000 Epoch: 38 Global Step: 66590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:24:50,854-Speed 25119.21 samples/sec Loss 1.2605 LearningRate 0.0000 Epoch: 38 Global Step: 66600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:25:00,760-Speed 24819.25 samples/sec Loss 1.2655 LearningRate 0.0000 Epoch: 38 Global Step: 66610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:25:10,639-Speed 24878.58 samples/sec Loss 1.2592 LearningRate 0.0000 Epoch: 38 Global Step: 66620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 
17:25:20,465-Speed 25013.61 samples/sec Loss 1.2651 LearningRate 0.0000 Epoch: 38 Global Step: 66630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:25:30,283-Speed 25034.73 samples/sec Loss 1.2664 LearningRate 0.0000 Epoch: 38 Global Step: 66640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:25:40,062-Speed 25134.98 samples/sec Loss 1.2618 LearningRate 0.0000 Epoch: 38 Global Step: 66650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:25:49,943-Speed 24873.61 samples/sec Loss 1.2560 LearningRate 0.0000 Epoch: 38 Global Step: 66660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-26 17:25:59,735-Speed 25101.76 samples/sec Loss 1.2637 LearningRate 0.0000 Epoch: 38 Global Step: 66670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:26:09,614-Speed 24880.33 samples/sec Loss 1.2626 LearningRate 0.0000 Epoch: 38 Global Step: 66680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:26:19,406-Speed 25101.21 samples/sec Loss 1.2646 LearningRate 0.0000 Epoch: 38 Global Step: 66690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:26:29,292-Speed 24863.29 samples/sec Loss 1.2635 LearningRate 0.0000 Epoch: 38 Global Step: 66700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:26:39,135-Speed 24968.67 samples/sec Loss 1.2578 LearningRate 0.0000 Epoch: 38 Global Step: 66710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:26:48,949-Speed 25044.69 samples/sec Loss 1.2635 LearningRate 0.0000 Epoch: 38 Global Step: 66720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:26:58,751-Speed 25077.92 samples/sec Loss 1.2592 LearningRate 0.0000 Epoch: 38 Global Step: 66730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:27:08,501-Speed 25207.75 samples/sec Loss 1.2624 LearningRate 0.0000 Epoch: 38 Global Step: 66740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:27:18,294-Speed 25098.61 samples/sec 
Loss 1.2651 LearningRate 0.0000 Epoch: 38 Global Step: 66750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:27:28,185-Speed 24851.93 samples/sec Loss 1.2651 LearningRate 0.0000 Epoch: 38 Global Step: 66760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:27:38,250-Speed 24419.52 samples/sec Loss 1.2660 LearningRate 0.0000 Epoch: 38 Global Step: 66770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:27:48,104-Speed 24944.38 samples/sec Loss 1.2636 LearningRate 0.0000 Epoch: 38 Global Step: 66780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:27:57,990-Speed 24860.29 samples/sec Loss 1.2533 LearningRate 0.0000 Epoch: 38 Global Step: 66790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:28:07,678-Speed 25370.67 samples/sec Loss 1.2553 LearningRate 0.0000 Epoch: 38 Global Step: 66800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:28:17,391-Speed 25303.50 samples/sec Loss 1.2689 LearningRate 0.0000 Epoch: 38 Global Step: 66810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:28:27,202-Speed 25053.99 samples/sec Loss 1.2641 LearningRate 0.0000 Epoch: 38 Global Step: 66820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:28:37,038-Speed 24989.24 samples/sec Loss 1.2650 LearningRate 0.0000 Epoch: 38 Global Step: 66830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:28:46,759-Speed 25283.97 samples/sec Loss 1.2609 LearningRate 0.0000 Epoch: 38 Global Step: 66840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:28:56,516-Speed 25191.27 samples/sec Loss 1.2523 LearningRate 0.0000 Epoch: 38 Global Step: 66850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:29:06,355-Speed 24981.42 samples/sec Loss 1.2666 LearningRate 0.0000 Epoch: 38 Global Step: 66860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:29:16,213-Speed 24935.28 samples/sec Loss 1.2686 LearningRate 0.0000 Epoch: 38 
Global Step: 66870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:29:26,095-Speed 24890.64 samples/sec Loss 1.2588 LearningRate 0.0000 Epoch: 38 Global Step: 66880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:29:35,855-Speed 25185.01 samples/sec Loss 1.2546 LearningRate 0.0000 Epoch: 38 Global Step: 66890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:29:45,639-Speed 25122.92 samples/sec Loss 1.2608 LearningRate 0.0000 Epoch: 38 Global Step: 66900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:29:55,454-Speed 25042.64 samples/sec Loss 1.2675 LearningRate 0.0000 Epoch: 38 Global Step: 66910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:30:05,304-Speed 24960.36 samples/sec Loss 1.2459 LearningRate 0.0000 Epoch: 38 Global Step: 66920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:30:15,077-Speed 25149.67 samples/sec Loss 1.2709 LearningRate 0.0000 Epoch: 38 Global Step: 66930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:30:24,912-Speed 24999.51 samples/sec Loss 1.2688 LearningRate 0.0000 Epoch: 38 Global Step: 66940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:30:34,778-Speed 24911.89 samples/sec Loss 1.2690 LearningRate 0.0000 Epoch: 38 Global Step: 66950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:30:44,511-Speed 25255.24 samples/sec Loss 1.2576 LearningRate 0.0000 Epoch: 38 Global Step: 66960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:30:54,346-Speed 24991.29 samples/sec Loss 1.2684 LearningRate 0.0000 Epoch: 38 Global Step: 66970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:31:04,160-Speed 25045.45 samples/sec Loss 1.2776 LearningRate 0.0000 Epoch: 38 Global Step: 66980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:31:13,976-Speed 25038.54 samples/sec Loss 1.2652 LearningRate 0.0000 Epoch: 38 Global Step: 66990 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-03-26 17:31:23,767-Speed 25105.28 samples/sec Loss 1.2565 LearningRate 0.0000 Epoch: 38 Global Step: 67000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:31:33,477-Speed 25313.67 samples/sec Loss 1.2709 LearningRate 0.0000 Epoch: 38 Global Step: 67010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:31:43,251-Speed 25146.62 samples/sec Loss 1.2620 LearningRate 0.0000 Epoch: 38 Global Step: 67020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:31:53,019-Speed 25163.19 samples/sec Loss 1.2593 LearningRate 0.0000 Epoch: 38 Global Step: 67030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:32:02,766-Speed 25219.46 samples/sec Loss 1.2583 LearningRate 0.0000 Epoch: 38 Global Step: 67040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:32:12,471-Speed 25331.74 samples/sec Loss 1.2613 LearningRate 0.0000 Epoch: 38 Global Step: 67050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:32:22,300-Speed 25005.52 samples/sec Loss 1.2596 LearningRate 0.0000 Epoch: 38 Global Step: 67060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:32:32,001-Speed 25338.16 samples/sec Loss 1.2668 LearningRate 0.0000 Epoch: 38 Global Step: 67070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:32:41,791-Speed 25106.29 samples/sec Loss 1.2613 LearningRate 0.0000 Epoch: 38 Global Step: 67080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:32:51,622-Speed 25003.47 samples/sec Loss 1.2606 LearningRate 0.0000 Epoch: 38 Global Step: 67090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:33:01,335-Speed 25304.83 samples/sec Loss 1.2540 LearningRate 0.0000 Epoch: 38 Global Step: 67100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:33:11,074-Speed 25238.75 samples/sec Loss 1.2552 LearningRate 0.0000 Epoch: 38 Global Step: 67110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 
17:33:20,869-Speed 25095.24 samples/sec Loss 1.2628 LearningRate 0.0000 Epoch: 38 Global Step: 67120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:33:30,611-Speed 25229.15 samples/sec Loss 1.2625 LearningRate 0.0000 Epoch: 38 Global Step: 67130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:33:40,322-Speed 25311.67 samples/sec Loss 1.2674 LearningRate 0.0000 Epoch: 38 Global Step: 67140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:33:50,135-Speed 25048.28 samples/sec Loss 1.2667 LearningRate 0.0000 Epoch: 38 Global Step: 67150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:33:59,898-Speed 25176.21 samples/sec Loss 1.2555 LearningRate 0.0000 Epoch: 38 Global Step: 67160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:34:09,632-Speed 25249.19 samples/sec Loss 1.2588 LearningRate 0.0000 Epoch: 38 Global Step: 67170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:34:19,456-Speed 25021.52 samples/sec Loss 1.2609 LearningRate 0.0000 Epoch: 38 Global Step: 67180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:34:29,278-Speed 25023.44 samples/sec Loss 1.2684 LearningRate 0.0000 Epoch: 38 Global Step: 67190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:34:39,135-Speed 24937.09 samples/sec Loss 1.2688 LearningRate 0.0000 Epoch: 38 Global Step: 67200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:34:48,990-Speed 24941.17 samples/sec Loss 1.2585 LearningRate 0.0000 Epoch: 38 Global Step: 67210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:34:58,892-Speed 24827.93 samples/sec Loss 1.2594 LearningRate 0.0000 Epoch: 38 Global Step: 67220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:35:08,790-Speed 24832.53 samples/sec Loss 1.2691 LearningRate 0.0000 Epoch: 38 Global Step: 67230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:35:18,611-Speed 25028.81 samples/sec 
Loss 1.2560 LearningRate 0.0000 Epoch: 38 Global Step: 67240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:35:28,450-Speed 24980.93 samples/sec Loss 1.2659 LearningRate 0.0000 Epoch: 38 Global Step: 67250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:35:38,203-Speed 25205.29 samples/sec Loss 1.2528 LearningRate 0.0000 Epoch: 38 Global Step: 67260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:35:47,984-Speed 25134.13 samples/sec Loss 1.2614 LearningRate 0.0000 Epoch: 38 Global Step: 67270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:35:57,777-Speed 25097.44 samples/sec Loss 1.2654 LearningRate 0.0000 Epoch: 38 Global Step: 67280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:36:07,577-Speed 25082.78 samples/sec Loss 1.2696 LearningRate 0.0000 Epoch: 38 Global Step: 67290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:36:17,372-Speed 25093.79 samples/sec Loss 1.2595 LearningRate 0.0000 Epoch: 38 Global Step: 67300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:36:27,105-Speed 25254.63 samples/sec Loss 1.2573 LearningRate 0.0000 Epoch: 38 Global Step: 67310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:36:36,902-Speed 25088.84 samples/sec Loss 1.2626 LearningRate 0.0000 Epoch: 38 Global Step: 67320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:36:46,692-Speed 25108.94 samples/sec Loss 1.2682 LearningRate 0.0000 Epoch: 38 Global Step: 67330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:36:56,594-Speed 24822.68 samples/sec Loss 1.2579 LearningRate 0.0000 Epoch: 38 Global Step: 67340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-26 17:37:06,391-Speed 25090.15 samples/sec Loss 1.2672 LearningRate 0.0000 Epoch: 38 Global Step: 67350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:37:16,299-Speed 24805.52 samples/sec Loss 1.2602 LearningRate 0.0000 Epoch: 38 
Global Step: 67360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:37:26,069-Speed 25158.87 samples/sec Loss 1.2684 LearningRate 0.0000 Epoch: 38 Global Step: 67370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:37:35,887-Speed 25034.91 samples/sec Loss 1.2662 LearningRate 0.0000 Epoch: 38 Global Step: 67380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:37:45,659-Speed 25162.35 samples/sec Loss 1.2652 LearningRate 0.0000 Epoch: 38 Global Step: 67390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:37:55,456-Speed 25089.20 samples/sec Loss 1.2706 LearningRate 0.0000 Epoch: 38 Global Step: 67400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:38:55,153-Speed 4116.84 samples/sec Loss 1.2665 LearningRate 0.0000 Epoch: 39 Global Step: 67410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:39:04,993-Speed 24986.97 samples/sec Loss 1.2640 LearningRate 0.0000 Epoch: 39 Global Step: 67420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:39:14,789-Speed 25092.19 samples/sec Loss 1.2711 LearningRate 0.0000 Epoch: 39 Global Step: 67430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:39:24,539-Speed 25210.30 samples/sec Loss 1.2616 LearningRate 0.0000 Epoch: 39 Global Step: 67440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:39:34,286-Speed 25218.18 samples/sec Loss 1.2611 LearningRate 0.0000 Epoch: 39 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:39:44,102-Speed 25038.26 samples/sec Loss 1.2645 LearningRate 0.0000 Epoch: 39 Global Step: 67460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:39:53,913-Speed 25052.23 samples/sec Loss 1.2641 LearningRate 0.0000 Epoch: 39 Global Step: 67470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-26 17:40:03,743-Speed 25003.92 samples/sec Loss 1.2525 LearningRate 0.0000 Epoch: 39 Global Step: 67480 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-03-26 17:40:13,575-Speed 25007.45 samples/sec Loss 1.2592 LearningRate 0.0000 Epoch: 39 Global Step: 67490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:40:23,352-Speed 25140.08 samples/sec Loss 1.2626 LearningRate 0.0000 Epoch: 39 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:40:33,181-Speed 25007.93 samples/sec Loss 1.2647 LearningRate 0.0000 Epoch: 39 Global Step: 67510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:40:42,965-Speed 25119.76 samples/sec Loss 1.2552 LearningRate 0.0000 Epoch: 39 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:40:52,813-Speed 24962.22 samples/sec Loss 1.2559 LearningRate 0.0000 Epoch: 39 Global Step: 67530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:41:02,679-Speed 24913.79 samples/sec Loss 1.2513 LearningRate 0.0000 Epoch: 39 Global Step: 67540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:41:12,549-Speed 24908.10 samples/sec Loss 1.2560 LearningRate 0.0000 Epoch: 39 Global Step: 67550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:41:22,410-Speed 24926.07 samples/sec Loss 1.2566 LearningRate 0.0000 Epoch: 39 Global Step: 67560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:41:32,298-Speed 24858.11 samples/sec Loss 1.2532 LearningRate 0.0000 Epoch: 39 Global Step: 67570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:41:42,005-Speed 25321.86 samples/sec Loss 1.2598 LearningRate 0.0000 Epoch: 39 Global Step: 67580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:41:51,722-Speed 25295.90 samples/sec Loss 1.2586 LearningRate 0.0000 Epoch: 39 Global Step: 67590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:42:01,414-Speed 25361.20 samples/sec Loss 1.2608 LearningRate 0.0000 Epoch: 39 Global Step: 67600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 
17:42:11,151-Speed 25242.92 samples/sec Loss 1.2615 LearningRate 0.0000 Epoch: 39 Global Step: 67610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:42:20,858-Speed 25320.34 samples/sec Loss 1.2597 LearningRate 0.0000 Epoch: 39 Global Step: 67620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:42:30,770-Speed 24798.10 samples/sec Loss 1.2612 LearningRate 0.0000 Epoch: 39 Global Step: 67630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:42:40,560-Speed 25107.85 samples/sec Loss 1.2662 LearningRate 0.0000 Epoch: 39 Global Step: 67640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:42:50,359-Speed 25081.79 samples/sec Loss 1.2561 LearningRate 0.0000 Epoch: 39 Global Step: 67650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:43:00,070-Speed 25312.14 samples/sec Loss 1.2667 LearningRate 0.0000 Epoch: 39 Global Step: 67660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:43:09,932-Speed 24921.84 samples/sec Loss 1.2671 LearningRate 0.0000 Epoch: 39 Global Step: 67670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:43:19,774-Speed 24974.41 samples/sec Loss 1.2625 LearningRate 0.0000 Epoch: 39 Global Step: 67680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:43:29,482-Speed 25317.43 samples/sec Loss 1.2620 LearningRate 0.0000 Epoch: 39 Global Step: 67690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:43:39,315-Speed 24995.80 samples/sec Loss 1.2631 LearningRate 0.0000 Epoch: 39 Global Step: 67700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:43:49,109-Speed 25097.75 samples/sec Loss 1.2684 LearningRate 0.0000 Epoch: 39 Global Step: 67710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:43:59,011-Speed 24831.08 samples/sec Loss 1.2686 LearningRate 0.0000 Epoch: 39 Global Step: 67720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:44:08,729-Speed 25294.58 samples/sec 
Loss 1.2530 LearningRate 0.0000 Epoch: 39 Global Step: 67730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:44:18,444-Speed 25302.64 samples/sec Loss 1.2694 LearningRate 0.0000 Epoch: 39 Global Step: 67740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:44:28,252-Speed 25060.39 samples/sec Loss 1.2578 LearningRate 0.0000 Epoch: 39 Global Step: 67750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:44:38,045-Speed 25098.63 samples/sec Loss 1.2603 LearningRate 0.0000 Epoch: 39 Global Step: 67760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:44:47,903-Speed 24932.58 samples/sec Loss 1.2626 LearningRate 0.0000 Epoch: 39 Global Step: 67770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:44:57,797-Speed 24843.21 samples/sec Loss 1.2660 LearningRate 0.0000 Epoch: 39 Global Step: 67780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:45:07,565-Speed 25163.05 samples/sec Loss 1.2631 LearningRate 0.0000 Epoch: 39 Global Step: 67790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:45:17,361-Speed 25090.48 samples/sec Loss 1.2493 LearningRate 0.0000 Epoch: 39 Global Step: 67800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:45:27,205-Speed 24973.01 samples/sec Loss 1.2620 LearningRate 0.0000 Epoch: 39 Global Step: 67810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:45:36,993-Speed 25111.76 samples/sec Loss 1.2473 LearningRate 0.0000 Epoch: 39 Global Step: 67820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:45:46,848-Speed 24941.35 samples/sec Loss 1.2674 LearningRate 0.0000 Epoch: 39 Global Step: 67830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:45:56,612-Speed 25170.93 samples/sec Loss 1.2727 LearningRate 0.0000 Epoch: 39 Global Step: 67840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:46:06,307-Speed 25352.85 samples/sec Loss 1.2510 LearningRate 0.0000 Epoch: 39 
Global Step: 67850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:46:16,111-Speed 25071.71 samples/sec Loss 1.2671 LearningRate 0.0000 Epoch: 39 Global Step: 67860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:46:25,953-Speed 24973.25 samples/sec Loss 1.2593 LearningRate 0.0000 Epoch: 39 Global Step: 67870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:46:35,745-Speed 25109.92 samples/sec Loss 1.2608 LearningRate 0.0000 Epoch: 39 Global Step: 67880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:46:45,546-Speed 25078.90 samples/sec Loss 1.2746 LearningRate 0.0000 Epoch: 39 Global Step: 67890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:46:55,241-Speed 25351.16 samples/sec Loss 1.2613 LearningRate 0.0000 Epoch: 39 Global Step: 67900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:47:05,028-Speed 25114.68 samples/sec Loss 1.2623 LearningRate 0.0000 Epoch: 39 Global Step: 67910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:47:15,143-Speed 24301.57 samples/sec Loss 1.2562 LearningRate 0.0000 Epoch: 39 Global Step: 67920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:47:25,237-Speed 24350.01 samples/sec Loss 1.2584 LearningRate 0.0000 Epoch: 39 Global Step: 67930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:47:35,359-Speed 24284.13 samples/sec Loss 1.2644 LearningRate 0.0000 Epoch: 39 Global Step: 67940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:47:45,429-Speed 24409.34 samples/sec Loss 1.2617 LearningRate 0.0000 Epoch: 39 Global Step: 67950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:47:55,479-Speed 24455.61 samples/sec Loss 1.2568 LearningRate 0.0000 Epoch: 39 Global Step: 67960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:48:05,538-Speed 24434.91 samples/sec Loss 1.2635 LearningRate 0.0000 Epoch: 39 Global Step: 67970 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-03-26 17:48:15,623-Speed 24373.32 samples/sec Loss 1.2620 LearningRate 0.0000 Epoch: 39 Global Step: 67980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:48:25,717-Speed 24349.37 samples/sec Loss 1.2618 LearningRate 0.0000 Epoch: 39 Global Step: 67990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:48:35,778-Speed 24430.42 samples/sec Loss 1.2611 LearningRate 0.0000 Epoch: 39 Global Step: 68000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:48:45,851-Speed 24402.19 samples/sec Loss 1.2609 LearningRate 0.0000 Epoch: 39 Global Step: 68010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:48:55,912-Speed 24429.96 samples/sec Loss 1.2608 LearningRate 0.0000 Epoch: 39 Global Step: 68020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:49:05,955-Speed 24474.28 samples/sec Loss 1.2559 LearningRate 0.0000 Epoch: 39 Global Step: 68030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:49:16,043-Speed 24370.28 samples/sec Loss 1.2550 LearningRate 0.0000 Epoch: 39 Global Step: 68040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:49:26,178-Speed 24252.18 samples/sec Loss 1.2637 LearningRate 0.0000 Epoch: 39 Global Step: 68050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:49:36,251-Speed 24399.95 samples/sec Loss 1.2626 LearningRate 0.0000 Epoch: 39 Global Step: 68060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:49:46,431-Speed 24145.25 samples/sec Loss 1.2574 LearningRate 0.0000 Epoch: 39 Global Step: 68070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:49:56,493-Speed 24426.47 samples/sec Loss 1.2574 LearningRate 0.0000 Epoch: 39 Global Step: 68080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:50:06,593-Speed 24336.47 samples/sec Loss 1.2679 LearningRate 0.0000 Epoch: 39 Global Step: 68090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 
17:50:16,684-Speed 24357.38 samples/sec Loss 1.2598 LearningRate 0.0000 Epoch: 39 Global Step: 68100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:50:26,778-Speed 24357.96 samples/sec Loss 1.2605 LearningRate 0.0000 Epoch: 39 Global Step: 68110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:50:36,892-Speed 24307.90 samples/sec Loss 1.2672 LearningRate 0.0000 Epoch: 39 Global Step: 68120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:50:47,041-Speed 24217.74 samples/sec Loss 1.2625 LearningRate 0.0000 Epoch: 39 Global Step: 68130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:50:57,225-Speed 24136.81 samples/sec Loss 1.2596 LearningRate 0.0000 Epoch: 39 Global Step: 68140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:51:07,281-Speed 24443.33 samples/sec Loss 1.2531 LearningRate 0.0000 Epoch: 39 Global Step: 68150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:51:17,372-Speed 24359.11 samples/sec Loss 1.2753 LearningRate 0.0000 Epoch: 39 Global Step: 68160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:51:27,509-Speed 24247.91 samples/sec Loss 1.2558 LearningRate 0.0000 Epoch: 39 Global Step: 68170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:51:37,594-Speed 24371.88 samples/sec Loss 1.2506 LearningRate 0.0000 Epoch: 39 Global Step: 68180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:51:47,748-Speed 24207.68 samples/sec Loss 1.2659 LearningRate 0.0000 Epoch: 39 Global Step: 68190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:51:57,946-Speed 24102.46 samples/sec Loss 1.2733 LearningRate 0.0000 Epoch: 39 Global Step: 68200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:52:08,074-Speed 24268.56 samples/sec Loss 1.2551 LearningRate 0.0000 Epoch: 39 Global Step: 68210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:52:18,178-Speed 24326.44 samples/sec 
Loss 1.2537 LearningRate 0.0000 Epoch: 39 Global Step: 68220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:52:28,319-Speed 24238.41 samples/sec Loss 1.2656 LearningRate 0.0000 Epoch: 39 Global Step: 68230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:52:38,484-Speed 24179.98 samples/sec Loss 1.2524 LearningRate 0.0000 Epoch: 39 Global Step: 68240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:52:48,592-Speed 24324.26 samples/sec Loss 1.2617 LearningRate 0.0000 Epoch: 39 Global Step: 68250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:52:58,710-Speed 24290.14 samples/sec Loss 1.2502 LearningRate 0.0000 Epoch: 39 Global Step: 68260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:53:08,765-Speed 24446.43 samples/sec Loss 1.2542 LearningRate 0.0000 Epoch: 39 Global Step: 68270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:53:18,892-Speed 24278.85 samples/sec Loss 1.2606 LearningRate 0.0000 Epoch: 39 Global Step: 68280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-26 17:53:28,926-Speed 24495.75 samples/sec Loss 1.2488 LearningRate 0.0000 Epoch: 39 Global Step: 68290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:53:39,003-Speed 24392.08 samples/sec Loss 1.2537 LearningRate 0.0000 Epoch: 39 Global Step: 68300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:53:49,105-Speed 24330.59 samples/sec Loss 1.2560 LearningRate 0.0000 Epoch: 39 Global Step: 68310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:53:59,135-Speed 24505.50 samples/sec Loss 1.2611 LearningRate 0.0000 Epoch: 39 Global Step: 68320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:54:09,241-Speed 24320.04 samples/sec Loss 1.2468 LearningRate 0.0000 Epoch: 39 Global Step: 68330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-03-26 17:54:19,406-Speed 24181.56 samples/sec Loss 1.2601 LearningRate 0.0000 Epoch: 39 
Global Step: 68340 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 17:54:29,583-Speed 24148.80 samples/sec Loss 1.2595 LearningRate 0.0000 Epoch: 39 Global Step: 68350 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 17:54:39,733-Speed 24218.21 samples/sec Loss 1.2567 LearningRate 0.0000 Epoch: 39 Global Step: 68360 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 17:54:49,908-Speed 24157.19 samples/sec Loss 1.2555 LearningRate 0.0000 Epoch: 39 Global Step: 68370 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 17:54:59,724-Speed 25040.05 samples/sec Loss 1.2692 LearningRate 0.0000 Epoch: 39 Global Step: 68380 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 17:55:09,530-Speed 25067.75 samples/sec Loss 1.2510 LearningRate 0.0000 Epoch: 39 Global Step: 68390 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:55:19,412-Speed 24875.30 samples/sec Loss 1.2581 LearningRate 0.0000 Epoch: 39 Global Step: 68400 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:55:29,176-Speed 25173.41 samples/sec Loss 1.2603 LearningRate 0.0000 Epoch: 39 Global Step: 68410 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:55:38,955-Speed 25134.69 samples/sec Loss 1.2678 LearningRate 0.0000 Epoch: 39 Global Step: 68420 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:55:48,788-Speed 24998.81 samples/sec Loss 1.2622 LearningRate 0.0000 Epoch: 39 Global Step: 68430 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:55:58,595-Speed 25063.46 samples/sec Loss 1.2598 LearningRate 0.0000 Epoch: 39 Global Step: 68440 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:56:08,446-Speed 24953.19 samples/sec Loss 1.2577 LearningRate 0.0000 Epoch: 39 Global Step: 68450 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:56:18,247-Speed 25078.83 samples/sec Loss 1.2552 LearningRate 0.0000 Epoch: 39 Global Step: 68460 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:56:28,081-Speed 24996.67 samples/sec Loss 1.2616 LearningRate 0.0000 Epoch: 39 Global Step: 68470 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:56:37,861-Speed 25132.88 samples/sec Loss 1.2589 LearningRate 0.0000 Epoch: 39 Global Step: 68480 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:56:47,711-Speed 24951.09 samples/sec Loss 1.2567 LearningRate 0.0000 Epoch: 39 Global Step: 68490 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:56:57,519-Speed 25059.28 samples/sec Loss 1.2639 LearningRate 0.0000 Epoch: 39 Global Step: 68500 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:57:07,319-Speed 25082.45 samples/sec Loss 1.2525 LearningRate 0.0000 Epoch: 39 Global Step: 68510 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:57:17,132-Speed 25047.20 samples/sec Loss 1.2639 LearningRate 0.0000 Epoch: 39 Global Step: 68520 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:57:26,889-Speed 25192.34 samples/sec Loss 1.2584 LearningRate 0.0000 Epoch: 39 Global Step: 68530 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:57:36,697-Speed 25059.92 samples/sec Loss 1.2652 LearningRate 0.0000 Epoch: 39 Global Step: 68540 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:57:46,480-Speed 25123.22 samples/sec Loss 1.2649 LearningRate 0.0000 Epoch: 39 Global Step: 68550 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:57:56,263-Speed 25124.18 samples/sec Loss 1.2542 LearningRate 0.0000 Epoch: 39 Global Step: 68560 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:58:05,976-Speed 25303.32 samples/sec Loss 1.2598 LearningRate 0.0000 Epoch: 39 Global Step: 68570 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:58:15,755-Speed 25143.39 samples/sec Loss 1.2559 LearningRate 0.0000 Epoch: 39 Global Step: 68580 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:58:25,528-Speed 25148.83 samples/sec Loss 1.2643 LearningRate 0.0000 Epoch: 39 Global Step: 68590 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:58:35,487-Speed 24680.11 samples/sec Loss 1.2648 LearningRate 0.0000 Epoch: 39 Global Step: 68600 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:58:45,206-Speed 25294.06 samples/sec Loss 1.2595 LearningRate 0.0000 Epoch: 39 Global Step: 68610 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:58:54,975-Speed 25160.11 samples/sec Loss 1.2565 LearningRate 0.0000 Epoch: 39 Global Step: 68620 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:59:04,815-Speed 24978.83 samples/sec Loss 1.2600 LearningRate 0.0000 Epoch: 39 Global Step: 68630 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:59:14,565-Speed 25208.44 samples/sec Loss 1.2478 LearningRate 0.0000 Epoch: 39 Global Step: 68640 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:59:24,423-Speed 24934.02 samples/sec Loss 1.2556 LearningRate 0.0000 Epoch: 39 Global Step: 68650 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:59:34,153-Speed 25262.66 samples/sec Loss 1.2509 LearningRate 0.0000 Epoch: 39 Global Step: 68660 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:59:43,997-Speed 24970.57 samples/sec Loss 1.2590 LearningRate 0.0000 Epoch: 39 Global Step: 68670 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 17:59:53,738-Speed 25231.60 samples/sec Loss 1.2593 LearningRate 0.0000 Epoch: 39 Global Step: 68680 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:00:03,397-Speed 25452.72 samples/sec Loss 1.2620 LearningRate 0.0000 Epoch: 39 Global Step: 68690 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:00:13,194-Speed 25087.11 samples/sec Loss 1.2620 LearningRate 0.0000 Epoch: 39 Global Step: 68700 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:00:23,106-Speed 24799.32 samples/sec Loss 1.2530 LearningRate 0.0000 Epoch: 39 Global Step: 68710 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:00:32,880-Speed 25146.10 samples/sec Loss 1.2580 LearningRate 0.0000 Epoch: 39 Global Step: 68720 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:00:42,630-Speed 25209.55 samples/sec Loss 1.2659 LearningRate 0.0000 Epoch: 39 Global Step: 68730 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:00:52,342-Speed 25308.97 samples/sec Loss 1.2611 LearningRate 0.0000 Epoch: 39 Global Step: 68740 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:01:02,083-Speed 25231.30 samples/sec Loss 1.2593 LearningRate 0.0000 Epoch: 39 Global Step: 68750 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:01:11,901-Speed 25037.00 samples/sec Loss 1.2715 LearningRate 0.0000 Epoch: 39 Global Step: 68760 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:01:21,626-Speed 25273.34 samples/sec Loss 1.2572 LearningRate 0.0000 Epoch: 39 Global Step: 68770 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:01:31,402-Speed 25141.34 samples/sec Loss 1.2513 LearningRate 0.0000 Epoch: 39 Global Step: 68780 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:01:41,187-Speed 25120.28 samples/sec Loss 1.2566 LearningRate 0.0000 Epoch: 39 Global Step: 68790 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:01:51,061-Speed 24893.45 samples/sec Loss 1.2603 LearningRate 0.0000 Epoch: 39 Global Step: 68800 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:02:00,911-Speed 24952.86 samples/sec Loss 1.2563 LearningRate 0.0000 Epoch: 39 Global Step: 68810 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:02:10,648-Speed 25254.02 samples/sec Loss 1.2593 LearningRate 0.0000 Epoch: 39 Global Step: 68820 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:02:20,532-Speed 24866.89 samples/sec Loss 1.2508 LearningRate 0.0000 Epoch: 39 Global Step: 68830 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:02:30,303-Speed 25157.68 samples/sec Loss 1.2590 LearningRate 0.0000 Epoch: 39 Global Step: 68840 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:02:40,070-Speed 25165.52 samples/sec Loss 1.2480 LearningRate 0.0000 Epoch: 39 Global Step: 68850 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:02:49,843-Speed 25149.69 samples/sec Loss 1.2672 LearningRate 0.0000 Epoch: 39 Global Step: 68860 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:02:59,725-Speed 24872.56 samples/sec Loss 1.2574 LearningRate 0.0000 Epoch: 39 Global Step: 68870 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-26 18:03:09,586-Speed 24926.23 samples/sec Loss 1.2653 LearningRate 0.0000 Epoch: 39 Global Step: 68880 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:03:19,411-Speed 25017.00 samples/sec Loss 1.2594 LearningRate 0.0000 Epoch: 39 Global Step: 68890 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:03:29,194-Speed 25123.22 samples/sec Loss 1.2592 LearningRate 0.0000 Epoch: 39 Global Step: 68900 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:03:38,983-Speed 25109.10 samples/sec Loss 1.2541 LearningRate 0.0000 Epoch: 39 Global Step: 68910 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:03:48,696-Speed 25312.85 samples/sec Loss 1.2617 LearningRate 0.0000 Epoch: 39 Global Step: 68920 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:03:58,544-Speed 24958.36 samples/sec Loss 1.2657 LearningRate 0.0000 Epoch: 39 Global Step: 68930 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:04:08,288-Speed 25225.57 samples/sec Loss 1.2557 LearningRate 0.0000 Epoch: 39 Global Step: 68940 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:04:18,018-Speed 25259.33 samples/sec Loss 1.2675 LearningRate 0.0000 Epoch: 39 Global Step: 68950 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:04:27,735-Speed 25294.97 samples/sec Loss 1.2526 LearningRate 0.0000 Epoch: 39 Global Step: 68960 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:04:37,530-Speed 25093.71 samples/sec Loss 1.2488 LearningRate 0.0000 Epoch: 39 Global Step: 68970 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:04:47,395-Speed 24916.66 samples/sec Loss 1.2570 LearningRate 0.0000 Epoch: 39 Global Step: 68980 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:04:57,076-Speed 25389.81 samples/sec Loss 1.2533 LearningRate 0.0000 Epoch: 39 Global Step: 68990 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:05:06,902-Speed 25013.50 samples/sec Loss 1.2644 LearningRate 0.0000 Epoch: 39 Global Step: 69000 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:05:16,652-Speed 25210.48 samples/sec Loss 1.2592 LearningRate 0.0000 Epoch: 39 Global Step: 69010 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:05:26,372-Speed 25286.79 samples/sec Loss 1.2598 LearningRate 0.0000 Epoch: 39 Global Step: 69020 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:05:36,177-Speed 25068.14 samples/sec Loss 1.2561 LearningRate 0.0000 Epoch: 39 Global Step: 69030 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:05:45,869-Speed 25360.39 samples/sec Loss 1.2640 LearningRate 0.0000 Epoch: 39 Global Step: 69040 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:05:55,572-Speed 25330.77 samples/sec Loss 1.2679 LearningRate 0.0000 Epoch: 39 Global Step: 69050 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:06:05,336-Speed 25173.51 samples/sec Loss 1.2620 LearningRate 0.0000 Epoch: 39 Global Step: 69060 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:06:15,136-Speed 25080.79 samples/sec Loss 1.2681 LearningRate 0.0000 Epoch: 39 Global Step: 69070 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:06:24,901-Speed 25171.84 samples/sec Loss 1.2553 LearningRate 0.0000 Epoch: 39 Global Step: 69080 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:06:34,611-Speed 25312.46 samples/sec Loss 1.2565 LearningRate 0.0000 Epoch: 39 Global Step: 69090 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:06:44,373-Speed 25178.82 samples/sec Loss 1.2629 LearningRate 0.0000 Epoch: 39 Global Step: 69100 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:06:54,163-Speed 25106.18 samples/sec Loss 1.2647 LearningRate 0.0000 Epoch: 39 Global Step: 69110 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-26 18:07:03,942-Speed 25135.09 samples/sec Loss 1.2615 LearningRate 0.0000 Epoch: 39 Global Step: 69120 Fp16 Grad Scale: 32768 Required: -0 hours
Training: 2022-03-26 18:07:13,664-Speed 25280.24 samples/sec Loss 1.2615 LearningRate 0.0000 Epoch: 39 Global Step: 69130 Fp16 Grad Scale: 32768 Required: -0 hours