Training: 2022-03-04 20:30:03,610-rank_id: 0
Training: 2022-03-04 20:31:59,200-Speed 9419.43 samples/sec   Loss 42.4879   LearningRate 0.0000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:32:25,370-Speed 9391.83 samples/sec   Loss 42.4783   LearningRate 0.0000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:32:51,479-Speed 9413.31 samples/sec   Loss 42.4502   LearningRate 0.0000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:33:17,610-Speed 9405.67 samples/sec   Loss 42.4231   LearningRate 0.0000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:33:43,805-Speed 9382.61 samples/sec   Loss 42.3702   LearningRate 0.0000   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:34:10,003-Speed 9381.28 samples/sec   Loss 42.2705   LearningRate 0.0000   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:34:36,158-Speed 9397.12 samples/sec   Loss 42.1277   LearningRate 0.0000   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:35:02,302-Speed 9400.49 samples/sec   Loss 41.9391   LearningRate 0.0000   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:35:28,470-Speed 9392.39 samples/sec   Loss 41.7228   LearningRate 0.0000   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:35:54,664-Speed 9383.88 samples/sec   Loss 41.4637   LearningRate 0.0000   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:36:20,776-Speed 9412.54 samples/sec   Loss 41.1974   LearningRate 0.0000   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:36:46,920-Speed 9400.55 samples/sec   Loss 40.9261   LearningRate 0.0000   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:37:13,123-Speed 9379.80 samples/sec   Loss 40.6231   LearningRate 0.0000   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:37:39,415-Speed 9347.83 samples/sec   Loss 40.3331   LearningRate 0.0000   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:38:05,554-Speed 9403.86 samples/sec   Loss 40.0366   LearningRate 0.0000   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:38:31,700-Speed 9400.08 samples/sec   Loss 39.7915   LearningRate 0.0000   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:38:57,845-Speed 9400.66 samples/sec   Loss 39.5708   LearningRate 0.0000   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:39:24,095-Speed 9362.74 samples/sec   Loss 39.3797   LearningRate 0.0000   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 131072   Required: 50 hours
Training: 2022-03-04 20:39:50,270-Speed 9389.90 samples/sec   Loss 39.2291   LearningRate 0.0000   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:40:16,419-Speed 9398.99 samples/sec   Loss 39.0980   LearningRate 0.0000   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:40:42,672-Speed 9361.62 samples/sec   Loss 38.9949   LearningRate 0.0000   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 50 hours
Training: 2022-03-04 20:41:08,882-Speed 9377.16 samples/sec   Loss 38.9242   LearningRate 0.0000   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 20:41:35,104-Speed 9372.62 samples/sec   Loss 38.8745   LearningRate 0.0000   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 20:42:01,276-Speed 9390.97 samples/sec   Loss 38.8414   LearningRate 0.0000   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:42:27,497-Speed 9373.19 samples/sec   Loss 38.9515   LearningRate 0.0000   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:42:53,706-Speed 9377.37 samples/sec   Loss 38.8448   LearningRate 0.0000   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:43:19,876-Speed 9391.23 samples/sec   Loss 38.8188   LearningRate 0.0000   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:43:46,061-Speed 9385.98 samples/sec   Loss 38.8203   LearningRate 0.0000   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:44:12,273-Speed 9376.50 samples/sec   Loss 38.8265   LearningRate 0.0000   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:44:38,486-Speed 9376.31 samples/sec   Loss 38.8582   LearningRate 0.0000   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:45:04,594-Speed 9413.55 samples/sec   Loss 38.8541   LearningRate 0.0000   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:45:30,808-Speed 9375.91 samples/sec   Loss 38.8525   LearningRate 0.0000   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:45:57,032-Speed 9372.07 samples/sec   Loss 38.8563   LearningRate 0.0000   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:46:23,207-Speed 9389.50 samples/sec   Loss 38.8785   LearningRate 0.0001   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:46:49,368-Speed 9394.78 samples/sec   Loss 39.2829   LearningRate 0.0001   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:47:15,571-Speed 9379.68 samples/sec   Loss 38.9493   LearningRate 0.0001   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:47:41,839-Speed 9356.14 samples/sec   Loss 38.8881   LearningRate 0.0001   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:48:08,028-Speed 9384.69 samples/sec   Loss 38.8905   LearningRate 0.0001   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:48:34,246-Speed 9374.21 samples/sec   Loss 38.8509   LearningRate 0.0001   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:49:00,433-Speed 9385.70 samples/sec   Loss 38.8412   LearningRate 0.0001   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 20:49:26,562-Speed 9406.34 samples/sec   Loss 38.8563   LearningRate 0.0001   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:49:52,715-Speed 9397.81 samples/sec   Loss 38.8280   LearningRate 0.0001   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:50:18,904-Speed 9384.39 samples/sec   Loss 38.8262   LearningRate 0.0001   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:50:45,117-Speed 9376.49 samples/sec   Loss 38.8329   LearningRate 0.0001   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:51:11,304-Speed 9385.23 samples/sec   Loss 38.8253   LearningRate 0.0001   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 20:51:37,493-Speed 9384.63 samples/sec   Loss 38.9925   LearningRate 0.0001   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:52:03,653-Speed 9395.14 samples/sec   Loss 38.8594   LearningRate 0.0001   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:52:29,867-Speed 9375.51 samples/sec   Loss 38.8976   LearningRate 0.0001   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:52:56,084-Speed 9374.63 samples/sec   Loss 38.8854   LearningRate 0.0001   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:53:22,270-Speed 9386.17 samples/sec   Loss 38.8807   LearningRate 0.0001   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:53:48,513-Speed 9365.29 samples/sec   Loss 38.8871   LearningRate 0.0001   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:54:14,727-Speed 9375.61 samples/sec   Loss 38.9119   LearningRate 0.0001   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:54:40,915-Speed 9384.97 samples/sec   Loss 38.9298   LearningRate 0.0001   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:55:07,106-Speed 9383.92 samples/sec   Loss 38.9156   LearningRate 0.0001   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:55:33,409-Speed 9344.05 samples/sec   Loss 38.9260   LearningRate 0.0001   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:55:59,619-Speed 9377.20 samples/sec   Loss 38.9574   LearningRate 0.0001   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 20:56:25,836-Speed 9374.61 samples/sec   Loss 38.9363   LearningRate 0.0001   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 20:56:52,087-Speed 9362.49 samples/sec   Loss 38.9533   LearningRate 0.0001   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 20:57:18,280-Speed 9383.19 samples/sec   Loss 39.0026   LearningRate 0.0001   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 20:57:44,408-Speed 9407.27 samples/sec   Loss 39.0228   LearningRate 0.0001   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:58:10,598-Speed 9384.32 samples/sec   Loss 39.1995   LearningRate 0.0001   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:58:36,695-Speed 9417.55 samples/sec   Loss 39.0336   LearningRate 0.0001   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:59:02,839-Speed 9401.05 samples/sec   Loss 39.0512   LearningRate 0.0001   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:59:28,962-Speed 9408.85 samples/sec   Loss 39.0467   LearningRate 0.0001   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 20:59:55,150-Speed 9385.12 samples/sec   Loss 39.0546   LearningRate 0.0001   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 21:00:21,326-Speed 9389.25 samples/sec   Loss 39.0568   LearningRate 0.0001   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 21:00:47,555-Speed 9370.18 samples/sec   Loss 39.0575   LearningRate 0.0001   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 21:01:13,781-Speed 9371.21 samples/sec   Loss 39.0660   LearningRate 0.0001   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 21:01:39,948-Speed 9392.72 samples/sec   Loss 39.1022   LearningRate 0.0001   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 2048   Required: 50 hours
Training: 2022-03-04 21:02:06,110-Speed 9393.97 samples/sec   Loss 39.0941   LearningRate 0.0001   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:02:32,311-Speed 9380.29 samples/sec   Loss 39.1007   LearningRate 0.0001   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:02:58,481-Speed 9391.42 samples/sec   Loss 39.1284   LearningRate 0.0001   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:03:24,681-Speed 9381.78 samples/sec   Loss 39.1243   LearningRate 0.0001   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:03:50,941-Speed 9359.18 samples/sec   Loss 39.1188   LearningRate 0.0001   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:04:17,137-Speed 9382.05 samples/sec   Loss 39.1153   LearningRate 0.0001   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:04:43,356-Speed 9374.02 samples/sec   Loss 39.1058   LearningRate 0.0001   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:05:09,471-Speed 9411.25 samples/sec   Loss 39.1136   LearningRate 0.0001   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:05:35,713-Speed 9365.35 samples/sec   Loss 39.1154   LearningRate 0.0001   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:06:01,867-Speed 9397.30 samples/sec   Loss 39.1301   LearningRate 0.0001   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:06:28,107-Speed 9366.45 samples/sec   Loss 39.1347   LearningRate 0.0001   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:06:54,378-Speed 9355.01 samples/sec   Loss 39.1288   LearningRate 0.0001   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:07:20,602-Speed 9372.25 samples/sec   Loss 39.1449   LearningRate 0.0001   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:07:46,810-Speed 9377.72 samples/sec   Loss 39.1504   LearningRate 0.0001   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:08:12,969-Speed 9395.38 samples/sec   Loss 39.1360   LearningRate 0.0001   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:08:39,202-Speed 9368.98 samples/sec   Loss 39.1317   LearningRate 0.0001   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:09:05,391-Speed 9384.43 samples/sec   Loss 39.1395   LearningRate 0.0001   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:09:31,678-Speed 9349.48 samples/sec   Loss 39.1393   LearningRate 0.0001   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:09:57,882-Speed 9379.42 samples/sec   Loss 39.1531   LearningRate 0.0001   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:10:24,145-Speed 9358.06 samples/sec   Loss 39.1415   LearningRate 0.0001   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:10:50,435-Speed 9348.68 samples/sec   Loss 39.1467   LearningRate 0.0001   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:11:16,652-Speed 9374.34 samples/sec   Loss 39.1439   LearningRate 0.0001   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:11:42,948-Speed 9346.94 samples/sec   Loss 39.1466   LearningRate 0.0001   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:12:09,155-Speed 9378.36 samples/sec   Loss 39.1386   LearningRate 0.0001   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:12:35,364-Speed 9377.45 samples/sec   Loss 39.1392   LearningRate 0.0001   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:13:01,530-Speed 9392.66 samples/sec   Loss 39.1398   LearningRate 0.0001   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:13:27,758-Speed 9370.70 samples/sec   Loss 39.1443   LearningRate 0.0001   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:13:53,904-Speed 9400.20 samples/sec   Loss 39.1408   LearningRate 0.0001   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:14:20,053-Speed 9398.87 samples/sec   Loss 39.1392   LearningRate 0.0001   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:14:46,301-Speed 9363.52 samples/sec   Loss 39.1501   LearningRate 0.0001   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:15:12,537-Speed 9367.82 samples/sec   Loss 39.1452   LearningRate 0.0001   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 21:15:38,844-Speed 9342.60 samples/sec   Loss 39.1637   LearningRate 0.0001   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 21:16:05,141-Speed 9346.19 samples/sec   Loss 39.1496   LearningRate 0.0001   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 21:16:31,394-Speed 9361.48 samples/sec   Loss 39.1356   LearningRate 0.0002   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 21:16:57,631-Speed 9367.61 samples/sec   Loss 39.1390   LearningRate 0.0002   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 21:17:23,957-Speed 9335.79 samples/sec   Loss 39.1362   LearningRate 0.0002   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 21:17:50,204-Speed 9363.90 samples/sec   Loss 39.1244   LearningRate 0.0002   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 21:18:16,532-Speed 9335.11 samples/sec   Loss 39.1305   LearningRate 0.0002   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-03-04 21:18:42,846-Speed 9339.77 samples/sec   Loss 39.1270   LearningRate 0.0002   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:19:09,167-Speed 9337.85 samples/sec   Loss 39.1196   LearningRate 0.0002   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:19:35,540-Speed 9318.80 samples/sec   Loss 39.1140   LearningRate 0.0002   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-03-04 21:20:01,821-Speed 9352.24 samples/sec   Loss 39.0860   LearningRate 0.0002   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-03-04 21:20:28,119-Speed 9345.91 samples/sec   Loss 39.1035   LearningRate 0.0002   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-03-04 21:20:54,449-Speed 9334.39 samples/sec   Loss 39.0553   LearningRate 0.0002   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-03-04 21:21:20,800-Speed 9327.00 samples/sec   Loss 39.0431   LearningRate 0.0002   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-03-04 21:21:47,151-Speed 9327.02 samples/sec   Loss 39.0310   LearningRate 0.0002   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-03-04 21:22:13,509-Speed 9324.40 samples/sec   Loss 39.0079   LearningRate 0.0002   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-03-04 21:22:39,813-Speed 9343.62 samples/sec   Loss 39.0274   LearningRate 0.0002   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:23:06,089-Speed 9353.70 samples/sec   Loss 39.0159   LearningRate 0.0002   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:23:32,466-Speed 9317.71 samples/sec   Loss 38.9808   LearningRate 0.0002   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:23:58,788-Speed 9337.86 samples/sec   Loss 38.9294   LearningRate 0.0002   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:24:25,118-Speed 9334.09 samples/sec   Loss 38.8897   LearningRate 0.0002   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:24:51,523-Speed 9307.92 samples/sec   Loss 38.9448   LearningRate 0.0002   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:25:17,845-Speed 9337.36 samples/sec   Loss 38.9361   LearningRate 0.0002   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:25:44,167-Speed 9337.05 samples/sec   Loss 38.8574   LearningRate 0.0002   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:26:10,450-Speed 9351.05 samples/sec   Loss 38.8182   LearningRate 0.0002   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:26:36,744-Speed 9347.10 samples/sec   Loss 38.7686   LearningRate 0.0002   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:27:03,042-Speed 9345.68 samples/sec   Loss 38.7139   LearningRate 0.0002   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:27:29,363-Speed 9337.88 samples/sec   Loss 38.6863   LearningRate 0.0002   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:27:55,654-Speed 9348.01 samples/sec   Loss 38.6548   LearningRate 0.0002   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:28:21,943-Speed 9348.79 samples/sec   Loss 38.6359   LearningRate 0.0002   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:28:48,206-Speed 9358.28 samples/sec   Loss 38.6223   LearningRate 0.0002   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:29:14,494-Speed 9349.14 samples/sec   Loss 38.6513   LearningRate 0.0002   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:29:40,799-Speed 9343.78 samples/sec   Loss 38.6576   LearningRate 0.0002   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:30:07,059-Speed 9358.93 samples/sec   Loss 38.5467   LearningRate 0.0002   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:30:33,371-Speed 9340.67 samples/sec   Loss 38.5419   LearningRate 0.0002   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:30:59,675-Speed 9344.18 samples/sec   Loss 38.5570   LearningRate 0.0002   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:31:25,912-Speed 9367.38 samples/sec   Loss 38.5327   LearningRate 0.0002   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:31:52,235-Speed 9337.00 samples/sec   Loss 38.4775   LearningRate 0.0002   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:32:18,528-Speed 9347.54 samples/sec   Loss 38.5005   LearningRate 0.0002   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:32:44,889-Speed 9323.70 samples/sec   Loss 38.4442   LearningRate 0.0002   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:33:11,220-Speed 9333.76 samples/sec   Loss 38.7379   LearningRate 0.0002   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:33:37,555-Speed 9332.57 samples/sec   Loss 38.4480   LearningRate 0.0002   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:34:03,808-Speed 9361.56 samples/sec   Loss 38.4603   LearningRate 0.0002   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 21:34:30,102-Speed 9347.27 samples/sec   Loss 38.4207   LearningRate 0.0002   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:34:56,409-Speed 9342.34 samples/sec   Loss 38.3964   LearningRate 0.0002   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:35:22,667-Speed 9360.09 samples/sec   Loss 38.3533   LearningRate 0.0002   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:35:48,940-Speed 9354.72 samples/sec   Loss 38.2894   LearningRate 0.0002   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:36:15,310-Speed 9319.88 samples/sec   Loss 38.2424   LearningRate 0.0002   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:36:41,670-Speed 9323.86 samples/sec   Loss 38.1882   LearningRate 0.0002   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:37:07,981-Speed 9341.24 samples/sec   Loss 38.1190   LearningRate 0.0002   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:37:34,300-Speed 9338.02 samples/sec   Loss 38.1026   LearningRate 0.0002   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:38:00,658-Speed 9324.53 samples/sec   Loss 38.0877   LearningRate 0.0002   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:38:26,968-Speed 9341.63 samples/sec   Loss 38.4491   LearningRate 0.0002   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:38:53,266-Speed 9345.65 samples/sec   Loss 38.1682   LearningRate 0.0002   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:39:19,621-Speed 9325.70 samples/sec   Loss 38.0522   LearningRate 0.0002   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:39:46,067-Speed 9293.35 samples/sec   Loss 37.9772   LearningRate 0.0002   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:40:12,413-Speed 9328.75 samples/sec   Loss 37.9387   LearningRate 0.0002   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:40:38,759-Speed 9328.44 samples/sec   Loss 37.9246   LearningRate 0.0002   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:41:05,094-Speed 9332.68 samples/sec   Loss 37.8783   LearningRate 0.0002   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:41:31,414-Speed 9338.64 samples/sec   Loss 37.8408   LearningRate 0.0002   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:41:57,758-Speed 9329.14 samples/sec   Loss 37.7741   LearningRate 0.0002   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:42:23,968-Speed 9377.35 samples/sec   Loss 37.7123   LearningRate 0.0002   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:42:50,277-Speed 9341.79 samples/sec   Loss 37.6759   LearningRate 0.0002   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:43:16,577-Speed 9344.84 samples/sec   Loss 37.6202   LearningRate 0.0002   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-03-04 21:43:42,814-Speed 9367.29 samples/sec   Loss 37.5821   LearningRate 0.0002   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:44:09,079-Speed 9357.71 samples/sec   Loss 37.5247   LearningRate 0.0002   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:44:35,361-Speed 9351.23 samples/sec   Loss 37.4741   LearningRate 0.0002   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:45:01,636-Speed 9354.71 samples/sec   Loss 37.5179   LearningRate 0.0002   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:45:27,946-Speed 9341.28 samples/sec   Loss 37.4822   LearningRate 0.0002   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:45:54,205-Speed 9359.54 samples/sec   Loss 37.4035   LearningRate 0.0002   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:46:20,482-Speed 9352.96 samples/sec   Loss 37.3636   LearningRate 0.0002   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:47:38,019-Speed 3169.66 samples/sec   Loss 37.3438   LearningRate 0.0003   Epoch: 1   Global Step: 1730   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:48:04,034-Speed 9447.26 samples/sec   Loss 37.3141   LearningRate 0.0003   Epoch: 1   Global Step: 1740   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:48:30,307-Speed 9354.87 samples/sec   Loss 37.2755   LearningRate 0.0003   Epoch: 1   Global Step: 1750   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:48:56,582-Speed 9353.76 samples/sec   Loss 37.3661   LearningRate 0.0003   Epoch: 1   Global Step: 1760   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:49:22,737-Speed 9396.82 samples/sec   Loss 37.3078   LearningRate 0.0003   Epoch: 1   Global Step: 1770   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:49:48,939-Speed 9380.16 samples/sec   Loss 37.1904   LearningRate 0.0003   Epoch: 1   Global Step: 1780   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:50:15,152-Speed 9375.99 samples/sec   Loss 37.1099   LearningRate 0.0003   Epoch: 1   Global Step: 1790   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:50:41,400-Speed 9363.92 samples/sec   Loss 37.0244   LearningRate 0.0003   Epoch: 1   Global Step: 1800   Fp16 Grad Scale: 4096   Required: 50 hours
Training: 2022-03-04 21:51:07,631-Speed 9369.56 samples/sec   Loss 36.9832   LearningRate 0.0003   Epoch: 1   Global Step: 1810   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:51:33,932-Speed 9344.38 samples/sec   Loss 36.9613   LearningRate 0.0003   Epoch: 1   Global Step: 1820   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:52:00,100-Speed 9392.93 samples/sec   Loss 36.8970   LearningRate 0.0003   Epoch: 1   Global Step: 1830   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:52:26,308-Speed 9377.76 samples/sec   Loss 36.8320   LearningRate 0.0003   Epoch: 1   Global Step: 1840   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:52:52,468-Speed 9395.26 samples/sec   Loss 36.8124   LearningRate 0.0003   Epoch: 1   Global Step: 1850   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:53:18,746-Speed 9352.69 samples/sec   Loss 36.7429   LearningRate 0.0003   Epoch: 1   Global Step: 1860   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:53:45,001-Speed 9361.09 samples/sec   Loss 36.7100   LearningRate 0.0003   Epoch: 1   Global Step: 1870   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:54:11,212-Speed 9376.71 samples/sec   Loss 36.6441   LearningRate 0.0003   Epoch: 1   Global Step: 1880   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:54:37,389-Speed 9389.52 samples/sec   Loss 36.6006   LearningRate 0.0003   Epoch: 1   Global Step: 1890   Fp16 Grad Scale: 8192   Required: 50 hours
Training: 2022-03-04 21:55:03,555-Speed 9392.88 samples/sec   Loss 36.5648   LearningRate 0.0003   Epoch: 1   Global Step: 1900   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:55:29,785-Speed 9369.81 samples/sec   Loss 36.4824   LearningRate 0.0003   Epoch: 1   Global Step: 1910   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-03-04 21:55:55,975-Speed 9384.43 samples/sec   Loss 36.4613   LearningRate 0.0003   Epoch: 1   Global Step: 1920   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:56:22,191-Speed 9374.95 samples/sec   Loss 36.4317   LearningRate 0.0003   Epoch: 1   Global Step: 1930   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:56:48,415-Speed 9372.10 samples/sec   Loss 36.3634   LearningRate 0.0003   Epoch: 1   Global Step: 1940   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:57:14,675-Speed 9359.30 samples/sec   Loss 36.3233   LearningRate 0.0003   Epoch: 1   Global Step: 1950   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:57:40,912-Speed 9367.58 samples/sec   Loss 36.3043   LearningRate 0.0003   Epoch: 1   Global Step: 1960   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 21:58:07,106-Speed 9383.11 samples/sec   Loss 36.2663   LearningRate 0.0003   Epoch: 1   Global Step: 1970   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:58:33,316-Speed 9376.95 samples/sec   Loss 36.3529   LearningRate 0.0003   Epoch: 1   Global Step: 1980   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:58:59,681-Speed 9321.82 samples/sec   Loss 36.2152   LearningRate 0.0003   Epoch: 1   Global Step: 1990   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:59:25,955-Speed 9354.75 samples/sec   Loss 36.1151   LearningRate 0.0003   Epoch: 1   Global Step: 2000   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 21:59:52,157-Speed 9380.08 samples/sec   Loss 36.0745   LearningRate 0.0003   Epoch: 1   Global Step: 2010   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:00:18,369-Speed 9376.33 samples/sec   Loss 36.0269   LearningRate 0.0003   Epoch: 1   Global Step: 2020   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:00:44,596-Speed 9371.23 samples/sec   Loss 35.9820   LearningRate 0.0003   Epoch: 1   Global Step: 2030   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:01:10,875-Speed 9352.50 samples/sec   Loss 36.0160   LearningRate 0.0003   Epoch: 1   Global Step: 2040   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:01:37,089-Speed 9375.99 samples/sec   Loss 35.9425   LearningRate 0.0003   Epoch: 1   Global Step: 2050   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:02:03,283-Speed 9382.93 samples/sec   Loss 35.9677   LearningRate 0.0003   Epoch: 1   Global Step: 2060   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:02:29,521-Speed 9367.15 samples/sec   Loss 35.9316   LearningRate 0.0003   Epoch: 1   Global Step: 2070   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:02:55,790-Speed 9355.98 samples/sec   Loss 35.8029   LearningRate 0.0003   Epoch: 1   Global Step: 2080   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:03:22,044-Speed 9361.52 samples/sec   Loss 35.7370   LearningRate 0.0003   Epoch: 1   Global Step: 2090   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:03:48,218-Speed 9390.17 samples/sec   Loss 35.8043   LearningRate 0.0003   Epoch: 1   Global Step: 2100   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:04:14,445-Speed 9371.78 samples/sec   Loss 35.6417   LearningRate 0.0003   Epoch: 1   Global Step: 2110   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:04:40,689-Speed 9365.08 samples/sec   Loss 35.5674   LearningRate 0.0003   Epoch: 1   Global Step: 2120   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:05:06,924-Speed 9368.29 samples/sec   Loss 35.5179   LearningRate 0.0003   Epoch: 1   Global Step: 2130   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:05:33,123-Speed 9381.76 samples/sec   Loss 35.5799   LearningRate 0.0003   Epoch: 1   Global Step: 2140   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:05:59,356-Speed 9369.25 samples/sec   Loss 35.6791   LearningRate 0.0003   Epoch: 1   Global Step: 2150   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:06:25,632-Speed 9353.51 samples/sec   Loss 35.4853   LearningRate 0.0003   Epoch: 1   Global Step: 2160   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:06:51,840-Speed 9378.01 samples/sec   Loss 35.4411   LearningRate 0.0003   Epoch: 1   Global Step: 2170   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:07:18,089-Speed 9363.11 samples/sec   Loss 35.6056   LearningRate 0.0003   Epoch: 1   Global Step: 2180   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:07:44,457-Speed 9321.13 samples/sec   Loss 35.3852   LearningRate 0.0003   Epoch: 1   Global Step: 2190   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:08:10,751-Speed 9347.27 samples/sec   Loss 35.9786   LearningRate 0.0003   Epoch: 1   Global Step: 2200   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:08:36,988-Speed 9367.91 samples/sec   Loss 35.5871   LearningRate 0.0003   Epoch: 1   Global Step: 2210   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:09:03,313-Speed 9336.12 samples/sec   Loss 35.2985   LearningRate 0.0003   Epoch: 1   Global Step: 2220   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:09:29,516-Speed 9379.73 samples/sec   Loss 35.1592   LearningRate 0.0003   Epoch: 1   Global Step: 2230   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:09:55,750-Speed 9368.40 samples/sec   Loss 35.0873   LearningRate 0.0003   Epoch: 1   Global Step: 2240   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:10:22,012-Speed 9358.36 samples/sec   Loss 35.0537   LearningRate 0.0003   Epoch: 1   Global Step: 2250   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:10:48,261-Speed 9363.23 samples/sec   Loss 34.9923   LearningRate 0.0003   Epoch: 1   Global Step: 2260   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:11:14,510-Speed 9363.08 samples/sec   Loss 35.1088   LearningRate 0.0003   Epoch: 1   Global Step: 2270   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:11:40,769-Speed 9359.87 samples/sec   Loss 36.3821   LearningRate 0.0003   Epoch: 1   Global Step: 2280   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:12:06,911-Speed 9401.04 samples/sec   Loss 35.1854   LearningRate 0.0003   Epoch: 1   Global Step: 2290   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:12:33,144-Speed 9368.92 samples/sec   Loss 34.9644   LearningRate 0.0003   Epoch: 1   Global Step: 2300   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:12:59,462-Speed 9338.43 samples/sec   Loss 34.8483   LearningRate 0.0003   Epoch: 1   Global Step: 2310   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:13:25,682-Speed 9373.22 samples/sec   Loss 34.8001   LearningRate 0.0003   Epoch: 1   Global Step: 2320   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:13:52,046-Speed 9322.43 samples/sec   Loss 34.6771   LearningRate 0.0003   Epoch: 1   Global Step: 2330   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:14:18,280-Speed 9368.37 samples/sec   Loss 34.6543   LearningRate 0.0003   Epoch: 1   Global Step: 2340   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:14:44,495-Speed 9375.12 samples/sec   Loss 34.5566   LearningRate 0.0003   Epoch: 1   Global Step: 2350   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:15:10,749-Speed 9361.46 samples/sec   Loss 34.4858   LearningRate 0.0003   Epoch: 1   Global Step: 2360   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:15:36,933-Speed 9386.61 samples/sec   Loss 34.5599   LearningRate 0.0003   Epoch: 1   Global Step: 2370   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:16:03,176-Speed 9364.94 samples/sec   Loss 34.5383   LearningRate 0.0003   Epoch: 1   Global Step: 2380   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:16:29,436-Speed 9359.13 samples/sec   Loss 34.4457   LearningRate 0.0003   Epoch: 1   Global Step: 2390   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:16:55,726-Speed 9348.26 samples/sec   Loss 34.2967   LearningRate 0.0003   Epoch: 1   Global Step: 2400   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:17:21,944-Speed 9374.26 samples/sec   Loss 34.1739   LearningRate 0.0003   Epoch: 1   Global Step: 2410   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:17:48,248-Speed 9343.47 samples/sec   Loss 34.0907   LearningRate 0.0004   Epoch: 1   Global Step: 2420   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:18:14,472-Speed 9371.81 samples/sec   Loss 33.9901   LearningRate 0.0004   Epoch: 1   Global Step: 2430   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:18:40,716-Speed 9365.07 samples/sec   Loss 33.9421   LearningRate 0.0004   Epoch: 1   Global Step: 2440   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:19:06,984-Speed 9356.25 samples/sec   Loss 33.9121   LearningRate 0.0004   Epoch: 1   Global Step: 2450   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 22:19:33,188-Speed 9378.90 samples/sec   Loss 33.8778   LearningRate 0.0004   Epoch: 1   Global Step: 2460   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:19:59,369-Speed 9387.56 samples/sec   Loss 33.8028   LearningRate 0.0004   Epoch: 1   Global Step: 2470   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:20:25,588-Speed 9373.56 samples/sec   Loss 33.7008   LearningRate 0.0004   Epoch: 1   Global Step: 2480   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:20:51,803-Speed 9375.03 samples/sec   Loss 33.6673   LearningRate 0.0004   Epoch: 1   Global Step: 2490   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:21:18,048-Speed 9364.72 samples/sec   Loss 33.6720   LearningRate 0.0004   Epoch: 1   Global Step: 2500   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:21:44,264-Speed 9374.90 samples/sec   Loss 33.8205   LearningRate 0.0004   Epoch: 1   Global Step: 2510   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:22:10,484-Speed 9373.39 samples/sec   Loss 33.6063   LearningRate 0.0004   Epoch: 1   Global Step: 2520   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:22:36,747-Speed 9358.09 samples/sec   Loss 33.4343   LearningRate 0.0004   Epoch: 1   Global Step: 2530   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:23:03,034-Speed 9350.45 samples/sec   Loss 33.3491   LearningRate 0.0004   Epoch: 1   Global Step: 2540   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:23:29,300-Speed 9357.16 samples/sec   Loss 33.5888   LearningRate 0.0004   Epoch: 1   Global Step: 2550   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:23:55,580-Speed 9351.87 samples/sec   Loss 33.5354   LearningRate 0.0004   Epoch: 1   Global Step: 2560   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 22:24:21,775-Speed 9382.32 samples/sec   Loss 33.2971   LearningRate 0.0004   Epoch: 1   Global Step: 2570   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 22:24:48,026-Speed 9362.34 samples/sec   Loss 33.1971   LearningRate 0.0004   Epoch: 1   Global Step: 2580   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 22:25:14,347-Speed 9337.74 samples/sec   Loss 33.0794   LearningRate 0.0004   Epoch: 1   Global Step: 2590   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 22:25:40,672-Speed 9336.12 samples/sec   Loss 32.9905   LearningRate 0.0004   Epoch: 1   Global Step: 2600   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 22:26:06,893-Speed 9372.96 samples/sec   Loss 33.0474   LearningRate 0.0004   Epoch: 1   Global Step: 2610   Fp16 Grad Scale: 8192   Required: 49 hours
Training: 2022-03-04 22:26:33,070-Speed 9388.93 samples/sec   Loss 32.9066   LearningRate 0.0004   Epoch: 1   Global Step: 2620   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:26:59,438-Speed 9320.66 samples/sec   Loss 32.8521   LearningRate 0.0004   Epoch: 1   Global Step: 2630   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:27:25,645-Speed 9378.25 samples/sec   Loss 32.6729   LearningRate 0.0004   Epoch: 1   Global Step: 2640   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:27:51,882-Speed 9367.23 samples/sec   Loss 32.5752   LearningRate 0.0004   Epoch: 1   Global Step: 2650   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:28:18,102-Speed 9373.62 samples/sec   Loss 32.4900   LearningRate 0.0004   Epoch: 1   Global Step: 2660   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:28:44,387-Speed 9350.52 samples/sec   Loss 32.4219   LearningRate 0.0004   Epoch: 1   Global Step: 2670   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:29:10,644-Speed 9360.28 samples/sec   Loss 32.3991   LearningRate 0.0004   Epoch: 1   Global Step: 2680   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:29:36,906-Speed 9358.53 samples/sec   Loss 32.2771   LearningRate 0.0004   Epoch: 1   Global Step: 2690   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:30:03,133-Speed 9370.83 samples/sec   Loss 32.1982   LearningRate 0.0004   Epoch: 1   Global Step: 2700   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:30:29,373-Speed 9366.34 samples/sec   Loss 32.1197   LearningRate 0.0004   Epoch: 1   Global Step: 2710   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:30:55,555-Speed 9387.99 samples/sec   Loss 32.2206   LearningRate 0.0004   Epoch: 1   Global Step: 2720   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:31:21,697-Speed 9401.42 samples/sec   Loss 32.5546   LearningRate 0.0004   Epoch: 1   Global Step: 2730   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:31:47,866-Speed 9391.79 samples/sec   Loss 31.9658   LearningRate 0.0004   Epoch: 1   Global Step: 2740   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:32:14,149-Speed 9350.69 samples/sec   Loss 31.8221   LearningRate 0.0004   Epoch: 1   Global Step: 2750   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:32:40,455-Speed 9342.74 samples/sec   Loss 31.7245   LearningRate 0.0004   Epoch: 1   Global Step: 2760   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:33:06,695-Speed 9366.60 samples/sec   Loss 31.6304   LearningRate 0.0004   Epoch: 1   Global Step: 2770   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:33:32,920-Speed 9371.24 samples/sec   Loss 31.5504   LearningRate 0.0004   Epoch: 1   Global Step: 2780   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:33:59,090-Speed 9391.46 samples/sec   Loss 31.4515   LearningRate 0.0004   Epoch: 1   Global Step: 2790   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:34:25,268-Speed 9388.30 samples/sec   Loss 31.3570   LearningRate 0.0004   Epoch: 1   Global Step: 2800   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:34:51,511-Speed 9365.29 samples/sec   Loss 31.2302   LearningRate 0.0004   Epoch: 1   Global Step: 2810   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:35:17,726-Speed 9375.33 samples/sec   Loss 31.1367   LearningRate 0.0004   Epoch: 1   Global Step: 2820   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:35:44,089-Speed 9322.37 samples/sec   Loss 31.0615   LearningRate 0.0004   Epoch: 1   Global Step: 2830   Fp16 Grad Scale: 1024   Required: 49 hours
Training: 2022-03-04 22:36:10,478-Speed 9313.38 samples/sec   Loss 31.0016   LearningRate 0.0004   Epoch: 1   Global Step: 2840   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:36:36,677-Speed 9380.88 samples/sec   Loss 30.8827   LearningRate 0.0004   Epoch: 1   Global Step: 2850   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:37:02,944-Speed 9356.80 samples/sec   Loss 30.7280   LearningRate 0.0004   Epoch: 1   Global Step: 2860   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:37:29,134-Speed 9384.01 samples/sec   Loss 30.6384   LearningRate 0.0004   Epoch: 1   Global Step: 2870   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:37:55,373-Speed 9366.71 samples/sec   Loss 30.5557   LearningRate 0.0004   Epoch: 1   Global Step: 2880   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:38:21,606-Speed 9368.57 samples/sec   Loss 30.4550   LearningRate 0.0004   Epoch: 1   Global Step: 2890   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:38:47,922-Speed 9339.28 samples/sec   Loss 30.3982   LearningRate 0.0004   Epoch: 1   Global Step: 2900   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:39:14,145-Speed 9372.41 samples/sec   Loss 30.3677   LearningRate 0.0004   Epoch: 1   Global Step: 2910   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:39:40,351-Speed 9378.43 samples/sec   Loss 30.2269   LearningRate 0.0004   Epoch: 1   Global Step: 2920   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:40:06,544-Speed 9382.99 samples/sec   Loss 30.1060   LearningRate 0.0004   Epoch: 1   Global Step: 2930   Fp16 Grad Scale: 2048   Required: 49 hours
Training: 2022-03-04 22:40:32,763-Speed 9374.07 samples/sec   Loss 30.0018   LearningRate 0.0004   Epoch: 1   Global Step: 2940   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:40:58,944-Speed 9387.21 samples/sec   Loss 29.8765   LearningRate 0.0004   Epoch: 1   Global Step: 2950   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:41:25,121-Speed 9388.99 samples/sec   Loss 29.7620   LearningRate 0.0004   Epoch: 1   Global Step: 2960   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:41:51,301-Speed 9387.38 samples/sec   Loss 29.6872   LearningRate 0.0004   Epoch: 1   Global Step: 2970   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:42:17,483-Speed 9387.41 samples/sec   Loss 29.5593   LearningRate 0.0004   Epoch: 1   Global Step: 2980   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:42:43,703-Speed 9373.50 samples/sec   Loss 29.4333   LearningRate 0.0004   Epoch: 1   Global Step: 2990   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:43:09,948-Speed 9364.46 samples/sec   Loss 29.3521   LearningRate 0.0004   Epoch: 1   Global Step: 3000   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-03-04 22:43:36,187-Speed 9366.76 samples/sec   Loss 29.2615   LearningRate 0.0004   Epoch: 1   Global Step: 3010   Fp16 Grad Scale: 4096   Required: 48 hours
Training: 2022-03-04 22:44:02,420-Speed 9368.73 samples/sec   Loss 29.1430   LearningRate 0.0004   Epoch: 1   Global Step: 3020   Fp16 Grad Scale: 4096   Required: 48 hours
Training: 2022-03-04 22:44:28,648-Speed 9370.59 samples/sec   Loss 29.0706   LearningRate 0.0004   Epoch: 1   Global Step: 3030   Fp16 Grad Scale: 4096   Required: 48 hours
Training: 2022-03-04 22:44:54,898-Speed 9363.08 samples/sec   Loss 28.9694   LearningRate 0.0004   Epoch: 1   Global Step: 3040   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:45:21,159-Speed 9358.54 samples/sec   Loss 28.9417   LearningRate 0.0004   Epoch: 1   Global Step: 3050   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:45:47,395-Speed 9367.79 samples/sec   Loss 28.7467   LearningRate 0.0004   Epoch: 1   Global Step: 3060   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:46:13,582-Speed 9385.30 samples/sec   Loss 28.6278   LearningRate 0.0004   Epoch: 1   Global Step: 3070   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:46:39,849-Speed 9356.31 samples/sec   Loss 28.5329   LearningRate 0.0004   Epoch: 1   Global Step: 3080   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:47:05,985-Speed 9403.47 samples/sec   Loss 28.3850   LearningRate 0.0004   Epoch: 1   Global Step: 3090   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:47:32,217-Speed 9369.12 samples/sec   Loss 28.2506   LearningRate 0.0004   Epoch: 1   Global Step: 3100   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:47:58,476-Speed 9359.67 samples/sec   Loss 28.1208   LearningRate 0.0004   Epoch: 1   Global Step: 3110   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:48:24,642-Speed 9392.73 samples/sec   Loss 28.0529   LearningRate 0.0005   Epoch: 1   Global Step: 3120   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:48:50,815-Speed 9390.31 samples/sec   Loss 27.9338   LearningRate 0.0005   Epoch: 1   Global Step: 3130   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-03-04 22:49:17,028-Speed 9376.05 samples/sec   Loss 27.8545   LearningRate 0.0005   Epoch: 1   Global Step: 3140   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:49:43,263-Speed 9368.15 samples/sec   Loss 27.6669   LearningRate 0.0005   Epoch: 1   Global Step: 3150   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:50:09,414-Speed 9398.08 samples/sec   Loss 27.5537   LearningRate 0.0005   Epoch: 1   Global Step: 3160   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:50:35,618-Speed 9379.18 samples/sec   Loss 27.4204   LearningRate 0.0005   Epoch: 1   Global Step: 3170   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:51:01,846-Speed 9370.54 samples/sec   Loss 27.3992   LearningRate 0.0005   Epoch: 1   Global Step: 3180   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:51:28,054-Speed 9377.71 samples/sec   Loss 27.2903   LearningRate 0.0005   Epoch: 1   Global Step: 3190   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:51:54,251-Speed 9381.53 samples/sec   Loss 27.0632   LearningRate 0.0005   Epoch: 1   Global Step: 3200   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:52:20,535-Speed 9350.62 samples/sec   Loss 26.9365   LearningRate 0.0005   Epoch: 1   Global Step: 3210   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:52:46,804-Speed 9356.09 samples/sec   Loss 26.8220   LearningRate 0.0005   Epoch: 1   Global Step: 3220   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:53:13,050-Speed 9363.84 samples/sec   Loss 26.7746   LearningRate 0.0005   Epoch: 1   Global Step: 3230   Fp16 Grad Scale: 16384   Required: 48 hours
Training: 2022-03-04 22:53:39,311-Speed 9358.85 samples/sec   Loss 26.5821   LearningRate 0.0005   Epoch: 1   Global Step: 3240   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:54:05,521-Speed 9376.89 samples/sec   Loss 26.4576   LearningRate 0.0005   Epoch: 1   Global Step: 3250   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:54:31,720-Speed 9380.88 samples/sec   Loss 26.3202   LearningRate 0.0005   Epoch: 1   Global Step: 3260   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:54:58,004-Speed 9350.70 samples/sec   Loss 26.2488   LearningRate 0.0005   Epoch: 1   Global Step: 3270   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:55:24,242-Speed 9367.08 samples/sec   Loss 26.0808   LearningRate 0.0005   Epoch: 1   Global Step: 3280   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:55:50,531-Speed 9348.67 samples/sec   Loss 25.9360   LearningRate 0.0005   Epoch: 1   Global Step: 3290   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:56:16,695-Speed 9393.71 samples/sec   Loss 25.7775   LearningRate 0.0005   Epoch: 1   Global Step: 3300   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:56:42,950-Speed 9360.75 samples/sec   Loss 25.7067   LearningRate 0.0005   Epoch: 1   Global Step: 3310   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:57:09,267-Speed 9338.61 samples/sec   Loss 25.6807   LearningRate 0.0005   Epoch: 1   Global Step: 3320   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:57:35,467-Speed 9380.54 samples/sec   Loss 25.4889   LearningRate 0.0005   Epoch: 1   Global Step: 3330   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-03-04 22:58:01,676-Speed 9378.17 samples/sec   Loss 25.3300   LearningRate 0.0005   Epoch: 1   Global Step: 3340   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 22:58:27,889-Speed 9375.86 samples/sec   Loss 25.3224   LearningRate 0.0005   Epoch: 1   Global Step: 3350   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 22:58:54,104-Speed 9375.46 samples/sec   Loss 25.0933   LearningRate 0.0005   Epoch: 1   Global Step: 3360   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 22:59:20,337-Speed 9368.70 samples/sec   Loss 24.9737   LearningRate 0.0005   Epoch: 1   Global Step: 3370   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 22:59:46,518-Speed 9387.30 samples/sec   Loss 24.8284   LearningRate 0.0005   Epoch: 1   Global Step: 3380   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 23:00:12,753-Speed 9368.94 samples/sec   Loss 24.7039   LearningRate 0.0005   Epoch: 1   Global Step: 3390   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 23:00:39,051-Speed 9345.81 samples/sec   Loss 24.5769   LearningRate 0.0005   Epoch: 1   Global Step: 3400   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 23:01:05,307-Speed 9360.29 samples/sec   Loss 24.4282   LearningRate 0.0005   Epoch: 1   Global Step: 3410   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 23:01:31,533-Speed 9371.14 samples/sec   Loss 24.3659   LearningRate 0.0005   Epoch: 1   Global Step: 3420   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 23:01:57,784-Speed 9362.38 samples/sec   Loss 24.2225   LearningRate 0.0005   Epoch: 1   Global Step: 3430   Fp16 Grad Scale: 65536   Required: 48 hours
Training: 2022-03-04 23:02:24,069-Speed 9350.33 samples/sec   Loss 24.1185   LearningRate 0.0005   Epoch: 1   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:02:50,272-Speed 9379.44 samples/sec   Loss 23.9282   LearningRate 0.0005   Epoch: 1   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:04:10,139-Speed 3077.17 samples/sec   Loss 23.8230   LearningRate 0.0005   Epoch: 2   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:04:36,139-Speed 9452.56 samples/sec   Loss 23.7455   LearningRate 0.0005   Epoch: 2   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:05:02,222-Speed 9422.87 samples/sec   Loss 23.5690   LearningRate 0.0005   Epoch: 2   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:05:28,301-Speed 9423.94 samples/sec   Loss 23.4775   LearningRate 0.0005   Epoch: 2   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:05:54,446-Speed 9400.39 samples/sec   Loss 23.3041   LearningRate 0.0005   Epoch: 2   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:06:20,546-Speed 9417.04 samples/sec   Loss 23.2133   LearningRate 0.0005   Epoch: 2   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:06:46,535-Speed 9456.85 samples/sec   Loss 23.1278   LearningRate 0.0005   Epoch: 2   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:07:12,568-Speed 9440.74 samples/sec   Loss 22.9712   LearningRate 0.0005   Epoch: 2   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:07:38,584-Speed 9447.04 samples/sec   Loss 22.8904   LearningRate 0.0005   Epoch: 2   Global Step: 3540   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:08:04,698-Speed 9411.53 samples/sec   Loss 22.7847   LearningRate 0.0005   Epoch: 2   Global Step: 3550   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:08:30,751-Speed 9433.58 samples/sec   Loss 22.6194   LearningRate 0.0005   Epoch: 2   Global Step: 3560   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:08:56,902-Speed 9398.39 samples/sec   Loss 22.5263   LearningRate 0.0005   Epoch: 2   Global Step: 3570   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:09:22,972-Speed 9427.18 samples/sec   Loss 22.4015   LearningRate 0.0005   Epoch: 2   Global Step: 3580   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:09:49,121-Speed 9399.20 samples/sec   Loss 22.2545   LearningRate 0.0005   Epoch: 2   Global Step: 3590   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:10:15,291-Speed 9391.05 samples/sec   Loss 22.2213   LearningRate 0.0005   Epoch: 2   Global Step: 3600   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:10:41,367-Speed 9425.38 samples/sec   Loss 22.0199   LearningRate 0.0005   Epoch: 2   Global Step: 3610   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:11:07,572-Speed 9378.92 samples/sec   Loss 21.9057   LearningRate 0.0005   Epoch: 2   Global Step: 3620   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:11:33,770-Speed 9381.39 samples/sec   Loss 21.8135   LearningRate 0.0005   Epoch: 2   Global Step: 3630   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:12:00,149-Speed 9316.99 samples/sec   Loss 21.7001   LearningRate 0.0005   Epoch: 2   Global Step: 3640   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:12:26,307-Speed 9395.63 samples/sec   Loss 21.5335   LearningRate 0.0005   Epoch: 2   Global Step: 3650   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:12:52,480-Speed 9390.38 samples/sec   Loss 21.4874   LearningRate 0.0005   Epoch: 2   Global Step: 3660   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:13:18,697-Speed 9374.67 samples/sec   Loss 21.3727   LearningRate 0.0005   Epoch: 2   Global Step: 3670   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:13:44,987-Speed 9348.20 samples/sec   Loss 21.1840   LearningRate 0.0005   Epoch: 2   Global Step: 3680   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:14:11,127-Speed 9402.22 samples/sec   Loss 21.0678   LearningRate 0.0005   Epoch: 2   Global Step: 3690   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:14:37,264-Speed 9403.30 samples/sec   Loss 21.0521   LearningRate 0.0005   Epoch: 2   Global Step: 3700   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:15:03,414-Speed 9398.75 samples/sec   Loss 20.9104   LearningRate 0.0005   Epoch: 2   Global Step: 3710   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:15:29,520-Speed 9414.37 samples/sec   Loss 20.6769   LearningRate 0.0005   Epoch: 2   Global Step: 3720   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:15:55,688-Speed 9391.86 samples/sec   Loss 20.6337   LearningRate 0.0005   Epoch: 2   Global Step: 3730   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:16:21,919-Speed 9369.79 samples/sec   Loss 20.5441   LearningRate 0.0005   Epoch: 2   Global Step: 3740   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:16:48,131-Speed 9376.11 samples/sec   Loss 20.4279   LearningRate 0.0005   Epoch: 2   Global Step: 3750   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:17:14,379-Speed 9363.36 samples/sec   Loss 20.3371   LearningRate 0.0005   Epoch: 2   Global Step: 3760   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:17:40,670-Speed 9348.17 samples/sec   Loss 20.1477   LearningRate 0.0005   Epoch: 2   Global Step: 3770   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:18:06,883-Speed 9375.85 samples/sec   Loss 20.0800   LearningRate 0.0005   Epoch: 2   Global Step: 3780   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:18:33,114-Speed 9369.87 samples/sec   Loss 19.9308   LearningRate 0.0005   Epoch: 2   Global Step: 3790   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:18:59,360-Speed 9363.77 samples/sec   Loss 19.8609   LearningRate 0.0005   Epoch: 2   Global Step: 3800   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:19:25,579-Speed 9373.87 samples/sec   Loss 19.7468   LearningRate 0.0006   Epoch: 2   Global Step: 3810   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:19:51,875-Speed 9346.31 samples/sec   Loss 19.6391   LearningRate 0.0006   Epoch: 2   Global Step: 3820   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:20:18,282-Speed 9306.88 samples/sec   Loss 19.4825   LearningRate 0.0006   Epoch: 2   Global Step: 3830   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:20:44,685-Speed 9308.44 samples/sec   Loss 19.3966   LearningRate 0.0006   Epoch: 2   Global Step: 3840   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:21:10,970-Speed 9350.06 samples/sec   Loss 19.3295   LearningRate 0.0006   Epoch: 2   Global Step: 3850   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:21:37,176-Speed 9378.27 samples/sec   Loss 19.1544   LearningRate 0.0006   Epoch: 2   Global Step: 3860   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:22:03,405-Speed 9370.34 samples/sec   Loss 19.0620   LearningRate 0.0006   Epoch: 2   Global Step: 3870   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:22:29,603-Speed 9381.26 samples/sec   Loss 19.0348   LearningRate 0.0006   Epoch: 2   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:22:55,890-Speed 9349.78 samples/sec   Loss 18.8646   LearningRate 0.0006   Epoch: 2   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:23:22,053-Speed 9393.57 samples/sec   Loss 18.8054   LearningRate 0.0006   Epoch: 2   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:23:48,219-Speed 9393.01 samples/sec   Loss 18.6878   LearningRate 0.0006   Epoch: 2   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:24:14,392-Speed 9390.70 samples/sec   Loss 18.5648   LearningRate 0.0006   Epoch: 2   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:24:40,571-Speed 9388.08 samples/sec   Loss 18.4951   LearningRate 0.0006   Epoch: 2   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:25:06,809-Speed 9366.96 samples/sec   Loss 18.3457   LearningRate 0.0006   Epoch: 2   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:25:33,031-Speed 9372.71 samples/sec   Loss 18.3018   LearningRate 0.0006   Epoch: 2   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:25:59,182-Speed 9398.03 samples/sec   Loss 18.2073   LearningRate 0.0006   Epoch: 2   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:26:25,410-Speed 9370.57 samples/sec   Loss 18.0502   LearningRate 0.0006   Epoch: 2   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:26:51,607-Speed 9381.60 samples/sec   Loss 18.0014   LearningRate 0.0006   Epoch: 2   Global Step: 3980   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:27:17,778-Speed 9390.74 samples/sec   Loss 17.8644   LearningRate 0.0006   Epoch: 2   Global Step: 3990   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:27:43,990-Speed 9376.63 samples/sec   Loss 17.7584   LearningRate 0.0006   Epoch: 2   Global Step: 4000   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:28:10,176-Speed 9385.65 samples/sec   Loss 17.6530   LearningRate 0.0006   Epoch: 2   Global Step: 4010   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:28:36,453-Speed 9352.92 samples/sec   Loss 17.5815   LearningRate 0.0006   Epoch: 2   Global Step: 4020   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:29:02,752-Speed 9345.08 samples/sec   Loss 17.4636   LearningRate 0.0006   Epoch: 2   Global Step: 4030   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:29:29,042-Speed 9348.51 samples/sec   Loss 17.4181   LearningRate 0.0006   Epoch: 2   Global Step: 4040   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:29:55,335-Speed 9347.41 samples/sec   Loss 17.3387   LearningRate 0.0006   Epoch: 2   Global Step: 4050   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:30:21,561-Speed 9370.97 samples/sec   Loss 17.1701   LearningRate 0.0006   Epoch: 2   Global Step: 4060   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:30:47,770-Speed 9377.22 samples/sec   Loss 17.1661   LearningRate 0.0006   Epoch: 2   Global Step: 4070   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:31:14,060-Speed 9348.65 samples/sec   Loss 17.0247   LearningRate 0.0006   Epoch: 2   Global Step: 4080   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:31:40,254-Speed 9382.76 samples/sec   Loss 16.8869   LearningRate 0.0006   Epoch: 2   Global Step: 4090   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:32:06,575-Speed 9337.39 samples/sec   Loss 16.8229   LearningRate 0.0006   Epoch: 2   Global Step: 4100   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:32:32,807-Speed 9369.24 samples/sec   Loss 16.7196   LearningRate 0.0006   Epoch: 2   Global Step: 4110   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:32:58,972-Speed 9393.48 samples/sec   Loss 16.6642   LearningRate 0.0006   Epoch: 2   Global Step: 4120   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:33:25,205-Speed 9368.77 samples/sec   Loss 16.5082   LearningRate 0.0006   Epoch: 2   Global Step: 4130   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:33:51,478-Speed 9354.51 samples/sec   Loss 16.5088   LearningRate 0.0006   Epoch: 2   Global Step: 4140   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:34:17,707-Speed 9370.31 samples/sec   Loss 16.4001   LearningRate 0.0006   Epoch: 2   Global Step: 4150   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:34:43,981-Speed 9354.26 samples/sec   Loss 16.3546   LearningRate 0.0006   Epoch: 2   Global Step: 4160   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:35:10,191-Speed 9376.90 samples/sec   Loss 16.3827   LearningRate 0.0006   Epoch: 2   Global Step: 4170   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:35:36,378-Speed 9384.82 samples/sec   Loss 16.1647   LearningRate 0.0006   Epoch: 2   Global Step: 4180   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:36:02,494-Speed 9410.89 samples/sec   Loss 16.0192   LearningRate 0.0006   Epoch: 2   Global Step: 4190   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:36:28,580-Speed 9421.42 samples/sec   Loss 15.9341   LearningRate 0.0006   Epoch: 2   Global Step: 4200   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:36:54,793-Speed 9376.03 samples/sec   Loss 15.8689   LearningRate 0.0006   Epoch: 2   Global Step: 4210   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:37:20,911-Speed 9410.04 samples/sec   Loss 15.8010   LearningRate 0.0006   Epoch: 2   Global Step: 4220   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:37:47,078-Speed 9392.10 samples/sec   Loss 15.7191   LearningRate 0.0006   Epoch: 2   Global Step: 4230   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:38:13,263-Speed 9386.25 samples/sec   Loss 15.6540   LearningRate 0.0006   Epoch: 2   Global Step: 4240   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:38:39,433-Speed 9391.10 samples/sec   Loss 15.5965   LearningRate 0.0006   Epoch: 2   Global Step: 4250   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:39:05,611-Speed 9388.41 samples/sec   Loss 15.4284   LearningRate 0.0006   Epoch: 2   Global Step: 4260   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:39:31,774-Speed 9393.70 samples/sec   Loss 15.3716   LearningRate 0.0006   Epoch: 2   Global Step: 4270   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:39:57,957-Speed 9386.85 samples/sec   Loss 15.3776   LearningRate 0.0006   Epoch: 2   Global Step: 4280   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:40:24,174-Speed 9374.67 samples/sec   Loss 15.2983   LearningRate 0.0006   Epoch: 2   Global Step: 4290   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:40:50,549-Speed 9318.25 samples/sec   Loss 15.2279   LearningRate 0.0006   Epoch: 2   Global Step: 4300   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:41:16,827-Speed 9352.72 samples/sec   Loss 15.0487   LearningRate 0.0006   Epoch: 2   Global Step: 4310   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:41:43,136-Speed 9341.68 samples/sec   Loss 15.0302   LearningRate 0.0006   Epoch: 2   Global Step: 4320   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:42:09,333-Speed 9382.62 samples/sec   Loss 14.9198   LearningRate 0.0006   Epoch: 2   Global Step: 4330   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:42:35,571-Speed 9366.95 samples/sec   Loss 14.8411   LearningRate 0.0006   Epoch: 2   Global Step: 4340   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:43:01,792-Speed 9372.89 samples/sec   Loss 14.7905   LearningRate 0.0006   Epoch: 2   Global Step: 4350   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:43:27,964-Speed 9390.41 samples/sec   Loss 14.7908   LearningRate 0.0006   Epoch: 2   Global Step: 4360   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:43:54,193-Speed 9370.31 samples/sec   Loss 14.6256   LearningRate 0.0006   Epoch: 2   Global Step: 4370   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:44:20,371-Speed 9388.79 samples/sec   Loss 14.5524   LearningRate 0.0006   Epoch: 2   Global Step: 4380   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:44:46,543-Speed 9390.36 samples/sec   Loss 14.4723   LearningRate 0.0006   Epoch: 2   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:45:12,699-Speed 9396.58 samples/sec   Loss 14.4378   LearningRate 0.0006   Epoch: 2   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:45:38,896-Speed 9381.30 samples/sec   Loss 14.3879   LearningRate 0.0006   Epoch: 2   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:46:05,061-Speed 9394.45 samples/sec   Loss 14.3904   LearningRate 0.0006   Epoch: 2   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:46:31,282-Speed 9372.83 samples/sec   Loss 14.2867   LearningRate 0.0006   Epoch: 2   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:46:57,449-Speed 9392.47 samples/sec   Loss 14.1646   LearningRate 0.0006   Epoch: 2   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:47:23,780-Speed 9333.76 samples/sec   Loss 14.0904   LearningRate 0.0006   Epoch: 2   Global Step: 4450   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:47:50,115-Speed 9332.61 samples/sec   Loss 14.0519   LearningRate 0.0006   Epoch: 2   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:48:16,389-Speed 9354.25 samples/sec   Loss 13.9958   LearningRate 0.0006   Epoch: 2   Global Step: 4470   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:48:42,745-Speed 9325.36 samples/sec   Loss 13.8737   LearningRate 0.0006   Epoch: 2   Global Step: 4480   Fp16 Grad Scale: 131072   Required: 48 hours
Training: 2022-03-04 23:49:08,956-Speed 9376.56 samples/sec   Loss 13.8452   LearningRate 0.0006   Epoch: 2   Global Step: 4490   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:49:35,221-Speed 9357.47 samples/sec   Loss 13.7793   LearningRate 0.0007   Epoch: 2   Global Step: 4500   Fp16 Grad Scale: 262144   Required: 48 hours
Training: 2022-03-04 23:50:01,435-Speed 9375.75 samples/sec   Loss 13.7203   LearningRate 0.0007   Epoch: 2   Global Step: 4510   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:50:27,570-Speed 9403.85 samples/sec   Loss 13.6260   LearningRate 0.0007   Epoch: 2   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:50:53,710-Speed 9402.13 samples/sec   Loss 13.5629   LearningRate 0.0007   Epoch: 2   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:51:19,926-Speed 9375.12 samples/sec   Loss 13.5103   LearningRate 0.0007   Epoch: 2   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:51:46,109-Speed 9386.57 samples/sec   Loss 13.5108   LearningRate 0.0007   Epoch: 2   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:52:12,233-Speed 9407.95 samples/sec   Loss 13.4399   LearningRate 0.0007   Epoch: 2   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:52:38,476-Speed 9365.36 samples/sec   Loss 13.3145   LearningRate 0.0007   Epoch: 2   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:53:04,691-Speed 9375.04 samples/sec   Loss 13.2925   LearningRate 0.0007   Epoch: 2   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:53:30,884-Speed 9383.21 samples/sec   Loss 13.1523   LearningRate 0.0007   Epoch: 2   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:53:57,063-Speed 9388.19 samples/sec   Loss 13.1162   LearningRate 0.0007   Epoch: 2   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:54:23,228-Speed 9392.91 samples/sec   Loss 13.1042   LearningRate 0.0007   Epoch: 2   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:54:49,529-Speed 9344.65 samples/sec   Loss 13.0833   LearningRate 0.0007   Epoch: 2   Global Step: 4620   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:55:15,798-Speed 9355.93 samples/sec   Loss 13.0501   LearningRate 0.0007   Epoch: 2   Global Step: 4630   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:55:42,032-Speed 9368.37 samples/sec   Loss 12.9008   LearningRate 0.0007   Epoch: 2   Global Step: 4640   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:56:08,277-Speed 9364.65 samples/sec   Loss 12.8563   LearningRate 0.0007   Epoch: 2   Global Step: 4650   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:56:34,472-Speed 9382.26 samples/sec   Loss 12.8341   LearningRate 0.0007   Epoch: 2   Global Step: 4660   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:57:00,775-Speed 9343.95 samples/sec   Loss 12.7723   LearningRate 0.0007   Epoch: 2   Global Step: 4670   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:57:27,054-Speed 9353.24 samples/sec   Loss 12.7329   LearningRate 0.0007   Epoch: 2   Global Step: 4680   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:57:53,319-Speed 9357.83 samples/sec   Loss 12.6433   LearningRate 0.0007   Epoch: 2   Global Step: 4690   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-04 23:58:19,457-Speed 9402.68 samples/sec   Loss 12.6099   LearningRate 0.0007   Epoch: 2   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:58:45,719-Speed 9358.53 samples/sec   Loss 12.5604   LearningRate 0.0007   Epoch: 2   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:59:11,939-Speed 9373.05 samples/sec   Loss 12.5427   LearningRate 0.0007   Epoch: 2   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-04 23:59:38,141-Speed 9380.02 samples/sec   Loss 12.4480   LearningRate 0.0007   Epoch: 2   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:00:04,294-Speed 9397.48 samples/sec   Loss 12.4492   LearningRate 0.0007   Epoch: 2   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:00:30,467-Speed 9390.24 samples/sec   Loss 12.3986   LearningRate 0.0007   Epoch: 2   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:00:56,683-Speed 9374.64 samples/sec   Loss 12.2626   LearningRate 0.0007   Epoch: 2   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:01:22,898-Speed 9375.29 samples/sec   Loss 12.2106   LearningRate 0.0007   Epoch: 2   Global Step: 4770   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:01:49,112-Speed 9375.41 samples/sec   Loss 12.2160   LearningRate 0.0007   Epoch: 2   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:02:15,337-Speed 9371.53 samples/sec   Loss 12.1175   LearningRate 0.0007   Epoch: 2   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:02:41,640-Speed 9343.78 samples/sec   Loss 12.0901   LearningRate 0.0007   Epoch: 2   Global Step: 4800   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-05 00:03:07,766-Speed 9407.24 samples/sec   Loss 12.0638   LearningRate 0.0007   Epoch: 2   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:03:34,086-Speed 9337.84 samples/sec   Loss 12.0581   LearningRate 0.0007   Epoch: 2   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:04:00,339-Speed 9361.57 samples/sec   Loss 11.9482   LearningRate 0.0007   Epoch: 2   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:04:26,583-Speed 9364.53 samples/sec   Loss 11.9483   LearningRate 0.0007   Epoch: 2   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:04:52,746-Speed 9394.16 samples/sec   Loss 11.8822   LearningRate 0.0007   Epoch: 2   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:05:18,855-Speed 9413.09 samples/sec   Loss 11.8264   LearningRate 0.0007   Epoch: 2   Global Step: 4860   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:05:45,010-Speed 9397.00 samples/sec   Loss 11.7611   LearningRate 0.0007   Epoch: 2   Global Step: 4870   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:06:11,238-Speed 9370.31 samples/sec   Loss 11.7104   LearningRate 0.0007   Epoch: 2   Global Step: 4880   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:06:37,448-Speed 9376.79 samples/sec   Loss 11.7299   LearningRate 0.0007   Epoch: 2   Global Step: 4890   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:07:03,609-Speed 9394.64 samples/sec   Loss 11.6169   LearningRate 0.0007   Epoch: 2   Global Step: 4900   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:07:29,776-Speed 9392.93 samples/sec   Loss 11.5412   LearningRate 0.0007   Epoch: 2   Global Step: 4910   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:07:56,043-Speed 9356.85 samples/sec   Loss 11.5923   LearningRate 0.0007   Epoch: 2   Global Step: 4920   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:08:22,308-Speed 9357.39 samples/sec   Loss 11.4901   LearningRate 0.0007   Epoch: 2   Global Step: 4930   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:08:48,609-Speed 9344.38 samples/sec   Loss 11.4772   LearningRate 0.0007   Epoch: 2   Global Step: 4940   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:09:14,749-Speed 9402.17 samples/sec   Loss 11.4281   LearningRate 0.0007   Epoch: 2   Global Step: 4950   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:09:40,872-Speed 9408.15 samples/sec   Loss 11.3790   LearningRate 0.0007   Epoch: 2   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:10:07,070-Speed 9381.38 samples/sec   Loss 11.3694   LearningRate 0.0007   Epoch: 2   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:10:33,180-Speed 9412.53 samples/sec   Loss 11.2824   LearningRate 0.0007   Epoch: 2   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:10:59,481-Speed 9344.71 samples/sec   Loss 11.2905   LearningRate 0.0007   Epoch: 2   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:11:25,589-Speed 9413.70 samples/sec   Loss 11.2292   LearningRate 0.0007   Epoch: 2   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:11:51,851-Speed 9358.25 samples/sec   Loss 11.1455   LearningRate 0.0007   Epoch: 2   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:12:18,060-Speed 9377.68 samples/sec   Loss 11.1901   LearningRate 0.0007   Epoch: 2   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:12:44,284-Speed 9371.91 samples/sec   Loss 11.0826   LearningRate 0.0007   Epoch: 2   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:13:10,482-Speed 9381.00 samples/sec   Loss 11.0806   LearningRate 0.0007   Epoch: 2   Global Step: 5040   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:13:36,705-Speed 9372.41 samples/sec   Loss 11.0331   LearningRate 0.0007   Epoch: 2   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:14:02,905-Speed 9380.39 samples/sec   Loss 10.9913   LearningRate 0.0007   Epoch: 2   Global Step: 5060   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-05 00:14:29,133-Speed 9370.65 samples/sec   Loss 10.9791   LearningRate 0.0007   Epoch: 2   Global Step: 5070   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-05 00:14:55,308-Speed 9389.33 samples/sec   Loss 10.9060   LearningRate 0.0007   Epoch: 2   Global Step: 5080   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:15:21,515-Speed 9378.23 samples/sec   Loss 10.8355   LearningRate 0.0007   Epoch: 2   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:15:47,597-Speed 9423.11 samples/sec   Loss 10.7892   LearningRate 0.0007   Epoch: 2   Global Step: 5100   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:16:13,691-Speed 9418.67 samples/sec   Loss 10.8642   LearningRate 0.0007   Epoch: 2   Global Step: 5110   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:16:39,970-Speed 9352.02 samples/sec   Loss 10.7822   LearningRate 0.0007   Epoch: 2   Global Step: 5120   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:17:06,175-Speed 9378.86 samples/sec   Loss 10.7967   LearningRate 0.0007   Epoch: 2   Global Step: 5130   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:17:32,404-Speed 9370.29 samples/sec   Loss 10.7192   LearningRate 0.0007   Epoch: 2   Global Step: 5140   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:17:58,493-Speed 9420.55 samples/sec   Loss 10.7132   LearningRate 0.0007   Epoch: 2   Global Step: 5150   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:18:24,598-Speed 9414.85 samples/sec   Loss 10.6081   LearningRate 0.0007   Epoch: 2   Global Step: 5160   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:18:50,737-Speed 9402.33 samples/sec   Loss 10.6645   LearningRate 0.0007   Epoch: 2   Global Step: 5170   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:19:16,922-Speed 9386.25 samples/sec   Loss 10.6182   LearningRate 0.0007   Epoch: 2   Global Step: 5180   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:20:36,218-Speed 3099.32 samples/sec   Loss 10.5035   LearningRate 0.0008   Epoch: 3   Global Step: 5190   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:21:02,213-Speed 9454.55 samples/sec   Loss 10.4244   LearningRate 0.0008   Epoch: 3   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:21:28,318-Speed 9414.96 samples/sec   Loss 10.3780   LearningRate 0.0008   Epoch: 3   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:21:54,574-Speed 9360.56 samples/sec   Loss 10.3785   LearningRate 0.0008   Epoch: 3   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:22:20,726-Speed 9397.75 samples/sec   Loss 10.2695   LearningRate 0.0008   Epoch: 3   Global Step: 5230   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:22:46,847-Speed 9408.94 samples/sec   Loss 10.3271   LearningRate 0.0008   Epoch: 3   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:23:13,026-Speed 9387.97 samples/sec   Loss 10.4045   LearningRate 0.0008   Epoch: 3   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:23:39,265-Speed 9366.34 samples/sec   Loss 10.2201   LearningRate 0.0008   Epoch: 3   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:24:05,430-Speed 9393.42 samples/sec   Loss 10.1490   LearningRate 0.0008   Epoch: 3   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:24:31,626-Speed 9382.06 samples/sec   Loss 10.1389   LearningRate 0.0008   Epoch: 3   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:24:57,818-Speed 9383.41 samples/sec   Loss 10.2095   LearningRate 0.0008   Epoch: 3   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:25:24,090-Speed 9354.95 samples/sec   Loss 10.2266   LearningRate 0.0008   Epoch: 3   Global Step: 5300   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-05 00:25:50,293-Speed 9379.24 samples/sec   Loss 10.0993   LearningRate 0.0008   Epoch: 3   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:26:16,506-Speed 9376.21 samples/sec   Loss 10.0482   LearningRate 0.0008   Epoch: 3   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:26:42,712-Speed 9378.46 samples/sec   Loss 10.0422   LearningRate 0.0008   Epoch: 3   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:27:08,930-Speed 9374.07 samples/sec   Loss 9.9599   LearningRate 0.0008   Epoch: 3   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:27:35,147-Speed 9374.68 samples/sec   Loss 9.9745   LearningRate 0.0008   Epoch: 3   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:28:01,373-Speed 9371.33 samples/sec   Loss 9.9956   LearningRate 0.0008   Epoch: 3   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:28:27,529-Speed 9396.32 samples/sec   Loss 9.9758   LearningRate 0.0008   Epoch: 3   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:28:53,631-Speed 9415.78 samples/sec   Loss 9.8970   LearningRate 0.0008   Epoch: 3   Global Step: 5380   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:29:19,797-Speed 9392.94 samples/sec   Loss 9.8948   LearningRate 0.0008   Epoch: 3   Global Step: 5390   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:29:45,968-Speed 9391.03 samples/sec   Loss 9.8361   LearningRate 0.0008   Epoch: 3   Global Step: 5400   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:30:12,082-Speed 9411.24 samples/sec   Loss 9.7722   LearningRate 0.0008   Epoch: 3   Global Step: 5410   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:30:38,165-Speed 9422.45 samples/sec   Loss 9.7329   LearningRate 0.0008   Epoch: 3   Global Step: 5420   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:31:04,287-Speed 9408.78 samples/sec   Loss 9.8135   LearningRate 0.0008   Epoch: 3   Global Step: 5430   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:31:30,498-Speed 9376.64 samples/sec   Loss 9.7378   LearningRate 0.0008   Epoch: 3   Global Step: 5440   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:31:56,701-Speed 9379.31 samples/sec   Loss 9.6962   LearningRate 0.0008   Epoch: 3   Global Step: 5450   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:32:22,888-Speed 9385.16 samples/sec   Loss 9.6719   LearningRate 0.0008   Epoch: 3   Global Step: 5460   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:32:49,062-Speed 9389.88 samples/sec   Loss 9.7046   LearningRate 0.0008   Epoch: 3   Global Step: 5470   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:33:15,239-Speed 9388.63 samples/sec   Loss 9.6187   LearningRate 0.0008   Epoch: 3   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:33:41,421-Speed 9386.87 samples/sec   Loss 9.5813   LearningRate 0.0008   Epoch: 3   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:34:07,647-Speed 9371.52 samples/sec   Loss 9.5887   LearningRate 0.0008   Epoch: 3   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:34:33,838-Speed 9383.69 samples/sec   Loss 9.6137   LearningRate 0.0008   Epoch: 3   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:34:59,948-Speed 9413.02 samples/sec   Loss 9.5180   LearningRate 0.0008   Epoch: 3   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:35:26,154-Speed 9378.47 samples/sec   Loss 9.5224   LearningRate 0.0008   Epoch: 3   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:35:52,304-Speed 9398.19 samples/sec   Loss 9.4515   LearningRate 0.0008   Epoch: 3   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:36:18,526-Speed 9372.75 samples/sec   Loss 9.3836   LearningRate 0.0008   Epoch: 3   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:36:44,665-Speed 9402.35 samples/sec   Loss 9.3551   LearningRate 0.0008   Epoch: 3   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:37:10,747-Speed 9423.07 samples/sec   Loss 9.3328   LearningRate 0.0008   Epoch: 3   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:37:36,945-Speed 9381.05 samples/sec   Loss 9.4622   LearningRate 0.0008   Epoch: 3   Global Step: 5580   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-05 00:38:03,070-Speed 9407.44 samples/sec   Loss 9.3932   LearningRate 0.0008   Epoch: 3   Global Step: 5590   Fp16 Grad Scale: 262144   Required: 47 hours
Training: 2022-03-05 00:38:29,269-Speed 9381.12 samples/sec   Loss 9.3170   LearningRate 0.0008   Epoch: 3   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:38:55,481-Speed 9376.18 samples/sec   Loss 9.3042   LearningRate 0.0008   Epoch: 3   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:39:21,633-Speed 9397.75 samples/sec   Loss 9.2678   LearningRate 0.0008   Epoch: 3   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:39:47,862-Speed 9370.19 samples/sec   Loss 9.1868   LearningRate 0.0008   Epoch: 3   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:40:13,941-Speed 9423.96 samples/sec   Loss 9.1914   LearningRate 0.0008   Epoch: 3   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:40:40,022-Speed 9423.20 samples/sec   Loss 9.1134   LearningRate 0.0008   Epoch: 3   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:41:06,069-Speed 9435.39 samples/sec   Loss 9.1663   LearningRate 0.0008   Epoch: 3   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:41:32,189-Speed 9409.50 samples/sec   Loss 9.1508   LearningRate 0.0008   Epoch: 3   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:41:58,258-Speed 9427.51 samples/sec   Loss 9.0630   LearningRate 0.0008   Epoch: 3   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:42:24,350-Speed 9419.49 samples/sec   Loss 9.0734   LearningRate 0.0008   Epoch: 3   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:42:50,463-Speed 9411.49 samples/sec   Loss 9.0755   LearningRate 0.0008   Epoch: 3   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:43:16,589-Speed 9407.26 samples/sec   Loss 9.0723   LearningRate 0.0008   Epoch: 3   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:43:42,811-Speed 9372.75 samples/sec   Loss 9.0221   LearningRate 0.0008   Epoch: 3   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:44:08,958-Speed 9399.28 samples/sec   Loss 9.0340   LearningRate 0.0008   Epoch: 3   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:44:35,064-Speed 9414.27 samples/sec   Loss 8.9136   LearningRate 0.0008   Epoch: 3   Global Step: 5740   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:45:01,203-Speed 9402.39 samples/sec   Loss 8.9262   LearningRate 0.0008   Epoch: 3   Global Step: 5750   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:45:27,368-Speed 9393.41 samples/sec   Loss 8.8822   LearningRate 0.0008   Epoch: 3   Global Step: 5760   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:45:53,503-Speed 9403.62 samples/sec   Loss 8.8773   LearningRate 0.0008   Epoch: 3   Global Step: 5770   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:46:19,629-Speed 9407.16 samples/sec   Loss 8.8142   LearningRate 0.0008   Epoch: 3   Global Step: 5780   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:46:45,812-Speed 9386.53 samples/sec   Loss 8.9814   LearningRate 0.0008   Epoch: 3   Global Step: 5790   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:47:11,861-Speed 9435.06 samples/sec   Loss 8.8660   LearningRate 0.0008   Epoch: 3   Global Step: 5800   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:47:37,895-Speed 9440.77 samples/sec   Loss 8.8238   LearningRate 0.0008   Epoch: 3   Global Step: 5810   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:48:03,987-Speed 9419.25 samples/sec   Loss 8.7663   LearningRate 0.0008   Epoch: 3   Global Step: 5820   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:48:30,173-Speed 9385.68 samples/sec   Loss 8.7663   LearningRate 0.0008   Epoch: 3   Global Step: 5830   Fp16 Grad Scale: 65536   Required: 47 hours
Training: 2022-03-05 00:48:56,290-Speed 9410.32 samples/sec   Loss 8.7775   LearningRate 0.0008   Epoch: 3   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:49:22,475-Speed 9386.24 samples/sec   Loss 8.7606   LearningRate 0.0008   Epoch: 3   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:49:48,592-Speed 9410.52 samples/sec   Loss 8.7120   LearningRate 0.0008   Epoch: 3   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:50:14,749-Speed 9395.97 samples/sec   Loss 8.6315   LearningRate 0.0008   Epoch: 3   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:50:40,883-Speed 9404.13 samples/sec   Loss 8.6725   LearningRate 0.0009   Epoch: 3   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:51:07,099-Speed 9374.75 samples/sec   Loss 8.6652   LearningRate 0.0009   Epoch: 3   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:51:33,304-Speed 9378.84 samples/sec   Loss 8.7229   LearningRate 0.0009   Epoch: 3   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:51:59,466-Speed 9394.23 samples/sec   Loss 8.6382   LearningRate 0.0009   Epoch: 3   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-03-05 00:52:25,645-Speed 9388.29 samples/sec   Loss 8.5562   LearningRate 0.0009   Epoch: 3   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 00:52:51,836-Speed 9383.71 samples/sec   Loss 8.5631   LearningRate 0.0009   Epoch: 3   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 00:53:17,972-Speed 9403.32 samples/sec   Loss 8.5657   LearningRate 0.0009   Epoch: 3   Global Step: 5940   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:53:44,163-Speed 9383.96 samples/sec   Loss 8.5307   LearningRate 0.0009   Epoch: 3   Global Step: 5950   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:54:10,401-Speed 9367.25 samples/sec   Loss 8.4970   LearningRate 0.0009   Epoch: 3   Global Step: 5960   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:54:36,558-Speed 9395.74 samples/sec   Loss 8.4642   LearningRate 0.0009   Epoch: 3   Global Step: 5970   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:55:02,798-Speed 9366.40 samples/sec   Loss 8.4620   LearningRate 0.0009   Epoch: 3   Global Step: 5980   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:55:28,916-Speed 9409.79 samples/sec   Loss 8.4322   LearningRate 0.0009   Epoch: 3   Global Step: 5990   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:55:55,052-Speed 9403.94 samples/sec   Loss 8.3797   LearningRate 0.0009   Epoch: 3   Global Step: 6000   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:56:21,315-Speed 9357.93 samples/sec   Loss 8.4301   LearningRate 0.0009   Epoch: 3   Global Step: 6010   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:56:47,428-Speed 9412.34 samples/sec   Loss 8.3944   LearningRate 0.0009   Epoch: 3   Global Step: 6020   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:57:13,539-Speed 9412.48 samples/sec   Loss 8.3799   LearningRate 0.0009   Epoch: 3   Global Step: 6030   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:57:39,722-Speed 9386.53 samples/sec   Loss 8.3421   LearningRate 0.0009   Epoch: 3   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 00:58:05,866-Speed 9400.88 samples/sec   Loss 8.3068   LearningRate 0.0009   Epoch: 3   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 00:58:31,928-Speed 9430.07 samples/sec   Loss 8.2714   LearningRate 0.0009   Epoch: 3   Global Step: 6060   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:58:58,065-Speed 9403.33 samples/sec   Loss 8.3461   LearningRate 0.0009   Epoch: 3   Global Step: 6070   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:59:24,218-Speed 9397.26 samples/sec   Loss 8.2961   LearningRate 0.0009   Epoch: 3   Global Step: 6080   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 00:59:50,315-Speed 9417.61 samples/sec   Loss 8.3285   LearningRate 0.0009   Epoch: 3   Global Step: 6090   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:00:16,555-Speed 9367.14 samples/sec   Loss 8.2633   LearningRate 0.0009   Epoch: 3   Global Step: 6100   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:00:42,662-Speed 9413.85 samples/sec   Loss 8.2278   LearningRate 0.0009   Epoch: 3   Global Step: 6110   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:01:08,775-Speed 9412.01 samples/sec   Loss 8.1667   LearningRate 0.0009   Epoch: 3   Global Step: 6120   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:01:34,825-Speed 9434.34 samples/sec   Loss 8.1817   LearningRate 0.0009   Epoch: 3   Global Step: 6130   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:02:00,882-Speed 9431.82 samples/sec   Loss 8.1854   LearningRate 0.0009   Epoch: 3   Global Step: 6140   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:02:26,991-Speed 9413.51 samples/sec   Loss 8.1729   LearningRate 0.0009   Epoch: 3   Global Step: 6150   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:02:53,074-Speed 9422.38 samples/sec   Loss 8.1389   LearningRate 0.0009   Epoch: 3   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:03:19,249-Speed 9389.76 samples/sec   Loss 8.0828   LearningRate 0.0009   Epoch: 3   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:03:45,375-Speed 9406.80 samples/sec   Loss 8.0918   LearningRate 0.0009   Epoch: 3   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:04:11,496-Speed 9408.81 samples/sec   Loss 8.1057   LearningRate 0.0009   Epoch: 3   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:04:37,553-Speed 9432.33 samples/sec   Loss 8.0808   LearningRate 0.0009   Epoch: 3   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:05:03,685-Speed 9404.93 samples/sec   Loss 8.0145   LearningRate 0.0009   Epoch: 3   Global Step: 6210   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:05:29,889-Speed 9379.47 samples/sec   Loss 8.0368   LearningRate 0.0009   Epoch: 3   Global Step: 6220   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:05:56,047-Speed 9395.54 samples/sec   Loss 7.9887   LearningRate 0.0009   Epoch: 3   Global Step: 6230   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:06:22,176-Speed 9405.98 samples/sec   Loss 8.1119   LearningRate 0.0009   Epoch: 3   Global Step: 6240   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:06:48,295-Speed 9409.58 samples/sec   Loss 8.0351   LearningRate 0.0009   Epoch: 3   Global Step: 6250   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:07:14,515-Speed 9373.58 samples/sec   Loss 7.9773   LearningRate 0.0009   Epoch: 3   Global Step: 6260   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:07:40,727-Speed 9376.07 samples/sec   Loss 7.9356   LearningRate 0.0009   Epoch: 3   Global Step: 6270   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:08:06,819-Speed 9419.43 samples/sec   Loss 7.9890   LearningRate 0.0009   Epoch: 3   Global Step: 6280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:08:32,839-Speed 9445.56 samples/sec   Loss 7.9465   LearningRate 0.0009   Epoch: 3   Global Step: 6290   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:08:58,958-Speed 9409.56 samples/sec   Loss 7.9356   LearningRate 0.0009   Epoch: 3   Global Step: 6300   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:09:25,143-Speed 9385.86 samples/sec   Loss 7.8832   LearningRate 0.0009   Epoch: 3   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:09:51,211-Speed 9428.21 samples/sec   Loss 7.8497   LearningRate 0.0009   Epoch: 3   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:10:17,382-Speed 9390.77 samples/sec   Loss 7.7864   LearningRate 0.0009   Epoch: 3   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:10:43,474-Speed 9419.77 samples/sec   Loss 7.8885   LearningRate 0.0009   Epoch: 3   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:11:09,702-Speed 9370.58 samples/sec   Loss 7.8661   LearningRate 0.0009   Epoch: 3   Global Step: 6350   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:11:35,948-Speed 9363.88 samples/sec   Loss 7.7723   LearningRate 0.0009   Epoch: 3   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:12:02,025-Speed 9424.88 samples/sec   Loss 7.8317   LearningRate 0.0009   Epoch: 3   Global Step: 6370   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:12:28,266-Speed 9365.80 samples/sec   Loss 7.7841   LearningRate 0.0009   Epoch: 3   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:12:54,374-Speed 9413.58 samples/sec   Loss 7.7514   LearningRate 0.0009   Epoch: 3   Global Step: 6390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:13:20,440-Speed 9428.94 samples/sec   Loss 7.8425   LearningRate 0.0009   Epoch: 3   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:13:46,544-Speed 9414.95 samples/sec   Loss 7.8011   LearningRate 0.0009   Epoch: 3   Global Step: 6410   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-03-05 01:14:12,565-Speed 9444.90 samples/sec   Loss 7.7263   LearningRate 0.0009   Epoch: 3   Global Step: 6420   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-03-05 01:14:38,673-Speed 9413.65 samples/sec   Loss 7.6786   LearningRate 0.0009   Epoch: 3   Global Step: 6430   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-03-05 01:15:04,790-Speed 9410.44 samples/sec   Loss 7.6738   LearningRate 0.0009   Epoch: 3   Global Step: 6440   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-03-05 01:15:30,899-Speed 9413.32 samples/sec   Loss 7.6419   LearningRate 0.0009   Epoch: 3   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:15:57,020-Speed 9409.15 samples/sec   Loss 7.6810   LearningRate 0.0009   Epoch: 3   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:16:23,123-Speed 9415.39 samples/sec   Loss 7.6609   LearningRate 0.0009   Epoch: 3   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:16:49,324-Speed 9380.17 samples/sec   Loss 7.6310   LearningRate 0.0009   Epoch: 3   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:17:15,495-Speed 9390.73 samples/sec   Loss 7.6295   LearningRate 0.0009   Epoch: 3   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:17:41,805-Speed 9341.71 samples/sec   Loss 7.5982   LearningRate 0.0009   Epoch: 3   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:18:07,944-Speed 9402.14 samples/sec   Loss 7.5651   LearningRate 0.0009   Epoch: 3   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:18:34,103-Speed 9395.14 samples/sec   Loss 7.5808   LearningRate 0.0009   Epoch: 3   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:19:00,230-Speed 9406.91 samples/sec   Loss 7.5706   LearningRate 0.0009   Epoch: 3   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:19:26,379-Speed 9399.03 samples/sec   Loss 7.5436   LearningRate 0.0009   Epoch: 3   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:19:52,516-Speed 9403.04 samples/sec   Loss 7.5398   LearningRate 0.0009   Epoch: 3   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:20:18,723-Speed 9377.78 samples/sec   Loss 7.5365   LearningRate 0.0009   Epoch: 3   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:20:44,894-Speed 9391.02 samples/sec   Loss 7.5070   LearningRate 0.0010   Epoch: 3   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:21:11,005-Speed 9413.09 samples/sec   Loss 7.5000   LearningRate 0.0010   Epoch: 3   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:21:37,167-Speed 9394.31 samples/sec   Loss 7.4404   LearningRate 0.0010   Epoch: 3   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:22:03,290-Speed 9408.24 samples/sec   Loss 7.4878   LearningRate 0.0010   Epoch: 3   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:22:29,429-Speed 9402.51 samples/sec   Loss 7.5212   LearningRate 0.0010   Epoch: 3   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:22:55,620-Speed 9383.52 samples/sec   Loss 7.4579   LearningRate 0.0010   Epoch: 3   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:23:21,723-Speed 9415.47 samples/sec   Loss 7.4489   LearningRate 0.0010   Epoch: 3   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:23:47,770-Speed 9435.78 samples/sec   Loss 7.4256   LearningRate 0.0010   Epoch: 3   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:24:13,955-Speed 9385.98 samples/sec   Loss 7.4070   LearningRate 0.0010   Epoch: 3   Global Step: 6650   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-03-05 01:24:40,198-Speed 9365.05 samples/sec   Loss 7.4203   LearningRate 0.0010   Epoch: 3   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:25:06,359-Speed 9394.58 samples/sec   Loss 7.3474   LearningRate 0.0010   Epoch: 3   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:25:32,556-Speed 9381.73 samples/sec   Loss 7.3903   LearningRate 0.0010   Epoch: 3   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:25:58,681-Speed 9407.60 samples/sec   Loss 7.3195   LearningRate 0.0010   Epoch: 3   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:26:24,798-Speed 9410.15 samples/sec   Loss 7.3466   LearningRate 0.0010   Epoch: 3   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:26:50,907-Speed 9413.21 samples/sec   Loss 7.3632   LearningRate 0.0010   Epoch: 3   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:27:17,033-Speed 9407.08 samples/sec   Loss 7.3247   LearningRate 0.0010   Epoch: 3   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:27:43,136-Speed 9415.67 samples/sec   Loss 7.3299   LearningRate 0.0010   Epoch: 3   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:28:09,264-Speed 9406.22 samples/sec   Loss 7.3136   LearningRate 0.0010   Epoch: 3   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:28:35,516-Speed 9362.13 samples/sec   Loss 7.2815   LearningRate 0.0010   Epoch: 3   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:29:01,716-Speed 9380.29 samples/sec   Loss 7.2564   LearningRate 0.0010   Epoch: 3   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:29:27,893-Speed 9388.96 samples/sec   Loss 7.2968   LearningRate 0.0010   Epoch: 3   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:29:54,053-Speed 9394.73 samples/sec   Loss 7.2566   LearningRate 0.0010   Epoch: 3   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:30:20,170-Speed 9410.48 samples/sec   Loss 7.2770   LearningRate 0.0010   Epoch: 3   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:30:46,262-Speed 9419.46 samples/sec   Loss 7.2731   LearningRate 0.0010   Epoch: 3   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:31:12,363-Speed 9415.96 samples/sec   Loss 7.2177   LearningRate 0.0010   Epoch: 3   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:31:38,485-Speed 9408.61 samples/sec   Loss 7.2937   LearningRate 0.0010   Epoch: 3   Global Step: 6820   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:32:04,641-Speed 9396.47 samples/sec   Loss 7.2050   LearningRate 0.0010   Epoch: 3   Global Step: 6830   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:32:30,824-Speed 9386.48 samples/sec   Loss 7.2383   LearningRate 0.0010   Epoch: 3   Global Step: 6840   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:32:56,925-Speed 9416.15 samples/sec   Loss 7.2708   LearningRate 0.0010   Epoch: 3   Global Step: 6850   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:33:23,068-Speed 9400.76 samples/sec   Loss 7.2365   LearningRate 0.0010   Epoch: 3   Global Step: 6860   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:33:49,227-Speed 9395.56 samples/sec   Loss 7.2244   LearningRate 0.0010   Epoch: 3   Global Step: 6870   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:34:15,409-Speed 9387.13 samples/sec   Loss 7.1772   LearningRate 0.0010   Epoch: 3   Global Step: 6880   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:34:41,675-Speed 9356.59 samples/sec   Loss 7.1295   LearningRate 0.0010   Epoch: 3   Global Step: 6890   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:35:07,766-Speed 9419.85 samples/sec   Loss 7.1183   LearningRate 0.0010   Epoch: 3   Global Step: 6900   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:35:33,866-Speed 9416.53 samples/sec   Loss 7.1383   LearningRate 0.0010   Epoch: 3   Global Step: 6910   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:36:52,848-Speed 3111.63 samples/sec   Loss 7.0619   LearningRate 0.0010   Epoch: 4   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:37:18,741-Speed 9492.07 samples/sec   Loss 7.0603   LearningRate 0.0010   Epoch: 4   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:37:44,895-Speed 9397.09 samples/sec   Loss 7.0336   LearningRate 0.0010   Epoch: 4   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:38:10,994-Speed 9416.90 samples/sec   Loss 6.9824   LearningRate 0.0010   Epoch: 4   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:38:37,156-Speed 9394.06 samples/sec   Loss 7.0056   LearningRate 0.0010   Epoch: 4   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:39:03,238-Speed 9422.94 samples/sec   Loss 7.0229   LearningRate 0.0010   Epoch: 4   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:39:29,343-Speed 9414.57 samples/sec   Loss 6.9540   LearningRate 0.0010   Epoch: 4   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:39:55,449-Speed 9414.21 samples/sec   Loss 6.9536   LearningRate 0.0010   Epoch: 4   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:40:21,587-Speed 9402.84 samples/sec   Loss 6.9495   LearningRate 0.0010   Epoch: 4   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:40:47,690-Speed 9415.58 samples/sec   Loss 6.9251   LearningRate 0.0010   Epoch: 4   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:41:13,803-Speed 9411.67 samples/sec   Loss 6.9118   LearningRate 0.0010   Epoch: 4   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:41:39,835-Speed 9441.38 samples/sec   Loss 6.9003   LearningRate 0.0010   Epoch: 4   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:42:05,951-Speed 9410.58 samples/sec   Loss 6.9278   LearningRate 0.0010   Epoch: 4   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:42:32,085-Speed 9404.15 samples/sec   Loss 6.9165   LearningRate 0.0010   Epoch: 4   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:42:58,209-Speed 9408.01 samples/sec   Loss 6.8950   LearningRate 0.0010   Epoch: 4   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:43:24,389-Speed 9387.85 samples/sec   Loss 6.8704   LearningRate 0.0010   Epoch: 4   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:43:50,509-Speed 9409.41 samples/sec   Loss 6.9546   LearningRate 0.0010   Epoch: 4   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:44:16,703-Speed 9382.50 samples/sec   Loss 6.8495   LearningRate 0.0010   Epoch: 4   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:44:42,769-Speed 9428.79 samples/sec   Loss 6.9050   LearningRate 0.0010   Epoch: 4   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:45:08,829-Speed 9431.00 samples/sec   Loss 6.8117   LearningRate 0.0010   Epoch: 4   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:45:34,919-Speed 9420.31 samples/sec   Loss 6.8458   LearningRate 0.0010   Epoch: 4   Global Step: 7120   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:46:01,039-Speed 9409.77 samples/sec   Loss 6.8099   LearningRate 0.0010   Epoch: 4   Global Step: 7130   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:46:27,107-Speed 9428.12 samples/sec   Loss 6.7774   LearningRate 0.0010   Epoch: 4   Global Step: 7140   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:46:53,268-Speed 9394.65 samples/sec   Loss 6.8436   LearningRate 0.0010   Epoch: 4   Global Step: 7150   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:47:19,355-Speed 9421.13 samples/sec   Loss 6.7902   LearningRate 0.0010   Epoch: 4   Global Step: 7160   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:47:45,531-Speed 9389.28 samples/sec   Loss 6.7476   LearningRate 0.0010   Epoch: 4   Global Step: 7170   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:48:11,610-Speed 9424.25 samples/sec   Loss 6.7390   LearningRate 0.0010   Epoch: 4   Global Step: 7180   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:48:37,790-Speed 9387.54 samples/sec   Loss 6.7377   LearningRate 0.0010   Epoch: 4   Global Step: 7190   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:49:03,905-Speed 9410.99 samples/sec   Loss 6.7601   LearningRate 0.0010   Epoch: 4   Global Step: 7200   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:49:30,108-Speed 9379.65 samples/sec   Loss 6.6831   LearningRate 0.0010   Epoch: 4   Global Step: 7210   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:49:56,305-Speed 9381.43 samples/sec   Loss 6.6802   LearningRate 0.0010   Epoch: 4   Global Step: 7220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:50:22,432-Speed 9406.98 samples/sec   Loss 6.7104   LearningRate 0.0010   Epoch: 4   Global Step: 7230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:50:48,636-Speed 9379.20 samples/sec   Loss 6.6893   LearningRate 0.0010   Epoch: 4   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-03-05 01:51:14,743-Speed 9413.98 samples/sec   Loss 6.6512   LearningRate 0.0010   Epoch: 4   Global Step: 7250   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:51:41,019-Speed 9353.30 samples/sec   Loss 6.6788   LearningRate 0.0010   Epoch: 4   Global Step: 7260   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:52:07,204-Speed 9386.12 samples/sec   Loss 6.6842   LearningRate 0.0010   Epoch: 4   Global Step: 7270   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:52:33,407-Speed 9379.27 samples/sec   Loss 6.6308   LearningRate 0.0010   Epoch: 4   Global Step: 7280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-03-05 01:52:59,555-Speed 9399.59 samples/sec   Loss 6.5917   LearningRate 0.0010   Epoch: 4   Global Step: 7290   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 01:53:25,684-Speed 9405.98 samples/sec   Loss 6.6053   LearningRate 0.0010   Epoch: 4   Global Step: 7300   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 01:53:51,885-Speed 9380.13 samples/sec   Loss 6.6214   LearningRate 0.0010   Epoch: 4   Global Step: 7310   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 01:54:17,989-Speed 9415.44 samples/sec   Loss 6.6041   LearningRate 0.0010   Epoch: 4   Global Step: 7320   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 01:54:44,245-Speed 9360.50 samples/sec   Loss 6.5264   LearningRate 0.0010   Epoch: 4   Global Step: 7330   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 01:55:10,383-Speed 9402.90 samples/sec   Loss 6.5980   LearningRate 0.0010   Epoch: 4   Global Step: 7340   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 01:55:36,569-Speed 9385.66 samples/sec   Loss 6.5166   LearningRate 0.0010   Epoch: 4   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:56:02,792-Speed 9372.31 samples/sec   Loss 6.5335   LearningRate 0.0010   Epoch: 4   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:56:28,859-Speed 9428.48 samples/sec   Loss 6.6218   LearningRate 0.0010   Epoch: 4   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:56:55,015-Speed 9396.36 samples/sec   Loss 6.5264   LearningRate 0.0010   Epoch: 4   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:57:21,185-Speed 9391.45 samples/sec   Loss 6.5322   LearningRate 0.0010   Epoch: 4   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:57:47,330-Speed 9400.44 samples/sec   Loss 6.5185   LearningRate 0.0010   Epoch: 4   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:58:13,530-Speed 9380.74 samples/sec   Loss 6.4990   LearningRate 0.0010   Epoch: 4   Global Step: 7410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:58:39,639-Speed 9413.41 samples/sec   Loss 6.4377   LearningRate 0.0010   Epoch: 4   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:59:05,813-Speed 9390.04 samples/sec   Loss 6.4374   LearningRate 0.0010   Epoch: 4   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:59:31,932-Speed 9409.41 samples/sec   Loss 6.4792   LearningRate 0.0010   Epoch: 4   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 01:59:58,000-Speed 9428.23 samples/sec   Loss 6.4280   LearningRate 0.0010   Epoch: 4   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:00:24,195-Speed 9382.48 samples/sec   Loss 6.4579   LearningRate 0.0010   Epoch: 4   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:00:50,341-Speed 9399.96 samples/sec   Loss 6.4510   LearningRate 0.0010   Epoch: 4   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:01:16,469-Speed 9406.33 samples/sec   Loss 6.4101   LearningRate 0.0010   Epoch: 4   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:01:42,657-Speed 9384.80 samples/sec   Loss 6.3892   LearningRate 0.0010   Epoch: 4   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:02:08,840-Speed 9386.51 samples/sec   Loss 6.4289   LearningRate 0.0010   Epoch: 4   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:02:35,029-Speed 9384.77 samples/sec   Loss 6.4055   LearningRate 0.0010   Epoch: 4   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:03:01,196-Speed 9392.59 samples/sec   Loss 6.3372   LearningRate 0.0010   Epoch: 4   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:03:27,409-Speed 9375.79 samples/sec   Loss 6.3962   LearningRate 0.0010   Epoch: 4   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:03:53,688-Speed 9352.18 samples/sec   Loss 6.2861   LearningRate 0.0010   Epoch: 4   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:04:19,944-Speed 9360.59 samples/sec   Loss 6.3183   LearningRate 0.0010   Epoch: 4   Global Step: 7550   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-03-05 02:04:46,096-Speed 9398.42 samples/sec   Loss 6.3467   LearningRate 0.0010   Epoch: 4   Global Step: 7560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:05:12,326-Speed 9369.80 samples/sec   Loss 6.2746   LearningRate 0.0010   Epoch: 4   Global Step: 7570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:05:38,491-Speed 9393.13 samples/sec   Loss 6.2870   LearningRate 0.0010   Epoch: 4   Global Step: 7580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:06:04,677-Speed 9385.44 samples/sec   Loss 6.2968   LearningRate 0.0010   Epoch: 4   Global Step: 7590   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:06:30,794-Speed 9411.28 samples/sec   Loss 6.2425   LearningRate 0.0010   Epoch: 4   Global Step: 7600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:06:56,894-Speed 9416.40 samples/sec   Loss 6.2853   LearningRate 0.0010   Epoch: 4   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:07:23,059-Speed 9393.15 samples/sec   Loss 6.2480   LearningRate 0.0010   Epoch: 4   Global Step: 7620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:07:49,218-Speed 9395.17 samples/sec   Loss 6.2440   LearningRate 0.0010   Epoch: 4   Global Step: 7630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:08:15,444-Speed 9370.93 samples/sec   Loss 6.2055   LearningRate 0.0010   Epoch: 4   Global Step: 7640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:08:41,602-Speed 9395.98 samples/sec   Loss 6.2159   LearningRate 0.0010   Epoch: 4   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:09:07,764-Speed 9394.22 samples/sec   Loss 6.2180   LearningRate 0.0010   Epoch: 4   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:09:33,907-Speed 9400.91 samples/sec   Loss 6.2008   LearningRate 0.0010   Epoch: 4   Global Step: 7670   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:10:00,060-Speed 9397.16 samples/sec   Loss 6.2154   LearningRate 0.0010   Epoch: 4   Global Step: 7680   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:10:26,334-Speed 9354.29 samples/sec   Loss 6.1497   LearningRate 0.0010   Epoch: 4   Global Step: 7690   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:10:52,582-Speed 9363.35 samples/sec   Loss 6.1427   LearningRate 0.0010   Epoch: 4   Global Step: 7700   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:11:18,725-Speed 9400.73 samples/sec   Loss 6.1468   LearningRate 0.0010   Epoch: 4   Global Step: 7710   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:11:44,955-Speed 9370.05 samples/sec   Loss 6.1768   LearningRate 0.0010   Epoch: 4   Global Step: 7720   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:12:11,248-Speed 9347.21 samples/sec   Loss 6.1342   LearningRate 0.0010   Epoch: 4   Global Step: 7730   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:12:37,457-Speed 9377.39 samples/sec   Loss 6.1000   LearningRate 0.0010   Epoch: 4   Global Step: 7740   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:13:03,662-Speed 9378.98 samples/sec   Loss 6.1616   LearningRate 0.0010   Epoch: 4   Global Step: 7750   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:13:29,878-Speed 9374.72 samples/sec   Loss 6.1043   LearningRate 0.0010   Epoch: 4   Global Step: 7760   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:13:56,195-Speed 9338.68 samples/sec   Loss 6.1267   LearningRate 0.0010   Epoch: 4   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:14:22,405-Speed 9378.08 samples/sec   Loss 6.0739   LearningRate 0.0010   Epoch: 4   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:14:48,657-Speed 9362.07 samples/sec   Loss 6.0513   LearningRate 0.0010   Epoch: 4   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:15:14,898-Speed 9365.87 samples/sec   Loss 6.0824   LearningRate 0.0010   Epoch: 4   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:15:41,046-Speed 9399.26 samples/sec   Loss 6.0179   LearningRate 0.0010   Epoch: 4   Global Step: 7810   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:16:07,230-Speed 9386.01 samples/sec   Loss 6.0486   LearningRate 0.0010   Epoch: 4   Global Step: 7820   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:16:33,421-Speed 9384.42 samples/sec   Loss 6.0717   LearningRate 0.0010   Epoch: 4   Global Step: 7830   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:16:59,634-Speed 9375.68 samples/sec   Loss 5.9894   LearningRate 0.0010   Epoch: 4   Global Step: 7840   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:17:25,819-Speed 9386.23 samples/sec   Loss 6.0351   LearningRate 0.0010   Epoch: 4   Global Step: 7850   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:17:51,959-Speed 9401.85 samples/sec   Loss 6.0337   LearningRate 0.0010   Epoch: 4   Global Step: 7860   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:18:18,111-Speed 9397.87 samples/sec   Loss 6.0213   LearningRate 0.0010   Epoch: 4   Global Step: 7870   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:18:44,307-Speed 9381.79 samples/sec   Loss 6.0081   LearningRate 0.0010   Epoch: 4   Global Step: 7880   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:19:10,583-Speed 9353.63 samples/sec   Loss 5.9360   LearningRate 0.0010   Epoch: 4   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:19:36,851-Speed 9356.23 samples/sec   Loss 5.9147   LearningRate 0.0010   Epoch: 4   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:20:03,118-Speed 9356.32 samples/sec   Loss 5.9679   LearningRate 0.0010   Epoch: 4   Global Step: 7910   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:20:29,357-Speed 9366.94 samples/sec   Loss 5.9519   LearningRate 0.0010   Epoch: 4   Global Step: 7920   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:20:55,571-Speed 9375.19 samples/sec   Loss 5.9136   LearningRate 0.0010   Epoch: 4   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:21:21,798-Speed 9370.89 samples/sec   Loss 5.9230   LearningRate 0.0010   Epoch: 4   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:21:48,030-Speed 9368.94 samples/sec   Loss 5.9383   LearningRate 0.0010   Epoch: 4   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:22:14,154-Speed 9407.87 samples/sec   Loss 5.9007   LearningRate 0.0010   Epoch: 4   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:22:40,296-Speed 9401.39 samples/sec   Loss 5.9440   LearningRate 0.0010   Epoch: 4   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:23:06,380-Speed 9422.37 samples/sec   Loss 5.9172   LearningRate 0.0010   Epoch: 4   Global Step: 7980   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:23:32,566-Speed 9385.65 samples/sec   Loss 5.8890   LearningRate 0.0010   Epoch: 4   Global Step: 7990   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:23:58,735-Speed 9391.74 samples/sec   Loss 5.8600   LearningRate 0.0010   Epoch: 4   Global Step: 8000   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:24:24,970-Speed 9368.04 samples/sec   Loss 5.9298   LearningRate 0.0010   Epoch: 4   Global Step: 8010   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:24:51,133-Speed 9394.04 samples/sec   Loss 5.8494   LearningRate 0.0010   Epoch: 4   Global Step: 8020   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:25:17,163-Speed 9441.62 samples/sec   Loss 5.8181   LearningRate 0.0010   Epoch: 4   Global Step: 8030   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:25:43,311-Speed 9399.44 samples/sec   Loss 5.8442   LearningRate 0.0010   Epoch: 4   Global Step: 8040   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:26:09,572-Speed 9358.70 samples/sec   Loss 5.8364   LearningRate 0.0010   Epoch: 4   Global Step: 8050   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:26:35,695-Speed 9408.34 samples/sec   Loss 5.8310   LearningRate 0.0010   Epoch: 4   Global Step: 8060   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:27:01,873-Speed 9388.32 samples/sec   Loss 5.8031   LearningRate 0.0010   Epoch: 4   Global Step: 8070   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:27:28,035-Speed 9394.22 samples/sec   Loss 5.7445   LearningRate 0.0010   Epoch: 4   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:27:54,189-Speed 9397.08 samples/sec   Loss 5.7991   LearningRate 0.0010   Epoch: 4   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:28:20,484-Speed 9346.84 samples/sec   Loss 5.7472   LearningRate 0.0010   Epoch: 4   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:28:46,556-Speed 9426.76 samples/sec   Loss 5.7216   LearningRate 0.0010   Epoch: 4   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:29:12,661-Speed 9414.71 samples/sec   Loss 5.7794   LearningRate 0.0010   Epoch: 4   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:29:38,846-Speed 9385.89 samples/sec   Loss 5.7425   LearningRate 0.0010   Epoch: 4   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:30:05,094-Speed 9363.27 samples/sec   Loss 5.7537   LearningRate 0.0010   Epoch: 4   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:30:31,195-Speed 9416.01 samples/sec   Loss 5.7553   LearningRate 0.0010   Epoch: 4   Global Step: 8150   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:30:57,355-Speed 9394.80 samples/sec   Loss 5.7546   LearningRate 0.0010   Epoch: 4   Global Step: 8160   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:31:23,629-Speed 9354.41 samples/sec   Loss 5.7129   LearningRate 0.0010   Epoch: 4   Global Step: 8170   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:31:49,880-Speed 9362.12 samples/sec   Loss 5.6976   LearningRate 0.0010   Epoch: 4   Global Step: 8180   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:32:16,104-Speed 9371.99 samples/sec   Loss 5.7367   LearningRate 0.0010   Epoch: 4   Global Step: 8190   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:32:42,274-Speed 9391.35 samples/sec   Loss 5.7191   LearningRate 0.0010   Epoch: 4   Global Step: 8200   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:33:08,462-Speed 9384.64 samples/sec   Loss 5.6783   LearningRate 0.0010   Epoch: 4   Global Step: 8210   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:33:34,638-Speed 9389.07 samples/sec   Loss 5.6504   LearningRate 0.0010   Epoch: 4   Global Step: 8220   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:34:00,762-Speed 9407.69 samples/sec   Loss 5.6473   LearningRate 0.0010   Epoch: 4   Global Step: 8230   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:34:26,929-Speed 9392.43 samples/sec   Loss 5.6755   LearningRate 0.0010   Epoch: 4   Global Step: 8240   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:34:53,050-Speed 9408.92 samples/sec   Loss 5.6577   LearningRate 0.0010   Epoch: 4   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:35:19,195-Speed 9400.33 samples/sec   Loss 5.6298   LearningRate 0.0010   Epoch: 4   Global Step: 8260   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:35:45,386-Speed 9383.92 samples/sec   Loss 5.6035   LearningRate 0.0010   Epoch: 4   Global Step: 8270   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:36:11,626-Speed 9366.19 samples/sec   Loss 5.5892   LearningRate 0.0010   Epoch: 4   Global Step: 8280   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:36:37,862-Speed 9367.83 samples/sec   Loss 5.6366   LearningRate 0.0010   Epoch: 4   Global Step: 8290   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:37:04,098-Speed 9367.63 samples/sec   Loss 5.6345   LearningRate 0.0010   Epoch: 4   Global Step: 8300   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:37:30,244-Speed 9399.94 samples/sec   Loss 5.5931   LearningRate 0.0010   Epoch: 4   Global Step: 8310   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:37:56,357-Speed 9411.76 samples/sec   Loss 5.5730   LearningRate 0.0010   Epoch: 4   Global Step: 8320   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:38:22,551-Speed 9382.77 samples/sec   Loss 5.5694   LearningRate 0.0010   Epoch: 4   Global Step: 8330   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:38:48,686-Speed 9404.09 samples/sec   Loss 5.6010   LearningRate 0.0010   Epoch: 4   Global Step: 8340   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:39:14,837-Speed 9398.07 samples/sec   Loss 5.5628   LearningRate 0.0010   Epoch: 4   Global Step: 8350   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:39:41,048-Speed 9376.74 samples/sec   Loss 5.5650   LearningRate 0.0010   Epoch: 4   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:40:07,234-Speed 9385.46 samples/sec   Loss 5.5559   LearningRate 0.0010   Epoch: 4   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:40:33,412-Speed 9388.41 samples/sec   Loss 5.5436   LearningRate 0.0010   Epoch: 4   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:40:59,594-Speed 9387.43 samples/sec   Loss 5.5451   LearningRate 0.0010   Epoch: 4   Global Step: 8390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:41:25,806-Speed 9376.45 samples/sec   Loss 5.5371   LearningRate 0.0010   Epoch: 4   Global Step: 8400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:41:52,024-Speed 9374.17 samples/sec   Loss 5.5727   LearningRate 0.0010   Epoch: 4   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:42:18,241-Speed 9374.26 samples/sec   Loss 5.5139   LearningRate 0.0010   Epoch: 4   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:42:44,329-Speed 9421.19 samples/sec   Loss 5.5018   LearningRate 0.0010   Epoch: 4   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:43:10,559-Speed 9369.42 samples/sec   Loss 5.6199   LearningRate 0.0010   Epoch: 4   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:43:36,720-Speed 9394.53 samples/sec   Loss 5.5683   LearningRate 0.0010   Epoch: 4   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:44:02,917-Speed 9381.77 samples/sec   Loss 5.4605   LearningRate 0.0010   Epoch: 4   Global Step: 8460   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-03-05 02:44:29,079-Speed 9394.07 samples/sec   Loss 5.4323   LearningRate 0.0010   Epoch: 4   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:44:55,311-Speed 9369.19 samples/sec   Loss 5.4541   LearningRate 0.0010   Epoch: 4   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:45:21,530-Speed 9373.71 samples/sec   Loss 5.4300   LearningRate 0.0009   Epoch: 4   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-03-05 02:45:47,789-Speed 9359.62 samples/sec   Loss 5.4392   LearningRate 0.0009   Epoch: 4   Global Step: 8500   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:46:14,064-Speed 9354.02 samples/sec   Loss 5.4669   LearningRate 0.0009   Epoch: 4   Global Step: 8510   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:46:40,433-Speed 9320.36 samples/sec   Loss 5.4333   LearningRate 0.0009   Epoch: 4   Global Step: 8520   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:47:06,746-Speed 9340.55 samples/sec   Loss 5.4289   LearningRate 0.0009   Epoch: 4   Global Step: 8530   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:47:33,044-Speed 9345.50 samples/sec   Loss 5.4546   LearningRate 0.0009   Epoch: 4   Global Step: 8540   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:47:59,371-Speed 9335.17 samples/sec   Loss 5.4097   LearningRate 0.0009   Epoch: 4   Global Step: 8550   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:48:25,637-Speed 9357.32 samples/sec   Loss 5.4641   LearningRate 0.0009   Epoch: 4   Global Step: 8560   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:48:51,894-Speed 9360.01 samples/sec   Loss 5.4730   LearningRate 0.0009   Epoch: 4   Global Step: 8570   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:49:18,209-Speed 9339.45 samples/sec   Loss 5.4073   LearningRate 0.0009   Epoch: 4   Global Step: 8580   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:49:44,536-Speed 9335.44 samples/sec   Loss 5.4151   LearningRate 0.0009   Epoch: 4   Global Step: 8590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:50:10,857-Speed 9337.33 samples/sec   Loss 5.4005   LearningRate 0.0009   Epoch: 4   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:50:37,139-Speed 9351.68 samples/sec   Loss 5.4643   LearningRate 0.0009   Epoch: 4   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:51:03,370-Speed 9369.43 samples/sec   Loss 5.4210   LearningRate 0.0009   Epoch: 4   Global Step: 8620   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:51:29,610-Speed 9366.24 samples/sec   Loss 5.4087   LearningRate 0.0009   Epoch: 4   Global Step: 8630   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:51:55,904-Speed 9346.93 samples/sec   Loss 5.3932   LearningRate 0.0009   Epoch: 4   Global Step: 8640   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:53:13,566-Speed 3164.54 samples/sec   Loss 5.2916   LearningRate 0.0009   Epoch: 5   Global Step: 8650   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:53:39,633-Speed 9428.54 samples/sec   Loss 5.2671   LearningRate 0.0009   Epoch: 5   Global Step: 8660   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:54:05,782-Speed 9399.06 samples/sec   Loss 5.2597   LearningRate 0.0009   Epoch: 5   Global Step: 8670   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-03-05 02:54:31,866-Speed 9422.43 samples/sec   Loss 5.3309   LearningRate 0.0009   Epoch: 5   Global Step: 8680   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:54:57,967-Speed 9416.12 samples/sec   Loss 5.3113   LearningRate 0.0009   Epoch: 5   Global Step: 8690   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:55:24,074-Speed 9413.96 samples/sec   Loss 5.2650   LearningRate 0.0009   Epoch: 5   Global Step: 8700   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:55:50,281-Speed 9378.23 samples/sec   Loss 5.2515   LearningRate 0.0009   Epoch: 5   Global Step: 8710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 02:56:16,479-Speed 9381.46 samples/sec   Loss 5.2377   LearningRate 0.0009   Epoch: 5   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:56:42,713-Speed 9368.30 samples/sec   Loss 5.2604   LearningRate 0.0009   Epoch: 5   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:57:08,834-Speed 9408.86 samples/sec   Loss 5.2486   LearningRate 0.0009   Epoch: 5   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:57:35,064-Speed 9369.79 samples/sec   Loss 5.2462   LearningRate 0.0009   Epoch: 5   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:58:01,350-Speed 9349.76 samples/sec   Loss 5.2310   LearningRate 0.0009   Epoch: 5   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:58:27,499-Speed 9399.09 samples/sec   Loss 5.2179   LearningRate 0.0009   Epoch: 5   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:58:53,687-Speed 9384.99 samples/sec   Loss 5.2830   LearningRate 0.0009   Epoch: 5   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:59:19,866-Speed 9388.00 samples/sec   Loss 5.2495   LearningRate 0.0009   Epoch: 5   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 02:59:46,079-Speed 9375.85 samples/sec   Loss 5.2192   LearningRate 0.0009   Epoch: 5   Global Step: 8800   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:00:12,277-Speed 9381.27 samples/sec   Loss 5.2143   LearningRate 0.0009   Epoch: 5   Global Step: 8810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:00:38,483-Speed 9378.16 samples/sec   Loss 5.2432   LearningRate 0.0009   Epoch: 5   Global Step: 8820   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-03-05 03:01:04,719-Speed 9367.72 samples/sec   Loss 5.2120   LearningRate 0.0009   Epoch: 5   Global Step: 8830   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-03-05 03:01:30,893-Speed 9389.98 samples/sec   Loss 5.1857   LearningRate 0.0009   Epoch: 5   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:01:57,095-Speed 9379.80 samples/sec   Loss 5.2238   LearningRate 0.0009   Epoch: 5   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:02:23,285-Speed 9384.82 samples/sec   Loss 5.1910   LearningRate 0.0009   Epoch: 5   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:02:49,545-Speed 9358.96 samples/sec   Loss 5.2175   LearningRate 0.0009   Epoch: 5   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:03:15,783-Speed 9367.07 samples/sec   Loss 5.1841   LearningRate 0.0009   Epoch: 5   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:03:41,923-Speed 9402.20 samples/sec   Loss 5.1832   LearningRate 0.0009   Epoch: 5   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:04:08,068-Speed 9400.28 samples/sec   Loss 5.2126   LearningRate 0.0009   Epoch: 5   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:04:34,183-Speed 9411.30 samples/sec   Loss 5.1652   LearningRate 0.0009   Epoch: 5   Global Step: 8910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:05:00,372-Speed 9384.38 samples/sec   Loss 5.1651   LearningRate 0.0009   Epoch: 5   Global Step: 8920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:05:26,652-Speed 9352.00 samples/sec   Loss 5.1627   LearningRate 0.0009   Epoch: 5   Global Step: 8930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:05:52,935-Speed 9350.76 samples/sec   Loss 5.1642   LearningRate 0.0009   Epoch: 5   Global Step: 8940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:06:19,116-Speed 9387.80 samples/sec   Loss 5.2038   LearningRate 0.0009   Epoch: 5   Global Step: 8950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:06:45,304-Speed 9384.65 samples/sec   Loss 5.1607   LearningRate 0.0009   Epoch: 5   Global Step: 8960   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:07:11,470-Speed 9393.06 samples/sec   Loss 5.1064   LearningRate 0.0009   Epoch: 5   Global Step: 8970   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:07:37,663-Speed 9383.11 samples/sec   Loss 5.1087   LearningRate 0.0009   Epoch: 5   Global Step: 8980   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:08:03,928-Speed 9357.21 samples/sec   Loss 5.1159   LearningRate 0.0009   Epoch: 5   Global Step: 8990   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:08:30,077-Speed 9398.94 samples/sec   Loss 5.1300   LearningRate 0.0009   Epoch: 5   Global Step: 9000   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:08:56,263-Speed 9385.43 samples/sec   Loss 5.0988   LearningRate 0.0009   Epoch: 5   Global Step: 9010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:09:22,481-Speed 9374.28 samples/sec   Loss 5.1076   LearningRate 0.0009   Epoch: 5   Global Step: 9020   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:09:48,725-Speed 9364.85 samples/sec   Loss 5.1376   LearningRate 0.0009   Epoch: 5   Global Step: 9030   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:10:14,920-Speed 9382.63 samples/sec   Loss 5.0553   LearningRate 0.0009   Epoch: 5   Global Step: 9040   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:10:41,072-Speed 9398.12 samples/sec   Loss 5.0832   LearningRate 0.0009   Epoch: 5   Global Step: 9050   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:11:07,246-Speed 9389.66 samples/sec   Loss 5.0769   LearningRate 0.0009   Epoch: 5   Global Step: 9060   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:11:33,457-Speed 9377.06 samples/sec   Loss 5.1012   LearningRate 0.0009   Epoch: 5   Global Step: 9070   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:11:59,602-Speed 9399.97 samples/sec   Loss 5.0698   LearningRate 0.0009   Epoch: 5   Global Step: 9080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:12:25,858-Speed 9361.46 samples/sec   Loss 5.0479   LearningRate 0.0009   Epoch: 5   Global Step: 9090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:12:52,167-Speed 9341.79 samples/sec   Loss 5.0563   LearningRate 0.0009   Epoch: 5   Global Step: 9100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:13:18,399-Speed 9369.13 samples/sec   Loss 5.0402   LearningRate 0.0009   Epoch: 5   Global Step: 9110   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:13:44,706-Speed 9342.29 samples/sec   Loss 5.0749   LearningRate 0.0009   Epoch: 5   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:14:10,869-Speed 9393.96 samples/sec   Loss 5.0536   LearningRate 0.0009   Epoch: 5   Global Step: 9130   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:14:37,092-Speed 9372.12 samples/sec   Loss 5.0214   LearningRate 0.0009   Epoch: 5   Global Step: 9140   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:15:03,289-Speed 9382.01 samples/sec   Loss 5.0128   LearningRate 0.0009   Epoch: 5   Global Step: 9150   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:15:29,588-Speed 9345.09 samples/sec   Loss 5.0243   LearningRate 0.0009   Epoch: 5   Global Step: 9160   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:15:55,757-Speed 9392.49 samples/sec   Loss 4.9962   LearningRate 0.0009   Epoch: 5   Global Step: 9170   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:16:21,970-Speed 9376.11 samples/sec   Loss 4.9908   LearningRate 0.0009   Epoch: 5   Global Step: 9180   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:16:48,189-Speed 9373.63 samples/sec   Loss 5.0602   LearningRate 0.0009   Epoch: 5   Global Step: 9190   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:17:14,466-Speed 9352.87 samples/sec   Loss 5.0041   LearningRate 0.0009   Epoch: 5   Global Step: 9200   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:17:40,729-Speed 9358.29 samples/sec   Loss 4.9948   LearningRate 0.0009   Epoch: 5   Global Step: 9210   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:18:06,888-Speed 9395.17 samples/sec   Loss 5.0107   LearningRate 0.0009   Epoch: 5   Global Step: 9220   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:18:33,148-Speed 9359.27 samples/sec   Loss 4.9803   LearningRate 0.0009   Epoch: 5   Global Step: 9230   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:18:59,432-Speed 9350.24 samples/sec   Loss 4.9508   LearningRate 0.0009   Epoch: 5   Global Step: 9240   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:19:25,707-Speed 9353.78 samples/sec   Loss 5.0151   LearningRate 0.0009   Epoch: 5   Global Step: 9250   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:19:51,996-Speed 9348.88 samples/sec   Loss 4.9666   LearningRate 0.0009   Epoch: 5   Global Step: 9260   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:20:18,203-Speed 9378.25 samples/sec   Loss 4.9567   LearningRate 0.0009   Epoch: 5   Global Step: 9270   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:20:44,298-Speed 9418.22 samples/sec   Loss 4.9636   LearningRate 0.0009   Epoch: 5   Global Step: 9280   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:21:10,458-Speed 9394.77 samples/sec   Loss 4.9957   LearningRate 0.0009   Epoch: 5   Global Step: 9290   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:21:36,631-Speed 9390.45 samples/sec   Loss 4.9159   LearningRate 0.0009   Epoch: 5   Global Step: 9300   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:22:02,857-Speed 9371.27 samples/sec   Loss 4.9514   LearningRate 0.0009   Epoch: 5   Global Step: 9310   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:22:29,123-Speed 9356.85 samples/sec   Loss 4.9378   LearningRate 0.0009   Epoch: 5   Global Step: 9320   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:22:55,334-Speed 9376.50 samples/sec   Loss 4.9474   LearningRate 0.0009   Epoch: 5   Global Step: 9330   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:23:21,616-Speed 9351.30 samples/sec   Loss 4.9160   LearningRate 0.0009   Epoch: 5   Global Step: 9340   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:23:47,872-Speed 9360.72 samples/sec   Loss 4.9088   LearningRate 0.0009   Epoch: 5   Global Step: 9350   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:24:14,104-Speed 9369.12 samples/sec   Loss 4.9144   LearningRate 0.0009   Epoch: 5   Global Step: 9360   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:24:40,337-Speed 9368.82 samples/sec   Loss 4.9214   LearningRate 0.0009   Epoch: 5   Global Step: 9370   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:25:06,623-Speed 9349.72 samples/sec   Loss 4.9139   LearningRate 0.0009   Epoch: 5   Global Step: 9380   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:25:32,825-Speed 9380.07 samples/sec   Loss 4.8911   LearningRate 0.0009   Epoch: 5   Global Step: 9390   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-03-05 03:25:59,061-Speed 9367.63 samples/sec   Loss 4.9088   LearningRate 0.0009   Epoch: 5   Global Step: 9400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:26:25,256-Speed 9382.47 samples/sec   Loss 4.8959   LearningRate 0.0009   Epoch: 5   Global Step: 9410   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:26:51,497-Speed 9365.75 samples/sec   Loss 4.8635   LearningRate 0.0009   Epoch: 5   Global Step: 9420   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:27:17,845-Speed 9327.87 samples/sec   Loss 4.8801   LearningRate 0.0009   Epoch: 5   Global Step: 9430   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:27:44,266-Speed 9301.97 samples/sec   Loss 4.8663   LearningRate 0.0009   Epoch: 5   Global Step: 9440   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:28:10,560-Speed 9347.23 samples/sec   Loss 4.8314   LearningRate 0.0009   Epoch: 5   Global Step: 9450   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:28:36,819-Speed 9359.46 samples/sec   Loss 4.9014   LearningRate 0.0009   Epoch: 5   Global Step: 9460   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:29:03,082-Speed 9358.08 samples/sec   Loss 4.8811   LearningRate 0.0009   Epoch: 5   Global Step: 9470   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:29:29,398-Speed 9339.01 samples/sec   Loss 4.8639   LearningRate 0.0009   Epoch: 5   Global Step: 9480   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:29:55,631-Speed 9368.90 samples/sec   Loss 4.8369   LearningRate 0.0009   Epoch: 5   Global Step: 9490   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:30:21,942-Speed 9340.71 samples/sec   Loss 4.8345   LearningRate 0.0009   Epoch: 5   Global Step: 9500   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:30:48,312-Speed 9320.18 samples/sec   Loss 4.7863   LearningRate 0.0009   Epoch: 5   Global Step: 9510   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:31:14,573-Speed 9359.67 samples/sec   Loss 4.8258   LearningRate 0.0009   Epoch: 5   Global Step: 9520   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:31:40,787-Speed 9375.55 samples/sec   Loss 4.8147   LearningRate 0.0009   Epoch: 5   Global Step: 9530   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:32:07,170-Speed 9315.34 samples/sec   Loss 4.8319   LearningRate 0.0009   Epoch: 5   Global Step: 9540   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:32:33,619-Speed 9292.33 samples/sec   Loss 4.8049   LearningRate 0.0009   Epoch: 5   Global Step: 9550   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:32:59,891-Speed 9354.80 samples/sec   Loss 4.8439   LearningRate 0.0009   Epoch: 5   Global Step: 9560   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:33:26,323-Speed 9298.20 samples/sec   Loss 4.8462   LearningRate 0.0009   Epoch: 5   Global Step: 9570   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:33:52,611-Speed 9349.25 samples/sec   Loss 4.8052   LearningRate 0.0009   Epoch: 5   Global Step: 9580   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:34:18,875-Speed 9357.76 samples/sec   Loss 4.7745   LearningRate 0.0009   Epoch: 5   Global Step: 9590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:34:45,327-Speed 9291.31 samples/sec   Loss 4.7804   LearningRate 0.0009   Epoch: 5   Global Step: 9600   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:35:11,625-Speed 9345.26 samples/sec   Loss 4.7872   LearningRate 0.0009   Epoch: 5   Global Step: 9610   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:35:37,995-Speed 9320.13 samples/sec   Loss 4.7670   LearningRate 0.0009   Epoch: 5   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:36:04,347-Speed 9326.48 samples/sec   Loss 4.7372   LearningRate 0.0009   Epoch: 5   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:36:30,648-Speed 9344.67 samples/sec   Loss 4.7561   LearningRate 0.0009   Epoch: 5   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:36:56,892-Speed 9365.32 samples/sec   Loss 4.7481   LearningRate 0.0009   Epoch: 5   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:37:23,020-Speed 9406.20 samples/sec   Loss 4.7692   LearningRate 0.0009   Epoch: 5   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:37:49,133-Speed 9412.18 samples/sec   Loss 4.7680   LearningRate 0.0009   Epoch: 5   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:38:15,268-Speed 9403.99 samples/sec   Loss 4.7777   LearningRate 0.0009   Epoch: 5   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:38:41,621-Speed 9326.03 samples/sec   Loss 4.7637   LearningRate 0.0009   Epoch: 5   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:39:07,888-Speed 9356.49 samples/sec   Loss 4.7093   LearningRate 0.0009   Epoch: 5   Global Step: 9700   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:39:34,173-Speed 9350.40 samples/sec   Loss 4.7524   LearningRate 0.0009   Epoch: 5   Global Step: 9710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:40:00,468-Speed 9346.73 samples/sec   Loss 4.7299   LearningRate 0.0009   Epoch: 5   Global Step: 9720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:40:26,914-Speed 9293.13 samples/sec   Loss 4.7362   LearningRate 0.0009   Epoch: 5   Global Step: 9730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:40:53,169-Speed 9360.87 samples/sec   Loss 4.7119   LearningRate 0.0009   Epoch: 5   Global Step: 9740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:41:19,422-Speed 9361.85 samples/sec   Loss 4.7108   LearningRate 0.0009   Epoch: 5   Global Step: 9750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:41:45,690-Speed 9356.29 samples/sec   Loss 4.6979   LearningRate 0.0009   Epoch: 5   Global Step: 9760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:42:11,957-Speed 9356.45 samples/sec   Loss 4.7021   LearningRate 0.0009   Epoch: 5   Global Step: 9770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:42:38,189-Speed 9369.98 samples/sec   Loss 4.6633   LearningRate 0.0009   Epoch: 5   Global Step: 9780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:43:04,612-Speed 9301.12 samples/sec   Loss 4.6534   LearningRate 0.0009   Epoch: 5   Global Step: 9790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:43:30,850-Speed 9367.22 samples/sec   Loss 4.6792   LearningRate 0.0009   Epoch: 5   Global Step: 9800   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:43:57,233-Speed 9315.11 samples/sec   Loss 4.7396   LearningRate 0.0009   Epoch: 5   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:44:23,523-Speed 9348.58 samples/sec   Loss 4.6864   LearningRate 0.0009   Epoch: 5   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:44:49,782-Speed 9359.63 samples/sec   Loss 4.6572   LearningRate 0.0009   Epoch: 5   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:45:16,235-Speed 9290.89 samples/sec   Loss 4.6723   LearningRate 0.0009   Epoch: 5   Global Step: 9840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:45:42,728-Speed 9276.92 samples/sec   Loss 4.6455   LearningRate 0.0009   Epoch: 5   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:46:09,157-Speed 9299.33 samples/sec   Loss 4.6541   LearningRate 0.0009   Epoch: 5   Global Step: 9860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-03-05 03:46:35,495-Speed 9331.36 samples/sec   Loss 4.6515   LearningRate 0.0009   Epoch: 5   Global Step: 9870   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:47:01,709-Speed 9375.46 samples/sec   Loss 4.6441   LearningRate 0.0009   Epoch: 5   Global Step: 9880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:47:27,919-Speed 9376.87 samples/sec   Loss 4.6261   LearningRate 0.0009   Epoch: 5   Global Step: 9890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:47:54,213-Speed 9347.12 samples/sec   Loss 4.6484   LearningRate 0.0009   Epoch: 5   Global Step: 9900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:48:20,489-Speed 9353.68 samples/sec   Loss 4.6058   LearningRate 0.0009   Epoch: 5   Global Step: 9910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:48:46,758-Speed 9355.74 samples/sec   Loss 4.6456   LearningRate 0.0009   Epoch: 5   Global Step: 9920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:49:13,106-Speed 9327.77 samples/sec   Loss 4.6315   LearningRate 0.0009   Epoch: 5   Global Step: 9930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:49:39,484-Speed 9317.02 samples/sec   Loss 4.6485   LearningRate 0.0009   Epoch: 5   Global Step: 9940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:50:05,749-Speed 9357.66 samples/sec   Loss 4.6476   LearningRate 0.0009   Epoch: 5   Global Step: 9950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:50:32,227-Speed 9281.80 samples/sec   Loss 4.6232   LearningRate 0.0009   Epoch: 5   Global Step: 9960   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-03-05 03:50:58,483-Speed 9360.46 samples/sec   Loss 4.6023   LearningRate 0.0009   Epoch: 5   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 03:51:24,824-Speed 9330.53 samples/sec   Loss 4.5982   LearningRate 0.0009   Epoch: 5   Global Step: 9980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 03:51:51,163-Speed 9330.95 samples/sec   Loss 4.6288   LearningRate 0.0009   Epoch: 5   Global Step: 9990   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 03:52:17,535-Speed 9319.48 samples/sec   Loss 4.6102   LearningRate 0.0009   Epoch: 5   Global Step: 10000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 03:52:43,888-Speed 9325.92 samples/sec   Loss 4.6053   LearningRate 0.0009   Epoch: 5   Global Step: 10010   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 03:53:10,301-Speed 9305.05 samples/sec   Loss 4.6078   LearningRate 0.0009   Epoch: 5   Global Step: 10020   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:53:36,766-Speed 9286.76 samples/sec   Loss 4.5559   LearningRate 0.0009   Epoch: 5   Global Step: 10030   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:54:03,209-Speed 9294.24 samples/sec   Loss 4.5569   LearningRate 0.0009   Epoch: 5   Global Step: 10040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:54:29,638-Speed 9299.37 samples/sec   Loss 4.6055   LearningRate 0.0009   Epoch: 5   Global Step: 10050   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:54:56,027-Speed 9313.27 samples/sec   Loss 4.5800   LearningRate 0.0009   Epoch: 5   Global Step: 10060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:55:22,453-Speed 9300.28 samples/sec   Loss 4.5639   LearningRate 0.0009   Epoch: 5   Global Step: 10070   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:55:48,824-Speed 9319.87 samples/sec   Loss 4.5525   LearningRate 0.0009   Epoch: 5   Global Step: 10080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:56:15,268-Speed 9293.86 samples/sec   Loss 4.5503   LearningRate 0.0009   Epoch: 5   Global Step: 10090   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:56:41,676-Speed 9307.01 samples/sec   Loss 4.5641   LearningRate 0.0009   Epoch: 5   Global Step: 10100   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:57:08,088-Speed 9305.17 samples/sec   Loss 4.5420   LearningRate 0.0009   Epoch: 5   Global Step: 10110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:57:34,512-Speed 9301.20 samples/sec   Loss 4.5329   LearningRate 0.0009   Epoch: 5   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 03:58:00,951-Speed 9295.60 samples/sec   Loss 4.5302   LearningRate 0.0009   Epoch: 5   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 03:58:27,245-Speed 9347.39 samples/sec   Loss 4.5423   LearningRate 0.0009   Epoch: 5   Global Step: 10140   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:58:53,748-Speed 9273.33 samples/sec   Loss 4.5726   LearningRate 0.0009   Epoch: 5   Global Step: 10150   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:59:20,079-Speed 9333.95 samples/sec   Loss 4.5319   LearningRate 0.0009   Epoch: 5   Global Step: 10160   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 03:59:46,521-Speed 9294.76 samples/sec   Loss 4.5321   LearningRate 0.0009   Epoch: 5   Global Step: 10170   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:00:12,884-Speed 9322.35 samples/sec   Loss 4.4972   LearningRate 0.0009   Epoch: 5   Global Step: 10180   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:00:39,322-Speed 9296.41 samples/sec   Loss 4.5242   LearningRate 0.0009   Epoch: 5   Global Step: 10190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:01:05,852-Speed 9263.64 samples/sec   Loss 4.5197   LearningRate 0.0009   Epoch: 5   Global Step: 10200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:01:32,407-Speed 9255.47 samples/sec   Loss 4.4897   LearningRate 0.0009   Epoch: 5   Global Step: 10210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:01:59,001-Speed 9241.58 samples/sec   Loss 4.5316   LearningRate 0.0009   Epoch: 5   Global Step: 10220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:02:25,422-Speed 9301.95 samples/sec   Loss 4.5071   LearningRate 0.0009   Epoch: 5   Global Step: 10230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:02:51,744-Speed 9336.92 samples/sec   Loss 4.4745   LearningRate 0.0009   Epoch: 5   Global Step: 10240   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:03:18,085-Speed 9330.44 samples/sec   Loss 4.4785   LearningRate 0.0009   Epoch: 5   Global Step: 10250   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:03:44,450-Speed 9321.75 samples/sec   Loss 4.4893   LearningRate 0.0009   Epoch: 5   Global Step: 10260   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:04:10,860-Speed 9305.91 samples/sec   Loss 4.5536   LearningRate 0.0009   Epoch: 5   Global Step: 10270   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:04:37,206-Speed 9328.64 samples/sec   Loss 4.4791   LearningRate 0.0009   Epoch: 5   Global Step: 10280   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:05:03,571-Speed 9322.77 samples/sec   Loss 4.5119   LearningRate 0.0009   Epoch: 5   Global Step: 10290   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:05:29,838-Speed 9356.54 samples/sec   Loss 4.5300   LearningRate 0.0009   Epoch: 5   Global Step: 10300   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:05:56,320-Speed 9280.67 samples/sec   Loss 4.4641   LearningRate 0.0009   Epoch: 5   Global Step: 10310   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:06:22,623-Speed 9343.68 samples/sec   Loss 4.4704   LearningRate 0.0009   Epoch: 5   Global Step: 10320   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:06:48,950-Speed 9335.38 samples/sec   Loss 4.4894   LearningRate 0.0009   Epoch: 5   Global Step: 10330   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:07:15,275-Speed 9336.12 samples/sec   Loss 4.5135   LearningRate 0.0009   Epoch: 5   Global Step: 10340   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:07:41,602-Speed 9335.51 samples/sec   Loss 4.4829   LearningRate 0.0009   Epoch: 5   Global Step: 10350   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:08:08,020-Speed 9302.94 samples/sec   Loss 4.5214   LearningRate 0.0009   Epoch: 5   Global Step: 10360   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:08:34,225-Speed 9378.78 samples/sec   Loss 4.5018   LearningRate 0.0009   Epoch: 5   Global Step: 10370   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:09:53,735-Speed 3090.97 samples/sec   Loss 4.3854   LearningRate 0.0009   Epoch: 6   Global Step: 10380   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:10:19,817-Speed 9423.07 samples/sec   Loss 4.3851   LearningRate 0.0009   Epoch: 6   Global Step: 10390   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:10:45,931-Speed 9411.58 samples/sec   Loss 4.3645   LearningRate 0.0009   Epoch: 6   Global Step: 10400   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:11:12,059-Speed 9406.57 samples/sec   Loss 4.3863   LearningRate 0.0009   Epoch: 6   Global Step: 10410   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:11:38,281-Speed 9372.82 samples/sec   Loss 4.3680   LearningRate 0.0009   Epoch: 6   Global Step: 10420   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:12:04,421-Speed 9401.79 samples/sec   Loss 4.3870   LearningRate 0.0009   Epoch: 6   Global Step: 10430   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:12:30,559-Speed 9402.78 samples/sec   Loss 4.3441   LearningRate 0.0009   Epoch: 6   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:12:56,733-Speed 9390.26 samples/sec   Loss 4.4055   LearningRate 0.0009   Epoch: 6   Global Step: 10450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:13:22,921-Speed 9384.75 samples/sec   Loss 4.3932   LearningRate 0.0009   Epoch: 6   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:13:49,183-Speed 9358.58 samples/sec   Loss 4.3772   LearningRate 0.0009   Epoch: 6   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:14:15,342-Speed 9395.15 samples/sec   Loss 4.3810   LearningRate 0.0009   Epoch: 6   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:14:41,521-Speed 9388.05 samples/sec   Loss 4.3855   LearningRate 0.0009   Epoch: 6   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:15:07,639-Speed 9410.02 samples/sec   Loss 4.3734   LearningRate 0.0009   Epoch: 6   Global Step: 10500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:15:33,836-Speed 9381.51 samples/sec   Loss 4.3913   LearningRate 0.0009   Epoch: 6   Global Step: 10510   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:16:00,056-Speed 9373.37 samples/sec   Loss 4.3715   LearningRate 0.0009   Epoch: 6   Global Step: 10520   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:16:26,245-Speed 9384.55 samples/sec   Loss 4.3703   LearningRate 0.0009   Epoch: 6   Global Step: 10530   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:16:52,465-Speed 9373.33 samples/sec   Loss 4.3666   LearningRate 0.0009   Epoch: 6   Global Step: 10540   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:17:18,598-Speed 9404.79 samples/sec   Loss 4.3270   LearningRate 0.0009   Epoch: 6   Global Step: 10550   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:17:44,795-Speed 9381.65 samples/sec   Loss 4.3559   LearningRate 0.0009   Epoch: 6   Global Step: 10560   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:18:10,892-Speed 9417.76 samples/sec   Loss 4.3744   LearningRate 0.0009   Epoch: 6   Global Step: 10570   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:18:37,053-Speed 9394.58 samples/sec   Loss 4.3922   LearningRate 0.0009   Epoch: 6   Global Step: 10580   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:19:03,241-Speed 9385.22 samples/sec   Loss 4.3569   LearningRate 0.0009   Epoch: 6   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:19:29,421-Speed 9387.87 samples/sec   Loss 4.3418   LearningRate 0.0009   Epoch: 6   Global Step: 10600   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:19:55,602-Speed 9387.44 samples/sec   Loss 4.3194   LearningRate 0.0009   Epoch: 6   Global Step: 10610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:20:21,831-Speed 9370.08 samples/sec   Loss 4.3127   LearningRate 0.0009   Epoch: 6   Global Step: 10620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:20:48,012-Speed 9387.65 samples/sec   Loss 4.3352   LearningRate 0.0009   Epoch: 6   Global Step: 10630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:21:14,199-Speed 9385.16 samples/sec   Loss 4.3207   LearningRate 0.0009   Epoch: 6   Global Step: 10640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:21:40,447-Speed 9363.37 samples/sec   Loss 4.3004   LearningRate 0.0009   Epoch: 6   Global Step: 10650   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:22:06,670-Speed 9372.40 samples/sec   Loss 4.3409   LearningRate 0.0009   Epoch: 6   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:22:32,900-Speed 9369.57 samples/sec   Loss 4.3711   LearningRate 0.0009   Epoch: 6   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:22:59,118-Speed 9374.16 samples/sec   Loss 4.3202   LearningRate 0.0009   Epoch: 6   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:23:25,319-Speed 9380.62 samples/sec   Loss 4.2979   LearningRate 0.0009   Epoch: 6   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:23:51,477-Speed 9396.52 samples/sec   Loss 4.3656   LearningRate 0.0009   Epoch: 6   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:24:17,642-Speed 9393.34 samples/sec   Loss 4.3237   LearningRate 0.0009   Epoch: 6   Global Step: 10710   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-03-05 04:24:43,848-Speed 9378.23 samples/sec   Loss 4.3157   LearningRate 0.0009   Epoch: 6   Global Step: 10720   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-03-05 04:25:10,045-Speed 9381.89 samples/sec   Loss 4.2917   LearningRate 0.0009   Epoch: 6   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:25:36,203-Speed 9395.34 samples/sec   Loss 4.2777   LearningRate 0.0009   Epoch: 6   Global Step: 10740   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:26:02,349-Speed 9399.77 samples/sec   Loss 4.2762   LearningRate 0.0009   Epoch: 6   Global Step: 10750   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:26:28,500-Speed 9398.15 samples/sec   Loss 4.3057   LearningRate 0.0009   Epoch: 6   Global Step: 10760   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:26:54,821-Speed 9337.36 samples/sec   Loss 4.3069   LearningRate 0.0009   Epoch: 6   Global Step: 10770   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:27:21,056-Speed 9368.23 samples/sec   Loss 4.2916   LearningRate 0.0009   Epoch: 6   Global Step: 10780   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:27:47,396-Speed 9330.82 samples/sec   Loss 4.2863   LearningRate 0.0009   Epoch: 6   Global Step: 10790   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:28:13,726-Speed 9334.08 samples/sec   Loss 4.2633   LearningRate 0.0009   Epoch: 6   Global Step: 10800   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:28:40,017-Speed 9348.12 samples/sec   Loss 4.2793   LearningRate 0.0009   Epoch: 6   Global Step: 10810   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:29:06,352-Speed 9332.62 samples/sec   Loss 4.2755   LearningRate 0.0009   Epoch: 6   Global Step: 10820   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:29:32,606-Speed 9361.27 samples/sec   Loss 4.2586   LearningRate 0.0009   Epoch: 6   Global Step: 10830   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:29:58,886-Speed 9352.05 samples/sec   Loss 4.2919   LearningRate 0.0009   Epoch: 6   Global Step: 10840   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:30:25,105-Speed 9373.68 samples/sec   Loss 4.2774   LearningRate 0.0009   Epoch: 6   Global Step: 10850   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:30:51,443-Speed 9331.22 samples/sec   Loss 4.2367   LearningRate 0.0009   Epoch: 6   Global Step: 10860   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:31:17,683-Speed 9366.16 samples/sec   Loss 4.2620   LearningRate 0.0009   Epoch: 6   Global Step: 10870   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:31:44,011-Speed 9334.86 samples/sec   Loss 4.2552   LearningRate 0.0009   Epoch: 6   Global Step: 10880   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:32:10,311-Speed 9345.02 samples/sec   Loss 4.2195   LearningRate 0.0009   Epoch: 6   Global Step: 10890   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:32:36,690-Speed 9316.88 samples/sec   Loss 4.2535   LearningRate 0.0009   Epoch: 6   Global Step: 10900   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:33:03,035-Speed 9329.06 samples/sec   Loss 4.2267   LearningRate 0.0009   Epoch: 6   Global Step: 10910   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:33:29,326-Speed 9348.07 samples/sec   Loss 4.2684   LearningRate 0.0009   Epoch: 6   Global Step: 10920   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:33:55,581-Speed 9360.95 samples/sec   Loss 4.2382   LearningRate 0.0009   Epoch: 6   Global Step: 10930   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:34:21,857-Speed 9353.19 samples/sec   Loss 4.1933   LearningRate 0.0009   Epoch: 6   Global Step: 10940   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:34:48,301-Speed 9294.02 samples/sec   Loss 4.1927   LearningRate 0.0009   Epoch: 6   Global Step: 10950   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:35:14,718-Speed 9303.53 samples/sec   Loss 4.2450   LearningRate 0.0009   Epoch: 6   Global Step: 10960   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:35:41,398-Speed 9211.66 samples/sec   Loss 4.2490   LearningRate 0.0009   Epoch: 6   Global Step: 10970   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:36:07,865-Speed 9286.19 samples/sec   Loss 4.2284   LearningRate 0.0009   Epoch: 6   Global Step: 10980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:36:34,303-Speed 9296.12 samples/sec   Loss 4.2057   LearningRate 0.0009   Epoch: 6   Global Step: 10990   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:37:00,690-Speed 9314.15 samples/sec   Loss 4.1979   LearningRate 0.0009   Epoch: 6   Global Step: 11000   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:37:26,994-Speed 9343.36 samples/sec   Loss 4.1977   LearningRate 0.0009   Epoch: 6   Global Step: 11010   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:37:53,361-Speed 9321.32 samples/sec   Loss 4.2102   LearningRate 0.0009   Epoch: 6   Global Step: 11020   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:38:19,743-Speed 9315.84 samples/sec   Loss 4.1919   LearningRate 0.0009   Epoch: 6   Global Step: 11030   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:38:46,102-Speed 9324.06 samples/sec   Loss 4.1908   LearningRate 0.0009   Epoch: 6   Global Step: 11040   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:39:12,584-Speed 9280.51 samples/sec   Loss 4.1813   LearningRate 0.0009   Epoch: 6   Global Step: 11050   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:39:39,077-Speed 9277.30 samples/sec   Loss 4.1830   LearningRate 0.0009   Epoch: 6   Global Step: 11060   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:40:05,513-Speed 9296.89 samples/sec   Loss 4.2159   LearningRate 0.0009   Epoch: 6   Global Step: 11070   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:40:32,006-Speed 9276.75 samples/sec   Loss 4.1664   LearningRate 0.0009   Epoch: 6   Global Step: 11080   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:40:58,441-Speed 9297.52 samples/sec   Loss 4.1867   LearningRate 0.0009   Epoch: 6   Global Step: 11090   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:41:24,987-Speed 9258.33 samples/sec   Loss 4.1869   LearningRate 0.0009   Epoch: 6   Global Step: 11100   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:41:51,467-Speed 9281.31 samples/sec   Loss 4.1780   LearningRate 0.0009   Epoch: 6   Global Step: 11110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:42:18,002-Speed 9261.99 samples/sec   Loss 4.1472   LearningRate 0.0009   Epoch: 6   Global Step: 11120   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:42:44,549-Speed 9258.11 samples/sec   Loss 4.1439   LearningRate 0.0009   Epoch: 6   Global Step: 11130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:43:11,207-Speed 9219.38 samples/sec   Loss 4.1699   LearningRate 0.0009   Epoch: 6   Global Step: 11140   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:43:38,008-Speed 9170.41 samples/sec   Loss 4.1704   LearningRate 0.0009   Epoch: 6   Global Step: 11150   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:44:04,557-Speed 9257.31 samples/sec   Loss 4.1334   LearningRate 0.0009   Epoch: 6   Global Step: 11160   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:44:30,976-Speed 9302.78 samples/sec   Loss 4.1870   LearningRate 0.0009   Epoch: 6   Global Step: 11170   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:44:57,535-Speed 9253.55 samples/sec   Loss 4.1509   LearningRate 0.0009   Epoch: 6   Global Step: 11180   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:45:24,112-Speed 9247.55 samples/sec   Loss 4.1486   LearningRate 0.0009   Epoch: 6   Global Step: 11190   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:45:50,785-Speed 9214.50 samples/sec   Loss 4.1123   LearningRate 0.0009   Epoch: 6   Global Step: 11200   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:46:17,359-Speed 9248.78 samples/sec   Loss 4.1195   LearningRate 0.0009   Epoch: 6   Global Step: 11210   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:46:43,953-Speed 9241.35 samples/sec   Loss 4.1457   LearningRate 0.0009   Epoch: 6   Global Step: 11220   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:47:10,626-Speed 9215.27 samples/sec   Loss 4.1479   LearningRate 0.0009   Epoch: 6   Global Step: 11230   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:47:37,447-Speed 9163.24 samples/sec   Loss 4.0932   LearningRate 0.0009   Epoch: 6   Global Step: 11240   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:48:04,056-Speed 9236.41 samples/sec   Loss 4.1220   LearningRate 0.0009   Epoch: 6   Global Step: 11250   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:48:30,662-Speed 9237.66 samples/sec   Loss 4.0950   LearningRate 0.0009   Epoch: 6   Global Step: 11260   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:48:57,219-Speed 9254.43 samples/sec   Loss 4.1041   LearningRate 0.0009   Epoch: 6   Global Step: 11270   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-03-05 04:49:23,859-Speed 9225.66 samples/sec   Loss 4.1125   LearningRate 0.0009   Epoch: 6   Global Step: 11280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:49:50,434-Speed 9248.23 samples/sec   Loss 4.1294   LearningRate 0.0009   Epoch: 6   Global Step: 11290   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:50:16,977-Speed 9259.59 samples/sec   Loss 4.0979   LearningRate 0.0009   Epoch: 6   Global Step: 11300   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:50:43,616-Speed 9226.18 samples/sec   Loss 4.1185   LearningRate 0.0009   Epoch: 6   Global Step: 11310   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:51:10,233-Speed 9233.49 samples/sec   Loss 4.0918   LearningRate 0.0009   Epoch: 6   Global Step: 11320   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:51:36,823-Speed 9243.85 samples/sec   Loss 4.0809   LearningRate 0.0009   Epoch: 6   Global Step: 11330   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:52:03,392-Speed 9250.27 samples/sec   Loss 4.0894   LearningRate 0.0009   Epoch: 6   Global Step: 11340   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:52:29,837-Speed 9293.76 samples/sec   Loss 4.0672   LearningRate 0.0009   Epoch: 6   Global Step: 11350   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:52:56,396-Speed 9253.86 samples/sec   Loss 4.1072   LearningRate 0.0009   Epoch: 6   Global Step: 11360   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:53:22,965-Speed 9250.32 samples/sec   Loss 4.0833   LearningRate 0.0009   Epoch: 6   Global Step: 11370   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-03-05 04:53:49,555-Speed 9243.32 samples/sec   Loss 4.0623   LearningRate 0.0009   Epoch: 6   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:54:16,025-Speed 9284.89 samples/sec   Loss 4.0567   LearningRate 0.0009   Epoch: 6   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-03-05 04:54:42,638-Speed 9234.75 samples/sec   Loss 4.0584   LearningRate 0.0009   Epoch: 6   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 04:55:09,179-Speed 9260.03 samples/sec   Loss 4.0645   LearningRate 0.0009   Epoch: 6   Global Step: 11410   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:55:35,577-Speed 9310.07 samples/sec   Loss 4.0933   LearningRate 0.0009   Epoch: 6   Global Step: 11420   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:56:02,061-Speed 9280.15 samples/sec   Loss 4.0858   LearningRate 0.0009   Epoch: 6   Global Step: 11430   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:56:28,503-Speed 9294.79 samples/sec   Loss 4.0482   LearningRate 0.0009   Epoch: 6   Global Step: 11440   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:56:54,971-Speed 9285.41 samples/sec   Loss 4.0657   LearningRate 0.0009   Epoch: 6   Global Step: 11450   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:57:21,685-Speed 9200.16 samples/sec   Loss 4.0795   LearningRate 0.0009   Epoch: 6   Global Step: 11460   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:57:48,248-Speed 9252.59 samples/sec   Loss 4.0390   LearningRate 0.0009   Epoch: 6   Global Step: 11470   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:58:14,902-Speed 9220.81 samples/sec   Loss 4.0505   LearningRate 0.0009   Epoch: 6   Global Step: 11480   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:58:41,600-Speed 9205.86 samples/sec   Loss 4.0421   LearningRate 0.0009   Epoch: 6   Global Step: 11490   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:59:08,363-Speed 9183.23 samples/sec   Loss 4.0082   LearningRate 0.0009   Epoch: 6   Global Step: 11500   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 04:59:35,038-Speed 9213.58 samples/sec   Loss 4.0255   LearningRate 0.0009   Epoch: 6   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:00:01,708-Speed 9215.07 samples/sec   Loss 4.0331   LearningRate 0.0009   Epoch: 6   Global Step: 11520   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:00:28,361-Speed 9221.46 samples/sec   Loss 4.0340   LearningRate 0.0009   Epoch: 6   Global Step: 11530   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:00:54,974-Speed 9235.09 samples/sec   Loss 4.0377   LearningRate 0.0009   Epoch: 6   Global Step: 11540   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:01:21,595-Speed 9232.18 samples/sec   Loss 4.0418   LearningRate 0.0009   Epoch: 6   Global Step: 11550   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:01:48,141-Speed 9258.53 samples/sec   Loss 4.0209   LearningRate 0.0009   Epoch: 6   Global Step: 11560   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:02:14,718-Speed 9247.30 samples/sec   Loss 4.0191   LearningRate 0.0009   Epoch: 6   Global Step: 11570   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:02:41,330-Speed 9235.54 samples/sec   Loss 3.9881   LearningRate 0.0009   Epoch: 6   Global Step: 11580   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:03:07,950-Speed 9232.36 samples/sec   Loss 4.0194   LearningRate 0.0009   Epoch: 6   Global Step: 11590   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:03:34,606-Speed 9220.20 samples/sec   Loss 4.0299   LearningRate 0.0009   Epoch: 6   Global Step: 11600   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:04:01,232-Speed 9230.27 samples/sec   Loss 3.9816   LearningRate 0.0009   Epoch: 6   Global Step: 11610   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:04:27,821-Speed 9243.52 samples/sec   Loss 4.0328   LearningRate 0.0009   Epoch: 6   Global Step: 11620   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:04:54,391-Speed 9250.10 samples/sec   Loss 4.0067   LearningRate 0.0009   Epoch: 6   Global Step: 11630   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:05:20,795-Speed 9307.90 samples/sec   Loss 4.0017   LearningRate 0.0009   Epoch: 6   Global Step: 11640   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:05:47,092-Speed 9346.05 samples/sec   Loss 3.9691   LearningRate 0.0009   Epoch: 6   Global Step: 11650   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:06:13,423-Speed 9333.84 samples/sec   Loss 4.0049   LearningRate 0.0009   Epoch: 6   Global Step: 11660   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:06:39,923-Speed 9274.79 samples/sec   Loss 3.9609   LearningRate 0.0009   Epoch: 6   Global Step: 11670   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:07:06,355-Speed 9298.28 samples/sec   Loss 4.0212   LearningRate 0.0009   Epoch: 6   Global Step: 11680   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:07:32,909-Speed 9255.53 samples/sec   Loss 3.9503   LearningRate 0.0009   Epoch: 6   Global Step: 11690   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:07:59,438-Speed 9264.29 samples/sec   Loss 3.9697   LearningRate 0.0009   Epoch: 6   Global Step: 11700   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:08:25,914-Speed 9282.95 samples/sec   Loss 3.9778   LearningRate 0.0009   Epoch: 6   Global Step: 11710   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-03-05 05:08:52,298-Speed 9315.04 samples/sec   Loss 4.0403   LearningRate 0.0009   Epoch: 6   Global Step: 11720   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:09:18,694-Speed 9311.02 samples/sec   Loss 3.9831   LearningRate 0.0009   Epoch: 6   Global Step: 11730   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:09:45,180-Speed 9279.16 samples/sec   Loss 3.9591   LearningRate 0.0009   Epoch: 6   Global Step: 11740   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:10:11,641-Speed 9287.90 samples/sec   Loss 3.9863   LearningRate 0.0009   Epoch: 6   Global Step: 11750   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:10:38,194-Speed 9255.83 samples/sec   Loss 3.9568   LearningRate 0.0009   Epoch: 6   Global Step: 11760   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:11:04,745-Speed 9256.46 samples/sec   Loss 3.9589   LearningRate 0.0008   Epoch: 6   Global Step: 11770   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:11:31,253-Speed 9271.98 samples/sec   Loss 3.9636   LearningRate 0.0008   Epoch: 6   Global Step: 11780   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:11:57,618-Speed 9321.97 samples/sec   Loss 3.9330   LearningRate 0.0008   Epoch: 6   Global Step: 11790   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:12:24,054-Speed 9296.75 samples/sec   Loss 3.9061   LearningRate 0.0008   Epoch: 6   Global Step: 11800   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:12:50,336-Speed 9351.18 samples/sec   Loss 3.9446   LearningRate 0.0008   Epoch: 6   Global Step: 11810   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:13:16,721-Speed 9314.89 samples/sec   Loss 3.9631   LearningRate 0.0008   Epoch: 6   Global Step: 11820   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:13:43,259-Speed 9261.18 samples/sec   Loss 3.9583   LearningRate 0.0008   Epoch: 6   Global Step: 11830   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:14:09,784-Speed 9265.59 samples/sec   Loss 3.9326   LearningRate 0.0008   Epoch: 6   Global Step: 11840   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:14:36,314-Speed 9264.00 samples/sec   Loss 3.9576   LearningRate 0.0008   Epoch: 6   Global Step: 11850   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:15:02,902-Speed 9243.93 samples/sec   Loss 3.9427   LearningRate 0.0008   Epoch: 6   Global Step: 11860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:15:29,404-Speed 9273.60 samples/sec   Loss 3.9276   LearningRate 0.0008   Epoch: 6   Global Step: 11870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:15:55,996-Speed 9242.49 samples/sec   Loss 3.9447   LearningRate 0.0008   Epoch: 6   Global Step: 11880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:16:22,415-Speed 9302.80 samples/sec   Loss 3.9627   LearningRate 0.0008   Epoch: 6   Global Step: 11890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:16:48,915-Speed 9274.47 samples/sec   Loss 3.9219   LearningRate 0.0008   Epoch: 6   Global Step: 11900   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:17:15,532-Speed 9233.67 samples/sec   Loss 3.9297   LearningRate 0.0008   Epoch: 6   Global Step: 11910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:17:42,386-Speed 9152.24 samples/sec   Loss 3.9099   LearningRate 0.0008   Epoch: 6   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:18:08,945-Speed 9253.74 samples/sec   Loss 3.8764   LearningRate 0.0008   Epoch: 6   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:18:35,388-Speed 9294.58 samples/sec   Loss 3.9105   LearningRate 0.0008   Epoch: 6   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:19:02,081-Speed 9207.34 samples/sec   Loss 3.9661   LearningRate 0.0008   Epoch: 6   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:19:28,723-Speed 9225.24 samples/sec   Loss 3.9365   LearningRate 0.0008   Epoch: 6   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:19:55,369-Speed 9223.67 samples/sec   Loss 3.8992   LearningRate 0.0008   Epoch: 6   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:20:21,925-Speed 9254.73 samples/sec   Loss 3.9749   LearningRate 0.0008   Epoch: 6   Global Step: 11980   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:20:48,467-Speed 9259.71 samples/sec   Loss 3.9024   LearningRate 0.0008   Epoch: 6   Global Step: 11990   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:21:14,982-Speed 9269.16 samples/sec   Loss 3.8782   LearningRate 0.0008   Epoch: 6   Global Step: 12000   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:21:41,520-Speed 9261.06 samples/sec   Loss 3.9037   LearningRate 0.0008   Epoch: 6   Global Step: 12010   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:22:07,989-Speed 9285.06 samples/sec   Loss 3.8851   LearningRate 0.0008   Epoch: 6   Global Step: 12020   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:22:34,499-Speed 9271.01 samples/sec   Loss 3.8847   LearningRate 0.0008   Epoch: 6   Global Step: 12030   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:23:01,028-Speed 9264.03 samples/sec   Loss 3.9077   LearningRate 0.0008   Epoch: 6   Global Step: 12040   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:23:27,704-Speed 9213.36 samples/sec   Loss 3.9189   LearningRate 0.0008   Epoch: 6   Global Step: 12050   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:23:54,237-Speed 9262.82 samples/sec   Loss 3.8815   LearningRate 0.0008   Epoch: 6   Global Step: 12060   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:24:20,783-Speed 9258.35 samples/sec   Loss 3.9078   LearningRate 0.0008   Epoch: 6   Global Step: 12070   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:24:47,302-Speed 9267.76 samples/sec   Loss 3.9332   LearningRate 0.0008   Epoch: 6   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:25:13,795-Speed 9276.69 samples/sec   Loss 3.9283   LearningRate 0.0008   Epoch: 6   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:26:33,891-Speed 3068.38 samples/sec   Loss 3.9080   LearningRate 0.0008   Epoch: 7   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:26:59,771-Speed 9496.64 samples/sec   Loss 3.8232   LearningRate 0.0008   Epoch: 7   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:27:25,799-Speed 9442.93 samples/sec   Loss 3.8221   LearningRate 0.0008   Epoch: 7   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:27:52,090-Speed 9347.93 samples/sec   Loss 3.8340   LearningRate 0.0008   Epoch: 7   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:28:18,565-Speed 9283.02 samples/sec   Loss 3.8684   LearningRate 0.0008   Epoch: 7   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:28:45,001-Speed 9296.94 samples/sec   Loss 3.8362   LearningRate 0.0008   Epoch: 7   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:29:11,550-Speed 9257.57 samples/sec   Loss 3.8095   LearningRate 0.0008   Epoch: 7   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:29:37,970-Speed 9302.39 samples/sec   Loss 3.8048   LearningRate 0.0008   Epoch: 7   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:30:04,483-Speed 9269.90 samples/sec   Loss 3.8324   LearningRate 0.0008   Epoch: 7   Global Step: 12180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:30:30,875-Speed 9312.48 samples/sec   Loss 3.7976   LearningRate 0.0008   Epoch: 7   Global Step: 12190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:30:57,263-Speed 9313.48 samples/sec   Loss 3.8625   LearningRate 0.0008   Epoch: 7   Global Step: 12200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:31:23,550-Speed 9349.58 samples/sec   Loss 3.8145   LearningRate 0.0008   Epoch: 7   Global Step: 12210   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:31:49,912-Speed 9322.86 samples/sec   Loss 3.8044   LearningRate 0.0008   Epoch: 7   Global Step: 12220   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:32:16,265-Speed 9326.48 samples/sec   Loss 3.7914   LearningRate 0.0008   Epoch: 7   Global Step: 12230   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:32:42,756-Speed 9277.71 samples/sec   Loss 3.8331   LearningRate 0.0008   Epoch: 7   Global Step: 12240   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:33:09,221-Speed 9286.32 samples/sec   Loss 3.8283   LearningRate 0.0008   Epoch: 7   Global Step: 12250   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:33:35,747-Speed 9265.35 samples/sec   Loss 3.8165   LearningRate 0.0008   Epoch: 7   Global Step: 12260   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:34:02,222-Speed 9283.56 samples/sec   Loss 3.8043   LearningRate 0.0008   Epoch: 7   Global Step: 12270   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:34:28,745-Speed 9266.42 samples/sec   Loss 3.8027   LearningRate 0.0008   Epoch: 7   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:34:55,310-Speed 9251.52 samples/sec   Loss 3.8219   LearningRate 0.0008   Epoch: 7   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:35:21,880-Speed 9250.33 samples/sec   Loss 3.8579   LearningRate 0.0008   Epoch: 7   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:35:48,483-Speed 9238.51 samples/sec   Loss 3.8310   LearningRate 0.0008   Epoch: 7   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:36:15,080-Speed 9240.25 samples/sec   Loss 3.7960   LearningRate 0.0008   Epoch: 7   Global Step: 12320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:36:41,683-Speed 9238.83 samples/sec   Loss 3.7672   LearningRate 0.0008   Epoch: 7   Global Step: 12330   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:37:08,205-Speed 9266.85 samples/sec   Loss 3.7747   LearningRate 0.0008   Epoch: 7   Global Step: 12340   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:37:34,824-Speed 9232.91 samples/sec   Loss 3.7931   LearningRate 0.0008   Epoch: 7   Global Step: 12350   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:38:01,337-Speed 9269.92 samples/sec   Loss 3.7798   LearningRate 0.0008   Epoch: 7   Global Step: 12360   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:38:27,872-Speed 9262.24 samples/sec   Loss 3.8246   LearningRate 0.0008   Epoch: 7   Global Step: 12370   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:38:54,439-Speed 9251.01 samples/sec   Loss 3.8160   LearningRate 0.0008   Epoch: 7   Global Step: 12380   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:39:21,023-Speed 9244.87 samples/sec   Loss 3.7872   LearningRate 0.0008   Epoch: 7   Global Step: 12390   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:39:47,630-Speed 9237.17 samples/sec   Loss 3.7740   LearningRate 0.0008   Epoch: 7   Global Step: 12400   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:40:14,214-Speed 9245.21 samples/sec   Loss 3.8145   LearningRate 0.0008   Epoch: 7   Global Step: 12410   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:40:40,655-Speed 9295.28 samples/sec   Loss 3.8784   LearningRate 0.0008   Epoch: 7   Global Step: 12420   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:41:07,176-Speed 9267.10 samples/sec   Loss 3.7892   LearningRate 0.0008   Epoch: 7   Global Step: 12430   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:41:33,807-Speed 9228.90 samples/sec   Loss 3.7757   LearningRate 0.0008   Epoch: 7   Global Step: 12440   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:42:00,504-Speed 9205.83 samples/sec   Loss 3.7821   LearningRate 0.0008   Epoch: 7   Global Step: 12450   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:42:27,052-Speed 9257.58 samples/sec   Loss 3.7757   LearningRate 0.0008   Epoch: 7   Global Step: 12460   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:42:53,526-Speed 9283.66 samples/sec   Loss 3.7703   LearningRate 0.0008   Epoch: 7   Global Step: 12470   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:43:20,195-Speed 9215.51 samples/sec   Loss 3.7571   LearningRate 0.0008   Epoch: 7   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:43:46,550-Speed 9325.48 samples/sec   Loss 3.7585   LearningRate 0.0008   Epoch: 7   Global Step: 12490   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:44:13,207-Speed 9219.61 samples/sec   Loss 3.7946   LearningRate 0.0008   Epoch: 7   Global Step: 12500   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:44:39,757-Speed 9257.55 samples/sec   Loss 3.7487   LearningRate 0.0008   Epoch: 7   Global Step: 12510   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:45:06,318-Speed 9252.99 samples/sec   Loss 3.7541   LearningRate 0.0008   Epoch: 7   Global Step: 12520   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:45:32,866-Speed 9257.80 samples/sec   Loss 3.7577   LearningRate 0.0008   Epoch: 7   Global Step: 12530   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:45:59,467-Speed 9239.10 samples/sec   Loss 3.7296   LearningRate 0.0008   Epoch: 7   Global Step: 12540   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:46:26,126-Speed 9219.39 samples/sec   Loss 3.7704   LearningRate 0.0008   Epoch: 7   Global Step: 12550   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:46:52,752-Speed 9230.13 samples/sec   Loss 3.7190   LearningRate 0.0008   Epoch: 7   Global Step: 12560   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:47:19,415-Speed 9217.77 samples/sec   Loss 3.7333   LearningRate 0.0008   Epoch: 7   Global Step: 12570   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:47:45,976-Speed 9253.33 samples/sec   Loss 3.7382   LearningRate 0.0008   Epoch: 7   Global Step: 12580   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:48:12,684-Speed 9201.95 samples/sec   Loss 3.7398   LearningRate 0.0008   Epoch: 7   Global Step: 12590   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:48:39,206-Speed 9266.34 samples/sec   Loss 3.7470   LearningRate 0.0008   Epoch: 7   Global Step: 12600   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:49:05,765-Speed 9254.21 samples/sec   Loss 3.7460   LearningRate 0.0008   Epoch: 7   Global Step: 12610   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:49:32,377-Speed 9235.50 samples/sec   Loss 3.7799   LearningRate 0.0008   Epoch: 7   Global Step: 12620   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:49:59,024-Speed 9223.15 samples/sec   Loss 3.7588   LearningRate 0.0008   Epoch: 7   Global Step: 12630   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-03-05 05:50:25,523-Speed 9275.03 samples/sec   Loss 3.7113   LearningRate 0.0008   Epoch: 7   Global Step: 12640   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:50:51,806-Speed 9350.91 samples/sec   Loss 3.7089   LearningRate 0.0008   Epoch: 7   Global Step: 12650   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:51:18,248-Speed 9294.64 samples/sec   Loss 3.7309   LearningRate 0.0008   Epoch: 7   Global Step: 12660   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:51:44,692-Speed 9293.76 samples/sec   Loss 3.6950   LearningRate 0.0008   Epoch: 7   Global Step: 12670   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:52:11,087-Speed 9311.54 samples/sec   Loss 3.6999   LearningRate 0.0008   Epoch: 7   Global Step: 12680   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:52:37,474-Speed 9313.89 samples/sec   Loss 3.7218   LearningRate 0.0008   Epoch: 7   Global Step: 12690   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:53:03,782-Speed 9342.31 samples/sec   Loss 3.7163   LearningRate 0.0008   Epoch: 7   Global Step: 12700   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:53:30,108-Speed 9335.69 samples/sec   Loss 3.7222   LearningRate 0.0008   Epoch: 7   Global Step: 12710   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:53:56,425-Speed 9339.07 samples/sec   Loss 3.6733   LearningRate 0.0008   Epoch: 7   Global Step: 12720   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:54:22,678-Speed 9361.65 samples/sec   Loss 3.6950   LearningRate 0.0008   Epoch: 7   Global Step: 12730   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:54:49,023-Speed 9328.95 samples/sec   Loss 3.6721   LearningRate 0.0008   Epoch: 7   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:55:15,267-Speed 9364.95 samples/sec   Loss 3.6777   LearningRate 0.0008   Epoch: 7   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-03-05 05:55:41,527-Speed 9359.42 samples/sec   Loss 3.7502   LearningRate 0.0008   Epoch: 7   Global Step: 12760   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:56:07,873-Speed 9328.56 samples/sec   Loss 3.7476   LearningRate 0.0008   Epoch: 7   Global Step: 12770   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:56:34,286-Speed 9305.00 samples/sec   Loss 3.6997   LearningRate 0.0008   Epoch: 7   Global Step: 12780   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:57:00,671-Speed 9315.38 samples/sec   Loss 3.7200   LearningRate 0.0008   Epoch: 7   Global Step: 12790   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:57:26,930-Speed 9360.57 samples/sec   Loss 3.7112   LearningRate 0.0008   Epoch: 7   Global Step: 12800   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:57:53,424-Speed 9276.40 samples/sec   Loss 3.7005   LearningRate 0.0008   Epoch: 7   Global Step: 12810   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:58:19,714-Speed 9348.70 samples/sec   Loss 3.6858   LearningRate 0.0008   Epoch: 7   Global Step: 12820   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:58:45,986-Speed 9354.60 samples/sec   Loss 3.6538   LearningRate 0.0008   Epoch: 7   Global Step: 12830   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:59:12,272-Speed 9349.68 samples/sec   Loss 3.6931   LearningRate 0.0008   Epoch: 7   Global Step: 12840   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-03-05 05:59:38,582-Speed 9341.50 samples/sec   Loss 3.6921   LearningRate 0.0008   Epoch: 7   Global Step: 12850   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:00:04,836-Speed 9361.38 samples/sec   Loss 3.6689   LearningRate 0.0008   Epoch: 7   Global Step: 12860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:00:31,105-Speed 9355.85 samples/sec   Loss 3.6435   LearningRate 0.0008   Epoch: 7   Global Step: 12870   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:00:57,413-Speed 9341.98 samples/sec   Loss 3.6872   LearningRate 0.0008   Epoch: 7   Global Step: 12880   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:01:23,741-Speed 9335.15 samples/sec   Loss 3.6858   LearningRate 0.0008   Epoch: 7   Global Step: 12890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:01:49,931-Speed 9384.39 samples/sec   Loss 3.6615   LearningRate 0.0008   Epoch: 7   Global Step: 12900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:02:16,267-Speed 9331.99 samples/sec   Loss 3.6571   LearningRate 0.0008   Epoch: 7   Global Step: 12910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:02:42,534-Speed 9356.68 samples/sec   Loss 3.6387   LearningRate 0.0008   Epoch: 7   Global Step: 12920   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:03:08,853-Speed 9338.41 samples/sec   Loss 3.6477   LearningRate 0.0008   Epoch: 7   Global Step: 12930   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:03:35,162-Speed 9341.55 samples/sec   Loss 3.6745   LearningRate 0.0008   Epoch: 7   Global Step: 12940   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:04:01,517-Speed 9325.38 samples/sec   Loss 3.6283   LearningRate 0.0008   Epoch: 7   Global Step: 12950   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:04:27,750-Speed 9368.68 samples/sec   Loss 3.6102   LearningRate 0.0008   Epoch: 7   Global Step: 12960   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:04:54,074-Speed 9336.41 samples/sec   Loss 3.6383   LearningRate 0.0008   Epoch: 7   Global Step: 12970   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:05:20,384-Speed 9341.31 samples/sec   Loss 3.6549   LearningRate 0.0008   Epoch: 7   Global Step: 12980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:05:46,801-Speed 9303.61 samples/sec   Loss 3.6238   LearningRate 0.0008   Epoch: 7   Global Step: 12990   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:06:13,232-Speed 9298.36 samples/sec   Loss 3.6387   LearningRate 0.0008   Epoch: 7   Global Step: 13000   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:06:39,738-Speed 9272.29 samples/sec   Loss 3.6516   LearningRate 0.0008   Epoch: 7   Global Step: 13010   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:07:06,177-Speed 9295.75 samples/sec   Loss 3.6193   LearningRate 0.0008   Epoch: 7   Global Step: 13020   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:07:32,561-Speed 9314.93 samples/sec   Loss 3.6322   LearningRate 0.0008   Epoch: 7   Global Step: 13030   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:07:59,025-Speed 9287.21 samples/sec   Loss 3.6490   LearningRate 0.0008   Epoch: 7   Global Step: 13040   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:08:25,527-Speed 9273.98 samples/sec   Loss 3.6372   LearningRate 0.0008   Epoch: 7   Global Step: 13050   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:08:51,842-Speed 9339.58 samples/sec   Loss 3.6303   LearningRate 0.0008   Epoch: 7   Global Step: 13060   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:09:18,248-Speed 9307.20 samples/sec   Loss 3.5975   LearningRate 0.0008   Epoch: 7   Global Step: 13070   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:09:44,622-Speed 9318.79 samples/sec   Loss 3.5972   LearningRate 0.0008   Epoch: 7   Global Step: 13080   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:10:10,891-Speed 9356.18 samples/sec   Loss 3.6287   LearningRate 0.0008   Epoch: 7   Global Step: 13090   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:10:37,308-Speed 9303.68 samples/sec   Loss 3.6388   LearningRate 0.0008   Epoch: 7   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:11:03,638-Speed 9334.22 samples/sec   Loss 3.6054   LearningRate 0.0008   Epoch: 7   Global Step: 13110   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:11:30,046-Speed 9306.44 samples/sec   Loss 3.6110   LearningRate 0.0008   Epoch: 7   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:11:56,495-Speed 9292.64 samples/sec   Loss 3.5852   LearningRate 0.0008   Epoch: 7   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:12:22,960-Speed 9286.70 samples/sec   Loss 3.6133   LearningRate 0.0008   Epoch: 7   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:12:49,325-Speed 9322.00 samples/sec   Loss 3.6084   LearningRate 0.0008   Epoch: 7   Global Step: 13150   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:13:15,643-Speed 9338.50 samples/sec   Loss 3.5983   LearningRate 0.0008   Epoch: 7   Global Step: 13160   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:13:42,094-Speed 9291.63 samples/sec   Loss 3.5935   LearningRate 0.0008   Epoch: 7   Global Step: 13170   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:14:08,441-Speed 9328.08 samples/sec   Loss 3.5767   LearningRate 0.0008   Epoch: 7   Global Step: 13180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:14:34,798-Speed 9324.71 samples/sec   Loss 3.5852   LearningRate 0.0008   Epoch: 7   Global Step: 13190   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:15:01,090-Speed 9347.92 samples/sec   Loss 3.5823   LearningRate 0.0008   Epoch: 7   Global Step: 13200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:15:27,433-Speed 9329.96 samples/sec   Loss 3.5798   LearningRate 0.0008   Epoch: 7   Global Step: 13210   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:15:53,674-Speed 9366.12 samples/sec   Loss 3.6014   LearningRate 0.0008   Epoch: 7   Global Step: 13220   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:16:20,047-Speed 9319.13 samples/sec   Loss 3.5774   LearningRate 0.0008   Epoch: 7   Global Step: 13230   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:16:46,288-Speed 9365.95 samples/sec   Loss 3.5877   LearningRate 0.0008   Epoch: 7   Global Step: 13240   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:17:12,503-Speed 9375.70 samples/sec   Loss 3.5744   LearningRate 0.0008   Epoch: 7   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:17:38,818-Speed 9339.66 samples/sec   Loss 3.5870   LearningRate 0.0008   Epoch: 7   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:18:05,096-Speed 9352.75 samples/sec   Loss 3.5919   LearningRate 0.0008   Epoch: 7   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:18:31,383-Speed 9349.46 samples/sec   Loss 3.5559   LearningRate 0.0008   Epoch: 7   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:18:57,749-Speed 9321.51 samples/sec   Loss 3.6124   LearningRate 0.0008   Epoch: 7   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:19:24,012-Speed 9358.16 samples/sec   Loss 3.5583   LearningRate 0.0008   Epoch: 7   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:19:50,190-Speed 9388.29 samples/sec   Loss 3.5539   LearningRate 0.0008   Epoch: 7   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:20:16,435-Speed 9364.42 samples/sec   Loss 3.5989   LearningRate 0.0008   Epoch: 7   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:20:42,647-Speed 9377.16 samples/sec   Loss 3.5886   LearningRate 0.0008   Epoch: 7   Global Step: 13330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:21:08,787-Speed 9402.05 samples/sec   Loss 3.5693   LearningRate 0.0008   Epoch: 7   Global Step: 13340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:21:34,920-Speed 9404.41 samples/sec   Loss 3.5667   LearningRate 0.0008   Epoch: 7   Global Step: 13350   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:22:01,110-Speed 9384.37 samples/sec   Loss 3.5206   LearningRate 0.0008   Epoch: 7   Global Step: 13360   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:22:27,324-Speed 9375.49 samples/sec   Loss 3.5453   LearningRate 0.0008   Epoch: 7   Global Step: 13370   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:22:53,649-Speed 9335.95 samples/sec   Loss 3.5784   LearningRate 0.0008   Epoch: 7   Global Step: 13380   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:23:19,857-Speed 9377.81 samples/sec   Loss 3.5552   LearningRate 0.0008   Epoch: 7   Global Step: 13390   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:23:46,129-Speed 9354.68 samples/sec   Loss 3.5336   LearningRate 0.0008   Epoch: 7   Global Step: 13400   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:24:12,399-Speed 9355.45 samples/sec   Loss 3.5229   LearningRate 0.0008   Epoch: 7   Global Step: 13410   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:24:38,575-Speed 9389.59 samples/sec   Loss 3.5399   LearningRate 0.0008   Epoch: 7   Global Step: 13420   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:25:04,806-Speed 9369.17 samples/sec   Loss 3.5515   LearningRate 0.0008   Epoch: 7   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:25:31,448-Speed 9225.10 samples/sec   Loss 3.5428   LearningRate 0.0008   Epoch: 7   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:25:58,191-Speed 9190.02 samples/sec   Loss 3.5084   LearningRate 0.0008   Epoch: 7   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:26:24,756-Speed 9251.76 samples/sec   Loss 3.5344   LearningRate 0.0008   Epoch: 7   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:26:51,251-Speed 9276.05 samples/sec   Loss 3.5475   LearningRate 0.0008   Epoch: 7   Global Step: 13470   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:27:17,691-Speed 9295.35 samples/sec   Loss 3.5410   LearningRate 0.0008   Epoch: 7   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:27:44,152-Speed 9287.95 samples/sec   Loss 3.5740   LearningRate 0.0008   Epoch: 7   Global Step: 13490   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:28:10,365-Speed 9375.91 samples/sec   Loss 3.5250   LearningRate 0.0008   Epoch: 7   Global Step: 13500   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:28:36,566-Speed 9380.30 samples/sec   Loss 3.5293   LearningRate 0.0008   Epoch: 7   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:29:02,814-Speed 9364.20 samples/sec   Loss 3.5287   LearningRate 0.0008   Epoch: 7   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:29:28,972-Speed 9395.52 samples/sec   Loss 3.5289   LearningRate 0.0008   Epoch: 7   Global Step: 13530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:29:55,154-Speed 9387.07 samples/sec   Loss 3.5210   LearningRate 0.0008   Epoch: 7   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:30:21,331-Speed 9389.19 samples/sec   Loss 3.5305   LearningRate 0.0008   Epoch: 7   Global Step: 13550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:30:47,551-Speed 9373.31 samples/sec   Loss 3.5021   LearningRate 0.0008   Epoch: 7   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:31:13,723-Speed 9390.69 samples/sec   Loss 3.5137   LearningRate 0.0008   Epoch: 7   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:31:39,889-Speed 9392.70 samples/sec   Loss 3.5247   LearningRate 0.0008   Epoch: 7   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:32:06,088-Speed 9380.91 samples/sec   Loss 3.4956   LearningRate 0.0008   Epoch: 7   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:32:32,349-Speed 9358.78 samples/sec   Loss 3.5008   LearningRate 0.0008   Epoch: 7   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:32:58,530-Speed 9387.56 samples/sec   Loss 3.4890   LearningRate 0.0008   Epoch: 7   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:33:24,747-Speed 9374.40 samples/sec   Loss 3.5034   LearningRate 0.0008   Epoch: 7   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:33:50,891-Speed 9400.46 samples/sec   Loss 3.5019   LearningRate 0.0008   Epoch: 7   Global Step: 13630   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-03-05 06:34:17,041-Speed 9398.72 samples/sec   Loss 3.4808   LearningRate 0.0008   Epoch: 7   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:34:43,285-Speed 9364.59 samples/sec   Loss 3.5095   LearningRate 0.0008   Epoch: 7   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:35:09,467-Speed 9387.01 samples/sec   Loss 3.4719   LearningRate 0.0008   Epoch: 7   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:35:35,666-Speed 9380.80 samples/sec   Loss 3.4913   LearningRate 0.0008   Epoch: 7   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:36:01,876-Speed 9377.00 samples/sec   Loss 3.4935   LearningRate 0.0008   Epoch: 7   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:36:28,323-Speed 9293.01 samples/sec   Loss 3.5251   LearningRate 0.0008   Epoch: 7   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:36:54,508-Speed 9385.90 samples/sec   Loss 3.4966   LearningRate 0.0008   Epoch: 7   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:37:20,691-Speed 9386.77 samples/sec   Loss 3.4942   LearningRate 0.0008   Epoch: 7   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:37:46,964-Speed 9354.35 samples/sec   Loss 3.5043   LearningRate 0.0008   Epoch: 7   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:38:13,223-Speed 9359.44 samples/sec   Loss 3.5048   LearningRate 0.0008   Epoch: 7   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:38:39,383-Speed 9394.83 samples/sec   Loss 3.4845   LearningRate 0.0008   Epoch: 7   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:39:05,663-Speed 9351.99 samples/sec   Loss 3.4448   LearningRate 0.0008   Epoch: 7   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:39:31,793-Speed 9405.73 samples/sec   Loss 3.4816   LearningRate 0.0008   Epoch: 7   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:39:58,005-Speed 9378.34 samples/sec   Loss 3.4882   LearningRate 0.0008   Epoch: 7   Global Step: 13770   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:40:24,282-Speed 9353.12 samples/sec   Loss 3.4727   LearningRate 0.0008   Epoch: 7   Global Step: 13780   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:40:50,434-Speed 9397.62 samples/sec   Loss 3.4705   LearningRate 0.0008   Epoch: 7   Global Step: 13790   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:41:16,651-Speed 9374.68 samples/sec   Loss 3.5069   LearningRate 0.0008   Epoch: 7   Global Step: 13800   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:41:42,793-Speed 9401.72 samples/sec   Loss 3.5037   LearningRate 0.0008   Epoch: 7   Global Step: 13810   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:42:08,913-Speed 9409.16 samples/sec   Loss 3.5273   LearningRate 0.0008   Epoch: 7   Global Step: 13820   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:43:28,504-Speed 3087.83 samples/sec   Loss 3.4699   LearningRate 0.0008   Epoch: 8   Global Step: 13830   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:43:54,428-Speed 9480.49 samples/sec   Loss 3.4148   LearningRate 0.0008   Epoch: 8   Global Step: 13840   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:44:20,498-Speed 9427.46 samples/sec   Loss 3.4215   LearningRate 0.0008   Epoch: 8   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:44:46,522-Speed 9444.12 samples/sec   Loss 3.4434   LearningRate 0.0008   Epoch: 8   Global Step: 13860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:45:12,549-Speed 9442.85 samples/sec   Loss 3.4109   LearningRate 0.0008   Epoch: 8   Global Step: 13870   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:45:38,549-Speed 9452.97 samples/sec   Loss 3.4136   LearningRate 0.0008   Epoch: 8   Global Step: 13880   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-03-05 06:46:04,525-Speed 9461.37 samples/sec   Loss 3.4277   LearningRate 0.0008   Epoch: 8   Global Step: 13890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:46:30,615-Speed 9420.00 samples/sec   Loss 3.4242   LearningRate 0.0008   Epoch: 8   Global Step: 13900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:46:56,751-Speed 9403.43 samples/sec   Loss 3.4039   LearningRate 0.0008   Epoch: 8   Global Step: 13910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:47:22,865-Speed 9411.53 samples/sec   Loss 3.4109   LearningRate 0.0008   Epoch: 8   Global Step: 13920   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:47:48,946-Speed 9422.99 samples/sec   Loss 3.4259   LearningRate 0.0008   Epoch: 8   Global Step: 13930   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:48:15,099-Speed 9397.51 samples/sec   Loss 3.4072   LearningRate 0.0008   Epoch: 8   Global Step: 13940   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:48:41,285-Speed 9386.38 samples/sec   Loss 3.4168   LearningRate 0.0008   Epoch: 8   Global Step: 13950   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:49:07,380-Speed 9418.28 samples/sec   Loss 3.4595   LearningRate 0.0008   Epoch: 8   Global Step: 13960   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:49:33,511-Speed 9405.22 samples/sec   Loss 3.4217   LearningRate 0.0008   Epoch: 8   Global Step: 13970   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:49:59,665-Speed 9396.67 samples/sec   Loss 3.4276   LearningRate 0.0008   Epoch: 8   Global Step: 13980   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:50:25,888-Speed 9372.59 samples/sec   Loss 3.4061   LearningRate 0.0008   Epoch: 8   Global Step: 13990   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:50:52,254-Speed 9321.47 samples/sec   Loss 3.4131   LearningRate 0.0008   Epoch: 8   Global Step: 14000   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:51:18,495-Speed 9365.83 samples/sec   Loss 3.4156   LearningRate 0.0008   Epoch: 8   Global Step: 14010   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:51:44,722-Speed 9370.84 samples/sec   Loss 3.4036   LearningRate 0.0008   Epoch: 8   Global Step: 14020   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:52:10,934-Speed 9376.41 samples/sec   Loss 3.4135   LearningRate 0.0008   Epoch: 8   Global Step: 14030   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:52:37,040-Speed 9414.22 samples/sec   Loss 3.4132   LearningRate 0.0008   Epoch: 8   Global Step: 14040   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:53:03,268-Speed 9370.74 samples/sec   Loss 3.4121   LearningRate 0.0008   Epoch: 8   Global Step: 14050   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:53:29,503-Speed 9367.71 samples/sec   Loss 3.4257   LearningRate 0.0008   Epoch: 8   Global Step: 14060   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-03-05 06:53:55,657-Speed 9396.92 samples/sec   Loss 3.4334   LearningRate 0.0008   Epoch: 8   Global Step: 14070   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:54:21,867-Speed 9377.07 samples/sec   Loss 3.4208   LearningRate 0.0008   Epoch: 8   Global Step: 14080   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:54:48,071-Speed 9379.27 samples/sec   Loss 3.3963   LearningRate 0.0008   Epoch: 8   Global Step: 14090   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:55:14,329-Speed 9359.81 samples/sec   Loss 3.4024   LearningRate 0.0008   Epoch: 8   Global Step: 14100   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:55:40,606-Speed 9352.87 samples/sec   Loss 3.4206   LearningRate 0.0008   Epoch: 8   Global Step: 14110   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:56:06,972-Speed 9321.75 samples/sec   Loss 3.4153   LearningRate 0.0008   Epoch: 8   Global Step: 14120   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:56:33,220-Speed 9363.44 samples/sec   Loss 3.3754   LearningRate 0.0008   Epoch: 8   Global Step: 14130   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:56:59,470-Speed 9362.56 samples/sec   Loss 3.3855   LearningRate 0.0008   Epoch: 8   Global Step: 14140   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:57:25,794-Speed 9336.53 samples/sec   Loss 3.3989   LearningRate 0.0008   Epoch: 8   Global Step: 14150   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:57:52,119-Speed 9335.93 samples/sec   Loss 3.3955   LearningRate 0.0008   Epoch: 8   Global Step: 14160   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:58:18,350-Speed 9369.52 samples/sec   Loss 3.4185   LearningRate 0.0008   Epoch: 8   Global Step: 14170   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:58:44,644-Speed 9347.19 samples/sec   Loss 3.3853   LearningRate 0.0008   Epoch: 8   Global Step: 14180   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-03-05 06:59:10,862-Speed 9374.23 samples/sec   Loss 3.4351   LearningRate 0.0008   Epoch: 8   Global Step: 14190   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 06:59:37,219-Speed 9324.52 samples/sec   Loss 3.4059   LearningRate 0.0008   Epoch: 8   Global Step: 14200   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 07:00:03,592-Speed 9318.87 samples/sec   Loss 3.3777   LearningRate 0.0008   Epoch: 8   Global Step: 14210   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-03-05 07:00:29,775-Speed 9386.89 samples/sec   Loss 3.3784   LearningRate 0.0008   Epoch: 8   Global Step: 14220   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:00:56,021-Speed 9364.13 samples/sec   Loss 3.4086   LearningRate 0.0008   Epoch: 8   Global Step: 14230   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:01:22,232-Speed 9376.38 samples/sec   Loss 3.3972   LearningRate 0.0008   Epoch: 8   Global Step: 14240   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:01:48,495-Speed 9358.25 samples/sec   Loss 3.4212   LearningRate 0.0008   Epoch: 8   Global Step: 14250   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:02:14,735-Speed 9366.33 samples/sec   Loss 3.3728   LearningRate 0.0008   Epoch: 8   Global Step: 14260   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:02:40,960-Speed 9371.48 samples/sec   Loss 3.3756   LearningRate 0.0008   Epoch: 8   Global Step: 14270   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:03:07,243-Speed 9351.07 samples/sec   Loss 3.3556   LearningRate 0.0008   Epoch: 8   Global Step: 14280   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:03:33,509-Speed 9356.91 samples/sec   Loss 3.3547   LearningRate 0.0008   Epoch: 8   Global Step: 14290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:03:59,841-Speed 9333.71 samples/sec   Loss 3.3628   LearningRate 0.0008   Epoch: 8   Global Step: 14300   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:04:26,139-Speed 9345.54 samples/sec   Loss 3.3461   LearningRate 0.0008   Epoch: 8   Global Step: 14310   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:04:52,341-Speed 9380.09 samples/sec   Loss 3.3653   LearningRate 0.0008   Epoch: 8   Global Step: 14320   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:05:18,469-Speed 9406.25 samples/sec   Loss 3.3367   LearningRate 0.0008   Epoch: 8   Global Step: 14330   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:05:44,755-Speed 9349.74 samples/sec   Loss 3.3870   LearningRate 0.0008   Epoch: 8   Global Step: 14340   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:06:10,997-Speed 9365.71 samples/sec   Loss 3.3877   LearningRate 0.0008   Epoch: 8   Global Step: 14350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:06:37,294-Speed 9346.05 samples/sec   Loss 3.3529   LearningRate 0.0008   Epoch: 8   Global Step: 14360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:07:03,502-Speed 9377.80 samples/sec   Loss 3.3236   LearningRate 0.0008   Epoch: 8   Global Step: 14370   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:07:29,808-Speed 9342.68 samples/sec   Loss 3.3520   LearningRate 0.0008   Epoch: 8   Global Step: 14380   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:07:56,126-Speed 9338.77 samples/sec   Loss 3.3409   LearningRate 0.0008   Epoch: 8   Global Step: 14390   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:08:22,347-Speed 9373.16 samples/sec   Loss 3.3367   LearningRate 0.0008   Epoch: 8   Global Step: 14400   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:08:48,538-Speed 9383.68 samples/sec   Loss 3.3466   LearningRate 0.0008   Epoch: 8   Global Step: 14410   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:09:14,845-Speed 9342.81 samples/sec   Loss 3.3640   LearningRate 0.0008   Epoch: 8   Global Step: 14420   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:09:41,176-Speed 9333.82 samples/sec   Loss 3.3390   LearningRate 0.0008   Epoch: 8   Global Step: 14430   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:10:07,527-Speed 9326.99 samples/sec   Loss 3.3688   LearningRate 0.0008   Epoch: 8   Global Step: 14440   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:10:33,883-Speed 9325.13 samples/sec   Loss 3.3279   LearningRate 0.0008   Epoch: 8   Global Step: 14450   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:11:00,276-Speed 9311.92 samples/sec   Loss 3.3196   LearningRate 0.0008   Epoch: 8   Global Step: 14460   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:11:26,568-Speed 9347.66 samples/sec   Loss 3.3353   LearningRate 0.0008   Epoch: 8   Global Step: 14470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:11:52,846-Speed 9352.80 samples/sec   Loss 3.2998   LearningRate 0.0008   Epoch: 8   Global Step: 14480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:12:19,064-Speed 9373.97 samples/sec   Loss 3.3514   LearningRate 0.0008   Epoch: 8   Global Step: 14490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:12:45,375-Speed 9341.10 samples/sec   Loss 3.3211   LearningRate 0.0008   Epoch: 8   Global Step: 14500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:13:11,553-Speed 9388.58 samples/sec   Loss 3.3354   LearningRate 0.0008   Epoch: 8   Global Step: 14510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:13:37,817-Speed 9358.16 samples/sec   Loss 3.3786   LearningRate 0.0008   Epoch: 8   Global Step: 14520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:14:04,024-Speed 9378.11 samples/sec   Loss 3.3333   LearningRate 0.0008   Epoch: 8   Global Step: 14530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:14:30,274-Speed 9362.68 samples/sec   Loss 3.3116   LearningRate 0.0008   Epoch: 8   Global Step: 14540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:14:56,548-Speed 9353.89 samples/sec   Loss 3.2889   LearningRate 0.0008   Epoch: 8   Global Step: 14550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:15:22,753-Speed 9379.07 samples/sec   Loss 3.2894   LearningRate 0.0008   Epoch: 8   Global Step: 14560   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:15:49,113-Speed 9323.95 samples/sec   Loss 3.2919   LearningRate 0.0008   Epoch: 8   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:16:15,331-Speed 9374.22 samples/sec   Loss 3.3368   LearningRate 0.0008   Epoch: 8   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:16:41,481-Speed 9398.42 samples/sec   Loss 3.3319   LearningRate 0.0008   Epoch: 8   Global Step: 14590   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:17:07,594-Speed 9411.76 samples/sec   Loss 3.3328   LearningRate 0.0008   Epoch: 8   Global Step: 14600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:17:33,792-Speed 9381.52 samples/sec   Loss 3.3243   LearningRate 0.0008   Epoch: 8   Global Step: 14610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:17:59,923-Speed 9405.33 samples/sec   Loss 3.3287   LearningRate 0.0008   Epoch: 8   Global Step: 14620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:18:26,173-Speed 9362.89 samples/sec   Loss 3.3330   LearningRate 0.0008   Epoch: 8   Global Step: 14630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:18:52,281-Speed 9413.64 samples/sec   Loss 3.2714   LearningRate 0.0008   Epoch: 8   Global Step: 14640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:19:18,377-Speed 9418.10 samples/sec   Loss 3.2877   LearningRate 0.0008   Epoch: 8   Global Step: 14650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:19:44,553-Speed 9389.22 samples/sec   Loss 3.2963   LearningRate 0.0008   Epoch: 8   Global Step: 14660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:20:10,694-Speed 9401.52 samples/sec   Loss 3.2704   LearningRate 0.0008   Epoch: 8   Global Step: 14670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:20:36,832-Speed 9403.00 samples/sec   Loss 3.2906   LearningRate 0.0008   Epoch: 8   Global Step: 14680   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:21:02,977-Speed 9400.18 samples/sec   Loss 3.2997   LearningRate 0.0008   Epoch: 8   Global Step: 14690   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:21:29,076-Speed 9416.96 samples/sec   Loss 3.2867   LearningRate 0.0008   Epoch: 8   Global Step: 14700   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:21:55,229-Speed 9397.28 samples/sec   Loss 3.2635   LearningRate 0.0008   Epoch: 8   Global Step: 14710   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:22:21,420-Speed 9384.04 samples/sec   Loss 3.2727   LearningRate 0.0008   Epoch: 8   Global Step: 14720   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:22:47,520-Speed 9416.32 samples/sec   Loss 3.2823   LearningRate 0.0008   Epoch: 8   Global Step: 14730   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:23:13,666-Speed 9400.21 samples/sec   Loss 3.3053   LearningRate 0.0008   Epoch: 8   Global Step: 14740   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:23:39,861-Speed 9383.02 samples/sec   Loss 3.3116   LearningRate 0.0008   Epoch: 8   Global Step: 14750   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:24:06,007-Speed 9399.96 samples/sec   Loss 3.2857   LearningRate 0.0008   Epoch: 8   Global Step: 14760   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:24:32,116-Speed 9413.36 samples/sec   Loss 3.2849   LearningRate 0.0008   Epoch: 8   Global Step: 14770   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-03-05 07:24:58,221-Speed 9415.01 samples/sec   Loss 3.2729   LearningRate 0.0008   Epoch: 8   Global Step: 14780   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:25:24,326-Speed 9414.70 samples/sec   Loss 3.2468   LearningRate 0.0008   Epoch: 8   Global Step: 14790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:25:50,400-Speed 9426.04 samples/sec   Loss 3.2588   LearningRate 0.0008   Epoch: 8   Global Step: 14800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:26:16,535-Speed 9404.16 samples/sec   Loss 3.2524   LearningRate 0.0008   Epoch: 8   Global Step: 14810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:26:42,705-Speed 9391.34 samples/sec   Loss 3.2882   LearningRate 0.0008   Epoch: 8   Global Step: 14820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:27:08,825-Speed 9409.26 samples/sec   Loss 3.2802   LearningRate 0.0008   Epoch: 8   Global Step: 14830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:27:34,960-Speed 9403.83 samples/sec   Loss 3.2593   LearningRate 0.0008   Epoch: 8   Global Step: 14840   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:28:01,101-Speed 9401.93 samples/sec   Loss 3.2722   LearningRate 0.0008   Epoch: 8   Global Step: 14850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:28:27,246-Speed 9400.36 samples/sec   Loss 3.2574   LearningRate 0.0008   Epoch: 8   Global Step: 14860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:28:53,493-Speed 9363.78 samples/sec   Loss 3.2720   LearningRate 0.0008   Epoch: 8   Global Step: 14870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:29:19,634-Speed 9401.95 samples/sec   Loss 3.2669   LearningRate 0.0008   Epoch: 8   Global Step: 14880   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:29:45,935-Speed 9344.51 samples/sec   Loss 3.2746   LearningRate 0.0008   Epoch: 8   Global Step: 14890   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:30:12,081-Speed 9400.05 samples/sec   Loss 3.2458   LearningRate 0.0008   Epoch: 8   Global Step: 14900   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:30:38,244-Speed 9393.84 samples/sec   Loss 3.2525   LearningRate 0.0008   Epoch: 8   Global Step: 14910   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:31:04,345-Speed 9415.99 samples/sec   Loss 3.2942   LearningRate 0.0008   Epoch: 8   Global Step: 14920   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:31:30,511-Speed 9392.90 samples/sec   Loss 3.2507   LearningRate 0.0008   Epoch: 8   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:31:56,609-Speed 9417.11 samples/sec   Loss 3.2510   LearningRate 0.0008   Epoch: 8   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:32:22,692-Speed 9422.60 samples/sec   Loss 3.2242   LearningRate 0.0008   Epoch: 8   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:32:48,862-Speed 9391.52 samples/sec   Loss 3.2485   LearningRate 0.0008   Epoch: 8   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:33:14,990-Speed 9406.35 samples/sec   Loss 3.2569   LearningRate 0.0008   Epoch: 8   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:33:41,191-Speed 9380.39 samples/sec   Loss 3.2587   LearningRate 0.0008   Epoch: 8   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:34:07,439-Speed 9363.46 samples/sec   Loss 3.2562   LearningRate 0.0008   Epoch: 8   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:34:33,527-Speed 9420.59 samples/sec   Loss 3.2714   LearningRate 0.0008   Epoch: 8   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:34:59,702-Speed 9389.86 samples/sec   Loss 3.2206   LearningRate 0.0008   Epoch: 8   Global Step: 15010   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:35:25,881-Speed 9387.87 samples/sec   Loss 3.2320   LearningRate 0.0008   Epoch: 8   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:35:52,098-Speed 9374.19 samples/sec   Loss 3.2518   LearningRate 0.0008   Epoch: 8   Global Step: 15030   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:36:18,189-Speed 9419.81 samples/sec   Loss 3.2321   LearningRate 0.0008   Epoch: 8   Global Step: 15040   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:36:44,243-Speed 9433.04 samples/sec   Loss 3.2069   LearningRate 0.0008   Epoch: 8   Global Step: 15050   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:37:10,399-Speed 9396.23 samples/sec   Loss 3.2311   LearningRate 0.0008   Epoch: 8   Global Step: 15060   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:37:36,560-Speed 9394.66 samples/sec   Loss 3.2498   LearningRate 0.0008   Epoch: 8   Global Step: 15070   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:38:02,659-Speed 9416.70 samples/sec   Loss 3.2269   LearningRate 0.0008   Epoch: 8   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:38:28,809-Speed 9398.89 samples/sec   Loss 3.2257   LearningRate 0.0008   Epoch: 8   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:38:55,001-Speed 9383.18 samples/sec   Loss 3.1937   LearningRate 0.0008   Epoch: 8   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:39:21,295-Speed 9347.11 samples/sec   Loss 3.1960   LearningRate 0.0008   Epoch: 8   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:39:47,529-Speed 9368.28 samples/sec   Loss 3.2047   LearningRate 0.0008   Epoch: 8   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:40:13,737-Speed 9377.94 samples/sec   Loss 3.2201   LearningRate 0.0008   Epoch: 8   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:40:39,962-Speed 9371.62 samples/sec   Loss 3.2234   LearningRate 0.0008   Epoch: 8   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:41:06,171-Speed 9377.38 samples/sec   Loss 3.1935   LearningRate 0.0008   Epoch: 8   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:41:32,423-Speed 9361.95 samples/sec   Loss 3.1971   LearningRate 0.0008   Epoch: 8   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:41:58,594-Speed 9391.13 samples/sec   Loss 3.2164   LearningRate 0.0008   Epoch: 8   Global Step: 15170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:42:24,756-Speed 9394.21 samples/sec   Loss 3.2310   LearningRate 0.0008   Epoch: 8   Global Step: 15180   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-03-05 07:42:50,919-Speed 9393.66 samples/sec   Loss 3.1851   LearningRate 0.0008   Epoch: 8   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:43:17,259-Speed 9330.63 samples/sec   Loss 3.1924   LearningRate 0.0008   Epoch: 8   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:43:43,533-Speed 9354.32 samples/sec   Loss 3.1934   LearningRate 0.0008   Epoch: 8   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:44:09,744-Speed 9376.80 samples/sec   Loss 3.2017   LearningRate 0.0008   Epoch: 8   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:44:35,935-Speed 9383.82 samples/sec   Loss 3.1937   LearningRate 0.0008   Epoch: 8   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:45:02,130-Speed 9382.39 samples/sec   Loss 3.1985   LearningRate 0.0008   Epoch: 8   Global Step: 15240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:45:28,345-Speed 9375.28 samples/sec   Loss 3.1966   LearningRate 0.0007   Epoch: 8   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:45:54,585-Speed 9366.02 samples/sec   Loss 3.2228   LearningRate 0.0007   Epoch: 8   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:46:20,699-Speed 9411.50 samples/sec   Loss 3.1849   LearningRate 0.0007   Epoch: 8   Global Step: 15270   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:46:46,853-Speed 9396.88 samples/sec   Loss 3.2044   LearningRate 0.0007   Epoch: 8   Global Step: 15280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:47:13,061-Speed 9378.00 samples/sec   Loss 3.1780   LearningRate 0.0007   Epoch: 8   Global Step: 15290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:47:39,211-Speed 9398.54 samples/sec   Loss 3.1843   LearningRate 0.0007   Epoch: 8   Global Step: 15300   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:48:05,343-Speed 9405.11 samples/sec   Loss 3.2191   LearningRate 0.0007   Epoch: 8   Global Step: 15310   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:48:31,611-Speed 9356.42 samples/sec   Loss 3.2112   LearningRate 0.0007   Epoch: 8   Global Step: 15320   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:48:57,876-Speed 9357.35 samples/sec   Loss 3.1943   LearningRate 0.0007   Epoch: 8   Global Step: 15330   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:49:24,037-Speed 9394.53 samples/sec   Loss 3.1756   LearningRate 0.0007   Epoch: 8   Global Step: 15340   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:49:50,178-Speed 9402.05 samples/sec   Loss 3.1782   LearningRate 0.0007   Epoch: 8   Global Step: 15350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:50:16,373-Speed 9382.34 samples/sec   Loss 3.1768   LearningRate 0.0007   Epoch: 8   Global Step: 15360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:50:42,644-Speed 9355.07 samples/sec   Loss 3.1806   LearningRate 0.0007   Epoch: 8   Global Step: 15370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 07:51:08,823-Speed 9388.04 samples/sec   Loss 3.1735   LearningRate 0.0007   Epoch: 8   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:51:35,000-Speed 9388.67 samples/sec   Loss 3.1783   LearningRate 0.0007   Epoch: 8   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:52:01,199-Speed 9381.00 samples/sec   Loss 3.2170   LearningRate 0.0007   Epoch: 8   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:52:27,339-Speed 9402.25 samples/sec   Loss 3.1750   LearningRate 0.0007   Epoch: 8   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:52:53,517-Speed 9388.18 samples/sec   Loss 3.1415   LearningRate 0.0007   Epoch: 8   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:53:19,687-Speed 9391.38 samples/sec   Loss 3.1730   LearningRate 0.0007   Epoch: 8   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:53:45,837-Speed 9398.63 samples/sec   Loss 3.1888   LearningRate 0.0007   Epoch: 8   Global Step: 15440   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:54:11,990-Speed 9397.40 samples/sec   Loss 3.1697   LearningRate 0.0007   Epoch: 8   Global Step: 15450   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:54:38,173-Speed 9386.44 samples/sec   Loss 3.1863   LearningRate 0.0007   Epoch: 8   Global Step: 15460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:55:04,361-Speed 9385.06 samples/sec   Loss 3.1882   LearningRate 0.0007   Epoch: 8   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:55:30,475-Speed 9411.44 samples/sec   Loss 3.1687   LearningRate 0.0007   Epoch: 8   Global Step: 15480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:55:56,635-Speed 9394.95 samples/sec   Loss 3.1669   LearningRate 0.0007   Epoch: 8   Global Step: 15490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:56:22,822-Speed 9385.12 samples/sec   Loss 3.1795   LearningRate 0.0007   Epoch: 8   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:56:49,128-Speed 9342.72 samples/sec   Loss 3.1821   LearningRate 0.0007   Epoch: 8   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 07:57:15,326-Speed 9381.35 samples/sec   Loss 3.2000   LearningRate 0.0007   Epoch: 8   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 07:57:41,532-Speed 9378.24 samples/sec   Loss 3.1848   LearningRate 0.0007   Epoch: 8   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 07:58:07,774-Speed 9366.01 samples/sec   Loss 3.1626   LearningRate 0.0007   Epoch: 8   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 07:58:33,959-Speed 9385.95 samples/sec   Loss 3.2091   LearningRate 0.0007   Epoch: 8   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 07:59:53,376-Speed 3094.61 samples/sec   Loss 3.1812   LearningRate 0.0007   Epoch: 9   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-03-05 08:00:19,323-Speed 9472.12 samples/sec   Loss 3.1202   LearningRate 0.0007   Epoch: 9   Global Step: 15570   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-03-05 08:00:45,460-Speed 9403.40 samples/sec   Loss 3.0903   LearningRate 0.0007   Epoch: 9   Global Step: 15580   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:01:11,599-Speed 9402.33 samples/sec   Loss 3.1231   LearningRate 0.0007   Epoch: 9   Global Step: 15590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:01:37,671-Speed 9426.83 samples/sec   Loss 3.1271   LearningRate 0.0007   Epoch: 9   Global Step: 15600   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:02:03,790-Speed 9409.48 samples/sec   Loss 3.1007   LearningRate 0.0007   Epoch: 9   Global Step: 15610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:02:29,843-Speed 9433.42 samples/sec   Loss 3.1237   LearningRate 0.0007   Epoch: 9   Global Step: 15620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:02:55,949-Speed 9414.59 samples/sec   Loss 3.0881   LearningRate 0.0007   Epoch: 9   Global Step: 15630   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:03:22,085-Speed 9403.24 samples/sec   Loss 3.1384   LearningRate 0.0007   Epoch: 9   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:03:48,248-Speed 9393.89 samples/sec   Loss 3.1171   LearningRate 0.0007   Epoch: 9   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:04:14,303-Speed 9432.98 samples/sec   Loss 3.1397   LearningRate 0.0007   Epoch: 9   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:04:40,396-Speed 9418.80 samples/sec   Loss 3.1069   LearningRate 0.0007   Epoch: 9   Global Step: 15670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:05:06,482-Speed 9421.78 samples/sec   Loss 3.1185   LearningRate 0.0007   Epoch: 9   Global Step: 15680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:05:32,513-Speed 9441.30 samples/sec   Loss 3.1532   LearningRate 0.0007   Epoch: 9   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:05:58,553-Speed 9438.32 samples/sec   Loss 3.1230   LearningRate 0.0007   Epoch: 9   Global Step: 15700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:06:24,554-Speed 9452.54 samples/sec   Loss 3.1192   LearningRate 0.0007   Epoch: 9   Global Step: 15710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:06:50,639-Speed 9422.18 samples/sec   Loss 3.1188   LearningRate 0.0007   Epoch: 9   Global Step: 15720   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:07:16,716-Speed 9424.63 samples/sec   Loss 3.1216   LearningRate 0.0007   Epoch: 9   Global Step: 15730   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:07:42,814-Speed 9417.40 samples/sec   Loss 3.1094   LearningRate 0.0007   Epoch: 9   Global Step: 15740   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:08:09,010-Speed 9382.03 samples/sec   Loss 3.1319   LearningRate 0.0007   Epoch: 9   Global Step: 15750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:08:35,103-Speed 9419.25 samples/sec   Loss 3.1117   LearningRate 0.0007   Epoch: 9   Global Step: 15760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:09:01,259-Speed 9396.46 samples/sec   Loss 3.1127   LearningRate 0.0007   Epoch: 9   Global Step: 15770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:09:27,378-Speed 9409.75 samples/sec   Loss 3.1381   LearningRate 0.0007   Epoch: 9   Global Step: 15780   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:09:53,509-Speed 9405.76 samples/sec   Loss 3.0858   LearningRate 0.0007   Epoch: 9   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:10:19,664-Speed 9396.84 samples/sec   Loss 3.1102   LearningRate 0.0007   Epoch: 9   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:10:45,819-Speed 9396.57 samples/sec   Loss 3.0933   LearningRate 0.0007   Epoch: 9   Global Step: 15810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:11:12,098-Speed 9352.71 samples/sec   Loss 3.1239   LearningRate 0.0007   Epoch: 9   Global Step: 15820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:11:38,306-Speed 9377.83 samples/sec   Loss 3.1005   LearningRate 0.0007   Epoch: 9   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:12:04,478-Speed 9390.63 samples/sec   Loss 3.1120   LearningRate 0.0007   Epoch: 9   Global Step: 15840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:12:30,624-Speed 9399.62 samples/sec   Loss 3.1011   LearningRate 0.0007   Epoch: 9   Global Step: 15850   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:12:56,796-Speed 9391.07 samples/sec   Loss 3.1059   LearningRate 0.0007   Epoch: 9   Global Step: 15860   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:13:22,913-Speed 9410.49 samples/sec   Loss 3.0949   LearningRate 0.0007   Epoch: 9   Global Step: 15870   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:13:49,071-Speed 9395.44 samples/sec   Loss 3.0945   LearningRate 0.0007   Epoch: 9   Global Step: 15880   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:14:15,287-Speed 9374.96 samples/sec   Loss 3.1159   LearningRate 0.0007   Epoch: 9   Global Step: 15890   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:14:41,526-Speed 9366.97 samples/sec   Loss 3.1021   LearningRate 0.0007   Epoch: 9   Global Step: 15900   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:15:07,679-Speed 9397.63 samples/sec   Loss 3.1105   LearningRate 0.0007   Epoch: 9   Global Step: 15910   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:15:33,916-Speed 9367.08 samples/sec   Loss 3.1044   LearningRate 0.0007   Epoch: 9   Global Step: 15920   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:16:00,089-Speed 9390.00 samples/sec   Loss 3.0999   LearningRate 0.0007   Epoch: 9   Global Step: 15930   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:16:26,259-Speed 9391.50 samples/sec   Loss 3.0826   LearningRate 0.0007   Epoch: 9   Global Step: 15940   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:16:52,364-Speed 9414.76 samples/sec   Loss 3.0919   LearningRate 0.0007   Epoch: 9   Global Step: 15950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:17:18,488-Speed 9407.86 samples/sec   Loss 3.0644   LearningRate 0.0007   Epoch: 9   Global Step: 15960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:17:44,677-Speed 9384.57 samples/sec   Loss 3.0652   LearningRate 0.0007   Epoch: 9   Global Step: 15970   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:18:10,883-Speed 9378.27 samples/sec   Loss 3.0758   LearningRate 0.0007   Epoch: 9   Global Step: 15980   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:18:37,068-Speed 9385.87 samples/sec   Loss 3.0803   LearningRate 0.0007   Epoch: 9   Global Step: 15990   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:19:03,264-Speed 9382.28 samples/sec   Loss 3.0964   LearningRate 0.0007   Epoch: 9   Global Step: 16000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:19:29,484-Speed 9373.30 samples/sec   Loss 3.1086   LearningRate 0.0007   Epoch: 9   Global Step: 16010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:19:55,714-Speed 9369.93 samples/sec   Loss 3.0965   LearningRate 0.0007   Epoch: 9   Global Step: 16020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:20:22,026-Speed 9340.77 samples/sec   Loss 3.0666   LearningRate 0.0007   Epoch: 9   Global Step: 16030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:20:48,155-Speed 9405.82 samples/sec   Loss 3.0581   LearningRate 0.0007   Epoch: 9   Global Step: 16040   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:21:14,363-Speed 9377.86 samples/sec   Loss 3.0784   LearningRate 0.0007   Epoch: 9   Global Step: 16050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:21:40,661-Speed 9345.60 samples/sec   Loss 3.0557   LearningRate 0.0007   Epoch: 9   Global Step: 16060   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:22:06,841-Speed 9387.88 samples/sec   Loss 3.0783   LearningRate 0.0007   Epoch: 9   Global Step: 16070   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:22:33,103-Speed 9358.39 samples/sec   Loss 3.1065   LearningRate 0.0007   Epoch: 9   Global Step: 16080   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:22:59,341-Speed 9367.05 samples/sec   Loss 3.0866   LearningRate 0.0007   Epoch: 9   Global Step: 16090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:23:25,574-Speed 9368.78 samples/sec   Loss 3.0569   LearningRate 0.0007   Epoch: 9   Global Step: 16100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:23:51,815-Speed 9366.02 samples/sec   Loss 3.0559   LearningRate 0.0007   Epoch: 9   Global Step: 16110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:24:17,969-Speed 9396.92 samples/sec   Loss 3.0299   LearningRate 0.0007   Epoch: 9   Global Step: 16120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:24:44,240-Speed 9355.34 samples/sec   Loss 3.0363   LearningRate 0.0007   Epoch: 9   Global Step: 16130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:25:10,349-Speed 9413.33 samples/sec   Loss 3.0381   LearningRate 0.0007   Epoch: 9   Global Step: 16140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:25:36,643-Speed 9347.20 samples/sec   Loss 3.0678   LearningRate 0.0007   Epoch: 9   Global Step: 16150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:26:02,911-Speed 9356.59 samples/sec   Loss 3.0430   LearningRate 0.0007   Epoch: 9   Global Step: 16160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:26:29,017-Speed 9414.32 samples/sec   Loss 3.0750   LearningRate 0.0007   Epoch: 9   Global Step: 16170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:26:55,164-Speed 9399.90 samples/sec   Loss 3.0746   LearningRate 0.0007   Epoch: 9   Global Step: 16180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:27:21,277-Speed 9411.66 samples/sec   Loss 3.0477   LearningRate 0.0007   Epoch: 9   Global Step: 16190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:27:47,402-Speed 9407.62 samples/sec   Loss 3.0435   LearningRate 0.0007   Epoch: 9   Global Step: 16200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:28:13,555-Speed 9397.52 samples/sec   Loss 3.0406   LearningRate 0.0007   Epoch: 9   Global Step: 16210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:28:39,764-Speed 9377.13 samples/sec   Loss 3.0599   LearningRate 0.0007   Epoch: 9   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:29:05,942-Speed 9388.52 samples/sec   Loss 3.0609   LearningRate 0.0007   Epoch: 9   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:29:32,026-Speed 9422.25 samples/sec   Loss 3.0360   LearningRate 0.0007   Epoch: 9   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:29:58,117-Speed 9419.79 samples/sec   Loss 3.0184   LearningRate 0.0007   Epoch: 9   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:30:24,219-Speed 9415.80 samples/sec   Loss 3.0477   LearningRate 0.0007   Epoch: 9   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:30:50,384-Speed 9392.90 samples/sec   Loss 3.0331   LearningRate 0.0007   Epoch: 9   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:31:16,571-Speed 9385.45 samples/sec   Loss 3.0304   LearningRate 0.0007   Epoch: 9   Global Step: 16280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:31:42,729-Speed 9395.69 samples/sec   Loss 3.0245   LearningRate 0.0007   Epoch: 9   Global Step: 16290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:32:08,888-Speed 9395.36 samples/sec   Loss 3.0443   LearningRate 0.0007   Epoch: 9   Global Step: 16300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:32:34,950-Speed 9430.21 samples/sec   Loss 3.0837   LearningRate 0.0007   Epoch: 9   Global Step: 16310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:33:01,032-Speed 9422.72 samples/sec   Loss 3.0492   LearningRate 0.0007   Epoch: 9   Global Step: 16320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:33:27,148-Speed 9411.04 samples/sec   Loss 3.0330   LearningRate 0.0007   Epoch: 9   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:33:53,287-Speed 9402.43 samples/sec   Loss 3.0146   LearningRate 0.0007   Epoch: 9   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:34:19,446-Speed 9395.21 samples/sec   Loss 3.0164   LearningRate 0.0007   Epoch: 9   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:34:45,646-Speed 9380.40 samples/sec   Loss 3.0126   LearningRate 0.0007   Epoch: 9   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:35:11,766-Speed 9409.57 samples/sec   Loss 3.0235   LearningRate 0.0007   Epoch: 9   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:35:37,918-Speed 9397.82 samples/sec   Loss 3.0383   LearningRate 0.0007   Epoch: 9   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:36:04,012-Speed 9418.66 samples/sec   Loss 3.0221   LearningRate 0.0007   Epoch: 9   Global Step: 16390   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:36:30,100-Speed 9420.72 samples/sec   Loss 3.0070   LearningRate 0.0007   Epoch: 9   Global Step: 16400   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:36:56,299-Speed 9380.79 samples/sec   Loss 3.0024   LearningRate 0.0007   Epoch: 9   Global Step: 16410   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:37:22,514-Speed 9375.14 samples/sec   Loss 3.0245   LearningRate 0.0007   Epoch: 9   Global Step: 16420   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:37:48,706-Speed 9383.66 samples/sec   Loss 2.9973   LearningRate 0.0007   Epoch: 9   Global Step: 16430   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:38:14,758-Speed 9434.75 samples/sec   Loss 3.0027   LearningRate 0.0007   Epoch: 9   Global Step: 16440   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:38:40,898-Speed 9402.24 samples/sec   Loss 3.0252   LearningRate 0.0007   Epoch: 9   Global Step: 16450   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:39:06,988-Speed 9420.07 samples/sec   Loss 3.0157   LearningRate 0.0007   Epoch: 9   Global Step: 16460   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:39:33,123-Speed 9404.24 samples/sec   Loss 2.9890   LearningRate 0.0007   Epoch: 9   Global Step: 16470   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:39:59,343-Speed 9373.46 samples/sec   Loss 3.0053   LearningRate 0.0007   Epoch: 9   Global Step: 16480   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:40:25,591-Speed 9363.21 samples/sec   Loss 3.0051   LearningRate 0.0007   Epoch: 9   Global Step: 16490   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:40:51,734-Speed 9401.00 samples/sec   Loss 2.9925   LearningRate 0.0007   Epoch: 9   Global Step: 16500   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:41:17,883-Speed 9398.94 samples/sec   Loss 2.9981   LearningRate 0.0007   Epoch: 9   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:41:43,952-Speed 9427.79 samples/sec   Loss 2.9865   LearningRate 0.0007   Epoch: 9   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:42:10,093-Speed 9401.45 samples/sec   Loss 3.0056   LearningRate 0.0007   Epoch: 9   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:42:36,232-Speed 9402.64 samples/sec   Loss 2.9794   LearningRate 0.0007   Epoch: 9   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-03-05 08:43:02,297-Speed 9429.15 samples/sec   Loss 2.9878   LearningRate 0.0007   Epoch: 9   Global Step: 16550   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:43:28,450-Speed 9397.49 samples/sec   Loss 2.9710   LearningRate 0.0007   Epoch: 9   Global Step: 16560   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:43:54,642-Speed 9384.08 samples/sec   Loss 3.0190   LearningRate 0.0007   Epoch: 9   Global Step: 16570   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:44:20,812-Speed 9391.13 samples/sec   Loss 3.0196   LearningRate 0.0007   Epoch: 9   Global Step: 16580   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:44:46,901-Speed 9420.74 samples/sec   Loss 3.0033   LearningRate 0.0007   Epoch: 9   Global Step: 16590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:45:13,014-Speed 9411.73 samples/sec   Loss 2.9768   LearningRate 0.0007   Epoch: 9   Global Step: 16600   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:45:39,205-Speed 9383.74 samples/sec   Loss 2.9773   LearningRate 0.0007   Epoch: 9   Global Step: 16610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:46:05,345-Speed 9402.39 samples/sec   Loss 2.9800   LearningRate 0.0007   Epoch: 9   Global Step: 16620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:46:31,543-Speed 9381.00 samples/sec   Loss 2.9570   LearningRate 0.0007   Epoch: 9   Global Step: 16630   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:46:57,691-Speed 9399.38 samples/sec   Loss 2.9648   LearningRate 0.0007   Epoch: 9   Global Step: 16640   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:47:23,850-Speed 9395.34 samples/sec   Loss 2.9993   LearningRate 0.0007   Epoch: 9   Global Step: 16650   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:47:50,032-Speed 9387.15 samples/sec   Loss 2.9757   LearningRate 0.0007   Epoch: 9   Global Step: 16660   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:48:16,170-Speed 9402.70 samples/sec   Loss 2.9682   LearningRate 0.0007   Epoch: 9   Global Step: 16670   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:48:42,308-Speed 9402.92 samples/sec   Loss 2.9782   LearningRate 0.0007   Epoch: 9   Global Step: 16680   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:49:08,392-Speed 9422.29 samples/sec   Loss 2.9838   LearningRate 0.0007   Epoch: 9   Global Step: 16690   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:49:34,445-Speed 9433.81 samples/sec   Loss 2.9786   LearningRate 0.0007   Epoch: 9   Global Step: 16700   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:50:00,560-Speed 9410.85 samples/sec   Loss 3.0351   LearningRate 0.0007   Epoch: 9   Global Step: 16710   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:50:26,665-Speed 9415.03 samples/sec   Loss 2.9979   LearningRate 0.0007   Epoch: 9   Global Step: 16720   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:50:52,844-Speed 9387.83 samples/sec   Loss 2.9558   LearningRate 0.0007   Epoch: 9   Global Step: 16730   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:51:19,005-Speed 9394.46 samples/sec   Loss 2.9487   LearningRate 0.0007   Epoch: 9   Global Step: 16740   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:51:45,115-Speed 9413.10 samples/sec   Loss 2.9659   LearningRate 0.0007   Epoch: 9   Global Step: 16750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:52:11,318-Speed 9379.46 samples/sec   Loss 2.9645   LearningRate 0.0007   Epoch: 9   Global Step: 16760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:52:37,483-Speed 9393.24 samples/sec   Loss 2.9719   LearningRate 0.0007   Epoch: 9   Global Step: 16770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:53:03,567-Speed 9422.31 samples/sec   Loss 2.9635   LearningRate 0.0007   Epoch: 9   Global Step: 16780   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-03-05 08:53:29,603-Speed 9439.72 samples/sec   Loss 2.9905   LearningRate 0.0007   Epoch: 9   Global Step: 16790   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:53:55,816-Speed 9375.72 samples/sec   Loss 2.9841   LearningRate 0.0007   Epoch: 9   Global Step: 16800   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:54:21,965-Speed 9399.31 samples/sec   Loss 2.9569   LearningRate 0.0007   Epoch: 9   Global Step: 16810   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:54:48,103-Speed 9402.91 samples/sec   Loss 2.9525   LearningRate 0.0007   Epoch: 9   Global Step: 16820   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:55:14,254-Speed 9398.02 samples/sec   Loss 2.9427   LearningRate 0.0007   Epoch: 9   Global Step: 16830   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:55:40,375-Speed 9409.12 samples/sec   Loss 2.9483   LearningRate 0.0007   Epoch: 9   Global Step: 16840   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:56:06,558-Speed 9386.50 samples/sec   Loss 2.9533   LearningRate 0.0007   Epoch: 9   Global Step: 16850   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:56:32,708-Speed 9398.78 samples/sec   Loss 2.9537   LearningRate 0.0007   Epoch: 9   Global Step: 16860   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:56:58,822-Speed 9411.34 samples/sec   Loss 2.9897   LearningRate 0.0007   Epoch: 9   Global Step: 16870   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-03-05 08:57:24,920-Speed 9417.13 samples/sec   Loss 2.9768   LearningRate 0.0007   Epoch: 9   Global Step: 16880   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-03-05 08:57:51,100-Speed 9387.81 samples/sec   Loss 2.9272   LearningRate 0.0007   Epoch: 9   Global Step: 16890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 08:58:17,284-Speed 9386.49 samples/sec   Loss 2.9290   LearningRate 0.0007   Epoch: 9   Global Step: 16900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 08:58:43,414-Speed 9405.58 samples/sec   Loss 2.9355   LearningRate 0.0007   Epoch: 9   Global Step: 16910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 08:59:09,572-Speed 9395.71 samples/sec   Loss 2.9499   LearningRate 0.0007   Epoch: 9   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 08:59:35,714-Speed 9401.14 samples/sec   Loss 2.9411   LearningRate 0.0007   Epoch: 9   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:00:01,890-Speed 9389.48 samples/sec   Loss 2.9458   LearningRate 0.0007   Epoch: 9   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:00:28,038-Speed 9399.08 samples/sec   Loss 2.9324   LearningRate 0.0007   Epoch: 9   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:00:54,164-Speed 9407.15 samples/sec   Loss 2.9489   LearningRate 0.0007   Epoch: 9   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:01:20,345-Speed 9387.61 samples/sec   Loss 2.9622   LearningRate 0.0007   Epoch: 9   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:01:46,429-Speed 9422.34 samples/sec   Loss 2.9515   LearningRate 0.0007   Epoch: 9   Global Step: 16980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:02:12,548-Speed 9409.53 samples/sec   Loss 2.9242   LearningRate 0.0007   Epoch: 9   Global Step: 16990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:02:38,700-Speed 9398.12 samples/sec   Loss 2.9399   LearningRate 0.0007   Epoch: 9   Global Step: 17000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:03:04,813-Speed 9411.63 samples/sec   Loss 2.9142   LearningRate 0.0007   Epoch: 9   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:03:30,905-Speed 9419.68 samples/sec   Loss 2.9393   LearningRate 0.0007   Epoch: 9   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:03:57,097-Speed 9383.17 samples/sec   Loss 2.9283   LearningRate 0.0007   Epoch: 9   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:04:23,212-Speed 9411.30 samples/sec   Loss 2.9262   LearningRate 0.0007   Epoch: 9   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:04:49,323-Speed 9412.46 samples/sec   Loss 2.9699   LearningRate 0.0007   Epoch: 9   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:05:15,448-Speed 9407.64 samples/sec   Loss 2.9270   LearningRate 0.0007   Epoch: 9   Global Step: 17060   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:05:41,594-Speed 9399.65 samples/sec   Loss 2.9324   LearningRate 0.0007   Epoch: 9   Global Step: 17070   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:06:07,742-Speed 9399.34 samples/sec   Loss 2.9241   LearningRate 0.0007   Epoch: 9   Global Step: 17080   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:06:33,860-Speed 9409.86 samples/sec   Loss 2.9179   LearningRate 0.0007   Epoch: 9   Global Step: 17090   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:07:00,022-Speed 9394.76 samples/sec   Loss 2.9059   LearningRate 0.0007   Epoch: 9   Global Step: 17100   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:07:26,183-Speed 9394.75 samples/sec   Loss 2.9089   LearningRate 0.0007   Epoch: 9   Global Step: 17110   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:07:52,459-Speed 9353.50 samples/sec   Loss 2.9197   LearningRate 0.0007   Epoch: 9   Global Step: 17120   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:08:18,759-Speed 9344.72 samples/sec   Loss 2.9041   LearningRate 0.0007   Epoch: 9   Global Step: 17130   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:08:44,862-Speed 9415.63 samples/sec   Loss 2.9159   LearningRate 0.0007   Epoch: 9   Global Step: 17140   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:09:10,984-Speed 9408.68 samples/sec   Loss 2.9214   LearningRate 0.0007   Epoch: 9   Global Step: 17150   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:09:37,148-Speed 9393.56 samples/sec   Loss 2.9335   LearningRate 0.0007   Epoch: 9   Global Step: 17160   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:10:03,342-Speed 9382.87 samples/sec   Loss 2.9166   LearningRate 0.0007   Epoch: 9   Global Step: 17170   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:10:29,502-Speed 9394.68 samples/sec   Loss 2.9064   LearningRate 0.0007   Epoch: 9   Global Step: 17180   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:10:55,632-Speed 9405.55 samples/sec   Loss 2.9413   LearningRate 0.0007   Epoch: 9   Global Step: 17190   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:11:21,685-Speed 9433.93 samples/sec   Loss 2.9068   LearningRate 0.0007   Epoch: 9   Global Step: 17200   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:11:47,818-Speed 9404.63 samples/sec   Loss 2.9073   LearningRate 0.0007   Epoch: 9   Global Step: 17210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:12:13,923-Speed 9414.78 samples/sec   Loss 2.9068   LearningRate 0.0007   Epoch: 9   Global Step: 17220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:12:40,042-Speed 9409.60 samples/sec   Loss 2.9316   LearningRate 0.0007   Epoch: 9   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:13:06,144-Speed 9415.64 samples/sec   Loss 2.9984   LearningRate 0.0007   Epoch: 9   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:13:32,287-Speed 9401.47 samples/sec   Loss 2.9489   LearningRate 0.0007   Epoch: 9   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:13:58,408-Speed 9408.64 samples/sec   Loss 2.9391   LearningRate 0.0007   Epoch: 9   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:14:24,536-Speed 9406.45 samples/sec   Loss 2.9391   LearningRate 0.0007   Epoch: 9   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:14:50,603-Speed 9428.64 samples/sec   Loss 2.9434   LearningRate 0.0007   Epoch: 9   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:16:10,547-Speed 3074.23 samples/sec   Loss 2.8809   LearningRate 0.0007   Epoch: 10   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:16:36,526-Speed 9460.29 samples/sec   Loss 2.8619   LearningRate 0.0007   Epoch: 10   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:17:02,596-Speed 9427.30 samples/sec   Loss 2.8712   LearningRate 0.0007   Epoch: 10   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:17:28,775-Speed 9388.37 samples/sec   Loss 2.8709   LearningRate 0.0007   Epoch: 10   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:17:54,948-Speed 9390.24 samples/sec   Loss 2.8637   LearningRate 0.0007   Epoch: 10   Global Step: 17330   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-03-05 09:18:21,043-Speed 9419.17 samples/sec   Loss 2.8764   LearningRate 0.0007   Epoch: 10   Global Step: 17340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:18:47,080-Speed 9439.29 samples/sec   Loss 2.8991   LearningRate 0.0007   Epoch: 10   Global Step: 17350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:19:13,179-Speed 9416.72 samples/sec   Loss 2.8714   LearningRate 0.0007   Epoch: 10   Global Step: 17360   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:19:39,224-Speed 9436.51 samples/sec   Loss 2.8766   LearningRate 0.0007   Epoch: 10   Global Step: 17370   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:20:05,271-Speed 9435.84 samples/sec   Loss 2.8897   LearningRate 0.0007   Epoch: 10   Global Step: 17380   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:20:31,372-Speed 9416.44 samples/sec   Loss 2.8740   LearningRate 0.0007   Epoch: 10   Global Step: 17390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:20:57,441-Speed 9427.62 samples/sec   Loss 2.8616   LearningRate 0.0007   Epoch: 10   Global Step: 17400   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:21:23,565-Speed 9408.34 samples/sec   Loss 2.8478   LearningRate 0.0007   Epoch: 10   Global Step: 17410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:21:49,832-Speed 9356.59 samples/sec   Loss 2.8564   LearningRate 0.0007   Epoch: 10   Global Step: 17420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:22:15,886-Speed 9433.31 samples/sec   Loss 2.8894   LearningRate 0.0007   Epoch: 10   Global Step: 17430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:22:42,018-Speed 9404.88 samples/sec   Loss 2.8966   LearningRate 0.0007   Epoch: 10   Global Step: 17440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:23:08,294-Speed 9353.50 samples/sec   Loss 2.8629   LearningRate 0.0007   Epoch: 10   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:23:34,418-Speed 9407.79 samples/sec   Loss 2.8977   LearningRate 0.0007   Epoch: 10   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:24:00,532-Speed 9411.18 samples/sec   Loss 2.8642   LearningRate 0.0007   Epoch: 10   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:24:26,648-Speed 9410.79 samples/sec   Loss 2.8771   LearningRate 0.0007   Epoch: 10   Global Step: 17480   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:24:52,841-Speed 9383.39 samples/sec   Loss 2.8746   LearningRate 0.0007   Epoch: 10   Global Step: 17490   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:25:19,038-Speed 9381.49 samples/sec   Loss 2.8746   LearningRate 0.0007   Epoch: 10   Global Step: 17500   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:25:45,206-Speed 9392.08 samples/sec   Loss 2.8601   LearningRate 0.0007   Epoch: 10   Global Step: 17510   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:26:11,334-Speed 9406.24 samples/sec   Loss 2.8532   LearningRate 0.0007   Epoch: 10   Global Step: 17520   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:26:37,475-Speed 9401.67 samples/sec   Loss 2.8602   LearningRate 0.0007   Epoch: 10   Global Step: 17530   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:27:03,593-Speed 9410.19 samples/sec   Loss 2.8665   LearningRate 0.0007   Epoch: 10   Global Step: 17540   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:27:29,677-Speed 9422.44 samples/sec   Loss 2.8552   LearningRate 0.0007   Epoch: 10   Global Step: 17550   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:27:55,781-Speed 9415.34 samples/sec   Loss 2.8546   LearningRate 0.0007   Epoch: 10   Global Step: 17560   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:28:21,857-Speed 9425.04 samples/sec   Loss 2.8949   LearningRate 0.0007   Epoch: 10   Global Step: 17570   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:28:48,002-Speed 9400.37 samples/sec   Loss 2.8715   LearningRate 0.0007   Epoch: 10   Global Step: 17580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:29:14,155-Speed 9397.72 samples/sec   Loss 2.8475   LearningRate 0.0007   Epoch: 10   Global Step: 17590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:29:40,276-Speed 9409.07 samples/sec   Loss 2.8688   LearningRate 0.0007   Epoch: 10   Global Step: 17600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:30:06,375-Speed 9416.76 samples/sec   Loss 2.8766   LearningRate 0.0007   Epoch: 10   Global Step: 17610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:30:32,515-Speed 9402.20 samples/sec   Loss 2.8432   LearningRate 0.0007   Epoch: 10   Global Step: 17620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:30:58,594-Speed 9424.12 samples/sec   Loss 2.8595   LearningRate 0.0007   Epoch: 10   Global Step: 17630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:31:24,716-Speed 9409.37 samples/sec   Loss 2.8547   LearningRate 0.0007   Epoch: 10   Global Step: 17640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:31:50,840-Speed 9408.10 samples/sec   Loss 2.8429   LearningRate 0.0007   Epoch: 10   Global Step: 17650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:32:16,971-Speed 9405.23 samples/sec   Loss 2.8506   LearningRate 0.0007   Epoch: 10   Global Step: 17660   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:32:43,113-Speed 9401.37 samples/sec   Loss 2.8598   LearningRate 0.0007   Epoch: 10   Global Step: 17670   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:33:09,217-Speed 9415.29 samples/sec   Loss 2.8587   LearningRate 0.0007   Epoch: 10   Global Step: 17680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:33:35,385-Speed 9392.12 samples/sec   Loss 2.8437   LearningRate 0.0007   Epoch: 10   Global Step: 17690   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:34:01,520-Speed 9403.91 samples/sec   Loss 2.8354   LearningRate 0.0007   Epoch: 10   Global Step: 17700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:34:27,705-Speed 9385.64 samples/sec   Loss 2.8512   LearningRate 0.0007   Epoch: 10   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:34:53,892-Speed 9385.27 samples/sec   Loss 2.8371   LearningRate 0.0007   Epoch: 10   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:35:20,051-Speed 9395.38 samples/sec   Loss 2.8488   LearningRate 0.0007   Epoch: 10   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:35:46,226-Speed 9389.31 samples/sec   Loss 2.8511   LearningRate 0.0007   Epoch: 10   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:36:12,498-Speed 9355.07 samples/sec   Loss 2.8483   LearningRate 0.0007   Epoch: 10   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:36:38,681-Speed 9386.69 samples/sec   Loss 2.8423   LearningRate 0.0007   Epoch: 10   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:37:04,876-Speed 9382.25 samples/sec   Loss 2.8479   LearningRate 0.0007   Epoch: 10   Global Step: 17770   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:37:30,986-Speed 9413.02 samples/sec   Loss 2.8539   LearningRate 0.0007   Epoch: 10   Global Step: 17780   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:37:57,070-Speed 9422.17 samples/sec   Loss 2.8521   LearningRate 0.0007   Epoch: 10   Global Step: 17790   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:38:23,154-Speed 9422.09 samples/sec   Loss 2.8330   LearningRate 0.0007   Epoch: 10   Global Step: 17800   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:38:49,311-Speed 9396.06 samples/sec   Loss 2.8436   LearningRate 0.0007   Epoch: 10   Global Step: 17810   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:39:15,450-Speed 9402.40 samples/sec   Loss 2.8331   LearningRate 0.0007   Epoch: 10   Global Step: 17820   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:39:41,666-Speed 9375.08 samples/sec   Loss 2.8087   LearningRate 0.0007   Epoch: 10   Global Step: 17830   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:40:07,840-Speed 9389.97 samples/sec   Loss 2.8293   LearningRate 0.0007   Epoch: 10   Global Step: 17840   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:40:33,898-Speed 9431.83 samples/sec   Loss 2.8205   LearningRate 0.0007   Epoch: 10   Global Step: 17850   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:41:00,039-Speed 9401.84 samples/sec   Loss 2.8282   LearningRate 0.0007   Epoch: 10   Global Step: 17860   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:41:26,134-Speed 9418.26 samples/sec   Loss 2.8118   LearningRate 0.0007   Epoch: 10   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:41:52,306-Speed 9390.68 samples/sec   Loss 2.8204   LearningRate 0.0007   Epoch: 10   Global Step: 17880   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:42:18,487-Speed 9387.47 samples/sec   Loss 2.8361   LearningRate 0.0007   Epoch: 10   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:42:44,601-Speed 9411.14 samples/sec   Loss 2.8287   LearningRate 0.0007   Epoch: 10   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:43:10,732-Speed 9405.50 samples/sec   Loss 2.8390   LearningRate 0.0007   Epoch: 10   Global Step: 17910   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:43:36,836-Speed 9414.68 samples/sec   Loss 2.8110   LearningRate 0.0007   Epoch: 10   Global Step: 17920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:44:02,966-Speed 9405.92 samples/sec   Loss 2.8213   LearningRate 0.0007   Epoch: 10   Global Step: 17930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:44:29,079-Speed 9411.94 samples/sec   Loss 2.8048   LearningRate 0.0007   Epoch: 10   Global Step: 17940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:44:55,151-Speed 9426.57 samples/sec   Loss 2.8089   LearningRate 0.0007   Epoch: 10   Global Step: 17950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:45:21,282-Speed 9405.71 samples/sec   Loss 2.8352   LearningRate 0.0007   Epoch: 10   Global Step: 17960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:45:47,379-Speed 9417.73 samples/sec   Loss 2.8124   LearningRate 0.0007   Epoch: 10   Global Step: 17970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:46:13,617-Speed 9366.88 samples/sec   Loss 2.8098   LearningRate 0.0007   Epoch: 10   Global Step: 17980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:46:39,782-Speed 9392.99 samples/sec   Loss 2.8066   LearningRate 0.0007   Epoch: 10   Global Step: 17990   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:47:05,955-Speed 9390.28 samples/sec   Loss 2.8074   LearningRate 0.0007   Epoch: 10   Global Step: 18000   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:47:32,037-Speed 9423.24 samples/sec   Loss 2.8214   LearningRate 0.0007   Epoch: 10   Global Step: 18010   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:47:58,154-Speed 9410.25 samples/sec   Loss 2.8221   LearningRate 0.0007   Epoch: 10   Global Step: 18020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:48:24,248-Speed 9418.76 samples/sec   Loss 2.8083   LearningRate 0.0007   Epoch: 10   Global Step: 18030   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:48:50,352-Speed 9414.98 samples/sec   Loss 2.7758   LearningRate 0.0007   Epoch: 10   Global Step: 18040   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:49:16,380-Speed 9442.48 samples/sec   Loss 2.8225   LearningRate 0.0007   Epoch: 10   Global Step: 18050   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:49:42,516-Speed 9403.35 samples/sec   Loss 2.8184   LearningRate 0.0007   Epoch: 10   Global Step: 18060   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:50:08,589-Speed 9426.46 samples/sec   Loss 2.7717   LearningRate 0.0007   Epoch: 10   Global Step: 18070   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:50:34,749-Speed 9394.81 samples/sec   Loss 2.7877   LearningRate 0.0007   Epoch: 10   Global Step: 18080   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:51:00,920-Speed 9390.81 samples/sec   Loss 2.7942   LearningRate 0.0007   Epoch: 10   Global Step: 18090   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:51:27,081-Speed 9394.56 samples/sec   Loss 2.8230   LearningRate 0.0007   Epoch: 10   Global Step: 18100   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:51:53,333-Speed 9361.72 samples/sec   Loss 2.8154   LearningRate 0.0007   Epoch: 10   Global Step: 18110   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:52:19,445-Speed 9412.34 samples/sec   Loss 2.8054   LearningRate 0.0007   Epoch: 10   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:52:45,600-Speed 9396.66 samples/sec   Loss 2.7837   LearningRate 0.0007   Epoch: 10   Global Step: 18130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:53:11,774-Speed 9389.85 samples/sec   Loss 2.8134   LearningRate 0.0007   Epoch: 10   Global Step: 18140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:53:37,942-Speed 9391.84 samples/sec   Loss 2.7969   LearningRate 0.0007   Epoch: 10   Global Step: 18150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:54:04,107-Speed 9393.34 samples/sec   Loss 2.7867   LearningRate 0.0007   Epoch: 10   Global Step: 18160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-03-05 09:54:30,170-Speed 9430.69 samples/sec   Loss 2.8029   LearningRate 0.0007   Epoch: 10   Global Step: 18170   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:54:56,340-Speed 9391.35 samples/sec   Loss 2.8152   LearningRate 0.0007   Epoch: 10   Global Step: 18180   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:55:22,411-Speed 9427.17 samples/sec   Loss 2.7902   LearningRate 0.0007   Epoch: 10   Global Step: 18190   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:55:48,537-Speed 9406.80 samples/sec   Loss 2.7780   LearningRate 0.0007   Epoch: 10   Global Step: 18200   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:56:14,665-Speed 9406.76 samples/sec   Loss 2.7896   LearningRate 0.0007   Epoch: 10   Global Step: 18210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:56:40,715-Speed 9434.68 samples/sec   Loss 2.7602   LearningRate 0.0007   Epoch: 10   Global Step: 18220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-03-05 09:57:06,757-Speed 9437.35 samples/sec   Loss 2.7584   LearningRate 0.0007   Epoch: 10   Global Step: 18230   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 09:57:32,832-Speed 9425.74 samples/sec   Loss 2.7761   LearningRate 0.0007   Epoch: 10   Global Step: 18240   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 09:57:58,976-Speed 9400.55 samples/sec   Loss 2.7608   LearningRate 0.0007   Epoch: 10   Global Step: 18250   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 09:58:25,140-Speed 9393.34 samples/sec   Loss 2.7746   LearningRate 0.0007   Epoch: 10   Global Step: 18260   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 09:58:51,257-Speed 9410.54 samples/sec   Loss 2.7659   LearningRate 0.0007   Epoch: 10   Global Step: 18270   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 09:59:17,297-Speed 9438.29 samples/sec   Loss 2.7656   LearningRate 0.0007   Epoch: 10   Global Step: 18280   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 09:59:43,549-Speed 9361.63 samples/sec   Loss 2.7966   LearningRate 0.0007   Epoch: 10   Global Step: 18290   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:00:09,694-Speed 9400.64 samples/sec   Loss 2.7946   LearningRate 0.0007   Epoch: 10   Global Step: 18300   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:00:35,798-Speed 9415.04 samples/sec   Loss 2.7658   LearningRate 0.0007   Epoch: 10   Global Step: 18310   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:01:01,947-Speed 9399.04 samples/sec   Loss 2.7650   LearningRate 0.0007   Epoch: 10   Global Step: 18320   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:01:28,253-Speed 9342.82 samples/sec   Loss 2.7689   LearningRate 0.0007   Epoch: 10   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:01:54,443-Speed 9383.99 samples/sec   Loss 2.7769   LearningRate 0.0007   Epoch: 10   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:02:20,559-Speed 9411.11 samples/sec   Loss 2.7582   LearningRate 0.0007   Epoch: 10   Global Step: 18350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:02:46,698-Speed 9402.44 samples/sec   Loss 2.7684   LearningRate 0.0007   Epoch: 10   Global Step: 18360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:03:12,822-Speed 9408.00 samples/sec   Loss 2.7805   LearningRate 0.0007   Epoch: 10   Global Step: 18370   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:03:38,939-Speed 9410.65 samples/sec   Loss 2.7842   LearningRate 0.0007   Epoch: 10   Global Step: 18380   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:04:05,053-Speed 9411.43 samples/sec   Loss 2.7667   LearningRate 0.0007   Epoch: 10   Global Step: 18390   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:04:31,222-Speed 9392.04 samples/sec   Loss 2.7446   LearningRate 0.0007   Epoch: 10   Global Step: 18400   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:04:57,335-Speed 9411.85 samples/sec   Loss 2.7648   LearningRate 0.0007   Epoch: 10   Global Step: 18410   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:05:23,471-Speed 9403.47 samples/sec   Loss 2.7988   LearningRate 0.0007   Epoch: 10   Global Step: 18420   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:05:49,616-Speed 9400.33 samples/sec   Loss 2.7766   LearningRate 0.0007   Epoch: 10   Global Step: 18430   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:06:15,725-Speed 9413.19 samples/sec   Loss 2.7666   LearningRate 0.0007   Epoch: 10   Global Step: 18440   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:06:41,817-Speed 9419.72 samples/sec   Loss 2.7418   LearningRate 0.0007   Epoch: 10   Global Step: 18450   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:07:07,952-Speed 9403.93 samples/sec   Loss 2.7546   LearningRate 0.0007   Epoch: 10   Global Step: 18460   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:07:34,103-Speed 9398.45 samples/sec   Loss 2.7640   LearningRate 0.0007   Epoch: 10   Global Step: 18470   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:08:00,244-Speed 9402.45 samples/sec   Loss 2.7448   LearningRate 0.0007   Epoch: 10   Global Step: 18480   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:08:26,448-Speed 9379.39 samples/sec   Loss 2.7507   LearningRate 0.0007   Epoch: 10   Global Step: 18490   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:08:52,613-Speed 9393.02 samples/sec   Loss 2.7764   LearningRate 0.0007   Epoch: 10   Global Step: 18500   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:09:18,731-Speed 9410.27 samples/sec   Loss 2.7471   LearningRate 0.0007   Epoch: 10   Global Step: 18510   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:09:44,986-Speed 9361.21 samples/sec   Loss 2.7329   LearningRate 0.0007   Epoch: 10   Global Step: 18520   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:10:11,168-Speed 9386.74 samples/sec   Loss 2.7364   LearningRate 0.0007   Epoch: 10   Global Step: 18530   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:10:37,249-Speed 9423.60 samples/sec   Loss 2.7415   LearningRate 0.0007   Epoch: 10   Global Step: 18540   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:11:03,371-Speed 9408.46 samples/sec   Loss 2.7452   LearningRate 0.0007   Epoch: 10   Global Step: 18550   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:11:29,496-Speed 9407.97 samples/sec   Loss 2.7603   LearningRate 0.0007   Epoch: 10   Global Step: 18560   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:11:55,576-Speed 9423.48 samples/sec   Loss 2.7210   LearningRate 0.0007   Epoch: 10   Global Step: 18570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:12:21,625-Speed 9434.94 samples/sec   Loss 2.7369   LearningRate 0.0007   Epoch: 10   Global Step: 18580   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:12:47,734-Speed 9413.28 samples/sec   Loss 2.7544   LearningRate 0.0007   Epoch: 10   Global Step: 18590   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:13:13,894-Speed 9394.84 samples/sec   Loss 2.7325   LearningRate 0.0007   Epoch: 10   Global Step: 18600   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:13:40,017-Speed 9408.47 samples/sec   Loss 2.7320   LearningRate 0.0007   Epoch: 10   Global Step: 18610   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:14:06,097-Speed 9423.98 samples/sec   Loss 2.7227   LearningRate 0.0007   Epoch: 10   Global Step: 18620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:14:32,268-Speed 9391.07 samples/sec   Loss 2.7171   LearningRate 0.0007   Epoch: 10   Global Step: 18630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:14:58,350-Speed 9423.00 samples/sec   Loss 2.7480   LearningRate 0.0007   Epoch: 10   Global Step: 18640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:15:24,439-Speed 9420.41 samples/sec   Loss 2.7544   LearningRate 0.0007   Epoch: 10   Global Step: 18650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:15:50,499-Speed 9430.93 samples/sec   Loss 2.7123   LearningRate 0.0007   Epoch: 10   Global Step: 18660   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:16:16,552-Speed 9433.47 samples/sec   Loss 2.7124   LearningRate 0.0007   Epoch: 10   Global Step: 18670   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:16:42,659-Speed 9413.97 samples/sec   Loss 2.7243   LearningRate 0.0007   Epoch: 10   Global Step: 18680   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:17:08,772-Speed 9411.99 samples/sec   Loss 2.7154   LearningRate 0.0007   Epoch: 10   Global Step: 18690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:17:34,833-Speed 9430.45 samples/sec   Loss 2.7307   LearningRate 0.0007   Epoch: 10   Global Step: 18700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:18:00,944-Speed 9412.95 samples/sec   Loss 2.7474   LearningRate 0.0007   Epoch: 10   Global Step: 18710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:18:27,089-Speed 9400.18 samples/sec   Loss 2.7276   LearningRate 0.0007   Epoch: 10   Global Step: 18720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:18:53,311-Speed 9372.73 samples/sec   Loss 2.7317   LearningRate 0.0007   Epoch: 10   Global Step: 18730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:19:19,424-Speed 9411.48 samples/sec   Loss 2.7279   LearningRate 0.0007   Epoch: 10   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:19:45,617-Speed 9383.01 samples/sec   Loss 2.7153   LearningRate 0.0007   Epoch: 10   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:20:11,808-Speed 9383.86 samples/sec   Loss 2.7067   LearningRate 0.0007   Epoch: 10   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:20:37,933-Speed 9407.49 samples/sec   Loss 2.7209   LearningRate 0.0007   Epoch: 10   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:21:04,037-Speed 9415.23 samples/sec   Loss 2.7143   LearningRate 0.0007   Epoch: 10   Global Step: 18780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:21:30,179-Speed 9401.68 samples/sec   Loss 2.7140   LearningRate 0.0007   Epoch: 10   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:21:56,349-Speed 9391.40 samples/sec   Loss 2.7321   LearningRate 0.0007   Epoch: 10   Global Step: 18800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:22:22,510-Speed 9394.62 samples/sec   Loss 2.7346   LearningRate 0.0007   Epoch: 10   Global Step: 18810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:22:48,674-Speed 9393.73 samples/sec   Loss 2.7311   LearningRate 0.0007   Epoch: 10   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:23:14,881-Speed 9378.35 samples/sec   Loss 2.7119   LearningRate 0.0007   Epoch: 10   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:23:40,997-Speed 9410.76 samples/sec   Loss 2.6906   LearningRate 0.0007   Epoch: 10   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:24:07,096-Speed 9417.02 samples/sec   Loss 2.7274   LearningRate 0.0007   Epoch: 10   Global Step: 18850   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:24:33,176-Speed 9423.79 samples/sec   Loss 2.7161   LearningRate 0.0007   Epoch: 10   Global Step: 18860   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:24:59,321-Speed 9400.54 samples/sec   Loss 2.7211   LearningRate 0.0007   Epoch: 10   Global Step: 18870   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:25:25,536-Speed 9375.27 samples/sec   Loss 2.7147   LearningRate 0.0007   Epoch: 10   Global Step: 18880   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:25:51,670-Speed 9404.28 samples/sec   Loss 2.7048   LearningRate 0.0007   Epoch: 10   Global Step: 18890   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:26:17,864-Speed 9383.17 samples/sec   Loss 2.7148   LearningRate 0.0007   Epoch: 10   Global Step: 18900   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:26:43,976-Speed 9412.12 samples/sec   Loss 2.6866   LearningRate 0.0007   Epoch: 10   Global Step: 18910   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:27:10,191-Speed 9375.06 samples/sec   Loss 2.7055   LearningRate 0.0007   Epoch: 10   Global Step: 18920   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:27:36,309-Speed 9410.20 samples/sec   Loss 2.7290   LearningRate 0.0007   Epoch: 10   Global Step: 18930   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:28:02,452-Speed 9400.87 samples/sec   Loss 2.7122   LearningRate 0.0007   Epoch: 10   Global Step: 18940   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:28:28,593-Speed 9401.94 samples/sec   Loss 2.7179   LearningRate 0.0007   Epoch: 10   Global Step: 18950   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:28:54,795-Speed 9380.06 samples/sec   Loss 2.7120   LearningRate 0.0007   Epoch: 10   Global Step: 18960   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:29:20,966-Speed 9390.96 samples/sec   Loss 2.7020   LearningRate 0.0006   Epoch: 10   Global Step: 18970   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:29:47,141-Speed 9389.65 samples/sec   Loss 2.7008   LearningRate 0.0006   Epoch: 10   Global Step: 18980   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:30:13,272-Speed 9405.26 samples/sec   Loss 2.7242   LearningRate 0.0006   Epoch: 10   Global Step: 18990   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:30:39,391-Speed 9409.83 samples/sec   Loss 2.7221   LearningRate 0.0006   Epoch: 10   Global Step: 19000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:31:05,576-Speed 9385.92 samples/sec   Loss 2.7196   LearningRate 0.0006   Epoch: 10   Global Step: 19010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:32:25,097-Speed 3090.57 samples/sec   Loss 2.6822   LearningRate 0.0006   Epoch: 11   Global Step: 19020   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:32:51,080-Speed 9458.94 samples/sec   Loss 2.6798   LearningRate 0.0006   Epoch: 11   Global Step: 19030   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:33:17,102-Speed 9444.82 samples/sec   Loss 2.6798   LearningRate 0.0006   Epoch: 11   Global Step: 19040   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:33:43,169-Speed 9428.55 samples/sec   Loss 2.6817   LearningRate 0.0006   Epoch: 11   Global Step: 19050   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:34:09,342-Speed 9390.32 samples/sec   Loss 2.6513   LearningRate 0.0006   Epoch: 11   Global Step: 19060   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:34:35,451-Speed 9413.22 samples/sec   Loss 2.6685   LearningRate 0.0006   Epoch: 11   Global Step: 19070   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:35:01,472-Speed 9445.42 samples/sec   Loss 2.6630   LearningRate 0.0006   Epoch: 11   Global Step: 19080   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:35:27,496-Speed 9444.12 samples/sec   Loss 2.6662   LearningRate 0.0006   Epoch: 11   Global Step: 19090   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:35:53,618-Speed 9408.49 samples/sec   Loss 2.6842   LearningRate 0.0006   Epoch: 11   Global Step: 19100   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:36:19,627-Speed 9449.46 samples/sec   Loss 2.6673   LearningRate 0.0006   Epoch: 11   Global Step: 19110   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:36:45,674-Speed 9435.81 samples/sec   Loss 2.6567   LearningRate 0.0006   Epoch: 11   Global Step: 19120   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:37:11,767-Speed 9419.38 samples/sec   Loss 2.6806   LearningRate 0.0006   Epoch: 11   Global Step: 19130   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:37:37,866-Speed 9416.90 samples/sec   Loss 2.6837   LearningRate 0.0006   Epoch: 11   Global Step: 19140   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:38:03,983-Speed 9410.17 samples/sec   Loss 2.6669   LearningRate 0.0006   Epoch: 11   Global Step: 19150   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:38:30,036-Speed 9433.79 samples/sec   Loss 2.6395   LearningRate 0.0006   Epoch: 11   Global Step: 19160   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:38:56,060-Speed 9444.07 samples/sec   Loss 2.6640   LearningRate 0.0006   Epoch: 11   Global Step: 19170   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:39:22,103-Speed 9437.16 samples/sec   Loss 2.6809   LearningRate 0.0006   Epoch: 11   Global Step: 19180   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:39:48,176-Speed 9426.31 samples/sec   Loss 2.6923   LearningRate 0.0006   Epoch: 11   Global Step: 19190   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:40:14,328-Speed 9397.93 samples/sec   Loss 2.6636   LearningRate 0.0006   Epoch: 11   Global Step: 19200   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:40:40,469-Speed 9401.47 samples/sec   Loss 2.6650   LearningRate 0.0006   Epoch: 11   Global Step: 19210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:41:06,589-Speed 9409.57 samples/sec   Loss 2.6611   LearningRate 0.0006   Epoch: 11   Global Step: 19220   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:41:32,735-Speed 9399.69 samples/sec   Loss 2.6716   LearningRate 0.0006   Epoch: 11   Global Step: 19230   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:41:58,799-Speed 9429.81 samples/sec   Loss 2.6846   LearningRate 0.0006   Epoch: 11   Global Step: 19240   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:42:24,937-Speed 9402.68 samples/sec   Loss 2.6629   LearningRate 0.0006   Epoch: 11   Global Step: 19250   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:42:51,012-Speed 9425.82 samples/sec   Loss 2.6690   LearningRate 0.0006   Epoch: 11   Global Step: 19260   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:43:17,128-Speed 9410.78 samples/sec   Loss 2.6732   LearningRate 0.0006   Epoch: 11   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:43:43,255-Speed 9406.77 samples/sec   Loss 2.6607   LearningRate 0.0006   Epoch: 11   Global Step: 19280   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:44:09,336-Speed 9423.61 samples/sec   Loss 2.6439   LearningRate 0.0006   Epoch: 11   Global Step: 19290   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:44:35,391-Speed 9432.87 samples/sec   Loss 2.6569   LearningRate 0.0006   Epoch: 11   Global Step: 19300   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:45:01,528-Speed 9403.17 samples/sec   Loss 2.7066   LearningRate 0.0006   Epoch: 11   Global Step: 19310   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:45:27,703-Speed 9389.35 samples/sec   Loss 2.6831   LearningRate 0.0006   Epoch: 11   Global Step: 19320   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:45:53,807-Speed 9415.42 samples/sec   Loss 2.6420   LearningRate 0.0006   Epoch: 11   Global Step: 19330   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:46:19,966-Speed 9395.03 samples/sec   Loss 2.6705   LearningRate 0.0006   Epoch: 11   Global Step: 19340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:46:46,122-Speed 9396.36 samples/sec   Loss 2.6587   LearningRate 0.0006   Epoch: 11   Global Step: 19350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:47:12,240-Speed 9410.20 samples/sec   Loss 2.6342   LearningRate 0.0006   Epoch: 11   Global Step: 19360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:47:38,316-Speed 9425.29 samples/sec   Loss 2.6457   LearningRate 0.0006   Epoch: 11   Global Step: 19370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:48:04,463-Speed 9399.53 samples/sec   Loss 2.6796   LearningRate 0.0006   Epoch: 11   Global Step: 19380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:48:30,566-Speed 9415.31 samples/sec   Loss 2.6462   LearningRate 0.0006   Epoch: 11   Global Step: 19390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-03-05 10:48:56,669-Speed 9415.28 samples/sec   Loss 2.6460   LearningRate 0.0006   Epoch: 11   Global Step: 19400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:49:22,822-Speed 9397.44 samples/sec   Loss 2.6369   LearningRate 0.0006   Epoch: 11   Global Step: 19410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:49:49,032-Speed 9377.02 samples/sec   Loss 2.6459   LearningRate 0.0006   Epoch: 11   Global Step: 19420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:50:15,154-Speed 9408.48 samples/sec   Loss 2.6428   LearningRate 0.0006   Epoch: 11   Global Step: 19430   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:50:41,301-Speed 9399.61 samples/sec   Loss 2.6528   LearningRate 0.0006   Epoch: 11   Global Step: 19440   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:51:07,396-Speed 9418.45 samples/sec   Loss 2.6264   LearningRate 0.0006   Epoch: 11   Global Step: 19450   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:51:33,485-Speed 9420.36 samples/sec   Loss 2.6393   LearningRate 0.0006   Epoch: 11   Global Step: 19460   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:51:59,608-Speed 9408.75 samples/sec   Loss 2.6359   LearningRate 0.0006   Epoch: 11   Global Step: 19470   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:52:25,690-Speed 9423.03 samples/sec   Loss 2.6501   LearningRate 0.0006   Epoch: 11   Global Step: 19480   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-03-05 10:52:51,858-Speed 9391.95 samples/sec   Loss 2.6362   LearningRate 0.0006   Epoch: 11   Global Step: 19490   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:53:18,074-Speed 9374.66 samples/sec   Loss 2.6670   LearningRate 0.0006   Epoch: 11   Global Step: 19500   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:53:44,215-Speed 9401.93 samples/sec   Loss 2.6596   LearningRate 0.0006   Epoch: 11   Global Step: 19510   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:54:10,297-Speed 9423.35 samples/sec   Loss 2.6485   LearningRate 0.0006   Epoch: 11   Global Step: 19520   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:54:36,376-Speed 9424.17 samples/sec   Loss 2.6366   LearningRate 0.0006   Epoch: 11   Global Step: 19530   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:55:02,477-Speed 9416.02 samples/sec   Loss 2.6235   LearningRate 0.0006   Epoch: 11   Global Step: 19540   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:55:28,582-Speed 9414.70 samples/sec   Loss 2.6290   LearningRate 0.0006   Epoch: 11   Global Step: 19550   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:55:54,776-Speed 9382.75 samples/sec   Loss 2.6634   LearningRate 0.0006   Epoch: 11   Global Step: 19560   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:56:20,888-Speed 9412.02 samples/sec   Loss 2.6233   LearningRate 0.0006   Epoch: 11   Global Step: 19570   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:56:47,006-Speed 9410.07 samples/sec   Loss 2.6508   LearningRate 0.0006   Epoch: 11   Global Step: 19580   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-03-05 10:57:13,185-Speed 9418.41 samples/sec   Loss 2.6251   LearningRate 0.0006   Epoch: 11   Global Step: 19590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 10:57:39,296-Speed 9412.38 samples/sec   Loss 2.6196   LearningRate 0.0006   Epoch: 11   Global Step: 19600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 10:58:05,370-Speed 9426.04 samples/sec   Loss 2.6432   LearningRate 0.0006   Epoch: 11   Global Step: 19610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 10:58:31,523-Speed 9429.94 samples/sec   Loss 2.6230   LearningRate 0.0006   Epoch: 11   Global Step: 19620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 10:58:57,655-Speed 9404.93 samples/sec   Loss 2.6406   LearningRate 0.0006   Epoch: 11   Global Step: 19630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 10:59:23,716-Speed 9430.45 samples/sec   Loss 2.6232   LearningRate 0.0006   Epoch: 11   Global Step: 19640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 10:59:49,935-Speed 9429.61 samples/sec   Loss 2.6514   LearningRate 0.0006   Epoch: 11   Global Step: 19650   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:00:16,097-Speed 9394.02 samples/sec   Loss 2.6308   LearningRate 0.0006   Epoch: 11   Global Step: 19660   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:00:42,309-Speed 9376.23 samples/sec   Loss 2.6152   LearningRate 0.0006   Epoch: 11   Global Step: 19670   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:01:08,423-Speed 9411.51 samples/sec   Loss 2.6041   LearningRate 0.0006   Epoch: 11   Global Step: 19680   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:01:34,711-Speed 9396.69 samples/sec   Loss 2.6317   LearningRate 0.0006   Epoch: 11   Global Step: 19690   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:02:00,871-Speed 9394.85 samples/sec   Loss 2.6222   LearningRate 0.0006   Epoch: 11   Global Step: 19700   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:02:27,080-Speed 9377.48 samples/sec   Loss 2.6203   LearningRate 0.0006   Epoch: 11   Global Step: 19710   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:02:53,332-Speed 9413.97 samples/sec   Loss 2.6297   LearningRate 0.0006   Epoch: 11   Global Step: 19720   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:03:19,520-Speed 9384.75 samples/sec   Loss 2.6116   LearningRate 0.0006   Epoch: 11   Global Step: 19730   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:03:45,764-Speed 9405.29 samples/sec   Loss 2.6030   LearningRate 0.0006   Epoch: 11   Global Step: 19740   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-03-05 11:04:11,896-Speed 9404.91 samples/sec   Loss 2.6266   LearningRate 0.0006   Epoch: 11   Global Step: 19750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:04:38,080-Speed 9385.90 samples/sec   Loss 2.6058   LearningRate 0.0006   Epoch: 11   Global Step: 19760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:05:04,291-Speed 9376.72 samples/sec   Loss 2.6326   LearningRate 0.0006   Epoch: 11   Global Step: 19770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:05:30,390-Speed 9417.00 samples/sec   Loss 2.6043   LearningRate 0.0006   Epoch: 11   Global Step: 19780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:05:56,561-Speed 9437.17 samples/sec   Loss 2.6206   LearningRate 0.0006   Epoch: 11   Global Step: 19790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:06:22,654-Speed 9418.96 samples/sec   Loss 2.6116   LearningRate 0.0006   Epoch: 11   Global Step: 19800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:06:48,931-Speed 9393.18 samples/sec   Loss 2.5967   LearningRate 0.0006   Epoch: 11   Global Step: 19810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:07:15,020-Speed 9420.34 samples/sec   Loss 2.6016   LearningRate 0.0006   Epoch: 11   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:07:41,181-Speed 9394.62 samples/sec   Loss 2.6347   LearningRate 0.0006   Epoch: 11   Global Step: 19830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:08:07,511-Speed 9380.92 samples/sec   Loss 2.6015   LearningRate 0.0006   Epoch: 11   Global Step: 19840   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:08:33,758-Speed 9363.72 samples/sec   Loss 2.6034   LearningRate 0.0006   Epoch: 11   Global Step: 19850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:08:59,898-Speed 9402.25 samples/sec   Loss 2.5955   LearningRate 0.0006   Epoch: 11   Global Step: 19860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:09:25,993-Speed 9418.32 samples/sec   Loss 2.6054   LearningRate 0.0006   Epoch: 11   Global Step: 19870   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:09:52,072-Speed 9423.85 samples/sec   Loss 2.6189   LearningRate 0.0006   Epoch: 11   Global Step: 19880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:10:18,137-Speed 9429.05 samples/sec   Loss 2.6106   LearningRate 0.0006   Epoch: 11   Global Step: 19890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:10:44,327-Speed 9419.67 samples/sec   Loss 2.6147   LearningRate 0.0006   Epoch: 11   Global Step: 19900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:11:10,550-Speed 9400.08 samples/sec   Loss 2.5853   LearningRate 0.0006   Epoch: 11   Global Step: 19910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:11:36,642-Speed 9419.46 samples/sec   Loss 2.5684   LearningRate 0.0006   Epoch: 11   Global Step: 19920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:12:02,851-Speed 9377.28 samples/sec   Loss 2.5840   LearningRate 0.0006   Epoch: 11   Global Step: 19930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:12:28,988-Speed 9403.26 samples/sec   Loss 2.6108   LearningRate 0.0006   Epoch: 11   Global Step: 19940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:12:55,198-Speed 9376.89 samples/sec   Loss 2.6144   LearningRate 0.0006   Epoch: 11   Global Step: 19950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:13:21,281-Speed 9422.55 samples/sec   Loss 2.6017   LearningRate 0.0006   Epoch: 11   Global Step: 19960   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:13:47,484-Speed 9379.45 samples/sec   Loss 2.5995   LearningRate 0.0006   Epoch: 11   Global Step: 19970   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:14:13,636-Speed 9397.47 samples/sec   Loss 2.5725   LearningRate 0.0006   Epoch: 11   Global Step: 19980   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:14:39,887-Speed 9362.49 samples/sec   Loss 2.5608   LearningRate 0.0006   Epoch: 11   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:15:06,121-Speed 9397.25 samples/sec   Loss 2.6000   LearningRate 0.0006   Epoch: 11   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:15:32,183-Speed 9430.18 samples/sec   Loss 2.5817   LearningRate 0.0006   Epoch: 11   Global Step: 20010   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:15:58,337-Speed 9397.00 samples/sec   Loss 2.6098   LearningRate 0.0006   Epoch: 11   Global Step: 20020   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:16:24,531-Speed 9382.74 samples/sec   Loss 2.5933   LearningRate 0.0006   Epoch: 11   Global Step: 20030   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:16:50,610-Speed 9424.09 samples/sec   Loss 2.5860   LearningRate 0.0006   Epoch: 11   Global Step: 20040   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:17:16,697-Speed 9421.37 samples/sec   Loss 2.5774   LearningRate 0.0006   Epoch: 11   Global Step: 20050   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:17:42,845-Speed 9399.22 samples/sec   Loss 2.5636   LearningRate 0.0006   Epoch: 11   Global Step: 20060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:18:08,952-Speed 9413.87 samples/sec   Loss 2.5692   LearningRate 0.0006   Epoch: 11   Global Step: 20070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:18:35,043-Speed 9419.77 samples/sec   Loss 2.5841   LearningRate 0.0006   Epoch: 11   Global Step: 20080   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:19:01,110-Speed 9429.28 samples/sec   Loss 2.5839   LearningRate 0.0006   Epoch: 11   Global Step: 20090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:19:27,247-Speed 9403.06 samples/sec   Loss 2.5808   LearningRate 0.0006   Epoch: 11   Global Step: 20100   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:19:53,370-Speed 9408.39 samples/sec   Loss 2.5639   LearningRate 0.0006   Epoch: 11   Global Step: 20110   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:20:19,538-Speed 9391.83 samples/sec   Loss 2.6005   LearningRate 0.0006   Epoch: 11   Global Step: 20120   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:20:45,623-Speed 9421.97 samples/sec   Loss 2.5775   LearningRate 0.0006   Epoch: 11   Global Step: 20130   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:21:11,821-Speed 9381.50 samples/sec   Loss 2.5662   LearningRate 0.0006   Epoch: 11   Global Step: 20140   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:21:37,989-Speed 9391.94 samples/sec   Loss 2.5690   LearningRate 0.0006   Epoch: 11   Global Step: 20150   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:22:04,056-Speed 9428.35 samples/sec   Loss 2.5673   LearningRate 0.0006   Epoch: 11   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:22:30,237-Speed 9387.23 samples/sec   Loss 2.5852   LearningRate 0.0006   Epoch: 11   Global Step: 20170   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:22:56,352-Speed 9411.04 samples/sec   Loss 2.5702   LearningRate 0.0006   Epoch: 11   Global Step: 20180   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:23:22,455-Speed 9415.22 samples/sec   Loss 2.5741   LearningRate 0.0006   Epoch: 11   Global Step: 20190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:23:48,584-Speed 9406.30 samples/sec   Loss 2.5665   LearningRate 0.0006   Epoch: 11   Global Step: 20200   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:24:14,658-Speed 9425.74 samples/sec   Loss 2.5708   LearningRate 0.0006   Epoch: 11   Global Step: 20210   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:24:40,783-Speed 9407.58 samples/sec   Loss 2.5774   LearningRate 0.0006   Epoch: 11   Global Step: 20220   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:25:06,920-Speed 9402.88 samples/sec   Loss 2.5571   LearningRate 0.0006   Epoch: 11   Global Step: 20230   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:25:33,111-Speed 9384.04 samples/sec   Loss 2.5919   LearningRate 0.0006   Epoch: 11   Global Step: 20240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:25:59,322-Speed 9376.61 samples/sec   Loss 2.5797   LearningRate 0.0006   Epoch: 11   Global Step: 20250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:26:25,397-Speed 9425.41 samples/sec   Loss 2.5704   LearningRate 0.0006   Epoch: 11   Global Step: 20260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:26:51,529-Speed 9405.18 samples/sec   Loss 2.5628   LearningRate 0.0006   Epoch: 11   Global Step: 20270   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:27:17,722-Speed 9382.88 samples/sec   Loss 2.5750   LearningRate 0.0006   Epoch: 11   Global Step: 20280   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:27:43,891-Speed 9391.96 samples/sec   Loss 2.5521   LearningRate 0.0006   Epoch: 11   Global Step: 20290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:28:10,012-Speed 9408.78 samples/sec   Loss 2.5451   LearningRate 0.0006   Epoch: 11   Global Step: 20300   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:28:36,158-Speed 9400.29 samples/sec   Loss 2.5393   LearningRate 0.0006   Epoch: 11   Global Step: 20310   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:29:02,374-Speed 9374.64 samples/sec   Loss 2.5512   LearningRate 0.0006   Epoch: 11   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:29:28,527-Speed 9397.64 samples/sec   Loss 2.5682   LearningRate 0.0006   Epoch: 11   Global Step: 20330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:29:54,697-Speed 9392.11 samples/sec   Loss 2.5566   LearningRate 0.0006   Epoch: 11   Global Step: 20340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:30:20,759-Speed 9430.36 samples/sec   Loss 2.5457   LearningRate 0.0006   Epoch: 11   Global Step: 20350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:30:46,882-Speed 9407.89 samples/sec   Loss 2.5321   LearningRate 0.0006   Epoch: 11   Global Step: 20360   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:31:13,033-Speed 9398.13 samples/sec   Loss 2.5660   LearningRate 0.0006   Epoch: 11   Global Step: 20370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:31:39,132-Speed 9417.39 samples/sec   Loss 2.5621   LearningRate 0.0006   Epoch: 11   Global Step: 20380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:32:05,258-Speed 9407.16 samples/sec   Loss 2.5526   LearningRate 0.0006   Epoch: 11   Global Step: 20390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:32:31,324-Speed 9428.68 samples/sec   Loss 2.5372   LearningRate 0.0006   Epoch: 11   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:32:57,402-Speed 9424.26 samples/sec   Loss 2.5297   LearningRate 0.0006   Epoch: 11   Global Step: 20410   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:33:23,555-Speed 9397.52 samples/sec   Loss 2.5318   LearningRate 0.0006   Epoch: 11   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:33:49,663-Speed 9413.83 samples/sec   Loss 2.5399   LearningRate 0.0006   Epoch: 11   Global Step: 20430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:34:15,795-Speed 9404.89 samples/sec   Loss 2.5374   LearningRate 0.0006   Epoch: 11   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:34:41,888-Speed 9418.84 samples/sec   Loss 2.5307   LearningRate 0.0006   Epoch: 11   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:35:07,999-Speed 9412.46 samples/sec   Loss 2.5397   LearningRate 0.0006   Epoch: 11   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:35:34,222-Speed 9372.31 samples/sec   Loss 2.5587   LearningRate 0.0006   Epoch: 11   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:36:00,320-Speed 9417.32 samples/sec   Loss 2.5396   LearningRate 0.0006   Epoch: 11   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:36:26,384-Speed 9429.47 samples/sec   Loss 2.5787   LearningRate 0.0006   Epoch: 11   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:36:52,590-Speed 9378.46 samples/sec   Loss 2.5854   LearningRate 0.0006   Epoch: 11   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:37:18,769-Speed 9388.09 samples/sec   Loss 2.5513   LearningRate 0.0006   Epoch: 11   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:37:44,908-Speed 9402.39 samples/sec   Loss 2.5402   LearningRate 0.0006   Epoch: 11   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:38:10,992-Speed 9422.37 samples/sec   Loss 2.5429   LearningRate 0.0006   Epoch: 11   Global Step: 20530   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:38:37,168-Speed 9389.03 samples/sec   Loss 2.5492   LearningRate 0.0006   Epoch: 11   Global Step: 20540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:39:03,364-Speed 9381.96 samples/sec   Loss 2.5446   LearningRate 0.0006   Epoch: 11   Global Step: 20550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:39:29,498-Speed 9404.33 samples/sec   Loss 2.5327   LearningRate 0.0006   Epoch: 11   Global Step: 20560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:39:55,655-Speed 9396.06 samples/sec   Loss 2.5399   LearningRate 0.0006   Epoch: 11   Global Step: 20570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:40:21,706-Speed 9434.53 samples/sec   Loss 2.5269   LearningRate 0.0006   Epoch: 11   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:40:47,794-Speed 9421.05 samples/sec   Loss 2.5464   LearningRate 0.0006   Epoch: 11   Global Step: 20590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:41:13,987-Speed 9383.11 samples/sec   Loss 2.5237   LearningRate 0.0006   Epoch: 11   Global Step: 20600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:41:40,193-Speed 9378.19 samples/sec   Loss 2.5173   LearningRate 0.0006   Epoch: 11   Global Step: 20610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:42:06,313-Speed 9409.33 samples/sec   Loss 2.5201   LearningRate 0.0006   Epoch: 11   Global Step: 20620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:42:32,459-Speed 9400.11 samples/sec   Loss 2.5285   LearningRate 0.0006   Epoch: 11   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:42:58,549-Speed 9420.00 samples/sec   Loss 2.5433   LearningRate 0.0006   Epoch: 11   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:43:24,635-Speed 9421.66 samples/sec   Loss 2.5304   LearningRate 0.0006   Epoch: 11   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:43:50,835-Speed 9380.44 samples/sec   Loss 2.5091   LearningRate 0.0006   Epoch: 11   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:44:16,975-Speed 9401.85 samples/sec   Loss 2.5460   LearningRate 0.0006   Epoch: 11   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:44:43,063-Speed 9421.25 samples/sec   Loss 2.5291   LearningRate 0.0006   Epoch: 11   Global Step: 20680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:45:09,177-Speed 9411.11 samples/sec   Loss 2.5440   LearningRate 0.0006   Epoch: 11   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:45:35,258-Speed 9423.59 samples/sec   Loss 2.5459   LearningRate 0.0006   Epoch: 11   Global Step: 20700   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:46:01,367-Speed 9413.11 samples/sec   Loss 2.5475   LearningRate 0.0006   Epoch: 11   Global Step: 20710   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:46:27,543-Speed 9389.14 samples/sec   Loss 2.5384   LearningRate 0.0006   Epoch: 11   Global Step: 20720   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:46:53,689-Speed 9400.07 samples/sec   Loss 2.5424   LearningRate 0.0006   Epoch: 11   Global Step: 20730   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:47:19,759-Speed 9427.20 samples/sec   Loss 2.5489   LearningRate 0.0006   Epoch: 11   Global Step: 20740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:48:38,025-Speed 3140.11 samples/sec   Loss 2.5074   LearningRate 0.0006   Epoch: 12   Global Step: 20750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:49:03,939-Speed 9484.25 samples/sec   Loss 2.4770   LearningRate 0.0006   Epoch: 12   Global Step: 20760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:49:29,943-Speed 9451.47 samples/sec   Loss 2.5063   LearningRate 0.0006   Epoch: 12   Global Step: 20770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:49:56,036-Speed 9418.78 samples/sec   Loss 2.4904   LearningRate 0.0006   Epoch: 12   Global Step: 20780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:50:22,031-Speed 9454.47 samples/sec   Loss 2.4941   LearningRate 0.0006   Epoch: 12   Global Step: 20790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:50:47,983-Speed 9470.42 samples/sec   Loss 2.5037   LearningRate 0.0006   Epoch: 12   Global Step: 20800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:51:13,948-Speed 9465.68 samples/sec   Loss 2.5010   LearningRate 0.0006   Epoch: 12   Global Step: 20810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:51:40,033-Speed 9421.99 samples/sec   Loss 2.4928   LearningRate 0.0006   Epoch: 12   Global Step: 20820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:52:06,169-Speed 9403.54 samples/sec   Loss 2.5084   LearningRate 0.0006   Epoch: 12   Global Step: 20830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:52:32,305-Speed 9403.52 samples/sec   Loss 2.5045   LearningRate 0.0006   Epoch: 12   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:52:58,448-Speed 9401.12 samples/sec   Loss 2.4926   LearningRate 0.0006   Epoch: 12   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:53:24,570-Speed 9408.55 samples/sec   Loss 2.4882   LearningRate 0.0006   Epoch: 12   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:53:50,676-Speed 9414.37 samples/sec   Loss 2.4859   LearningRate 0.0006   Epoch: 12   Global Step: 20870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:54:16,777-Speed 9415.79 samples/sec   Loss 2.5004   LearningRate 0.0006   Epoch: 12   Global Step: 20880   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-03-05 11:54:42,849-Speed 9426.71 samples/sec   Loss 2.5001   LearningRate 0.0006   Epoch: 12   Global Step: 20890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:55:08,925-Speed 9425.19 samples/sec   Loss 2.4973   LearningRate 0.0006   Epoch: 12   Global Step: 20900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:55:35,077-Speed 9397.72 samples/sec   Loss 2.5085   LearningRate 0.0006   Epoch: 12   Global Step: 20910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:56:01,172-Speed 9418.55 samples/sec   Loss 2.5029   LearningRate 0.0006   Epoch: 12   Global Step: 20920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:56:27,228-Speed 9432.38 samples/sec   Loss 2.5073   LearningRate 0.0006   Epoch: 12   Global Step: 20930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:56:53,287-Speed 9431.31 samples/sec   Loss 2.4859   LearningRate 0.0006   Epoch: 12   Global Step: 20940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-03-05 11:57:19,345-Speed 9431.69 samples/sec   Loss 2.4884   LearningRate 0.0006   Epoch: 12   Global Step: 20950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 11:57:45,441-Speed 9418.16 samples/sec   Loss 2.4841   LearningRate 0.0006   Epoch: 12   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 11:58:11,515-Speed 9426.03 samples/sec   Loss 2.4898   LearningRate 0.0006   Epoch: 12   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 11:58:37,601-Speed 9421.36 samples/sec   Loss 2.4850   LearningRate 0.0006   Epoch: 12   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 11:59:03,719-Speed 9410.39 samples/sec   Loss 2.4966   LearningRate 0.0006   Epoch: 12   Global Step: 20990   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-03-05 11:59:29,846-Speed 9406.73 samples/sec   Loss 2.4927   LearningRate 0.0006   Epoch: 12   Global Step: 21000   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-03-05 11:59:56,025-Speed 9388.41 samples/sec   Loss 2.5074   LearningRate 0.0006   Epoch: 12   Global Step: 21010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-03-05 12:00:22,081-Speed 9432.56 samples/sec   Loss 2.4781   LearningRate 0.0006   Epoch: 12   Global Step: 21020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:00:48,148-Speed 9428.59 samples/sec   Loss 2.4705   LearningRate 0.0006   Epoch: 12   Global Step: 21030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:01:14,240-Speed 9419.39 samples/sec   Loss 2.5094   LearningRate 0.0006   Epoch: 12   Global Step: 21040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:01:40,354-Speed 9411.40 samples/sec   Loss 2.5154   LearningRate 0.0006   Epoch: 12   Global Step: 21050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:02:06,471-Speed 9410.66 samples/sec   Loss 2.4989   LearningRate 0.0006   Epoch: 12   Global Step: 21060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:02:32,556-Speed 9422.05 samples/sec   Loss 2.4918   LearningRate 0.0006   Epoch: 12   Global Step: 21070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:02:58,770-Speed 9375.52 samples/sec   Loss 2.5088   LearningRate 0.0006   Epoch: 12   Global Step: 21080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:03:24,912-Speed 9401.22 samples/sec   Loss 2.4931   LearningRate 0.0006   Epoch: 12   Global Step: 21090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:03:51,112-Speed 9380.94 samples/sec   Loss 2.4815   LearningRate 0.0006   Epoch: 12   Global Step: 21100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:04:17,283-Speed 9391.18 samples/sec   Loss 2.4938   LearningRate 0.0006   Epoch: 12   Global Step: 21110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:04:43,320-Speed 9439.34 samples/sec   Loss 2.4885   LearningRate 0.0006   Epoch: 12   Global Step: 21120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:05:09,449-Speed 9406.05 samples/sec   Loss 2.4668   LearningRate 0.0006   Epoch: 12   Global Step: 21130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:05:35,551-Speed 9415.99 samples/sec   Loss 2.4766   LearningRate 0.0006   Epoch: 12   Global Step: 21140   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:06:01,628-Speed 9424.99 samples/sec   Loss 2.4749   LearningRate 0.0006   Epoch: 12   Global Step: 21150   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:06:27,722-Speed 9418.59 samples/sec   Loss 2.4763   LearningRate 0.0006   Epoch: 12   Global Step: 21160   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:06:53,869-Speed 9399.68 samples/sec   Loss 2.4876   LearningRate 0.0006   Epoch: 12   Global Step: 21170   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:07:19,974-Speed 9414.39 samples/sec   Loss 2.4650   LearningRate 0.0006   Epoch: 12   Global Step: 21180   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:07:46,096-Speed 9408.68 samples/sec   Loss 2.4630   LearningRate 0.0006   Epoch: 12   Global Step: 21190   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:08:12,218-Speed 9409.53 samples/sec   Loss 2.4742   LearningRate 0.0006   Epoch: 12   Global Step: 21200   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:08:38,338-Speed 9409.43 samples/sec   Loss 2.4986   LearningRate 0.0006   Epoch: 12   Global Step: 21210   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:09:04,404-Speed 9428.74 samples/sec   Loss 2.4863   LearningRate 0.0006   Epoch: 12   Global Step: 21220   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:09:30,514-Speed 9412.68 samples/sec   Loss 2.4664   LearningRate 0.0006   Epoch: 12   Global Step: 21230   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:09:56,635-Speed 9409.07 samples/sec   Loss 2.4725   LearningRate 0.0006   Epoch: 12   Global Step: 21240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:10:22,705-Speed 9427.27 samples/sec   Loss 2.4645   LearningRate 0.0006   Epoch: 12   Global Step: 21250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:10:48,815-Speed 9413.03 samples/sec   Loss 2.4696   LearningRate 0.0006   Epoch: 12   Global Step: 21260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:11:15,003-Speed 9384.90 samples/sec   Loss 2.4695   LearningRate 0.0006   Epoch: 12   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:11:41,169-Speed 9392.51 samples/sec   Loss 2.4751   LearningRate 0.0006   Epoch: 12   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:12:07,339-Speed 9391.22 samples/sec   Loss 2.4580   LearningRate 0.0006   Epoch: 12   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:12:33,413-Speed 9425.96 samples/sec   Loss 2.4514   LearningRate 0.0006   Epoch: 12   Global Step: 21300   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:12:59,633-Speed 9373.60 samples/sec   Loss 2.4590   LearningRate 0.0006   Epoch: 12   Global Step: 21310   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:13:25,739-Speed 9414.49 samples/sec   Loss 2.4515   LearningRate 0.0006   Epoch: 12   Global Step: 21320   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:13:51,830-Speed 9419.89 samples/sec   Loss 2.4612   LearningRate 0.0006   Epoch: 12   Global Step: 21330   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:14:17,961-Speed 9405.60 samples/sec   Loss 2.4529   LearningRate 0.0006   Epoch: 12   Global Step: 21340   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:14:44,207-Speed 9364.18 samples/sec   Loss 2.4710   LearningRate 0.0006   Epoch: 12   Global Step: 21350   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:15:10,330-Speed 9408.09 samples/sec   Loss 2.4590   LearningRate 0.0006   Epoch: 12   Global Step: 21360   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:15:36,495-Speed 9393.20 samples/sec   Loss 2.4623   LearningRate 0.0006   Epoch: 12   Global Step: 21370   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:16:02,673-Speed 9388.40 samples/sec   Loss 2.4613   LearningRate 0.0006   Epoch: 12   Global Step: 21380   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:16:28,785-Speed 9412.10 samples/sec   Loss 2.4530   LearningRate 0.0006   Epoch: 12   Global Step: 21390   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:16:54,885-Speed 9416.66 samples/sec   Loss 2.4540   LearningRate 0.0006   Epoch: 12   Global Step: 21400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:17:20,979-Speed 9418.96 samples/sec   Loss 2.4531   LearningRate 0.0006   Epoch: 12   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:17:47,042-Speed 9429.96 samples/sec   Loss 2.4492   LearningRate 0.0006   Epoch: 12   Global Step: 21420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:18:13,207-Speed 9393.07 samples/sec   Loss 2.4475   LearningRate 0.0006   Epoch: 12   Global Step: 21430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:18:39,309-Speed 9415.95 samples/sec   Loss 2.4371   LearningRate 0.0006   Epoch: 12   Global Step: 21440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:19:05,431-Speed 9408.74 samples/sec   Loss 2.4543   LearningRate 0.0006   Epoch: 12   Global Step: 21450   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:19:31,582-Speed 9398.26 samples/sec   Loss 2.4494   LearningRate 0.0006   Epoch: 12   Global Step: 21460   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:19:57,751-Speed 9391.38 samples/sec   Loss 2.4678   LearningRate 0.0006   Epoch: 12   Global Step: 21470   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:20:23,775-Speed 9444.22 samples/sec   Loss 2.4576   LearningRate 0.0006   Epoch: 12   Global Step: 21480   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:20:49,909-Speed 9404.37 samples/sec   Loss 2.4492   LearningRate 0.0006   Epoch: 12   Global Step: 21490   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:21:16,024-Speed 9411.16 samples/sec   Loss 2.4523   LearningRate 0.0006   Epoch: 12   Global Step: 21500   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:21:42,161-Speed 9403.05 samples/sec   Loss 2.4456   LearningRate 0.0006   Epoch: 12   Global Step: 21510   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:22:08,281-Speed 9409.74 samples/sec   Loss 2.4360   LearningRate 0.0006   Epoch: 12   Global Step: 21520   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:22:34,439-Speed 9395.48 samples/sec   Loss 2.4529   LearningRate 0.0006   Epoch: 12   Global Step: 21530   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:23:00,673-Speed 9368.40 samples/sec   Loss 2.4327   LearningRate 0.0006   Epoch: 12   Global Step: 21540   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:23:26,776-Speed 9415.31 samples/sec   Loss 2.4607   LearningRate 0.0006   Epoch: 12   Global Step: 21550   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:23:52,926-Speed 9398.83 samples/sec   Loss 2.4484   LearningRate 0.0006   Epoch: 12   Global Step: 21560   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:24:19,037-Speed 9412.58 samples/sec   Loss 2.4521   LearningRate 0.0006   Epoch: 12   Global Step: 21570   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:24:45,204-Speed 9392.06 samples/sec   Loss 2.4336   LearningRate 0.0006   Epoch: 12   Global Step: 21580   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:25:11,383-Speed 9388.27 samples/sec   Loss 2.4343   LearningRate 0.0006   Epoch: 12   Global Step: 21590   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:25:37,585-Speed 9380.12 samples/sec   Loss 2.4508   LearningRate 0.0006   Epoch: 12   Global Step: 21600   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:26:03,700-Speed 9411.06 samples/sec   Loss 2.4215   LearningRate 0.0006   Epoch: 12   Global Step: 21610   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:26:29,910-Speed 9377.19 samples/sec   Loss 2.4396   LearningRate 0.0006   Epoch: 12   Global Step: 21620   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:26:56,022-Speed 9412.06 samples/sec   Loss 2.4354   LearningRate 0.0006   Epoch: 12   Global Step: 21630   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:27:22,154-Speed 9405.48 samples/sec   Loss 2.4534   LearningRate 0.0006   Epoch: 12   Global Step: 21640   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:27:48,245-Speed 9419.68 samples/sec   Loss 2.4252   LearningRate 0.0006   Epoch: 12   Global Step: 21650   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:28:14,369-Speed 9407.97 samples/sec   Loss 2.4201   LearningRate 0.0006   Epoch: 12   Global Step: 21660   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:28:40,536-Speed 9392.32 samples/sec   Loss 2.4310   LearningRate 0.0006   Epoch: 12   Global Step: 21670   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:29:06,690-Speed 9397.11 samples/sec   Loss 2.4284   LearningRate 0.0006   Epoch: 12   Global Step: 21680   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:29:32,779-Speed 9420.74 samples/sec   Loss 2.4253   LearningRate 0.0006   Epoch: 12   Global Step: 21690   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:29:58,882-Speed 9415.16 samples/sec   Loss 2.4175   LearningRate 0.0006   Epoch: 12   Global Step: 21700   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:30:25,040-Speed 9395.72 samples/sec   Loss 2.4156   LearningRate 0.0006   Epoch: 12   Global Step: 21710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:30:51,076-Speed 9439.71 samples/sec   Loss 2.4286   LearningRate 0.0006   Epoch: 12   Global Step: 21720   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:31:17,187-Speed 9412.49 samples/sec   Loss 2.4282   LearningRate 0.0006   Epoch: 12   Global Step: 21730   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:31:43,325-Speed 9402.91 samples/sec   Loss 2.4354   LearningRate 0.0006   Epoch: 12   Global Step: 21740   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:32:09,441-Speed 9410.29 samples/sec   Loss 2.4119   LearningRate 0.0006   Epoch: 12   Global Step: 21750   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:32:35,558-Speed 9410.42 samples/sec   Loss 2.4029   LearningRate 0.0006   Epoch: 12   Global Step: 21760   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:33:01,698-Speed 9401.94 samples/sec   Loss 2.4567   LearningRate 0.0006   Epoch: 12   Global Step: 21770   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:33:27,797-Speed 9416.67 samples/sec   Loss 2.4323   LearningRate 0.0006   Epoch: 12   Global Step: 21780   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:33:54,011-Speed 9376.05 samples/sec   Loss 2.4246   LearningRate 0.0006   Epoch: 12   Global Step: 21790   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:34:20,092-Speed 9423.31 samples/sec   Loss 2.4092   LearningRate 0.0006   Epoch: 12   Global Step: 21800   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:34:46,173-Speed 9423.28 samples/sec   Loss 2.4012   LearningRate 0.0006   Epoch: 12   Global Step: 21810   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:35:12,248-Speed 9425.44 samples/sec   Loss 2.4029   LearningRate 0.0006   Epoch: 12   Global Step: 21820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:35:38,341-Speed 9419.18 samples/sec   Loss 2.4089   LearningRate 0.0006   Epoch: 12   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:36:04,555-Speed 9375.80 samples/sec   Loss 2.4465   LearningRate 0.0006   Epoch: 12   Global Step: 21840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:36:30,725-Speed 9391.20 samples/sec   Loss 2.3940   LearningRate 0.0006   Epoch: 12   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:36:56,918-Speed 9383.29 samples/sec   Loss 2.4206   LearningRate 0.0006   Epoch: 12   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:37:23,013-Speed 9418.26 samples/sec   Loss 2.4157   LearningRate 0.0006   Epoch: 12   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:37:49,185-Speed 9390.63 samples/sec   Loss 2.4104   LearningRate 0.0006   Epoch: 12   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:38:15,293-Speed 9413.89 samples/sec   Loss 2.4170   LearningRate 0.0006   Epoch: 12   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:38:41,350-Speed 9431.91 samples/sec   Loss 2.4379   LearningRate 0.0006   Epoch: 12   Global Step: 21900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:39:07,510-Speed 9395.12 samples/sec   Loss 2.4214   LearningRate 0.0006   Epoch: 12   Global Step: 21910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:39:33,613-Speed 9415.17 samples/sec   Loss 2.4084   LearningRate 0.0006   Epoch: 12   Global Step: 21920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-03-05 12:39:59,792-Speed 9388.47 samples/sec   Loss 2.3876   LearningRate 0.0006   Epoch: 12   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-03-05 12:40:25,832-Speed 9438.19 samples/sec   Loss 2.4127   LearningRate 0.0006   Epoch: 12   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:40:51,898-Speed 9428.90 samples/sec   Loss 2.4109   LearningRate 0.0006   Epoch: 12   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:41:17,950-Speed 9433.47 samples/sec   Loss 2.4157   LearningRate 0.0006   Epoch: 12   Global Step: 21960   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:41:44,041-Speed 9420.05 samples/sec   Loss 2.4122   LearningRate 0.0006   Epoch: 12   Global Step: 21970   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:42:10,074-Speed 9440.61 samples/sec   Loss 2.4072   LearningRate 0.0006   Epoch: 12   Global Step: 21980   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:42:36,233-Speed 9395.34 samples/sec   Loss 2.4142   LearningRate 0.0006   Epoch: 12   Global Step: 21990   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:43:02,308-Speed 9425.46 samples/sec   Loss 2.3955   LearningRate 0.0006   Epoch: 12   Global Step: 22000   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:43:28,423-Speed 9411.21 samples/sec   Loss 2.3901   LearningRate 0.0006   Epoch: 12   Global Step: 22010   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:43:54,602-Speed 9388.01 samples/sec   Loss 2.3963   LearningRate 0.0006   Epoch: 12   Global Step: 22020   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:44:20,726-Speed 9408.20 samples/sec   Loss 2.4186   LearningRate 0.0006   Epoch: 12   Global Step: 22030   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:44:46,830-Speed 9415.15 samples/sec   Loss 2.3774   LearningRate 0.0006   Epoch: 12   Global Step: 22040   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:45:12,916-Speed 9421.61 samples/sec   Loss 2.3929   LearningRate 0.0006   Epoch: 12   Global Step: 22050   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-03-05 12:45:39,046-Speed 9405.75 samples/sec   Loss 2.3851   LearningRate 0.0006   Epoch: 12   Global Step: 22060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:46:05,178-Speed 9404.85 samples/sec   Loss 2.3776   LearningRate 0.0006   Epoch: 12   Global Step: 22070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:46:31,363-Speed 9386.13 samples/sec   Loss 2.3969   LearningRate 0.0006   Epoch: 12   Global Step: 22080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:46:57,568-Speed 9378.92 samples/sec   Loss 2.3909   LearningRate 0.0006   Epoch: 12   Global Step: 22090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:47:23,648-Speed 9423.75 samples/sec   Loss 2.3841   LearningRate 0.0006   Epoch: 12   Global Step: 22100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:47:49,784-Speed 9403.27 samples/sec   Loss 2.3900   LearningRate 0.0006   Epoch: 12   Global Step: 22110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:48:15,902-Speed 9410.26 samples/sec   Loss 2.3788   LearningRate 0.0006   Epoch: 12   Global Step: 22120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:48:41,959-Speed 9432.06 samples/sec   Loss 2.3898   LearningRate 0.0006   Epoch: 12   Global Step: 22130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:49:08,141-Speed 9387.07 samples/sec   Loss 2.4066   LearningRate 0.0006   Epoch: 12   Global Step: 22140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:49:34,321-Speed 9387.79 samples/sec   Loss 2.4005   LearningRate 0.0006   Epoch: 12   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:50:00,497-Speed 9389.39 samples/sec   Loss 2.3892   LearningRate 0.0006   Epoch: 12   Global Step: 22160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-03-05 12:50:26,681-Speed 9385.87 samples/sec   Loss 2.3952   LearningRate 0.0006   Epoch: 12   Global Step: 22170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-03-05 12:50:52,878-Speed 9381.97 samples/sec   Loss 2.3892   LearningRate 0.0006   Epoch: 12   Global Step: 22180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-03-05 12:51:19,126-Speed 9363.31 samples/sec   Loss 2.3796   LearningRate 0.0006   Epoch: 12   Global Step: 22190   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:51:45,427-Speed 9344.50 samples/sec   Loss 2.3922   LearningRate 0.0006   Epoch: 12   Global Step: 22200   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:52:11,607-Speed 9388.00 samples/sec   Loss 2.3850   LearningRate 0.0006   Epoch: 12   Global Step: 22210   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:52:37,767-Speed 9394.93 samples/sec   Loss 2.3709   LearningRate 0.0006   Epoch: 12   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:53:03,935-Speed 9392.10 samples/sec   Loss 2.4215   LearningRate 0.0006   Epoch: 12   Global Step: 22230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:53:30,000-Speed 9429.17 samples/sec   Loss 2.4216   LearningRate 0.0006   Epoch: 12   Global Step: 22240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:53:56,145-Speed 9400.22 samples/sec   Loss 2.3980   LearningRate 0.0006   Epoch: 12   Global Step: 22250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-05 12:54:22,266-Speed 9409.00 samples/sec   Loss 2.3815   LearningRate 0.0006   Epoch: 12   Global Step: 22260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:54:48,492-Speed 9371.10 samples/sec   Loss 2.3603   LearningRate 0.0006   Epoch: 12   Global Step: 22270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:55:14,681-Speed 9384.67 samples/sec   Loss 2.3744   LearningRate 0.0006   Epoch: 12   Global Step: 22280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:55:40,916-Speed 9368.22 samples/sec   Loss 2.3885   LearningRate 0.0006   Epoch: 12   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 12:56:07,084-Speed 9391.95 samples/sec   Loss 2.3702   LearningRate 0.0006   Epoch: 12   Global Step: 22300   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:56:33,167-Speed 9422.67 samples/sec   Loss 2.3767   LearningRate 0.0006   Epoch: 12   Global Step: 22310   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:56:59,248-Speed 9423.61 samples/sec   Loss 2.3950   LearningRate 0.0006   Epoch: 12   Global Step: 22320   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:57:25,411-Speed 9393.87 samples/sec   Loss 2.3745   LearningRate 0.0006   Epoch: 12   Global Step: 22330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:57:51,506-Speed 9418.52 samples/sec   Loss 2.3696   LearningRate 0.0006   Epoch: 12   Global Step: 22340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:58:17,621-Speed 9411.27 samples/sec   Loss 2.3801   LearningRate 0.0006   Epoch: 12   Global Step: 22350   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:58:43,698-Speed 9425.00 samples/sec   Loss 2.3793   LearningRate 0.0006   Epoch: 12   Global Step: 22360   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:59:09,787-Speed 9420.59 samples/sec   Loss 2.3777   LearningRate 0.0006   Epoch: 12   Global Step: 22370   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 12:59:35,908-Speed 9408.97 samples/sec   Loss 2.3716   LearningRate 0.0006   Epoch: 12   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:00:02,045-Speed 9402.98 samples/sec   Loss 2.3615   LearningRate 0.0006   Epoch: 12   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:00:28,146-Speed 9416.40 samples/sec   Loss 2.3631   LearningRate 0.0006   Epoch: 12   Global Step: 22400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:00:54,289-Speed 9400.81 samples/sec   Loss 2.3949   LearningRate 0.0006   Epoch: 12   Global Step: 22410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:01:20,338-Speed 9435.01 samples/sec   Loss 2.3915   LearningRate 0.0006   Epoch: 12   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:01:46,509-Speed 9391.00 samples/sec   Loss 2.4119   LearningRate 0.0006   Epoch: 12   Global Step: 22430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:02:12,609-Speed 9416.41 samples/sec   Loss 2.3887   LearningRate 0.0006   Epoch: 12   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:02:38,817-Speed 9377.53 samples/sec   Loss 2.3769   LearningRate 0.0006   Epoch: 12   Global Step: 22450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:03:04,884-Speed 9428.52 samples/sec   Loss 2.3950   LearningRate 0.0006   Epoch: 12   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:04:23,950-Speed 3108.37 samples/sec   Loss 2.3930   LearningRate 0.0006   Epoch: 13   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:04:50,008-Speed 9431.82 samples/sec   Loss 2.3370   LearningRate 0.0006   Epoch: 13   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:05:16,064-Speed 9432.66 samples/sec   Loss 2.3524   LearningRate 0.0006   Epoch: 13   Global Step: 22490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:05:42,191-Speed 9406.80 samples/sec   Loss 2.3550   LearningRate 0.0006   Epoch: 13   Global Step: 22500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:06:08,300-Speed 9413.39 samples/sec   Loss 2.3399   LearningRate 0.0006   Epoch: 13   Global Step: 22510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:06:34,400-Speed 9416.80 samples/sec   Loss 2.3346   LearningRate 0.0006   Epoch: 13   Global Step: 22520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:07:00,495-Speed 9418.23 samples/sec   Loss 2.3394   LearningRate 0.0006   Epoch: 13   Global Step: 22530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:07:26,618-Speed 9408.21 samples/sec   Loss 2.3538   LearningRate 0.0006   Epoch: 13   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:07:52,719-Speed 9416.86 samples/sec   Loss 2.3385   LearningRate 0.0006   Epoch: 13   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:08:18,857-Speed 9402.48 samples/sec   Loss 2.3420   LearningRate 0.0006   Epoch: 13   Global Step: 22560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:08:45,074-Speed 9374.55 samples/sec   Loss 2.3369   LearningRate 0.0006   Epoch: 13   Global Step: 22570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:09:11,233-Speed 9395.29 samples/sec   Loss 2.3514   LearningRate 0.0006   Epoch: 13   Global Step: 22580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:09:37,270-Speed 9440.11 samples/sec   Loss 2.3567   LearningRate 0.0006   Epoch: 13   Global Step: 22590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:10:03,365-Speed 9418.50 samples/sec   Loss 2.3295   LearningRate 0.0006   Epoch: 13   Global Step: 22600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:10:29,481-Speed 9410.66 samples/sec   Loss 2.3340   LearningRate 0.0006   Epoch: 13   Global Step: 22610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:10:55,656-Speed 9389.41 samples/sec   Loss 2.3408   LearningRate 0.0006   Epoch: 13   Global Step: 22620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:11:21,776-Speed 9409.47 samples/sec   Loss 2.3448   LearningRate 0.0006   Epoch: 13   Global Step: 22630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:11:47,895-Speed 9409.41 samples/sec   Loss 2.3389   LearningRate 0.0006   Epoch: 13   Global Step: 22640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:12:13,978-Speed 9422.76 samples/sec   Loss 2.3475   LearningRate 0.0006   Epoch: 13   Global Step: 22650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:12:40,054-Speed 9425.17 samples/sec   Loss 2.3467   LearningRate 0.0006   Epoch: 13   Global Step: 22660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:13:06,214-Speed 9394.92 samples/sec   Loss 2.3639   LearningRate 0.0006   Epoch: 13   Global Step: 22670   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:13:32,341-Speed 9406.73 samples/sec   Loss 2.3367   LearningRate 0.0006   Epoch: 13   Global Step: 22680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:13:58,515-Speed 9389.81 samples/sec   Loss 2.3334   LearningRate 0.0006   Epoch: 13   Global Step: 22690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:14:24,707-Speed 9383.48 samples/sec   Loss 2.3279   LearningRate 0.0006   Epoch: 13   Global Step: 22700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:14:50,803-Speed 9418.16 samples/sec   Loss 2.3610   LearningRate 0.0006   Epoch: 13   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:15:16,924-Speed 9408.66 samples/sec   Loss 2.3495   LearningRate 0.0006   Epoch: 13   Global Step: 22720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:15:43,046-Speed 9408.68 samples/sec   Loss 2.3585   LearningRate 0.0006   Epoch: 13   Global Step: 22730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:16:09,178-Speed 9405.05 samples/sec   Loss 2.3448   LearningRate 0.0006   Epoch: 13   Global Step: 22740   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:16:35,274-Speed 9418.18 samples/sec   Loss 2.3270   LearningRate 0.0006   Epoch: 13   Global Step: 22750   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:17:01,357-Speed 9422.88 samples/sec   Loss 2.3546   LearningRate 0.0006   Epoch: 13   Global Step: 22760   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:17:27,526-Speed 9391.89 samples/sec   Loss 2.3524   LearningRate 0.0006   Epoch: 13   Global Step: 22770   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:17:53,652-Speed 9407.48 samples/sec   Loss 2.3389   LearningRate 0.0006   Epoch: 13   Global Step: 22780   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:18:19,757-Speed 9414.99 samples/sec   Loss 2.3349   LearningRate 0.0006   Epoch: 13   Global Step: 22790   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:18:45,843-Speed 9422.01 samples/sec   Loss 2.3454   LearningRate 0.0006   Epoch: 13   Global Step: 22800   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:19:11,976-Speed 9404.76 samples/sec   Loss 2.3661   LearningRate 0.0006   Epoch: 13   Global Step: 22810   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:19:38,053-Speed 9424.73 samples/sec   Loss 2.3442   LearningRate 0.0006   Epoch: 13   Global Step: 22820   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:20:04,265-Speed 9376.42 samples/sec   Loss 2.3227   LearningRate 0.0006   Epoch: 13   Global Step: 22830   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:20:30,398-Speed 9404.56 samples/sec   Loss 2.3297   LearningRate 0.0006   Epoch: 13   Global Step: 22840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:20:56,582-Speed 9386.37 samples/sec   Loss 2.3357   LearningRate 0.0006   Epoch: 13   Global Step: 22850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:21:22,754-Speed 9390.58 samples/sec   Loss 2.3415   LearningRate 0.0006   Epoch: 13   Global Step: 22860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:21:48,950-Speed 9381.95 samples/sec   Loss 2.3508   LearningRate 0.0006   Epoch: 13   Global Step: 22870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:22:15,043-Speed 9419.07 samples/sec   Loss 2.3399   LearningRate 0.0006   Epoch: 13   Global Step: 22880   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:22:41,136-Speed 9419.80 samples/sec   Loss 2.3179   LearningRate 0.0006   Epoch: 13   Global Step: 22890   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:23:07,258-Speed 9408.45 samples/sec   Loss 2.3266   LearningRate 0.0006   Epoch: 13   Global Step: 22900   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:23:33,431-Speed 9390.46 samples/sec   Loss 2.3199   LearningRate 0.0006   Epoch: 13   Global Step: 22910   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:23:59,637-Speed 9378.45 samples/sec   Loss 2.3235   LearningRate 0.0006   Epoch: 13   Global Step: 22920   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:24:25,766-Speed 9406.23 samples/sec   Loss 2.3264   LearningRate 0.0006   Epoch: 13   Global Step: 22930   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:24:51,831-Speed 9429.50 samples/sec   Loss 2.3210   LearningRate 0.0006   Epoch: 13   Global Step: 22940   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:25:18,014-Speed 9386.80 samples/sec   Loss 2.3228   LearningRate 0.0006   Epoch: 13   Global Step: 22950   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:25:44,188-Speed 9389.60 samples/sec   Loss 2.3308   LearningRate 0.0006   Epoch: 13   Global Step: 22960   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:26:10,364-Speed 9389.22 samples/sec   Loss 2.3275   LearningRate 0.0006   Epoch: 13   Global Step: 22970   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:26:36,500-Speed 9403.98 samples/sec   Loss 2.3398   LearningRate 0.0006   Epoch: 13   Global Step: 22980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:27:02,627-Speed 9407.66 samples/sec   Loss 2.3270   LearningRate 0.0005   Epoch: 13   Global Step: 22990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:27:28,799-Speed 9390.65 samples/sec   Loss 2.3190   LearningRate 0.0005   Epoch: 13   Global Step: 23000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:27:54,912-Speed 9411.59 samples/sec   Loss 2.3224   LearningRate 0.0005   Epoch: 13   Global Step: 23010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:28:21,074-Speed 9394.25 samples/sec   Loss 2.3103   LearningRate 0.0005   Epoch: 13   Global Step: 23020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:28:47,169-Speed 9418.50 samples/sec   Loss 2.3034   LearningRate 0.0005   Epoch: 13   Global Step: 23030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:29:13,280-Speed 9412.47 samples/sec   Loss 2.3250   LearningRate 0.0005   Epoch: 13   Global Step: 23040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:29:39,370-Speed 9420.17 samples/sec   Loss 2.3173   LearningRate 0.0005   Epoch: 13   Global Step: 23050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:30:05,458-Speed 9421.08 samples/sec   Loss 2.3097   LearningRate 0.0005   Epoch: 13   Global Step: 23060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:30:31,613-Speed 9396.91 samples/sec   Loss 2.3160   LearningRate 0.0005   Epoch: 13   Global Step: 23070   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:30:57,754-Speed 9401.74 samples/sec   Loss 2.3177   LearningRate 0.0005   Epoch: 13   Global Step: 23080   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:31:23,880-Speed 9406.96 samples/sec   Loss 2.3183   LearningRate 0.0005   Epoch: 13   Global Step: 23090   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:31:50,069-Speed 9384.38 samples/sec   Loss 2.3111   LearningRate 0.0005   Epoch: 13   Global Step: 23100   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:32:16,166-Speed 9417.72 samples/sec   Loss 2.3039   LearningRate 0.0005   Epoch: 13   Global Step: 23110   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:32:42,219-Speed 9433.26 samples/sec   Loss 2.3050   LearningRate 0.0005   Epoch: 13   Global Step: 23120   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:33:08,352-Speed 9404.89 samples/sec   Loss 2.2989   LearningRate 0.0005   Epoch: 13   Global Step: 23130   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:33:34,559-Speed 9377.82 samples/sec   Loss 2.2983   LearningRate 0.0005   Epoch: 13   Global Step: 23140   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:34:00,738-Speed 9388.28 samples/sec   Loss 2.3189   LearningRate 0.0005   Epoch: 13   Global Step: 23150   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:34:26,927-Speed 9384.72 samples/sec   Loss 2.3013   LearningRate 0.0005   Epoch: 13   Global Step: 23160   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:34:53,076-Speed 9398.78 samples/sec   Loss 2.2926   LearningRate 0.0005   Epoch: 13   Global Step: 23170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:35:19,182-Speed 9414.36 samples/sec   Loss 2.3038   LearningRate 0.0005   Epoch: 13   Global Step: 23180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:35:45,282-Speed 9416.61 samples/sec   Loss 2.2939   LearningRate 0.0005   Epoch: 13   Global Step: 23190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:36:11,366-Speed 9422.56 samples/sec   Loss 2.3037   LearningRate 0.0005   Epoch: 13   Global Step: 23200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:36:37,411-Speed 9436.49 samples/sec   Loss 2.2978   LearningRate 0.0005   Epoch: 13   Global Step: 23210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:37:03,488-Speed 9424.84 samples/sec   Loss 2.3170   LearningRate 0.0005   Epoch: 13   Global Step: 23220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:37:29,586-Speed 9417.45 samples/sec   Loss 2.3149   LearningRate 0.0005   Epoch: 13   Global Step: 23230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:37:55,666-Speed 9423.80 samples/sec   Loss 2.3062   LearningRate 0.0005   Epoch: 13   Global Step: 23240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:38:21,751-Speed 9421.93 samples/sec   Loss 2.3141   LearningRate 0.0005   Epoch: 13   Global Step: 23250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:38:47,942-Speed 9384.20 samples/sec   Loss 2.3009   LearningRate 0.0005   Epoch: 13   Global Step: 23260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:39:14,080-Speed 9402.73 samples/sec   Loss 2.3046   LearningRate 0.0005   Epoch: 13   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-05 13:39:40,247-Speed 9392.58 samples/sec   Loss 2.2828   LearningRate 0.0005   Epoch: 13   Global Step: 23280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:40:06,411-Speed 9393.85 samples/sec   Loss 2.2858   LearningRate 0.0005   Epoch: 13   Global Step: 23290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:40:32,531-Speed 9409.23 samples/sec   Loss 2.2944   LearningRate 0.0005   Epoch: 13   Global Step: 23300   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:40:58,596-Speed 9429.10 samples/sec   Loss 2.3020   LearningRate 0.0005   Epoch: 13   Global Step: 23310   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:41:24,823-Speed 9371.14 samples/sec   Loss 2.2843   LearningRate 0.0005   Epoch: 13   Global Step: 23320   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:41:51,096-Speed 9354.45 samples/sec   Loss 2.2932   LearningRate 0.0005   Epoch: 13   Global Step: 23330   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:42:17,289-Speed 9383.09 samples/sec   Loss 2.2837   LearningRate 0.0005   Epoch: 13   Global Step: 23340   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:42:43,378-Speed 9420.45 samples/sec   Loss 2.2800   LearningRate 0.0005   Epoch: 13   Global Step: 23350   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:43:09,552-Speed 9389.92 samples/sec   Loss 2.3006   LearningRate 0.0005   Epoch: 13   Global Step: 23360   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:43:35,703-Speed 9398.29 samples/sec   Loss 2.2966   LearningRate 0.0005   Epoch: 13   Global Step: 23370   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:44:01,852-Speed 9398.59 samples/sec   Loss 2.2791   LearningRate 0.0005   Epoch: 13   Global Step: 23380   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:44:28,004-Speed 9398.09 samples/sec   Loss 2.2970   LearningRate 0.0005   Epoch: 13   Global Step: 23390   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:44:54,113-Speed 9413.07 samples/sec   Loss 2.2735   LearningRate 0.0005   Epoch: 13   Global Step: 23400   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:45:20,279-Speed 9392.78 samples/sec   Loss 2.2746   LearningRate 0.0005   Epoch: 13   Global Step: 23410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:45:46,373-Speed 9419.13 samples/sec   Loss 2.2777   LearningRate 0.0005   Epoch: 13   Global Step: 23420   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:46:12,508-Speed 9403.71 samples/sec   Loss 2.2667   LearningRate 0.0005   Epoch: 13   Global Step: 23430   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:46:38,607-Speed 9416.90 samples/sec   Loss 2.2887   LearningRate 0.0005   Epoch: 13   Global Step: 23440   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:47:04,686-Speed 9423.87 samples/sec   Loss 2.2971   LearningRate 0.0005   Epoch: 13   Global Step: 23450   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:47:30,780-Speed 9418.82 samples/sec   Loss 2.2782   LearningRate 0.0005   Epoch: 13   Global Step: 23460   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:47:56,965-Speed 9386.25 samples/sec   Loss 2.2926   LearningRate 0.0005   Epoch: 13   Global Step: 23470   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:48:23,071-Speed 9414.33 samples/sec   Loss 2.2779   LearningRate 0.0005   Epoch: 13   Global Step: 23480   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:48:49,159-Speed 9420.88 samples/sec   Loss 2.2582   LearningRate 0.0005   Epoch: 13   Global Step: 23490   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:49:15,287-Speed 9406.35 samples/sec   Loss 2.2631   LearningRate 0.0005   Epoch: 13   Global Step: 23500   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:49:41,411-Speed 9407.91 samples/sec   Loss 2.2652   LearningRate 0.0005   Epoch: 13   Global Step: 23510   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-05 13:50:07,594-Speed 9387.46 samples/sec   Loss 2.2682   LearningRate 0.0005   Epoch: 13   Global Step: 23520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:50:33,746-Speed 9397.59 samples/sec   Loss 2.2819   LearningRate 0.0005   Epoch: 13   Global Step: 23530   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:50:59,823-Speed 9425.11 samples/sec   Loss 2.2716   LearningRate 0.0005   Epoch: 13   Global Step: 23540   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:51:26,032-Speed 9377.06 samples/sec   Loss 2.2710   LearningRate 0.0005   Epoch: 13   Global Step: 23550   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:51:52,129-Speed 9417.92 samples/sec   Loss 2.2697   LearningRate 0.0005   Epoch: 13   Global Step: 23560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:52:18,196-Speed 9429.16 samples/sec   Loss 2.2902   LearningRate 0.0005   Epoch: 13   Global Step: 23570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:52:44,364-Speed 9392.05 samples/sec   Loss 2.2685   LearningRate 0.0005   Epoch: 13   Global Step: 23580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:53:10,536-Speed 9390.81 samples/sec   Loss 2.2992   LearningRate 0.0005   Epoch: 13   Global Step: 23590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:53:36,717-Speed 9387.24 samples/sec   Loss 2.2769   LearningRate 0.0005   Epoch: 13   Global Step: 23600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:54:03,072-Speed 9325.25 samples/sec   Loss 2.2681   LearningRate 0.0005   Epoch: 13   Global Step: 23610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-05 13:54:29,316-Speed 9365.03 samples/sec   Loss 2.2614   LearningRate 0.0005   Epoch: 13   Global Step: 23620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 13:54:55,649-Speed 9333.25 samples/sec   Loss 2.2614   LearningRate 0.0005   Epoch: 13   Global Step: 23630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 13:55:21,973-Speed 9336.65 samples/sec   Loss 2.2765   LearningRate 0.0005   Epoch: 13   Global Step: 23640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 13:55:48,195-Speed 9372.45 samples/sec   Loss 2.2771   LearningRate 0.0005   Epoch: 13   Global Step: 23650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:56:14,563-Speed 9321.04 samples/sec   Loss 2.2779   LearningRate 0.0005   Epoch: 13   Global Step: 23660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:56:40,870-Speed 9342.53 samples/sec   Loss 2.2652   LearningRate 0.0005   Epoch: 13   Global Step: 23670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:57:07,156-Speed 9349.73 samples/sec   Loss 2.2783   LearningRate 0.0005   Epoch: 13   Global Step: 23680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:57:33,350-Speed 9382.59 samples/sec   Loss 2.2613   LearningRate 0.0005   Epoch: 13   Global Step: 23690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:57:59,563-Speed 9376.22 samples/sec   Loss 2.2678   LearningRate 0.0005   Epoch: 13   Global Step: 23700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:58:25,844-Speed 9351.80 samples/sec   Loss 2.2634   LearningRate 0.0005   Epoch: 13   Global Step: 23710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:58:52,196-Speed 9326.17 samples/sec   Loss 2.2511   LearningRate 0.0005   Epoch: 13   Global Step: 23720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:59:18,430-Speed 9368.46 samples/sec   Loss 2.2715   LearningRate 0.0005   Epoch: 13   Global Step: 23730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 13:59:44,727-Speed 9346.05 samples/sec   Loss 2.2700   LearningRate 0.0005   Epoch: 13   Global Step: 23740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:00:11,000-Speed 9354.44 samples/sec   Loss 2.2436   LearningRate 0.0005   Epoch: 13   Global Step: 23750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 14:00:37,236-Speed 9368.01 samples/sec   Loss 2.2612   LearningRate 0.0005   Epoch: 13   Global Step: 23760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 14:01:03,424-Speed 9384.60 samples/sec   Loss 2.2385   LearningRate 0.0005   Epoch: 13   Global Step: 23770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:01:29,651-Speed 9370.95 samples/sec   Loss 2.2444   LearningRate 0.0005   Epoch: 13   Global Step: 23780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:01:55,840-Speed 9385.24 samples/sec   Loss 2.2613   LearningRate 0.0005   Epoch: 13   Global Step: 23790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:02:22,077-Speed 9367.42 samples/sec   Loss 2.2544   LearningRate 0.0005   Epoch: 13   Global Step: 23800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:02:48,248-Speed 9390.84 samples/sec   Loss 2.2603   LearningRate 0.0005   Epoch: 13   Global Step: 23810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:03:14,447-Speed 9380.84 samples/sec   Loss 2.2663   LearningRate 0.0005   Epoch: 13   Global Step: 23820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:03:40,565-Speed 9410.08 samples/sec   Loss 2.2437   LearningRate 0.0005   Epoch: 13   Global Step: 23830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:04:06,809-Speed 9364.81 samples/sec   Loss 2.2454   LearningRate 0.0005   Epoch: 13   Global Step: 23840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:04:32,989-Speed 9387.95 samples/sec   Loss 2.2443   LearningRate 0.0005   Epoch: 13   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:04:59,048-Speed 9431.12 samples/sec   Loss 2.2651   LearningRate 0.0005   Epoch: 13   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:05:25,116-Speed 9428.00 samples/sec   Loss 2.2632   LearningRate 0.0005   Epoch: 13   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:05:51,221-Speed 9414.97 samples/sec   Loss 2.2398   LearningRate 0.0005   Epoch: 13   Global Step: 23880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:06:17,426-Speed 9378.77 samples/sec   Loss 2.2456   LearningRate 0.0005   Epoch: 13   Global Step: 23890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:06:43,571-Speed 9400.31 samples/sec   Loss 2.2355   LearningRate 0.0005   Epoch: 13   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:07:09,633-Speed 9430.17 samples/sec   Loss 2.2468   LearningRate 0.0005   Epoch: 13   Global Step: 23910   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:07:35,724-Speed 9419.51 samples/sec   Loss 2.2467   LearningRate 0.0005   Epoch: 13   Global Step: 23920   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:08:01,896-Speed 9390.96 samples/sec   Loss 2.2437   LearningRate 0.0005   Epoch: 13   Global Step: 23930   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:08:28,037-Speed 9401.86 samples/sec   Loss 2.2555   LearningRate 0.0005   Epoch: 13   Global Step: 23940   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:08:54,159-Speed 9408.72 samples/sec   Loss 2.2430   LearningRate 0.0005   Epoch: 13   Global Step: 23950   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:09:20,241-Speed 9422.89 samples/sec   Loss 2.2384   LearningRate 0.0005   Epoch: 13   Global Step: 23960   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:09:46,352-Speed 9412.47 samples/sec   Loss 2.2307   LearningRate 0.0005   Epoch: 13   Global Step: 23970   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:10:12,532-Speed 9387.85 samples/sec   Loss 2.2412   LearningRate 0.0005   Epoch: 13   Global Step: 23980   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:10:38,589-Speed 9432.22 samples/sec   Loss 2.2532   LearningRate 0.0005   Epoch: 13   Global Step: 23990   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:11:04,724-Speed 9403.89 samples/sec   Loss 2.2319   LearningRate 0.0005   Epoch: 13   Global Step: 24000   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:11:30,870-Speed 9399.99 samples/sec   Loss 2.2558   LearningRate 0.0005   Epoch: 13   Global Step: 24010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:11:56,941-Speed 9426.66 samples/sec   Loss 2.2425   LearningRate 0.0005   Epoch: 13   Global Step: 24020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:12:22,998-Speed 9432.30 samples/sec   Loss 2.2317   LearningRate 0.0005   Epoch: 13   Global Step: 24030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:12:49,194-Speed 9381.77 samples/sec   Loss 2.2290   LearningRate 0.0005   Epoch: 13   Global Step: 24040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:13:15,329-Speed 9403.69 samples/sec   Loss 2.2468   LearningRate 0.0005   Epoch: 13   Global Step: 24050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:13:41,468-Speed 9402.73 samples/sec   Loss 2.2416   LearningRate 0.0005   Epoch: 13   Global Step: 24060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:14:07,536-Speed 9428.01 samples/sec   Loss 2.2375   LearningRate 0.0005   Epoch: 13   Global Step: 24070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:14:33,645-Speed 9413.51 samples/sec   Loss 2.2415   LearningRate 0.0005   Epoch: 13   Global Step: 24080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:14:59,804-Speed 9394.93 samples/sec   Loss 2.2426   LearningRate 0.0005   Epoch: 13   Global Step: 24090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:15:25,981-Speed 9388.74 samples/sec   Loss 2.2399   LearningRate 0.0005   Epoch: 13   Global Step: 24100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:15:52,070-Speed 9420.55 samples/sec   Loss 2.2365   LearningRate 0.0005   Epoch: 13   Global Step: 24110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 14:16:18,217-Speed 9399.56 samples/sec   Loss 2.2474   LearningRate 0.0005   Epoch: 13   Global Step: 24120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 14:16:44,307-Speed 9420.21 samples/sec   Loss 2.2249   LearningRate 0.0005   Epoch: 13   Global Step: 24130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:17:10,374-Speed 9428.88 samples/sec   Loss 2.2584   LearningRate 0.0005   Epoch: 13   Global Step: 24140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:17:36,399-Speed 9443.60 samples/sec   Loss 2.2329   LearningRate 0.0005   Epoch: 13   Global Step: 24150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:18:02,472-Speed 9426.03 samples/sec   Loss 2.2637   LearningRate 0.0005   Epoch: 13   Global Step: 24160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:18:28,563-Speed 9419.91 samples/sec   Loss 2.2593   LearningRate 0.0005   Epoch: 13   Global Step: 24170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:18:54,646-Speed 9422.40 samples/sec   Loss 2.2717   LearningRate 0.0005   Epoch: 13   Global Step: 24180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:19:20,706-Speed 9431.34 samples/sec   Loss 2.2575   LearningRate 0.0005   Epoch: 13   Global Step: 24190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:20:38,883-Speed 3143.68 samples/sec   Loss 2.2348   LearningRate 0.0005   Epoch: 14   Global Step: 24200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:21:04,737-Speed 9506.12 samples/sec   Loss 2.1885   LearningRate 0.0005   Epoch: 14   Global Step: 24210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:21:30,706-Speed 9463.89 samples/sec   Loss 2.2159   LearningRate 0.0005   Epoch: 14   Global Step: 24220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:21:56,707-Speed 9452.46 samples/sec   Loss 2.2135   LearningRate 0.0005   Epoch: 14   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 14:22:22,652-Speed 9472.83 samples/sec   Loss 2.2177   LearningRate 0.0005   Epoch: 14   Global Step: 24240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:22:48,640-Speed 9457.00 samples/sec   Loss 2.2006   LearningRate 0.0005   Epoch: 14   Global Step: 24250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:23:14,620-Speed 9460.35 samples/sec   Loss 2.1888   LearningRate 0.0005   Epoch: 14   Global Step: 24260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:23:40,592-Speed 9462.90 samples/sec   Loss 2.1943   LearningRate 0.0005   Epoch: 14   Global Step: 24270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:24:06,541-Speed 9471.27 samples/sec   Loss 2.2248   LearningRate 0.0005   Epoch: 14   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:24:32,589-Speed 9435.48 samples/sec   Loss 2.2197   LearningRate 0.0005   Epoch: 14   Global Step: 24290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:24:58,633-Speed 9436.59 samples/sec   Loss 2.2053   LearningRate 0.0005   Epoch: 14   Global Step: 24300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:25:24,635-Speed 9452.15 samples/sec   Loss 2.2037   LearningRate 0.0005   Epoch: 14   Global Step: 24310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:25:50,665-Speed 9442.09 samples/sec   Loss 2.1874   LearningRate 0.0005   Epoch: 14   Global Step: 24320   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:26:16,683-Speed 9446.12 samples/sec   Loss 2.2006   LearningRate 0.0005   Epoch: 14   Global Step: 24330   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:26:42,730-Speed 9435.49 samples/sec   Loss 2.2506   LearningRate 0.0005   Epoch: 14   Global Step: 24340   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:27:08,804-Speed 9425.89 samples/sec   Loss 2.2244   LearningRate 0.0005   Epoch: 14   Global Step: 24350   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:27:34,846-Speed 9437.52 samples/sec   Loss 2.2029   LearningRate 0.0005   Epoch: 14   Global Step: 24360   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:28:00,948-Speed 9415.61 samples/sec   Loss 2.2316   LearningRate 0.0005   Epoch: 14   Global Step: 24370   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:28:27,029-Speed 9423.89 samples/sec   Loss 2.2109   LearningRate 0.0005   Epoch: 14   Global Step: 24380   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:28:53,191-Speed 9394.06 samples/sec   Loss 2.1909   LearningRate 0.0005   Epoch: 14   Global Step: 24390   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:29:19,370-Speed 9387.91 samples/sec   Loss 2.1837   LearningRate 0.0005   Epoch: 14   Global Step: 24400   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:29:45,471-Speed 9416.43 samples/sec   Loss 2.1840   LearningRate 0.0005   Epoch: 14   Global Step: 24410   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:30:11,576-Speed 9414.80 samples/sec   Loss 2.2107   LearningRate 0.0005   Epoch: 14   Global Step: 24420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:30:37,798-Speed 9372.66 samples/sec   Loss 2.1989   LearningRate 0.0005   Epoch: 14   Global Step: 24430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:31:03,907-Speed 9412.98 samples/sec   Loss 2.2030   LearningRate 0.0005   Epoch: 14   Global Step: 24440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:31:30,026-Speed 9409.75 samples/sec   Loss 2.2154   LearningRate 0.0005   Epoch: 14   Global Step: 24450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:31:56,052-Speed 9443.51 samples/sec   Loss 2.2150   LearningRate 0.0005   Epoch: 14   Global Step: 24460   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:32:22,099-Speed 9435.53 samples/sec   Loss 2.2106   LearningRate 0.0005   Epoch: 14   Global Step: 24470   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:32:48,230-Speed 9405.88 samples/sec   Loss 2.2193   LearningRate 0.0005   Epoch: 14   Global Step: 24480   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:33:14,414-Speed 9386.44 samples/sec   Loss 2.2086   LearningRate 0.0005   Epoch: 14   Global Step: 24490   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:33:40,524-Speed 9412.79 samples/sec   Loss 2.2012   LearningRate 0.0005   Epoch: 14   Global Step: 24500   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:34:06,616-Speed 9419.27 samples/sec   Loss 2.1989   LearningRate 0.0005   Epoch: 14   Global Step: 24510   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:34:32,653-Speed 9439.69 samples/sec   Loss 2.1946   LearningRate 0.0005   Epoch: 14   Global Step: 24520   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:34:58,760-Speed 9414.09 samples/sec   Loss 2.1992   LearningRate 0.0005   Epoch: 14   Global Step: 24530   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:35:24,779-Speed 9445.68 samples/sec   Loss 2.1876   LearningRate 0.0005   Epoch: 14   Global Step: 24540   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:35:50,933-Speed 9397.23 samples/sec   Loss 2.1849   LearningRate 0.0005   Epoch: 14   Global Step: 24550   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:36:17,040-Speed 9414.08 samples/sec   Loss 2.2051   LearningRate 0.0005   Epoch: 14   Global Step: 24560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:36:43,013-Speed 9462.94 samples/sec   Loss 2.2226   LearningRate 0.0005   Epoch: 14   Global Step: 24570   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:37:09,075-Speed 9430.27 samples/sec   Loss 2.2351   LearningRate 0.0005   Epoch: 14   Global Step: 24580   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:37:35,232-Speed 9396.06 samples/sec   Loss 2.2155   LearningRate 0.0005   Epoch: 14   Global Step: 24590   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:38:01,391-Speed 9395.18 samples/sec   Loss 2.2105   LearningRate 0.0005   Epoch: 14   Global Step: 24600   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:38:27,476-Speed 9421.88 samples/sec   Loss 2.2012   LearningRate 0.0005   Epoch: 14   Global Step: 24610   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:38:53,631-Speed 9397.03 samples/sec   Loss 2.2046   LearningRate 0.0005   Epoch: 14   Global Step: 24620   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:39:19,769-Speed 9403.12 samples/sec   Loss 2.1929   LearningRate 0.0005   Epoch: 14   Global Step: 24630   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:39:45,959-Speed 9384.01 samples/sec   Loss 2.1855   LearningRate 0.0005   Epoch: 14   Global Step: 24640   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:40:12,143-Speed 9386.19 samples/sec   Loss 2.1908   LearningRate 0.0005   Epoch: 14   Global Step: 24650   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:40:38,357-Speed 9375.52 samples/sec   Loss 2.1939   LearningRate 0.0005   Epoch: 14   Global Step: 24660   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:41:04,493-Speed 9403.79 samples/sec   Loss 2.2024   LearningRate 0.0005   Epoch: 14   Global Step: 24670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:41:30,738-Speed 9364.48 samples/sec   Loss 2.1909   LearningRate 0.0005   Epoch: 14   Global Step: 24680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:41:56,956-Speed 9374.20 samples/sec   Loss 2.1838   LearningRate 0.0005   Epoch: 14   Global Step: 24690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:42:23,118-Speed 9394.12 samples/sec   Loss 2.1746   LearningRate 0.0005   Epoch: 14   Global Step: 24700   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:42:49,268-Speed 9398.81 samples/sec   Loss 2.1685   LearningRate 0.0005   Epoch: 14   Global Step: 24710   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:43:15,459-Speed 9383.97 samples/sec   Loss 2.1978   LearningRate 0.0005   Epoch: 14   Global Step: 24720   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:43:41,599-Speed 9402.12 samples/sec   Loss 2.1821   LearningRate 0.0005   Epoch: 14   Global Step: 24730   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:44:07,758-Speed 9395.08 samples/sec   Loss 2.1972   LearningRate 0.0005   Epoch: 14   Global Step: 24740   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:44:33,949-Speed 9383.90 samples/sec   Loss 2.1912   LearningRate 0.0005   Epoch: 14   Global Step: 24750   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:45:00,049-Speed 9416.74 samples/sec   Loss 2.1707   LearningRate 0.0005   Epoch: 14   Global Step: 24760   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:45:26,211-Speed 9395.45 samples/sec   Loss 2.1787   LearningRate 0.0005   Epoch: 14   Global Step: 24770   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:45:52,360-Speed 9399.06 samples/sec   Loss 2.1979   LearningRate 0.0005   Epoch: 14   Global Step: 24780   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:46:18,491-Speed 9405.13 samples/sec   Loss 2.1820   LearningRate 0.0005   Epoch: 14   Global Step: 24790   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-05 14:46:44,620-Speed 9406.20 samples/sec   Loss 2.1811   LearningRate 0.0005   Epoch: 14   Global Step: 24800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:47:10,734-Speed 9411.46 samples/sec   Loss 2.1648   LearningRate 0.0005   Epoch: 14   Global Step: 24810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:47:37,037-Speed 9344.36 samples/sec   Loss 2.1704   LearningRate 0.0005   Epoch: 14   Global Step: 24820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:48:03,312-Speed 9353.67 samples/sec   Loss 2.1713   LearningRate 0.0005   Epoch: 14   Global Step: 24830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:48:29,579-Speed 9356.88 samples/sec   Loss 2.1759   LearningRate 0.0005   Epoch: 14   Global Step: 24840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:48:55,713-Speed 9404.39 samples/sec   Loss 2.1768   LearningRate 0.0005   Epoch: 14   Global Step: 24850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:49:21,886-Speed 9390.38 samples/sec   Loss 2.1750   LearningRate 0.0005   Epoch: 14   Global Step: 24860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:49:48,113-Speed 9371.28 samples/sec   Loss 2.1762   LearningRate 0.0005   Epoch: 14   Global Step: 24870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:50:14,301-Speed 9384.66 samples/sec   Loss 2.1789   LearningRate 0.0005   Epoch: 14   Global Step: 24880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:50:40,465-Speed 9393.42 samples/sec   Loss 2.1798   LearningRate 0.0005   Epoch: 14   Global Step: 24890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:51:06,669-Speed 9379.34 samples/sec   Loss 2.1924   LearningRate 0.0005   Epoch: 14   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 14:51:32,826-Speed 9395.83 samples/sec   Loss 2.1747   LearningRate 0.0005   Epoch: 14   Global Step: 24910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-05 14:51:58,955-Speed 9406.18 samples/sec   Loss 2.1631   LearningRate 0.0005   Epoch: 14   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:52:25,171-Speed 9374.95 samples/sec   Loss 2.1705   LearningRate 0.0005   Epoch: 14   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:52:51,281-Speed 9412.80 samples/sec   Loss 2.1749   LearningRate 0.0005   Epoch: 14   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:53:17,453-Speed 9391.25 samples/sec   Loss 2.1969   LearningRate 0.0005   Epoch: 14   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:53:43,672-Speed 9373.82 samples/sec   Loss 2.1848   LearningRate 0.0005   Epoch: 14   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:54:09,845-Speed 9390.27 samples/sec   Loss 2.1752   LearningRate 0.0005   Epoch: 14   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-05 14:54:36,017-Speed 9390.45 samples/sec   Loss 2.1698   LearningRate 0.0005   Epoch: 14   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:55:02,183-Speed 9392.82 samples/sec   Loss 2.1713   LearningRate 0.0005   Epoch: 14   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:55:28,365-Speed 9387.45 samples/sec   Loss 2.1650   LearningRate 0.0005   Epoch: 14   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:55:54,472-Speed 9414.34 samples/sec   Loss 2.1598   LearningRate 0.0005   Epoch: 14   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:56:20,630-Speed 9395.66 samples/sec   Loss 2.1634   LearningRate 0.0005   Epoch: 14   Global Step: 25020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 14:56:46,792-Speed 9393.96 samples/sec   Loss 2.1524   LearningRate 0.0005   Epoch: 14   Global Step: 25030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 14:57:12,858-Speed 9429.09 samples/sec   Loss 2.1556   LearningRate 0.0005   Epoch: 14   Global Step: 25040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:57:39,006-Speed 9399.63 samples/sec   Loss 2.1573   LearningRate 0.0005   Epoch: 14   Global Step: 25050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:58:05,122-Speed 9410.69 samples/sec   Loss 2.1579   LearningRate 0.0005   Epoch: 14   Global Step: 25060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:58:31,148-Speed 9443.39 samples/sec   Loss 2.1452   LearningRate 0.0005   Epoch: 14   Global Step: 25070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:58:57,351-Speed 9379.43 samples/sec   Loss 2.1612   LearningRate 0.0005   Epoch: 14   Global Step: 25080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:59:23,457-Speed 9414.13 samples/sec   Loss 2.1712   LearningRate 0.0005   Epoch: 14   Global Step: 25090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 14:59:49,551-Speed 9419.08 samples/sec   Loss 2.1611   LearningRate 0.0005   Epoch: 14   Global Step: 25100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:00:15,721-Speed 9391.43 samples/sec   Loss 2.1508   LearningRate 0.0005   Epoch: 14   Global Step: 25110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:00:41,792-Speed 9426.85 samples/sec   Loss 2.1496   LearningRate 0.0005   Epoch: 14   Global Step: 25120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:01:07,937-Speed 9400.15 samples/sec   Loss 2.1554   LearningRate 0.0005   Epoch: 14   Global Step: 25130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:01:34,051-Speed 9411.57 samples/sec   Loss 2.1622   LearningRate 0.0005   Epoch: 14   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:02:00,071-Speed 9445.80 samples/sec   Loss 2.1562   LearningRate 0.0005   Epoch: 14   Global Step: 25150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:02:26,191-Speed 9409.25 samples/sec   Loss 2.1502   LearningRate 0.0005   Epoch: 14   Global Step: 25160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:02:52,300-Speed 9413.37 samples/sec   Loss 2.1657   LearningRate 0.0005   Epoch: 14   Global Step: 25170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:03:18,373-Speed 9426.20 samples/sec   Loss 2.1364   LearningRate 0.0005   Epoch: 14   Global Step: 25180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:03:44,535-Speed 9394.32 samples/sec   Loss 2.1490   LearningRate 0.0005   Epoch: 14   Global Step: 25190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:04:10,606-Speed 9427.04 samples/sec   Loss 2.1369   LearningRate 0.0005   Epoch: 14   Global Step: 25200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:04:36,747-Speed 9401.72 samples/sec   Loss 2.1508   LearningRate 0.0005   Epoch: 14   Global Step: 25210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:05:02,815-Speed 9427.91 samples/sec   Loss 2.1412   LearningRate 0.0005   Epoch: 14   Global Step: 25220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:05:28,876-Speed 9430.82 samples/sec   Loss 2.1487   LearningRate 0.0005   Epoch: 14   Global Step: 25230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:05:54,968-Speed 9419.35 samples/sec   Loss 2.1687   LearningRate 0.0005   Epoch: 14   Global Step: 25240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:06:21,101-Speed 9404.54 samples/sec   Loss 2.1457   LearningRate 0.0005   Epoch: 14   Global Step: 25250   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:06:47,232-Speed 9405.25 samples/sec   Loss 2.1433   LearningRate 0.0005   Epoch: 14   Global Step: 25260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:07:13,298-Speed 9428.87 samples/sec   Loss 2.1387   LearningRate 0.0005   Epoch: 14   Global Step: 25270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:07:39,452-Speed 9397.16 samples/sec   Loss 2.1361   LearningRate 0.0005   Epoch: 14   Global Step: 25280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:08:05,591-Speed 9402.44 samples/sec   Loss 2.1517   LearningRate 0.0005   Epoch: 14   Global Step: 25290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:08:31,804-Speed 9375.98 samples/sec   Loss 2.1457   LearningRate 0.0005   Epoch: 14   Global Step: 25300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:08:57,862-Speed 9431.37 samples/sec   Loss 2.1395   LearningRate 0.0005   Epoch: 14   Global Step: 25310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:09:23,947-Speed 9422.05 samples/sec   Loss 2.1499   LearningRate 0.0005   Epoch: 14   Global Step: 25320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:09:50,033-Speed 9421.71 samples/sec   Loss 2.1393   LearningRate 0.0005   Epoch: 14   Global Step: 25330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:10:16,170-Speed 9402.96 samples/sec   Loss 2.1460   LearningRate 0.0005   Epoch: 14   Global Step: 25340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:10:42,314-Speed 9400.79 samples/sec   Loss 2.1522   LearningRate 0.0005   Epoch: 14   Global Step: 25350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:11:08,423-Speed 9413.38 samples/sec   Loss 2.1367   LearningRate 0.0005   Epoch: 14   Global Step: 25360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:11:34,537-Speed 9411.34 samples/sec   Loss 2.1352   LearningRate 0.0005   Epoch: 14   Global Step: 25370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:12:00,575-Speed 9438.76 samples/sec   Loss 2.1501   LearningRate 0.0005   Epoch: 14   Global Step: 25380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:12:26,672-Speed 9417.82 samples/sec   Loss 2.1563   LearningRate 0.0005   Epoch: 14   Global Step: 25390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:12:52,853-Speed 9387.61 samples/sec   Loss 2.1359   LearningRate 0.0005   Epoch: 14   Global Step: 25400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:13:18,924-Speed 9426.81 samples/sec   Loss 2.1293   LearningRate 0.0005   Epoch: 14   Global Step: 25410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:13:45,005-Speed 9423.27 samples/sec   Loss 2.1283   LearningRate 0.0005   Epoch: 14   Global Step: 25420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:14:11,068-Speed 9429.78 samples/sec   Loss 2.1229   LearningRate 0.0005   Epoch: 14   Global Step: 25430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:14:37,257-Speed 9385.17 samples/sec   Loss 2.1406   LearningRate 0.0005   Epoch: 14   Global Step: 25440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:15:03,402-Speed 9400.32 samples/sec   Loss 2.1293   LearningRate 0.0005   Epoch: 14   Global Step: 25450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:15:29,487-Speed 9421.89 samples/sec   Loss 2.1242   LearningRate 0.0005   Epoch: 14   Global Step: 25460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:15:55,556-Speed 9427.99 samples/sec   Loss 2.1499   LearningRate 0.0005   Epoch: 14   Global Step: 25470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:16:21,637-Speed 9423.34 samples/sec   Loss 2.1283   LearningRate 0.0005   Epoch: 14   Global Step: 25480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:16:47,750-Speed 9411.72 samples/sec   Loss 2.1338   LearningRate 0.0005   Epoch: 14   Global Step: 25490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:17:13,896-Speed 9399.90 samples/sec   Loss 2.1221   LearningRate 0.0005   Epoch: 14   Global Step: 25500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:17:39,982-Speed 9421.55 samples/sec   Loss 2.1163   LearningRate 0.0005   Epoch: 14   Global Step: 25510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:18:06,124-Speed 9401.30 samples/sec   Loss 2.1378   LearningRate 0.0005   Epoch: 14   Global Step: 25520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:18:32,186-Speed 9430.46 samples/sec   Loss 2.1279   LearningRate 0.0005   Epoch: 14   Global Step: 25530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:18:58,253-Speed 9428.35 samples/sec   Loss 2.1371   LearningRate 0.0005   Epoch: 14   Global Step: 25540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:19:24,374-Speed 9408.73 samples/sec   Loss 2.1313   LearningRate 0.0005   Epoch: 14   Global Step: 25550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:19:50,479-Speed 9414.66 samples/sec   Loss 2.1388   LearningRate 0.0005   Epoch: 14   Global Step: 25560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:20:16,528-Speed 9435.05 samples/sec   Loss 2.1304   LearningRate 0.0005   Epoch: 14   Global Step: 25570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:20:42,677-Speed 9398.82 samples/sec   Loss 2.1274   LearningRate 0.0005   Epoch: 14   Global Step: 25580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:21:08,819-Speed 9401.49 samples/sec   Loss 2.1329   LearningRate 0.0005   Epoch: 14   Global Step: 25590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:21:34,840-Speed 9445.03 samples/sec   Loss 2.1323   LearningRate 0.0005   Epoch: 14   Global Step: 25600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:22:00,887-Speed 9435.82 samples/sec   Loss 2.1045   LearningRate 0.0005   Epoch: 14   Global Step: 25610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:22:26,991-Speed 9414.92 samples/sec   Loss 2.1133   LearningRate 0.0005   Epoch: 14   Global Step: 25620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:22:53,094-Speed 9415.42 samples/sec   Loss 2.1120   LearningRate 0.0005   Epoch: 14   Global Step: 25630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:23:19,165-Speed 9427.01 samples/sec   Loss 2.1074   LearningRate 0.0005   Epoch: 14   Global Step: 25640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:23:45,253-Speed 9420.89 samples/sec   Loss 2.1103   LearningRate 0.0005   Epoch: 14   Global Step: 25650   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:24:11,283-Speed 9441.85 samples/sec   Loss 2.1410   LearningRate 0.0005   Epoch: 14   Global Step: 25660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:24:37,367-Speed 9422.12 samples/sec   Loss 2.1167   LearningRate 0.0005   Epoch: 14   Global Step: 25670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:25:03,551-Speed 9386.40 samples/sec   Loss 2.1116   LearningRate 0.0005   Epoch: 14   Global Step: 25680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:25:29,718-Speed 9392.35 samples/sec   Loss 2.1275   LearningRate 0.0005   Epoch: 14   Global Step: 25690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:25:55,840-Speed 9408.57 samples/sec   Loss 2.1394   LearningRate 0.0005   Epoch: 14   Global Step: 25700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:26:21,968-Speed 9406.36 samples/sec   Loss 2.1197   LearningRate 0.0005   Epoch: 14   Global Step: 25710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:26:48,116-Speed 9399.18 samples/sec   Loss 2.1243   LearningRate 0.0005   Epoch: 14   Global Step: 25720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:27:14,222-Speed 9414.76 samples/sec   Loss 2.1075   LearningRate 0.0005   Epoch: 14   Global Step: 25730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:27:40,315-Speed 9419.38 samples/sec   Loss 2.1267   LearningRate 0.0005   Epoch: 14   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:28:06,416-Speed 9417.35 samples/sec   Loss 2.1149   LearningRate 0.0005   Epoch: 14   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:28:32,567-Speed 9398.05 samples/sec   Loss 2.1213   LearningRate 0.0005   Epoch: 14   Global Step: 25760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:28:58,567-Speed 9452.97 samples/sec   Loss 2.1242   LearningRate 0.0005   Epoch: 14   Global Step: 25770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:29:24,633-Speed 9428.50 samples/sec   Loss 2.1094   LearningRate 0.0005   Epoch: 14   Global Step: 25780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:29:50,734-Speed 9416.33 samples/sec   Loss 2.1114   LearningRate 0.0005   Epoch: 14   Global Step: 25790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:30:16,768-Speed 9440.51 samples/sec   Loss 2.0963   LearningRate 0.0005   Epoch: 14   Global Step: 25800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:30:42,906-Speed 9402.65 samples/sec   Loss 2.1167   LearningRate 0.0005   Epoch: 14   Global Step: 25810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:31:09,065-Speed 9395.15 samples/sec   Loss 2.1147   LearningRate 0.0005   Epoch: 14   Global Step: 25820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:31:35,147-Speed 9423.16 samples/sec   Loss 2.1142   LearningRate 0.0005   Epoch: 14   Global Step: 25830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:32:01,302-Speed 9396.58 samples/sec   Loss 2.1072   LearningRate 0.0005   Epoch: 14   Global Step: 25840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:32:27,421-Speed 9409.90 samples/sec   Loss 2.1023   LearningRate 0.0005   Epoch: 14   Global Step: 25850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:32:53,548-Speed 9406.78 samples/sec   Loss 2.1220   LearningRate 0.0005   Epoch: 14   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:33:19,651-Speed 9415.41 samples/sec   Loss 2.1052   LearningRate 0.0005   Epoch: 14   Global Step: 25870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:33:45,673-Speed 9444.93 samples/sec   Loss 2.1180   LearningRate 0.0005   Epoch: 14   Global Step: 25880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:34:11,723-Speed 9434.42 samples/sec   Loss 2.1347   LearningRate 0.0005   Epoch: 14   Global Step: 25890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:34:37,754-Speed 9441.63 samples/sec   Loss 2.1324   LearningRate 0.0005   Epoch: 14   Global Step: 25900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:35:03,820-Speed 9428.61 samples/sec   Loss 2.1267   LearningRate 0.0005   Epoch: 14   Global Step: 25910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:35:29,875-Speed 9432.93 samples/sec   Loss 2.1499   LearningRate 0.0005   Epoch: 14   Global Step: 25920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:36:47,735-Speed 3156.49 samples/sec   Loss 2.1032   LearningRate 0.0005   Epoch: 15   Global Step: 25930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:37:13,655-Speed 9482.09 samples/sec   Loss 2.0739   LearningRate 0.0005   Epoch: 15   Global Step: 25940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:37:39,660-Speed 9451.05 samples/sec   Loss 2.0663   LearningRate 0.0005   Epoch: 15   Global Step: 25950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:38:05,782-Speed 9408.71 samples/sec   Loss 2.0801   LearningRate 0.0005   Epoch: 15   Global Step: 25960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:38:31,815-Speed 9440.60 samples/sec   Loss 2.0872   LearningRate 0.0005   Epoch: 15   Global Step: 25970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:38:57,824-Speed 9449.51 samples/sec   Loss 2.0945   LearningRate 0.0005   Epoch: 15   Global Step: 25980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:39:23,806-Speed 9459.68 samples/sec   Loss 2.0631   LearningRate 0.0005   Epoch: 15   Global Step: 25990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:39:49,810-Speed 9451.37 samples/sec   Loss 2.0824   LearningRate 0.0005   Epoch: 15   Global Step: 26000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:40:15,897-Speed 9421.19 samples/sec   Loss 2.1073   LearningRate 0.0005   Epoch: 15   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:40:42,021-Speed 9408.16 samples/sec   Loss 2.0832   LearningRate 0.0005   Epoch: 15   Global Step: 26020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-05 15:41:08,021-Speed 9452.51 samples/sec   Loss 2.0846   LearningRate 0.0005   Epoch: 15   Global Step: 26030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:41:34,060-Speed 9438.81 samples/sec   Loss 2.0900   LearningRate 0.0005   Epoch: 15   Global Step: 26040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:42:00,139-Speed 9424.27 samples/sec   Loss 2.0809   LearningRate 0.0005   Epoch: 15   Global Step: 26050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:42:26,175-Speed 9439.43 samples/sec   Loss 2.0732   LearningRate 0.0005   Epoch: 15   Global Step: 26060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:42:52,168-Speed 9455.37 samples/sec   Loss 2.0812   LearningRate 0.0005   Epoch: 15   Global Step: 26070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:43:18,155-Speed 9457.54 samples/sec   Loss 2.0723   LearningRate 0.0005   Epoch: 15   Global Step: 26080   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:43:44,191-Speed 9439.27 samples/sec   Loss 2.0955   LearningRate 0.0005   Epoch: 15   Global Step: 26090   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:44:10,276-Speed 9422.24 samples/sec   Loss 2.1377   LearningRate 0.0005   Epoch: 15   Global Step: 26100   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:44:36,383-Speed 9414.12 samples/sec   Loss 2.0953   LearningRate 0.0005   Epoch: 15   Global Step: 26110   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:45:02,533-Speed 9398.61 samples/sec   Loss 2.0814   LearningRate 0.0005   Epoch: 15   Global Step: 26120   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:45:28,626-Speed 9419.12 samples/sec   Loss 2.0717   LearningRate 0.0005   Epoch: 15   Global Step: 26130   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:45:54,790-Speed 9393.40 samples/sec   Loss 2.0804   LearningRate 0.0005   Epoch: 15   Global Step: 26140   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:46:20,853-Speed 9429.83 samples/sec   Loss 2.0720   LearningRate 0.0005   Epoch: 15   Global Step: 26150   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:46:46,931-Speed 9424.60 samples/sec   Loss 2.0898   LearningRate 0.0005   Epoch: 15   Global Step: 26160   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:47:13,044-Speed 9411.85 samples/sec   Loss 2.0959   LearningRate 0.0005   Epoch: 15   Global Step: 26170   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-05 15:47:39,088-Speed 9436.73 samples/sec   Loss 2.0878   LearningRate 0.0005   Epoch: 15   Global Step: 26180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:48:05,268-Speed 9387.62 samples/sec   Loss 2.0785   LearningRate 0.0005   Epoch: 15   Global Step: 26190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:48:31,401-Speed 9404.68 samples/sec   Loss 2.0763   LearningRate 0.0005   Epoch: 15   Global Step: 26200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:48:57,516-Speed 9411.14 samples/sec   Loss 2.0684   LearningRate 0.0005   Epoch: 15   Global Step: 26210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:49:23,613-Speed 9417.77 samples/sec   Loss 2.0887   LearningRate 0.0005   Epoch: 15   Global Step: 26220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:49:49,679-Speed 9428.87 samples/sec   Loss 2.0990   LearningRate 0.0005   Epoch: 15   Global Step: 26230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:50:15,756-Speed 9424.64 samples/sec   Loss 2.0850   LearningRate 0.0005   Epoch: 15   Global Step: 26240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:50:41,841-Speed 9422.26 samples/sec   Loss 2.0803   LearningRate 0.0005   Epoch: 15   Global Step: 26250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:51:07,948-Speed 9413.72 samples/sec   Loss 2.0813   LearningRate 0.0005   Epoch: 15   Global Step: 26260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:51:33,999-Speed 9434.43 samples/sec   Loss 2.0803   LearningRate 0.0005   Epoch: 15   Global Step: 26270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:52:00,036-Speed 9439.52 samples/sec   Loss 2.1145   LearningRate 0.0005   Epoch: 15   Global Step: 26280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:52:26,102-Speed 9428.60 samples/sec   Loss 2.0759   LearningRate 0.0005   Epoch: 15   Global Step: 26290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:52:52,272-Speed 9391.21 samples/sec   Loss 2.0734   LearningRate 0.0005   Epoch: 15   Global Step: 26300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:53:18,327-Speed 9432.96 samples/sec   Loss 2.0579   LearningRate 0.0005   Epoch: 15   Global Step: 26310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:53:44,469-Speed 9401.39 samples/sec   Loss 2.0877   LearningRate 0.0005   Epoch: 15   Global Step: 26320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:54:10,566-Speed 9417.48 samples/sec   Loss 2.0828   LearningRate 0.0005   Epoch: 15   Global Step: 26330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-05 15:54:36,614-Speed 9435.41 samples/sec   Loss 2.0871   LearningRate 0.0005   Epoch: 15   Global Step: 26340   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:55:02,694-Speed 9423.76 samples/sec   Loss 2.0653   LearningRate 0.0005   Epoch: 15   Global Step: 26350   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:55:28,829-Speed 9403.84 samples/sec   Loss 2.0849   LearningRate 0.0005   Epoch: 15   Global Step: 26360   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:55:54,885-Speed 9432.33 samples/sec   Loss 2.0599   LearningRate 0.0005   Epoch: 15   Global Step: 26370   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:56:20,938-Speed 9433.39 samples/sec   Loss 2.0705   LearningRate 0.0005   Epoch: 15   Global Step: 26380   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:56:46,979-Speed 9437.84 samples/sec   Loss 2.0829   LearningRate 0.0005   Epoch: 15   Global Step: 26390   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:57:13,021-Speed 9437.67 samples/sec   Loss 2.0730   LearningRate 0.0005   Epoch: 15   Global Step: 26400   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:57:39,097-Speed 9425.19 samples/sec   Loss 2.0648   LearningRate 0.0005   Epoch: 15   Global Step: 26410   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:58:05,131-Speed 9440.25 samples/sec   Loss 2.0780   LearningRate 0.0005   Epoch: 15   Global Step: 26420   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:58:31,248-Speed 9410.44 samples/sec   Loss 2.0732   LearningRate 0.0005   Epoch: 15   Global Step: 26430   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 15:58:57,327-Speed 9423.75 samples/sec   Loss 2.0720   LearningRate 0.0005   Epoch: 15   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 15:59:23,382-Speed 9432.99 samples/sec   Loss 2.0820   LearningRate 0.0005   Epoch: 15   Global Step: 26450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 15:59:49,418-Speed 9439.69 samples/sec   Loss 2.0690   LearningRate 0.0005   Epoch: 15   Global Step: 26460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:00:15,491-Speed 9426.26 samples/sec   Loss 2.0568   LearningRate 0.0005   Epoch: 15   Global Step: 26470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:00:41,495-Speed 9451.25 samples/sec   Loss 2.0627   LearningRate 0.0005   Epoch: 15   Global Step: 26480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:01:07,531-Speed 9439.85 samples/sec   Loss 2.0713   LearningRate 0.0005   Epoch: 15   Global Step: 26490   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:01:33,618-Speed 9421.13 samples/sec   Loss 2.0508   LearningRate 0.0005   Epoch: 15   Global Step: 26500   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:01:59,743-Speed 9407.81 samples/sec   Loss 2.0498   LearningRate 0.0005   Epoch: 15   Global Step: 26510   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:02:25,842-Speed 9416.73 samples/sec   Loss 2.0577   LearningRate 0.0005   Epoch: 15   Global Step: 26520   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:02:51,961-Speed 9409.96 samples/sec   Loss 2.0496   LearningRate 0.0005   Epoch: 15   Global Step: 26530   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:03:18,064-Speed 9415.63 samples/sec   Loss 2.0590   LearningRate 0.0005   Epoch: 15   Global Step: 26540   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:03:44,325-Speed 9358.85 samples/sec   Loss 2.0489   LearningRate 0.0005   Epoch: 15   Global Step: 26550   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:04:10,377-Speed 9434.00 samples/sec   Loss 2.0571   LearningRate 0.0005   Epoch: 15   Global Step: 26560   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:04:36,452-Speed 9425.50 samples/sec   Loss 2.0695   LearningRate 0.0005   Epoch: 15   Global Step: 26570   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:05:02,640-Speed 9385.12 samples/sec   Loss 2.0600   LearningRate 0.0005   Epoch: 15   Global Step: 26580   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:05:28,686-Speed 9435.96 samples/sec   Loss 2.0479   LearningRate 0.0005   Epoch: 15   Global Step: 26590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:05:54,823-Speed 9403.50 samples/sec   Loss 2.0555   LearningRate 0.0005   Epoch: 15   Global Step: 26600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:06:20,921-Speed 9417.21 samples/sec   Loss 2.0610   LearningRate 0.0005   Epoch: 15   Global Step: 26610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:06:47,015-Speed 9418.89 samples/sec   Loss 2.0406   LearningRate 0.0005   Epoch: 15   Global Step: 26620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:07:13,097-Speed 9422.66 samples/sec   Loss 2.0466   LearningRate 0.0005   Epoch: 15   Global Step: 26630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:07:39,144-Speed 9435.81 samples/sec   Loss 2.0501   LearningRate 0.0005   Epoch: 15   Global Step: 26640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:08:05,304-Speed 9394.79 samples/sec   Loss 2.0664   LearningRate 0.0005   Epoch: 15   Global Step: 26650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:08:31,385-Speed 9423.23 samples/sec   Loss 2.0578   LearningRate 0.0005   Epoch: 15   Global Step: 26660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:08:57,441-Speed 9432.58 samples/sec   Loss 2.0641   LearningRate 0.0005   Epoch: 15   Global Step: 26670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:09:23,520-Speed 9423.97 samples/sec   Loss 2.0560   LearningRate 0.0005   Epoch: 15   Global Step: 26680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:09:49,577-Speed 9432.26 samples/sec   Loss 2.0350   LearningRate 0.0005   Epoch: 15   Global Step: 26690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:10:15,645-Speed 9428.25 samples/sec   Loss 2.0432   LearningRate 0.0005   Epoch: 15   Global Step: 26700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:10:41,742-Speed 9417.57 samples/sec   Loss 2.0558   LearningRate 0.0005   Epoch: 15   Global Step: 26710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:11:07,878-Speed 9403.80 samples/sec   Loss 2.0494   LearningRate 0.0005   Epoch: 15   Global Step: 26720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:11:34,114-Speed 9367.91 samples/sec   Loss 2.0412   LearningRate 0.0005   Epoch: 15   Global Step: 26730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:12:00,229-Speed 9411.18 samples/sec   Loss 2.0454   LearningRate 0.0005   Epoch: 15   Global Step: 26740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:12:26,399-Speed 9391.18 samples/sec   Loss 2.0485   LearningRate 0.0005   Epoch: 15   Global Step: 26750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:12:52,457-Speed 9431.86 samples/sec   Loss 2.0225   LearningRate 0.0005   Epoch: 15   Global Step: 26760   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:13:18,574-Speed 9410.36 samples/sec   Loss 2.0438   LearningRate 0.0005   Epoch: 15   Global Step: 26770   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:13:44,671-Speed 9417.53 samples/sec   Loss 2.0266   LearningRate 0.0005   Epoch: 15   Global Step: 26780   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:14:10,813-Speed 9401.46 samples/sec   Loss 2.0305   LearningRate 0.0005   Epoch: 15   Global Step: 26790   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:14:36,926-Speed 9411.87 samples/sec   Loss 2.0194   LearningRate 0.0005   Epoch: 15   Global Step: 26800   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:15:03,067-Speed 9402.17 samples/sec   Loss 2.0478   LearningRate 0.0005   Epoch: 15   Global Step: 26810   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:15:29,268-Speed 9380.00 samples/sec   Loss 2.0243   LearningRate 0.0005   Epoch: 15   Global Step: 26820   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:15:55,388-Speed 9409.33 samples/sec   Loss 2.0199   LearningRate 0.0005   Epoch: 15   Global Step: 26830   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:16:21,490-Speed 9415.86 samples/sec   Loss 2.0570   LearningRate 0.0005   Epoch: 15   Global Step: 26840   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:16:47,745-Speed 9360.86 samples/sec   Loss 2.0300   LearningRate 0.0005   Epoch: 15   Global Step: 26850   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:17:13,909-Speed 9393.63 samples/sec   Loss 2.0410   LearningRate 0.0005   Epoch: 15   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:17:39,951-Speed 9437.45 samples/sec   Loss 2.0496   LearningRate 0.0005   Epoch: 15   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:18:06,031-Speed 9423.83 samples/sec   Loss 2.0498   LearningRate 0.0005   Epoch: 15   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:18:32,169-Speed 9402.86 samples/sec   Loss 2.0273   LearningRate 0.0005   Epoch: 15   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:18:58,218-Speed 9434.80 samples/sec   Loss 2.0306   LearningRate 0.0005   Epoch: 15   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:19:24,372-Speed 9397.14 samples/sec   Loss 2.0284   LearningRate 0.0005   Epoch: 15   Global Step: 26910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:19:50,458-Speed 9421.48 samples/sec   Loss 2.0367   LearningRate 0.0005   Epoch: 15   Global Step: 26920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:20:16,602-Speed 9400.60 samples/sec   Loss 2.0367   LearningRate 0.0005   Epoch: 15   Global Step: 26930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:20:42,743-Speed 9401.54 samples/sec   Loss 2.0642   LearningRate 0.0005   Epoch: 15   Global Step: 26940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:21:08,873-Speed 9405.91 samples/sec   Loss 2.0392   LearningRate 0.0005   Epoch: 15   Global Step: 26950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:21:34,995-Speed 9408.62 samples/sec   Loss 2.0390   LearningRate 0.0005   Epoch: 15   Global Step: 26960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:22:01,069-Speed 9426.10 samples/sec   Loss 2.0376   LearningRate 0.0005   Epoch: 15   Global Step: 26970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:22:27,141-Speed 9426.69 samples/sec   Loss 2.0277   LearningRate 0.0005   Epoch: 15   Global Step: 26980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:22:53,225-Speed 9422.36 samples/sec   Loss 2.0307   LearningRate 0.0005   Epoch: 15   Global Step: 26990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:23:19,324-Speed 9416.89 samples/sec   Loss 2.0392   LearningRate 0.0005   Epoch: 15   Global Step: 27000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:23:45,375-Speed 9434.09 samples/sec   Loss 2.0337   LearningRate 0.0005   Epoch: 15   Global Step: 27010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:24:11,446-Speed 9427.52 samples/sec   Loss 2.0322   LearningRate 0.0005   Epoch: 15   Global Step: 27020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:24:37,483-Speed 9439.15 samples/sec   Loss 2.0293   LearningRate 0.0005   Epoch: 15   Global Step: 27030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:25:03,554-Speed 9427.23 samples/sec   Loss 2.0161   LearningRate 0.0005   Epoch: 15   Global Step: 27040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:25:29,643-Speed 9420.35 samples/sec   Loss 2.0196   LearningRate 0.0005   Epoch: 15   Global Step: 27050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:25:55,810-Speed 9392.41 samples/sec   Loss 2.0237   LearningRate 0.0005   Epoch: 15   Global Step: 27060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:26:21,997-Speed 9385.06 samples/sec   Loss 2.0331   LearningRate 0.0005   Epoch: 15   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:26:48,135-Speed 9402.82 samples/sec   Loss 2.0146   LearningRate 0.0005   Epoch: 15   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:27:14,151-Speed 9446.92 samples/sec   Loss 2.0129   LearningRate 0.0005   Epoch: 15   Global Step: 27090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:27:40,265-Speed 9411.47 samples/sec   Loss 2.0098   LearningRate 0.0005   Epoch: 15   Global Step: 27100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:28:06,345-Speed 9423.79 samples/sec   Loss 2.0355   LearningRate 0.0005   Epoch: 15   Global Step: 27110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:28:32,373-Speed 9442.51 samples/sec   Loss 2.0286   LearningRate 0.0005   Epoch: 15   Global Step: 27120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:28:58,396-Speed 9444.27 samples/sec   Loss 2.0202   LearningRate 0.0005   Epoch: 15   Global Step: 27130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:29:24,563-Speed 9392.33 samples/sec   Loss 2.0224   LearningRate 0.0005   Epoch: 15   Global Step: 27140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:29:50,665-Speed 9416.11 samples/sec   Loss 2.0119   LearningRate 0.0005   Epoch: 15   Global Step: 27150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:30:16,702-Speed 9439.09 samples/sec   Loss 2.0387   LearningRate 0.0005   Epoch: 15   Global Step: 27160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:30:42,782-Speed 9423.88 samples/sec   Loss 2.0209   LearningRate 0.0005   Epoch: 15   Global Step: 27170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:31:08,860-Speed 9424.57 samples/sec   Loss 1.9995   LearningRate 0.0005   Epoch: 15   Global Step: 27180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:31:35,014-Speed 9396.84 samples/sec   Loss 2.0126   LearningRate 0.0005   Epoch: 15   Global Step: 27190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:32:01,195-Speed 9387.83 samples/sec   Loss 2.0162   LearningRate 0.0005   Epoch: 15   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:32:27,319-Speed 9407.87 samples/sec   Loss 2.0332   LearningRate 0.0005   Epoch: 15   Global Step: 27210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:32:53,457-Speed 9402.73 samples/sec   Loss 2.0290   LearningRate 0.0005   Epoch: 15   Global Step: 27220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:33:19,582-Speed 9407.50 samples/sec   Loss 2.0102   LearningRate 0.0005   Epoch: 15   Global Step: 27230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:33:45,616-Speed 9440.61 samples/sec   Loss 2.0160   LearningRate 0.0005   Epoch: 15   Global Step: 27240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:34:11,674-Speed 9431.84 samples/sec   Loss 2.0111   LearningRate 0.0005   Epoch: 15   Global Step: 27250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:34:37,781-Speed 9413.77 samples/sec   Loss 2.0139   LearningRate 0.0005   Epoch: 15   Global Step: 27260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:35:03,834-Speed 9433.38 samples/sec   Loss 2.0235   LearningRate 0.0005   Epoch: 15   Global Step: 27270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:35:29,957-Speed 9408.07 samples/sec   Loss 2.0215   LearningRate 0.0005   Epoch: 15   Global Step: 27280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:35:56,100-Speed 9401.12 samples/sec   Loss 2.0173   LearningRate 0.0005   Epoch: 15   Global Step: 27290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:36:22,201-Speed 9416.53 samples/sec   Loss 1.9894   LearningRate 0.0005   Epoch: 15   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:36:48,290-Speed 9420.16 samples/sec   Loss 2.0122   LearningRate 0.0005   Epoch: 15   Global Step: 27310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:37:14,423-Speed 9404.94 samples/sec   Loss 2.0247   LearningRate 0.0005   Epoch: 15   Global Step: 27320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:37:40,539-Speed 9410.97 samples/sec   Loss 1.9989   LearningRate 0.0005   Epoch: 15   Global Step: 27330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:38:06,586-Speed 9435.57 samples/sec   Loss 2.0024   LearningRate 0.0005   Epoch: 15   Global Step: 27340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:38:32,662-Speed 9425.36 samples/sec   Loss 2.0275   LearningRate 0.0005   Epoch: 15   Global Step: 27350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:38:58,783-Speed 9409.15 samples/sec   Loss 2.0132   LearningRate 0.0005   Epoch: 15   Global Step: 27360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:39:24,846-Speed 9430.01 samples/sec   Loss 1.9980   LearningRate 0.0005   Epoch: 15   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:39:51,020-Speed 9389.78 samples/sec   Loss 2.0101   LearningRate 0.0005   Epoch: 15   Global Step: 27380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:40:17,125-Speed 9414.59 samples/sec   Loss 2.0102   LearningRate 0.0004   Epoch: 15   Global Step: 27390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:40:43,173-Speed 9435.71 samples/sec   Loss 1.9994   LearningRate 0.0004   Epoch: 15   Global Step: 27400   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:41:09,282-Speed 9413.40 samples/sec   Loss 2.0035   LearningRate 0.0004   Epoch: 15   Global Step: 27410   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:41:35,318-Speed 9439.53 samples/sec   Loss 1.9965   LearningRate 0.0004   Epoch: 15   Global Step: 27420   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:42:01,429-Speed 9412.58 samples/sec   Loss 2.0184   LearningRate 0.0004   Epoch: 15   Global Step: 27430   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:42:27,502-Speed 9426.23 samples/sec   Loss 2.0228   LearningRate 0.0004   Epoch: 15   Global Step: 27440   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:42:53,569-Speed 9428.29 samples/sec   Loss 2.0088   LearningRate 0.0004   Epoch: 15   Global Step: 27450   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:43:19,602-Speed 9441.07 samples/sec   Loss 2.0085   LearningRate 0.0004   Epoch: 15   Global Step: 27460   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:43:45,722-Speed 9409.22 samples/sec   Loss 2.0089   LearningRate 0.0004   Epoch: 15   Global Step: 27470   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:44:11,834-Speed 9412.04 samples/sec   Loss 2.0128   LearningRate 0.0004   Epoch: 15   Global Step: 27480   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:44:37,925-Speed 9420.15 samples/sec   Loss 2.0061   LearningRate 0.0004   Epoch: 15   Global Step: 27490   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-05 16:45:03,966-Speed 9437.61 samples/sec   Loss 1.9907   LearningRate 0.0004   Epoch: 15   Global Step: 27500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:45:30,098-Speed 9405.00 samples/sec   Loss 1.9940   LearningRate 0.0004   Epoch: 15   Global Step: 27510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:45:56,201-Speed 9415.76 samples/sec   Loss 1.9799   LearningRate 0.0004   Epoch: 15   Global Step: 27520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:46:22,314-Speed 9411.68 samples/sec   Loss 1.9708   LearningRate 0.0004   Epoch: 15   Global Step: 27530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:46:48,436-Speed 9408.84 samples/sec   Loss 2.0011   LearningRate 0.0004   Epoch: 15   Global Step: 27540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:47:14,498-Speed 9430.14 samples/sec   Loss 2.0020   LearningRate 0.0004   Epoch: 15   Global Step: 27550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:47:40,665-Speed 9392.17 samples/sec   Loss 2.0167   LearningRate 0.0004   Epoch: 15   Global Step: 27560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:48:06,743-Speed 9424.67 samples/sec   Loss 1.9944   LearningRate 0.0004   Epoch: 15   Global Step: 27570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:48:32,857-Speed 9411.11 samples/sec   Loss 2.0127   LearningRate 0.0004   Epoch: 15   Global Step: 27580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:48:59,030-Speed 9390.29 samples/sec   Loss 1.9872   LearningRate 0.0004   Epoch: 15   Global Step: 27590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:49:25,180-Speed 9400.90 samples/sec   Loss 1.9950   LearningRate 0.0004   Epoch: 15   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:49:51,291-Speed 9413.26 samples/sec   Loss 2.0167   LearningRate 0.0004   Epoch: 15   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:50:17,353-Speed 9430.32 samples/sec   Loss 2.0148   LearningRate 0.0004   Epoch: 15   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:50:43,418-Speed 9428.87 samples/sec   Loss 2.0066   LearningRate 0.0004   Epoch: 15   Global Step: 27630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:51:09,679-Speed 9358.84 samples/sec   Loss 2.0189   LearningRate 0.0004   Epoch: 15   Global Step: 27640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-05 16:51:35,726-Speed 9436.01 samples/sec   Loss 2.0409   LearningRate 0.0004   Epoch: 15   Global Step: 27650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:52:54,415-Speed 3123.23 samples/sec   Loss 1.9931   LearningRate 0.0004   Epoch: 16   Global Step: 27660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:53:20,293-Speed 9497.12 samples/sec   Loss 1.9799   LearningRate 0.0004   Epoch: 16   Global Step: 27670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:53:46,267-Speed 9462.35 samples/sec   Loss 1.9510   LearningRate 0.0004   Epoch: 16   Global Step: 27680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-05 16:54:12,303-Speed 9439.63 samples/sec   Loss 1.9701   LearningRate 0.0004   Epoch: 16   Global Step: 27690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 16:54:38,168-Speed 9502.15 samples/sec   Loss 1.9555   LearningRate 0.0004   Epoch: 16   Global Step: 27700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 16:55:04,124-Speed 9468.64 samples/sec   Loss 1.9607   LearningRate 0.0004   Epoch: 16   Global Step: 27710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 16:55:30,058-Speed 9476.97 samples/sec   Loss 1.9595   LearningRate 0.0004   Epoch: 16   Global Step: 27720   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:55:55,967-Speed 9485.88 samples/sec   Loss 1.9689   LearningRate 0.0004   Epoch: 16   Global Step: 27730   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:56:21,910-Speed 9473.71 samples/sec   Loss 1.9838   LearningRate 0.0004   Epoch: 16   Global Step: 27740   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:56:47,869-Speed 9467.63 samples/sec   Loss 1.9728   LearningRate 0.0004   Epoch: 16   Global Step: 27750   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:57:13,871-Speed 9452.44 samples/sec   Loss 1.9599   LearningRate 0.0004   Epoch: 16   Global Step: 27760   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:57:39,818-Speed 9471.99 samples/sec   Loss 1.9735   LearningRate 0.0004   Epoch: 16   Global Step: 27770   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:58:05,887-Speed 9427.81 samples/sec   Loss 1.9741   LearningRate 0.0004   Epoch: 16   Global Step: 27780   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:58:31,884-Speed 9453.61 samples/sec   Loss 1.9641   LearningRate 0.0004   Epoch: 16   Global Step: 27790   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:58:57,928-Speed 9436.58 samples/sec   Loss 1.9744   LearningRate 0.0004   Epoch: 16   Global Step: 27800   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:59:23,942-Speed 9447.91 samples/sec   Loss 1.9810   LearningRate 0.0004   Epoch: 16   Global Step: 27810   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 16:59:49,990-Speed 9435.32 samples/sec   Loss 1.9853   LearningRate 0.0004   Epoch: 16   Global Step: 27820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:00:16,112-Speed 9408.39 samples/sec   Loss 1.9859   LearningRate 0.0004   Epoch: 16   Global Step: 27830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:00:42,234-Speed 9408.83 samples/sec   Loss 1.9824   LearningRate 0.0004   Epoch: 16   Global Step: 27840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:01:08,318-Speed 9421.98 samples/sec   Loss 1.9801   LearningRate 0.0004   Epoch: 16   Global Step: 27850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:01:34,386-Speed 9427.99 samples/sec   Loss 1.9646   LearningRate 0.0004   Epoch: 16   Global Step: 27860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:02:00,610-Speed 9372.06 samples/sec   Loss 1.9593   LearningRate 0.0004   Epoch: 16   Global Step: 27870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:02:26,789-Speed 9387.86 samples/sec   Loss 1.9694   LearningRate 0.0004   Epoch: 16   Global Step: 27880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:02:52,972-Speed 9386.74 samples/sec   Loss 1.9760   LearningRate 0.0004   Epoch: 16   Global Step: 27890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:03:19,119-Speed 9399.81 samples/sec   Loss 1.9718   LearningRate 0.0004   Epoch: 16   Global Step: 27900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:03:45,333-Speed 9375.68 samples/sec   Loss 1.9853   LearningRate 0.0004   Epoch: 16   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:04:11,516-Speed 9386.63 samples/sec   Loss 1.9622   LearningRate 0.0004   Epoch: 16   Global Step: 27920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-05 17:04:37,628-Speed 9412.65 samples/sec   Loss 1.9787   LearningRate 0.0004   Epoch: 16   Global Step: 27930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:05:03,772-Speed 9400.22 samples/sec   Loss 1.9618   LearningRate 0.0004   Epoch: 16   Global Step: 27940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:05:29,927-Speed 9396.91 samples/sec   Loss 1.9706   LearningRate 0.0004   Epoch: 16   Global Step: 27950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:05:56,136-Speed 9377.10 samples/sec   Loss 1.9635   LearningRate 0.0004   Epoch: 16   Global Step: 27960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:06:22,291-Speed 9396.87 samples/sec   Loss 1.9793   LearningRate 0.0004   Epoch: 16   Global Step: 27970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:06:48,427-Speed 9403.85 samples/sec   Loss 1.9602   LearningRate 0.0004   Epoch: 16   Global Step: 27980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:07:14,626-Speed 9380.96 samples/sec   Loss 1.9819   LearningRate 0.0004   Epoch: 16   Global Step: 27990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:07:40,789-Speed 9393.75 samples/sec   Loss 1.9656   LearningRate 0.0004   Epoch: 16   Global Step: 28000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:08:07,052-Speed 9358.43 samples/sec   Loss 1.9618   LearningRate 0.0004   Epoch: 16   Global Step: 28010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:08:33,231-Speed 9387.97 samples/sec   Loss 1.9519   LearningRate 0.0004   Epoch: 16   Global Step: 28020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:08:59,362-Speed 9405.66 samples/sec   Loss 1.9679   LearningRate 0.0004   Epoch: 16   Global Step: 28030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:09:25,496-Speed 9404.69 samples/sec   Loss 1.9751   LearningRate 0.0004   Epoch: 16   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:09:51,673-Speed 9388.72 samples/sec   Loss 1.9547   LearningRate 0.0004   Epoch: 16   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:10:17,808-Speed 9404.05 samples/sec   Loss 1.9706   LearningRate 0.0004   Epoch: 16   Global Step: 28060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:10:43,962-Speed 9397.32 samples/sec   Loss 1.9730   LearningRate 0.0004   Epoch: 16   Global Step: 28070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:11:10,108-Speed 9399.74 samples/sec   Loss 1.9751   LearningRate 0.0004   Epoch: 16   Global Step: 28080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:11:36,224-Speed 9410.81 samples/sec   Loss 1.9509   LearningRate 0.0004   Epoch: 16   Global Step: 28090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:12:02,389-Speed 9392.80 samples/sec   Loss 1.9670   LearningRate 0.0004   Epoch: 16   Global Step: 28100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:12:28,587-Speed 9381.46 samples/sec   Loss 1.9647   LearningRate 0.0004   Epoch: 16   Global Step: 28110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:12:54,699-Speed 9412.04 samples/sec   Loss 1.9607   LearningRate 0.0004   Epoch: 16   Global Step: 28120   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:13:20,902-Speed 9379.42 samples/sec   Loss 1.9655   LearningRate 0.0004   Epoch: 16   Global Step: 28130   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:13:47,046-Speed 9400.85 samples/sec   Loss 1.9469   LearningRate 0.0004   Epoch: 16   Global Step: 28140   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:14:13,280-Speed 9368.25 samples/sec   Loss 1.9655   LearningRate 0.0004   Epoch: 16   Global Step: 28150   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:14:39,496-Speed 9374.59 samples/sec   Loss 1.9598   LearningRate 0.0004   Epoch: 16   Global Step: 28160   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:15:05,748-Speed 9362.30 samples/sec   Loss 1.9603   LearningRate 0.0004   Epoch: 16   Global Step: 28170   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:15:31,931-Speed 9386.30 samples/sec   Loss 1.9633   LearningRate 0.0004   Epoch: 16   Global Step: 28180   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:15:58,073-Speed 9401.87 samples/sec   Loss 1.9574   LearningRate 0.0004   Epoch: 16   Global Step: 28190   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:16:24,222-Speed 9399.19 samples/sec   Loss 1.9591   LearningRate 0.0004   Epoch: 16   Global Step: 28200   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:16:50,449-Speed 9370.74 samples/sec   Loss 1.9487   LearningRate 0.0004   Epoch: 16   Global Step: 28210   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:17:16,653-Speed 9379.29 samples/sec   Loss 1.9401   LearningRate 0.0004   Epoch: 16   Global Step: 28220   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:17:42,909-Speed 9360.60 samples/sec   Loss 1.9433   LearningRate 0.0004   Epoch: 16   Global Step: 28230   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:18:09,071-Speed 9394.11 samples/sec   Loss 1.9322   LearningRate 0.0004   Epoch: 16   Global Step: 28240   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:18:35,195-Speed 9408.04 samples/sec   Loss 1.9607   LearningRate 0.0004   Epoch: 16   Global Step: 28250   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:19:01,427-Speed 9369.39 samples/sec   Loss 1.9690   LearningRate 0.0004   Epoch: 16   Global Step: 28260   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:19:27,685-Speed 9359.69 samples/sec   Loss 1.9463   LearningRate 0.0004   Epoch: 16   Global Step: 28270   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-05 17:19:53,930-Speed 9364.63 samples/sec   Loss 1.9474   LearningRate 0.0004   Epoch: 16   Global Step: 28280   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:20:20,170-Speed 9366.40 samples/sec   Loss 1.9553   LearningRate 0.0004   Epoch: 16   Global Step: 28290   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:20:46,330-Speed 9394.95 samples/sec   Loss 1.9689   LearningRate 0.0004   Epoch: 16   Global Step: 28300   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:21:12,530-Speed 9380.53 samples/sec   Loss 1.9410   LearningRate 0.0004   Epoch: 16   Global Step: 28310   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:21:38,770-Speed 9366.40 samples/sec   Loss 1.9580   LearningRate 0.0004   Epoch: 16   Global Step: 28320   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:22:04,972-Speed 9379.71 samples/sec   Loss 1.9391   LearningRate 0.0004   Epoch: 16   Global Step: 28330   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:22:31,231-Speed 9359.54 samples/sec   Loss 1.9509   LearningRate 0.0004   Epoch: 16   Global Step: 28340   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:22:57,435-Speed 9379.19 samples/sec   Loss 1.9494   LearningRate 0.0004   Epoch: 16   Global Step: 28350   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:23:23,658-Speed 9372.64 samples/sec   Loss 1.9487   LearningRate 0.0004   Epoch: 16   Global Step: 28360   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:23:49,916-Speed 9359.69 samples/sec   Loss 1.9371   LearningRate 0.0004   Epoch: 16   Global Step: 28370   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:24:16,166-Speed 9362.92 samples/sec   Loss 1.9504   LearningRate 0.0004   Epoch: 16   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:24:42,381-Speed 9375.23 samples/sec   Loss 1.9557   LearningRate 0.0004   Epoch: 16   Global Step: 28390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:25:08,624-Speed 9365.12 samples/sec   Loss 1.9500   LearningRate 0.0004   Epoch: 16   Global Step: 28400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:25:34,874-Speed 9362.67 samples/sec   Loss 1.9434   LearningRate 0.0004   Epoch: 16   Global Step: 28410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:26:01,134-Speed 9359.25 samples/sec   Loss 1.9253   LearningRate 0.0004   Epoch: 16   Global Step: 28420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:26:27,326-Speed 9383.23 samples/sec   Loss 1.9419   LearningRate 0.0004   Epoch: 16   Global Step: 28430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:26:53,550-Speed 9372.06 samples/sec   Loss 1.9364   LearningRate 0.0004   Epoch: 16   Global Step: 28440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:27:19,698-Speed 9399.17 samples/sec   Loss 1.9382   LearningRate 0.0004   Epoch: 16   Global Step: 28450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:27:45,871-Speed 9390.30 samples/sec   Loss 1.9456   LearningRate 0.0004   Epoch: 16   Global Step: 28460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:28:12,121-Speed 9362.84 samples/sec   Loss 1.9438   LearningRate 0.0004   Epoch: 16   Global Step: 28470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:28:38,311-Speed 9383.94 samples/sec   Loss 1.9230   LearningRate 0.0004   Epoch: 16   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-05 17:29:04,537-Speed 9371.31 samples/sec   Loss 1.9522   LearningRate 0.0004   Epoch: 16   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-05 17:29:30,728-Speed 9383.52 samples/sec   Loss 1.9287   LearningRate 0.0004   Epoch: 16   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-05 17:29:56,906-Speed 9388.48 samples/sec   Loss 1.9313   LearningRate 0.0004   Epoch: 16   Global Step: 28510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:30:23,102-Speed 9382.14 samples/sec   Loss 1.9416   LearningRate 0.0004   Epoch: 16   Global Step: 28520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:30:49,310-Speed 9377.97 samples/sec   Loss 1.9288   LearningRate 0.0004   Epoch: 16   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:31:15,552-Speed 9365.43 samples/sec   Loss 1.9131   LearningRate 0.0004   Epoch: 16   Global Step: 28540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:31:41,757-Speed 9378.77 samples/sec   Loss 1.9134   LearningRate 0.0004   Epoch: 16   Global Step: 28550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:32:08,035-Speed 9352.91 samples/sec   Loss 1.9196   LearningRate 0.0004   Epoch: 16   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:32:34,220-Speed 9386.06 samples/sec   Loss 1.9223   LearningRate 0.0004   Epoch: 16   Global Step: 28570   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:33:00,481-Speed 9358.43 samples/sec   Loss 1.9382   LearningRate 0.0004   Epoch: 16   Global Step: 28580   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:33:26,746-Speed 9357.40 samples/sec   Loss 1.9184   LearningRate 0.0004   Epoch: 16   Global Step: 28590   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:33:53,035-Speed 9348.78 samples/sec   Loss 1.9254   LearningRate 0.0004   Epoch: 16   Global Step: 28600   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:34:19,254-Speed 9373.78 samples/sec   Loss 1.9301   LearningRate 0.0004   Epoch: 16   Global Step: 28610   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:34:45,522-Speed 9356.19 samples/sec   Loss 1.9278   LearningRate 0.0004   Epoch: 16   Global Step: 28620   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:35:11,703-Speed 9387.42 samples/sec   Loss 1.9607   LearningRate 0.0004   Epoch: 16   Global Step: 28630   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:35:37,904-Speed 9379.89 samples/sec   Loss 1.9288   LearningRate 0.0004   Epoch: 16   Global Step: 28640   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:36:04,172-Speed 9356.13 samples/sec   Loss 1.9248   LearningRate 0.0004   Epoch: 16   Global Step: 28650   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:36:30,398-Speed 9371.68 samples/sec   Loss 1.9140   LearningRate 0.0004   Epoch: 16   Global Step: 28660   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:36:56,592-Speed 9382.51 samples/sec   Loss 1.9241   LearningRate 0.0004   Epoch: 16   Global Step: 28670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:37:22,769-Speed 9389.07 samples/sec   Loss 1.9256   LearningRate 0.0004   Epoch: 16   Global Step: 28680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:37:49,014-Speed 9364.43 samples/sec   Loss 1.9343   LearningRate 0.0004   Epoch: 16   Global Step: 28690   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:38:15,206-Speed 9383.70 samples/sec   Loss 1.9327   LearningRate 0.0004   Epoch: 16   Global Step: 28700   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:38:41,322-Speed 9410.89 samples/sec   Loss 1.9252   LearningRate 0.0004   Epoch: 16   Global Step: 28710   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:39:07,497-Speed 9389.33 samples/sec   Loss 1.9145   LearningRate 0.0004   Epoch: 16   Global Step: 28720   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:39:33,702-Speed 9378.85 samples/sec   Loss 1.9176   LearningRate 0.0004   Epoch: 16   Global Step: 28730   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:39:59,902-Speed 9380.74 samples/sec   Loss 1.9341   LearningRate 0.0004   Epoch: 16   Global Step: 28740   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:40:26,047-Speed 9400.59 samples/sec   Loss 1.9216   LearningRate 0.0004   Epoch: 16   Global Step: 28750   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:40:52,211-Speed 9393.10 samples/sec   Loss 1.9302   LearningRate 0.0004   Epoch: 16   Global Step: 28760   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:41:18,377-Speed 9392.68 samples/sec   Loss 1.9139   LearningRate 0.0004   Epoch: 16   Global Step: 28770   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:41:44,556-Speed 9388.15 samples/sec   Loss 1.9145   LearningRate 0.0004   Epoch: 16   Global Step: 28780   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:42:10,672-Speed 9410.93 samples/sec   Loss 1.9173   LearningRate 0.0004   Epoch: 16   Global Step: 28790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:42:36,881-Speed 9377.30 samples/sec   Loss 1.9140   LearningRate 0.0004   Epoch: 16   Global Step: 28800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:43:03,038-Speed 9395.79 samples/sec   Loss 1.9132   LearningRate 0.0004   Epoch: 16   Global Step: 28810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:43:29,189-Speed 9398.37 samples/sec   Loss 1.9118   LearningRate 0.0004   Epoch: 16   Global Step: 28820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:43:55,423-Speed 9369.61 samples/sec   Loss 1.9036   LearningRate 0.0004   Epoch: 16   Global Step: 28830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:44:21,659-Speed 9367.60 samples/sec   Loss 1.9165   LearningRate 0.0004   Epoch: 16   Global Step: 28840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:44:47,861-Speed 9380.33 samples/sec   Loss 1.9221   LearningRate 0.0004   Epoch: 16   Global Step: 28850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:45:13,987-Speed 9407.20 samples/sec   Loss 1.9162   LearningRate 0.0004   Epoch: 16   Global Step: 28860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:45:40,043-Speed 9432.46 samples/sec   Loss 1.9159   LearningRate 0.0004   Epoch: 16   Global Step: 28870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:46:06,151-Speed 9413.49 samples/sec   Loss 1.9108   LearningRate 0.0004   Epoch: 16   Global Step: 28880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:46:32,312-Speed 9394.65 samples/sec   Loss 1.9128   LearningRate 0.0004   Epoch: 16   Global Step: 28890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-05 17:46:58,367-Speed 9437.50 samples/sec   Loss 1.9098   LearningRate 0.0004   Epoch: 16   Global Step: 28900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:47:24,462-Speed 9418.31 samples/sec   Loss 1.9271   LearningRate 0.0004   Epoch: 16   Global Step: 28910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:47:50,566-Speed 9415.07 samples/sec   Loss 1.9177   LearningRate 0.0004   Epoch: 16   Global Step: 28920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:48:16,694-Speed 9406.57 samples/sec   Loss 1.8918   LearningRate 0.0004   Epoch: 16   Global Step: 28930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:48:42,762-Speed 9428.23 samples/sec   Loss 1.9125   LearningRate 0.0004   Epoch: 16   Global Step: 28940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:49:08,895-Speed 9404.77 samples/sec   Loss 1.9170   LearningRate 0.0004   Epoch: 16   Global Step: 28950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-05 17:49:35,009-Speed 9411.48 samples/sec   Loss 1.9153   LearningRate 0.0004   Epoch: 16   Global Step: 28960   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:50:01,177-Speed 9392.11 samples/sec   Loss 1.9023   LearningRate 0.0004   Epoch: 16   Global Step: 28970   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:50:27,245-Speed 9428.16 samples/sec   Loss 1.9250   LearningRate 0.0004   Epoch: 16   Global Step: 28980   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:50:53,330-Speed 9422.16 samples/sec   Loss 1.9192   LearningRate 0.0004   Epoch: 16   Global Step: 28990   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:51:19,521-Speed 9383.90 samples/sec   Loss 1.8999   LearningRate 0.0004   Epoch: 16   Global Step: 29000   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:51:45,594-Speed 9426.44 samples/sec   Loss 1.9047   LearningRate 0.0004   Epoch: 16   Global Step: 29010   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:52:11,769-Speed 9389.29 samples/sec   Loss 1.8927   LearningRate 0.0004   Epoch: 16   Global Step: 29020   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-05 17:52:37,989-Speed 9373.40 samples/sec   Loss 1.9072   LearningRate 0.0004   Epoch: 16   Global Step: 29030   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 17:53:04,156-Speed 9392.73 samples/sec   Loss 1.8942   LearningRate 0.0004   Epoch: 16   Global Step: 29040   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 17:53:30,287-Speed 9405.13 samples/sec   Loss 1.9040   LearningRate 0.0004   Epoch: 16   Global Step: 29050   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 17:53:56,363-Speed 9425.16 samples/sec   Loss 1.9011   LearningRate 0.0004   Epoch: 16   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 17:54:22,415-Speed 9433.64 samples/sec   Loss 1.8996   LearningRate 0.0004   Epoch: 16   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 17:54:48,554-Speed 9402.46 samples/sec   Loss 1.8948   LearningRate 0.0004   Epoch: 16   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 17:55:14,627-Speed 9426.62 samples/sec   Loss 1.9017   LearningRate 0.0004   Epoch: 16   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 17:55:40,680-Speed 9433.48 samples/sec   Loss 1.8903   LearningRate 0.0004   Epoch: 16   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 17:56:06,753-Speed 9426.19 samples/sec   Loss 1.9046   LearningRate 0.0004   Epoch: 16   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 17:56:32,822-Speed 9427.74 samples/sec   Loss 1.9036   LearningRate 0.0004   Epoch: 16   Global Step: 29120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 17:56:58,871-Speed 9434.85 samples/sec   Loss 1.8991   LearningRate 0.0004   Epoch: 16   Global Step: 29130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 17:57:24,940-Speed 9427.80 samples/sec   Loss 1.9046   LearningRate 0.0004   Epoch: 16   Global Step: 29140   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 17:57:51,066-Speed 9407.36 samples/sec   Loss 1.8985   LearningRate 0.0004   Epoch: 16   Global Step: 29150   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 17:58:17,245-Speed 9389.31 samples/sec   Loss 1.8905   LearningRate 0.0004   Epoch: 16   Global Step: 29160   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 17:58:43,337-Speed 9419.40 samples/sec   Loss 1.8869   LearningRate 0.0004   Epoch: 16   Global Step: 29170   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 17:59:09,461-Speed 9407.86 samples/sec   Loss 1.8985   LearningRate 0.0004   Epoch: 16   Global Step: 29180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 17:59:35,557-Speed 9418.03 samples/sec   Loss 1.9051   LearningRate 0.0004   Epoch: 16   Global Step: 29190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:00:01,720-Speed 9394.13 samples/sec   Loss 1.9059   LearningRate 0.0004   Epoch: 16   Global Step: 29200   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:00:27,849-Speed 9405.94 samples/sec   Loss 1.9088   LearningRate 0.0004   Epoch: 16   Global Step: 29210   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:00:53,904-Speed 9432.86 samples/sec   Loss 1.8965   LearningRate 0.0004   Epoch: 16   Global Step: 29220   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:01:19,965-Speed 9430.75 samples/sec   Loss 1.8966   LearningRate 0.0004   Epoch: 16   Global Step: 29230   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:01:46,016-Speed 9434.02 samples/sec   Loss 1.8967   LearningRate 0.0004   Epoch: 16   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:02:12,102-Speed 9421.54 samples/sec   Loss 1.9134   LearningRate 0.0004   Epoch: 16   Global Step: 29250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:02:38,286-Speed 9386.26 samples/sec   Loss 1.8979   LearningRate 0.0004   Epoch: 16   Global Step: 29260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:03:04,406-Speed 9409.27 samples/sec   Loss 1.8967   LearningRate 0.0004   Epoch: 16   Global Step: 29270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:03:30,534-Speed 9406.35 samples/sec   Loss 1.8920   LearningRate 0.0004   Epoch: 16   Global Step: 29280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:03:56,733-Speed 9381.05 samples/sec   Loss 1.8958   LearningRate 0.0004   Epoch: 16   Global Step: 29290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:04:22,858-Speed 9407.39 samples/sec   Loss 1.8989   LearningRate 0.0004   Epoch: 16   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:04:48,937-Speed 9424.27 samples/sec   Loss 1.8927   LearningRate 0.0004   Epoch: 16   Global Step: 29310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:05:15,010-Speed 9425.91 samples/sec   Loss 1.8951   LearningRate 0.0004   Epoch: 16   Global Step: 29320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:05:41,140-Speed 9405.82 samples/sec   Loss 1.9060   LearningRate 0.0004   Epoch: 16   Global Step: 29330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:06:07,195-Speed 9432.71 samples/sec   Loss 1.8988   LearningRate 0.0004   Epoch: 16   Global Step: 29340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-05 18:06:33,238-Speed 9437.27 samples/sec   Loss 1.8930   LearningRate 0.0004   Epoch: 16   Global Step: 29350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:06:59,363-Speed 9407.36 samples/sec   Loss 1.9157   LearningRate 0.0004   Epoch: 16   Global Step: 29360   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:07:25,520-Speed 9395.87 samples/sec   Loss 1.9141   LearningRate 0.0004   Epoch: 16   Global Step: 29370   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:07:51,661-Speed 9401.49 samples/sec   Loss 1.9160   LearningRate 0.0004   Epoch: 16   Global Step: 29380   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:09:11,429-Speed 3081.05 samples/sec   Loss 1.8538   LearningRate 0.0004   Epoch: 17   Global Step: 29390   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:09:37,371-Speed 9473.73 samples/sec   Loss 1.8602   LearningRate 0.0004   Epoch: 17   Global Step: 29400   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:10:03,455-Speed 9422.51 samples/sec   Loss 1.8552   LearningRate 0.0004   Epoch: 17   Global Step: 29410   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:10:29,549-Speed 9418.60 samples/sec   Loss 1.8603   LearningRate 0.0004   Epoch: 17   Global Step: 29420   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:10:55,558-Speed 9449.77 samples/sec   Loss 1.8610   LearningRate 0.0004   Epoch: 17   Global Step: 29430   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:11:21,564-Speed 9450.30 samples/sec   Loss 1.8740   LearningRate 0.0004   Epoch: 17   Global Step: 29440   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:11:47,581-Speed 9446.71 samples/sec   Loss 1.8548   LearningRate 0.0004   Epoch: 17   Global Step: 29450   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:12:13,629-Speed 9434.94 samples/sec   Loss 1.8816   LearningRate 0.0004   Epoch: 17   Global Step: 29460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:12:39,626-Speed 9454.28 samples/sec   Loss 1.8741   LearningRate 0.0004   Epoch: 17   Global Step: 29470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:13:05,717-Speed 9419.72 samples/sec   Loss 1.8732   LearningRate 0.0004   Epoch: 17   Global Step: 29480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:13:31,769-Speed 9433.78 samples/sec   Loss 1.8694   LearningRate 0.0004   Epoch: 17   Global Step: 29490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:13:57,932-Speed 9393.83 samples/sec   Loss 1.8764   LearningRate 0.0004   Epoch: 17   Global Step: 29500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:14:23,972-Speed 9438.30 samples/sec   Loss 1.8932   LearningRate 0.0004   Epoch: 17   Global Step: 29510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:14:50,017-Speed 9436.43 samples/sec   Loss 1.8471   LearningRate 0.0004   Epoch: 17   Global Step: 29520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:15:16,113-Speed 9417.84 samples/sec   Loss 1.8574   LearningRate 0.0004   Epoch: 17   Global Step: 29530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:15:42,171-Speed 9431.68 samples/sec   Loss 1.8612   LearningRate 0.0004   Epoch: 17   Global Step: 29540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:16:08,234-Speed 9429.60 samples/sec   Loss 1.8516   LearningRate 0.0004   Epoch: 17   Global Step: 29550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:16:34,283-Speed 9435.11 samples/sec   Loss 1.8759   LearningRate 0.0004   Epoch: 17   Global Step: 29560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-05 18:17:00,445-Speed 9393.83 samples/sec   Loss 1.8607   LearningRate 0.0004   Epoch: 17   Global Step: 29570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-05 18:17:26,482-Speed 9439.60 samples/sec   Loss 1.8640   LearningRate 0.0004   Epoch: 17   Global Step: 29580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:17:52,505-Speed 9444.56 samples/sec   Loss 1.8589   LearningRate 0.0004   Epoch: 17   Global Step: 29590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:18:18,451-Speed 9472.09 samples/sec   Loss 1.8617   LearningRate 0.0004   Epoch: 17   Global Step: 29600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:18:44,499-Speed 9435.63 samples/sec   Loss 1.8729   LearningRate 0.0004   Epoch: 17   Global Step: 29610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:19:10,542-Speed 9437.24 samples/sec   Loss 1.8633   LearningRate 0.0004   Epoch: 17   Global Step: 29620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:19:36,679-Speed 9403.30 samples/sec   Loss 1.8621   LearningRate 0.0004   Epoch: 17   Global Step: 29630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:20:02,811-Speed 9404.89 samples/sec   Loss 1.8797   LearningRate 0.0004   Epoch: 17   Global Step: 29640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:20:28,843-Speed 9441.24 samples/sec   Loss 1.8807   LearningRate 0.0004   Epoch: 17   Global Step: 29650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:20:54,883-Speed 9438.00 samples/sec   Loss 1.8653   LearningRate 0.0004   Epoch: 17   Global Step: 29660   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:21:21,023-Speed 9402.14 samples/sec   Loss 1.8756   LearningRate 0.0004   Epoch: 17   Global Step: 29670   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:21:47,137-Speed 9411.71 samples/sec   Loss 1.8802   LearningRate 0.0004   Epoch: 17   Global Step: 29680   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:22:13,176-Speed 9438.76 samples/sec   Loss 1.8491   LearningRate 0.0004   Epoch: 17   Global Step: 29690   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:22:39,275-Speed 9416.90 samples/sec   Loss 1.8624   LearningRate 0.0004   Epoch: 17   Global Step: 29700   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:23:05,317-Speed 9437.23 samples/sec   Loss 1.8661   LearningRate 0.0004   Epoch: 17   Global Step: 29710   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:23:31,496-Speed 9388.11 samples/sec   Loss 1.8673   LearningRate 0.0004   Epoch: 17   Global Step: 29720   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:23:57,619-Speed 9408.84 samples/sec   Loss 1.8587   LearningRate 0.0004   Epoch: 17   Global Step: 29730   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:24:23,720-Speed 9416.04 samples/sec   Loss 1.8649   LearningRate 0.0004   Epoch: 17   Global Step: 29740   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:24:49,875-Speed 9397.01 samples/sec   Loss 1.8648   LearningRate 0.0004   Epoch: 17   Global Step: 29750   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:25:15,918-Speed 9437.21 samples/sec   Loss 1.8516   LearningRate 0.0004   Epoch: 17   Global Step: 29760   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:25:42,039-Speed 9408.64 samples/sec   Loss 1.8691   LearningRate 0.0004   Epoch: 17   Global Step: 29770   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:26:08,141-Speed 9415.95 samples/sec   Loss 1.8649   LearningRate 0.0004   Epoch: 17   Global Step: 29780   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:26:34,349-Speed 9377.54 samples/sec   Loss 1.8606   LearningRate 0.0004   Epoch: 17   Global Step: 29790   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:27:00,640-Speed 9347.96 samples/sec   Loss 1.8610   LearningRate 0.0004   Epoch: 17   Global Step: 29800   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:27:26,832-Speed 9383.39 samples/sec   Loss 1.8700   LearningRate 0.0004   Epoch: 17   Global Step: 29810   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:27:52,958-Speed 9407.28 samples/sec   Loss 1.8726   LearningRate 0.0004   Epoch: 17   Global Step: 29820   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:28:19,099-Speed 9401.86 samples/sec   Loss 1.8713   LearningRate 0.0004   Epoch: 17   Global Step: 29830   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:28:45,203-Speed 9415.08 samples/sec   Loss 1.8591   LearningRate 0.0004   Epoch: 17   Global Step: 29840   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:29:11,329-Speed 9406.87 samples/sec   Loss 1.8634   LearningRate 0.0004   Epoch: 17   Global Step: 29850   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:29:37,473-Speed 9400.61 samples/sec   Loss 1.8610   LearningRate 0.0004   Epoch: 17   Global Step: 29860   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:30:03,522-Speed 9435.01 samples/sec   Loss 1.8468   LearningRate 0.0004   Epoch: 17   Global Step: 29870   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:30:29,660-Speed 9403.16 samples/sec   Loss 1.8589   LearningRate 0.0004   Epoch: 17   Global Step: 29880   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:30:55,834-Speed 9389.85 samples/sec   Loss 1.8580   LearningRate 0.0004   Epoch: 17   Global Step: 29890   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:31:21,930-Speed 9417.70 samples/sec   Loss 1.8454   LearningRate 0.0004   Epoch: 17   Global Step: 29900   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:31:48,102-Speed 9390.66 samples/sec   Loss 1.8580   LearningRate 0.0004   Epoch: 17   Global Step: 29910   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:32:14,319-Speed 9374.56 samples/sec   Loss 1.8426   LearningRate 0.0004   Epoch: 17   Global Step: 29920   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:32:40,346-Speed 9442.90 samples/sec   Loss 1.8508   LearningRate 0.0004   Epoch: 17   Global Step: 29930   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:33:06,431-Speed 9421.59 samples/sec   Loss 1.8559   LearningRate 0.0004   Epoch: 17   Global Step: 29940   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:33:32,534-Speed 9415.59 samples/sec   Loss 1.8481   LearningRate 0.0004   Epoch: 17   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:33:58,684-Speed 9398.16 samples/sec   Loss 1.8464   LearningRate 0.0004   Epoch: 17   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:34:24,737-Speed 9433.55 samples/sec   Loss 1.8487   LearningRate 0.0004   Epoch: 17   Global Step: 29970   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:34:50,799-Speed 9430.24 samples/sec   Loss 1.8490   LearningRate 0.0004   Epoch: 17   Global Step: 29980   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:35:16,859-Speed 9430.65 samples/sec   Loss 1.8443   LearningRate 0.0004   Epoch: 17   Global Step: 29990   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:35:42,885-Speed 9443.27 samples/sec   Loss 1.8363   LearningRate 0.0004   Epoch: 17   Global Step: 30000   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:36:09,063-Speed 9388.30 samples/sec   Loss 1.8530   LearningRate 0.0004   Epoch: 17   Global Step: 30010   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:36:35,168-Speed 9415.18 samples/sec   Loss 1.8437   LearningRate 0.0004   Epoch: 17   Global Step: 30020   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:37:01,380-Speed 9376.53 samples/sec   Loss 1.8512   LearningRate 0.0004   Epoch: 17   Global Step: 30030   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:37:27,571-Speed 9383.70 samples/sec   Loss 1.8395   LearningRate 0.0004   Epoch: 17   Global Step: 30040   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:37:53,630-Speed 9431.49 samples/sec   Loss 1.8482   LearningRate 0.0004   Epoch: 17   Global Step: 30050   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:38:19,759-Speed 9405.92 samples/sec   Loss 1.8483   LearningRate 0.0004   Epoch: 17   Global Step: 30060   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:38:45,868-Speed 9413.33 samples/sec   Loss 1.8423   LearningRate 0.0004   Epoch: 17   Global Step: 30070   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:39:11,977-Speed 9413.48 samples/sec   Loss 1.8470   LearningRate 0.0004   Epoch: 17   Global Step: 30080   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:39:38,105-Speed 9406.31 samples/sec   Loss 1.8528   LearningRate 0.0004   Epoch: 17   Global Step: 30090   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-03-05 18:40:04,250-Speed 9400.37 samples/sec   Loss 1.8442   LearningRate 0.0004   Epoch: 17   Global Step: 30100   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:40:30,372-Speed 9408.36 samples/sec   Loss 1.8419   LearningRate 0.0004   Epoch: 17   Global Step: 30110   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:40:56,542-Speed 9392.32 samples/sec   Loss 1.8290   LearningRate 0.0004   Epoch: 17   Global Step: 30120   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:41:22,666-Speed 9408.03 samples/sec   Loss 1.8469   LearningRate 0.0004   Epoch: 17   Global Step: 30130   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:41:48,825-Speed 9395.17 samples/sec   Loss 1.8535   LearningRate 0.0004   Epoch: 17   Global Step: 30140   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:42:14,839-Speed 9447.67 samples/sec   Loss 1.8401   LearningRate 0.0004   Epoch: 17   Global Step: 30150   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:42:40,876-Speed 9439.18 samples/sec   Loss 1.8429   LearningRate 0.0004   Epoch: 17   Global Step: 30160   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:43:06,967-Speed 9420.83 samples/sec   Loss 1.8372   LearningRate 0.0004   Epoch: 17   Global Step: 30170   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:43:33,000-Speed 9440.87 samples/sec   Loss 1.8324   LearningRate 0.0004   Epoch: 17   Global Step: 30180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:43:59,076-Speed 9425.27 samples/sec   Loss 1.8529   LearningRate 0.0004   Epoch: 17   Global Step: 30190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-05 18:44:25,177-Speed 9416.14 samples/sec   Loss 1.8343   LearningRate 0.0004   Epoch: 17   Global Step: 30200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:44:51,273-Speed 9417.66 samples/sec   Loss 1.8346   LearningRate 0.0004   Epoch: 17   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:45:17,356-Speed 9422.79 samples/sec   Loss 1.8485   LearningRate 0.0004   Epoch: 17   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:45:43,408-Speed 9434.08 samples/sec   Loss 1.8399   LearningRate 0.0004   Epoch: 17   Global Step: 30230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:46:09,493-Speed 9422.08 samples/sec   Loss 1.8346   LearningRate 0.0004   Epoch: 17   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:46:35,604-Speed 9412.16 samples/sec   Loss 1.8389   LearningRate 0.0004   Epoch: 17   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:47:01,758-Speed 9397.15 samples/sec   Loss 1.8270   LearningRate 0.0004   Epoch: 17   Global Step: 30260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:47:27,834-Speed 9425.18 samples/sec   Loss 1.8319   LearningRate 0.0004   Epoch: 17   Global Step: 30270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:47:53,933-Speed 9417.10 samples/sec   Loss 1.8368   LearningRate 0.0004   Epoch: 17   Global Step: 30280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:48:20,061-Speed 9406.17 samples/sec   Loss 1.8195   LearningRate 0.0004   Epoch: 17   Global Step: 30290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:48:46,205-Speed 9400.64 samples/sec   Loss 1.8183   LearningRate 0.0004   Epoch: 17   Global Step: 30300   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-05 18:49:12,284-Speed 9424.09 samples/sec   Loss 1.8316   LearningRate 0.0004   Epoch: 17   Global Step: 30310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:49:38,471-Speed 9385.37 samples/sec   Loss 1.8354   LearningRate 0.0004   Epoch: 17   Global Step: 30320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:50:04,521-Speed 9434.49 samples/sec   Loss 1.8281   LearningRate 0.0004   Epoch: 17   Global Step: 30330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:50:30,601-Speed 9423.98 samples/sec   Loss 1.8327   LearningRate 0.0004   Epoch: 17   Global Step: 30340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:50:56,733-Speed 9404.69 samples/sec   Loss 1.8353   LearningRate 0.0004   Epoch: 17   Global Step: 30350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:51:22,816-Speed 9422.82 samples/sec   Loss 1.8296   LearningRate 0.0004   Epoch: 17   Global Step: 30360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:51:48,907-Speed 9419.68 samples/sec   Loss 1.8262   LearningRate 0.0004   Epoch: 17   Global Step: 30370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:52:14,994-Speed 9421.19 samples/sec   Loss 1.8206   LearningRate 0.0004   Epoch: 17   Global Step: 30380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-05 18:52:41,115-Speed 9409.28 samples/sec   Loss 1.8119   LearningRate 0.0004   Epoch: 17   Global Step: 30390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:53:07,240-Speed 9407.56 samples/sec   Loss 1.8240   LearningRate 0.0004   Epoch: 17   Global Step: 30400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:53:33,379-Speed 9402.67 samples/sec   Loss 1.8137   LearningRate 0.0004   Epoch: 17   Global Step: 30410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 18:53:59,389-Speed 9449.02 samples/sec   Loss 1.8214   LearningRate 0.0004   Epoch: 17   Global Step: 30420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:54:25,469-Speed 9423.79 samples/sec   Loss 1.8263   LearningRate 0.0004   Epoch: 17   Global Step: 30430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:54:51,571-Speed 9415.76 samples/sec   Loss 1.8165   LearningRate 0.0004   Epoch: 17   Global Step: 30440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:55:17,686-Speed 9410.95 samples/sec   Loss 1.8136   LearningRate 0.0004   Epoch: 17   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:55:43,771-Speed 9422.17 samples/sec   Loss 1.8228   LearningRate 0.0004   Epoch: 17   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:56:09,866-Speed 9418.75 samples/sec   Loss 1.8068   LearningRate 0.0004   Epoch: 17   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:56:35,981-Speed 9411.04 samples/sec   Loss 1.8249   LearningRate 0.0004   Epoch: 17   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:57:02,053-Speed 9426.38 samples/sec   Loss 1.8285   LearningRate 0.0004   Epoch: 17   Global Step: 30490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:57:28,259-Speed 9378.42 samples/sec   Loss 1.8120   LearningRate 0.0004   Epoch: 17   Global Step: 30500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:57:54,470-Speed 9376.60 samples/sec   Loss 1.8199   LearningRate 0.0004   Epoch: 17   Global Step: 30510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:58:20,634-Speed 9393.62 samples/sec   Loss 1.8161   LearningRate 0.0004   Epoch: 17   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:58:46,765-Speed 9405.13 samples/sec   Loss 1.8084   LearningRate 0.0004   Epoch: 17   Global Step: 30530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:59:12,965-Speed 9380.55 samples/sec   Loss 1.8194   LearningRate 0.0004   Epoch: 17   Global Step: 30540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 18:59:39,149-Speed 9385.95 samples/sec   Loss 1.8184   LearningRate 0.0004   Epoch: 17   Global Step: 30550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:00:05,384-Speed 9368.00 samples/sec   Loss 1.8297   LearningRate 0.0004   Epoch: 17   Global Step: 30560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:00:31,533-Speed 9398.97 samples/sec   Loss 1.8393   LearningRate 0.0004   Epoch: 17   Global Step: 30570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:00:57,672-Speed 9402.57 samples/sec   Loss 1.8147   LearningRate 0.0004   Epoch: 17   Global Step: 30580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:01:23,837-Speed 9393.03 samples/sec   Loss 1.8152   LearningRate 0.0004   Epoch: 17   Global Step: 30590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:01:50,023-Speed 9385.63 samples/sec   Loss 1.8026   LearningRate 0.0004   Epoch: 17   Global Step: 30600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:02:16,139-Speed 9410.54 samples/sec   Loss 1.8026   LearningRate 0.0004   Epoch: 17   Global Step: 30610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:02:42,195-Speed 9432.80 samples/sec   Loss 1.8162   LearningRate 0.0004   Epoch: 17   Global Step: 30620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:03:08,283-Speed 9420.81 samples/sec   Loss 1.8138   LearningRate 0.0004   Epoch: 17   Global Step: 30630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:03:34,357-Speed 9425.75 samples/sec   Loss 1.8101   LearningRate 0.0004   Epoch: 17   Global Step: 30640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:04:00,472-Speed 9411.15 samples/sec   Loss 1.8192   LearningRate 0.0004   Epoch: 17   Global Step: 30650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:04:26,596-Speed 9407.78 samples/sec   Loss 1.8107   LearningRate 0.0004   Epoch: 17   Global Step: 30660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:04:52,763-Speed 9392.75 samples/sec   Loss 1.8186   LearningRate 0.0004   Epoch: 17   Global Step: 30670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:05:18,937-Speed 9389.78 samples/sec   Loss 1.8113   LearningRate 0.0004   Epoch: 17   Global Step: 30680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:05:45,139-Speed 9379.54 samples/sec   Loss 1.8100   LearningRate 0.0004   Epoch: 17   Global Step: 30690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:06:11,323-Speed 9386.52 samples/sec   Loss 1.8151   LearningRate 0.0004   Epoch: 17   Global Step: 30700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:06:37,485-Speed 9394.23 samples/sec   Loss 1.8067   LearningRate 0.0004   Epoch: 17   Global Step: 30710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:07:03,637-Speed 9397.46 samples/sec   Loss 1.8090   LearningRate 0.0004   Epoch: 17   Global Step: 30720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 19:07:29,728-Speed 9419.77 samples/sec   Loss 1.8006   LearningRate 0.0004   Epoch: 17   Global Step: 30730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 19:07:55,854-Speed 9407.02 samples/sec   Loss 1.7973   LearningRate 0.0004   Epoch: 17   Global Step: 30740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:08:22,110-Speed 9360.79 samples/sec   Loss 1.8023   LearningRate 0.0004   Epoch: 17   Global Step: 30750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:08:48,201-Speed 9419.96 samples/sec   Loss 1.7962   LearningRate 0.0004   Epoch: 17   Global Step: 30760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:09:14,305-Speed 9415.04 samples/sec   Loss 1.8089   LearningRate 0.0004   Epoch: 17   Global Step: 30770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:09:40,506-Speed 9379.78 samples/sec   Loss 1.7997   LearningRate 0.0004   Epoch: 17   Global Step: 30780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:10:06,670-Speed 9393.46 samples/sec   Loss 1.7892   LearningRate 0.0004   Epoch: 17   Global Step: 30790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:10:32,855-Speed 9386.07 samples/sec   Loss 1.7985   LearningRate 0.0004   Epoch: 17   Global Step: 30800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:10:59,012-Speed 9395.98 samples/sec   Loss 1.8011   LearningRate 0.0004   Epoch: 17   Global Step: 30810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:11:25,139-Speed 9406.58 samples/sec   Loss 1.8003   LearningRate 0.0004   Epoch: 17   Global Step: 30820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:11:51,262-Speed 9408.39 samples/sec   Loss 1.8043   LearningRate 0.0004   Epoch: 17   Global Step: 30830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:12:17,329-Speed 9428.36 samples/sec   Loss 1.7934   LearningRate 0.0004   Epoch: 17   Global Step: 30840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:12:43,475-Speed 9400.10 samples/sec   Loss 1.7894   LearningRate 0.0004   Epoch: 17   Global Step: 30850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:13:09,557-Speed 9422.76 samples/sec   Loss 1.8043   LearningRate 0.0004   Epoch: 17   Global Step: 30860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:13:35,660-Speed 9415.61 samples/sec   Loss 1.8018   LearningRate 0.0004   Epoch: 17   Global Step: 30870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:14:01,797-Speed 9403.67 samples/sec   Loss 1.8000   LearningRate 0.0004   Epoch: 17   Global Step: 30880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:14:27,946-Speed 9398.90 samples/sec   Loss 1.8042   LearningRate 0.0004   Epoch: 17   Global Step: 30890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:14:54,021-Speed 9425.55 samples/sec   Loss 1.8130   LearningRate 0.0004   Epoch: 17   Global Step: 30900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:15:20,112-Speed 9419.70 samples/sec   Loss 1.7981   LearningRate 0.0004   Epoch: 17   Global Step: 30910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:15:46,280-Speed 9392.05 samples/sec   Loss 1.7995   LearningRate 0.0004   Epoch: 17   Global Step: 30920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:16:12,413-Speed 9404.65 samples/sec   Loss 1.7881   LearningRate 0.0004   Epoch: 17   Global Step: 30930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:16:38,530-Speed 9410.26 samples/sec   Loss 1.7980   LearningRate 0.0004   Epoch: 17   Global Step: 30940   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 19:17:04,662-Speed 9404.93 samples/sec   Loss 1.8061   LearningRate 0.0004   Epoch: 17   Global Step: 30950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:17:30,802-Speed 9402.30 samples/sec   Loss 1.8005   LearningRate 0.0004   Epoch: 17   Global Step: 30960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:17:56,941-Speed 9402.37 samples/sec   Loss 1.7975   LearningRate 0.0004   Epoch: 17   Global Step: 30970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:18:23,071-Speed 9405.64 samples/sec   Loss 1.7867   LearningRate 0.0004   Epoch: 17   Global Step: 30980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:18:49,227-Speed 9396.42 samples/sec   Loss 1.7850   LearningRate 0.0004   Epoch: 17   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:19:15,367-Speed 9402.11 samples/sec   Loss 1.7820   LearningRate 0.0004   Epoch: 17   Global Step: 31000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:19:41,455-Speed 9421.08 samples/sec   Loss 1.7909   LearningRate 0.0004   Epoch: 17   Global Step: 31010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:20:07,602-Speed 9399.33 samples/sec   Loss 1.8130   LearningRate 0.0004   Epoch: 17   Global Step: 31020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:20:33,640-Speed 9439.04 samples/sec   Loss 1.8147   LearningRate 0.0004   Epoch: 17   Global Step: 31030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:20:59,684-Speed 9436.89 samples/sec   Loss 1.8106   LearningRate 0.0004   Epoch: 17   Global Step: 31040   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:21:25,851-Speed 9392.39 samples/sec   Loss 1.8154   LearningRate 0.0004   Epoch: 17   Global Step: 31050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 19:21:51,963-Speed 9412.28 samples/sec   Loss 1.8082   LearningRate 0.0004   Epoch: 17   Global Step: 31060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 19:22:18,070-Speed 9414.19 samples/sec   Loss 1.8189   LearningRate 0.0004   Epoch: 17   Global Step: 31070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 19:22:44,149-Speed 9423.97 samples/sec   Loss 1.7969   LearningRate 0.0004   Epoch: 17   Global Step: 31080   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:23:10,268-Speed 9409.55 samples/sec   Loss 1.8044   LearningRate 0.0004   Epoch: 17   Global Step: 31090   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:23:36,355-Speed 9421.20 samples/sec   Loss 1.8200   LearningRate 0.0004   Epoch: 17   Global Step: 31100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:24:02,493-Speed 9402.94 samples/sec   Loss 1.8071   LearningRate 0.0004   Epoch: 17   Global Step: 31110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:25:22,145-Speed 3085.48 samples/sec   Loss 1.7660   LearningRate 0.0004   Epoch: 18   Global Step: 31120   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:25:48,100-Speed 9469.01 samples/sec   Loss 1.7552   LearningRate 0.0004   Epoch: 18   Global Step: 31130   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:26:14,194-Speed 9418.76 samples/sec   Loss 1.7699   LearningRate 0.0004   Epoch: 18   Global Step: 31140   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:26:40,292-Speed 9417.46 samples/sec   Loss 1.7712   LearningRate 0.0004   Epoch: 18   Global Step: 31150   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:27:06,477-Speed 9385.99 samples/sec   Loss 1.7747   LearningRate 0.0004   Epoch: 18   Global Step: 31160   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:27:32,520-Speed 9437.10 samples/sec   Loss 1.7665   LearningRate 0.0004   Epoch: 18   Global Step: 31170   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:27:58,725-Speed 9378.79 samples/sec   Loss 1.7719   LearningRate 0.0004   Epoch: 18   Global Step: 31180   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:28:24,835-Speed 9413.03 samples/sec   Loss 1.7741   LearningRate 0.0004   Epoch: 18   Global Step: 31190   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:28:50,956-Speed 9409.08 samples/sec   Loss 1.7711   LearningRate 0.0004   Epoch: 18   Global Step: 31200   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:29:17,086-Speed 9405.60 samples/sec   Loss 1.7695   LearningRate 0.0004   Epoch: 18   Global Step: 31210   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:29:43,172-Speed 9421.59 samples/sec   Loss 1.7588   LearningRate 0.0004   Epoch: 18   Global Step: 31220   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:30:09,303-Speed 9405.24 samples/sec   Loss 1.7781   LearningRate 0.0004   Epoch: 18   Global Step: 31230   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:30:35,352-Speed 9434.73 samples/sec   Loss 1.7749   LearningRate 0.0004   Epoch: 18   Global Step: 31240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:31:01,408-Speed 9432.55 samples/sec   Loss 1.7615   LearningRate 0.0004   Epoch: 18   Global Step: 31250   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:31:27,543-Speed 9403.96 samples/sec   Loss 1.7736   LearningRate 0.0004   Epoch: 18   Global Step: 31260   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:31:53,691-Speed 9398.96 samples/sec   Loss 1.7771   LearningRate 0.0004   Epoch: 18   Global Step: 31270   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:32:19,790-Speed 9416.68 samples/sec   Loss 1.7687   LearningRate 0.0004   Epoch: 18   Global Step: 31280   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:32:46,013-Speed 9372.27 samples/sec   Loss 1.7742   LearningRate 0.0004   Epoch: 18   Global Step: 31290   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:33:12,128-Speed 9411.43 samples/sec   Loss 1.7759   LearningRate 0.0004   Epoch: 18   Global Step: 31300   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:33:38,255-Speed 9406.53 samples/sec   Loss 1.7670   LearningRate 0.0004   Epoch: 18   Global Step: 31310   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:34:04,419-Speed 9393.33 samples/sec   Loss 1.7761   LearningRate 0.0004   Epoch: 18   Global Step: 31320   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:34:30,546-Speed 9406.71 samples/sec   Loss 1.7778   LearningRate 0.0004   Epoch: 18   Global Step: 31330   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:34:56,644-Speed 9417.28 samples/sec   Loss 1.7752   LearningRate 0.0004   Epoch: 18   Global Step: 31340   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-05 19:35:22,750-Speed 9414.16 samples/sec   Loss 1.7661   LearningRate 0.0004   Epoch: 18   Global Step: 31350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:35:48,970-Speed 9373.43 samples/sec   Loss 1.7770   LearningRate 0.0004   Epoch: 18   Global Step: 31360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:36:15,076-Speed 9414.28 samples/sec   Loss 1.7712   LearningRate 0.0004   Epoch: 18   Global Step: 31370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:36:41,171-Speed 9418.33 samples/sec   Loss 1.7790   LearningRate 0.0004   Epoch: 18   Global Step: 31380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:37:07,302-Speed 9405.64 samples/sec   Loss 1.7656   LearningRate 0.0004   Epoch: 18   Global Step: 31390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:37:33,405-Speed 9415.36 samples/sec   Loss 1.7605   LearningRate 0.0004   Epoch: 18   Global Step: 31400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:37:59,474-Speed 9427.82 samples/sec   Loss 1.7761   LearningRate 0.0004   Epoch: 18   Global Step: 31410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:38:25,568-Speed 9418.67 samples/sec   Loss 1.7634   LearningRate 0.0004   Epoch: 18   Global Step: 31420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:38:51,657-Speed 9420.29 samples/sec   Loss 1.7730   LearningRate 0.0004   Epoch: 18   Global Step: 31430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:39:17,729-Speed 9426.89 samples/sec   Loss 1.7836   LearningRate 0.0004   Epoch: 18   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:39:43,850-Speed 9408.87 samples/sec   Loss 1.7654   LearningRate 0.0004   Epoch: 18   Global Step: 31450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 19:40:09,907-Speed 9432.27 samples/sec   Loss 1.7662   LearningRate 0.0004   Epoch: 18   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:40:36,115-Speed 9377.54 samples/sec   Loss 1.7677   LearningRate 0.0004   Epoch: 18   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:41:02,230-Speed 9411.25 samples/sec   Loss 1.7633   LearningRate 0.0004   Epoch: 18   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:41:28,381-Speed 9398.51 samples/sec   Loss 1.7673   LearningRate 0.0004   Epoch: 18   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:41:54,561-Speed 9387.59 samples/sec   Loss 1.7733   LearningRate 0.0004   Epoch: 18   Global Step: 31500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:42:20,714-Speed 9397.51 samples/sec   Loss 1.7631   LearningRate 0.0004   Epoch: 18   Global Step: 31510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:42:46,796-Speed 9422.67 samples/sec   Loss 1.7763   LearningRate 0.0004   Epoch: 18   Global Step: 31520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:43:12,890-Speed 9418.69 samples/sec   Loss 1.7667   LearningRate 0.0004   Epoch: 18   Global Step: 31530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:43:39,019-Speed 9406.16 samples/sec   Loss 1.7670   LearningRate 0.0004   Epoch: 18   Global Step: 31540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:44:05,199-Speed 9387.85 samples/sec   Loss 1.7646   LearningRate 0.0004   Epoch: 18   Global Step: 31550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:44:31,529-Speed 9333.98 samples/sec   Loss 1.7564   LearningRate 0.0004   Epoch: 18   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-05 19:44:57,598-Speed 9427.71 samples/sec   Loss 1.7676   LearningRate 0.0004   Epoch: 18   Global Step: 31570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:45:23,753-Speed 9396.85 samples/sec   Loss 1.7559   LearningRate 0.0004   Epoch: 18   Global Step: 31580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:45:49,906-Speed 9397.32 samples/sec   Loss 1.7613   LearningRate 0.0004   Epoch: 18   Global Step: 31590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:46:16,123-Speed 9374.85 samples/sec   Loss 1.7699   LearningRate 0.0004   Epoch: 18   Global Step: 31600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:46:42,246-Speed 9407.92 samples/sec   Loss 1.7568   LearningRate 0.0004   Epoch: 18   Global Step: 31610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:47:08,419-Speed 9389.98 samples/sec   Loss 1.7619   LearningRate 0.0004   Epoch: 18   Global Step: 31620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:47:34,509-Speed 9420.09 samples/sec   Loss 1.7575   LearningRate 0.0004   Epoch: 18   Global Step: 31630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:48:00,631-Speed 9408.89 samples/sec   Loss 1.7649   LearningRate 0.0004   Epoch: 18   Global Step: 31640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:48:26,746-Speed 9411.18 samples/sec   Loss 1.7542   LearningRate 0.0004   Epoch: 18   Global Step: 31650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:48:52,789-Speed 9436.98 samples/sec   Loss 1.7563   LearningRate 0.0004   Epoch: 18   Global Step: 31660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:49:18,884-Speed 9418.22 samples/sec   Loss 1.7434   LearningRate 0.0004   Epoch: 18   Global Step: 31670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:49:44,982-Speed 9417.45 samples/sec   Loss 1.7538   LearningRate 0.0004   Epoch: 18   Global Step: 31680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:50:11,202-Speed 9373.56 samples/sec   Loss 1.7522   LearningRate 0.0004   Epoch: 18   Global Step: 31690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:50:37,279-Speed 9424.76 samples/sec   Loss 1.7623   LearningRate 0.0004   Epoch: 18   Global Step: 31700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:51:03,389-Speed 9412.79 samples/sec   Loss 1.7533   LearningRate 0.0004   Epoch: 18   Global Step: 31710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:51:29,519-Speed 9405.61 samples/sec   Loss 1.7701   LearningRate 0.0004   Epoch: 18   Global Step: 31720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:51:55,694-Speed 9389.57 samples/sec   Loss 1.7597   LearningRate 0.0004   Epoch: 18   Global Step: 31730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:52:21,855-Speed 9395.32 samples/sec   Loss 1.7541   LearningRate 0.0004   Epoch: 18   Global Step: 31740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-05 19:52:47,929-Speed 9426.09 samples/sec   Loss 1.7577   LearningRate 0.0004   Epoch: 18   Global Step: 31750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 19:53:14,059-Speed 9405.55 samples/sec   Loss 1.7457   LearningRate 0.0004   Epoch: 18   Global Step: 31760   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:53:40,220-Speed 9394.49 samples/sec   Loss 1.7520   LearningRate 0.0004   Epoch: 18   Global Step: 31770   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:54:06,379-Speed 9395.27 samples/sec   Loss 1.7591   LearningRate 0.0004   Epoch: 18   Global Step: 31780   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:54:32,561-Speed 9387.05 samples/sec   Loss 1.7410   LearningRate 0.0004   Epoch: 18   Global Step: 31790   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:54:58,706-Speed 9400.37 samples/sec   Loss 1.7396   LearningRate 0.0004   Epoch: 18   Global Step: 31800   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:55:24,771-Speed 9429.04 samples/sec   Loss 1.7647   LearningRate 0.0004   Epoch: 18   Global Step: 31810   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:55:50,907-Speed 9403.62 samples/sec   Loss 1.7391   LearningRate 0.0004   Epoch: 18   Global Step: 31820   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:56:17,078-Speed 9391.31 samples/sec   Loss 1.7452   LearningRate 0.0004   Epoch: 18   Global Step: 31830   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:56:43,210-Speed 9404.67 samples/sec   Loss 1.7510   LearningRate 0.0004   Epoch: 18   Global Step: 31840   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:57:09,320-Speed 9413.18 samples/sec   Loss 1.7459   LearningRate 0.0004   Epoch: 18   Global Step: 31850   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 19:57:35,383-Speed 9429.64 samples/sec   Loss 1.7468   LearningRate 0.0004   Epoch: 18   Global Step: 31860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 19:58:01,511-Speed 9406.32 samples/sec   Loss 1.7483   LearningRate 0.0004   Epoch: 18   Global Step: 31870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 19:58:27,627-Speed 9411.01 samples/sec   Loss 1.7409   LearningRate 0.0004   Epoch: 18   Global Step: 31880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 19:58:53,704-Speed 9424.96 samples/sec   Loss 1.7478   LearningRate 0.0004   Epoch: 18   Global Step: 31890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 19:59:19,818-Speed 9411.25 samples/sec   Loss 1.7425   LearningRate 0.0004   Epoch: 18   Global Step: 31900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 19:59:45,961-Speed 9401.21 samples/sec   Loss 1.7499   LearningRate 0.0004   Epoch: 18   Global Step: 31910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:00:12,031-Speed 9427.39 samples/sec   Loss 1.7287   LearningRate 0.0004   Epoch: 18   Global Step: 31920   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:00:38,156-Speed 9407.52 samples/sec   Loss 1.7308   LearningRate 0.0004   Epoch: 18   Global Step: 31930   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:01:04,339-Speed 9386.65 samples/sec   Loss 1.7347   LearningRate 0.0004   Epoch: 18   Global Step: 31940   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:01:30,433-Speed 9418.47 samples/sec   Loss 1.7342   LearningRate 0.0004   Epoch: 18   Global Step: 31950   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:01:56,511-Speed 9424.67 samples/sec   Loss 1.7356   LearningRate 0.0004   Epoch: 18   Global Step: 31960   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:02:22,659-Speed 9399.29 samples/sec   Loss 1.7433   LearningRate 0.0004   Epoch: 18   Global Step: 31970   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:02:48,794-Speed 9403.91 samples/sec   Loss 1.7482   LearningRate 0.0004   Epoch: 18   Global Step: 31980   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:03:14,883-Speed 9420.55 samples/sec   Loss 1.7354   LearningRate 0.0004   Epoch: 18   Global Step: 31990   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:03:41,007-Speed 9408.11 samples/sec   Loss 1.7272   LearningRate 0.0004   Epoch: 18   Global Step: 32000   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:04:07,057-Speed 9434.32 samples/sec   Loss 1.7276   LearningRate 0.0004   Epoch: 18   Global Step: 32010   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:04:33,137-Speed 9423.77 samples/sec   Loss 1.7403   LearningRate 0.0004   Epoch: 18   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:04:59,258-Speed 9408.98 samples/sec   Loss 1.7410   LearningRate 0.0004   Epoch: 18   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:05:25,375-Speed 9410.28 samples/sec   Loss 1.7394   LearningRate 0.0004   Epoch: 18   Global Step: 32040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:05:51,468-Speed 9419.22 samples/sec   Loss 1.7233   LearningRate 0.0004   Epoch: 18   Global Step: 32050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:06:17,645-Speed 9388.82 samples/sec   Loss 1.7378   LearningRate 0.0004   Epoch: 18   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:06:43,758-Speed 9411.81 samples/sec   Loss 1.7370   LearningRate 0.0004   Epoch: 18   Global Step: 32070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:07:09,867-Speed 9414.04 samples/sec   Loss 1.7348   LearningRate 0.0004   Epoch: 18   Global Step: 32080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:07:36,113-Speed 9363.88 samples/sec   Loss 1.7404   LearningRate 0.0004   Epoch: 18   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:08:02,258-Speed 9400.48 samples/sec   Loss 1.7387   LearningRate 0.0004   Epoch: 18   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:08:28,368-Speed 9412.95 samples/sec   Loss 1.7298   LearningRate 0.0004   Epoch: 18   Global Step: 32110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:08:54,519-Speed 9398.32 samples/sec   Loss 1.7325   LearningRate 0.0004   Epoch: 18   Global Step: 32120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:09:20,772-Speed 9362.19 samples/sec   Loss 1.7305   LearningRate 0.0004   Epoch: 18   Global Step: 32130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:09:46,922-Speed 9398.64 samples/sec   Loss 1.7282   LearningRate 0.0004   Epoch: 18   Global Step: 32140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:10:13,074-Speed 9397.85 samples/sec   Loss 1.7188   LearningRate 0.0004   Epoch: 18   Global Step: 32150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:10:39,219-Speed 9400.25 samples/sec   Loss 1.7293   LearningRate 0.0004   Epoch: 18   Global Step: 32160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:11:05,349-Speed 9405.77 samples/sec   Loss 1.7335   LearningRate 0.0004   Epoch: 18   Global Step: 32170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:11:31,577-Speed 9370.24 samples/sec   Loss 1.7263   LearningRate 0.0004   Epoch: 18   Global Step: 32180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:11:57,800-Speed 9372.22 samples/sec   Loss 1.7391   LearningRate 0.0004   Epoch: 18   Global Step: 32190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:12:24,093-Speed 9347.61 samples/sec   Loss 1.7488   LearningRate 0.0004   Epoch: 18   Global Step: 32200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:12:50,284-Speed 9383.43 samples/sec   Loss 1.7334   LearningRate 0.0004   Epoch: 18   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:13:16,389-Speed 9415.15 samples/sec   Loss 1.7251   LearningRate 0.0004   Epoch: 18   Global Step: 32220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:13:42,526-Speed 9403.25 samples/sec   Loss 1.7350   LearningRate 0.0004   Epoch: 18   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:14:08,690-Speed 9393.46 samples/sec   Loss 1.7222   LearningRate 0.0004   Epoch: 18   Global Step: 32240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:14:34,826-Speed 9403.40 samples/sec   Loss 1.7104   LearningRate 0.0004   Epoch: 18   Global Step: 32250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:15:00,864-Speed 9439.08 samples/sec   Loss 1.7130   LearningRate 0.0004   Epoch: 18   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:15:26,984-Speed 9409.66 samples/sec   Loss 1.7205   LearningRate 0.0004   Epoch: 18   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:15:53,053-Speed 9427.74 samples/sec   Loss 1.7153   LearningRate 0.0004   Epoch: 18   Global Step: 32280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:16:19,159-Speed 9414.25 samples/sec   Loss 1.7275   LearningRate 0.0004   Epoch: 18   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:16:45,232-Speed 9426.35 samples/sec   Loss 1.7267   LearningRate 0.0004   Epoch: 18   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:17:11,382-Speed 9398.28 samples/sec   Loss 1.7403   LearningRate 0.0004   Epoch: 18   Global Step: 32310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:17:37,525-Speed 9401.03 samples/sec   Loss 1.7219   LearningRate 0.0003   Epoch: 18   Global Step: 32320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:18:03,596-Speed 9427.21 samples/sec   Loss 1.7149   LearningRate 0.0003   Epoch: 18   Global Step: 32330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:18:29,643-Speed 9435.39 samples/sec   Loss 1.7194   LearningRate 0.0003   Epoch: 18   Global Step: 32340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:18:55,681-Speed 9438.98 samples/sec   Loss 1.7123   LearningRate 0.0003   Epoch: 18   Global Step: 32350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:19:21,812-Speed 9405.50 samples/sec   Loss 1.7209   LearningRate 0.0003   Epoch: 18   Global Step: 32360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:19:48,000-Speed 9384.78 samples/sec   Loss 1.7102   LearningRate 0.0003   Epoch: 18   Global Step: 32370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:20:14,258-Speed 9359.95 samples/sec   Loss 1.7208   LearningRate 0.0003   Epoch: 18   Global Step: 32380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:20:40,326-Speed 9427.99 samples/sec   Loss 1.7249   LearningRate 0.0003   Epoch: 18   Global Step: 32390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:21:06,492-Speed 9392.59 samples/sec   Loss 1.7151   LearningRate 0.0003   Epoch: 18   Global Step: 32400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:21:32,767-Speed 9353.64 samples/sec   Loss 1.7160   LearningRate 0.0003   Epoch: 18   Global Step: 32410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:21:59,001-Speed 9368.47 samples/sec   Loss 1.7261   LearningRate 0.0003   Epoch: 18   Global Step: 32420   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:22:25,091-Speed 9420.02 samples/sec   Loss 1.7030   LearningRate 0.0003   Epoch: 18   Global Step: 32430   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:22:51,289-Speed 9381.37 samples/sec   Loss 1.7097   LearningRate 0.0003   Epoch: 18   Global Step: 32440   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:23:17,449-Speed 9394.82 samples/sec   Loss 1.7151   LearningRate 0.0003   Epoch: 18   Global Step: 32450   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:23:43,557-Speed 9413.76 samples/sec   Loss 1.7051   LearningRate 0.0003   Epoch: 18   Global Step: 32460   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:24:09,758-Speed 9380.15 samples/sec   Loss 1.7146   LearningRate 0.0003   Epoch: 18   Global Step: 32470   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:24:35,958-Speed 9380.30 samples/sec   Loss 1.7075   LearningRate 0.0003   Epoch: 18   Global Step: 32480   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:25:02,177-Speed 9373.73 samples/sec   Loss 1.7145   LearningRate 0.0003   Epoch: 18   Global Step: 32490   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:25:28,382-Speed 9378.74 samples/sec   Loss 1.7148   LearningRate 0.0003   Epoch: 18   Global Step: 32500   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:25:54,684-Speed 9344.49 samples/sec   Loss 1.7099   LearningRate 0.0003   Epoch: 18   Global Step: 32510   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:26:20,916-Speed 9369.02 samples/sec   Loss 1.7126   LearningRate 0.0003   Epoch: 18   Global Step: 32520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:26:47,019-Speed 9415.36 samples/sec   Loss 1.7221   LearningRate 0.0003   Epoch: 18   Global Step: 32530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:27:13,182-Speed 9393.88 samples/sec   Loss 1.7021   LearningRate 0.0003   Epoch: 18   Global Step: 32540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:27:39,374-Speed 9384.13 samples/sec   Loss 1.7255   LearningRate 0.0003   Epoch: 18   Global Step: 32550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:28:05,616-Speed 9365.35 samples/sec   Loss 1.7170   LearningRate 0.0003   Epoch: 18   Global Step: 32560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:28:31,779-Speed 9393.75 samples/sec   Loss 1.7121   LearningRate 0.0003   Epoch: 18   Global Step: 32570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:28:57,933-Speed 9397.16 samples/sec   Loss 1.6984   LearningRate 0.0003   Epoch: 18   Global Step: 32580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:29:24,004-Speed 9427.16 samples/sec   Loss 1.7083   LearningRate 0.0003   Epoch: 18   Global Step: 32590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:29:50,191-Speed 9385.40 samples/sec   Loss 1.7045   LearningRate 0.0003   Epoch: 18   Global Step: 32600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:30:16,271-Speed 9423.94 samples/sec   Loss 1.7087   LearningRate 0.0003   Epoch: 18   Global Step: 32610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:30:42,306-Speed 9439.74 samples/sec   Loss 1.7105   LearningRate 0.0003   Epoch: 18   Global Step: 32620   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:31:08,372-Speed 9428.99 samples/sec   Loss 1.7024   LearningRate 0.0003   Epoch: 18   Global Step: 32630   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:31:34,490-Speed 9409.73 samples/sec   Loss 1.7116   LearningRate 0.0003   Epoch: 18   Global Step: 32640   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:32:00,598-Speed 9413.75 samples/sec   Loss 1.7043   LearningRate 0.0003   Epoch: 18   Global Step: 32650   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:32:26,645-Speed 9435.93 samples/sec   Loss 1.7135   LearningRate 0.0003   Epoch: 18   Global Step: 32660   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:32:52,719-Speed 9425.66 samples/sec   Loss 1.7116   LearningRate 0.0003   Epoch: 18   Global Step: 32670   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:33:18,795-Speed 9425.10 samples/sec   Loss 1.7092   LearningRate 0.0003   Epoch: 18   Global Step: 32680   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:33:44,915-Speed 9409.15 samples/sec   Loss 1.7025   LearningRate 0.0003   Epoch: 18   Global Step: 32690   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:34:11,067-Speed 9397.88 samples/sec   Loss 1.7130   LearningRate 0.0003   Epoch: 18   Global Step: 32700   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:34:37,317-Speed 9362.60 samples/sec   Loss 1.6987   LearningRate 0.0003   Epoch: 18   Global Step: 32710   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:35:03,403-Speed 9421.70 samples/sec   Loss 1.7009   LearningRate 0.0003   Epoch: 18   Global Step: 32720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:35:29,667-Speed 9357.73 samples/sec   Loss 1.7089   LearningRate 0.0003   Epoch: 18   Global Step: 32730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:35:55,813-Speed 9399.90 samples/sec   Loss 1.6939   LearningRate 0.0003   Epoch: 18   Global Step: 32740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:36:21,896-Speed 9422.74 samples/sec   Loss 1.6946   LearningRate 0.0003   Epoch: 18   Global Step: 32750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:36:47,965-Speed 9427.93 samples/sec   Loss 1.7000   LearningRate 0.0003   Epoch: 18   Global Step: 32760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:37:14,155-Speed 9383.82 samples/sec   Loss 1.7177   LearningRate 0.0003   Epoch: 18   Global Step: 32770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:37:40,260-Speed 9414.96 samples/sec   Loss 1.7062   LearningRate 0.0003   Epoch: 18   Global Step: 32780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:38:06,340-Speed 9423.36 samples/sec   Loss 1.7173   LearningRate 0.0003   Epoch: 18   Global Step: 32790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:38:32,517-Speed 9389.01 samples/sec   Loss 1.7294   LearningRate 0.0003   Epoch: 18   Global Step: 32800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:38:58,591-Speed 9426.09 samples/sec   Loss 1.7057   LearningRate 0.0003   Epoch: 18   Global Step: 32810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:39:24,652-Speed 9431.07 samples/sec   Loss 1.7170   LearningRate 0.0003   Epoch: 18   Global Step: 32820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:39:50,774-Speed 9408.58 samples/sec   Loss 1.7156   LearningRate 0.0003   Epoch: 18   Global Step: 32830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:41:10,888-Speed 3067.69 samples/sec   Loss 1.7092   LearningRate 0.0003   Epoch: 19   Global Step: 32840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:41:36,641-Speed 9543.29 samples/sec   Loss 1.6822   LearningRate 0.0003   Epoch: 19   Global Step: 32850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:42:02,514-Speed 9499.47 samples/sec   Loss 1.6768   LearningRate 0.0003   Epoch: 19   Global Step: 32860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:42:28,419-Speed 9487.24 samples/sec   Loss 1.6857   LearningRate 0.0003   Epoch: 19   Global Step: 32870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:42:54,382-Speed 9466.35 samples/sec   Loss 1.6928   LearningRate 0.0003   Epoch: 19   Global Step: 32880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:43:20,277-Speed 9490.84 samples/sec   Loss 1.6942   LearningRate 0.0003   Epoch: 19   Global Step: 32890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:43:46,186-Speed 9486.12 samples/sec   Loss 1.6916   LearningRate 0.0003   Epoch: 19   Global Step: 32900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:44:12,106-Speed 9481.63 samples/sec   Loss 1.6735   LearningRate 0.0003   Epoch: 19   Global Step: 32910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:44:38,156-Speed 9434.73 samples/sec   Loss 1.6952   LearningRate 0.0003   Epoch: 19   Global Step: 32920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:45:04,078-Speed 9481.05 samples/sec   Loss 1.6820   LearningRate 0.0003   Epoch: 19   Global Step: 32930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:45:30,064-Speed 9458.03 samples/sec   Loss 1.6818   LearningRate 0.0003   Epoch: 19   Global Step: 32940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:45:55,966-Speed 9488.44 samples/sec   Loss 1.6822   LearningRate 0.0003   Epoch: 19   Global Step: 32950   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-05 20:46:21,873-Speed 9486.74 samples/sec   Loss 1.6695   LearningRate 0.0003   Epoch: 19   Global Step: 32960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:46:47,766-Speed 9491.89 samples/sec   Loss 1.6900   LearningRate 0.0003   Epoch: 19   Global Step: 32970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:47:13,766-Speed 9452.57 samples/sec   Loss 1.6856   LearningRate 0.0003   Epoch: 19   Global Step: 32980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:47:39,681-Speed 9483.60 samples/sec   Loss 1.6767   LearningRate 0.0003   Epoch: 19   Global Step: 32990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:48:05,691-Speed 9449.38 samples/sec   Loss 1.6777   LearningRate 0.0003   Epoch: 19   Global Step: 33000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:48:31,684-Speed 9455.08 samples/sec   Loss 1.6677   LearningRate 0.0003   Epoch: 19   Global Step: 33010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:48:57,646-Speed 9466.43 samples/sec   Loss 1.6709   LearningRate 0.0003   Epoch: 19   Global Step: 33020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:49:23,614-Speed 9464.46 samples/sec   Loss 1.6827   LearningRate 0.0003   Epoch: 19   Global Step: 33030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:49:49,641-Speed 9443.06 samples/sec   Loss 1.6870   LearningRate 0.0003   Epoch: 19   Global Step: 33040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:50:15,591-Speed 9471.15 samples/sec   Loss 1.6806   LearningRate 0.0003   Epoch: 19   Global Step: 33050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:50:41,573-Speed 9459.68 samples/sec   Loss 1.6788   LearningRate 0.0003   Epoch: 19   Global Step: 33060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:51:07,568-Speed 9454.53 samples/sec   Loss 1.6736   LearningRate 0.0003   Epoch: 19   Global Step: 33070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:51:33,564-Speed 9453.82 samples/sec   Loss 1.6836   LearningRate 0.0003   Epoch: 19   Global Step: 33080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-05 20:51:59,584-Speed 9445.60 samples/sec   Loss 1.6976   LearningRate 0.0003   Epoch: 19   Global Step: 33090   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:52:25,564-Speed 9459.90 samples/sec   Loss 1.6737   LearningRate 0.0003   Epoch: 19   Global Step: 33100   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-05 20:52:51,490-Speed 9479.84 samples/sec   Loss 1.6856   LearningRate 0.0003   Epoch: 19   Global Step: 33110   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 20:53:17,509-Speed 9445.81 samples/sec   Loss 1.6828   LearningRate 0.0003   Epoch: 19   Global Step: 33120   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 20:53:43,551-Speed 9437.25 samples/sec   Loss 1.6820   LearningRate 0.0003   Epoch: 19   Global Step: 33130   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 20:54:09,592-Speed 9437.81 samples/sec   Loss 1.6858   LearningRate 0.0003   Epoch: 19   Global Step: 33140   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 20:54:35,594-Speed 9451.93 samples/sec   Loss 1.6732   LearningRate 0.0003   Epoch: 19   Global Step: 33150   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 20:55:01,610-Speed 9447.29 samples/sec   Loss 1.6888   LearningRate 0.0003   Epoch: 19   Global Step: 33160   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 20:55:27,670-Speed 9430.94 samples/sec   Loss 1.6868   LearningRate 0.0003   Epoch: 19   Global Step: 33170   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 20:55:53,658-Speed 9457.18 samples/sec   Loss 1.6806   LearningRate 0.0003   Epoch: 19   Global Step: 33180   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 20:56:19,646-Speed 9456.98 samples/sec   Loss 1.6771   LearningRate 0.0003   Epoch: 19   Global Step: 33190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 20:56:45,762-Speed 9410.71 samples/sec   Loss 1.6836   LearningRate 0.0003   Epoch: 19   Global Step: 33200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 20:57:11,812-Speed 9434.94 samples/sec   Loss 1.6753   LearningRate 0.0003   Epoch: 19   Global Step: 33210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 20:57:37,919-Speed 9414.12 samples/sec   Loss 1.6695   LearningRate 0.0003   Epoch: 19   Global Step: 33220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 20:58:04,038-Speed 9409.54 samples/sec   Loss 1.6765   LearningRate 0.0003   Epoch: 19   Global Step: 33230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 20:58:30,136-Speed 9417.33 samples/sec   Loss 1.6716   LearningRate 0.0003   Epoch: 19   Global Step: 33240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 20:58:56,185-Speed 9434.65 samples/sec   Loss 1.6657   LearningRate 0.0003   Epoch: 19   Global Step: 33250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 20:59:22,249-Speed 9429.72 samples/sec   Loss 1.6663   LearningRate 0.0003   Epoch: 19   Global Step: 33260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 20:59:48,408-Speed 9395.10 samples/sec   Loss 1.6837   LearningRate 0.0003   Epoch: 19   Global Step: 33270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:00:14,559-Speed 9397.92 samples/sec   Loss 1.6742   LearningRate 0.0003   Epoch: 19   Global Step: 33280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:00:40,591-Speed 9441.38 samples/sec   Loss 1.6871   LearningRate 0.0003   Epoch: 19   Global Step: 33290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:01:06,721-Speed 9405.53 samples/sec   Loss 1.6796   LearningRate 0.0003   Epoch: 19   Global Step: 33300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:01:32,809-Speed 9421.12 samples/sec   Loss 1.6818   LearningRate 0.0003   Epoch: 19   Global Step: 33310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:01:58,918-Speed 9413.17 samples/sec   Loss 1.6744   LearningRate 0.0003   Epoch: 19   Global Step: 33320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:02:25,053-Speed 9403.75 samples/sec   Loss 1.6710   LearningRate 0.0003   Epoch: 19   Global Step: 33330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:02:51,116-Speed 9429.64 samples/sec   Loss 1.6622   LearningRate 0.0003   Epoch: 19   Global Step: 33340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:03:17,189-Speed 9426.35 samples/sec   Loss 1.6768   LearningRate 0.0003   Epoch: 19   Global Step: 33350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:03:43,391-Speed 9379.99 samples/sec   Loss 1.6649   LearningRate 0.0003   Epoch: 19   Global Step: 33360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:04:09,528-Speed 9403.14 samples/sec   Loss 1.6584   LearningRate 0.0003   Epoch: 19   Global Step: 33370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:04:35,648-Speed 9409.16 samples/sec   Loss 1.6744   LearningRate 0.0003   Epoch: 19   Global Step: 33380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:05:01,734-Speed 9421.57 samples/sec   Loss 1.6725   LearningRate 0.0003   Epoch: 19   Global Step: 33390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:05:27,894-Speed 9395.09 samples/sec   Loss 1.6786   LearningRate 0.0003   Epoch: 19   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:05:54,013-Speed 9409.67 samples/sec   Loss 1.6603   LearningRate 0.0003   Epoch: 19   Global Step: 33410   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:06:20,118-Speed 9415.14 samples/sec   Loss 1.6568   LearningRate 0.0003   Epoch: 19   Global Step: 33420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:06:46,308-Speed 9384.04 samples/sec   Loss 1.6626   LearningRate 0.0003   Epoch: 19   Global Step: 33430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:07:12,421-Speed 9412.11 samples/sec   Loss 1.6660   LearningRate 0.0003   Epoch: 19   Global Step: 33440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:07:38,551-Speed 9405.73 samples/sec   Loss 1.6728   LearningRate 0.0003   Epoch: 19   Global Step: 33450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:08:04,610-Speed 9431.50 samples/sec   Loss 1.6521   LearningRate 0.0003   Epoch: 19   Global Step: 33460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:08:30,760-Speed 9399.16 samples/sec   Loss 1.6459   LearningRate 0.0003   Epoch: 19   Global Step: 33470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:08:56,976-Speed 9374.96 samples/sec   Loss 1.6631   LearningRate 0.0003   Epoch: 19   Global Step: 33480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:09:23,095-Speed 9409.90 samples/sec   Loss 1.6515   LearningRate 0.0003   Epoch: 19   Global Step: 33490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:09:49,333-Speed 9367.13 samples/sec   Loss 1.6584   LearningRate 0.0003   Epoch: 19   Global Step: 33500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:10:15,410-Speed 9424.60 samples/sec   Loss 1.6629   LearningRate 0.0003   Epoch: 19   Global Step: 33510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:10:41,526-Speed 9410.96 samples/sec   Loss 1.6600   LearningRate 0.0003   Epoch: 19   Global Step: 33520   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:11:07,738-Speed 9376.24 samples/sec   Loss 1.6558   LearningRate 0.0003   Epoch: 19   Global Step: 33530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:11:33,900-Speed 9394.20 samples/sec   Loss 1.6733   LearningRate 0.0003   Epoch: 19   Global Step: 33540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:12:00,105-Speed 9378.71 samples/sec   Loss 1.6761   LearningRate 0.0003   Epoch: 19   Global Step: 33550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:12:26,207-Speed 9415.90 samples/sec   Loss 1.6674   LearningRate 0.0003   Epoch: 19   Global Step: 33560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:12:52,340-Speed 9404.77 samples/sec   Loss 1.6603   LearningRate 0.0003   Epoch: 19   Global Step: 33570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:13:18,486-Speed 9399.96 samples/sec   Loss 1.6566   LearningRate 0.0003   Epoch: 19   Global Step: 33580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:13:44,568-Speed 9423.12 samples/sec   Loss 1.6498   LearningRate 0.0003   Epoch: 19   Global Step: 33590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:14:10,740-Speed 9390.41 samples/sec   Loss 1.6554   LearningRate 0.0003   Epoch: 19   Global Step: 33600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:14:36,803-Speed 9430.37 samples/sec   Loss 1.6500   LearningRate 0.0003   Epoch: 19   Global Step: 33610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:15:02,916-Speed 9411.48 samples/sec   Loss 1.6578   LearningRate 0.0003   Epoch: 19   Global Step: 33620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:15:29,040-Speed 9408.25 samples/sec   Loss 1.6495   LearningRate 0.0003   Epoch: 19   Global Step: 33630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:15:55,117-Speed 9424.59 samples/sec   Loss 1.6539   LearningRate 0.0003   Epoch: 19   Global Step: 33640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:16:21,169-Speed 9434.03 samples/sec   Loss 1.6572   LearningRate 0.0003   Epoch: 19   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:16:47,216-Speed 9436.96 samples/sec   Loss 1.6515   LearningRate 0.0003   Epoch: 19   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:17:13,315-Speed 9416.98 samples/sec   Loss 1.6462   LearningRate 0.0003   Epoch: 19   Global Step: 33670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:17:39,435-Speed 9409.23 samples/sec   Loss 1.6559   LearningRate 0.0003   Epoch: 19   Global Step: 33680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:18:05,553-Speed 9409.94 samples/sec   Loss 1.6570   LearningRate 0.0003   Epoch: 19   Global Step: 33690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:18:31,671-Speed 9409.91 samples/sec   Loss 1.6368   LearningRate 0.0003   Epoch: 19   Global Step: 33700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:18:57,762-Speed 9419.64 samples/sec   Loss 1.6531   LearningRate 0.0003   Epoch: 19   Global Step: 33710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:19:23,910-Speed 9399.38 samples/sec   Loss 1.6413   LearningRate 0.0003   Epoch: 19   Global Step: 33720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:19:50,000-Speed 9420.02 samples/sec   Loss 1.6414   LearningRate 0.0003   Epoch: 19   Global Step: 33730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:20:16,150-Speed 9398.85 samples/sec   Loss 1.6415   LearningRate 0.0003   Epoch: 19   Global Step: 33740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:20:42,244-Speed 9418.39 samples/sec   Loss 1.6526   LearningRate 0.0003   Epoch: 19   Global Step: 33750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:21:08,391-Speed 9399.98 samples/sec   Loss 1.6429   LearningRate 0.0003   Epoch: 19   Global Step: 33760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:21:34,477-Speed 9421.54 samples/sec   Loss 1.6530   LearningRate 0.0003   Epoch: 19   Global Step: 33770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:22:00,665-Speed 9384.90 samples/sec   Loss 1.6545   LearningRate 0.0003   Epoch: 19   Global Step: 33780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:22:26,783-Speed 9410.04 samples/sec   Loss 1.6484   LearningRate 0.0003   Epoch: 19   Global Step: 33790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:22:52,877-Speed 9418.64 samples/sec   Loss 1.6394   LearningRate 0.0003   Epoch: 19   Global Step: 33800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:23:19,033-Speed 9396.36 samples/sec   Loss 1.6574   LearningRate 0.0003   Epoch: 19   Global Step: 33810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:23:45,171-Speed 9402.83 samples/sec   Loss 1.6357   LearningRate 0.0003   Epoch: 19   Global Step: 33820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:24:11,298-Speed 9406.89 samples/sec   Loss 1.6384   LearningRate 0.0003   Epoch: 19   Global Step: 33830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:24:37,420-Speed 9408.43 samples/sec   Loss 1.6415   LearningRate 0.0003   Epoch: 19   Global Step: 33840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:25:03,550-Speed 9405.55 samples/sec   Loss 1.6387   LearningRate 0.0003   Epoch: 19   Global Step: 33850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:25:29,674-Speed 9408.05 samples/sec   Loss 1.6443   LearningRate 0.0003   Epoch: 19   Global Step: 33860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:25:55,821-Speed 9399.60 samples/sec   Loss 1.6455   LearningRate 0.0003   Epoch: 19   Global Step: 33870   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:26:21,940-Speed 9409.71 samples/sec   Loss 1.6513   LearningRate 0.0003   Epoch: 19   Global Step: 33880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:26:48,064-Speed 9407.93 samples/sec   Loss 1.6545   LearningRate 0.0003   Epoch: 19   Global Step: 33890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:27:14,154-Speed 9420.26 samples/sec   Loss 1.6455   LearningRate 0.0003   Epoch: 19   Global Step: 33900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:27:40,274-Speed 9409.27 samples/sec   Loss 1.6238   LearningRate 0.0003   Epoch: 19   Global Step: 33910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:28:06,428-Speed 9396.88 samples/sec   Loss 1.6356   LearningRate 0.0003   Epoch: 19   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:28:32,469-Speed 9437.89 samples/sec   Loss 1.6401   LearningRate 0.0003   Epoch: 19   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:28:58,522-Speed 9433.40 samples/sec   Loss 1.6400   LearningRate 0.0003   Epoch: 19   Global Step: 33940   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:29:24,568-Speed 9435.94 samples/sec   Loss 1.6318   LearningRate 0.0003   Epoch: 19   Global Step: 33950   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:29:50,670-Speed 9416.51 samples/sec   Loss 1.6597   LearningRate 0.0003   Epoch: 19   Global Step: 33960   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:30:16,695-Speed 9443.47 samples/sec   Loss 1.6486   LearningRate 0.0003   Epoch: 19   Global Step: 33970   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:30:42,837-Speed 9401.25 samples/sec   Loss 1.6283   LearningRate 0.0003   Epoch: 19   Global Step: 33980   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:31:08,916-Speed 9424.11 samples/sec   Loss 1.6336   LearningRate 0.0003   Epoch: 19   Global Step: 33990   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:31:35,015-Speed 9416.84 samples/sec   Loss 1.6370   LearningRate 0.0003   Epoch: 19   Global Step: 34000   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:32:01,153-Speed 9402.93 samples/sec   Loss 1.6291   LearningRate 0.0003   Epoch: 19   Global Step: 34010   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:32:27,283-Speed 9405.83 samples/sec   Loss 1.6376   LearningRate 0.0003   Epoch: 19   Global Step: 34020   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:32:53,435-Speed 9397.62 samples/sec   Loss 1.6356   LearningRate 0.0003   Epoch: 19   Global Step: 34030   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-05 21:33:19,588-Speed 9397.45 samples/sec   Loss 1.6229   LearningRate 0.0003   Epoch: 19   Global Step: 34040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:33:45,730-Speed 9401.41 samples/sec   Loss 1.6338   LearningRate 0.0003   Epoch: 19   Global Step: 34050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:34:11,935-Speed 9378.94 samples/sec   Loss 1.6396   LearningRate 0.0003   Epoch: 19   Global Step: 34060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:34:38,112-Speed 9388.54 samples/sec   Loss 1.6418   LearningRate 0.0003   Epoch: 19   Global Step: 34070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:35:04,242-Speed 9405.98 samples/sec   Loss 1.6362   LearningRate 0.0003   Epoch: 19   Global Step: 34080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:35:30,453-Speed 9376.52 samples/sec   Loss 1.6239   LearningRate 0.0003   Epoch: 19   Global Step: 34090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:35:56,668-Speed 9375.18 samples/sec   Loss 1.6284   LearningRate 0.0003   Epoch: 19   Global Step: 34100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:36:22,758-Speed 9421.00 samples/sec   Loss 1.6261   LearningRate 0.0003   Epoch: 19   Global Step: 34110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:36:48,919-Speed 9394.67 samples/sec   Loss 1.6296   LearningRate 0.0003   Epoch: 19   Global Step: 34120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:37:15,009-Speed 9419.97 samples/sec   Loss 1.6279   LearningRate 0.0003   Epoch: 19   Global Step: 34130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:37:41,123-Speed 9411.43 samples/sec   Loss 1.6264   LearningRate 0.0003   Epoch: 19   Global Step: 34140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:38:07,313-Speed 9384.35 samples/sec   Loss 1.6318   LearningRate 0.0003   Epoch: 19   Global Step: 34150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:38:33,445-Speed 9404.80 samples/sec   Loss 1.6288   LearningRate 0.0003   Epoch: 19   Global Step: 34160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:38:59,669-Speed 9372.10 samples/sec   Loss 1.6292   LearningRate 0.0003   Epoch: 19   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:39:25,834-Speed 9393.16 samples/sec   Loss 1.6335   LearningRate 0.0003   Epoch: 19   Global Step: 34180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:39:52,023-Speed 9384.38 samples/sec   Loss 1.6398   LearningRate 0.0003   Epoch: 19   Global Step: 34190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:40:18,089-Speed 9428.95 samples/sec   Loss 1.6250   LearningRate 0.0003   Epoch: 19   Global Step: 34200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:40:44,188-Speed 9416.85 samples/sec   Loss 1.6232   LearningRate 0.0003   Epoch: 19   Global Step: 34210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:41:10,313-Speed 9407.69 samples/sec   Loss 1.6354   LearningRate 0.0003   Epoch: 19   Global Step: 34220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:41:36,451-Speed 9402.92 samples/sec   Loss 1.6151   LearningRate 0.0003   Epoch: 19   Global Step: 34230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:42:02,550-Speed 9417.02 samples/sec   Loss 1.6208   LearningRate 0.0003   Epoch: 19   Global Step: 34240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:42:28,662-Speed 9412.37 samples/sec   Loss 1.6144   LearningRate 0.0003   Epoch: 19   Global Step: 34250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:42:54,704-Speed 9437.61 samples/sec   Loss 1.6172   LearningRate 0.0003   Epoch: 19   Global Step: 34260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:43:20,773-Speed 9427.57 samples/sec   Loss 1.6220   LearningRate 0.0003   Epoch: 19   Global Step: 34270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:43:46,907-Speed 9404.07 samples/sec   Loss 1.6103   LearningRate 0.0003   Epoch: 19   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:44:13,032-Speed 9407.83 samples/sec   Loss 1.6206   LearningRate 0.0003   Epoch: 19   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:44:39,116-Speed 9422.37 samples/sec   Loss 1.6194   LearningRate 0.0003   Epoch: 19   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:45:05,236-Speed 9409.48 samples/sec   Loss 1.6184   LearningRate 0.0003   Epoch: 19   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:45:31,326-Speed 9419.97 samples/sec   Loss 1.6304   LearningRate 0.0003   Epoch: 19   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:45:57,426-Speed 9416.51 samples/sec   Loss 1.6151   LearningRate 0.0003   Epoch: 19   Global Step: 34330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:46:23,446-Speed 9445.43 samples/sec   Loss 1.6270   LearningRate 0.0003   Epoch: 19   Global Step: 34340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:46:49,603-Speed 9396.32 samples/sec   Loss 1.6353   LearningRate 0.0003   Epoch: 19   Global Step: 34350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:47:15,752-Speed 9398.69 samples/sec   Loss 1.6149   LearningRate 0.0003   Epoch: 19   Global Step: 34360   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:47:41,888-Speed 9403.28 samples/sec   Loss 1.6187   LearningRate 0.0003   Epoch: 19   Global Step: 34370   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-05 21:48:08,036-Speed 9399.28 samples/sec   Loss 1.6125   LearningRate 0.0003   Epoch: 19   Global Step: 34380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:48:34,216-Speed 9387.55 samples/sec   Loss 1.6024   LearningRate 0.0003   Epoch: 19   Global Step: 34390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:49:00,434-Speed 9374.20 samples/sec   Loss 1.6147   LearningRate 0.0003   Epoch: 19   Global Step: 34400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:49:26,640-Speed 9378.72 samples/sec   Loss 1.6102   LearningRate 0.0003   Epoch: 19   Global Step: 34410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:49:52,776-Speed 9403.45 samples/sec   Loss 1.6199   LearningRate 0.0003   Epoch: 19   Global Step: 34420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:50:18,913-Speed 9403.33 samples/sec   Loss 1.6049   LearningRate 0.0003   Epoch: 19   Global Step: 34430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:50:45,050-Speed 9402.91 samples/sec   Loss 1.6348   LearningRate 0.0003   Epoch: 19   Global Step: 34440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-05 21:51:11,247-Speed 9381.63 samples/sec   Loss 1.6202   LearningRate 0.0003   Epoch: 19   Global Step: 34450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 21:51:37,291-Speed 9436.64 samples/sec   Loss 1.6187   LearningRate 0.0003   Epoch: 19   Global Step: 34460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 21:52:03,367-Speed 9425.18 samples/sec   Loss 1.6186   LearningRate 0.0003   Epoch: 19   Global Step: 34470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 21:52:29,476-Speed 9413.14 samples/sec   Loss 1.6170   LearningRate 0.0003   Epoch: 19   Global Step: 34480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-05 21:52:55,522-Speed 9435.86 samples/sec   Loss 1.6205   LearningRate 0.0003   Epoch: 19   Global Step: 34490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 21:53:21,660-Speed 9403.01 samples/sec   Loss 1.6188   LearningRate 0.0003   Epoch: 19   Global Step: 34500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 21:53:47,776-Speed 9410.78 samples/sec   Loss 1.6271   LearningRate 0.0003   Epoch: 19   Global Step: 34510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 21:54:13,888-Speed 9412.21 samples/sec   Loss 1.6220   LearningRate 0.0003   Epoch: 19   Global Step: 34520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:54:39,963-Speed 9425.14 samples/sec   Loss 1.6208   LearningRate 0.0003   Epoch: 19   Global Step: 34530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:55:06,136-Speed 9390.14 samples/sec   Loss 1.6239   LearningRate 0.0003   Epoch: 19   Global Step: 34540   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:55:32,193-Speed 9432.73 samples/sec   Loss 1.6310   LearningRate 0.0003   Epoch: 19   Global Step: 34550   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:55:58,283-Speed 9419.97 samples/sec   Loss 1.6248   LearningRate 0.0003   Epoch: 19   Global Step: 34560   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:57:16,320-Speed 3149.33 samples/sec   Loss 1.6128   LearningRate 0.0003   Epoch: 20   Global Step: 34570   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:57:42,292-Speed 9463.27 samples/sec   Loss 1.5910   LearningRate 0.0003   Epoch: 20   Global Step: 34580   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:58:08,335-Speed 9437.20 samples/sec   Loss 1.5977   LearningRate 0.0003   Epoch: 20   Global Step: 34590   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:58:34,439-Speed 9415.29 samples/sec   Loss 1.5920   LearningRate 0.0003   Epoch: 20   Global Step: 34600   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:59:00,517-Speed 9424.65 samples/sec   Loss 1.5978   LearningRate 0.0003   Epoch: 20   Global Step: 34610   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 21:59:26,571-Speed 9432.99 samples/sec   Loss 1.6085   LearningRate 0.0003   Epoch: 20   Global Step: 34620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 21:59:52,697-Speed 9407.39 samples/sec   Loss 1.5904   LearningRate 0.0003   Epoch: 20   Global Step: 34630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:00:18,774-Speed 9424.96 samples/sec   Loss 1.5981   LearningRate 0.0003   Epoch: 20   Global Step: 34640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:00:44,866-Speed 9419.26 samples/sec   Loss 1.6022   LearningRate 0.0003   Epoch: 20   Global Step: 34650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:01:10,899-Speed 9440.97 samples/sec   Loss 1.5977   LearningRate 0.0003   Epoch: 20   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:01:37,077-Speed 9388.50 samples/sec   Loss 1.5985   LearningRate 0.0003   Epoch: 20   Global Step: 34670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:02:03,148-Speed 9427.12 samples/sec   Loss 1.6073   LearningRate 0.0003   Epoch: 20   Global Step: 34680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:02:29,246-Speed 9417.10 samples/sec   Loss 1.6081   LearningRate 0.0003   Epoch: 20   Global Step: 34690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:02:55,410-Speed 9393.47 samples/sec   Loss 1.5999   LearningRate 0.0003   Epoch: 20   Global Step: 34700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:03:21,551-Speed 9402.02 samples/sec   Loss 1.5916   LearningRate 0.0003   Epoch: 20   Global Step: 34710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:03:47,648-Speed 9417.38 samples/sec   Loss 1.6014   LearningRate 0.0003   Epoch: 20   Global Step: 34720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-05 22:04:13,718-Speed 9427.38 samples/sec   Loss 1.5916   LearningRate 0.0003   Epoch: 20   Global Step: 34730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-05 22:04:39,743-Speed 9443.98 samples/sec   Loss 1.5972   LearningRate 0.0003   Epoch: 20   Global Step: 34740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:05:05,833-Speed 9420.31 samples/sec   Loss 1.6009   LearningRate 0.0003   Epoch: 20   Global Step: 34750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:05:31,939-Speed 9414.88 samples/sec   Loss 1.5998   LearningRate 0.0003   Epoch: 20   Global Step: 34760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:05:58,065-Speed 9407.08 samples/sec   Loss 1.5895   LearningRate 0.0003   Epoch: 20   Global Step: 34770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:06:24,229-Speed 9393.30 samples/sec   Loss 1.5971   LearningRate 0.0003   Epoch: 20   Global Step: 34780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:06:50,504-Speed 9353.98 samples/sec   Loss 1.5948   LearningRate 0.0003   Epoch: 20   Global Step: 34790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:07:16,725-Speed 9373.18 samples/sec   Loss 1.5998   LearningRate 0.0003   Epoch: 20   Global Step: 34800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:07:42,865-Speed 9401.99 samples/sec   Loss 1.6016   LearningRate 0.0003   Epoch: 20   Global Step: 34810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:08:09,135-Speed 9355.59 samples/sec   Loss 1.5968   LearningRate 0.0003   Epoch: 20   Global Step: 34820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:08:35,196-Speed 9430.47 samples/sec   Loss 1.5990   LearningRate 0.0003   Epoch: 20   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:09:01,308-Speed 9411.95 samples/sec   Loss 1.5998   LearningRate 0.0003   Epoch: 20   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:09:27,470-Speed 9394.19 samples/sec   Loss 1.5826   LearningRate 0.0003   Epoch: 20   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:09:53,605-Speed 9403.95 samples/sec   Loss 1.5935   LearningRate 0.0003   Epoch: 20   Global Step: 34860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:10:19,773-Speed 9391.98 samples/sec   Loss 1.5969   LearningRate 0.0003   Epoch: 20   Global Step: 34870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:10:45,982-Speed 9377.23 samples/sec   Loss 1.5952   LearningRate 0.0003   Epoch: 20   Global Step: 34880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:11:12,166-Speed 9386.13 samples/sec   Loss 1.5940   LearningRate 0.0003   Epoch: 20   Global Step: 34890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:11:38,386-Speed 9373.61 samples/sec   Loss 1.5965   LearningRate 0.0003   Epoch: 20   Global Step: 34900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:12:04,582-Speed 9382.10 samples/sec   Loss 1.5949   LearningRate 0.0003   Epoch: 20   Global Step: 34910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:12:30,819-Speed 9367.13 samples/sec   Loss 1.5938   LearningRate 0.0003   Epoch: 20   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:12:56,854-Speed 9439.79 samples/sec   Loss 1.5844   LearningRate 0.0003   Epoch: 20   Global Step: 34930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:13:22,995-Speed 9401.83 samples/sec   Loss 1.5948   LearningRate 0.0003   Epoch: 20   Global Step: 34940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-05 22:13:49,100-Speed 9414.41 samples/sec   Loss 1.6011   LearningRate 0.0003   Epoch: 20   Global Step: 34950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:14:15,216-Speed 9411.01 samples/sec   Loss 1.5960   LearningRate 0.0003   Epoch: 20   Global Step: 34960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:14:41,357-Speed 9401.53 samples/sec   Loss 1.5987   LearningRate 0.0003   Epoch: 20   Global Step: 34970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:15:07,420-Speed 9429.58 samples/sec   Loss 1.5873   LearningRate 0.0003   Epoch: 20   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:15:33,525-Speed 9415.05 samples/sec   Loss 1.5979   LearningRate 0.0003   Epoch: 20   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:15:59,678-Speed 9397.42 samples/sec   Loss 1.5832   LearningRate 0.0003   Epoch: 20   Global Step: 35000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:16:25,764-Speed 9421.35 samples/sec   Loss 1.5988   LearningRate 0.0003   Epoch: 20   Global Step: 35010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:16:51,851-Speed 9421.11 samples/sec   Loss 1.5856   LearningRate 0.0003   Epoch: 20   Global Step: 35020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:17:17,978-Speed 9406.67 samples/sec   Loss 1.5909   LearningRate 0.0003   Epoch: 20   Global Step: 35030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:17:44,086-Speed 9414.32 samples/sec   Loss 1.5941   LearningRate 0.0003   Epoch: 20   Global Step: 35040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:18:10,187-Speed 9416.11 samples/sec   Loss 1.5982   LearningRate 0.0003   Epoch: 20   Global Step: 35050   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:18:36,347-Speed 9394.75 samples/sec   Loss 1.5991   LearningRate 0.0003   Epoch: 20   Global Step: 35060   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:19:02,499-Speed 9398.20 samples/sec   Loss 1.6032   LearningRate 0.0003   Epoch: 20   Global Step: 35070   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:19:28,660-Speed 9394.49 samples/sec   Loss 1.5878   LearningRate 0.0003   Epoch: 20   Global Step: 35080   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:19:54,739-Speed 9425.05 samples/sec   Loss 1.5916   LearningRate 0.0003   Epoch: 20   Global Step: 35090   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:20:20,844-Speed 9414.75 samples/sec   Loss 1.5908   LearningRate 0.0003   Epoch: 20   Global Step: 35100   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:20:46,935-Speed 9419.75 samples/sec   Loss 1.5894   LearningRate 0.0003   Epoch: 20   Global Step: 35110   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:21:13,029-Speed 9418.71 samples/sec   Loss 1.5767   LearningRate 0.0003   Epoch: 20   Global Step: 35120   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:21:39,166-Speed 9402.94 samples/sec   Loss 1.5800   LearningRate 0.0003   Epoch: 20   Global Step: 35130   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:22:05,290-Speed 9408.10 samples/sec   Loss 1.5769   LearningRate 0.0003   Epoch: 20   Global Step: 35140   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:22:31,377-Speed 9421.03 samples/sec   Loss 1.5858   LearningRate 0.0003   Epoch: 20   Global Step: 35150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:22:57,472-Speed 9418.36 samples/sec   Loss 1.5794   LearningRate 0.0003   Epoch: 20   Global Step: 35160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:23:23,587-Speed 9411.01 samples/sec   Loss 1.5682   LearningRate 0.0003   Epoch: 20   Global Step: 35170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:23:49,729-Speed 9401.51 samples/sec   Loss 1.5781   LearningRate 0.0003   Epoch: 20   Global Step: 35180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:24:15,847-Speed 9410.33 samples/sec   Loss 1.5759   LearningRate 0.0003   Epoch: 20   Global Step: 35190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:24:41,923-Speed 9425.22 samples/sec   Loss 1.5789   LearningRate 0.0003   Epoch: 20   Global Step: 35200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:25:08,029-Speed 9414.04 samples/sec   Loss 1.5833   LearningRate 0.0003   Epoch: 20   Global Step: 35210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:25:34,133-Speed 9415.07 samples/sec   Loss 1.5834   LearningRate 0.0003   Epoch: 20   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:26:00,276-Speed 9401.02 samples/sec   Loss 1.5729   LearningRate 0.0003   Epoch: 20   Global Step: 35230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:26:26,384-Speed 9413.91 samples/sec   Loss 1.5800   LearningRate 0.0003   Epoch: 20   Global Step: 35240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:26:52,518-Speed 9404.32 samples/sec   Loss 1.5624   LearningRate 0.0003   Epoch: 20   Global Step: 35250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-05 22:27:18,588-Speed 9427.35 samples/sec   Loss 1.5706   LearningRate 0.0003   Epoch: 20   Global Step: 35260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:27:44,736-Speed 9399.28 samples/sec   Loss 1.5756   LearningRate 0.0003   Epoch: 20   Global Step: 35270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:28:10,886-Speed 9398.59 samples/sec   Loss 1.5832   LearningRate 0.0003   Epoch: 20   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:28:36,992-Speed 9414.39 samples/sec   Loss 1.5766   LearningRate 0.0003   Epoch: 20   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:29:03,038-Speed 9436.06 samples/sec   Loss 1.5735   LearningRate 0.0003   Epoch: 20   Global Step: 35300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:29:29,130-Speed 9419.63 samples/sec   Loss 1.5732   LearningRate 0.0003   Epoch: 20   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:29:55,210-Speed 9423.88 samples/sec   Loss 1.5738   LearningRate 0.0003   Epoch: 20   Global Step: 35320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:30:21,355-Speed 9400.26 samples/sec   Loss 1.5766   LearningRate 0.0003   Epoch: 20   Global Step: 35330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:30:47,423-Speed 9428.42 samples/sec   Loss 1.5773   LearningRate 0.0003   Epoch: 20   Global Step: 35340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:31:13,495-Speed 9426.80 samples/sec   Loss 1.5722   LearningRate 0.0003   Epoch: 20   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:31:39,578-Speed 9422.48 samples/sec   Loss 1.5736   LearningRate 0.0003   Epoch: 20   Global Step: 35360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-05 22:32:05,679-Speed 9416.19 samples/sec   Loss 1.5602   LearningRate 0.0003   Epoch: 20   Global Step: 35370   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:32:31,821-Speed 9401.34 samples/sec   Loss 1.5804   LearningRate 0.0003   Epoch: 20   Global Step: 35380   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:32:58,035-Speed 9375.67 samples/sec   Loss 1.5760   LearningRate 0.0003   Epoch: 20   Global Step: 35390   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:33:24,197-Speed 9394.45 samples/sec   Loss 1.5688   LearningRate 0.0003   Epoch: 20   Global Step: 35400   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:33:50,427-Speed 9369.78 samples/sec   Loss 1.5639   LearningRate 0.0003   Epoch: 20   Global Step: 35410   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:34:16,547-Speed 9409.25 samples/sec   Loss 1.5615   LearningRate 0.0003   Epoch: 20   Global Step: 35420   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:34:42,718-Speed 9390.85 samples/sec   Loss 1.5613   LearningRate 0.0003   Epoch: 20   Global Step: 35430   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:35:08,848-Speed 9405.68 samples/sec   Loss 1.5735   LearningRate 0.0003   Epoch: 20   Global Step: 35440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:35:34,993-Speed 9400.31 samples/sec   Loss 1.5528   LearningRate 0.0003   Epoch: 20   Global Step: 35450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:36:01,219-Speed 9370.90 samples/sec   Loss 1.5560   LearningRate 0.0003   Epoch: 20   Global Step: 35460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:36:27,386-Speed 9392.49 samples/sec   Loss 1.5540   LearningRate 0.0003   Epoch: 20   Global Step: 35470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:36:53,489-Speed 9415.55 samples/sec   Loss 1.5654   LearningRate 0.0003   Epoch: 20   Global Step: 35480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:37:19,696-Speed 9378.09 samples/sec   Loss 1.5594   LearningRate 0.0003   Epoch: 20   Global Step: 35490   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:37:45,830-Speed 9404.02 samples/sec   Loss 1.5592   LearningRate 0.0003   Epoch: 20   Global Step: 35500   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:38:12,012-Speed 9386.99 samples/sec   Loss 1.5674   LearningRate 0.0003   Epoch: 20   Global Step: 35510   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:38:38,097-Speed 9421.97 samples/sec   Loss 1.5682   LearningRate 0.0003   Epoch: 20   Global Step: 35520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:39:04,181-Speed 9422.19 samples/sec   Loss 1.5582   LearningRate 0.0003   Epoch: 20   Global Step: 35530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:39:30,402-Speed 9372.81 samples/sec   Loss 1.5603   LearningRate 0.0003   Epoch: 20   Global Step: 35540   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:39:56,552-Speed 9398.74 samples/sec   Loss 1.5782   LearningRate 0.0003   Epoch: 20   Global Step: 35550   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:40:22,634-Speed 9423.02 samples/sec   Loss 1.5795   LearningRate 0.0003   Epoch: 20   Global Step: 35560   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:40:48,733-Speed 9416.83 samples/sec   Loss 1.5585   LearningRate 0.0003   Epoch: 20   Global Step: 35570   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:41:14,821-Speed 9421.04 samples/sec   Loss 1.5683   LearningRate 0.0003   Epoch: 20   Global Step: 35580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:41:41,038-Speed 9374.55 samples/sec   Loss 1.5697   LearningRate 0.0003   Epoch: 20   Global Step: 35590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:42:07,174-Speed 9403.66 samples/sec   Loss 1.5583   LearningRate 0.0003   Epoch: 20   Global Step: 35600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:42:33,261-Speed 9421.10 samples/sec   Loss 1.5607   LearningRate 0.0003   Epoch: 20   Global Step: 35610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:42:59,365-Speed 9415.17 samples/sec   Loss 1.5590   LearningRate 0.0003   Epoch: 20   Global Step: 35620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:43:25,466-Speed 9416.77 samples/sec   Loss 1.5597   LearningRate 0.0003   Epoch: 20   Global Step: 35630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:43:51,556-Speed 9420.21 samples/sec   Loss 1.5587   LearningRate 0.0003   Epoch: 20   Global Step: 35640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:44:17,805-Speed 9363.02 samples/sec   Loss 1.5614   LearningRate 0.0003   Epoch: 20   Global Step: 35650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:44:43,981-Speed 9388.88 samples/sec   Loss 1.5541   LearningRate 0.0003   Epoch: 20   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:45:10,162-Speed 9387.45 samples/sec   Loss 1.5526   LearningRate 0.0003   Epoch: 20   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:45:36,268-Speed 9414.42 samples/sec   Loss 1.5567   LearningRate 0.0003   Epoch: 20   Global Step: 35680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-05 22:46:02,443-Speed 9389.50 samples/sec   Loss 1.5539   LearningRate 0.0003   Epoch: 20   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:46:28,589-Speed 9399.85 samples/sec   Loss 1.5639   LearningRate 0.0003   Epoch: 20   Global Step: 35700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:46:54,759-Speed 9391.29 samples/sec   Loss 1.5588   LearningRate 0.0003   Epoch: 20   Global Step: 35710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:47:20,784-Speed 9443.82 samples/sec   Loss 1.5633   LearningRate 0.0003   Epoch: 20   Global Step: 35720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:47:47,038-Speed 9361.11 samples/sec   Loss 1.5660   LearningRate 0.0003   Epoch: 20   Global Step: 35730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:48:13,226-Speed 9384.79 samples/sec   Loss 1.5625   LearningRate 0.0003   Epoch: 20   Global Step: 35740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-05 22:48:39,280-Speed 9433.20 samples/sec   Loss 1.5517   LearningRate 0.0003   Epoch: 20   Global Step: 35750   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:49:05,325-Speed 9436.38 samples/sec   Loss 1.5475   LearningRate 0.0003   Epoch: 20   Global Step: 35760   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:49:31,526-Speed 9380.18 samples/sec   Loss 1.5591   LearningRate 0.0003   Epoch: 20   Global Step: 35770   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:49:57,574-Speed 9435.41 samples/sec   Loss 1.5458   LearningRate 0.0003   Epoch: 20   Global Step: 35780   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:50:23,736-Speed 9394.06 samples/sec   Loss 1.5488   LearningRate 0.0003   Epoch: 20   Global Step: 35790   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:50:49,937-Speed 9380.01 samples/sec   Loss 1.5546   LearningRate 0.0003   Epoch: 20   Global Step: 35800   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-05 22:51:16,054-Speed 9410.23 samples/sec   Loss 1.5549   LearningRate 0.0003   Epoch: 20   Global Step: 35810   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 22:51:42,101-Speed 9435.98 samples/sec   Loss 1.5482   LearningRate 0.0003   Epoch: 20   Global Step: 35820   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 22:52:08,374-Speed 9354.39 samples/sec   Loss 1.5510   LearningRate 0.0003   Epoch: 20   Global Step: 35830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 22:52:34,518-Speed 9400.43 samples/sec   Loss 1.5637   LearningRate 0.0003   Epoch: 20   Global Step: 35840   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 22:53:00,648-Speed 9405.70 samples/sec   Loss 1.5441   LearningRate 0.0003   Epoch: 20   Global Step: 35850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:53:26,774-Speed 9407.24 samples/sec   Loss 1.5537   LearningRate 0.0003   Epoch: 20   Global Step: 35860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:53:52,945-Speed 9390.80 samples/sec   Loss 1.5403   LearningRate 0.0003   Epoch: 20   Global Step: 35870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:54:19,100-Speed 9396.97 samples/sec   Loss 1.5578   LearningRate 0.0003   Epoch: 20   Global Step: 35880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:54:45,252-Speed 9397.72 samples/sec   Loss 1.5463   LearningRate 0.0003   Epoch: 20   Global Step: 35890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:55:11,405-Speed 9397.44 samples/sec   Loss 1.5456   LearningRate 0.0003   Epoch: 20   Global Step: 35900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:55:37,601-Speed 9381.86 samples/sec   Loss 1.5399   LearningRate 0.0003   Epoch: 20   Global Step: 35910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:56:03,697-Speed 9417.79 samples/sec   Loss 1.5521   LearningRate 0.0003   Epoch: 20   Global Step: 35920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:56:29,813-Speed 9410.73 samples/sec   Loss 1.5484   LearningRate 0.0003   Epoch: 20   Global Step: 35930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:56:55,896-Speed 9422.79 samples/sec   Loss 1.5459   LearningRate 0.0003   Epoch: 20   Global Step: 35940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:57:22,011-Speed 9410.78 samples/sec   Loss 1.5411   LearningRate 0.0003   Epoch: 20   Global Step: 35950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-05 22:57:48,098-Speed 9421.24 samples/sec   Loss 1.5532   LearningRate 0.0003   Epoch: 20   Global Step: 35960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:58:14,206-Speed 9413.68 samples/sec   Loss 1.5382   LearningRate 0.0003   Epoch: 20   Global Step: 35970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:58:40,263-Speed 9432.10 samples/sec   Loss 1.5376   LearningRate 0.0003   Epoch: 20   Global Step: 35980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:59:06,415-Speed 9397.60 samples/sec   Loss 1.5479   LearningRate 0.0003   Epoch: 20   Global Step: 35990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:59:32,535-Speed 9409.32 samples/sec   Loss 1.5306   LearningRate 0.0003   Epoch: 20   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 22:59:58,628-Speed 9419.02 samples/sec   Loss 1.5494   LearningRate 0.0003   Epoch: 20   Global Step: 36010   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:00:24,680-Speed 9434.26 samples/sec   Loss 1.5448   LearningRate 0.0003   Epoch: 20   Global Step: 36020   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:00:50,760-Speed 9423.61 samples/sec   Loss 1.5489   LearningRate 0.0003   Epoch: 20   Global Step: 36030   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:01:16,867-Speed 9413.84 samples/sec   Loss 1.5372   LearningRate 0.0003   Epoch: 20   Global Step: 36040   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:01:42,935-Speed 9428.31 samples/sec   Loss 1.5325   LearningRate 0.0003   Epoch: 20   Global Step: 36050   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:02:09,074-Speed 9402.44 samples/sec   Loss 1.5382   LearningRate 0.0003   Epoch: 20   Global Step: 36060   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:02:35,187-Speed 9411.73 samples/sec   Loss 1.5478   LearningRate 0.0003   Epoch: 20   Global Step: 36070   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:03:01,289-Speed 9415.87 samples/sec   Loss 1.5426   LearningRate 0.0003   Epoch: 20   Global Step: 36080   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:03:27,388-Speed 9417.03 samples/sec   Loss 1.5431   LearningRate 0.0003   Epoch: 20   Global Step: 36090   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:03:53,544-Speed 9396.10 samples/sec   Loss 1.5326   LearningRate 0.0003   Epoch: 20   Global Step: 36100   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:04:19,652-Speed 9413.71 samples/sec   Loss 1.5365   LearningRate 0.0003   Epoch: 20   Global Step: 36110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:04:45,745-Speed 9419.90 samples/sec   Loss 1.5401   LearningRate 0.0003   Epoch: 20   Global Step: 36120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:05:11,852-Speed 9413.82 samples/sec   Loss 1.5453   LearningRate 0.0003   Epoch: 20   Global Step: 36130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:05:37,969-Speed 9410.49 samples/sec   Loss 1.5468   LearningRate 0.0003   Epoch: 20   Global Step: 36140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:06:04,188-Speed 9373.88 samples/sec   Loss 1.5350   LearningRate 0.0003   Epoch: 20   Global Step: 36150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:06:30,249-Speed 9430.73 samples/sec   Loss 1.5423   LearningRate 0.0003   Epoch: 20   Global Step: 36160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:06:56,352-Speed 9415.42 samples/sec   Loss 1.5284   LearningRate 0.0003   Epoch: 20   Global Step: 36170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:07:22,478-Speed 9407.15 samples/sec   Loss 1.5382   LearningRate 0.0003   Epoch: 20   Global Step: 36180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:07:48,689-Speed 9376.79 samples/sec   Loss 1.5488   LearningRate 0.0003   Epoch: 20   Global Step: 36190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:08:14,835-Speed 9399.89 samples/sec   Loss 1.5445   LearningRate 0.0003   Epoch: 20   Global Step: 36200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:08:40,962-Speed 9407.15 samples/sec   Loss 1.5325   LearningRate 0.0003   Epoch: 20   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:09:07,103-Speed 9401.79 samples/sec   Loss 1.5341   LearningRate 0.0003   Epoch: 20   Global Step: 36220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:09:33,222-Speed 9409.77 samples/sec   Loss 1.5398   LearningRate 0.0003   Epoch: 20   Global Step: 36230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:09:59,342-Speed 9409.31 samples/sec   Loss 1.5413   LearningRate 0.0003   Epoch: 20   Global Step: 36240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:10:25,412-Speed 9427.43 samples/sec   Loss 1.5308   LearningRate 0.0003   Epoch: 20   Global Step: 36250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:10:51,562-Speed 9398.41 samples/sec   Loss 1.5577   LearningRate 0.0003   Epoch: 20   Global Step: 36260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:11:17,699-Speed 9403.21 samples/sec   Loss 1.5484   LearningRate 0.0003   Epoch: 20   Global Step: 36270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:11:43,818-Speed 9409.76 samples/sec   Loss 1.5475   LearningRate 0.0003   Epoch: 20   Global Step: 36280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:12:09,888-Speed 9427.24 samples/sec   Loss 1.5444   LearningRate 0.0003   Epoch: 20   Global Step: 36290   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:13:29,106-Speed 3102.36 samples/sec   Loss 1.5365   LearningRate 0.0003   Epoch: 21   Global Step: 36300   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:13:55,008-Speed 9488.60 samples/sec   Loss 1.5156   LearningRate 0.0003   Epoch: 21   Global Step: 36310   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:14:20,996-Speed 9457.33 samples/sec   Loss 1.5155   LearningRate 0.0003   Epoch: 21   Global Step: 36320   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:14:46,975-Speed 9460.09 samples/sec   Loss 1.5173   LearningRate 0.0003   Epoch: 21   Global Step: 36330   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:15:12,974-Speed 9453.08 samples/sec   Loss 1.5117   LearningRate 0.0003   Epoch: 21   Global Step: 36340   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:15:39,028-Speed 9433.36 samples/sec   Loss 1.5166   LearningRate 0.0003   Epoch: 21   Global Step: 36350   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:16:05,032-Speed 9451.07 samples/sec   Loss 1.5230   LearningRate 0.0003   Epoch: 21   Global Step: 36360   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:16:30,982-Speed 9470.94 samples/sec   Loss 1.5163   LearningRate 0.0003   Epoch: 21   Global Step: 36370   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:16:57,007-Speed 9443.73 samples/sec   Loss 1.5137   LearningRate 0.0003   Epoch: 21   Global Step: 36380   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:17:23,034-Speed 9442.80 samples/sec   Loss 1.5128   LearningRate 0.0003   Epoch: 21   Global Step: 36390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:17:49,117-Speed 9422.70 samples/sec   Loss 1.5114   LearningRate 0.0003   Epoch: 21   Global Step: 36400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:18:15,131-Speed 9447.58 samples/sec   Loss 1.5281   LearningRate 0.0003   Epoch: 21   Global Step: 36410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:18:41,149-Speed 9445.91 samples/sec   Loss 1.5147   LearningRate 0.0003   Epoch: 21   Global Step: 36420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:19:07,172-Speed 9444.16 samples/sec   Loss 1.5133   LearningRate 0.0003   Epoch: 21   Global Step: 36430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:19:33,360-Speed 9384.99 samples/sec   Loss 1.5184   LearningRate 0.0003   Epoch: 21   Global Step: 36440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:19:59,426-Speed 9428.62 samples/sec   Loss 1.5183   LearningRate 0.0003   Epoch: 21   Global Step: 36450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:20:25,519-Speed 9419.22 samples/sec   Loss 1.5315   LearningRate 0.0003   Epoch: 21   Global Step: 36460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:20:51,536-Speed 9446.54 samples/sec   Loss 1.5235   LearningRate 0.0003   Epoch: 21   Global Step: 36470   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:21:17,703-Speed 9392.17 samples/sec   Loss 1.5181   LearningRate 0.0003   Epoch: 21   Global Step: 36480   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:21:43,717-Speed 9447.72 samples/sec   Loss 1.5281   LearningRate 0.0003   Epoch: 21   Global Step: 36490   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:22:09,879-Speed 9394.27 samples/sec   Loss 1.5155   LearningRate 0.0003   Epoch: 21   Global Step: 36500   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:22:36,067-Speed 9384.92 samples/sec   Loss 1.5200   LearningRate 0.0003   Epoch: 21   Global Step: 36510   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:23:02,245-Speed 9388.35 samples/sec   Loss 1.5215   LearningRate 0.0003   Epoch: 21   Global Step: 36520   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:23:28,254-Speed 9449.75 samples/sec   Loss 1.5110   LearningRate 0.0003   Epoch: 21   Global Step: 36530   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:23:54,339-Speed 9422.10 samples/sec   Loss 1.5290   LearningRate 0.0003   Epoch: 21   Global Step: 36540   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:24:20,376-Speed 9439.40 samples/sec   Loss 1.5152   LearningRate 0.0003   Epoch: 21   Global Step: 36550   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:24:46,470-Speed 9418.80 samples/sec   Loss 1.5189   LearningRate 0.0003   Epoch: 21   Global Step: 36560   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:25:12,755-Speed 9349.98 samples/sec   Loss 1.5196   LearningRate 0.0003   Epoch: 21   Global Step: 36570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:25:38,832-Speed 9424.92 samples/sec   Loss 1.5206   LearningRate 0.0003   Epoch: 21   Global Step: 36580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:26:04,998-Speed 9392.63 samples/sec   Loss 1.5129   LearningRate 0.0003   Epoch: 21   Global Step: 36590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:26:31,127-Speed 9406.20 samples/sec   Loss 1.5303   LearningRate 0.0003   Epoch: 21   Global Step: 36600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:26:57,243-Speed 9410.47 samples/sec   Loss 1.5165   LearningRate 0.0003   Epoch: 21   Global Step: 36610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:27:23,375-Speed 9405.43 samples/sec   Loss 1.5079   LearningRate 0.0003   Epoch: 21   Global Step: 36620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:27:49,522-Speed 9399.80 samples/sec   Loss 1.5147   LearningRate 0.0003   Epoch: 21   Global Step: 36630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:28:15,677-Speed 9396.60 samples/sec   Loss 1.5161   LearningRate 0.0003   Epoch: 21   Global Step: 36640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:28:41,816-Speed 9402.70 samples/sec   Loss 1.5150   LearningRate 0.0003   Epoch: 21   Global Step: 36650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:29:08,025-Speed 9377.16 samples/sec   Loss 1.5096   LearningRate 0.0003   Epoch: 21   Global Step: 36660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:29:34,103-Speed 9424.51 samples/sec   Loss 1.5102   LearningRate 0.0003   Epoch: 21   Global Step: 36670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-05 23:30:00,256-Speed 9397.53 samples/sec   Loss 1.5216   LearningRate 0.0003   Epoch: 21   Global Step: 36680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-05 23:30:26,342-Speed 9421.59 samples/sec   Loss 1.5049   LearningRate 0.0003   Epoch: 21   Global Step: 36690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:30:52,500-Speed 9395.44 samples/sec   Loss 1.5145   LearningRate 0.0003   Epoch: 21   Global Step: 36700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:31:18,706-Speed 9378.56 samples/sec   Loss 1.5049   LearningRate 0.0003   Epoch: 21   Global Step: 36710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:31:44,830-Speed 9407.87 samples/sec   Loss 1.5054   LearningRate 0.0003   Epoch: 21   Global Step: 36720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:32:10,995-Speed 9392.75 samples/sec   Loss 1.5062   LearningRate 0.0003   Epoch: 21   Global Step: 36730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:32:37,105-Speed 9412.91 samples/sec   Loss 1.5120   LearningRate 0.0003   Epoch: 21   Global Step: 36740   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:33:03,241-Speed 9403.74 samples/sec   Loss 1.5148   LearningRate 0.0003   Epoch: 21   Global Step: 36750   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:33:29,345-Speed 9415.00 samples/sec   Loss 1.5149   LearningRate 0.0003   Epoch: 21   Global Step: 36760   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:33:55,506-Speed 9394.54 samples/sec   Loss 1.5096   LearningRate 0.0003   Epoch: 21   Global Step: 36770   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:34:21,646-Speed 9402.19 samples/sec   Loss 1.5133   LearningRate 0.0003   Epoch: 21   Global Step: 36780   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:34:47,801-Speed 9396.35 samples/sec   Loss 1.5036   LearningRate 0.0003   Epoch: 21   Global Step: 36790   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:35:13,884-Speed 9422.79 samples/sec   Loss 1.5123   LearningRate 0.0003   Epoch: 21   Global Step: 36800   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:35:39,985-Speed 9416.17 samples/sec   Loss 1.5049   LearningRate 0.0003   Epoch: 21   Global Step: 36810   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:36:06,231-Speed 9364.06 samples/sec   Loss 1.5071   LearningRate 0.0003   Epoch: 21   Global Step: 36820   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:36:32,352-Speed 9409.05 samples/sec   Loss 1.5149   LearningRate 0.0003   Epoch: 21   Global Step: 36830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:36:58,482-Speed 9405.64 samples/sec   Loss 1.5089   LearningRate 0.0003   Epoch: 21   Global Step: 36840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:37:24,548-Speed 9428.86 samples/sec   Loss 1.4923   LearningRate 0.0003   Epoch: 21   Global Step: 36850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:37:50,669-Speed 9408.66 samples/sec   Loss 1.5018   LearningRate 0.0003   Epoch: 21   Global Step: 36860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:38:16,761-Speed 9419.33 samples/sec   Loss 1.4947   LearningRate 0.0003   Epoch: 21   Global Step: 36870   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:38:42,839-Speed 9425.60 samples/sec   Loss 1.5055   LearningRate 0.0003   Epoch: 21   Global Step: 36880   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:39:08,949-Speed 9412.60 samples/sec   Loss 1.5098   LearningRate 0.0003   Epoch: 21   Global Step: 36890   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:39:35,008-Speed 9431.50 samples/sec   Loss 1.4984   LearningRate 0.0003   Epoch: 21   Global Step: 36900   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:40:01,064-Speed 9432.45 samples/sec   Loss 1.5077   LearningRate 0.0003   Epoch: 21   Global Step: 36910   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:40:27,211-Speed 9399.31 samples/sec   Loss 1.4954   LearningRate 0.0003   Epoch: 21   Global Step: 36920   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:40:53,311-Speed 9416.88 samples/sec   Loss 1.5037   LearningRate 0.0003   Epoch: 21   Global Step: 36930   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:41:19,397-Speed 9421.37 samples/sec   Loss 1.4990   LearningRate 0.0003   Epoch: 21   Global Step: 36940   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:41:45,490-Speed 9418.86 samples/sec   Loss 1.5007   LearningRate 0.0003   Epoch: 21   Global Step: 36950   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:42:11,615-Speed 9407.71 samples/sec   Loss 1.4941   LearningRate 0.0003   Epoch: 21   Global Step: 36960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:42:37,733-Speed 9409.79 samples/sec   Loss 1.4914   LearningRate 0.0003   Epoch: 21   Global Step: 36970   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:43:03,908-Speed 9389.47 samples/sec   Loss 1.5083   LearningRate 0.0003   Epoch: 21   Global Step: 36980   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:43:30,008-Speed 9416.54 samples/sec   Loss 1.5009   LearningRate 0.0003   Epoch: 21   Global Step: 36990   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:43:56,105-Speed 9417.54 samples/sec   Loss 1.5016   LearningRate 0.0003   Epoch: 21   Global Step: 37000   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:44:22,276-Speed 9391.35 samples/sec   Loss 1.4959   LearningRate 0.0003   Epoch: 21   Global Step: 37010   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:44:48,409-Speed 9404.75 samples/sec   Loss 1.5071   LearningRate 0.0003   Epoch: 21   Global Step: 37020   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:45:14,562-Speed 9397.53 samples/sec   Loss 1.4949   LearningRate 0.0003   Epoch: 21   Global Step: 37030   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:45:40,714-Speed 9397.52 samples/sec   Loss 1.4936   LearningRate 0.0003   Epoch: 21   Global Step: 37040   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:46:06,869-Speed 9396.70 samples/sec   Loss 1.4954   LearningRate 0.0003   Epoch: 21   Global Step: 37050   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:46:33,036-Speed 9393.19 samples/sec   Loss 1.4921   LearningRate 0.0003   Epoch: 21   Global Step: 37060   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:46:59,137-Speed 9416.10 samples/sec   Loss 1.5016   LearningRate 0.0003   Epoch: 21   Global Step: 37070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:47:25,261-Speed 9407.86 samples/sec   Loss 1.4899   LearningRate 0.0003   Epoch: 21   Global Step: 37080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:47:51,338-Speed 9424.89 samples/sec   Loss 1.4864   LearningRate 0.0003   Epoch: 21   Global Step: 37090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:48:17,518-Speed 9387.64 samples/sec   Loss 1.5062   LearningRate 0.0003   Epoch: 21   Global Step: 37100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:48:43,614-Speed 9419.76 samples/sec   Loss 1.4975   LearningRate 0.0003   Epoch: 21   Global Step: 37110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:49:09,778-Speed 9393.23 samples/sec   Loss 1.4924   LearningRate 0.0003   Epoch: 21   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:49:35,960-Speed 9386.89 samples/sec   Loss 1.4962   LearningRate 0.0003   Epoch: 21   Global Step: 37130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:50:02,078-Speed 9410.13 samples/sec   Loss 1.4947   LearningRate 0.0003   Epoch: 21   Global Step: 37140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-05 23:50:28,205-Speed 9407.05 samples/sec   Loss 1.4894   LearningRate 0.0003   Epoch: 21   Global Step: 37150   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:50:54,304-Speed 9417.08 samples/sec   Loss 1.4884   LearningRate 0.0003   Epoch: 21   Global Step: 37160   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-05 23:51:20,466-Speed 9393.91 samples/sec   Loss 1.4809   LearningRate 0.0003   Epoch: 21   Global Step: 37170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-05 23:51:46,602-Speed 9403.47 samples/sec   Loss 1.4806   LearningRate 0.0003   Epoch: 21   Global Step: 37180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-05 23:52:12,782-Speed 9387.99 samples/sec   Loss 1.4855   LearningRate 0.0003   Epoch: 21   Global Step: 37190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-05 23:52:39,003-Speed 9373.42 samples/sec   Loss 1.4893   LearningRate 0.0003   Epoch: 21   Global Step: 37200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-05 23:53:05,127-Speed 9407.68 samples/sec   Loss 1.4920   LearningRate 0.0003   Epoch: 21   Global Step: 37210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-05 23:53:31,258-Speed 9405.28 samples/sec   Loss 1.4956   LearningRate 0.0003   Epoch: 21   Global Step: 37220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-05 23:53:57,392-Speed 9404.41 samples/sec   Loss 1.4898   LearningRate 0.0003   Epoch: 21   Global Step: 37230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-05 23:54:23,492-Speed 9416.13 samples/sec   Loss 1.4913   LearningRate 0.0003   Epoch: 21   Global Step: 37240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-05 23:54:49,576-Speed 9422.40 samples/sec   Loss 1.4952   LearningRate 0.0003   Epoch: 21   Global Step: 37250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:55:15,708-Speed 9405.22 samples/sec   Loss 1.4830   LearningRate 0.0003   Epoch: 21   Global Step: 37260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:55:41,831-Speed 9408.14 samples/sec   Loss 1.4783   LearningRate 0.0003   Epoch: 21   Global Step: 37270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:56:07,986-Speed 9396.68 samples/sec   Loss 1.4815   LearningRate 0.0003   Epoch: 21   Global Step: 37280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:56:34,177-Speed 9383.56 samples/sec   Loss 1.4780   LearningRate 0.0003   Epoch: 21   Global Step: 37290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:57:00,321-Speed 9400.87 samples/sec   Loss 1.4892   LearningRate 0.0003   Epoch: 21   Global Step: 37300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:57:26,484-Speed 9393.66 samples/sec   Loss 1.4807   LearningRate 0.0003   Epoch: 21   Global Step: 37310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:57:52,611-Speed 9406.87 samples/sec   Loss 1.4771   LearningRate 0.0003   Epoch: 21   Global Step: 37320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:58:18,768-Speed 9395.82 samples/sec   Loss 1.4828   LearningRate 0.0003   Epoch: 21   Global Step: 37330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:58:44,985-Speed 9374.40 samples/sec   Loss 1.4822   LearningRate 0.0003   Epoch: 21   Global Step: 37340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-05 23:59:11,184-Speed 9381.05 samples/sec   Loss 1.4813   LearningRate 0.0003   Epoch: 21   Global Step: 37350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-05 23:59:37,343-Speed 9395.25 samples/sec   Loss 1.4713   LearningRate 0.0003   Epoch: 21   Global Step: 37360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:00:03,520-Speed 9389.07 samples/sec   Loss 1.4845   LearningRate 0.0003   Epoch: 21   Global Step: 37370   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:00:29,728-Speed 9377.66 samples/sec   Loss 1.4821   LearningRate 0.0003   Epoch: 21   Global Step: 37380   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:00:55,873-Speed 9400.56 samples/sec   Loss 1.4743   LearningRate 0.0003   Epoch: 21   Global Step: 37390   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:01:21,976-Speed 9415.61 samples/sec   Loss 1.4789   LearningRate 0.0003   Epoch: 21   Global Step: 37400   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:01:48,124-Speed 9399.43 samples/sec   Loss 1.4850   LearningRate 0.0003   Epoch: 21   Global Step: 37410   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:02:14,307-Speed 9386.58 samples/sec   Loss 1.4786   LearningRate 0.0003   Epoch: 21   Global Step: 37420   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:02:40,409-Speed 9415.61 samples/sec   Loss 1.4758   LearningRate 0.0003   Epoch: 21   Global Step: 37430   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:03:06,549-Speed 9402.33 samples/sec   Loss 1.4779   LearningRate 0.0003   Epoch: 21   Global Step: 37440   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:03:32,670-Speed 9409.17 samples/sec   Loss 1.4802   LearningRate 0.0003   Epoch: 21   Global Step: 37450   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:03:58,813-Speed 9400.83 samples/sec   Loss 1.4748   LearningRate 0.0003   Epoch: 21   Global Step: 37460   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:04:25,017-Speed 9379.58 samples/sec   Loss 1.4802   LearningRate 0.0003   Epoch: 21   Global Step: 37470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:04:51,185-Speed 9392.03 samples/sec   Loss 1.4761   LearningRate 0.0003   Epoch: 21   Global Step: 37480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:05:17,322-Speed 9403.23 samples/sec   Loss 1.4722   LearningRate 0.0003   Epoch: 21   Global Step: 37490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:05:43,397-Speed 9425.61 samples/sec   Loss 1.4705   LearningRate 0.0003   Epoch: 21   Global Step: 37500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:06:09,465-Speed 9428.15 samples/sec   Loss 1.4745   LearningRate 0.0003   Epoch: 21   Global Step: 37510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:06:35,654-Speed 9384.73 samples/sec   Loss 1.4614   LearningRate 0.0003   Epoch: 21   Global Step: 37520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:07:01,733-Speed 9424.21 samples/sec   Loss 1.4805   LearningRate 0.0003   Epoch: 21   Global Step: 37530   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:07:27,805-Speed 9426.61 samples/sec   Loss 1.4760   LearningRate 0.0003   Epoch: 21   Global Step: 37540   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:07:53,968-Speed 9393.76 samples/sec   Loss 1.4727   LearningRate 0.0003   Epoch: 21   Global Step: 37550   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:08:20,067-Speed 9416.83 samples/sec   Loss 1.4860   LearningRate 0.0003   Epoch: 21   Global Step: 37560   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:08:46,188-Speed 9408.82 samples/sec   Loss 1.4712   LearningRate 0.0003   Epoch: 21   Global Step: 37570   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:09:12,372-Speed 9386.56 samples/sec   Loss 1.4750   LearningRate 0.0003   Epoch: 21   Global Step: 37580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:09:38,533-Speed 9394.56 samples/sec   Loss 1.4723   LearningRate 0.0003   Epoch: 21   Global Step: 37590   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:10:04,651-Speed 9410.14 samples/sec   Loss 1.4653   LearningRate 0.0003   Epoch: 21   Global Step: 37600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:10:30,692-Speed 9437.65 samples/sec   Loss 1.4673   LearningRate 0.0003   Epoch: 21   Global Step: 37610   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:10:56,776-Speed 9422.31 samples/sec   Loss 1.4653   LearningRate 0.0003   Epoch: 21   Global Step: 37620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:11:22,878-Speed 9415.87 samples/sec   Loss 1.4723   LearningRate 0.0003   Epoch: 21   Global Step: 37630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:11:49,055-Speed 9388.74 samples/sec   Loss 1.4627   LearningRate 0.0003   Epoch: 21   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:12:15,216-Speed 9394.42 samples/sec   Loss 1.4735   LearningRate 0.0003   Epoch: 21   Global Step: 37650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:12:41,370-Speed 9397.29 samples/sec   Loss 1.4673   LearningRate 0.0003   Epoch: 21   Global Step: 37660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:13:07,500-Speed 9405.56 samples/sec   Loss 1.4686   LearningRate 0.0003   Epoch: 21   Global Step: 37670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:13:33,645-Speed 9400.49 samples/sec   Loss 1.4674   LearningRate 0.0003   Epoch: 21   Global Step: 37680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:13:59,792-Speed 9399.40 samples/sec   Loss 1.4581   LearningRate 0.0003   Epoch: 21   Global Step: 37690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:14:25,932-Speed 9402.41 samples/sec   Loss 1.4682   LearningRate 0.0003   Epoch: 21   Global Step: 37700   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:14:51,978-Speed 9436.24 samples/sec   Loss 1.4635   LearningRate 0.0003   Epoch: 21   Global Step: 37710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:15:18,011-Speed 9440.67 samples/sec   Loss 1.4604   LearningRate 0.0003   Epoch: 21   Global Step: 37720   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:15:44,074-Speed 9430.05 samples/sec   Loss 1.4699   LearningRate 0.0003   Epoch: 21   Global Step: 37730   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:16:10,197-Speed 9408.26 samples/sec   Loss 1.4556   LearningRate 0.0003   Epoch: 21   Global Step: 37740   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:16:36,329-Speed 9404.90 samples/sec   Loss 1.4651   LearningRate 0.0003   Epoch: 21   Global Step: 37750   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:17:02,449-Speed 9409.44 samples/sec   Loss 1.4555   LearningRate 0.0003   Epoch: 21   Global Step: 37760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:17:28,525-Speed 9425.36 samples/sec   Loss 1.4681   LearningRate 0.0003   Epoch: 21   Global Step: 37770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:17:54,595-Speed 9427.19 samples/sec   Loss 1.4793   LearningRate 0.0003   Epoch: 21   Global Step: 37780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:18:20,703-Speed 9413.88 samples/sec   Loss 1.4866   LearningRate 0.0003   Epoch: 21   Global Step: 37790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:18:46,820-Speed 9410.36 samples/sec   Loss 1.4700   LearningRate 0.0003   Epoch: 21   Global Step: 37800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:19:13,019-Speed 9381.04 samples/sec   Loss 1.4674   LearningRate 0.0003   Epoch: 21   Global Step: 37810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:19:39,185-Speed 9392.96 samples/sec   Loss 1.4641   LearningRate 0.0003   Epoch: 21   Global Step: 37820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:20:05,273-Speed 9420.92 samples/sec   Loss 1.4647   LearningRate 0.0003   Epoch: 21   Global Step: 37830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:20:31,536-Speed 9358.08 samples/sec   Loss 1.4627   LearningRate 0.0003   Epoch: 21   Global Step: 37840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:20:57,699-Speed 9394.10 samples/sec   Loss 1.4567   LearningRate 0.0003   Epoch: 21   Global Step: 37850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:21:23,816-Speed 9410.37 samples/sec   Loss 1.4701   LearningRate 0.0003   Epoch: 21   Global Step: 37860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:21:49,938-Speed 9408.71 samples/sec   Loss 1.4530   LearningRate 0.0003   Epoch: 21   Global Step: 37870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:22:16,116-Speed 9388.55 samples/sec   Loss 1.4692   LearningRate 0.0003   Epoch: 21   Global Step: 37880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:22:42,192-Speed 9425.25 samples/sec   Loss 1.4530   LearningRate 0.0003   Epoch: 21   Global Step: 37890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:23:08,311-Speed 9409.69 samples/sec   Loss 1.4537   LearningRate 0.0003   Epoch: 21   Global Step: 37900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:23:34,487-Speed 9389.39 samples/sec   Loss 1.4608   LearningRate 0.0003   Epoch: 21   Global Step: 37910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:24:00,653-Speed 9393.06 samples/sec   Loss 1.4656   LearningRate 0.0003   Epoch: 21   Global Step: 37920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:24:26,877-Speed 9372.44 samples/sec   Loss 1.4612   LearningRate 0.0003   Epoch: 21   Global Step: 37930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:24:53,091-Speed 9375.46 samples/sec   Loss 1.4818   LearningRate 0.0003   Epoch: 21   Global Step: 37940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:25:19,173-Speed 9423.16 samples/sec   Loss 1.4700   LearningRate 0.0003   Epoch: 21   Global Step: 37950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:25:45,317-Speed 9400.39 samples/sec   Loss 1.4772   LearningRate 0.0003   Epoch: 21   Global Step: 37960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:26:11,400-Speed 9422.59 samples/sec   Loss 1.4609   LearningRate 0.0003   Epoch: 21   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:26:37,522-Speed 9408.62 samples/sec   Loss 1.4565   LearningRate 0.0003   Epoch: 21   Global Step: 37980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-06 00:27:03,544-Speed 9444.76 samples/sec   Loss 1.4651   LearningRate 0.0003   Epoch: 21   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:27:29,638-Speed 9418.61 samples/sec   Loss 1.4725   LearningRate 0.0003   Epoch: 21   Global Step: 38000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:27:55,748-Speed 9413.20 samples/sec   Loss 1.4656   LearningRate 0.0003   Epoch: 21   Global Step: 38010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:28:21,794-Speed 9435.69 samples/sec   Loss 1.4715   LearningRate 0.0002   Epoch: 21   Global Step: 38020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:29:41,170-Speed 3096.21 samples/sec   Loss 1.4485   LearningRate 0.0002   Epoch: 22   Global Step: 38030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:30:07,129-Speed 9467.86 samples/sec   Loss 1.4563   LearningRate 0.0002   Epoch: 22   Global Step: 38040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:30:33,142-Speed 9447.84 samples/sec   Loss 1.4259   LearningRate 0.0002   Epoch: 22   Global Step: 38050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:30:59,207-Speed 9429.32 samples/sec   Loss 1.4398   LearningRate 0.0002   Epoch: 22   Global Step: 38060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:31:25,346-Speed 9402.33 samples/sec   Loss 1.4425   LearningRate 0.0002   Epoch: 22   Global Step: 38070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:31:51,485-Speed 9402.48 samples/sec   Loss 1.4401   LearningRate 0.0002   Epoch: 22   Global Step: 38080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:32:17,626-Speed 9402.04 samples/sec   Loss 1.4328   LearningRate 0.0002   Epoch: 22   Global Step: 38090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:32:43,766-Speed 9402.04 samples/sec   Loss 1.4351   LearningRate 0.0002   Epoch: 22   Global Step: 38100   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:33:09,908-Speed 9401.29 samples/sec   Loss 1.4428   LearningRate 0.0002   Epoch: 22   Global Step: 38110   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:33:35,984-Speed 9425.06 samples/sec   Loss 1.4325   LearningRate 0.0002   Epoch: 22   Global Step: 38120   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:34:02,171-Speed 9385.24 samples/sec   Loss 1.4402   LearningRate 0.0002   Epoch: 22   Global Step: 38130   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:34:28,398-Speed 9370.97 samples/sec   Loss 1.4406   LearningRate 0.0002   Epoch: 22   Global Step: 38140   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:34:54,494-Speed 9417.79 samples/sec   Loss 1.4409   LearningRate 0.0002   Epoch: 22   Global Step: 38150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:35:20,608-Speed 9411.28 samples/sec   Loss 1.4252   LearningRate 0.0002   Epoch: 22   Global Step: 38160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:35:46,717-Speed 9413.23 samples/sec   Loss 1.4439   LearningRate 0.0002   Epoch: 22   Global Step: 38170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:36:12,964-Speed 9363.72 samples/sec   Loss 1.4329   LearningRate 0.0002   Epoch: 22   Global Step: 38180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:36:39,038-Speed 9425.92 samples/sec   Loss 1.4391   LearningRate 0.0002   Epoch: 22   Global Step: 38190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:37:05,137-Speed 9416.59 samples/sec   Loss 1.4501   LearningRate 0.0002   Epoch: 22   Global Step: 38200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:37:31,215-Speed 9424.42 samples/sec   Loss 1.4410   LearningRate 0.0002   Epoch: 22   Global Step: 38210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:37:57,346-Speed 9405.36 samples/sec   Loss 1.4546   LearningRate 0.0002   Epoch: 22   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:38:23,400-Speed 9433.22 samples/sec   Loss 1.4373   LearningRate 0.0002   Epoch: 22   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:38:49,473-Speed 9426.23 samples/sec   Loss 1.4407   LearningRate 0.0002   Epoch: 22   Global Step: 38240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:39:15,580-Speed 9413.94 samples/sec   Loss 1.4412   LearningRate 0.0002   Epoch: 22   Global Step: 38250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:39:41,767-Speed 9385.07 samples/sec   Loss 1.4492   LearningRate 0.0002   Epoch: 22   Global Step: 38260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:40:07,932-Speed 9392.97 samples/sec   Loss 1.4442   LearningRate 0.0002   Epoch: 22   Global Step: 38270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-06 00:40:34,023-Speed 9419.96 samples/sec   Loss 1.4385   LearningRate 0.0002   Epoch: 22   Global Step: 38280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-06 00:41:00,098-Speed 9425.37 samples/sec   Loss 1.4372   LearningRate 0.0002   Epoch: 22   Global Step: 38290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:41:26,265-Speed 9392.15 samples/sec   Loss 1.4248   LearningRate 0.0002   Epoch: 22   Global Step: 38300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:41:52,429-Speed 9393.62 samples/sec   Loss 1.4396   LearningRate 0.0002   Epoch: 22   Global Step: 38310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:42:18,504-Speed 9425.61 samples/sec   Loss 1.4465   LearningRate 0.0002   Epoch: 22   Global Step: 38320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:42:44,618-Speed 9411.66 samples/sec   Loss 1.4451   LearningRate 0.0002   Epoch: 22   Global Step: 38330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:43:10,730-Speed 9412.00 samples/sec   Loss 1.4467   LearningRate 0.0002   Epoch: 22   Global Step: 38340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:43:36,859-Speed 9406.15 samples/sec   Loss 1.4365   LearningRate 0.0002   Epoch: 22   Global Step: 38350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:44:03,037-Speed 9388.21 samples/sec   Loss 1.4378   LearningRate 0.0002   Epoch: 22   Global Step: 38360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:44:29,223-Speed 9385.73 samples/sec   Loss 1.4494   LearningRate 0.0002   Epoch: 22   Global Step: 38370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:44:55,389-Speed 9392.76 samples/sec   Loss 1.4529   LearningRate 0.0002   Epoch: 22   Global Step: 38380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:45:21,518-Speed 9405.99 samples/sec   Loss 1.4417   LearningRate 0.0002   Epoch: 22   Global Step: 38390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:45:47,649-Speed 9405.28 samples/sec   Loss 1.4372   LearningRate 0.0002   Epoch: 22   Global Step: 38400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:46:13,805-Speed 9396.44 samples/sec   Loss 1.4386   LearningRate 0.0002   Epoch: 22   Global Step: 38410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:46:39,882-Speed 9424.85 samples/sec   Loss 1.4450   LearningRate 0.0002   Epoch: 22   Global Step: 38420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:47:05,989-Speed 9414.84 samples/sec   Loss 1.4364   LearningRate 0.0002   Epoch: 22   Global Step: 38430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:47:32,289-Speed 9344.99 samples/sec   Loss 1.4459   LearningRate 0.0002   Epoch: 22   Global Step: 38440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:47:58,499-Speed 9376.66 samples/sec   Loss 1.4376   LearningRate 0.0002   Epoch: 22   Global Step: 38450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:48:24,740-Speed 9365.99 samples/sec   Loss 1.4396   LearningRate 0.0002   Epoch: 22   Global Step: 38460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:48:51,012-Speed 9355.01 samples/sec   Loss 1.4404   LearningRate 0.0002   Epoch: 22   Global Step: 38470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:49:17,233-Speed 9373.03 samples/sec   Loss 1.4390   LearningRate 0.0002   Epoch: 22   Global Step: 38480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-06 00:49:43,384-Speed 9398.15 samples/sec   Loss 1.4417   LearningRate 0.0002   Epoch: 22   Global Step: 38490   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:50:09,600-Speed 9374.85 samples/sec   Loss 1.4343   LearningRate 0.0002   Epoch: 22   Global Step: 38500   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:50:35,783-Speed 9386.43 samples/sec   Loss 1.4312   LearningRate 0.0002   Epoch: 22   Global Step: 38510   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:51:01,906-Speed 9408.57 samples/sec   Loss 1.4385   LearningRate 0.0002   Epoch: 22   Global Step: 38520   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-06 00:51:28,199-Speed 9347.42 samples/sec   Loss 1.4279   LearningRate 0.0002   Epoch: 22   Global Step: 38530   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:51:54,357-Speed 9395.42 samples/sec   Loss 1.4259   LearningRate 0.0002   Epoch: 22   Global Step: 38540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:52:20,522-Speed 9392.86 samples/sec   Loss 1.4341   LearningRate 0.0002   Epoch: 22   Global Step: 38550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:52:46,717-Speed 9382.73 samples/sec   Loss 1.4340   LearningRate 0.0002   Epoch: 22   Global Step: 38560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:53:12,895-Speed 9388.59 samples/sec   Loss 1.4410   LearningRate 0.0002   Epoch: 22   Global Step: 38570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:53:39,080-Speed 9386.16 samples/sec   Loss 1.4416   LearningRate 0.0002   Epoch: 22   Global Step: 38580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:54:05,265-Speed 9385.60 samples/sec   Loss 1.4297   LearningRate 0.0002   Epoch: 22   Global Step: 38590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:54:31,434-Speed 9391.59 samples/sec   Loss 1.4257   LearningRate 0.0002   Epoch: 22   Global Step: 38600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:54:57,577-Speed 9401.03 samples/sec   Loss 1.4295   LearningRate 0.0002   Epoch: 22   Global Step: 38610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:55:23,734-Speed 9396.30 samples/sec   Loss 1.4360   LearningRate 0.0002   Epoch: 22   Global Step: 38620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:55:49,902-Speed 9392.17 samples/sec   Loss 1.4220   LearningRate 0.0002   Epoch: 22   Global Step: 38630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:56:16,156-Speed 9361.20 samples/sec   Loss 1.4246   LearningRate 0.0002   Epoch: 22   Global Step: 38640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:56:42,447-Speed 9348.27 samples/sec   Loss 1.4288   LearningRate 0.0002   Epoch: 22   Global Step: 38650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:57:08,774-Speed 9335.18 samples/sec   Loss 1.4329   LearningRate 0.0002   Epoch: 22   Global Step: 38660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:57:34,949-Speed 9389.58 samples/sec   Loss 1.4228   LearningRate 0.0002   Epoch: 22   Global Step: 38670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 00:58:01,061-Speed 9412.19 samples/sec   Loss 1.4259   LearningRate 0.0002   Epoch: 22   Global Step: 38680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:58:27,217-Speed 9396.34 samples/sec   Loss 1.4424   LearningRate 0.0002   Epoch: 22   Global Step: 38690   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:58:53,441-Speed 9372.14 samples/sec   Loss 1.4189   LearningRate 0.0002   Epoch: 22   Global Step: 38700   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:59:19,579-Speed 9402.73 samples/sec   Loss 1.4218   LearningRate 0.0002   Epoch: 22   Global Step: 38710   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 00:59:45,726-Speed 9399.69 samples/sec   Loss 1.4289   LearningRate 0.0002   Epoch: 22   Global Step: 38720   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:00:11,889-Speed 9393.90 samples/sec   Loss 1.4242   LearningRate 0.0002   Epoch: 22   Global Step: 38730   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:00:38,064-Speed 9389.50 samples/sec   Loss 1.4275   LearningRate 0.0002   Epoch: 22   Global Step: 38740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:01:04,323-Speed 9359.32 samples/sec   Loss 1.4168   LearningRate 0.0002   Epoch: 22   Global Step: 38750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:01:30,580-Speed 9360.24 samples/sec   Loss 1.4156   LearningRate 0.0002   Epoch: 22   Global Step: 38760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:01:56,777-Speed 9381.67 samples/sec   Loss 1.4314   LearningRate 0.0002   Epoch: 22   Global Step: 38770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:02:22,961-Speed 9386.58 samples/sec   Loss 1.4104   LearningRate 0.0002   Epoch: 22   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:02:49,172-Speed 9376.62 samples/sec   Loss 1.4223   LearningRate 0.0002   Epoch: 22   Global Step: 38790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:03:15,431-Speed 9359.54 samples/sec   Loss 1.4222   LearningRate 0.0002   Epoch: 22   Global Step: 38800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:03:41,610-Speed 9387.88 samples/sec   Loss 1.4184   LearningRate 0.0002   Epoch: 22   Global Step: 38810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:04:07,819-Speed 9377.49 samples/sec   Loss 1.4317   LearningRate 0.0002   Epoch: 22   Global Step: 38820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:04:34,068-Speed 9362.89 samples/sec   Loss 1.4269   LearningRate 0.0002   Epoch: 22   Global Step: 38830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:05:00,238-Speed 9391.48 samples/sec   Loss 1.4198   LearningRate 0.0002   Epoch: 22   Global Step: 38840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:05:26,435-Speed 9381.39 samples/sec   Loss 1.4256   LearningRate 0.0002   Epoch: 22   Global Step: 38850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:05:52,635-Speed 9380.69 samples/sec   Loss 1.4199   LearningRate 0.0002   Epoch: 22   Global Step: 38860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:06:18,913-Speed 9352.49 samples/sec   Loss 1.4144   LearningRate 0.0002   Epoch: 22   Global Step: 38870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:06:45,062-Speed 9399.26 samples/sec   Loss 1.4258   LearningRate 0.0002   Epoch: 22   Global Step: 38880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-06 01:07:11,251-Speed 9384.20 samples/sec   Loss 1.4167   LearningRate 0.0002   Epoch: 22   Global Step: 38890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-06 01:07:37,392-Speed 9401.94 samples/sec   Loss 1.4196   LearningRate 0.0002   Epoch: 22   Global Step: 38900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:08:03,587-Speed 9382.22 samples/sec   Loss 1.4115   LearningRate 0.0002   Epoch: 22   Global Step: 38910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:08:29,829-Speed 9365.52 samples/sec   Loss 1.4008   LearningRate 0.0002   Epoch: 22   Global Step: 38920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:08:56,034-Speed 9378.97 samples/sec   Loss 1.4145   LearningRate 0.0002   Epoch: 22   Global Step: 38930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:09:22,217-Speed 9386.59 samples/sec   Loss 1.4195   LearningRate 0.0002   Epoch: 22   Global Step: 38940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:09:48,353-Speed 9403.59 samples/sec   Loss 1.4142   LearningRate 0.0002   Epoch: 22   Global Step: 38950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:10:14,487-Speed 9404.00 samples/sec   Loss 1.4169   LearningRate 0.0002   Epoch: 22   Global Step: 38960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:10:40,684-Speed 9381.72 samples/sec   Loss 1.4106   LearningRate 0.0002   Epoch: 22   Global Step: 38970   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:11:06,860-Speed 9389.49 samples/sec   Loss 1.4095   LearningRate 0.0002   Epoch: 22   Global Step: 38980   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:11:32,962-Speed 9415.56 samples/sec   Loss 1.4102   LearningRate 0.0002   Epoch: 22   Global Step: 38990   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:11:59,234-Speed 9354.76 samples/sec   Loss 1.4058   LearningRate 0.0002   Epoch: 22   Global Step: 39000   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:12:25,326-Speed 9419.22 samples/sec   Loss 1.4131   LearningRate 0.0002   Epoch: 22   Global Step: 39010   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:12:51,519-Speed 9383.10 samples/sec   Loss 1.4081   LearningRate 0.0002   Epoch: 22   Global Step: 39020   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:13:17,739-Speed 9373.37 samples/sec   Loss 1.4037   LearningRate 0.0002   Epoch: 22   Global Step: 39030   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:13:43,876-Speed 9403.00 samples/sec   Loss 1.4080   LearningRate 0.0002   Epoch: 22   Global Step: 39040   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:14:10,026-Speed 9398.77 samples/sec   Loss 1.4141   LearningRate 0.0002   Epoch: 22   Global Step: 39050   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:14:36,160-Speed 9404.01 samples/sec   Loss 1.4085   LearningRate 0.0002   Epoch: 22   Global Step: 39060   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:15:02,324-Speed 9393.56 samples/sec   Loss 1.4068   LearningRate 0.0002   Epoch: 22   Global Step: 39070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:15:28,395-Speed 9427.24 samples/sec   Loss 1.4111   LearningRate 0.0002   Epoch: 22   Global Step: 39080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:15:54,563-Speed 9392.00 samples/sec   Loss 1.4102   LearningRate 0.0002   Epoch: 22   Global Step: 39090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:16:20,627-Speed 9429.55 samples/sec   Loss 1.4155   LearningRate 0.0002   Epoch: 22   Global Step: 39100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:16:46,739-Speed 9412.25 samples/sec   Loss 1.4132   LearningRate 0.0002   Epoch: 22   Global Step: 39110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:17:12,851-Speed 9412.55 samples/sec   Loss 1.4090   LearningRate 0.0002   Epoch: 22   Global Step: 39120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:17:38,917-Speed 9428.57 samples/sec   Loss 1.4065   LearningRate 0.0002   Epoch: 22   Global Step: 39130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:18:05,004-Speed 9421.25 samples/sec   Loss 1.4107   LearningRate 0.0002   Epoch: 22   Global Step: 39140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:18:31,153-Speed 9398.86 samples/sec   Loss 1.4147   LearningRate 0.0002   Epoch: 22   Global Step: 39150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:18:57,342-Speed 9384.48 samples/sec   Loss 1.4082   LearningRate 0.0002   Epoch: 22   Global Step: 39160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:19:23,516-Speed 9389.82 samples/sec   Loss 1.4015   LearningRate 0.0002   Epoch: 22   Global Step: 39170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:19:49,708-Speed 9383.43 samples/sec   Loss 1.4138   LearningRate 0.0002   Epoch: 22   Global Step: 39180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:20:15,796-Speed 9420.89 samples/sec   Loss 1.4101   LearningRate 0.0002   Epoch: 22   Global Step: 39190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:20:42,044-Speed 9363.38 samples/sec   Loss 1.4053   LearningRate 0.0002   Epoch: 22   Global Step: 39200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:21:08,202-Speed 9395.79 samples/sec   Loss 1.4042   LearningRate 0.0002   Epoch: 22   Global Step: 39210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:21:34,308-Speed 9414.53 samples/sec   Loss 1.3971   LearningRate 0.0002   Epoch: 22   Global Step: 39220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:22:00,408-Speed 9416.63 samples/sec   Loss 1.3892   LearningRate 0.0002   Epoch: 22   Global Step: 39230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:22:26,558-Speed 9398.41 samples/sec   Loss 1.3965   LearningRate 0.0002   Epoch: 22   Global Step: 39240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:22:52,711-Speed 9397.52 samples/sec   Loss 1.4148   LearningRate 0.0002   Epoch: 22   Global Step: 39250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:23:18,874-Speed 9393.95 samples/sec   Loss 1.4087   LearningRate 0.0002   Epoch: 22   Global Step: 39260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:23:44,985-Speed 9412.75 samples/sec   Loss 1.4046   LearningRate 0.0002   Epoch: 22   Global Step: 39270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:24:11,040-Speed 9432.64 samples/sec   Loss 1.3975   LearningRate 0.0002   Epoch: 22   Global Step: 39280   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:24:37,191-Speed 9398.32 samples/sec   Loss 1.4053   LearningRate 0.0002   Epoch: 22   Global Step: 39290   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:25:03,333-Speed 9401.24 samples/sec   Loss 1.3981   LearningRate 0.0002   Epoch: 22   Global Step: 39300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:25:29,486-Speed 9397.52 samples/sec   Loss 1.4066   LearningRate 0.0002   Epoch: 22   Global Step: 39310   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:25:55,659-Speed 9390.53 samples/sec   Loss 1.4038   LearningRate 0.0002   Epoch: 22   Global Step: 39320   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:26:21,804-Speed 9399.90 samples/sec   Loss 1.4020   LearningRate 0.0002   Epoch: 22   Global Step: 39330   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:26:47,940-Speed 9403.74 samples/sec   Loss 1.3973   LearningRate 0.0002   Epoch: 22   Global Step: 39340   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:27:14,044-Speed 9414.94 samples/sec   Loss 1.3914   LearningRate 0.0002   Epoch: 22   Global Step: 39350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:27:40,190-Speed 9399.90 samples/sec   Loss 1.3985   LearningRate 0.0002   Epoch: 22   Global Step: 39360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:28:06,387-Speed 9381.63 samples/sec   Loss 1.3864   LearningRate 0.0002   Epoch: 22   Global Step: 39370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:28:32,545-Speed 9395.55 samples/sec   Loss 1.4036   LearningRate 0.0002   Epoch: 22   Global Step: 39380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:28:58,649-Speed 9414.67 samples/sec   Loss 1.3955   LearningRate 0.0002   Epoch: 22   Global Step: 39390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:29:24,833-Speed 9386.51 samples/sec   Loss 1.3987   LearningRate 0.0002   Epoch: 22   Global Step: 39400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:29:50,941-Speed 9413.60 samples/sec   Loss 1.3908   LearningRate 0.0002   Epoch: 22   Global Step: 39410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:30:17,150-Speed 9377.60 samples/sec   Loss 1.4047   LearningRate 0.0002   Epoch: 22   Global Step: 39420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:30:43,263-Speed 9411.84 samples/sec   Loss 1.3919   LearningRate 0.0002   Epoch: 22   Global Step: 39430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:31:09,368-Speed 9414.47 samples/sec   Loss 1.4084   LearningRate 0.0002   Epoch: 22   Global Step: 39440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:31:35,459-Speed 9420.11 samples/sec   Loss 1.4009   LearningRate 0.0002   Epoch: 22   Global Step: 39450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:32:01,631-Speed 9390.47 samples/sec   Loss 1.3944   LearningRate 0.0002   Epoch: 22   Global Step: 39460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:32:27,701-Speed 9427.32 samples/sec   Loss 1.3954   LearningRate 0.0002   Epoch: 22   Global Step: 39470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-06 01:32:53,833-Speed 9404.85 samples/sec   Loss 1.3898   LearningRate 0.0002   Epoch: 22   Global Step: 39480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:33:19,978-Speed 9400.28 samples/sec   Loss 1.3884   LearningRate 0.0002   Epoch: 22   Global Step: 39490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:33:46,037-Speed 9431.36 samples/sec   Loss 1.3948   LearningRate 0.0002   Epoch: 22   Global Step: 39500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:34:12,092-Speed 9433.03 samples/sec   Loss 1.3921   LearningRate 0.0002   Epoch: 22   Global Step: 39510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:34:38,158-Speed 9428.56 samples/sec   Loss 1.3835   LearningRate 0.0002   Epoch: 22   Global Step: 39520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:35:04,351-Speed 9383.08 samples/sec   Loss 1.3964   LearningRate 0.0002   Epoch: 22   Global Step: 39530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:35:30,391-Speed 9438.12 samples/sec   Loss 1.3902   LearningRate 0.0002   Epoch: 22   Global Step: 39540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:35:56,512-Speed 9409.03 samples/sec   Loss 1.4015   LearningRate 0.0002   Epoch: 22   Global Step: 39550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:36:22,625-Speed 9411.92 samples/sec   Loss 1.3860   LearningRate 0.0002   Epoch: 22   Global Step: 39560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:36:48,684-Speed 9431.02 samples/sec   Loss 1.3799   LearningRate 0.0002   Epoch: 22   Global Step: 39570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:37:14,770-Speed 9421.87 samples/sec   Loss 1.3946   LearningRate 0.0002   Epoch: 22   Global Step: 39580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-06 01:37:40,891-Speed 9408.84 samples/sec   Loss 1.3847   LearningRate 0.0002   Epoch: 22   Global Step: 39590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:38:07,048-Speed 9395.77 samples/sec   Loss 1.3872   LearningRate 0.0002   Epoch: 22   Global Step: 39600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:38:35,682-Speed 8583.31 samples/sec   Loss 1.3906   LearningRate 0.0002   Epoch: 22   Global Step: 39610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:39:01,775-Speed 9419.10 samples/sec   Loss 1.3766   LearningRate 0.0002   Epoch: 22   Global Step: 39620   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:39:27,830-Speed 9432.43 samples/sec   Loss 1.3839   LearningRate 0.0002   Epoch: 22   Global Step: 39630   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:39:53,990-Speed 9395.41 samples/sec   Loss 1.4102   LearningRate 0.0002   Epoch: 22   Global Step: 39640   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:40:20,073-Speed 9422.45 samples/sec   Loss 1.3976   LearningRate 0.0002   Epoch: 22   Global Step: 39650   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:40:46,222-Speed 9399.20 samples/sec   Loss 1.3925   LearningRate 0.0002   Epoch: 22   Global Step: 39660   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:41:12,393-Speed 9391.28 samples/sec   Loss 1.3927   LearningRate 0.0002   Epoch: 22   Global Step: 39670   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:41:38,500-Speed 9413.90 samples/sec   Loss 1.3874   LearningRate 0.0002   Epoch: 22   Global Step: 39680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:42:04,559-Speed 9431.38 samples/sec   Loss 1.3896   LearningRate 0.0002   Epoch: 22   Global Step: 39690   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:42:30,628-Speed 9427.60 samples/sec   Loss 1.4071   LearningRate 0.0002   Epoch: 22   Global Step: 39700   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:42:56,851-Speed 9372.40 samples/sec   Loss 1.4086   LearningRate 0.0002   Epoch: 22   Global Step: 39710   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:43:22,976-Speed 9407.99 samples/sec   Loss 1.4044   LearningRate 0.0002   Epoch: 22   Global Step: 39720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:43:49,146-Speed 9391.61 samples/sec   Loss 1.4069   LearningRate 0.0002   Epoch: 22   Global Step: 39730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:44:15,324-Speed 9388.35 samples/sec   Loss 1.4100   LearningRate 0.0002   Epoch: 22   Global Step: 39740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:44:41,404-Speed 9423.73 samples/sec   Loss 1.3949   LearningRate 0.0002   Epoch: 22   Global Step: 39750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:46:00,627-Speed 3102.20 samples/sec   Loss 1.3827   LearningRate 0.0002   Epoch: 23   Global Step: 39760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:46:26,686-Speed 9431.49 samples/sec   Loss 1.3677   LearningRate 0.0002   Epoch: 23   Global Step: 39770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:46:52,677-Speed 9455.91 samples/sec   Loss 1.3738   LearningRate 0.0002   Epoch: 23   Global Step: 39780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:47:18,673-Speed 9454.03 samples/sec   Loss 1.3620   LearningRate 0.0002   Epoch: 23   Global Step: 39790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:47:44,635-Speed 9467.10 samples/sec   Loss 1.3764   LearningRate 0.0002   Epoch: 23   Global Step: 39800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:48:10,596-Speed 9467.04 samples/sec   Loss 1.3657   LearningRate 0.0002   Epoch: 23   Global Step: 39810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:48:36,551-Speed 9469.85 samples/sec   Loss 1.3698   LearningRate 0.0002   Epoch: 23   Global Step: 39820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-06 01:49:02,499-Speed 9471.53 samples/sec   Loss 1.3760   LearningRate 0.0002   Epoch: 23   Global Step: 39830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:49:28,647-Speed 9399.43 samples/sec   Loss 1.3731   LearningRate 0.0002   Epoch: 23   Global Step: 39840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:49:54,721-Speed 9425.97 samples/sec   Loss 1.3694   LearningRate 0.0002   Epoch: 23   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:50:20,758-Speed 9439.11 samples/sec   Loss 1.3662   LearningRate 0.0002   Epoch: 23   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:50:46,852-Speed 9418.78 samples/sec   Loss 1.3665   LearningRate 0.0002   Epoch: 23   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-06 01:51:12,920-Speed 9428.25 samples/sec   Loss 1.3692   LearningRate 0.0002   Epoch: 23   Global Step: 39880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:51:39,049-Speed 9405.81 samples/sec   Loss 1.3709   LearningRate 0.0002   Epoch: 23   Global Step: 39890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-06 01:52:05,120-Speed 9426.82 samples/sec   Loss 1.3700   LearningRate 0.0002   Epoch: 23   Global Step: 39900   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 01:52:31,178-Speed 9431.74 samples/sec   Loss 1.3669   LearningRate 0.0002   Epoch: 23   Global Step: 39910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 01:52:57,289-Speed 9412.77 samples/sec   Loss 1.3691   LearningRate 0.0002   Epoch: 23   Global Step: 39920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 01:53:23,362-Speed 9426.39 samples/sec   Loss 1.3757   LearningRate 0.0002   Epoch: 23   Global Step: 39930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 01:53:49,544-Speed 9386.83 samples/sec   Loss 1.3692   LearningRate 0.0002   Epoch: 23   Global Step: 39940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 01:54:15,688-Speed 9400.72 samples/sec   Loss 1.3681   LearningRate 0.0002   Epoch: 23   Global Step: 39950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 01:54:41,847-Speed 9395.01 samples/sec   Loss 1.3731   LearningRate 0.0002   Epoch: 23   Global Step: 39960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 01:55:07,906-Speed 9431.49 samples/sec   Loss 1.3701   LearningRate 0.0002   Epoch: 23   Global Step: 39970   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 01:55:34,014-Speed 9413.47 samples/sec   Loss 1.3692   LearningRate 0.0002   Epoch: 23   Global Step: 39980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:56:00,177-Speed 9393.81 samples/sec   Loss 1.3726   LearningRate 0.0002   Epoch: 23   Global Step: 39990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:56:26,299-Speed 9408.61 samples/sec   Loss 1.3657   LearningRate 0.0002   Epoch: 23   Global Step: 40000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:56:52,445-Speed 9399.80 samples/sec   Loss 1.3718   LearningRate 0.0002   Epoch: 23   Global Step: 40010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:57:18,636-Speed 9383.72 samples/sec   Loss 1.3658   LearningRate 0.0002   Epoch: 23   Global Step: 40020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:57:44,798-Speed 9394.32 samples/sec   Loss 1.3605   LearningRate 0.0002   Epoch: 23   Global Step: 40030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:58:10,922-Speed 9407.75 samples/sec   Loss 1.3633   LearningRate 0.0002   Epoch: 23   Global Step: 40040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:58:37,060-Speed 9402.69 samples/sec   Loss 1.3708   LearningRate 0.0002   Epoch: 23   Global Step: 40050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:59:03,329-Speed 9356.10 samples/sec   Loss 1.3820   LearningRate 0.0002   Epoch: 23   Global Step: 40060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:59:29,424-Speed 9418.53 samples/sec   Loss 1.3733   LearningRate 0.0002   Epoch: 23   Global Step: 40070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 01:59:55,575-Speed 9398.08 samples/sec   Loss 1.3617   LearningRate 0.0002   Epoch: 23   Global Step: 40080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:00:21,668-Speed 9419.06 samples/sec   Loss 1.3621   LearningRate 0.0002   Epoch: 23   Global Step: 40090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:00:47,761-Speed 9419.17 samples/sec   Loss 1.3751   LearningRate 0.0002   Epoch: 23   Global Step: 40100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:01:13,830-Speed 9427.73 samples/sec   Loss 1.3682   LearningRate 0.0002   Epoch: 23   Global Step: 40110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:01:39,929-Speed 9417.21 samples/sec   Loss 1.3632   LearningRate 0.0002   Epoch: 23   Global Step: 40120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:02:06,016-Speed 9421.32 samples/sec   Loss 1.3741   LearningRate 0.0002   Epoch: 23   Global Step: 40130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:02:32,209-Speed 9383.08 samples/sec   Loss 1.3598   LearningRate 0.0002   Epoch: 23   Global Step: 40140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:02:58,309-Speed 9416.61 samples/sec   Loss 1.3683   LearningRate 0.0002   Epoch: 23   Global Step: 40150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:03:24,513-Speed 9379.75 samples/sec   Loss 1.3769   LearningRate 0.0002   Epoch: 23   Global Step: 40160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:03:50,629-Speed 9410.63 samples/sec   Loss 1.3673   LearningRate 0.0002   Epoch: 23   Global Step: 40170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:04:16,702-Speed 9426.78 samples/sec   Loss 1.3597   LearningRate 0.0002   Epoch: 23   Global Step: 40180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:04:42,839-Speed 9403.36 samples/sec   Loss 1.3759   LearningRate 0.0002   Epoch: 23   Global Step: 40190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:05:08,934-Speed 9418.34 samples/sec   Loss 1.3646   LearningRate 0.0002   Epoch: 23   Global Step: 40200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:05:35,006-Speed 9426.86 samples/sec   Loss 1.3716   LearningRate 0.0002   Epoch: 23   Global Step: 40210   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:06:01,206-Speed 9380.31 samples/sec   Loss 1.3712   LearningRate 0.0002   Epoch: 23   Global Step: 40220   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:06:27,421-Speed 9375.06 samples/sec   Loss 1.3611   LearningRate 0.0002   Epoch: 23   Global Step: 40230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:06:53,572-Speed 9398.49 samples/sec   Loss 1.3594   LearningRate 0.0002   Epoch: 23   Global Step: 40240   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:07:19,734-Speed 9394.21 samples/sec   Loss 1.3562   LearningRate 0.0002   Epoch: 23   Global Step: 40250   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:07:45,810-Speed 9425.45 samples/sec   Loss 1.3656   LearningRate 0.0002   Epoch: 23   Global Step: 40260   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:08:11,888-Speed 9424.44 samples/sec   Loss 1.3598   LearningRate 0.0002   Epoch: 23   Global Step: 40270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:08:38,038-Speed 9398.21 samples/sec   Loss 1.3613   LearningRate 0.0002   Epoch: 23   Global Step: 40280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:09:04,155-Speed 9410.68 samples/sec   Loss 1.3564   LearningRate 0.0002   Epoch: 23   Global Step: 40290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:09:30,262-Speed 9413.82 samples/sec   Loss 1.3576   LearningRate 0.0002   Epoch: 23   Global Step: 40300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:09:56,443-Speed 9387.56 samples/sec   Loss 1.3528   LearningRate 0.0002   Epoch: 23   Global Step: 40310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:10:22,537-Speed 9418.67 samples/sec   Loss 1.3554   LearningRate 0.0002   Epoch: 23   Global Step: 40320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:10:48,671-Speed 9404.09 samples/sec   Loss 1.3591   LearningRate 0.0002   Epoch: 23   Global Step: 40330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:11:14,731-Speed 9431.14 samples/sec   Loss 1.3611   LearningRate 0.0002   Epoch: 23   Global Step: 40340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:11:40,835-Speed 9414.99 samples/sec   Loss 1.3639   LearningRate 0.0002   Epoch: 23   Global Step: 40350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:12:06,911-Speed 9425.32 samples/sec   Loss 1.3503   LearningRate 0.0002   Epoch: 23   Global Step: 40360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:12:33,017-Speed 9414.06 samples/sec   Loss 1.3523   LearningRate 0.0002   Epoch: 23   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:12:59,273-Speed 9360.37 samples/sec   Loss 1.3562   LearningRate 0.0002   Epoch: 23   Global Step: 40380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:13:25,436-Speed 9394.19 samples/sec   Loss 1.3681   LearningRate 0.0002   Epoch: 23   Global Step: 40390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:13:51,527-Speed 9419.57 samples/sec   Loss 1.3627   LearningRate 0.0002   Epoch: 23   Global Step: 40400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:14:17,718-Speed 9383.86 samples/sec   Loss 1.3656   LearningRate 0.0002   Epoch: 23   Global Step: 40410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:14:43,873-Speed 9396.63 samples/sec   Loss 1.3577   LearningRate 0.0002   Epoch: 23   Global Step: 40420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:15:10,004-Speed 9405.03 samples/sec   Loss 1.3662   LearningRate 0.0002   Epoch: 23   Global Step: 40430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:15:36,194-Speed 9384.31 samples/sec   Loss 1.3584   LearningRate 0.0002   Epoch: 23   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:16:02,392-Speed 9381.78 samples/sec   Loss 1.3507   LearningRate 0.0002   Epoch: 23   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:16:28,529-Speed 9402.91 samples/sec   Loss 1.3672   LearningRate 0.0002   Epoch: 23   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:16:54,643-Speed 9411.37 samples/sec   Loss 1.3610   LearningRate 0.0002   Epoch: 23   Global Step: 40470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:17:20,798-Speed 9396.73 samples/sec   Loss 1.3578   LearningRate 0.0002   Epoch: 23   Global Step: 40480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:17:46,896-Speed 9417.07 samples/sec   Loss 1.3500   LearningRate 0.0002   Epoch: 23   Global Step: 40490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:18:13,088-Speed 9383.81 samples/sec   Loss 1.3483   LearningRate 0.0002   Epoch: 23   Global Step: 40500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:18:39,199-Speed 9412.28 samples/sec   Loss 1.3476   LearningRate 0.0002   Epoch: 23   Global Step: 40510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:19:05,344-Speed 9400.51 samples/sec   Loss 1.3520   LearningRate 0.0002   Epoch: 23   Global Step: 40520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:19:31,484-Speed 9402.10 samples/sec   Loss 1.3434   LearningRate 0.0002   Epoch: 23   Global Step: 40530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:19:57,656-Speed 9390.31 samples/sec   Loss 1.3543   LearningRate 0.0002   Epoch: 23   Global Step: 40540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:20:23,832-Speed 9389.57 samples/sec   Loss 1.3558   LearningRate 0.0002   Epoch: 23   Global Step: 40550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:20:49,938-Speed 9414.18 samples/sec   Loss 1.3499   LearningRate 0.0002   Epoch: 23   Global Step: 40560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:21:16,016-Speed 9424.36 samples/sec   Loss 1.3483   LearningRate 0.0002   Epoch: 23   Global Step: 40570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:21:42,123-Speed 9413.81 samples/sec   Loss 1.3524   LearningRate 0.0002   Epoch: 23   Global Step: 40580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:22:08,288-Speed 9393.22 samples/sec   Loss 1.3440   LearningRate 0.0002   Epoch: 23   Global Step: 40590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:22:34,400-Speed 9412.12 samples/sec   Loss 1.3524   LearningRate 0.0002   Epoch: 23   Global Step: 40600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:23:00,566-Speed 9392.75 samples/sec   Loss 1.3529   LearningRate 0.0002   Epoch: 23   Global Step: 40610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:23:26,643-Speed 9424.57 samples/sec   Loss 1.3419   LearningRate 0.0002   Epoch: 23   Global Step: 40620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:23:52,849-Speed 9378.43 samples/sec   Loss 1.3467   LearningRate 0.0002   Epoch: 23   Global Step: 40630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:24:19,055-Speed 9378.58 samples/sec   Loss 1.3329   LearningRate 0.0002   Epoch: 23   Global Step: 40640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:24:45,159-Speed 9415.66 samples/sec   Loss 1.3426   LearningRate 0.0002   Epoch: 23   Global Step: 40650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:25:11,440-Speed 9351.72 samples/sec   Loss 1.3401   LearningRate 0.0002   Epoch: 23   Global Step: 40660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:25:37,608-Speed 9391.97 samples/sec   Loss 1.3346   LearningRate 0.0002   Epoch: 23   Global Step: 40670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:26:03,779-Speed 9391.13 samples/sec   Loss 1.3420   LearningRate 0.0002   Epoch: 23   Global Step: 40680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:26:29,964-Speed 9386.32 samples/sec   Loss 1.3328   LearningRate 0.0002   Epoch: 23   Global Step: 40690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:26:56,180-Speed 9374.75 samples/sec   Loss 1.3460   LearningRate 0.0002   Epoch: 23   Global Step: 40700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:27:22,281-Speed 9416.24 samples/sec   Loss 1.3448   LearningRate 0.0002   Epoch: 23   Global Step: 40710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:27:48,593-Speed 9340.84 samples/sec   Loss 1.3385   LearningRate 0.0002   Epoch: 23   Global Step: 40720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-06 02:28:14,859-Speed 9356.92 samples/sec   Loss 1.3442   LearningRate 0.0002   Epoch: 23   Global Step: 40730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:28:41,013-Speed 9397.72 samples/sec   Loss 1.3452   LearningRate 0.0002   Epoch: 23   Global Step: 40740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:29:07,194-Speed 9387.54 samples/sec   Loss 1.3410   LearningRate 0.0002   Epoch: 23   Global Step: 40750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:29:33,306-Speed 9412.26 samples/sec   Loss 1.3421   LearningRate 0.0002   Epoch: 23   Global Step: 40760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:29:59,489-Speed 9386.28 samples/sec   Loss 1.3536   LearningRate 0.0002   Epoch: 23   Global Step: 40770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:30:25,634-Speed 9400.42 samples/sec   Loss 1.3438   LearningRate 0.0002   Epoch: 23   Global Step: 40780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:30:51,828-Speed 9382.77 samples/sec   Loss 1.3466   LearningRate 0.0002   Epoch: 23   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:31:18,036-Speed 9377.48 samples/sec   Loss 1.3458   LearningRate 0.0002   Epoch: 23   Global Step: 40800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:31:44,225-Speed 9384.51 samples/sec   Loss 1.3440   LearningRate 0.0002   Epoch: 23   Global Step: 40810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:32:10,472-Speed 9363.80 samples/sec   Loss 1.3350   LearningRate 0.0002   Epoch: 23   Global Step: 40820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:32:36,642-Speed 9396.29 samples/sec   Loss 1.3343   LearningRate 0.0002   Epoch: 23   Global Step: 40830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:33:02,795-Speed 9397.45 samples/sec   Loss 1.3392   LearningRate 0.0002   Epoch: 23   Global Step: 40840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:33:28,861-Speed 9428.68 samples/sec   Loss 1.3431   LearningRate 0.0002   Epoch: 23   Global Step: 40850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:33:54,995-Speed 9404.35 samples/sec   Loss 1.3374   LearningRate 0.0002   Epoch: 23   Global Step: 40860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:34:21,153-Speed 9395.61 samples/sec   Loss 1.3350   LearningRate 0.0002   Epoch: 23   Global Step: 40870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:34:47,290-Speed 9403.50 samples/sec   Loss 1.3293   LearningRate 0.0002   Epoch: 23   Global Step: 40880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:35:13,442-Speed 9397.40 samples/sec   Loss 1.3431   LearningRate 0.0002   Epoch: 23   Global Step: 40890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:35:39,650-Speed 9377.85 samples/sec   Loss 1.3438   LearningRate 0.0002   Epoch: 23   Global Step: 40900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:36:05,789-Speed 9402.19 samples/sec   Loss 1.3330   LearningRate 0.0002   Epoch: 23   Global Step: 40910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:36:31,913-Speed 9407.84 samples/sec   Loss 1.3390   LearningRate 0.0002   Epoch: 23   Global Step: 40920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:36:58,059-Speed 9399.82 samples/sec   Loss 1.3373   LearningRate 0.0002   Epoch: 23   Global Step: 40930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:37:24,160-Speed 9416.36 samples/sec   Loss 1.3368   LearningRate 0.0002   Epoch: 23   Global Step: 40940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:37:50,304-Speed 9400.19 samples/sec   Loss 1.3382   LearningRate 0.0002   Epoch: 23   Global Step: 40950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:38:16,397-Speed 9419.11 samples/sec   Loss 1.3333   LearningRate 0.0002   Epoch: 23   Global Step: 40960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:38:42,540-Speed 9401.08 samples/sec   Loss 1.3395   LearningRate 0.0002   Epoch: 23   Global Step: 40970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:39:08,700-Speed 9394.87 samples/sec   Loss 1.3358   LearningRate 0.0002   Epoch: 23   Global Step: 40980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:39:34,847-Speed 9399.57 samples/sec   Loss 1.3380   LearningRate 0.0002   Epoch: 23   Global Step: 40990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:40:00,986-Speed 9402.63 samples/sec   Loss 1.3301   LearningRate 0.0002   Epoch: 23   Global Step: 41000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:40:27,126-Speed 9401.98 samples/sec   Loss 1.3360   LearningRate 0.0002   Epoch: 23   Global Step: 41010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:40:53,227-Speed 9416.26 samples/sec   Loss 1.3230   LearningRate 0.0002   Epoch: 23   Global Step: 41020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:41:19,307-Speed 9423.94 samples/sec   Loss 1.3230   LearningRate 0.0002   Epoch: 23   Global Step: 41030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:41:45,430-Speed 9408.21 samples/sec   Loss 1.3262   LearningRate 0.0002   Epoch: 23   Global Step: 41040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:42:11,524-Speed 9418.48 samples/sec   Loss 1.3229   LearningRate 0.0002   Epoch: 23   Global Step: 41050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:42:37,709-Speed 9385.95 samples/sec   Loss 1.3278   LearningRate 0.0002   Epoch: 23   Global Step: 41060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:43:03,857-Speed 9399.35 samples/sec   Loss 1.3244   LearningRate 0.0002   Epoch: 23   Global Step: 41070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:43:29,964-Speed 9414.23 samples/sec   Loss 1.3271   LearningRate 0.0002   Epoch: 23   Global Step: 41080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:43:56,049-Speed 9421.74 samples/sec   Loss 1.3234   LearningRate 0.0002   Epoch: 23   Global Step: 41090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:44:22,111-Speed 9430.25 samples/sec   Loss 1.3305   LearningRate 0.0002   Epoch: 23   Global Step: 41100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:44:48,258-Speed 9399.30 samples/sec   Loss 1.3270   LearningRate 0.0002   Epoch: 23   Global Step: 41110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:45:14,396-Speed 9403.08 samples/sec   Loss 1.3246   LearningRate 0.0002   Epoch: 23   Global Step: 41120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:45:40,495-Speed 9417.16 samples/sec   Loss 1.3245   LearningRate 0.0002   Epoch: 23   Global Step: 41130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:46:06,639-Speed 9400.45 samples/sec   Loss 1.3369   LearningRate 0.0002   Epoch: 23   Global Step: 41140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:46:32,781-Speed 9401.38 samples/sec   Loss 1.3254   LearningRate 0.0002   Epoch: 23   Global Step: 41150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:46:58,940-Speed 9395.37 samples/sec   Loss 1.3325   LearningRate 0.0002   Epoch: 23   Global Step: 41160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:47:25,078-Speed 9403.00 samples/sec   Loss 1.3385   LearningRate 0.0002   Epoch: 23   Global Step: 41170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:47:51,277-Speed 9380.97 samples/sec   Loss 1.3278   LearningRate 0.0002   Epoch: 23   Global Step: 41180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:48:17,462-Speed 9385.89 samples/sec   Loss 1.3351   LearningRate 0.0002   Epoch: 23   Global Step: 41190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-06 02:48:43,659-Speed 9381.82 samples/sec   Loss 1.3240   LearningRate 0.0002   Epoch: 23   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:49:09,813-Speed 9396.91 samples/sec   Loss 1.3259   LearningRate 0.0002   Epoch: 23   Global Step: 41210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:49:35,981-Speed 9392.47 samples/sec   Loss 1.3222   LearningRate 0.0002   Epoch: 23   Global Step: 41220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:50:02,244-Speed 9357.87 samples/sec   Loss 1.3227   LearningRate 0.0002   Epoch: 23   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-06 02:50:28,431-Speed 9385.26 samples/sec   Loss 1.3289   LearningRate 0.0002   Epoch: 23   Global Step: 41240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:50:54,650-Speed 9373.80 samples/sec   Loss 1.3219   LearningRate 0.0002   Epoch: 23   Global Step: 41250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:51:20,847-Speed 9381.76 samples/sec   Loss 1.3212   LearningRate 0.0002   Epoch: 23   Global Step: 41260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:51:46,972-Speed 9407.55 samples/sec   Loss 1.3259   LearningRate 0.0002   Epoch: 23   Global Step: 41270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:52:13,192-Speed 9373.47 samples/sec   Loss 1.3196   LearningRate 0.0002   Epoch: 23   Global Step: 41280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:52:39,437-Speed 9364.63 samples/sec   Loss 1.3247   LearningRate 0.0002   Epoch: 23   Global Step: 41290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:53:05,631-Speed 9382.40 samples/sec   Loss 1.3333   LearningRate 0.0002   Epoch: 23   Global Step: 41300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 02:53:31,823-Speed 9383.63 samples/sec   Loss 1.3270   LearningRate 0.0002   Epoch: 23   Global Step: 41310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 02:53:57,988-Speed 9393.27 samples/sec   Loss 1.3262   LearningRate 0.0002   Epoch: 23   Global Step: 41320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:54:24,157-Speed 9391.23 samples/sec   Loss 1.3295   LearningRate 0.0002   Epoch: 23   Global Step: 41330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:54:50,308-Speed 9398.12 samples/sec   Loss 1.3266   LearningRate 0.0002   Epoch: 23   Global Step: 41340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:55:16,448-Speed 9402.35 samples/sec   Loss 1.3292   LearningRate 0.0002   Epoch: 23   Global Step: 41350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:55:42,604-Speed 9396.33 samples/sec   Loss 1.3206   LearningRate 0.0002   Epoch: 23   Global Step: 41360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:56:08,856-Speed 9362.34 samples/sec   Loss 1.3236   LearningRate 0.0002   Epoch: 23   Global Step: 41370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:56:35,050-Speed 9382.55 samples/sec   Loss 1.3179   LearningRate 0.0002   Epoch: 23   Global Step: 41380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:57:01,239-Speed 9384.57 samples/sec   Loss 1.3196   LearningRate 0.0002   Epoch: 23   Global Step: 41390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:57:27,449-Speed 9376.83 samples/sec   Loss 1.3113   LearningRate 0.0002   Epoch: 23   Global Step: 41400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:57:53,592-Speed 9401.10 samples/sec   Loss 1.3338   LearningRate 0.0002   Epoch: 23   Global Step: 41410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 02:58:19,746-Speed 9396.92 samples/sec   Loss 1.3311   LearningRate 0.0002   Epoch: 23   Global Step: 41420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:58:45,887-Speed 9402.17 samples/sec   Loss 1.3283   LearningRate 0.0002   Epoch: 23   Global Step: 41430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:59:12,034-Speed 9399.22 samples/sec   Loss 1.3344   LearningRate 0.0002   Epoch: 23   Global Step: 41440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 02:59:38,151-Speed 9410.73 samples/sec   Loss 1.3282   LearningRate 0.0002   Epoch: 23   Global Step: 41450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:00:04,290-Speed 9402.14 samples/sec   Loss 1.3201   LearningRate 0.0002   Epoch: 23   Global Step: 41460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:00:30,516-Speed 9371.41 samples/sec   Loss 1.3273   LearningRate 0.0002   Epoch: 23   Global Step: 41470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:00:56,727-Speed 9376.42 samples/sec   Loss 1.3273   LearningRate 0.0002   Epoch: 23   Global Step: 41480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:02:15,911-Speed 3103.70 samples/sec   Loss 1.3062   LearningRate 0.0002   Epoch: 24   Global Step: 41490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:02:41,901-Speed 9456.83 samples/sec   Loss 1.3068   LearningRate 0.0002   Epoch: 24   Global Step: 41500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:03:07,965-Speed 9429.29 samples/sec   Loss 1.2984   LearningRate 0.0002   Epoch: 24   Global Step: 41510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:03:33,960-Speed 9454.65 samples/sec   Loss 1.3127   LearningRate 0.0002   Epoch: 24   Global Step: 41520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:04:00,013-Speed 9433.67 samples/sec   Loss 1.3081   LearningRate 0.0002   Epoch: 24   Global Step: 41530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:04:25,959-Speed 9472.40 samples/sec   Loss 1.3096   LearningRate 0.0002   Epoch: 24   Global Step: 41540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:04:51,956-Speed 9453.38 samples/sec   Loss 1.3020   LearningRate 0.0002   Epoch: 24   Global Step: 41550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:05:17,896-Speed 9474.74 samples/sec   Loss 1.3025   LearningRate 0.0002   Epoch: 24   Global Step: 41560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:05:43,806-Speed 9485.81 samples/sec   Loss 1.3112   LearningRate 0.0002   Epoch: 24   Global Step: 41570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:06:09,817-Speed 9448.38 samples/sec   Loss 1.3093   LearningRate 0.0002   Epoch: 24   Global Step: 41580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:06:35,814-Speed 9453.92 samples/sec   Loss 1.3067   LearningRate 0.0002   Epoch: 24   Global Step: 41590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:07:01,763-Speed 9471.24 samples/sec   Loss 1.3043   LearningRate 0.0002   Epoch: 24   Global Step: 41600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:07:27,804-Speed 9437.96 samples/sec   Loss 1.2929   LearningRate 0.0002   Epoch: 24   Global Step: 41610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:07:53,845-Speed 9437.56 samples/sec   Loss 1.2983   LearningRate 0.0002   Epoch: 24   Global Step: 41620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:08:19,881-Speed 9439.73 samples/sec   Loss 1.3123   LearningRate 0.0002   Epoch: 24   Global Step: 41630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:08:45,900-Speed 9445.74 samples/sec   Loss 1.3105   LearningRate 0.0002   Epoch: 24   Global Step: 41640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:09:11,886-Speed 9457.67 samples/sec   Loss 1.3025   LearningRate 0.0002   Epoch: 24   Global Step: 41650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:09:37,903-Speed 9446.53 samples/sec   Loss 1.2988   LearningRate 0.0002   Epoch: 24   Global Step: 41660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:10:03,939-Speed 9439.98 samples/sec   Loss 1.2989   LearningRate 0.0002   Epoch: 24   Global Step: 41670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:10:29,973-Speed 9440.47 samples/sec   Loss 1.2986   LearningRate 0.0002   Epoch: 24   Global Step: 41680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:10:56,022-Speed 9434.93 samples/sec   Loss 1.3051   LearningRate 0.0002   Epoch: 24   Global Step: 41690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:11:22,039-Speed 9446.50 samples/sec   Loss 1.3047   LearningRate 0.0002   Epoch: 24   Global Step: 41700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:11:48,125-Speed 9421.27 samples/sec   Loss 1.2989   LearningRate 0.0002   Epoch: 24   Global Step: 41710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:12:14,214-Speed 9420.53 samples/sec   Loss 1.3054   LearningRate 0.0002   Epoch: 24   Global Step: 41720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:12:40,268-Speed 9432.90 samples/sec   Loss 1.3141   LearningRate 0.0002   Epoch: 24   Global Step: 41730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:13:06,314-Speed 9436.40 samples/sec   Loss 1.3056   LearningRate 0.0002   Epoch: 24   Global Step: 41740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:13:32,332-Speed 9446.04 samples/sec   Loss 1.3157   LearningRate 0.0002   Epoch: 24   Global Step: 41750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:13:58,371-Speed 9438.55 samples/sec   Loss 1.2983   LearningRate 0.0002   Epoch: 24   Global Step: 41760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:14:24,446-Speed 9425.57 samples/sec   Loss 1.3009   LearningRate 0.0002   Epoch: 24   Global Step: 41770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:14:50,488-Speed 9437.60 samples/sec   Loss 1.3002   LearningRate 0.0002   Epoch: 24   Global Step: 41780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:15:16,600-Speed 9412.19 samples/sec   Loss 1.3001   LearningRate 0.0002   Epoch: 24   Global Step: 41790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:15:42,651-Speed 9434.20 samples/sec   Loss 1.3068   LearningRate 0.0002   Epoch: 24   Global Step: 41800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:16:08,754-Speed 9415.33 samples/sec   Loss 1.3109   LearningRate 0.0002   Epoch: 24   Global Step: 41810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:16:34,803-Speed 9435.28 samples/sec   Loss 1.3055   LearningRate 0.0002   Epoch: 24   Global Step: 41820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:17:00,871-Speed 9427.97 samples/sec   Loss 1.3081   LearningRate 0.0002   Epoch: 24   Global Step: 41830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:17:26,972-Speed 9416.43 samples/sec   Loss 1.2994   LearningRate 0.0002   Epoch: 24   Global Step: 41840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:17:53,033-Speed 9430.76 samples/sec   Loss 1.3054   LearningRate 0.0002   Epoch: 24   Global Step: 41850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:18:19,072-Speed 9438.15 samples/sec   Loss 1.2933   LearningRate 0.0002   Epoch: 24   Global Step: 41860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:18:45,213-Speed 9401.74 samples/sec   Loss 1.2952   LearningRate 0.0002   Epoch: 24   Global Step: 41870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:19:11,396-Speed 9386.89 samples/sec   Loss 1.3011   LearningRate 0.0002   Epoch: 24   Global Step: 41880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:19:37,456-Speed 9430.87 samples/sec   Loss 1.2933   LearningRate 0.0002   Epoch: 24   Global Step: 41890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:20:03,628-Speed 9390.70 samples/sec   Loss 1.2991   LearningRate 0.0002   Epoch: 24   Global Step: 41900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:20:29,747-Speed 9409.51 samples/sec   Loss 1.3006   LearningRate 0.0002   Epoch: 24   Global Step: 41910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:20:55,865-Speed 9409.90 samples/sec   Loss 1.3019   LearningRate 0.0002   Epoch: 24   Global Step: 41920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 03:21:21,993-Speed 9406.35 samples/sec   Loss 1.3049   LearningRate 0.0002   Epoch: 24   Global Step: 41930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 03:21:48,147-Speed 9397.05 samples/sec   Loss 1.2993   LearningRate 0.0002   Epoch: 24   Global Step: 41940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 03:22:14,253-Speed 9414.19 samples/sec   Loss 1.2944   LearningRate 0.0002   Epoch: 24   Global Step: 41950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:22:40,358-Speed 9414.80 samples/sec   Loss 1.3051   LearningRate 0.0002   Epoch: 24   Global Step: 41960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:23:06,632-Speed 9353.91 samples/sec   Loss 1.2984   LearningRate 0.0002   Epoch: 24   Global Step: 41970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:23:32,788-Speed 9396.51 samples/sec   Loss 1.3000   LearningRate 0.0002   Epoch: 24   Global Step: 41980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:23:59,017-Speed 9370.17 samples/sec   Loss 1.2992   LearningRate 0.0002   Epoch: 24   Global Step: 41990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:24:25,236-Speed 9373.50 samples/sec   Loss 1.2992   LearningRate 0.0002   Epoch: 24   Global Step: 42000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:24:51,519-Speed 9351.10 samples/sec   Loss 1.3014   LearningRate 0.0002   Epoch: 24   Global Step: 42010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:25:17,691-Speed 9390.52 samples/sec   Loss 1.2892   LearningRate 0.0002   Epoch: 24   Global Step: 42020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:25:44,007-Speed 9339.07 samples/sec   Loss 1.2918   LearningRate 0.0002   Epoch: 24   Global Step: 42030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:26:10,225-Speed 9374.30 samples/sec   Loss 1.2954   LearningRate 0.0002   Epoch: 24   Global Step: 42040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:26:36,509-Speed 9350.45 samples/sec   Loss 1.2887   LearningRate 0.0002   Epoch: 24   Global Step: 42050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 03:27:02,689-Speed 9387.69 samples/sec   Loss 1.2968   LearningRate 0.0002   Epoch: 24   Global Step: 42060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 03:27:28,885-Speed 9382.09 samples/sec   Loss 1.2856   LearningRate 0.0002   Epoch: 24   Global Step: 42070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 03:27:55,162-Speed 9352.95 samples/sec   Loss 1.2948   LearningRate 0.0002   Epoch: 24   Global Step: 42080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 03:28:21,281-Speed 9409.81 samples/sec   Loss 1.2839   LearningRate 0.0002   Epoch: 24   Global Step: 42090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:28:47,721-Speed 9295.61 samples/sec   Loss 1.2856   LearningRate 0.0002   Epoch: 24   Global Step: 42100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:29:14,137-Speed 9303.65 samples/sec   Loss 1.2929   LearningRate 0.0002   Epoch: 24   Global Step: 42110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:29:40,555-Speed 9303.47 samples/sec   Loss 1.2835   LearningRate 0.0002   Epoch: 24   Global Step: 42120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:30:06,820-Speed 9357.21 samples/sec   Loss 1.2944   LearningRate 0.0002   Epoch: 24   Global Step: 42130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:30:33,124-Speed 9343.42 samples/sec   Loss 1.2981   LearningRate 0.0002   Epoch: 24   Global Step: 42140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:30:59,416-Speed 9347.73 samples/sec   Loss 1.2856   LearningRate 0.0002   Epoch: 24   Global Step: 42150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:31:25,859-Speed 9294.56 samples/sec   Loss 1.2958   LearningRate 0.0002   Epoch: 24   Global Step: 42160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:31:52,122-Speed 9357.97 samples/sec   Loss 1.2942   LearningRate 0.0002   Epoch: 24   Global Step: 42170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:32:18,451-Speed 9334.66 samples/sec   Loss 1.2849   LearningRate 0.0002   Epoch: 24   Global Step: 42180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:32:44,734-Speed 9351.25 samples/sec   Loss 1.2959   LearningRate 0.0002   Epoch: 24   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-06 03:33:11,065-Speed 9333.79 samples/sec   Loss 1.2860   LearningRate 0.0002   Epoch: 24   Global Step: 42200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:33:37,263-Speed 9381.51 samples/sec   Loss 1.2907   LearningRate 0.0002   Epoch: 24   Global Step: 42210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:34:03,597-Speed 9332.74 samples/sec   Loss 1.2857   LearningRate 0.0002   Epoch: 24   Global Step: 42220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:34:29,792-Speed 9382.46 samples/sec   Loss 1.2924   LearningRate 0.0002   Epoch: 24   Global Step: 42230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:34:56,056-Speed 9357.73 samples/sec   Loss 1.2924   LearningRate 0.0002   Epoch: 24   Global Step: 42240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:35:22,314-Speed 9359.82 samples/sec   Loss 1.2792   LearningRate 0.0002   Epoch: 24   Global Step: 42250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:35:48,443-Speed 9405.75 samples/sec   Loss 1.2902   LearningRate 0.0002   Epoch: 24   Global Step: 42260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:36:14,611-Speed 9392.16 samples/sec   Loss 1.2983   LearningRate 0.0002   Epoch: 24   Global Step: 42270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:36:40,788-Speed 9388.94 samples/sec   Loss 1.2886   LearningRate 0.0002   Epoch: 24   Global Step: 42280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:37:07,011-Speed 9372.25 samples/sec   Loss 1.2822   LearningRate 0.0002   Epoch: 24   Global Step: 42290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:37:33,258-Speed 9364.10 samples/sec   Loss 1.2824   LearningRate 0.0002   Epoch: 24   Global Step: 42300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:37:59,483-Speed 9371.24 samples/sec   Loss 1.2817   LearningRate 0.0002   Epoch: 24   Global Step: 42310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-06 03:38:25,762-Speed 9352.25 samples/sec   Loss 1.2775   LearningRate 0.0002   Epoch: 24   Global Step: 42320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:38:52,056-Speed 9347.33 samples/sec   Loss 1.2786   LearningRate 0.0002   Epoch: 24   Global Step: 42330   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:39:18,348-Speed 9347.69 samples/sec   Loss 1.2779   LearningRate 0.0002   Epoch: 24   Global Step: 42340   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:39:44,618-Speed 9355.39 samples/sec   Loss 1.2816   LearningRate 0.0002   Epoch: 24   Global Step: 42350   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:40:10,957-Speed 9331.19 samples/sec   Loss 1.2804   LearningRate 0.0002   Epoch: 24   Global Step: 42360   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:40:37,254-Speed 9345.81 samples/sec   Loss 1.2763   LearningRate 0.0002   Epoch: 24   Global Step: 42370   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:41:03,522-Speed 9356.25 samples/sec   Loss 1.2758   LearningRate 0.0002   Epoch: 24   Global Step: 42380   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:41:29,772-Speed 9363.09 samples/sec   Loss 1.2821   LearningRate 0.0002   Epoch: 24   Global Step: 42390   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:41:56,082-Speed 9341.40 samples/sec   Loss 1.2753   LearningRate 0.0002   Epoch: 24   Global Step: 42400   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:42:22,303-Speed 9372.89 samples/sec   Loss 1.2745   LearningRate 0.0002   Epoch: 24   Global Step: 42410   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:42:48,509-Speed 9378.56 samples/sec   Loss 1.2762   LearningRate 0.0002   Epoch: 24   Global Step: 42420   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:43:14,859-Speed 9327.30 samples/sec   Loss 1.2715   LearningRate 0.0002   Epoch: 24   Global Step: 42430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:43:41,173-Speed 9339.90 samples/sec   Loss 1.2863   LearningRate 0.0002   Epoch: 24   Global Step: 42440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:44:07,458-Speed 9350.35 samples/sec   Loss 1.2819   LearningRate 0.0002   Epoch: 24   Global Step: 42450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:44:33,651-Speed 9383.19 samples/sec   Loss 1.2840   LearningRate 0.0002   Epoch: 24   Global Step: 42460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:44:59,859-Speed 9377.78 samples/sec   Loss 1.2806   LearningRate 0.0002   Epoch: 24   Global Step: 42470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:45:26,020-Speed 9394.46 samples/sec   Loss 1.2808   LearningRate 0.0002   Epoch: 24   Global Step: 42480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:45:52,164-Speed 9400.48 samples/sec   Loss 1.2788   LearningRate 0.0002   Epoch: 24   Global Step: 42490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:46:18,276-Speed 9412.52 samples/sec   Loss 1.2858   LearningRate 0.0002   Epoch: 24   Global Step: 42500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-06 03:46:44,367-Speed 9419.66 samples/sec   Loss 1.2767   LearningRate 0.0002   Epoch: 24   Global Step: 42510   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:47:10,629-Speed 9358.47 samples/sec   Loss 1.2813   LearningRate 0.0002   Epoch: 24   Global Step: 42520   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:47:36,756-Speed 9406.47 samples/sec   Loss 1.2809   LearningRate 0.0002   Epoch: 24   Global Step: 42530   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:48:03,010-Speed 9361.39 samples/sec   Loss 1.2722   LearningRate 0.0002   Epoch: 24   Global Step: 42540   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:48:29,348-Speed 9331.44 samples/sec   Loss 1.2685   LearningRate 0.0002   Epoch: 24   Global Step: 42550   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:48:55,662-Speed 9340.03 samples/sec   Loss 1.2709   LearningRate 0.0002   Epoch: 24   Global Step: 42560   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:49:21,938-Speed 9353.37 samples/sec   Loss 1.2733   LearningRate 0.0002   Epoch: 24   Global Step: 42570   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:49:48,073-Speed 9404.01 samples/sec   Loss 1.2707   LearningRate 0.0002   Epoch: 24   Global Step: 42580   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:50:14,308-Speed 9367.91 samples/sec   Loss 1.2773   LearningRate 0.0002   Epoch: 24   Global Step: 42590   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:50:40,483-Speed 9389.79 samples/sec   Loss 1.2814   LearningRate 0.0002   Epoch: 24   Global Step: 42600   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-06 03:51:06,642-Speed 9395.27 samples/sec   Loss 1.2776   LearningRate 0.0002   Epoch: 24   Global Step: 42610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:51:32,752-Speed 9412.64 samples/sec   Loss 1.2755   LearningRate 0.0002   Epoch: 24   Global Step: 42620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:51:58,957-Speed 9378.84 samples/sec   Loss 1.2701   LearningRate 0.0002   Epoch: 24   Global Step: 42630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:52:25,321-Speed 9322.91 samples/sec   Loss 1.2690   LearningRate 0.0002   Epoch: 24   Global Step: 42640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:52:51,544-Speed 9372.60 samples/sec   Loss 1.2705   LearningRate 0.0002   Epoch: 24   Global Step: 42650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:53:17,711-Speed 9392.45 samples/sec   Loss 1.2709   LearningRate 0.0002   Epoch: 24   Global Step: 42660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:53:43,904-Speed 9383.19 samples/sec   Loss 1.2639   LearningRate 0.0002   Epoch: 24   Global Step: 42670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:54:10,186-Speed 9351.60 samples/sec   Loss 1.2727   LearningRate 0.0002   Epoch: 24   Global Step: 42680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:54:36,417-Speed 9369.46 samples/sec   Loss 1.2635   LearningRate 0.0002   Epoch: 24   Global Step: 42690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:55:02,627-Speed 9376.89 samples/sec   Loss 1.2728   LearningRate 0.0002   Epoch: 24   Global Step: 42700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 03:55:28,750-Speed 9408.30 samples/sec   Loss 1.2714   LearningRate 0.0002   Epoch: 24   Global Step: 42710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:55:54,930-Speed 9387.97 samples/sec   Loss 1.2681   LearningRate 0.0002   Epoch: 24   Global Step: 42720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:56:21,048-Speed 9410.62 samples/sec   Loss 1.2616   LearningRate 0.0002   Epoch: 24   Global Step: 42730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:56:47,274-Speed 9371.30 samples/sec   Loss 1.2673   LearningRate 0.0002   Epoch: 24   Global Step: 42740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:57:13,433-Speed 9395.22 samples/sec   Loss 1.2571   LearningRate 0.0002   Epoch: 24   Global Step: 42750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:57:39,650-Speed 9374.29 samples/sec   Loss 1.2735   LearningRate 0.0002   Epoch: 24   Global Step: 42760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:58:05,807-Speed 9395.86 samples/sec   Loss 1.2607   LearningRate 0.0002   Epoch: 24   Global Step: 42770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:58:31,934-Speed 9406.96 samples/sec   Loss 1.2704   LearningRate 0.0002   Epoch: 24   Global Step: 42780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:58:58,057-Speed 9408.02 samples/sec   Loss 1.2605   LearningRate 0.0002   Epoch: 24   Global Step: 42790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:59:24,225-Speed 9392.09 samples/sec   Loss 1.2620   LearningRate 0.0002   Epoch: 24   Global Step: 42800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 03:59:50,382-Speed 9395.97 samples/sec   Loss 1.2730   LearningRate 0.0002   Epoch: 24   Global Step: 42810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:00:16,526-Speed 9400.50 samples/sec   Loss 1.2644   LearningRate 0.0002   Epoch: 24   Global Step: 42820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:00:42,622-Speed 9418.25 samples/sec   Loss 1.2622   LearningRate 0.0002   Epoch: 24   Global Step: 42830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:01:08,736-Speed 9411.37 samples/sec   Loss 1.2604   LearningRate 0.0002   Epoch: 24   Global Step: 42840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:01:34,794-Speed 9431.80 samples/sec   Loss 1.2592   LearningRate 0.0002   Epoch: 24   Global Step: 42850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:02:00,949-Speed 9396.44 samples/sec   Loss 1.2675   LearningRate 0.0002   Epoch: 24   Global Step: 42860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:02:27,073-Speed 9408.22 samples/sec   Loss 1.2686   LearningRate 0.0002   Epoch: 24   Global Step: 42870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:02:53,290-Speed 9375.33 samples/sec   Loss 1.2557   LearningRate 0.0002   Epoch: 24   Global Step: 42880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:03:19,390-Speed 9416.47 samples/sec   Loss 1.2592   LearningRate 0.0002   Epoch: 24   Global Step: 42890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:03:45,532-Speed 9401.20 samples/sec   Loss 1.2651   LearningRate 0.0002   Epoch: 24   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:04:11,675-Speed 9401.27 samples/sec   Loss 1.2655   LearningRate 0.0002   Epoch: 24   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:04:37,817-Speed 9401.18 samples/sec   Loss 1.2636   LearningRate 0.0002   Epoch: 24   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:05:03,914-Speed 9418.59 samples/sec   Loss 1.2655   LearningRate 0.0002   Epoch: 24   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:05:30,021-Speed 9414.16 samples/sec   Loss 1.2619   LearningRate 0.0002   Epoch: 24   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:05:56,101-Speed 9423.68 samples/sec   Loss 1.2646   LearningRate 0.0002   Epoch: 24   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:06:22,204-Speed 9415.35 samples/sec   Loss 1.2631   LearningRate 0.0002   Epoch: 24   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:06:48,262-Speed 9431.70 samples/sec   Loss 1.2674   LearningRate 0.0002   Epoch: 24   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:07:14,439-Speed 9388.54 samples/sec   Loss 1.2652   LearningRate 0.0002   Epoch: 24   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:07:40,539-Speed 9416.53 samples/sec   Loss 1.2521   LearningRate 0.0002   Epoch: 24   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:08:06,654-Speed 9411.11 samples/sec   Loss 1.2594   LearningRate 0.0002   Epoch: 24   Global Step: 43000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:08:32,762-Speed 9413.48 samples/sec   Loss 1.2608   LearningRate 0.0002   Epoch: 24   Global Step: 43010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:08:58,896-Speed 9404.48 samples/sec   Loss 1.2579   LearningRate 0.0002   Epoch: 24   Global Step: 43020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:09:25,056-Speed 9394.91 samples/sec   Loss 1.2506   LearningRate 0.0002   Epoch: 24   Global Step: 43030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:09:51,260-Speed 9379.03 samples/sec   Loss 1.2537   LearningRate 0.0002   Epoch: 24   Global Step: 43040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:10:17,479-Speed 9374.05 samples/sec   Loss 1.2646   LearningRate 0.0002   Epoch: 24   Global Step: 43050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:10:43,647-Speed 9391.98 samples/sec   Loss 1.2559   LearningRate 0.0002   Epoch: 24   Global Step: 43060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:11:09,839-Speed 9383.42 samples/sec   Loss 1.2567   LearningRate 0.0002   Epoch: 24   Global Step: 43070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:11:35,937-Speed 9417.28 samples/sec   Loss 1.2507   LearningRate 0.0002   Epoch: 24   Global Step: 43080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:12:02,157-Speed 9373.39 samples/sec   Loss 1.2640   LearningRate 0.0002   Epoch: 24   Global Step: 43090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:12:28,271-Speed 9411.55 samples/sec   Loss 1.2629   LearningRate 0.0002   Epoch: 24   Global Step: 43100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:12:54,471-Speed 9380.78 samples/sec   Loss 1.2487   LearningRate 0.0002   Epoch: 24   Global Step: 43110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:13:20,584-Speed 9411.80 samples/sec   Loss 1.2510   LearningRate 0.0002   Epoch: 24   Global Step: 43120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:13:46,705-Speed 9409.16 samples/sec   Loss 1.2666   LearningRate 0.0002   Epoch: 24   Global Step: 43130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:14:12,899-Speed 9382.40 samples/sec   Loss 1.2593   LearningRate 0.0002   Epoch: 24   Global Step: 43140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:14:39,007-Speed 9414.04 samples/sec   Loss 1.2666   LearningRate 0.0002   Epoch: 24   Global Step: 43150   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:15:05,145-Speed 9402.74 samples/sec   Loss 1.2687   LearningRate 0.0002   Epoch: 24   Global Step: 43160   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:15:31,298-Speed 9398.15 samples/sec   Loss 1.2599   LearningRate 0.0002   Epoch: 24   Global Step: 43170   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:15:57,531-Speed 9369.10 samples/sec   Loss 1.2566   LearningRate 0.0002   Epoch: 24   Global Step: 43180   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:16:23,747-Speed 9374.53 samples/sec   Loss 1.2648   LearningRate 0.0002   Epoch: 24   Global Step: 43190   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:16:49,928-Speed 9387.60 samples/sec   Loss 1.2718   LearningRate 0.0002   Epoch: 24   Global Step: 43200   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:18:07,553-Speed 3166.07 samples/sec   Loss 1.2672   LearningRate 0.0002   Epoch: 25   Global Step: 43210   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:18:33,616-Speed 9429.71 samples/sec   Loss 1.2387   LearningRate 0.0002   Epoch: 25   Global Step: 43220   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:18:59,727-Speed 9412.55 samples/sec   Loss 1.2556   LearningRate 0.0002   Epoch: 25   Global Step: 43230   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:19:25,893-Speed 9393.16 samples/sec   Loss 1.2375   LearningRate 0.0002   Epoch: 25   Global Step: 43240   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:19:52,314-Speed 9302.17 samples/sec   Loss 1.2389   LearningRate 0.0002   Epoch: 25   Global Step: 43250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:20:18,737-Speed 9301.26 samples/sec   Loss 1.2425   LearningRate 0.0002   Epoch: 25   Global Step: 43260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:20:45,029-Speed 9347.68 samples/sec   Loss 1.2490   LearningRate 0.0002   Epoch: 25   Global Step: 43270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:21:11,286-Speed 9360.40 samples/sec   Loss 1.2398   LearningRate 0.0002   Epoch: 25   Global Step: 43280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:21:37,840-Speed 9255.48 samples/sec   Loss 1.2370   LearningRate 0.0002   Epoch: 25   Global Step: 43290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:22:04,260-Speed 9302.56 samples/sec   Loss 1.2458   LearningRate 0.0002   Epoch: 25   Global Step: 43300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:22:30,799-Speed 9260.68 samples/sec   Loss 1.2457   LearningRate 0.0002   Epoch: 25   Global Step: 43310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:22:57,171-Speed 9319.27 samples/sec   Loss 1.2406   LearningRate 0.0002   Epoch: 25   Global Step: 43320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:23:23,635-Speed 9287.30 samples/sec   Loss 1.2462   LearningRate 0.0002   Epoch: 25   Global Step: 43330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:23:49,977-Speed 9329.88 samples/sec   Loss 1.2453   LearningRate 0.0002   Epoch: 25   Global Step: 43340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:24:16,270-Speed 9347.35 samples/sec   Loss 1.2416   LearningRate 0.0002   Epoch: 25   Global Step: 43350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:24:42,834-Speed 9252.40 samples/sec   Loss 1.2366   LearningRate 0.0002   Epoch: 25   Global Step: 43360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:25:09,367-Speed 9262.82 samples/sec   Loss 1.2449   LearningRate 0.0002   Epoch: 25   Global Step: 43370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:25:35,972-Speed 9237.74 samples/sec   Loss 1.2440   LearningRate 0.0002   Epoch: 25   Global Step: 43380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:26:02,503-Speed 9263.28 samples/sec   Loss 1.2421   LearningRate 0.0002   Epoch: 25   Global Step: 43390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:26:29,062-Speed 9253.85 samples/sec   Loss 1.2505   LearningRate 0.0002   Epoch: 25   Global Step: 43400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:26:55,589-Speed 9264.88 samples/sec   Loss 1.2419   LearningRate 0.0002   Epoch: 25   Global Step: 43410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:27:22,131-Speed 9259.77 samples/sec   Loss 1.2452   LearningRate 0.0002   Epoch: 25   Global Step: 43420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:27:48,715-Speed 9245.19 samples/sec   Loss 1.2365   LearningRate 0.0002   Epoch: 25   Global Step: 43430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:28:15,272-Speed 9254.60 samples/sec   Loss 1.2308   LearningRate 0.0002   Epoch: 25   Global Step: 43440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:28:41,790-Speed 9268.31 samples/sec   Loss 1.2499   LearningRate 0.0002   Epoch: 25   Global Step: 43450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:29:08,242-Speed 9291.22 samples/sec   Loss 1.2396   LearningRate 0.0002   Epoch: 25   Global Step: 43460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:29:34,787-Speed 9258.68 samples/sec   Loss 1.2428   LearningRate 0.0002   Epoch: 25   Global Step: 43470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:30:01,286-Speed 9274.76 samples/sec   Loss 1.2346   LearningRate 0.0002   Epoch: 25   Global Step: 43480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:30:27,619-Speed 9333.34 samples/sec   Loss 1.2359   LearningRate 0.0002   Epoch: 25   Global Step: 43490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:30:54,171-Speed 9256.09 samples/sec   Loss 1.2366   LearningRate 0.0002   Epoch: 25   Global Step: 43500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:31:20,655-Speed 9279.89 samples/sec   Loss 1.2363   LearningRate 0.0002   Epoch: 25   Global Step: 43510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:31:47,233-Speed 9247.19 samples/sec   Loss 1.2340   LearningRate 0.0002   Epoch: 25   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:32:13,629-Speed 9310.81 samples/sec   Loss 1.2403   LearningRate 0.0002   Epoch: 25   Global Step: 43530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:32:40,004-Speed 9318.44 samples/sec   Loss 1.2302   LearningRate 0.0002   Epoch: 25   Global Step: 43540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:33:06,496-Speed 9277.06 samples/sec   Loss 1.2419   LearningRate 0.0002   Epoch: 25   Global Step: 43550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:33:33,055-Speed 9253.84 samples/sec   Loss 1.2435   LearningRate 0.0002   Epoch: 25   Global Step: 43560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:33:59,651-Speed 9241.06 samples/sec   Loss 1.2369   LearningRate 0.0002   Epoch: 25   Global Step: 43570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:34:26,255-Speed 9238.18 samples/sec   Loss 1.2389   LearningRate 0.0002   Epoch: 25   Global Step: 43580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:34:52,723-Speed 9285.61 samples/sec   Loss 1.2491   LearningRate 0.0002   Epoch: 25   Global Step: 43590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:35:19,210-Speed 9278.53 samples/sec   Loss 1.2358   LearningRate 0.0002   Epoch: 25   Global Step: 43600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:35:45,597-Speed 9314.98 samples/sec   Loss 1.2413   LearningRate 0.0002   Epoch: 25   Global Step: 43610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:36:12,121-Speed 9265.96 samples/sec   Loss 1.2393   LearningRate 0.0002   Epoch: 25   Global Step: 43620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:36:38,234-Speed 9411.81 samples/sec   Loss 1.2399   LearningRate 0.0002   Epoch: 25   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:37:04,366-Speed 9404.88 samples/sec   Loss 1.2329   LearningRate 0.0002   Epoch: 25   Global Step: 43640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:37:30,451-Speed 9422.28 samples/sec   Loss 1.2228   LearningRate 0.0002   Epoch: 25   Global Step: 43650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:37:56,593-Speed 9401.44 samples/sec   Loss 1.2336   LearningRate 0.0002   Epoch: 25   Global Step: 43660   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:38:22,850-Speed 9360.32 samples/sec   Loss 1.2401   LearningRate 0.0002   Epoch: 25   Global Step: 43670   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:38:49,065-Speed 9375.32 samples/sec   Loss 1.2424   LearningRate 0.0002   Epoch: 25   Global Step: 43680   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:39:15,306-Speed 9366.07 samples/sec   Loss 1.2302   LearningRate 0.0002   Epoch: 25   Global Step: 43690   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:39:41,471-Speed 9392.85 samples/sec   Loss 1.2449   LearningRate 0.0002   Epoch: 25   Global Step: 43700   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:40:07,698-Speed 9371.00 samples/sec   Loss 1.2385   LearningRate 0.0002   Epoch: 25   Global Step: 43710   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:40:33,868-Speed 9391.41 samples/sec   Loss 1.2387   LearningRate 0.0002   Epoch: 25   Global Step: 43720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:41:00,102-Speed 9368.30 samples/sec   Loss 1.2366   LearningRate 0.0002   Epoch: 25   Global Step: 43730   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:41:26,286-Speed 9386.45 samples/sec   Loss 1.2269   LearningRate 0.0002   Epoch: 25   Global Step: 43740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:41:52,515-Speed 9370.10 samples/sec   Loss 1.2382   LearningRate 0.0002   Epoch: 25   Global Step: 43750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-06 04:42:18,773-Speed 9360.15 samples/sec   Loss 1.2334   LearningRate 0.0002   Epoch: 25   Global Step: 43760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:42:45,007-Speed 9368.46 samples/sec   Loss 1.2314   LearningRate 0.0002   Epoch: 25   Global Step: 43770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:43:11,200-Speed 9382.93 samples/sec   Loss 1.2305   LearningRate 0.0002   Epoch: 25   Global Step: 43780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:43:37,499-Speed 9345.36 samples/sec   Loss 1.2387   LearningRate 0.0002   Epoch: 25   Global Step: 43790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:44:03,717-Speed 9374.24 samples/sec   Loss 1.2277   LearningRate 0.0002   Epoch: 25   Global Step: 43800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:44:29,953-Speed 9367.52 samples/sec   Loss 1.2201   LearningRate 0.0002   Epoch: 25   Global Step: 43810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:44:56,097-Speed 9400.86 samples/sec   Loss 1.2336   LearningRate 0.0002   Epoch: 25   Global Step: 43820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:45:22,215-Speed 9409.92 samples/sec   Loss 1.2256   LearningRate 0.0002   Epoch: 25   Global Step: 43830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:45:48,312-Speed 9417.47 samples/sec   Loss 1.2265   LearningRate 0.0002   Epoch: 25   Global Step: 43840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:46:14,725-Speed 9305.12 samples/sec   Loss 1.2324   LearningRate 0.0002   Epoch: 25   Global Step: 43850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-06 04:46:41,016-Speed 9348.14 samples/sec   Loss 1.2357   LearningRate 0.0002   Epoch: 25   Global Step: 43860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:47:07,268-Speed 9361.78 samples/sec   Loss 1.2283   LearningRate 0.0002   Epoch: 25   Global Step: 43870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:47:33,505-Speed 9368.14 samples/sec   Loss 1.2250   LearningRate 0.0002   Epoch: 25   Global Step: 43880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:47:59,808-Speed 9343.49 samples/sec   Loss 1.2300   LearningRate 0.0002   Epoch: 25   Global Step: 43890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:48:26,144-Speed 9332.17 samples/sec   Loss 1.2231   LearningRate 0.0002   Epoch: 25   Global Step: 43900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:48:52,355-Speed 9376.42 samples/sec   Loss 1.2304   LearningRate 0.0002   Epoch: 25   Global Step: 43910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:49:18,548-Speed 9383.23 samples/sec   Loss 1.2319   LearningRate 0.0002   Epoch: 25   Global Step: 43920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:49:44,744-Speed 9381.93 samples/sec   Loss 1.2255   LearningRate 0.0002   Epoch: 25   Global Step: 43930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:50:11,067-Speed 9336.70 samples/sec   Loss 1.2284   LearningRate 0.0002   Epoch: 25   Global Step: 43940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:50:37,277-Speed 9377.11 samples/sec   Loss 1.2341   LearningRate 0.0002   Epoch: 25   Global Step: 43950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-06 04:51:03,550-Speed 9355.74 samples/sec   Loss 1.2307   LearningRate 0.0002   Epoch: 25   Global Step: 43960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-06 04:51:29,880-Speed 9334.28 samples/sec   Loss 1.2231   LearningRate 0.0002   Epoch: 25   Global Step: 43970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-06 04:51:56,162-Speed 9351.39 samples/sec   Loss 1.2291   LearningRate 0.0002   Epoch: 25   Global Step: 43980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-06 04:52:22,277-Speed 9410.86 samples/sec   Loss 1.2306   LearningRate 0.0002   Epoch: 25   Global Step: 43990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-06 04:52:48,513-Speed 9367.60 samples/sec   Loss 1.2317   LearningRate 0.0002   Epoch: 25   Global Step: 44000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-06 04:53:14,911-Speed 9310.37 samples/sec   Loss 1.2230   LearningRate 0.0002   Epoch: 25   Global Step: 44010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 04:53:41,428-Speed 9268.37 samples/sec   Loss 1.2309   LearningRate 0.0002   Epoch: 25   Global Step: 44020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 04:54:07,995-Speed 9250.87 samples/sec   Loss 1.2307   LearningRate 0.0002   Epoch: 25   Global Step: 44030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 04:54:34,429-Speed 9297.65 samples/sec   Loss 1.2197   LearningRate 0.0002   Epoch: 25   Global Step: 44040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 04:55:00,722-Speed 9347.45 samples/sec   Loss 1.2251   LearningRate 0.0002   Epoch: 25   Global Step: 44050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 04:55:27,165-Speed 9294.72 samples/sec   Loss 1.2191   LearningRate 0.0002   Epoch: 25   Global Step: 44060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 04:55:53,472-Speed 9342.17 samples/sec   Loss 1.2126   LearningRate 0.0002   Epoch: 25   Global Step: 44070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:56:19,976-Speed 9272.95 samples/sec   Loss 1.2260   LearningRate 0.0002   Epoch: 25   Global Step: 44080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:56:46,545-Speed 9250.47 samples/sec   Loss 1.2303   LearningRate 0.0002   Epoch: 25   Global Step: 44090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:57:13,199-Speed 9220.52 samples/sec   Loss 1.2192   LearningRate 0.0002   Epoch: 25   Global Step: 44100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:57:39,701-Speed 9273.66 samples/sec   Loss 1.2167   LearningRate 0.0002   Epoch: 25   Global Step: 44110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:58:06,624-Speed 9128.95 samples/sec   Loss 1.2228   LearningRate 0.0002   Epoch: 25   Global Step: 44120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:58:33,209-Speed 9244.55 samples/sec   Loss 1.2217   LearningRate 0.0002   Epoch: 25   Global Step: 44130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:58:59,773-Speed 9251.82 samples/sec   Loss 1.2204   LearningRate 0.0002   Epoch: 25   Global Step: 44140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:59:26,274-Speed 9275.11 samples/sec   Loss 1.2153   LearningRate 0.0002   Epoch: 25   Global Step: 44150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 04:59:52,914-Speed 9225.45 samples/sec   Loss 1.2229   LearningRate 0.0002   Epoch: 25   Global Step: 44160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:00:19,478-Speed 9252.28 samples/sec   Loss 1.2040   LearningRate 0.0002   Epoch: 25   Global Step: 44170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:00:45,797-Speed 9338.03 samples/sec   Loss 1.2178   LearningRate 0.0002   Epoch: 25   Global Step: 44180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:01:12,010-Speed 9375.92 samples/sec   Loss 1.2145   LearningRate 0.0002   Epoch: 25   Global Step: 44190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:01:38,232-Speed 9372.63 samples/sec   Loss 1.2116   LearningRate 0.0002   Epoch: 25   Global Step: 44200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:02:04,491-Speed 9359.47 samples/sec   Loss 1.2192   LearningRate 0.0002   Epoch: 25   Global Step: 44210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:02:30,705-Speed 9375.45 samples/sec   Loss 1.2171   LearningRate 0.0002   Epoch: 25   Global Step: 44220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:02:56,884-Speed 9388.09 samples/sec   Loss 1.2062   LearningRate 0.0002   Epoch: 25   Global Step: 44230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:03:23,038-Speed 9397.01 samples/sec   Loss 1.2138   LearningRate 0.0002   Epoch: 25   Global Step: 44240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:03:49,179-Speed 9401.40 samples/sec   Loss 1.2131   LearningRate 0.0002   Epoch: 25   Global Step: 44250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:04:15,294-Speed 9411.17 samples/sec   Loss 1.2170   LearningRate 0.0002   Epoch: 25   Global Step: 44260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:04:41,336-Speed 9437.45 samples/sec   Loss 1.2246   LearningRate 0.0002   Epoch: 25   Global Step: 44270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:05:07,498-Speed 9394.29 samples/sec   Loss 1.2133   LearningRate 0.0002   Epoch: 25   Global Step: 44280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:05:33,659-Speed 9394.41 samples/sec   Loss 1.2167   LearningRate 0.0002   Epoch: 25   Global Step: 44290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:05:59,936-Speed 9352.98 samples/sec   Loss 1.2055   LearningRate 0.0002   Epoch: 25   Global Step: 44300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:06:26,147-Speed 9376.60 samples/sec   Loss 1.2014   LearningRate 0.0002   Epoch: 25   Global Step: 44310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:06:52,338-Speed 9383.78 samples/sec   Loss 1.2057   LearningRate 0.0002   Epoch: 25   Global Step: 44320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:07:18,446-Speed 9413.88 samples/sec   Loss 1.2176   LearningRate 0.0002   Epoch: 25   Global Step: 44330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:07:44,665-Speed 9373.37 samples/sec   Loss 1.2055   LearningRate 0.0002   Epoch: 25   Global Step: 44340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:08:10,910-Speed 9364.57 samples/sec   Loss 1.2076   LearningRate 0.0002   Epoch: 25   Global Step: 44350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:08:37,137-Speed 9370.94 samples/sec   Loss 1.2102   LearningRate 0.0002   Epoch: 25   Global Step: 44360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:09:03,378-Speed 9366.23 samples/sec   Loss 1.2094   LearningRate 0.0002   Epoch: 25   Global Step: 44370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:09:29,583-Speed 9378.81 samples/sec   Loss 1.2090   LearningRate 0.0002   Epoch: 25   Global Step: 44380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:09:55,828-Speed 9364.36 samples/sec   Loss 1.2080   LearningRate 0.0002   Epoch: 25   Global Step: 44390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:10:22,133-Speed 9342.88 samples/sec   Loss 1.2026   LearningRate 0.0002   Epoch: 25   Global Step: 44400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:10:48,516-Speed 9315.83 samples/sec   Loss 1.2050   LearningRate 0.0002   Epoch: 25   Global Step: 44410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:11:14,717-Speed 9379.99 samples/sec   Loss 1.2105   LearningRate 0.0002   Epoch: 25   Global Step: 44420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:11:41,109-Speed 9312.31 samples/sec   Loss 1.2093   LearningRate 0.0002   Epoch: 25   Global Step: 44430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:12:07,546-Speed 9296.56 samples/sec   Loss 1.2085   LearningRate 0.0002   Epoch: 25   Global Step: 44440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:12:33,998-Speed 9291.28 samples/sec   Loss 1.2048   LearningRate 0.0002   Epoch: 25   Global Step: 44450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:13:00,443-Speed 9293.42 samples/sec   Loss 1.2013   LearningRate 0.0002   Epoch: 25   Global Step: 44460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:13:26,667-Speed 9371.97 samples/sec   Loss 1.2125   LearningRate 0.0002   Epoch: 25   Global Step: 44470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:13:52,784-Speed 9411.33 samples/sec   Loss 1.2014   LearningRate 0.0002   Epoch: 25   Global Step: 44480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:14:18,905-Speed 9408.98 samples/sec   Loss 1.2052   LearningRate 0.0002   Epoch: 25   Global Step: 44490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:14:45,119-Speed 9375.69 samples/sec   Loss 1.2028   LearningRate 0.0002   Epoch: 25   Global Step: 44500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:15:11,277-Speed 9395.87 samples/sec   Loss 1.2066   LearningRate 0.0002   Epoch: 25   Global Step: 44510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:15:37,434-Speed 9395.58 samples/sec   Loss 1.2039   LearningRate 0.0002   Epoch: 25   Global Step: 44520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:16:03,570-Speed 9403.86 samples/sec   Loss 1.2096   LearningRate 0.0002   Epoch: 25   Global Step: 44530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:16:29,743-Speed 9390.16 samples/sec   Loss 1.2109   LearningRate 0.0002   Epoch: 25   Global Step: 44540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:16:55,903-Speed 9394.73 samples/sec   Loss 1.2053   LearningRate 0.0002   Epoch: 25   Global Step: 44550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:17:22,061-Speed 9395.68 samples/sec   Loss 1.2027   LearningRate 0.0002   Epoch: 25   Global Step: 44560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:17:48,194-Speed 9404.75 samples/sec   Loss 1.2081   LearningRate 0.0002   Epoch: 25   Global Step: 44570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:18:14,267-Speed 9426.09 samples/sec   Loss 1.2105   LearningRate 0.0002   Epoch: 25   Global Step: 44580   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:18:40,383-Speed 9410.90 samples/sec   Loss 1.2005   LearningRate 0.0002   Epoch: 25   Global Step: 44590   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:19:06,596-Speed 9375.90 samples/sec   Loss 1.2026   LearningRate 0.0002   Epoch: 25   Global Step: 44600   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:19:32,753-Speed 9395.79 samples/sec   Loss 1.1974   LearningRate 0.0002   Epoch: 25   Global Step: 44610   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:19:58,983-Speed 9370.19 samples/sec   Loss 1.2090   LearningRate 0.0002   Epoch: 25   Global Step: 44620   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:20:25,189-Speed 9378.44 samples/sec   Loss 1.2045   LearningRate 0.0002   Epoch: 25   Global Step: 44630   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:20:51,336-Speed 9399.39 samples/sec   Loss 1.2005   LearningRate 0.0002   Epoch: 25   Global Step: 44640   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:21:17,463-Speed 9406.95 samples/sec   Loss 1.1994   LearningRate 0.0002   Epoch: 25   Global Step: 44650   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:21:43,663-Speed 9380.24 samples/sec   Loss 1.2044   LearningRate 0.0002   Epoch: 25   Global Step: 44660   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:22:09,955-Speed 9348.05 samples/sec   Loss 1.2035   LearningRate 0.0002   Epoch: 25   Global Step: 44670   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-06 05:22:36,146-Speed 9383.69 samples/sec   Loss 1.1984   LearningRate 0.0002   Epoch: 25   Global Step: 44680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:23:02,558-Speed 9305.16 samples/sec   Loss 1.2034   LearningRate 0.0002   Epoch: 25   Global Step: 44690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:23:28,941-Speed 9315.22 samples/sec   Loss 1.1998   LearningRate 0.0002   Epoch: 25   Global Step: 44700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:23:55,376-Speed 9297.17 samples/sec   Loss 1.1953   LearningRate 0.0002   Epoch: 25   Global Step: 44710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:24:21,754-Speed 9317.51 samples/sec   Loss 1.1976   LearningRate 0.0002   Epoch: 25   Global Step: 44720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:24:48,152-Speed 9310.04 samples/sec   Loss 1.2084   LearningRate 0.0002   Epoch: 25   Global Step: 44730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:25:14,733-Speed 9246.05 samples/sec   Loss 1.2004   LearningRate 0.0002   Epoch: 25   Global Step: 44740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:25:41,120-Speed 9313.92 samples/sec   Loss 1.1959   LearningRate 0.0002   Epoch: 25   Global Step: 44750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:26:07,592-Speed 9284.45 samples/sec   Loss 1.1869   LearningRate 0.0002   Epoch: 25   Global Step: 44760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:26:34,103-Speed 9270.20 samples/sec   Loss 1.1935   LearningRate 0.0002   Epoch: 25   Global Step: 44770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:27:00,527-Speed 9301.15 samples/sec   Loss 1.1934   LearningRate 0.0002   Epoch: 25   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:27:27,101-Speed 9248.64 samples/sec   Loss 1.1980   LearningRate 0.0002   Epoch: 25   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:27:53,803-Speed 9204.43 samples/sec   Loss 1.2010   LearningRate 0.0002   Epoch: 25   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:28:20,407-Speed 9238.05 samples/sec   Loss 1.2046   LearningRate 0.0002   Epoch: 25   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:28:47,162-Speed 9185.83 samples/sec   Loss 1.2024   LearningRate 0.0002   Epoch: 25   Global Step: 44820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:29:13,428-Speed 9357.13 samples/sec   Loss 1.1991   LearningRate 0.0002   Epoch: 25   Global Step: 44830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:29:39,640-Speed 9376.32 samples/sec   Loss 1.1969   LearningRate 0.0002   Epoch: 25   Global Step: 44840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:30:05,834-Speed 9382.53 samples/sec   Loss 1.2062   LearningRate 0.0002   Epoch: 25   Global Step: 44850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:30:32,114-Speed 9352.07 samples/sec   Loss 1.2033   LearningRate 0.0002   Epoch: 25   Global Step: 44860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:30:58,373-Speed 9360.17 samples/sec   Loss 1.1939   LearningRate 0.0002   Epoch: 25   Global Step: 44870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:31:24,742-Speed 9320.35 samples/sec   Loss 1.2018   LearningRate 0.0002   Epoch: 25   Global Step: 44880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-06 05:31:51,272-Speed 9263.94 samples/sec   Loss 1.2131   LearningRate 0.0002   Epoch: 25   Global Step: 44890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:32:17,691-Speed 9302.94 samples/sec   Loss 1.2041   LearningRate 0.0002   Epoch: 25   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:32:44,166-Speed 9283.07 samples/sec   Loss 1.1942   LearningRate 0.0002   Epoch: 25   Global Step: 44910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:33:10,682-Speed 9268.54 samples/sec   Loss 1.2083   LearningRate 0.0002   Epoch: 25   Global Step: 44920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:33:37,284-Speed 9239.05 samples/sec   Loss 1.2012   LearningRate 0.0002   Epoch: 25   Global Step: 44930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:34:57,034-Speed 3081.65 samples/sec   Loss 1.1992   LearningRate 0.0002   Epoch: 26   Global Step: 44940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:35:22,987-Speed 9469.93 samples/sec   Loss 1.1890   LearningRate 0.0002   Epoch: 26   Global Step: 44950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:35:48,938-Speed 9470.61 samples/sec   Loss 1.1850   LearningRate 0.0002   Epoch: 26   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:36:15,036-Speed 9417.67 samples/sec   Loss 1.1781   LearningRate 0.0002   Epoch: 26   Global Step: 44970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:36:41,117-Speed 9423.44 samples/sec   Loss 1.1857   LearningRate 0.0002   Epoch: 26   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:37:07,219-Speed 9415.86 samples/sec   Loss 1.1748   LearningRate 0.0002   Epoch: 26   Global Step: 44990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-06 05:37:33,293-Speed 9425.67 samples/sec   Loss 1.1854   LearningRate 0.0002   Epoch: 26   Global Step: 45000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-06 05:37:59,340-Speed 9435.60 samples/sec   Loss 1.1899   LearningRate 0.0002   Epoch: 26   Global Step: 45010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:38:25,511-Speed 9391.10 samples/sec   Loss 1.1848   LearningRate 0.0002   Epoch: 26   Global Step: 45020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:38:51,586-Speed 9425.64 samples/sec   Loss 1.1719   LearningRate 0.0001   Epoch: 26   Global Step: 45030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:39:17,774-Speed 9385.01 samples/sec   Loss 1.1709   LearningRate 0.0001   Epoch: 26   Global Step: 45040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:39:43,863-Speed 9420.37 samples/sec   Loss 1.1753   LearningRate 0.0001   Epoch: 26   Global Step: 45050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:40:10,000-Speed 9404.12 samples/sec   Loss 1.1867   LearningRate 0.0001   Epoch: 26   Global Step: 45060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:40:36,171-Speed 9391.07 samples/sec   Loss 1.1839   LearningRate 0.0001   Epoch: 26   Global Step: 45070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:41:02,323-Speed 9397.75 samples/sec   Loss 1.1784   LearningRate 0.0001   Epoch: 26   Global Step: 45080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:41:28,462-Speed 9402.71 samples/sec   Loss 1.1842   LearningRate 0.0001   Epoch: 26   Global Step: 45090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:41:54,568-Speed 9414.18 samples/sec   Loss 1.1859   LearningRate 0.0001   Epoch: 26   Global Step: 45100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:42:20,642-Speed 9425.71 samples/sec   Loss 1.1720   LearningRate 0.0001   Epoch: 26   Global Step: 45110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:42:46,820-Speed 9388.48 samples/sec   Loss 1.1849   LearningRate 0.0001   Epoch: 26   Global Step: 45120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:43:13,077-Speed 9360.32 samples/sec   Loss 1.1830   LearningRate 0.0001   Epoch: 26   Global Step: 45130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:43:39,280-Speed 9379.20 samples/sec   Loss 1.1747   LearningRate 0.0001   Epoch: 26   Global Step: 45140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:44:05,551-Speed 9355.25 samples/sec   Loss 1.1814   LearningRate 0.0001   Epoch: 26   Global Step: 45150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:44:31,721-Speed 9391.12 samples/sec   Loss 1.1881   LearningRate 0.0001   Epoch: 26   Global Step: 45160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:44:57,870-Speed 9398.86 samples/sec   Loss 1.1787   LearningRate 0.0001   Epoch: 26   Global Step: 45170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:45:24,072-Speed 9380.02 samples/sec   Loss 1.1772   LearningRate 0.0001   Epoch: 26   Global Step: 45180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:45:50,298-Speed 9371.18 samples/sec   Loss 1.1848   LearningRate 0.0001   Epoch: 26   Global Step: 45190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:46:16,558-Speed 9359.15 samples/sec   Loss 1.1913   LearningRate 0.0001   Epoch: 26   Global Step: 45200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:46:42,865-Speed 9342.39 samples/sec   Loss 1.1842   LearningRate 0.0001   Epoch: 26   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:47:09,344-Speed 9281.86 samples/sec   Loss 1.1800   LearningRate 0.0001   Epoch: 26   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:47:35,814-Speed 9285.12 samples/sec   Loss 1.1846   LearningRate 0.0001   Epoch: 26   Global Step: 45230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:48:02,264-Speed 9291.69 samples/sec   Loss 1.1901   LearningRate 0.0001   Epoch: 26   Global Step: 45240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-06 05:48:28,534-Speed 9356.15 samples/sec   Loss 1.1803   LearningRate 0.0001   Epoch: 26   Global Step: 45250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:48:54,948-Speed 9304.33 samples/sec   Loss 1.1823   LearningRate 0.0001   Epoch: 26   Global Step: 45260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:49:21,291-Speed 9329.78 samples/sec   Loss 1.1883   LearningRate 0.0001   Epoch: 26   Global Step: 45270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:49:47,637-Speed 9328.62 samples/sec   Loss 1.1810   LearningRate 0.0001   Epoch: 26   Global Step: 45280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:50:13,984-Speed 9328.40 samples/sec   Loss 1.1814   LearningRate 0.0001   Epoch: 26   Global Step: 45290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:50:40,446-Speed 9287.55 samples/sec   Loss 1.1885   LearningRate 0.0001   Epoch: 26   Global Step: 45300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:51:06,746-Speed 9345.17 samples/sec   Loss 1.1771   LearningRate 0.0001   Epoch: 26   Global Step: 45310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:51:33,143-Speed 9310.39 samples/sec   Loss 1.1771   LearningRate 0.0001   Epoch: 26   Global Step: 45320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:51:59,528-Speed 9315.09 samples/sec   Loss 1.1766   LearningRate 0.0001   Epoch: 26   Global Step: 45330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-06 05:52:25,955-Speed 9299.87 samples/sec   Loss 1.1744   LearningRate 0.0001   Epoch: 26   Global Step: 45340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 05:52:52,388-Speed 9297.87 samples/sec   Loss 1.1800   LearningRate 0.0001   Epoch: 26   Global Step: 45350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:53:18,813-Speed 9300.73 samples/sec   Loss 1.1732   LearningRate 0.0001   Epoch: 26   Global Step: 45360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:53:45,367-Speed 9256.20 samples/sec   Loss 1.1780   LearningRate 0.0001   Epoch: 26   Global Step: 45370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:54:11,797-Speed 9298.99 samples/sec   Loss 1.1786   LearningRate 0.0001   Epoch: 26   Global Step: 45380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:54:38,313-Speed 9268.62 samples/sec   Loss 1.1803   LearningRate 0.0001   Epoch: 26   Global Step: 45390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:55:04,830-Speed 9268.48 samples/sec   Loss 1.1699   LearningRate 0.0001   Epoch: 26   Global Step: 45400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:55:31,477-Speed 9223.31 samples/sec   Loss 1.1836   LearningRate 0.0001   Epoch: 26   Global Step: 45410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:55:58,137-Speed 9218.87 samples/sec   Loss 1.1758   LearningRate 0.0001   Epoch: 26   Global Step: 45420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:56:24,772-Speed 9227.23 samples/sec   Loss 1.1693   LearningRate 0.0001   Epoch: 26   Global Step: 45430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 05:56:51,368-Speed 9240.99 samples/sec   Loss 1.1771   LearningRate 0.0001   Epoch: 26   Global Step: 45440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 05:57:17,818-Speed 9291.91 samples/sec   Loss 1.1766   LearningRate 0.0001   Epoch: 26   Global Step: 45450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 05:57:44,320-Speed 9273.71 samples/sec   Loss 1.1670   LearningRate 0.0001   Epoch: 26   Global Step: 45460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 05:58:10,791-Speed 9284.73 samples/sec   Loss 1.1733   LearningRate 0.0001   Epoch: 26   Global Step: 45470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 05:58:37,232-Speed 9295.04 samples/sec   Loss 1.1750   LearningRate 0.0001   Epoch: 26   Global Step: 45480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 05:59:03,526-Speed 9347.02 samples/sec   Loss 1.1701   LearningRate 0.0001   Epoch: 26   Global Step: 45490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 05:59:29,760-Speed 9368.59 samples/sec   Loss 1.1731   LearningRate 0.0001   Epoch: 26   Global Step: 45500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 05:59:56,021-Speed 9359.02 samples/sec   Loss 1.1718   LearningRate 0.0001   Epoch: 26   Global Step: 45510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:00:22,210-Speed 9384.09 samples/sec   Loss 1.1797   LearningRate 0.0001   Epoch: 26   Global Step: 45520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:00:48,531-Speed 9337.48 samples/sec   Loss 1.1693   LearningRate 0.0001   Epoch: 26   Global Step: 45530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:01:14,776-Speed 9364.56 samples/sec   Loss 1.1717   LearningRate 0.0001   Epoch: 26   Global Step: 45540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:01:41,039-Speed 9357.97 samples/sec   Loss 1.1717   LearningRate 0.0001   Epoch: 26   Global Step: 45550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:02:07,267-Speed 9370.90 samples/sec   Loss 1.1755   LearningRate 0.0001   Epoch: 26   Global Step: 45560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:02:33,467-Speed 9380.33 samples/sec   Loss 1.1767   LearningRate 0.0001   Epoch: 26   Global Step: 45570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:02:59,670-Speed 9379.39 samples/sec   Loss 1.1683   LearningRate 0.0001   Epoch: 26   Global Step: 45580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:03:25,966-Speed 9346.58 samples/sec   Loss 1.1724   LearningRate 0.0001   Epoch: 26   Global Step: 45590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:03:54,394-Speed 8645.40 samples/sec   Loss 1.1694   LearningRate 0.0001   Epoch: 26   Global Step: 45600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:04:20,536-Speed 9401.46 samples/sec   Loss 1.1725   LearningRate 0.0001   Epoch: 26   Global Step: 45610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:04:46,744-Speed 9377.71 samples/sec   Loss 1.1791   LearningRate 0.0001   Epoch: 26   Global Step: 45620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:05:13,034-Speed 9348.03 samples/sec   Loss 1.1730   LearningRate 0.0001   Epoch: 26   Global Step: 45630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:05:39,277-Speed 9365.29 samples/sec   Loss 1.1710   LearningRate 0.0001   Epoch: 26   Global Step: 45640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:06:05,538-Speed 9358.73 samples/sec   Loss 1.1597   LearningRate 0.0001   Epoch: 26   Global Step: 45650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:06:31,733-Speed 9382.47 samples/sec   Loss 1.1729   LearningRate 0.0001   Epoch: 26   Global Step: 45660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:06:57,936-Speed 9379.31 samples/sec   Loss 1.1800   LearningRate 0.0001   Epoch: 26   Global Step: 45670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:07:24,061-Speed 9407.51 samples/sec   Loss 1.1565   LearningRate 0.0001   Epoch: 26   Global Step: 45680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:07:50,276-Speed 9375.52 samples/sec   Loss 1.1710   LearningRate 0.0001   Epoch: 26   Global Step: 45690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:08:16,552-Speed 9353.33 samples/sec   Loss 1.1655   LearningRate 0.0001   Epoch: 26   Global Step: 45700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:08:42,726-Speed 9390.26 samples/sec   Loss 1.1767   LearningRate 0.0001   Epoch: 26   Global Step: 45710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:09:08,935-Speed 9377.41 samples/sec   Loss 1.1602   LearningRate 0.0001   Epoch: 26   Global Step: 45720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:09:35,120-Speed 9385.71 samples/sec   Loss 1.1653   LearningRate 0.0001   Epoch: 26   Global Step: 45730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:10:01,223-Speed 9415.39 samples/sec   Loss 1.1663   LearningRate 0.0001   Epoch: 26   Global Step: 45740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:10:27,419-Speed 9381.94 samples/sec   Loss 1.1600   LearningRate 0.0001   Epoch: 26   Global Step: 45750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:10:53,571-Speed 9397.87 samples/sec   Loss 1.1559   LearningRate 0.0001   Epoch: 26   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:11:19,713-Speed 9401.38 samples/sec   Loss 1.1730   LearningRate 0.0001   Epoch: 26   Global Step: 45770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:11:45,928-Speed 9375.15 samples/sec   Loss 1.1716   LearningRate 0.0001   Epoch: 26   Global Step: 45780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:12:12,151-Speed 9372.51 samples/sec   Loss 1.1659   LearningRate 0.0001   Epoch: 26   Global Step: 45790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:12:38,293-Speed 9401.20 samples/sec   Loss 1.1662   LearningRate 0.0001   Epoch: 26   Global Step: 45800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:13:04,544-Speed 9362.27 samples/sec   Loss 1.1696   LearningRate 0.0001   Epoch: 26   Global Step: 45810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:13:30,929-Speed 9314.81 samples/sec   Loss 1.1603   LearningRate 0.0001   Epoch: 26   Global Step: 45820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:13:57,174-Speed 9364.43 samples/sec   Loss 1.1519   LearningRate 0.0001   Epoch: 26   Global Step: 45830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:14:23,333-Speed 9395.08 samples/sec   Loss 1.1555   LearningRate 0.0001   Epoch: 26   Global Step: 45840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:14:49,511-Speed 9388.27 samples/sec   Loss 1.1507   LearningRate 0.0001   Epoch: 26   Global Step: 45850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:15:15,693-Speed 9387.16 samples/sec   Loss 1.1623   LearningRate 0.0001   Epoch: 26   Global Step: 45860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:15:41,815-Speed 9408.54 samples/sec   Loss 1.1622   LearningRate 0.0001   Epoch: 26   Global Step: 45870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:16:07,992-Speed 9388.56 samples/sec   Loss 1.1520   LearningRate 0.0001   Epoch: 26   Global Step: 45880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:16:34,191-Speed 9380.96 samples/sec   Loss 1.1616   LearningRate 0.0001   Epoch: 26   Global Step: 45890   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:17:00,410-Speed 9373.78 samples/sec   Loss 1.1675   LearningRate 0.0001   Epoch: 26   Global Step: 45900   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:17:26,825-Speed 9304.46 samples/sec   Loss 1.1581   LearningRate 0.0001   Epoch: 26   Global Step: 45910   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:17:53,037-Speed 9376.20 samples/sec   Loss 1.1569   LearningRate 0.0001   Epoch: 26   Global Step: 45920   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:18:19,396-Speed 9324.10 samples/sec   Loss 1.1582   LearningRate 0.0001   Epoch: 26   Global Step: 45930   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:18:45,746-Speed 9326.92 samples/sec   Loss 1.1477   LearningRate 0.0001   Epoch: 26   Global Step: 45940   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:19:11,936-Speed 9384.08 samples/sec   Loss 1.1666   LearningRate 0.0001   Epoch: 26   Global Step: 45950   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:19:38,129-Speed 9383.45 samples/sec   Loss 1.1538   LearningRate 0.0001   Epoch: 26   Global Step: 45960   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:20:04,426-Speed 9345.72 samples/sec   Loss 1.1518   LearningRate 0.0001   Epoch: 26   Global Step: 45970   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:20:30,677-Speed 9362.76 samples/sec   Loss 1.1575   LearningRate 0.0001   Epoch: 26   Global Step: 45980   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-06 06:20:56,818-Speed 9401.58 samples/sec   Loss 1.1571   LearningRate 0.0001   Epoch: 26   Global Step: 45990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:21:23,025-Speed 9378.38 samples/sec   Loss 1.1661   LearningRate 0.0001   Epoch: 26   Global Step: 46000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:21:49,227-Speed 9379.47 samples/sec   Loss 1.1571   LearningRate 0.0001   Epoch: 26   Global Step: 46010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:22:15,486-Speed 9359.50 samples/sec   Loss 1.1591   LearningRate 0.0001   Epoch: 26   Global Step: 46020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:22:41,678-Speed 9383.49 samples/sec   Loss 1.1481   LearningRate 0.0001   Epoch: 26   Global Step: 46030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:23:07,890-Speed 9376.34 samples/sec   Loss 1.1507   LearningRate 0.0001   Epoch: 26   Global Step: 46040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:23:34,054-Speed 9393.43 samples/sec   Loss 1.1451   LearningRate 0.0001   Epoch: 26   Global Step: 46050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:24:00,311-Speed 9360.09 samples/sec   Loss 1.1555   LearningRate 0.0001   Epoch: 26   Global Step: 46060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:24:26,494-Speed 9386.86 samples/sec   Loss 1.1554   LearningRate 0.0001   Epoch: 26   Global Step: 46070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:24:52,691-Speed 9381.49 samples/sec   Loss 1.1566   LearningRate 0.0001   Epoch: 26   Global Step: 46080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:25:19,068-Speed 9317.27 samples/sec   Loss 1.1469   LearningRate 0.0001   Epoch: 26   Global Step: 46090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:25:45,561-Speed 9276.79 samples/sec   Loss 1.1449   LearningRate 0.0001   Epoch: 26   Global Step: 46100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:26:11,893-Speed 9333.49 samples/sec   Loss 1.1520   LearningRate 0.0001   Epoch: 26   Global Step: 46110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:26:38,243-Speed 9327.31 samples/sec   Loss 1.1483   LearningRate 0.0001   Epoch: 26   Global Step: 46120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:27:04,642-Speed 9309.79 samples/sec   Loss 1.1470   LearningRate 0.0001   Epoch: 26   Global Step: 46130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:27:30,969-Speed 9335.11 samples/sec   Loss 1.1533   LearningRate 0.0001   Epoch: 26   Global Step: 46140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:27:57,289-Speed 9338.11 samples/sec   Loss 1.1547   LearningRate 0.0001   Epoch: 26   Global Step: 46150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:28:23,567-Speed 9352.65 samples/sec   Loss 1.1548   LearningRate 0.0001   Epoch: 26   Global Step: 46160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:28:49,976-Speed 9306.09 samples/sec   Loss 1.1440   LearningRate 0.0001   Epoch: 26   Global Step: 46170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:29:16,233-Speed 9360.24 samples/sec   Loss 1.1472   LearningRate 0.0001   Epoch: 26   Global Step: 46180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:29:42,560-Speed 9335.47 samples/sec   Loss 1.1561   LearningRate 0.0001   Epoch: 26   Global Step: 46190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-06 06:30:08,772-Speed 9376.61 samples/sec   Loss 1.1494   LearningRate 0.0001   Epoch: 26   Global Step: 46200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:30:34,978-Speed 9378.51 samples/sec   Loss 1.1463   LearningRate 0.0001   Epoch: 26   Global Step: 46210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:31:01,182-Speed 9378.90 samples/sec   Loss 1.1494   LearningRate 0.0001   Epoch: 26   Global Step: 46220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:31:27,417-Speed 9368.06 samples/sec   Loss 1.1475   LearningRate 0.0001   Epoch: 26   Global Step: 46230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:31:53,716-Speed 9345.46 samples/sec   Loss 1.1538   LearningRate 0.0001   Epoch: 26   Global Step: 46240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:32:19,987-Speed 9355.24 samples/sec   Loss 1.1528   LearningRate 0.0001   Epoch: 26   Global Step: 46250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:32:46,195-Speed 9377.72 samples/sec   Loss 1.1598   LearningRate 0.0001   Epoch: 26   Global Step: 46260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:33:12,470-Speed 9353.72 samples/sec   Loss 1.1511   LearningRate 0.0001   Epoch: 26   Global Step: 46270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:33:38,681-Speed 9376.67 samples/sec   Loss 1.1420   LearningRate 0.0001   Epoch: 26   Global Step: 46280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:34:04,898-Speed 9374.31 samples/sec   Loss 1.1436   LearningRate 0.0001   Epoch: 26   Global Step: 46290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:34:31,053-Speed 9396.78 samples/sec   Loss 1.1486   LearningRate 0.0001   Epoch: 26   Global Step: 46300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:34:57,199-Speed 9399.96 samples/sec   Loss 1.1501   LearningRate 0.0001   Epoch: 26   Global Step: 46310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:35:23,395-Speed 9382.13 samples/sec   Loss 1.1469   LearningRate 0.0001   Epoch: 26   Global Step: 46320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:35:49,705-Speed 9341.17 samples/sec   Loss 1.1368   LearningRate 0.0001   Epoch: 26   Global Step: 46330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:36:15,879-Speed 9389.89 samples/sec   Loss 1.1471   LearningRate 0.0001   Epoch: 26   Global Step: 46340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:36:42,066-Speed 9385.47 samples/sec   Loss 1.1441   LearningRate 0.0001   Epoch: 26   Global Step: 46350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:37:08,309-Speed 9365.04 samples/sec   Loss 1.1474   LearningRate 0.0001   Epoch: 26   Global Step: 46360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:37:34,523-Speed 9375.30 samples/sec   Loss 1.1444   LearningRate 0.0001   Epoch: 26   Global Step: 46370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:38:00,752-Speed 9370.30 samples/sec   Loss 1.1457   LearningRate 0.0001   Epoch: 26   Global Step: 46380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:38:26,892-Speed 9402.00 samples/sec   Loss 1.1518   LearningRate 0.0001   Epoch: 26   Global Step: 46390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:38:52,987-Speed 9418.11 samples/sec   Loss 1.1434   LearningRate 0.0001   Epoch: 26   Global Step: 46400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-06 06:39:19,093-Speed 9414.56 samples/sec   Loss 1.1488   LearningRate 0.0001   Epoch: 26   Global Step: 46410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:39:45,280-Speed 9385.36 samples/sec   Loss 1.1388   LearningRate 0.0001   Epoch: 26   Global Step: 46420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:40:11,500-Speed 9373.62 samples/sec   Loss 1.1491   LearningRate 0.0001   Epoch: 26   Global Step: 46430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:40:37,649-Speed 9399.01 samples/sec   Loss 1.1515   LearningRate 0.0001   Epoch: 26   Global Step: 46440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:41:03,956-Speed 9342.02 samples/sec   Loss 1.1407   LearningRate 0.0001   Epoch: 26   Global Step: 46450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:41:30,149-Speed 9383.35 samples/sec   Loss 1.1484   LearningRate 0.0001   Epoch: 26   Global Step: 46460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:41:56,417-Speed 9356.36 samples/sec   Loss 1.1434   LearningRate 0.0001   Epoch: 26   Global Step: 46470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:42:22,652-Speed 9368.14 samples/sec   Loss 1.1332   LearningRate 0.0001   Epoch: 26   Global Step: 46480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:42:48,858-Speed 9378.19 samples/sec   Loss 1.1453   LearningRate 0.0001   Epoch: 26   Global Step: 46490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:43:15,091-Speed 9368.97 samples/sec   Loss 1.1393   LearningRate 0.0001   Epoch: 26   Global Step: 46500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:43:41,221-Speed 9405.54 samples/sec   Loss 1.1356   LearningRate 0.0001   Epoch: 26   Global Step: 46510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:44:07,462-Speed 9365.96 samples/sec   Loss 1.1430   LearningRate 0.0001   Epoch: 26   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:44:33,823-Speed 9323.23 samples/sec   Loss 1.1356   LearningRate 0.0001   Epoch: 26   Global Step: 46530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:45:00,001-Speed 9388.16 samples/sec   Loss 1.1414   LearningRate 0.0001   Epoch: 26   Global Step: 46540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:45:26,162-Speed 9394.50 samples/sec   Loss 1.1504   LearningRate 0.0001   Epoch: 26   Global Step: 46550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:45:52,356-Speed 9382.87 samples/sec   Loss 1.1387   LearningRate 0.0001   Epoch: 26   Global Step: 46560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:46:18,600-Speed 9364.90 samples/sec   Loss 1.1373   LearningRate 0.0001   Epoch: 26   Global Step: 46570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:46:44,777-Speed 9388.82 samples/sec   Loss 1.1463   LearningRate 0.0001   Epoch: 26   Global Step: 46580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:47:11,029-Speed 9361.84 samples/sec   Loss 1.1415   LearningRate 0.0001   Epoch: 26   Global Step: 46590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:47:37,257-Speed 9370.53 samples/sec   Loss 1.1449   LearningRate 0.0001   Epoch: 26   Global Step: 46600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:48:03,443-Speed 9385.43 samples/sec   Loss 1.1509   LearningRate 0.0001   Epoch: 26   Global Step: 46610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:48:29,785-Speed 9330.27 samples/sec   Loss 1.1393   LearningRate 0.0001   Epoch: 26   Global Step: 46620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:48:56,098-Speed 9340.18 samples/sec   Loss 1.1403   LearningRate 0.0001   Epoch: 26   Global Step: 46630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:49:22,291-Speed 9383.09 samples/sec   Loss 1.1473   LearningRate 0.0001   Epoch: 26   Global Step: 46640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-06 06:49:48,619-Speed 9334.68 samples/sec   Loss 1.1502   LearningRate 0.0001   Epoch: 26   Global Step: 46650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:50:14,957-Speed 9331.38 samples/sec   Loss 1.1462   LearningRate 0.0001   Epoch: 26   Global Step: 46660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:51:34,869-Speed 3075.44 samples/sec   Loss 1.1319   LearningRate 0.0001   Epoch: 27   Global Step: 46670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:52:00,844-Speed 9461.70 samples/sec   Loss 1.1222   LearningRate 0.0001   Epoch: 27   Global Step: 46680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:52:26,860-Speed 9447.16 samples/sec   Loss 1.1239   LearningRate 0.0001   Epoch: 27   Global Step: 46690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-06 06:52:52,976-Speed 9410.86 samples/sec   Loss 1.1236   LearningRate 0.0001   Epoch: 27   Global Step: 46700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 06:53:19,040-Speed 9429.63 samples/sec   Loss 1.1236   LearningRate 0.0001   Epoch: 27   Global Step: 46710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:53:45,068-Speed 9442.33 samples/sec   Loss 1.1253   LearningRate 0.0001   Epoch: 27   Global Step: 46720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:54:11,192-Speed 9408.08 samples/sec   Loss 1.1304   LearningRate 0.0001   Epoch: 27   Global Step: 46730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:54:37,373-Speed 9387.33 samples/sec   Loss 1.1274   LearningRate 0.0001   Epoch: 27   Global Step: 46740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:55:03,591-Speed 9374.58 samples/sec   Loss 1.1273   LearningRate 0.0001   Epoch: 27   Global Step: 46750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:55:29,949-Speed 9324.17 samples/sec   Loss 1.1197   LearningRate 0.0001   Epoch: 27   Global Step: 46760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:55:56,336-Speed 9314.20 samples/sec   Loss 1.1229   LearningRate 0.0001   Epoch: 27   Global Step: 46770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:56:22,591-Speed 9360.62 samples/sec   Loss 1.1289   LearningRate 0.0001   Epoch: 27   Global Step: 46780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:56:48,955-Speed 9322.11 samples/sec   Loss 1.1256   LearningRate 0.0001   Epoch: 27   Global Step: 46790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:57:15,303-Speed 9328.16 samples/sec   Loss 1.1247   LearningRate 0.0001   Epoch: 27   Global Step: 46800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 06:57:41,646-Speed 9329.37 samples/sec   Loss 1.1234   LearningRate 0.0001   Epoch: 27   Global Step: 46810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 06:58:07,957-Speed 9340.92 samples/sec   Loss 1.1338   LearningRate 0.0001   Epoch: 27   Global Step: 46820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 06:58:34,141-Speed 9386.62 samples/sec   Loss 1.1229   LearningRate 0.0001   Epoch: 27   Global Step: 46830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 06:59:00,532-Speed 9312.63 samples/sec   Loss 1.1205   LearningRate 0.0001   Epoch: 27   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 06:59:26,861-Speed 9334.53 samples/sec   Loss 1.1222   LearningRate 0.0001   Epoch: 27   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 06:59:53,244-Speed 9315.42 samples/sec   Loss 1.1284   LearningRate 0.0001   Epoch: 27   Global Step: 46860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:00:19,511-Speed 9356.49 samples/sec   Loss 1.1232   LearningRate 0.0001   Epoch: 27   Global Step: 46870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:00:45,752-Speed 9366.16 samples/sec   Loss 1.1343   LearningRate 0.0001   Epoch: 27   Global Step: 46880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:01:11,917-Speed 9393.05 samples/sec   Loss 1.1312   LearningRate 0.0001   Epoch: 27   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:01:38,082-Speed 9392.70 samples/sec   Loss 1.1254   LearningRate 0.0001   Epoch: 27   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:02:04,339-Speed 9360.27 samples/sec   Loss 1.1252   LearningRate 0.0001   Epoch: 27   Global Step: 46910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:02:30,496-Speed 9396.73 samples/sec   Loss 1.1291   LearningRate 0.0001   Epoch: 27   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:02:56,735-Speed 9366.49 samples/sec   Loss 1.1225   LearningRate 0.0001   Epoch: 27   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:03:23,036-Speed 9344.81 samples/sec   Loss 1.1233   LearningRate 0.0001   Epoch: 27   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:03:49,277-Speed 9365.77 samples/sec   Loss 1.1286   LearningRate 0.0001   Epoch: 27   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:04:15,696-Speed 9302.76 samples/sec   Loss 1.1265   LearningRate 0.0001   Epoch: 27   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:04:41,955-Speed 9359.70 samples/sec   Loss 1.1252   LearningRate 0.0001   Epoch: 27   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:05:08,447-Speed 9277.04 samples/sec   Loss 1.1239   LearningRate 0.0001   Epoch: 27   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:05:34,809-Speed 9322.65 samples/sec   Loss 1.1253   LearningRate 0.0001   Epoch: 27   Global Step: 46990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:06:01,143-Speed 9332.93 samples/sec   Loss 1.1219   LearningRate 0.0001   Epoch: 27   Global Step: 47000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:06:27,538-Speed 9311.02 samples/sec   Loss 1.1223   LearningRate 0.0001   Epoch: 27   Global Step: 47010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:06:53,812-Speed 9354.31 samples/sec   Loss 1.1291   LearningRate 0.0001   Epoch: 27   Global Step: 47020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:07:20,114-Speed 9343.93 samples/sec   Loss 1.1236   LearningRate 0.0001   Epoch: 27   Global Step: 47030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:07:46,447-Speed 9333.35 samples/sec   Loss 1.1292   LearningRate 0.0001   Epoch: 27   Global Step: 47040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:08:12,679-Speed 9369.63 samples/sec   Loss 1.1311   LearningRate 0.0001   Epoch: 27   Global Step: 47050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:08:38,891-Speed 9376.02 samples/sec   Loss 1.1243   LearningRate 0.0001   Epoch: 27   Global Step: 47060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:09:05,145-Speed 9361.50 samples/sec   Loss 1.1264   LearningRate 0.0001   Epoch: 27   Global Step: 47070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:09:31,536-Speed 9312.42 samples/sec   Loss 1.1299   LearningRate 0.0001   Epoch: 27   Global Step: 47080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:09:57,856-Speed 9337.96 samples/sec   Loss 1.1230   LearningRate 0.0001   Epoch: 27   Global Step: 47090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:10:29,416-Speed 7787.45 samples/sec   Loss 1.1284   LearningRate 0.0001   Epoch: 27   Global Step: 47100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:10:55,694-Speed 9352.60 samples/sec   Loss 1.1356   LearningRate 0.0001   Epoch: 27   Global Step: 47110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:11:21,903-Speed 9377.20 samples/sec   Loss 1.1209   LearningRate 0.0001   Epoch: 27   Global Step: 47120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:11:48,328-Speed 9300.68 samples/sec   Loss 1.1194   LearningRate 0.0001   Epoch: 27   Global Step: 47130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:12:14,797-Speed 9285.14 samples/sec   Loss 1.1127   LearningRate 0.0001   Epoch: 27   Global Step: 47140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:12:41,328-Speed 9263.44 samples/sec   Loss 1.1150   LearningRate 0.0001   Epoch: 27   Global Step: 47150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:13:08,006-Speed 9212.76 samples/sec   Loss 1.1271   LearningRate 0.0001   Epoch: 27   Global Step: 47160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:13:34,453-Speed 9292.83 samples/sec   Loss 1.1266   LearningRate 0.0001   Epoch: 27   Global Step: 47170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:14:00,955-Speed 9273.68 samples/sec   Loss 1.1278   LearningRate 0.0001   Epoch: 27   Global Step: 47180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:14:27,369-Speed 9304.49 samples/sec   Loss 1.1168   LearningRate 0.0001   Epoch: 27   Global Step: 47190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:14:53,835-Speed 9286.83 samples/sec   Loss 1.1111   LearningRate 0.0001   Epoch: 27   Global Step: 47200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:15:20,388-Speed 9255.89 samples/sec   Loss 1.1184   LearningRate 0.0001   Epoch: 27   Global Step: 47210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:15:46,840-Speed 9291.10 samples/sec   Loss 1.1203   LearningRate 0.0001   Epoch: 27   Global Step: 47220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:16:13,441-Speed 9239.35 samples/sec   Loss 1.1168   LearningRate 0.0001   Epoch: 27   Global Step: 47230   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:16:39,932-Speed 9277.60 samples/sec   Loss 1.1141   LearningRate 0.0001   Epoch: 27   Global Step: 47240   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:17:06,595-Speed 9217.50 samples/sec   Loss 1.1122   LearningRate 0.0001   Epoch: 27   Global Step: 47250   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:17:33,358-Speed 9182.99 samples/sec   Loss 1.1126   LearningRate 0.0001   Epoch: 27   Global Step: 47260   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:17:59,802-Speed 9294.19 samples/sec   Loss 1.1213   LearningRate 0.0001   Epoch: 27   Global Step: 47270   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:18:26,574-Speed 9180.03 samples/sec   Loss 1.1117   LearningRate 0.0001   Epoch: 27   Global Step: 47280   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:18:53,252-Speed 9212.19 samples/sec   Loss 1.1112   LearningRate 0.0001   Epoch: 27   Global Step: 47290   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:19:19,739-Speed 9279.13 samples/sec   Loss 1.1243   LearningRate 0.0001   Epoch: 27   Global Step: 47300   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:19:46,200-Speed 9287.82 samples/sec   Loss 1.1142   LearningRate 0.0001   Epoch: 27   Global Step: 47310   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:20:12,791-Speed 9242.59 samples/sec   Loss 1.1238   LearningRate 0.0001   Epoch: 27   Global Step: 47320   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-06 07:20:39,432-Speed 9225.27 samples/sec   Loss 1.1212   LearningRate 0.0001   Epoch: 27   Global Step: 47330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:21:06,167-Speed 9192.79 samples/sec   Loss 1.1099   LearningRate 0.0001   Epoch: 27   Global Step: 47340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:21:32,969-Speed 9171.32 samples/sec   Loss 1.1197   LearningRate 0.0001   Epoch: 27   Global Step: 47350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:21:59,760-Speed 9173.28 samples/sec   Loss 1.1196   LearningRate 0.0001   Epoch: 27   Global Step: 47360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:22:26,433-Speed 9214.44 samples/sec   Loss 1.1115   LearningRate 0.0001   Epoch: 27   Global Step: 47370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:22:53,094-Speed 9218.23 samples/sec   Loss 1.1061   LearningRate 0.0001   Epoch: 27   Global Step: 47380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:23:19,722-Speed 9229.88 samples/sec   Loss 1.1073   LearningRate 0.0001   Epoch: 27   Global Step: 47390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:23:46,256-Speed 9262.45 samples/sec   Loss 1.1167   LearningRate 0.0001   Epoch: 27   Global Step: 47400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:24:12,823-Speed 9250.75 samples/sec   Loss 1.1140   LearningRate 0.0001   Epoch: 27   Global Step: 47410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:24:39,474-Speed 9221.99 samples/sec   Loss 1.1216   LearningRate 0.0001   Epoch: 27   Global Step: 47420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:25:05,940-Speed 9286.41 samples/sec   Loss 1.1128   LearningRate 0.0001   Epoch: 27   Global Step: 47430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:25:32,669-Speed 9195.23 samples/sec   Loss 1.1116   LearningRate 0.0001   Epoch: 27   Global Step: 47440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:25:59,164-Speed 9276.07 samples/sec   Loss 1.1125   LearningRate 0.0001   Epoch: 27   Global Step: 47450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:26:25,802-Speed 9226.15 samples/sec   Loss 1.1125   LearningRate 0.0001   Epoch: 27   Global Step: 47460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:26:52,341-Speed 9261.73 samples/sec   Loss 1.1014   LearningRate 0.0001   Epoch: 27   Global Step: 47470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:27:19,071-Speed 9194.45 samples/sec   Loss 1.1167   LearningRate 0.0001   Epoch: 27   Global Step: 47480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:27:45,859-Speed 9174.86 samples/sec   Loss 1.1104   LearningRate 0.0001   Epoch: 27   Global Step: 47490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:28:12,495-Speed 9226.95 samples/sec   Loss 1.1115   LearningRate 0.0001   Epoch: 27   Global Step: 47500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:28:39,084-Speed 9243.23 samples/sec   Loss 1.1131   LearningRate 0.0001   Epoch: 27   Global Step: 47510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:29:05,667-Speed 9245.65 samples/sec   Loss 1.1051   LearningRate 0.0001   Epoch: 27   Global Step: 47520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:29:32,139-Speed 9284.16 samples/sec   Loss 1.1145   LearningRate 0.0001   Epoch: 27   Global Step: 47530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:29:58,939-Speed 9170.35 samples/sec   Loss 1.1052   LearningRate 0.0001   Epoch: 27   Global Step: 47540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:30:25,562-Speed 9231.78 samples/sec   Loss 1.1130   LearningRate 0.0001   Epoch: 27   Global Step: 47550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:30:52,340-Speed 9178.20 samples/sec   Loss 1.1074   LearningRate 0.0001   Epoch: 27   Global Step: 47560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:31:18,981-Speed 9225.51 samples/sec   Loss 1.1061   LearningRate 0.0001   Epoch: 27   Global Step: 47570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:31:45,726-Speed 9189.43 samples/sec   Loss 1.1071   LearningRate 0.0001   Epoch: 27   Global Step: 47580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:32:12,341-Speed 9234.25 samples/sec   Loss 1.1070   LearningRate 0.0001   Epoch: 27   Global Step: 47590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:32:38,920-Speed 9246.81 samples/sec   Loss 1.1067   LearningRate 0.0001   Epoch: 27   Global Step: 47600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:33:05,464-Speed 9258.99 samples/sec   Loss 1.1106   LearningRate 0.0001   Epoch: 27   Global Step: 47610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:33:32,189-Speed 9196.19 samples/sec   Loss 1.1022   LearningRate 0.0001   Epoch: 27   Global Step: 47620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:33:58,854-Speed 9217.01 samples/sec   Loss 1.1029   LearningRate 0.0001   Epoch: 27   Global Step: 47630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:34:25,568-Speed 9199.95 samples/sec   Loss 1.1019   LearningRate 0.0001   Epoch: 27   Global Step: 47640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:34:52,173-Speed 9237.85 samples/sec   Loss 1.0928   LearningRate 0.0001   Epoch: 27   Global Step: 47650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:35:19,053-Speed 9143.50 samples/sec   Loss 1.0953   LearningRate 0.0001   Epoch: 27   Global Step: 47660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:35:45,692-Speed 9225.71 samples/sec   Loss 1.1118   LearningRate 0.0001   Epoch: 27   Global Step: 47670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:36:12,383-Speed 9208.21 samples/sec   Loss 1.0955   LearningRate 0.0001   Epoch: 27   Global Step: 47680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:36:38,947-Speed 9252.11 samples/sec   Loss 1.0963   LearningRate 0.0001   Epoch: 27   Global Step: 47690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:37:05,484-Speed 9261.21 samples/sec   Loss 1.1044   LearningRate 0.0001   Epoch: 27   Global Step: 47700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:37:32,072-Speed 9243.86 samples/sec   Loss 1.1098   LearningRate 0.0001   Epoch: 27   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:37:58,662-Speed 9243.12 samples/sec   Loss 1.1021   LearningRate 0.0001   Epoch: 27   Global Step: 47720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:38:25,246-Speed 9244.95 samples/sec   Loss 1.1008   LearningRate 0.0001   Epoch: 27   Global Step: 47730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:38:51,763-Speed 9268.34 samples/sec   Loss 1.0999   LearningRate 0.0001   Epoch: 27   Global Step: 47740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:39:18,214-Speed 9291.53 samples/sec   Loss 1.1070   LearningRate 0.0001   Epoch: 27   Global Step: 47750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:39:44,925-Speed 9201.11 samples/sec   Loss 1.1038   LearningRate 0.0001   Epoch: 27   Global Step: 47760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:40:11,445-Speed 9267.39 samples/sec   Loss 1.1019   LearningRate 0.0001   Epoch: 27   Global Step: 47770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:40:37,986-Speed 9260.04 samples/sec   Loss 1.1053   LearningRate 0.0001   Epoch: 27   Global Step: 47780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:41:04,656-Speed 9215.07 samples/sec   Loss 1.0996   LearningRate 0.0001   Epoch: 27   Global Step: 47790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:41:31,261-Speed 9237.92 samples/sec   Loss 1.0982   LearningRate 0.0001   Epoch: 27   Global Step: 47800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:41:57,953-Speed 9207.74 samples/sec   Loss 1.0971   LearningRate 0.0001   Epoch: 27   Global Step: 47810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-06 07:42:24,417-Speed 9286.62 samples/sec   Loss 1.0919   LearningRate 0.0001   Epoch: 27   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:42:50,984-Speed 9251.46 samples/sec   Loss 1.0924   LearningRate 0.0001   Epoch: 27   Global Step: 47830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:43:17,536-Speed 9256.21 samples/sec   Loss 1.1005   LearningRate 0.0001   Epoch: 27   Global Step: 47840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:43:44,196-Speed 9218.86 samples/sec   Loss 1.0948   LearningRate 0.0001   Epoch: 27   Global Step: 47850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:44:11,038-Speed 9156.35 samples/sec   Loss 1.0997   LearningRate 0.0001   Epoch: 27   Global Step: 47860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:44:37,931-Speed 9138.71 samples/sec   Loss 1.0962   LearningRate 0.0001   Epoch: 27   Global Step: 47870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:45:04,511-Speed 9246.20 samples/sec   Loss 1.0940   LearningRate 0.0001   Epoch: 27   Global Step: 47880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:45:31,247-Speed 9192.52 samples/sec   Loss 1.1005   LearningRate 0.0001   Epoch: 27   Global Step: 47890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:45:57,994-Speed 9188.87 samples/sec   Loss 1.0994   LearningRate 0.0001   Epoch: 27   Global Step: 47900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:46:24,590-Speed 9240.86 samples/sec   Loss 1.0997   LearningRate 0.0001   Epoch: 27   Global Step: 47910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:46:51,050-Speed 9288.25 samples/sec   Loss 1.0997   LearningRate 0.0001   Epoch: 27   Global Step: 47920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:47:17,400-Speed 9327.22 samples/sec   Loss 1.0981   LearningRate 0.0001   Epoch: 27   Global Step: 47930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:47:43,704-Speed 9343.34 samples/sec   Loss 1.0948   LearningRate 0.0001   Epoch: 27   Global Step: 47940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:48:10,131-Speed 9299.74 samples/sec   Loss 1.0911   LearningRate 0.0001   Epoch: 27   Global Step: 47950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:48:36,575-Speed 9295.31 samples/sec   Loss 1.0990   LearningRate 0.0001   Epoch: 27   Global Step: 47960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-06 07:49:03,031-Speed 9289.43 samples/sec   Loss 1.0903   LearningRate 0.0001   Epoch: 27   Global Step: 47970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:49:29,685-Speed 9220.73 samples/sec   Loss 1.0933   LearningRate 0.0001   Epoch: 27   Global Step: 47980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:50:01,809-Speed 7650.88 samples/sec   Loss 1.0947   LearningRate 0.0001   Epoch: 27   Global Step: 47990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:50:28,222-Speed 9304.67 samples/sec   Loss 1.0900   LearningRate 0.0001   Epoch: 27   Global Step: 48000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:50:54,715-Speed 9276.77 samples/sec   Loss 1.0991   LearningRate 0.0001   Epoch: 27   Global Step: 48010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:51:21,149-Speed 9297.47 samples/sec   Loss 1.0952   LearningRate 0.0001   Epoch: 27   Global Step: 48020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:51:47,613-Speed 9287.01 samples/sec   Loss 1.1025   LearningRate 0.0001   Epoch: 27   Global Step: 48030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:52:14,130-Speed 9268.60 samples/sec   Loss 1.0937   LearningRate 0.0001   Epoch: 27   Global Step: 48040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:52:40,625-Speed 9276.02 samples/sec   Loss 1.0837   LearningRate 0.0001   Epoch: 27   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-06 07:53:07,063-Speed 9295.96 samples/sec   Loss 1.0993   LearningRate 0.0001   Epoch: 27   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:53:33,528-Speed 9286.62 samples/sec   Loss 1.0889   LearningRate 0.0001   Epoch: 27   Global Step: 48070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:54:00,173-Speed 9223.74 samples/sec   Loss 1.0868   LearningRate 0.0001   Epoch: 27   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:54:26,622-Speed 9292.31 samples/sec   Loss 1.0851   LearningRate 0.0001   Epoch: 27   Global Step: 48090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:54:53,078-Speed 9289.86 samples/sec   Loss 1.0851   LearningRate 0.0001   Epoch: 27   Global Step: 48100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:55:19,628-Speed 9256.79 samples/sec   Loss 1.0998   LearningRate 0.0001   Epoch: 27   Global Step: 48110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:55:46,043-Speed 9304.12 samples/sec   Loss 1.0860   LearningRate 0.0001   Epoch: 27   Global Step: 48120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:56:12,607-Speed 9252.11 samples/sec   Loss 1.0842   LearningRate 0.0001   Epoch: 27   Global Step: 48130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:56:39,254-Speed 9223.40 samples/sec   Loss 1.0892   LearningRate 0.0001   Epoch: 27   Global Step: 48140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:57:05,953-Speed 9205.08 samples/sec   Loss 1.0946   LearningRate 0.0001   Epoch: 27   Global Step: 48150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:57:32,691-Speed 9191.77 samples/sec   Loss 1.0931   LearningRate 0.0001   Epoch: 27   Global Step: 48160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:57:59,290-Speed 9240.22 samples/sec   Loss 1.0836   LearningRate 0.0001   Epoch: 27   Global Step: 48170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-06 07:58:25,800-Speed 9270.67 samples/sec   Loss 1.0970   LearningRate 0.0001   Epoch: 27   Global Step: 48180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-06 07:58:52,365-Speed 9251.65 samples/sec   Loss 1.0985   LearningRate 0.0001   Epoch: 27   Global Step: 48190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-06 07:59:18,833-Speed 9285.88 samples/sec   Loss 1.0964   LearningRate 0.0001   Epoch: 27   Global Step: 48200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 07:59:45,390-Speed 9254.40 samples/sec   Loss 1.0921   LearningRate 0.0001   Epoch: 27   Global Step: 48210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:00:12,065-Speed 9213.63 samples/sec   Loss 1.0934   LearningRate 0.0001   Epoch: 27   Global Step: 48220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:00:38,461-Speed 9310.88 samples/sec   Loss 1.0854   LearningRate 0.0001   Epoch: 27   Global Step: 48230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:01:05,088-Speed 9229.90 samples/sec   Loss 1.0896   LearningRate 0.0001   Epoch: 27   Global Step: 48240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:01:31,580-Speed 9277.14 samples/sec   Loss 1.0796   LearningRate 0.0001   Epoch: 27   Global Step: 48250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:01:58,074-Speed 9276.39 samples/sec   Loss 1.0903   LearningRate 0.0001   Epoch: 27   Global Step: 48260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:02:24,877-Speed 9169.67 samples/sec   Loss 1.0800   LearningRate 0.0001   Epoch: 27   Global Step: 48270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:02:51,341-Speed 9287.01 samples/sec   Loss 1.0914   LearningRate 0.0001   Epoch: 27   Global Step: 48280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:03:19,957-Speed 8588.33 samples/sec   Loss 1.0907   LearningRate 0.0001   Epoch: 27   Global Step: 48290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:03:46,474-Speed 9268.44 samples/sec   Loss 1.0844   LearningRate 0.0001   Epoch: 27   Global Step: 48300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:04:12,983-Speed 9271.56 samples/sec   Loss 1.0899   LearningRate 0.0001   Epoch: 27   Global Step: 48310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:04:39,479-Speed 9275.65 samples/sec   Loss 1.0918   LearningRate 0.0001   Epoch: 27   Global Step: 48320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:05:06,029-Speed 9256.83 samples/sec   Loss 1.0960   LearningRate 0.0001   Epoch: 27   Global Step: 48330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:05:32,545-Speed 9268.81 samples/sec   Loss 1.0908   LearningRate 0.0001   Epoch: 27   Global Step: 48340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:05:59,054-Speed 9271.19 samples/sec   Loss 1.0879   LearningRate 0.0001   Epoch: 27   Global Step: 48350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:06:25,531-Speed 9282.49 samples/sec   Loss 1.0864   LearningRate 0.0001   Epoch: 27   Global Step: 48360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:06:52,272-Speed 9190.99 samples/sec   Loss 1.0939   LearningRate 0.0001   Epoch: 27   Global Step: 48370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:07:18,717-Speed 9293.74 samples/sec   Loss 1.0901   LearningRate 0.0001   Epoch: 27   Global Step: 48380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:07:45,132-Speed 9304.05 samples/sec   Loss 1.0946   LearningRate 0.0001   Epoch: 27   Global Step: 48390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:09:04,169-Speed 3109.51 samples/sec   Loss 1.0838   LearningRate 0.0001   Epoch: 28   Global Step: 48400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:09:30,210-Speed 9437.92 samples/sec   Loss 1.0876   LearningRate 0.0001   Epoch: 28   Global Step: 48410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:09:56,306-Speed 9418.12 samples/sec   Loss 1.0762   LearningRate 0.0001   Epoch: 28   Global Step: 48420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:10:22,459-Speed 9397.35 samples/sec   Loss 1.0776   LearningRate 0.0001   Epoch: 28   Global Step: 48430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:10:48,740-Speed 9351.58 samples/sec   Loss 1.0752   LearningRate 0.0001   Epoch: 28   Global Step: 48440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:11:15,072-Speed 9333.37 samples/sec   Loss 1.0773   LearningRate 0.0001   Epoch: 28   Global Step: 48450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:11:41,208-Speed 9403.79 samples/sec   Loss 1.0794   LearningRate 0.0001   Epoch: 28   Global Step: 48460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:12:07,385-Speed 9388.84 samples/sec   Loss 1.0712   LearningRate 0.0001   Epoch: 28   Global Step: 48470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:12:33,523-Speed 9402.55 samples/sec   Loss 1.0750   LearningRate 0.0001   Epoch: 28   Global Step: 48480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:12:59,640-Speed 9410.42 samples/sec   Loss 1.0716   LearningRate 0.0001   Epoch: 28   Global Step: 48490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:13:25,776-Speed 9403.51 samples/sec   Loss 1.0707   LearningRate 0.0001   Epoch: 28   Global Step: 48500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:13:51,876-Speed 9416.97 samples/sec   Loss 1.0769   LearningRate 0.0001   Epoch: 28   Global Step: 48510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:14:18,184-Speed 9341.91 samples/sec   Loss 1.0691   LearningRate 0.0001   Epoch: 28   Global Step: 48520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:14:44,350-Speed 9392.88 samples/sec   Loss 1.0677   LearningRate 0.0001   Epoch: 28   Global Step: 48530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:15:10,470-Speed 9409.36 samples/sec   Loss 1.0704   LearningRate 0.0001   Epoch: 28   Global Step: 48540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:15:36,531-Speed 9430.45 samples/sec   Loss 1.0846   LearningRate 0.0001   Epoch: 28   Global Step: 48550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:16:02,662-Speed 9406.24 samples/sec   Loss 1.0777   LearningRate 0.0001   Epoch: 28   Global Step: 48560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:16:28,820-Speed 9395.16 samples/sec   Loss 1.0723   LearningRate 0.0001   Epoch: 28   Global Step: 48570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:16:54,955-Speed 9404.16 samples/sec   Loss 1.0825   LearningRate 0.0001   Epoch: 28   Global Step: 48580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:17:21,061-Speed 9414.27 samples/sec   Loss 1.0774   LearningRate 0.0001   Epoch: 28   Global Step: 48590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:17:47,200-Speed 9402.02 samples/sec   Loss 1.0746   LearningRate 0.0001   Epoch: 28   Global Step: 48600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:18:13,431-Speed 9369.57 samples/sec   Loss 1.0662   LearningRate 0.0001   Epoch: 28   Global Step: 48610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:18:39,596-Speed 9393.29 samples/sec   Loss 1.0676   LearningRate 0.0001   Epoch: 28   Global Step: 48620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:19:05,800-Speed 9378.78 samples/sec   Loss 1.0697   LearningRate 0.0001   Epoch: 28   Global Step: 48630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:19:31,927-Speed 9406.90 samples/sec   Loss 1.0668   LearningRate 0.0001   Epoch: 28   Global Step: 48640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:19:58,155-Speed 9370.44 samples/sec   Loss 1.0751   LearningRate 0.0001   Epoch: 28   Global Step: 48650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:20:24,340-Speed 9386.04 samples/sec   Loss 1.0712   LearningRate 0.0001   Epoch: 28   Global Step: 48660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:20:50,666-Speed 9335.45 samples/sec   Loss 1.0673   LearningRate 0.0001   Epoch: 28   Global Step: 48670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:21:16,849-Speed 9386.59 samples/sec   Loss 1.0785   LearningRate 0.0001   Epoch: 28   Global Step: 48680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:21:43,011-Speed 9394.17 samples/sec   Loss 1.0813   LearningRate 0.0001   Epoch: 28   Global Step: 48690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:22:09,191-Speed 9388.37 samples/sec   Loss 1.0708   LearningRate 0.0001   Epoch: 28   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:22:35,264-Speed 9426.29 samples/sec   Loss 1.0753   LearningRate 0.0001   Epoch: 28   Global Step: 48710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:23:01,430-Speed 9392.68 samples/sec   Loss 1.0622   LearningRate 0.0001   Epoch: 28   Global Step: 48720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:23:27,577-Speed 9399.62 samples/sec   Loss 1.0847   LearningRate 0.0001   Epoch: 28   Global Step: 48730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:23:53,673-Speed 9417.93 samples/sec   Loss 1.0720   LearningRate 0.0001   Epoch: 28   Global Step: 48740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:24:19,824-Speed 9398.09 samples/sec   Loss 1.0816   LearningRate 0.0001   Epoch: 28   Global Step: 48750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:24:45,933-Speed 9413.10 samples/sec   Loss 1.0834   LearningRate 0.0001   Epoch: 28   Global Step: 48760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:25:12,103-Speed 9391.20 samples/sec   Loss 1.0754   LearningRate 0.0001   Epoch: 28   Global Step: 48770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:25:38,306-Speed 9379.65 samples/sec   Loss 1.0734   LearningRate 0.0001   Epoch: 28   Global Step: 48780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:26:04,419-Speed 9412.12 samples/sec   Loss 1.0736   LearningRate 0.0001   Epoch: 28   Global Step: 48790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:26:30,583-Speed 9393.36 samples/sec   Loss 1.0810   LearningRate 0.0001   Epoch: 28   Global Step: 48800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:26:56,790-Speed 9378.10 samples/sec   Loss 1.0673   LearningRate 0.0001   Epoch: 28   Global Step: 48810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:27:23,017-Speed 9370.82 samples/sec   Loss 1.0710   LearningRate 0.0001   Epoch: 28   Global Step: 48820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:27:49,150-Speed 9404.63 samples/sec   Loss 1.0732   LearningRate 0.0001   Epoch: 28   Global Step: 48830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:28:15,305-Speed 9396.60 samples/sec   Loss 1.0742   LearningRate 0.0001   Epoch: 28   Global Step: 48840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:28:41,452-Speed 9399.67 samples/sec   Loss 1.0744   LearningRate 0.0001   Epoch: 28   Global Step: 48850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:29:07,566-Speed 9411.43 samples/sec   Loss 1.0708   LearningRate 0.0001   Epoch: 28   Global Step: 48860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:29:33,763-Speed 9381.48 samples/sec   Loss 1.0771   LearningRate 0.0001   Epoch: 28   Global Step: 48870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:29:59,930-Speed 9392.41 samples/sec   Loss 1.0720   LearningRate 0.0001   Epoch: 28   Global Step: 48880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:30:26,072-Speed 9401.73 samples/sec   Loss 1.0775   LearningRate 0.0001   Epoch: 28   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:30:52,274-Speed 9379.51 samples/sec   Loss 1.0718   LearningRate 0.0001   Epoch: 28   Global Step: 48900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:31:18,413-Speed 9402.57 samples/sec   Loss 1.0699   LearningRate 0.0001   Epoch: 28   Global Step: 48910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:31:44,521-Speed 9413.38 samples/sec   Loss 1.0807   LearningRate 0.0001   Epoch: 28   Global Step: 48920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:32:10,604-Speed 9422.80 samples/sec   Loss 1.0721   LearningRate 0.0001   Epoch: 28   Global Step: 48930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:32:36,667-Speed 9429.96 samples/sec   Loss 1.0679   LearningRate 0.0001   Epoch: 28   Global Step: 48940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:33:02,784-Speed 9410.90 samples/sec   Loss 1.0687   LearningRate 0.0001   Epoch: 28   Global Step: 48950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:33:28,919-Speed 9403.86 samples/sec   Loss 1.0649   LearningRate 0.0001   Epoch: 28   Global Step: 48960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:33:55,064-Speed 9400.13 samples/sec   Loss 1.0654   LearningRate 0.0001   Epoch: 28   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:34:21,165-Speed 9418.58 samples/sec   Loss 1.0619   LearningRate 0.0001   Epoch: 28   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:34:47,374-Speed 9377.19 samples/sec   Loss 1.0712   LearningRate 0.0001   Epoch: 28   Global Step: 48990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-06 08:35:13,537-Speed 9393.89 samples/sec   Loss 1.0652   LearningRate 0.0001   Epoch: 28   Global Step: 49000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-06 08:35:43,639-Speed 8164.48 samples/sec   Loss 1.0600   LearningRate 0.0001   Epoch: 28   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:36:09,728-Speed 9420.57 samples/sec   Loss 1.0578   LearningRate 0.0001   Epoch: 28   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:36:35,882-Speed 9397.14 samples/sec   Loss 1.0652   LearningRate 0.0001   Epoch: 28   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:37:02,054-Speed 9390.75 samples/sec   Loss 1.0601   LearningRate 0.0001   Epoch: 28   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:37:28,325-Speed 9355.42 samples/sec   Loss 1.0583   LearningRate 0.0001   Epoch: 28   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:37:54,541-Speed 9374.49 samples/sec   Loss 1.0732   LearningRate 0.0001   Epoch: 28   Global Step: 49060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:38:20,665-Speed 9408.00 samples/sec   Loss 1.0652   LearningRate 0.0001   Epoch: 28   Global Step: 49070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:38:46,794-Speed 9406.06 samples/sec   Loss 1.0603   LearningRate 0.0001   Epoch: 28   Global Step: 49080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:39:12,913-Speed 9409.89 samples/sec   Loss 1.0568   LearningRate 0.0001   Epoch: 28   Global Step: 49090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:39:39,087-Speed 9389.76 samples/sec   Loss 1.0640   LearningRate 0.0001   Epoch: 28   Global Step: 49100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:40:13,194-Speed 7205.73 samples/sec   Loss 1.0601   LearningRate 0.0001   Epoch: 28   Global Step: 49110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-06 08:40:39,196-Speed 9452.12 samples/sec   Loss 1.0650   LearningRate 0.0001   Epoch: 28   Global Step: 49120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:41:05,390-Speed 9382.42 samples/sec   Loss 1.0632   LearningRate 0.0001   Epoch: 28   Global Step: 49130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:41:31,506-Speed 9410.82 samples/sec   Loss 1.0664   LearningRate 0.0001   Epoch: 28   Global Step: 49140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:41:57,646-Speed 9401.98 samples/sec   Loss 1.0596   LearningRate 0.0001   Epoch: 28   Global Step: 49150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:42:23,761-Speed 9411.19 samples/sec   Loss 1.0627   LearningRate 0.0001   Epoch: 28   Global Step: 49160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:42:49,889-Speed 9406.61 samples/sec   Loss 1.0562   LearningRate 0.0001   Epoch: 28   Global Step: 49170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:43:15,927-Speed 9438.53 samples/sec   Loss 1.0548   LearningRate 0.0001   Epoch: 28   Global Step: 49180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:43:42,041-Speed 9411.47 samples/sec   Loss 1.0556   LearningRate 0.0001   Epoch: 28   Global Step: 49190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:44:08,172-Speed 9405.41 samples/sec   Loss 1.0602   LearningRate 0.0001   Epoch: 28   Global Step: 49200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:44:34,299-Speed 9406.74 samples/sec   Loss 1.0631   LearningRate 0.0001   Epoch: 28   Global Step: 49210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:45:00,418-Speed 9409.70 samples/sec   Loss 1.0607   LearningRate 0.0001   Epoch: 28   Global Step: 49220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:45:26,520-Speed 9415.77 samples/sec   Loss 1.0667   LearningRate 0.0001   Epoch: 28   Global Step: 49230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:45:52,579-Speed 9431.11 samples/sec   Loss 1.0598   LearningRate 0.0001   Epoch: 28   Global Step: 49240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:46:18,742-Speed 9393.84 samples/sec   Loss 1.0574   LearningRate 0.0001   Epoch: 28   Global Step: 49250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:46:44,920-Speed 9388.61 samples/sec   Loss 1.0486   LearningRate 0.0001   Epoch: 28   Global Step: 49260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:47:11,084-Speed 9393.48 samples/sec   Loss 1.0608   LearningRate 0.0001   Epoch: 28   Global Step: 49270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:47:37,165-Speed 9423.27 samples/sec   Loss 1.0599   LearningRate 0.0001   Epoch: 28   Global Step: 49280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:48:03,318-Speed 9397.34 samples/sec   Loss 1.0606   LearningRate 0.0001   Epoch: 28   Global Step: 49290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:48:29,457-Speed 9402.24 samples/sec   Loss 1.0598   LearningRate 0.0001   Epoch: 28   Global Step: 49300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:48:55,567-Speed 9413.09 samples/sec   Loss 1.0577   LearningRate 0.0001   Epoch: 28   Global Step: 49310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:49:21,696-Speed 9406.05 samples/sec   Loss 1.0573   LearningRate 0.0001   Epoch: 28   Global Step: 49320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:49:47,867-Speed 9391.16 samples/sec   Loss 1.0567   LearningRate 0.0001   Epoch: 28   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-06 08:50:13,993-Speed 9406.97 samples/sec   Loss 1.0496   LearningRate 0.0001   Epoch: 28   Global Step: 49340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:50:40,203-Speed 9379.15 samples/sec   Loss 1.0485   LearningRate 0.0001   Epoch: 28   Global Step: 49350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:51:08,601-Speed 8654.22 samples/sec   Loss 1.0551   LearningRate 0.0001   Epoch: 28   Global Step: 49360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:51:34,684-Speed 9422.64 samples/sec   Loss 1.0563   LearningRate 0.0001   Epoch: 28   Global Step: 49370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:52:00,963-Speed 9352.67 samples/sec   Loss 1.0516   LearningRate 0.0001   Epoch: 28   Global Step: 49380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:52:27,167-Speed 9378.89 samples/sec   Loss 1.0502   LearningRate 0.0001   Epoch: 28   Global Step: 49390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:52:53,307-Speed 9402.34 samples/sec   Loss 1.0455   LearningRate 0.0001   Epoch: 28   Global Step: 49400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:53:19,408-Speed 9415.76 samples/sec   Loss 1.0533   LearningRate 0.0001   Epoch: 28   Global Step: 49410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:53:45,529-Speed 9409.21 samples/sec   Loss 1.0458   LearningRate 0.0001   Epoch: 28   Global Step: 49420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-06 08:54:11,672-Speed 9401.06 samples/sec   Loss 1.0507   LearningRate 0.0001   Epoch: 28   Global Step: 49430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 08:54:37,807-Speed 9403.78 samples/sec   Loss 1.0508   LearningRate 0.0001   Epoch: 28   Global Step: 49440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:55:03,957-Speed 9398.30 samples/sec   Loss 1.0589   LearningRate 0.0001   Epoch: 28   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:55:30,141-Speed 9386.27 samples/sec   Loss 1.0490   LearningRate 0.0001   Epoch: 28   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:55:56,253-Speed 9412.33 samples/sec   Loss 1.0512   LearningRate 0.0001   Epoch: 28   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:56:22,360-Speed 9414.39 samples/sec   Loss 1.0494   LearningRate 0.0001   Epoch: 28   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:56:48,494-Speed 9403.98 samples/sec   Loss 1.0599   LearningRate 0.0001   Epoch: 28   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:57:14,660-Speed 9393.00 samples/sec   Loss 1.0552   LearningRate 0.0001   Epoch: 28   Global Step: 49500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:57:40,842-Speed 9387.09 samples/sec   Loss 1.0548   LearningRate 0.0001   Epoch: 28   Global Step: 49510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:58:07,033-Speed 9383.67 samples/sec   Loss 1.0437   LearningRate 0.0001   Epoch: 28   Global Step: 49520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:58:33,158-Speed 9407.52 samples/sec   Loss 1.0553   LearningRate 0.0001   Epoch: 28   Global Step: 49530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 08:58:59,291-Speed 9404.40 samples/sec   Loss 1.0451   LearningRate 0.0001   Epoch: 28   Global Step: 49540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-06 08:59:25,415-Speed 9408.04 samples/sec   Loss 1.0518   LearningRate 0.0001   Epoch: 28   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-06 08:59:51,483-Speed 9428.18 samples/sec   Loss 1.0502   LearningRate 0.0001   Epoch: 28   Global Step: 49560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:00:17,732-Speed 9362.76 samples/sec   Loss 1.0583   LearningRate 0.0001   Epoch: 28   Global Step: 49570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:00:43,870-Speed 9402.81 samples/sec   Loss 1.0534   LearningRate 0.0001   Epoch: 28   Global Step: 49580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:01:09,956-Speed 9421.52 samples/sec   Loss 1.0636   LearningRate 0.0001   Epoch: 28   Global Step: 49590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:01:36,068-Speed 9412.09 samples/sec   Loss 1.0436   LearningRate 0.0001   Epoch: 28   Global Step: 49600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:02:02,173-Speed 9414.77 samples/sec   Loss 1.0437   LearningRate 0.0001   Epoch: 28   Global Step: 49610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:02:28,269-Speed 9417.90 samples/sec   Loss 1.0376   LearningRate 0.0001   Epoch: 28   Global Step: 49620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:02:54,333-Speed 9429.49 samples/sec   Loss 1.0477   LearningRate 0.0001   Epoch: 28   Global Step: 49630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:03:20,425-Speed 9419.39 samples/sec   Loss 1.0429   LearningRate 0.0001   Epoch: 28   Global Step: 49640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:03:46,519-Speed 9418.59 samples/sec   Loss 1.0409   LearningRate 0.0001   Epoch: 28   Global Step: 49650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:04:12,582-Speed 9430.14 samples/sec   Loss 1.0531   LearningRate 0.0001   Epoch: 28   Global Step: 49660   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-06 09:04:38,594-Speed 9449.10 samples/sec   Loss 1.0414   LearningRate 0.0001   Epoch: 28   Global Step: 49670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:05:04,670-Speed 9425.07 samples/sec   Loss 1.0380   LearningRate 0.0001   Epoch: 28   Global Step: 49680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:05:30,836-Speed 9392.70 samples/sec   Loss 1.0500   LearningRate 0.0001   Epoch: 28   Global Step: 49690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:05:56,943-Speed 9413.97 samples/sec   Loss 1.0532   LearningRate 0.0001   Epoch: 28   Global Step: 49700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:06:23,059-Speed 9410.86 samples/sec   Loss 1.0451   LearningRate 0.0001   Epoch: 28   Global Step: 49710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:06:49,120-Speed 9430.61 samples/sec   Loss 1.0473   LearningRate 0.0001   Epoch: 28   Global Step: 49720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:07:15,352-Speed 9368.95 samples/sec   Loss 1.0397   LearningRate 0.0001   Epoch: 28   Global Step: 49730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:07:41,542-Speed 9384.15 samples/sec   Loss 1.0511   LearningRate 0.0001   Epoch: 28   Global Step: 49740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:08:07,805-Speed 9358.31 samples/sec   Loss 1.0393   LearningRate 0.0001   Epoch: 28   Global Step: 49750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:08:34,021-Speed 9374.65 samples/sec   Loss 1.0474   LearningRate 0.0001   Epoch: 28   Global Step: 49760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:09:00,166-Speed 9400.59 samples/sec   Loss 1.0453   LearningRate 0.0001   Epoch: 28   Global Step: 49770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:09:26,319-Speed 9397.22 samples/sec   Loss 1.0420   LearningRate 0.0001   Epoch: 28   Global Step: 49780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:09:52,439-Speed 9409.52 samples/sec   Loss 1.0448   LearningRate 0.0001   Epoch: 28   Global Step: 49790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:10:18,577-Speed 9402.81 samples/sec   Loss 1.0439   LearningRate 0.0001   Epoch: 28   Global Step: 49800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:10:44,658-Speed 9423.53 samples/sec   Loss 1.0499   LearningRate 0.0001   Epoch: 28   Global Step: 49810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:11:10,774-Speed 9410.99 samples/sec   Loss 1.0397   LearningRate 0.0001   Epoch: 28   Global Step: 49820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:11:36,904-Speed 9405.53 samples/sec   Loss 1.0441   LearningRate 0.0001   Epoch: 28   Global Step: 49830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:12:03,090-Speed 9385.87 samples/sec   Loss 1.0329   LearningRate 0.0001   Epoch: 28   Global Step: 49840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:12:29,248-Speed 9395.45 samples/sec   Loss 1.0433   LearningRate 0.0001   Epoch: 28   Global Step: 49850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:12:55,357-Speed 9413.28 samples/sec   Loss 1.0325   LearningRate 0.0001   Epoch: 28   Global Step: 49860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:13:21,484-Speed 9407.04 samples/sec   Loss 1.0410   LearningRate 0.0001   Epoch: 28   Global Step: 49870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:13:47,560-Speed 9425.00 samples/sec   Loss 1.0426   LearningRate 0.0001   Epoch: 28   Global Step: 49880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:14:13,735-Speed 9389.92 samples/sec   Loss 1.0502   LearningRate 0.0001   Epoch: 28   Global Step: 49890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:14:39,906-Speed 9390.86 samples/sec   Loss 1.0421   LearningRate 0.0001   Epoch: 28   Global Step: 49900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:15:05,996-Speed 9420.12 samples/sec   Loss 1.0372   LearningRate 0.0001   Epoch: 28   Global Step: 49910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:15:32,118-Speed 9408.30 samples/sec   Loss 1.0430   LearningRate 0.0001   Epoch: 28   Global Step: 49920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-06 09:15:58,311-Speed 9383.02 samples/sec   Loss 1.0406   LearningRate 0.0001   Epoch: 28   Global Step: 49930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-06 09:16:24,404-Speed 9419.09 samples/sec   Loss 1.0394   LearningRate 0.0001   Epoch: 28   Global Step: 49940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:16:50,571-Speed 9392.48 samples/sec   Loss 1.0397   LearningRate 0.0001   Epoch: 28   Global Step: 49950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:17:16,751-Speed 9387.57 samples/sec   Loss 1.0401   LearningRate 0.0001   Epoch: 28   Global Step: 49960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:17:42,915-Speed 9393.45 samples/sec   Loss 1.0359   LearningRate 0.0001   Epoch: 28   Global Step: 49970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:18:09,068-Speed 9397.45 samples/sec   Loss 1.0356   LearningRate 0.0001   Epoch: 28   Global Step: 49980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:18:35,210-Speed 9401.51 samples/sec   Loss 1.0457   LearningRate 0.0001   Epoch: 28   Global Step: 49990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:19:01,362-Speed 9397.59 samples/sec   Loss 1.0353   LearningRate 0.0001   Epoch: 28   Global Step: 50000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:19:27,483-Speed 9409.28 samples/sec   Loss 1.0319   LearningRate 0.0001   Epoch: 28   Global Step: 50010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:19:53,636-Speed 9397.53 samples/sec   Loss 1.0432   LearningRate 0.0001   Epoch: 28   Global Step: 50020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:20:19,852-Speed 9374.89 samples/sec   Loss 1.0352   LearningRate 0.0001   Epoch: 28   Global Step: 50030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:20:46,043-Speed 9383.52 samples/sec   Loss 1.0443   LearningRate 0.0001   Epoch: 28   Global Step: 50040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:21:12,153-Speed 9412.89 samples/sec   Loss 1.0452   LearningRate 0.0001   Epoch: 28   Global Step: 50050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:21:38,401-Speed 9364.06 samples/sec   Loss 1.0352   LearningRate 0.0001   Epoch: 28   Global Step: 50060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:22:04,539-Speed 9403.09 samples/sec   Loss 1.0384   LearningRate 0.0001   Epoch: 28   Global Step: 50070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:22:30,686-Speed 9399.70 samples/sec   Loss 1.0469   LearningRate 0.0001   Epoch: 28   Global Step: 50080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:22:56,778-Speed 9419.29 samples/sec   Loss 1.0459   LearningRate 0.0001   Epoch: 28   Global Step: 50090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:23:22,909-Speed 9405.53 samples/sec   Loss 1.0487   LearningRate 0.0001   Epoch: 28   Global Step: 50100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:23:49,017-Speed 9413.84 samples/sec   Loss 1.0486   LearningRate 0.0001   Epoch: 28   Global Step: 50110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:24:15,220-Speed 9379.37 samples/sec   Loss 1.0522   LearningRate 0.0001   Epoch: 28   Global Step: 50120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:25:34,405-Speed 3103.67 samples/sec   Loss 1.0318   LearningRate 0.0001   Epoch: 29   Global Step: 50130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:26:00,394-Speed 9456.68 samples/sec   Loss 1.0308   LearningRate 0.0001   Epoch: 29   Global Step: 50140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:26:26,429-Speed 9440.08 samples/sec   Loss 1.0290   LearningRate 0.0001   Epoch: 29   Global Step: 50150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:26:52,538-Speed 9413.24 samples/sec   Loss 1.0376   LearningRate 0.0001   Epoch: 29   Global Step: 50160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:27:18,615-Speed 9424.79 samples/sec   Loss 1.0283   LearningRate 0.0001   Epoch: 29   Global Step: 50170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:27:44,583-Speed 9464.34 samples/sec   Loss 1.0250   LearningRate 0.0001   Epoch: 29   Global Step: 50180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:28:10,702-Speed 9409.55 samples/sec   Loss 1.0340   LearningRate 0.0001   Epoch: 29   Global Step: 50190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:28:36,788-Speed 9421.84 samples/sec   Loss 1.0273   LearningRate 0.0001   Epoch: 29   Global Step: 50200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:29:02,806-Speed 9446.14 samples/sec   Loss 1.0291   LearningRate 0.0001   Epoch: 29   Global Step: 50210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:29:28,946-Speed 9402.38 samples/sec   Loss 1.0247   LearningRate 0.0001   Epoch: 29   Global Step: 50220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:29:55,048-Speed 9415.86 samples/sec   Loss 1.0284   LearningRate 0.0001   Epoch: 29   Global Step: 50230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:30:21,101-Speed 9433.41 samples/sec   Loss 1.0250   LearningRate 0.0001   Epoch: 29   Global Step: 50240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:30:47,180-Speed 9423.86 samples/sec   Loss 1.0270   LearningRate 0.0001   Epoch: 29   Global Step: 50250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:31:13,269-Speed 9420.49 samples/sec   Loss 1.0271   LearningRate 0.0001   Epoch: 29   Global Step: 50260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:31:39,325-Speed 9432.80 samples/sec   Loss 1.0237   LearningRate 0.0001   Epoch: 29   Global Step: 50270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:32:05,426-Speed 9416.17 samples/sec   Loss 1.0331   LearningRate 0.0001   Epoch: 29   Global Step: 50280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:32:31,486-Speed 9431.53 samples/sec   Loss 1.0211   LearningRate 0.0001   Epoch: 29   Global Step: 50290   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:32:57,594-Speed 9413.53 samples/sec   Loss 1.0227   LearningRate 0.0001   Epoch: 29   Global Step: 50300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-06 09:33:23,636-Speed 9437.61 samples/sec   Loss 1.0261   LearningRate 0.0001   Epoch: 29   Global Step: 50310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:33:49,742-Speed 9414.21 samples/sec   Loss 1.0262   LearningRate 0.0001   Epoch: 29   Global Step: 50320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:34:15,824-Speed 9423.20 samples/sec   Loss 1.0252   LearningRate 0.0001   Epoch: 29   Global Step: 50330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:34:41,872-Speed 9435.32 samples/sec   Loss 1.0272   LearningRate 0.0001   Epoch: 29   Global Step: 50340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:35:08,020-Speed 9398.97 samples/sec   Loss 1.0257   LearningRate 0.0001   Epoch: 29   Global Step: 50350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:35:34,053-Speed 9440.85 samples/sec   Loss 1.0279   LearningRate 0.0001   Epoch: 29   Global Step: 50360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:36:00,245-Speed 9383.50 samples/sec   Loss 1.0311   LearningRate 0.0001   Epoch: 29   Global Step: 50370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:36:26,409-Speed 9393.58 samples/sec   Loss 1.0289   LearningRate 0.0001   Epoch: 29   Global Step: 50380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:36:52,596-Speed 9385.25 samples/sec   Loss 1.0263   LearningRate 0.0001   Epoch: 29   Global Step: 50390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:37:18,688-Speed 9419.32 samples/sec   Loss 1.0282   LearningRate 0.0001   Epoch: 29   Global Step: 50400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:37:44,723-Speed 9440.03 samples/sec   Loss 1.0217   LearningRate 0.0001   Epoch: 29   Global Step: 50410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:38:10,839-Speed 9410.85 samples/sec   Loss 1.0225   LearningRate 0.0001   Epoch: 29   Global Step: 50420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:38:36,934-Speed 9418.35 samples/sec   Loss 1.0230   LearningRate 0.0001   Epoch: 29   Global Step: 50430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:39:03,004-Speed 9427.28 samples/sec   Loss 1.0208   LearningRate 0.0001   Epoch: 29   Global Step: 50440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:39:29,129-Speed 9407.67 samples/sec   Loss 1.0273   LearningRate 0.0001   Epoch: 29   Global Step: 50450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:39:55,230-Speed 9416.08 samples/sec   Loss 1.0209   LearningRate 0.0001   Epoch: 29   Global Step: 50460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:40:21,386-Speed 9396.44 samples/sec   Loss 1.0319   LearningRate 0.0001   Epoch: 29   Global Step: 50470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:40:47,545-Speed 9395.29 samples/sec   Loss 1.0335   LearningRate 0.0001   Epoch: 29   Global Step: 50480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:41:13,708-Speed 9393.86 samples/sec   Loss 1.0257   LearningRate 0.0001   Epoch: 29   Global Step: 50490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:41:39,831-Speed 9408.21 samples/sec   Loss 1.0236   LearningRate 0.0001   Epoch: 29   Global Step: 50500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:42:06,021-Speed 9384.63 samples/sec   Loss 1.0250   LearningRate 0.0001   Epoch: 29   Global Step: 50510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:42:32,141-Speed 9409.45 samples/sec   Loss 1.0191   LearningRate 0.0001   Epoch: 29   Global Step: 50520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:42:58,239-Speed 9417.41 samples/sec   Loss 1.0283   LearningRate 0.0001   Epoch: 29   Global Step: 50530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:43:24,335-Speed 9418.06 samples/sec   Loss 1.0165   LearningRate 0.0001   Epoch: 29   Global Step: 50540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:43:50,483-Speed 9399.02 samples/sec   Loss 1.0245   LearningRate 0.0001   Epoch: 29   Global Step: 50550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:44:16,646-Speed 9393.79 samples/sec   Loss 1.0270   LearningRate 0.0001   Epoch: 29   Global Step: 50560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:44:42,826-Speed 9387.81 samples/sec   Loss 1.0290   LearningRate 0.0001   Epoch: 29   Global Step: 50570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:45:08,993-Speed 9393.13 samples/sec   Loss 1.0219   LearningRate 0.0001   Epoch: 29   Global Step: 50580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:45:35,228-Speed 9367.99 samples/sec   Loss 1.0192   LearningRate 0.0001   Epoch: 29   Global Step: 50590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:46:01,333-Speed 9414.71 samples/sec   Loss 1.0286   LearningRate 0.0001   Epoch: 29   Global Step: 50600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:46:27,429-Speed 9417.97 samples/sec   Loss 1.0287   LearningRate 0.0001   Epoch: 29   Global Step: 50610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:46:53,599-Speed 9391.52 samples/sec   Loss 1.0260   LearningRate 0.0001   Epoch: 29   Global Step: 50620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:47:19,786-Speed 9385.35 samples/sec   Loss 1.0315   LearningRate 0.0001   Epoch: 29   Global Step: 50630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:47:46,009-Speed 9372.48 samples/sec   Loss 1.0287   LearningRate 0.0001   Epoch: 29   Global Step: 50640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:48:12,275-Speed 9357.08 samples/sec   Loss 1.0253   LearningRate 0.0001   Epoch: 29   Global Step: 50650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:48:38,468-Speed 9383.04 samples/sec   Loss 1.0267   LearningRate 0.0001   Epoch: 29   Global Step: 50660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:49:04,747-Speed 9352.11 samples/sec   Loss 1.0174   LearningRate 0.0001   Epoch: 29   Global Step: 50670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:49:30,863-Speed 9410.86 samples/sec   Loss 1.0202   LearningRate 0.0001   Epoch: 29   Global Step: 50680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-06 09:49:56,938-Speed 9425.58 samples/sec   Loss 1.0210   LearningRate 0.0001   Epoch: 29   Global Step: 50690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:50:23,082-Speed 9400.85 samples/sec   Loss 1.0237   LearningRate 0.0001   Epoch: 29   Global Step: 50700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:50:49,234-Speed 9397.75 samples/sec   Loss 1.0157   LearningRate 0.0001   Epoch: 29   Global Step: 50710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:51:15,434-Speed 9380.45 samples/sec   Loss 1.0142   LearningRate 0.0001   Epoch: 29   Global Step: 50720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:51:41,584-Speed 9398.81 samples/sec   Loss 1.0287   LearningRate 0.0001   Epoch: 29   Global Step: 50730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:52:07,684-Speed 9416.41 samples/sec   Loss 1.0184   LearningRate 0.0001   Epoch: 29   Global Step: 50740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:52:33,790-Speed 9414.30 samples/sec   Loss 1.0200   LearningRate 0.0001   Epoch: 29   Global Step: 50750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:52:59,881-Speed 9419.76 samples/sec   Loss 1.0167   LearningRate 0.0001   Epoch: 29   Global Step: 50760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:53:25,930-Speed 9435.61 samples/sec   Loss 1.0170   LearningRate 0.0001   Epoch: 29   Global Step: 50770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-06 09:53:52,069-Speed 9402.48 samples/sec   Loss 1.0216   LearningRate 0.0001   Epoch: 29   Global Step: 50780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:54:18,226-Speed 9396.04 samples/sec   Loss 1.0171   LearningRate 0.0001   Epoch: 29   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 09:54:44,398-Speed 9390.37 samples/sec   Loss 1.0296   LearningRate 0.0001   Epoch: 29   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 09:55:10,557-Speed 9395.40 samples/sec   Loss 1.0160   LearningRate 0.0001   Epoch: 29   Global Step: 50810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:55:36,747-Speed 9383.95 samples/sec   Loss 1.0129   LearningRate 0.0001   Epoch: 29   Global Step: 50820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:56:02,893-Speed 9400.14 samples/sec   Loss 1.0126   LearningRate 0.0001   Epoch: 29   Global Step: 50830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:56:29,029-Speed 9403.29 samples/sec   Loss 1.0143   LearningRate 0.0001   Epoch: 29   Global Step: 50840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:56:55,166-Speed 9403.43 samples/sec   Loss 1.0120   LearningRate 0.0001   Epoch: 29   Global Step: 50850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:57:21,313-Speed 9399.31 samples/sec   Loss 1.0115   LearningRate 0.0001   Epoch: 29   Global Step: 50860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:57:47,398-Speed 9422.26 samples/sec   Loss 1.0152   LearningRate 0.0001   Epoch: 29   Global Step: 50870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:58:13,510-Speed 9412.15 samples/sec   Loss 1.0250   LearningRate 0.0001   Epoch: 29   Global Step: 50880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:58:39,641-Speed 9405.44 samples/sec   Loss 1.0141   LearningRate 0.0001   Epoch: 29   Global Step: 50890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:59:05,788-Speed 9399.29 samples/sec   Loss 1.0148   LearningRate 0.0001   Epoch: 29   Global Step: 50900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 09:59:31,988-Speed 9380.75 samples/sec   Loss 1.0167   LearningRate 0.0001   Epoch: 29   Global Step: 50910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 09:59:58,107-Speed 9409.63 samples/sec   Loss 1.0129   LearningRate 0.0001   Epoch: 29   Global Step: 50920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:00:24,222-Speed 9410.96 samples/sec   Loss 1.0175   LearningRate 0.0001   Epoch: 29   Global Step: 50930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:00:52,839-Speed 8588.25 samples/sec   Loss 1.0166   LearningRate 0.0001   Epoch: 29   Global Step: 50940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:01:18,966-Speed 9406.93 samples/sec   Loss 1.0093   LearningRate 0.0001   Epoch: 29   Global Step: 50950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:01:45,094-Speed 9406.40 samples/sec   Loss 1.0083   LearningRate 0.0001   Epoch: 29   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:02:11,264-Speed 9391.43 samples/sec   Loss 1.0049   LearningRate 0.0001   Epoch: 29   Global Step: 50970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:02:37,399-Speed 9403.92 samples/sec   Loss 1.0079   LearningRate 0.0001   Epoch: 29   Global Step: 50980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:03:03,655-Speed 9360.81 samples/sec   Loss 1.0041   LearningRate 0.0001   Epoch: 29   Global Step: 50990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:03:29,830-Speed 9389.30 samples/sec   Loss 1.0099   LearningRate 0.0001   Epoch: 29   Global Step: 51000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:03:55,994-Speed 9393.31 samples/sec   Loss 1.0079   LearningRate 0.0001   Epoch: 29   Global Step: 51010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-06 10:04:22,253-Speed 9359.69 samples/sec   Loss 1.0108   LearningRate 0.0001   Epoch: 29   Global Step: 51020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-06 10:04:48,518-Speed 9357.54 samples/sec   Loss 1.0061   LearningRate 0.0001   Epoch: 29   Global Step: 51030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-06 10:05:14,653-Speed 9404.24 samples/sec   Loss 0.9984   LearningRate 0.0001   Epoch: 29   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:05:40,828-Speed 9389.28 samples/sec   Loss 1.0046   LearningRate 0.0001   Epoch: 29   Global Step: 51050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:06:07,012-Speed 9386.57 samples/sec   Loss 1.0090   LearningRate 0.0001   Epoch: 29   Global Step: 51060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:06:33,257-Speed 9364.45 samples/sec   Loss 0.9977   LearningRate 0.0001   Epoch: 29   Global Step: 51070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:06:59,542-Speed 9350.42 samples/sec   Loss 1.0105   LearningRate 0.0001   Epoch: 29   Global Step: 51080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:07:25,761-Speed 9373.76 samples/sec   Loss 1.0114   LearningRate 0.0001   Epoch: 29   Global Step: 51090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:07:51,970-Speed 9377.52 samples/sec   Loss 1.0075   LearningRate 0.0001   Epoch: 29   Global Step: 51100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:08:18,045-Speed 9425.64 samples/sec   Loss 1.0118   LearningRate 0.0001   Epoch: 29   Global Step: 51110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:08:44,142-Speed 9417.64 samples/sec   Loss 1.0077   LearningRate 0.0001   Epoch: 29   Global Step: 51120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:09:10,207-Speed 9429.15 samples/sec   Loss 1.0018   LearningRate 0.0001   Epoch: 29   Global Step: 51130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:09:36,403-Speed 9382.11 samples/sec   Loss 0.9998   LearningRate 0.0001   Epoch: 29   Global Step: 51140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:10:02,456-Speed 9433.17 samples/sec   Loss 1.0095   LearningRate 0.0001   Epoch: 29   Global Step: 51150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:10:28,616-Speed 9395.09 samples/sec   Loss 1.0104   LearningRate 0.0001   Epoch: 29   Global Step: 51160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:10:54,790-Speed 9390.09 samples/sec   Loss 0.9998   LearningRate 0.0001   Epoch: 29   Global Step: 51170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:11:20,896-Speed 9415.37 samples/sec   Loss 1.0017   LearningRate 0.0001   Epoch: 29   Global Step: 51180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:11:47,002-Speed 9414.20 samples/sec   Loss 1.0085   LearningRate 0.0001   Epoch: 29   Global Step: 51190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:12:13,133-Speed 9405.40 samples/sec   Loss 1.0095   LearningRate 0.0001   Epoch: 29   Global Step: 51200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:12:39,281-Speed 9399.09 samples/sec   Loss 1.0068   LearningRate 0.0001   Epoch: 29   Global Step: 51210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:13:05,471-Speed 9384.28 samples/sec   Loss 1.0120   LearningRate 0.0001   Epoch: 29   Global Step: 51220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:13:31,593-Speed 9408.74 samples/sec   Loss 1.0061   LearningRate 0.0001   Epoch: 29   Global Step: 51230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:13:57,730-Speed 9403.28 samples/sec   Loss 1.0019   LearningRate 0.0001   Epoch: 29   Global Step: 51240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:14:23,808-Speed 9424.29 samples/sec   Loss 1.0079   LearningRate 0.0001   Epoch: 29   Global Step: 51250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:14:50,100-Speed 9347.88 samples/sec   Loss 1.0040   LearningRate 0.0001   Epoch: 29   Global Step: 51260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:15:16,241-Speed 9401.65 samples/sec   Loss 0.9998   LearningRate 0.0001   Epoch: 29   Global Step: 51270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:15:42,343-Speed 9415.98 samples/sec   Loss 1.0084   LearningRate 0.0001   Epoch: 29   Global Step: 51280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:16:08,471-Speed 9406.52 samples/sec   Loss 1.0019   LearningRate 0.0001   Epoch: 29   Global Step: 51290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:16:34,643-Speed 9390.67 samples/sec   Loss 1.0069   LearningRate 0.0001   Epoch: 29   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:17:00,807-Speed 9393.45 samples/sec   Loss 0.9967   LearningRate 0.0001   Epoch: 29   Global Step: 51310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:17:27,042-Speed 9367.85 samples/sec   Loss 1.0101   LearningRate 0.0001   Epoch: 29   Global Step: 51320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:17:53,185-Speed 9401.35 samples/sec   Loss 1.0049   LearningRate 0.0001   Epoch: 29   Global Step: 51330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:18:19,346-Speed 9394.29 samples/sec   Loss 1.0051   LearningRate 0.0001   Epoch: 29   Global Step: 51340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:18:45,470-Speed 9407.70 samples/sec   Loss 0.9978   LearningRate 0.0001   Epoch: 29   Global Step: 51350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:19:11,618-Speed 9399.22 samples/sec   Loss 0.9972   LearningRate 0.0001   Epoch: 29   Global Step: 51360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:19:37,853-Speed 9368.17 samples/sec   Loss 1.0036   LearningRate 0.0001   Epoch: 29   Global Step: 51370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:20:03,997-Speed 9400.46 samples/sec   Loss 1.0015   LearningRate 0.0001   Epoch: 29   Global Step: 51380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-06 10:20:30,118-Speed 9408.99 samples/sec   Loss 1.0090   LearningRate 0.0001   Epoch: 29   Global Step: 51390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-06 10:20:56,177-Speed 9431.51 samples/sec   Loss 0.9929   LearningRate 0.0001   Epoch: 29   Global Step: 51400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:21:22,346-Speed 9392.23 samples/sec   Loss 0.9958   LearningRate 0.0001   Epoch: 29   Global Step: 51410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:21:48,494-Speed 9399.40 samples/sec   Loss 0.9964   LearningRate 0.0001   Epoch: 29   Global Step: 51420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:22:14,689-Speed 9382.19 samples/sec   Loss 0.9952   LearningRate 0.0001   Epoch: 29   Global Step: 51430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:22:40,846-Speed 9396.06 samples/sec   Loss 1.0036   LearningRate 0.0001   Epoch: 29   Global Step: 51440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:23:06,920-Speed 9425.88 samples/sec   Loss 0.9878   LearningRate 0.0001   Epoch: 29   Global Step: 51450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:23:33,036-Speed 9410.70 samples/sec   Loss 1.0086   LearningRate 0.0001   Epoch: 29   Global Step: 51460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:23:59,224-Speed 9384.93 samples/sec   Loss 1.0003   LearningRate 0.0001   Epoch: 29   Global Step: 51470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:24:25,375-Speed 9398.56 samples/sec   Loss 0.9927   LearningRate 0.0001   Epoch: 29   Global Step: 51480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:24:51,556-Speed 9387.21 samples/sec   Loss 0.9907   LearningRate 0.0001   Epoch: 29   Global Step: 51490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:25:17,704-Speed 9399.26 samples/sec   Loss 0.9918   LearningRate 0.0001   Epoch: 29   Global Step: 51500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:25:43,840-Speed 9403.67 samples/sec   Loss 0.9911   LearningRate 0.0001   Epoch: 29   Global Step: 51510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:26:09,966-Speed 9407.33 samples/sec   Loss 0.9953   LearningRate 0.0001   Epoch: 29   Global Step: 51520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:26:36,153-Speed 9385.37 samples/sec   Loss 1.0037   LearningRate 0.0001   Epoch: 29   Global Step: 51530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:27:02,376-Speed 9372.32 samples/sec   Loss 0.9975   LearningRate 0.0001   Epoch: 29   Global Step: 51540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:27:28,572-Speed 9381.88 samples/sec   Loss 0.9892   LearningRate 0.0001   Epoch: 29   Global Step: 51550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:27:54,813-Speed 9365.88 samples/sec   Loss 0.9962   LearningRate 0.0001   Epoch: 29   Global Step: 51560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:28:21,025-Speed 9376.44 samples/sec   Loss 1.0000   LearningRate 0.0001   Epoch: 29   Global Step: 51570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:28:47,194-Speed 9391.69 samples/sec   Loss 0.9908   LearningRate 0.0001   Epoch: 29   Global Step: 51580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:29:13,404-Speed 9376.99 samples/sec   Loss 1.0017   LearningRate 0.0001   Epoch: 29   Global Step: 51590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:29:39,533-Speed 9406.15 samples/sec   Loss 0.9935   LearningRate 0.0001   Epoch: 29   Global Step: 51600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:30:05,677-Speed 9401.25 samples/sec   Loss 0.9993   LearningRate 0.0001   Epoch: 29   Global Step: 51610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:30:32,031-Speed 9325.45 samples/sec   Loss 0.9995   LearningRate 0.0001   Epoch: 29   Global Step: 51620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:30:58,190-Speed 9395.52 samples/sec   Loss 0.9960   LearningRate 0.0001   Epoch: 29   Global Step: 51630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:31:24,280-Speed 9419.80 samples/sec   Loss 0.9943   LearningRate 0.0001   Epoch: 29   Global Step: 51640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:31:50,409-Speed 9406.12 samples/sec   Loss 1.0003   LearningRate 0.0001   Epoch: 29   Global Step: 51650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:32:16,547-Speed 9403.04 samples/sec   Loss 0.9944   LearningRate 0.0001   Epoch: 29   Global Step: 51660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:32:42,707-Speed 9394.54 samples/sec   Loss 0.9936   LearningRate 0.0001   Epoch: 29   Global Step: 51670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:33:08,752-Speed 9436.65 samples/sec   Loss 0.9987   LearningRate 0.0001   Epoch: 29   Global Step: 51680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:33:34,830-Speed 9424.56 samples/sec   Loss 0.9932   LearningRate 0.0001   Epoch: 29   Global Step: 51690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:34:00,913-Speed 9422.65 samples/sec   Loss 0.9994   LearningRate 0.0001   Epoch: 29   Global Step: 51700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:34:27,024-Speed 9412.46 samples/sec   Loss 0.9867   LearningRate 0.0001   Epoch: 29   Global Step: 51710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:34:53,176-Speed 9398.05 samples/sec   Loss 1.0004   LearningRate 0.0001   Epoch: 29   Global Step: 51720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:35:19,333-Speed 9395.97 samples/sec   Loss 0.9903   LearningRate 0.0001   Epoch: 29   Global Step: 51730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:35:45,454-Speed 9408.76 samples/sec   Loss 0.9991   LearningRate 0.0001   Epoch: 29   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:36:11,582-Speed 9406.49 samples/sec   Loss 0.9975   LearningRate 0.0001   Epoch: 29   Global Step: 51750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:36:37,680-Speed 9417.41 samples/sec   Loss 0.9998   LearningRate 0.0001   Epoch: 29   Global Step: 51760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:37:03,774-Speed 9418.49 samples/sec   Loss 0.9928   LearningRate 0.0001   Epoch: 29   Global Step: 51770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:37:29,874-Speed 9416.40 samples/sec   Loss 0.9951   LearningRate 0.0001   Epoch: 29   Global Step: 51780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:37:56,027-Speed 9397.54 samples/sec   Loss 0.9999   LearningRate 0.0001   Epoch: 29   Global Step: 51790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:38:22,191-Speed 9393.30 samples/sec   Loss 0.9888   LearningRate 0.0001   Epoch: 29   Global Step: 51800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:38:48,324-Speed 9404.63 samples/sec   Loss 0.9997   LearningRate 0.0001   Epoch: 29   Global Step: 51810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:39:14,486-Speed 9394.22 samples/sec   Loss 0.9882   LearningRate 0.0001   Epoch: 29   Global Step: 51820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:39:40,592-Speed 9414.19 samples/sec   Loss 0.9956   LearningRate 0.0001   Epoch: 29   Global Step: 51830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:40:06,755-Speed 9394.16 samples/sec   Loss 0.9962   LearningRate 0.0001   Epoch: 29   Global Step: 51840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:41:26,722-Speed 3073.32 samples/sec   Loss 0.9964   LearningRate 0.0001   Epoch: 30   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:41:52,672-Speed 9470.64 samples/sec   Loss 0.9931   LearningRate 0.0001   Epoch: 30   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:42:18,629-Speed 9468.73 samples/sec   Loss 0.9891   LearningRate 0.0001   Epoch: 30   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:42:44,664-Speed 9439.96 samples/sec   Loss 0.9894   LearningRate 0.0001   Epoch: 30   Global Step: 51880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:43:10,628-Speed 9465.80 samples/sec   Loss 0.9867   LearningRate 0.0001   Epoch: 30   Global Step: 51890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:43:36,592-Speed 9465.85 samples/sec   Loss 0.9907   LearningRate 0.0001   Epoch: 30   Global Step: 51900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:44:02,605-Speed 9448.27 samples/sec   Loss 0.9802   LearningRate 0.0001   Epoch: 30   Global Step: 51910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:44:28,605-Speed 9452.46 samples/sec   Loss 0.9766   LearningRate 0.0001   Epoch: 30   Global Step: 51920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:44:54,624-Speed 9445.80 samples/sec   Loss 0.9815   LearningRate 0.0001   Epoch: 30   Global Step: 51930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:45:20,535-Speed 9485.30 samples/sec   Loss 0.9716   LearningRate 0.0001   Epoch: 30   Global Step: 51940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:45:46,568-Speed 9440.99 samples/sec   Loss 0.9798   LearningRate 0.0001   Epoch: 30   Global Step: 51950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-06 10:46:12,637-Speed 9427.43 samples/sec   Loss 0.9843   LearningRate 0.0001   Epoch: 30   Global Step: 51960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-06 10:46:38,849-Speed 9376.36 samples/sec   Loss 0.9790   LearningRate 0.0001   Epoch: 30   Global Step: 51970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:47:04,866-Speed 9446.89 samples/sec   Loss 0.9793   LearningRate 0.0001   Epoch: 30   Global Step: 51980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:47:30,853-Speed 9457.25 samples/sec   Loss 0.9837   LearningRate 0.0001   Epoch: 30   Global Step: 51990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:47:56,897-Speed 9436.91 samples/sec   Loss 0.9824   LearningRate 0.0001   Epoch: 30   Global Step: 52000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:48:22,983-Speed 9421.75 samples/sec   Loss 0.9776   LearningRate 0.0001   Epoch: 30   Global Step: 52010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:48:49,038-Speed 9433.27 samples/sec   Loss 0.9887   LearningRate 0.0001   Epoch: 30   Global Step: 52020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:49:15,121-Speed 9422.63 samples/sec   Loss 0.9834   LearningRate 0.0001   Epoch: 30   Global Step: 52030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:49:41,139-Speed 9446.06 samples/sec   Loss 0.9801   LearningRate 0.0001   Epoch: 30   Global Step: 52040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:50:07,155-Speed 9446.84 samples/sec   Loss 0.9792   LearningRate 0.0001   Epoch: 30   Global Step: 52050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:50:33,229-Speed 9425.81 samples/sec   Loss 0.9907   LearningRate 0.0001   Epoch: 30   Global Step: 52060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:50:59,368-Speed 9402.35 samples/sec   Loss 0.9824   LearningRate 0.0001   Epoch: 30   Global Step: 52070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:51:25,425-Speed 9432.20 samples/sec   Loss 0.9783   LearningRate 0.0001   Epoch: 30   Global Step: 52080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:51:51,540-Speed 9411.05 samples/sec   Loss 0.9885   LearningRate 0.0001   Epoch: 30   Global Step: 52090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:52:17,593-Speed 9433.41 samples/sec   Loss 0.9829   LearningRate 0.0001   Epoch: 30   Global Step: 52100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:52:43,635-Speed 9437.49 samples/sec   Loss 0.9866   LearningRate 0.0001   Epoch: 30   Global Step: 52110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:53:09,783-Speed 9399.13 samples/sec   Loss 0.9843   LearningRate 0.0001   Epoch: 30   Global Step: 52120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-06 10:53:35,888-Speed 9414.66 samples/sec   Loss 0.9836   LearningRate 0.0001   Epoch: 30   Global Step: 52130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-06 10:54:02,062-Speed 9389.90 samples/sec   Loss 0.9858   LearningRate 0.0001   Epoch: 30   Global Step: 52140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 10:54:28,159-Speed 9417.52 samples/sec   Loss 0.9824   LearningRate 0.0001   Epoch: 30   Global Step: 52150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:54:54,270-Speed 9412.34 samples/sec   Loss 0.9842   LearningRate 0.0001   Epoch: 30   Global Step: 52160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:55:20,411-Speed 9402.08 samples/sec   Loss 0.9840   LearningRate 0.0001   Epoch: 30   Global Step: 52170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:55:46,533-Speed 9408.35 samples/sec   Loss 0.9891   LearningRate 0.0001   Epoch: 30   Global Step: 52180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:56:12,660-Speed 9406.91 samples/sec   Loss 0.9801   LearningRate 0.0001   Epoch: 30   Global Step: 52190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:56:39,017-Speed 9324.69 samples/sec   Loss 0.9842   LearningRate 0.0001   Epoch: 30   Global Step: 52200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:57:05,099-Speed 9422.80 samples/sec   Loss 0.9822   LearningRate 0.0001   Epoch: 30   Global Step: 52210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:57:31,294-Speed 9382.60 samples/sec   Loss 0.9797   LearningRate 0.0001   Epoch: 30   Global Step: 52220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:57:57,458-Speed 9393.10 samples/sec   Loss 0.9767   LearningRate 0.0001   Epoch: 30   Global Step: 52230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:58:23,584-Speed 9407.50 samples/sec   Loss 0.9839   LearningRate 0.0001   Epoch: 30   Global Step: 52240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:58:49,747-Speed 9393.70 samples/sec   Loss 0.9804   LearningRate 0.0001   Epoch: 30   Global Step: 52250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 10:59:15,826-Speed 9423.74 samples/sec   Loss 0.9790   LearningRate 0.0001   Epoch: 30   Global Step: 52260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 10:59:41,968-Speed 9401.75 samples/sec   Loss 0.9823   LearningRate 0.0001   Epoch: 30   Global Step: 52270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:00:08,230-Speed 9358.46 samples/sec   Loss 0.9752   LearningRate 0.0001   Epoch: 30   Global Step: 52280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:00:34,394-Speed 9393.22 samples/sec   Loss 0.9789   LearningRate 0.0001   Epoch: 30   Global Step: 52290   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:01:00,575-Speed 9387.52 samples/sec   Loss 0.9880   LearningRate 0.0001   Epoch: 30   Global Step: 52300   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:01:26,888-Speed 9340.42 samples/sec   Loss 0.9784   LearningRate 0.0001   Epoch: 30   Global Step: 52310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:01:53,045-Speed 9396.12 samples/sec   Loss 0.9789   LearningRate 0.0001   Epoch: 30   Global Step: 52320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:02:19,193-Speed 9399.14 samples/sec   Loss 0.9788   LearningRate 0.0001   Epoch: 30   Global Step: 52330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:02:45,378-Speed 9386.34 samples/sec   Loss 0.9800   LearningRate 0.0001   Epoch: 30   Global Step: 52340   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:03:11,516-Speed 9402.61 samples/sec   Loss 0.9700   LearningRate 0.0001   Epoch: 30   Global Step: 52350   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:03:37,684-Speed 9392.17 samples/sec   Loss 0.9797   LearningRate 0.0001   Epoch: 30   Global Step: 52360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:04:03,803-Speed 9409.74 samples/sec   Loss 0.9825   LearningRate 0.0001   Epoch: 30   Global Step: 52370   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:04:29,947-Speed 9400.45 samples/sec   Loss 0.9820   LearningRate 0.0001   Epoch: 30   Global Step: 52380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:04:56,184-Speed 9367.41 samples/sec   Loss 0.9784   LearningRate 0.0001   Epoch: 30   Global Step: 52390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:05:22,266-Speed 9423.16 samples/sec   Loss 0.9725   LearningRate 0.0001   Epoch: 30   Global Step: 52400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:05:48,381-Speed 9411.19 samples/sec   Loss 0.9806   LearningRate 0.0001   Epoch: 30   Global Step: 52410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:06:14,617-Speed 9367.84 samples/sec   Loss 0.9743   LearningRate 0.0001   Epoch: 30   Global Step: 52420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:06:40,770-Speed 9397.44 samples/sec   Loss 0.9786   LearningRate 0.0001   Epoch: 30   Global Step: 52430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:07:06,935-Speed 9393.17 samples/sec   Loss 0.9815   LearningRate 0.0001   Epoch: 30   Global Step: 52440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:07:33,053-Speed 9410.16 samples/sec   Loss 0.9744   LearningRate 0.0001   Epoch: 30   Global Step: 52450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:07:59,278-Speed 9371.79 samples/sec   Loss 0.9727   LearningRate 0.0001   Epoch: 30   Global Step: 52460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:08:25,367-Speed 9420.50 samples/sec   Loss 0.9736   LearningRate 0.0001   Epoch: 30   Global Step: 52470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:08:51,571-Speed 9379.15 samples/sec   Loss 0.9711   LearningRate 0.0001   Epoch: 30   Global Step: 52480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:09:17,727-Speed 9396.21 samples/sec   Loss 0.9774   LearningRate 0.0001   Epoch: 30   Global Step: 52490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:09:43,966-Speed 9366.92 samples/sec   Loss 0.9716   LearningRate 0.0001   Epoch: 30   Global Step: 52500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:10:10,137-Speed 9390.91 samples/sec   Loss 0.9727   LearningRate 0.0001   Epoch: 30   Global Step: 52510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:10:36,271-Speed 9404.16 samples/sec   Loss 0.9709   LearningRate 0.0001   Epoch: 30   Global Step: 52520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:11:03,694-Speed 8962.08 samples/sec   Loss 0.9691   LearningRate 0.0001   Epoch: 30   Global Step: 52530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:11:29,836-Speed 9401.39 samples/sec   Loss 0.9780   LearningRate 0.0001   Epoch: 30   Global Step: 52540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:11:55,958-Speed 9408.76 samples/sec   Loss 0.9691   LearningRate 0.0001   Epoch: 30   Global Step: 52550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:12:22,061-Speed 9415.51 samples/sec   Loss 0.9757   LearningRate 0.0001   Epoch: 30   Global Step: 52560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:12:48,200-Speed 9402.57 samples/sec   Loss 0.9773   LearningRate 0.0001   Epoch: 30   Global Step: 52570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:13:14,364-Speed 9393.51 samples/sec   Loss 0.9719   LearningRate 0.0001   Epoch: 30   Global Step: 52580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:13:40,604-Speed 9366.46 samples/sec   Loss 0.9731   LearningRate 0.0001   Epoch: 30   Global Step: 52590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:14:06,753-Speed 9398.99 samples/sec   Loss 0.9784   LearningRate 0.0001   Epoch: 30   Global Step: 52600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:14:32,931-Speed 9388.32 samples/sec   Loss 0.9719   LearningRate 0.0001   Epoch: 30   Global Step: 52610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:14:59,065-Speed 9404.45 samples/sec   Loss 0.9792   LearningRate 0.0001   Epoch: 30   Global Step: 52620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:15:25,269-Speed 9379.17 samples/sec   Loss 0.9659   LearningRate 0.0001   Epoch: 30   Global Step: 52630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:15:51,443-Speed 9389.68 samples/sec   Loss 0.9720   LearningRate 0.0001   Epoch: 30   Global Step: 52640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:16:17,571-Speed 9406.33 samples/sec   Loss 0.9679   LearningRate 0.0001   Epoch: 30   Global Step: 52650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:16:43,750-Speed 9388.37 samples/sec   Loss 0.9790   LearningRate 0.0001   Epoch: 30   Global Step: 52660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:17:09,870-Speed 9409.15 samples/sec   Loss 0.9741   LearningRate 0.0001   Epoch: 30   Global Step: 52670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:17:36,165-Speed 9347.21 samples/sec   Loss 0.9669   LearningRate 0.0001   Epoch: 30   Global Step: 52680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:18:02,391-Speed 9371.23 samples/sec   Loss 0.9642   LearningRate 0.0001   Epoch: 30   Global Step: 52690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:18:28,563-Speed 9390.54 samples/sec   Loss 0.9728   LearningRate 0.0001   Epoch: 30   Global Step: 52700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:18:54,785-Speed 9372.72 samples/sec   Loss 0.9711   LearningRate 0.0001   Epoch: 30   Global Step: 52710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:19:21,086-Speed 9344.49 samples/sec   Loss 0.9614   LearningRate 0.0001   Epoch: 30   Global Step: 52720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:19:47,249-Speed 9394.45 samples/sec   Loss 0.9677   LearningRate 0.0001   Epoch: 30   Global Step: 52730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:20:13,509-Speed 9359.22 samples/sec   Loss 0.9595   LearningRate 0.0001   Epoch: 30   Global Step: 52740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:20:39,690-Speed 9387.36 samples/sec   Loss 0.9692   LearningRate 0.0001   Epoch: 30   Global Step: 52750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:21:05,959-Speed 9355.74 samples/sec   Loss 0.9680   LearningRate 0.0001   Epoch: 30   Global Step: 52760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:21:32,133-Speed 9390.09 samples/sec   Loss 0.9624   LearningRate 0.0001   Epoch: 30   Global Step: 52770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:21:58,338-Speed 9378.76 samples/sec   Loss 0.9633   LearningRate 0.0001   Epoch: 30   Global Step: 52780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:22:24,524-Speed 9385.39 samples/sec   Loss 0.9742   LearningRate 0.0001   Epoch: 30   Global Step: 52790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:22:50,655-Speed 9405.52 samples/sec   Loss 0.9734   LearningRate 0.0001   Epoch: 30   Global Step: 52800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:23:16,842-Speed 9385.12 samples/sec   Loss 0.9711   LearningRate 0.0001   Epoch: 30   Global Step: 52810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:23:42,942-Speed 9416.52 samples/sec   Loss 0.9637   LearningRate 0.0001   Epoch: 30   Global Step: 52820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:24:09,109-Speed 9392.58 samples/sec   Loss 0.9633   LearningRate 0.0001   Epoch: 30   Global Step: 52830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:24:35,244-Speed 9403.71 samples/sec   Loss 0.9619   LearningRate 0.0001   Epoch: 30   Global Step: 52840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:25:01,367-Speed 9408.42 samples/sec   Loss 0.9670   LearningRate 0.0001   Epoch: 30   Global Step: 52850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:25:27,556-Speed 9384.46 samples/sec   Loss 0.9698   LearningRate 0.0001   Epoch: 30   Global Step: 52860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-06 11:25:53,696-Speed 9402.46 samples/sec   Loss 0.9603   LearningRate 0.0001   Epoch: 30   Global Step: 52870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:26:19,823-Speed 9406.84 samples/sec   Loss 0.9547   LearningRate 0.0001   Epoch: 30   Global Step: 52880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:26:45,946-Speed 9407.83 samples/sec   Loss 0.9632   LearningRate 0.0001   Epoch: 30   Global Step: 52890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:27:12,080-Speed 9404.50 samples/sec   Loss 0.9703   LearningRate 0.0001   Epoch: 30   Global Step: 52900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:27:38,273-Speed 9383.00 samples/sec   Loss 0.9612   LearningRate 0.0001   Epoch: 30   Global Step: 52910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:28:04,492-Speed 9373.95 samples/sec   Loss 0.9708   LearningRate 0.0001   Epoch: 30   Global Step: 52920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:28:30,580-Speed 9420.93 samples/sec   Loss 0.9641   LearningRate 0.0001   Epoch: 30   Global Step: 52930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:28:56,699-Speed 9409.79 samples/sec   Loss 0.9606   LearningRate 0.0001   Epoch: 30   Global Step: 52940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:29:22,781-Speed 9422.87 samples/sec   Loss 0.9647   LearningRate 0.0001   Epoch: 30   Global Step: 52950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:29:48,939-Speed 9395.73 samples/sec   Loss 0.9538   LearningRate 0.0001   Epoch: 30   Global Step: 52960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:30:15,133-Speed 9382.93 samples/sec   Loss 0.9595   LearningRate 0.0001   Epoch: 30   Global Step: 52970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:30:41,310-Speed 9388.51 samples/sec   Loss 0.9625   LearningRate 0.0001   Epoch: 30   Global Step: 52980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:31:07,422-Speed 9412.32 samples/sec   Loss 0.9587   LearningRate 0.0001   Epoch: 30   Global Step: 52990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:31:33,622-Speed 9380.46 samples/sec   Loss 0.9650   LearningRate 0.0001   Epoch: 30   Global Step: 53000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:31:59,786-Speed 9393.86 samples/sec   Loss 0.9601   LearningRate 0.0001   Epoch: 30   Global Step: 53010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:32:25,898-Speed 9412.01 samples/sec   Loss 0.9587   LearningRate 0.0001   Epoch: 30   Global Step: 53020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:32:52,040-Speed 9401.23 samples/sec   Loss 0.9661   LearningRate 0.0001   Epoch: 30   Global Step: 53030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:33:18,141-Speed 9415.84 samples/sec   Loss 0.9609   LearningRate 0.0001   Epoch: 30   Global Step: 53040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:33:44,293-Speed 9397.80 samples/sec   Loss 0.9514   LearningRate 0.0001   Epoch: 30   Global Step: 53050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:34:10,398-Speed 9414.75 samples/sec   Loss 0.9554   LearningRate 0.0001   Epoch: 30   Global Step: 53060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:34:36,492-Speed 9418.78 samples/sec   Loss 0.9490   LearningRate 0.0001   Epoch: 30   Global Step: 53070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:35:02,581-Speed 9420.21 samples/sec   Loss 0.9551   LearningRate 0.0001   Epoch: 30   Global Step: 53080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:35:28,668-Speed 9421.27 samples/sec   Loss 0.9566   LearningRate 0.0001   Epoch: 30   Global Step: 53090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:35:54,767-Speed 9416.73 samples/sec   Loss 0.9583   LearningRate 0.0001   Epoch: 30   Global Step: 53100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:36:20,880-Speed 9411.86 samples/sec   Loss 0.9546   LearningRate 0.0001   Epoch: 30   Global Step: 53110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:36:47,059-Speed 9388.27 samples/sec   Loss 0.9636   LearningRate 0.0001   Epoch: 30   Global Step: 53120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:37:13,222-Speed 9393.52 samples/sec   Loss 0.9565   LearningRate 0.0001   Epoch: 30   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:37:39,442-Speed 9373.56 samples/sec   Loss 0.9610   LearningRate 0.0001   Epoch: 30   Global Step: 53140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:38:05,587-Speed 9400.25 samples/sec   Loss 0.9628   LearningRate 0.0001   Epoch: 30   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:38:31,749-Speed 9394.34 samples/sec   Loss 0.9574   LearningRate 0.0001   Epoch: 30   Global Step: 53160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:38:57,841-Speed 9419.11 samples/sec   Loss 0.9609   LearningRate 0.0001   Epoch: 30   Global Step: 53170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:39:23,967-Speed 9407.24 samples/sec   Loss 0.9634   LearningRate 0.0001   Epoch: 30   Global Step: 53180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:39:50,260-Speed 9347.43 samples/sec   Loss 0.9589   LearningRate 0.0001   Epoch: 30   Global Step: 53190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-03-06 11:40:16,336-Speed 9425.24 samples/sec   Loss 0.9607   LearningRate 0.0001   Epoch: 30   Global Step: 53200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:40:42,559-Speed 9372.48 samples/sec   Loss 0.9559   LearningRate 0.0001   Epoch: 30   Global Step: 53210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:41:08,712-Speed 9397.55 samples/sec   Loss 0.9591   LearningRate 0.0001   Epoch: 30   Global Step: 53220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:41:34,841-Speed 9406.11 samples/sec   Loss 0.9581   LearningRate 0.0001   Epoch: 30   Global Step: 53230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:42:01,064-Speed 9372.43 samples/sec   Loss 0.9594   LearningRate 0.0001   Epoch: 30   Global Step: 53240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:42:27,191-Speed 9407.20 samples/sec   Loss 0.9549   LearningRate 0.0001   Epoch: 30   Global Step: 53250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:42:53,285-Speed 9418.78 samples/sec   Loss 0.9557   LearningRate 0.0001   Epoch: 30   Global Step: 53260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:43:19,440-Speed 9396.65 samples/sec   Loss 0.9542   LearningRate 0.0001   Epoch: 30   Global Step: 53270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:43:45,664-Speed 9372.15 samples/sec   Loss 0.9514   LearningRate 0.0001   Epoch: 30   Global Step: 53280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:44:11,785-Speed 9408.78 samples/sec   Loss 0.9518   LearningRate 0.0001   Epoch: 30   Global Step: 53290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:44:37,936-Speed 9399.58 samples/sec   Loss 0.9564   LearningRate 0.0001   Epoch: 30   Global Step: 53300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:45:04,284-Speed 9328.05 samples/sec   Loss 0.9558   LearningRate 0.0001   Epoch: 30   Global Step: 53310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:45:30,472-Speed 9384.91 samples/sec   Loss 0.9594   LearningRate 0.0001   Epoch: 30   Global Step: 53320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:45:56,659-Speed 9385.28 samples/sec   Loss 0.9561   LearningRate 0.0001   Epoch: 30   Global Step: 53330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:46:22,810-Speed 9398.15 samples/sec   Loss 0.9516   LearningRate 0.0001   Epoch: 30   Global Step: 53340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:46:48,899-Speed 9420.77 samples/sec   Loss 0.9557   LearningRate 0.0001   Epoch: 30   Global Step: 53350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:47:15,082-Speed 9386.24 samples/sec   Loss 0.9531   LearningRate 0.0001   Epoch: 30   Global Step: 53360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:47:41,219-Speed 9403.34 samples/sec   Loss 0.9580   LearningRate 0.0001   Epoch: 30   Global Step: 53370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:48:07,420-Speed 9380.10 samples/sec   Loss 0.9539   LearningRate 0.0001   Epoch: 30   Global Step: 53380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:48:33,610-Speed 9384.46 samples/sec   Loss 0.9525   LearningRate 0.0001   Epoch: 30   Global Step: 53390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:48:59,766-Speed 9396.43 samples/sec   Loss 0.9550   LearningRate 0.0001   Epoch: 30   Global Step: 53400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:49:26,052-Speed 9349.71 samples/sec   Loss 0.9508   LearningRate 0.0001   Epoch: 30   Global Step: 53410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:49:52,217-Speed 9393.56 samples/sec   Loss 0.9579   LearningRate 0.0001   Epoch: 30   Global Step: 53420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-06 11:50:18,326-Speed 9413.38 samples/sec   Loss 0.9507   LearningRate 0.0001   Epoch: 30   Global Step: 53430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:50:44,500-Speed 9390.61 samples/sec   Loss 0.9508   LearningRate 0.0001   Epoch: 30   Global Step: 53440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:51:10,661-Speed 9394.50 samples/sec   Loss 0.9593   LearningRate 0.0001   Epoch: 30   Global Step: 53450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:51:36,820-Speed 9395.58 samples/sec   Loss 0.9533   LearningRate 0.0001   Epoch: 30   Global Step: 53460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:52:03,106-Speed 9349.90 samples/sec   Loss 0.9472   LearningRate 0.0001   Epoch: 30   Global Step: 53470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:52:29,318-Speed 9376.46 samples/sec   Loss 0.9522   LearningRate 0.0001   Epoch: 30   Global Step: 53480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:52:55,466-Speed 9398.86 samples/sec   Loss 0.9549   LearningRate 0.0001   Epoch: 30   Global Step: 53490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-06 11:53:21,615-Speed 9398.99 samples/sec   Loss 0.9519   LearningRate 0.0001   Epoch: 30   Global Step: 53500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 11:53:47,770-Speed 9396.68 samples/sec   Loss 0.9577   LearningRate 0.0001   Epoch: 30   Global Step: 53510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:54:13,934-Speed 9393.34 samples/sec   Loss 0.9463   LearningRate 0.0001   Epoch: 30   Global Step: 53520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:54:40,083-Speed 9398.82 samples/sec   Loss 0.9486   LearningRate 0.0001   Epoch: 30   Global Step: 53530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:55:06,215-Speed 9405.13 samples/sec   Loss 0.9465   LearningRate 0.0001   Epoch: 30   Global Step: 53540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:55:32,286-Speed 9426.96 samples/sec   Loss 0.9435   LearningRate 0.0001   Epoch: 30   Global Step: 53550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:55:58,425-Speed 9402.26 samples/sec   Loss 0.9625   LearningRate 0.0001   Epoch: 30   Global Step: 53560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:56:24,616-Speed 9384.00 samples/sec   Loss 0.9594   LearningRate 0.0001   Epoch: 30   Global Step: 53570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:57:43,282-Speed 3124.16 samples/sec   Loss 0.9532   LearningRate 0.0001   Epoch: 31   Global Step: 53580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:58:09,136-Speed 9506.04 samples/sec   Loss 0.9442   LearningRate 0.0001   Epoch: 31   Global Step: 53590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:58:35,095-Speed 9467.54 samples/sec   Loss 0.9464   LearningRate 0.0001   Epoch: 31   Global Step: 53600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 11:59:01,192-Speed 9417.74 samples/sec   Loss 0.9413   LearningRate 0.0001   Epoch: 31   Global Step: 53610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 11:59:27,387-Speed 9382.40 samples/sec   Loss 0.9434   LearningRate 0.0001   Epoch: 31   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 11:59:53,551-Speed 9393.56 samples/sec   Loss 0.9514   LearningRate 0.0001   Epoch: 31   Global Step: 53630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:00:19,807-Speed 9360.36 samples/sec   Loss 0.9425   LearningRate 0.0001   Epoch: 31   Global Step: 53640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:00:46,014-Speed 9378.20 samples/sec   Loss 0.9438   LearningRate 0.0001   Epoch: 31   Global Step: 53650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:01:12,147-Speed 9404.64 samples/sec   Loss 0.9442   LearningRate 0.0001   Epoch: 31   Global Step: 53660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:01:38,260-Speed 9411.91 samples/sec   Loss 0.9409   LearningRate 0.0001   Epoch: 31   Global Step: 53670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:02:04,365-Speed 9414.56 samples/sec   Loss 0.9421   LearningRate 0.0001   Epoch: 31   Global Step: 53680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:02:30,493-Speed 9406.12 samples/sec   Loss 0.9435   LearningRate 0.0001   Epoch: 31   Global Step: 53690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:02:56,651-Speed 9395.88 samples/sec   Loss 0.9444   LearningRate 0.0001   Epoch: 31   Global Step: 53700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:03:22,774-Speed 9408.08 samples/sec   Loss 0.9425   LearningRate 0.0001   Epoch: 31   Global Step: 53710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-06 12:03:48,867-Speed 9419.36 samples/sec   Loss 0.9410   LearningRate 0.0001   Epoch: 31   Global Step: 53720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:04:14,940-Speed 9426.38 samples/sec   Loss 0.9450   LearningRate 0.0001   Epoch: 31   Global Step: 53730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:04:41,097-Speed 9395.96 samples/sec   Loss 0.9443   LearningRate 0.0001   Epoch: 31   Global Step: 53740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:05:07,209-Speed 9412.05 samples/sec   Loss 0.9478   LearningRate 0.0001   Epoch: 31   Global Step: 53750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:05:33,310-Speed 9416.38 samples/sec   Loss 0.9406   LearningRate 0.0001   Epoch: 31   Global Step: 53760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:05:59,413-Speed 9415.70 samples/sec   Loss 0.9419   LearningRate 0.0001   Epoch: 31   Global Step: 53770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:06:25,549-Speed 9404.62 samples/sec   Loss 0.9341   LearningRate 0.0001   Epoch: 31   Global Step: 53780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:06:58,657-Speed 7423.28 samples/sec   Loss 0.9438   LearningRate 0.0001   Epoch: 31   Global Step: 53790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:07:24,818-Speed 9394.38 samples/sec   Loss 0.9460   LearningRate 0.0001   Epoch: 31   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:07:50,969-Speed 9398.39 samples/sec   Loss 0.9431   LearningRate 0.0001   Epoch: 31   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:08:17,110-Speed 9401.58 samples/sec   Loss 0.9483   LearningRate 0.0001   Epoch: 31   Global Step: 53820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-06 12:08:43,136-Speed 9443.21 samples/sec   Loss 0.9460   LearningRate 0.0001   Epoch: 31   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:09:09,326-Speed 9384.12 samples/sec   Loss 0.9433   LearningRate 0.0001   Epoch: 31   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:09:35,515-Speed 9384.95 samples/sec   Loss 0.9428   LearningRate 0.0001   Epoch: 31   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:10:01,638-Speed 9408.06 samples/sec   Loss 0.9404   LearningRate 0.0001   Epoch: 31   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:10:27,846-Speed 9377.73 samples/sec   Loss 0.9370   LearningRate 0.0001   Epoch: 31   Global Step: 53870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:10:54,037-Speed 9384.00 samples/sec   Loss 0.9387   LearningRate 0.0001   Epoch: 31   Global Step: 53880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:11:20,215-Speed 9388.53 samples/sec   Loss 0.9429   LearningRate 0.0001   Epoch: 31   Global Step: 53890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:11:46,419-Speed 9378.98 samples/sec   Loss 0.9469   LearningRate 0.0001   Epoch: 31   Global Step: 53900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:12:12,578-Speed 9395.02 samples/sec   Loss 0.9378   LearningRate 0.0001   Epoch: 31   Global Step: 53910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:12:38,735-Speed 9396.28 samples/sec   Loss 0.9375   LearningRate 0.0001   Epoch: 31   Global Step: 53920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:13:04,919-Speed 9386.25 samples/sec   Loss 0.9439   LearningRate 0.0001   Epoch: 31   Global Step: 53930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-06 12:13:30,985-Speed 9429.46 samples/sec   Loss 0.9475   LearningRate 0.0001   Epoch: 31   Global Step: 53940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:13:57,171-Speed 9385.66 samples/sec   Loss 0.9429   LearningRate 0.0001   Epoch: 31   Global Step: 53950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:14:23,282-Speed 9412.14 samples/sec   Loss 0.9340   LearningRate 0.0001   Epoch: 31   Global Step: 53960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:14:49,446-Speed 9393.65 samples/sec   Loss 0.9457   LearningRate 0.0001   Epoch: 31   Global Step: 53970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:15:15,637-Speed 9384.46 samples/sec   Loss 0.9404   LearningRate 0.0001   Epoch: 31   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:15:43,867-Speed 8705.82 samples/sec   Loss 0.9412   LearningRate 0.0001   Epoch: 31   Global Step: 53990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:16:09,986-Speed 9409.91 samples/sec   Loss 0.9363   LearningRate 0.0001   Epoch: 31   Global Step: 54000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:16:36,137-Speed 9399.02 samples/sec   Loss 0.9407   LearningRate 0.0001   Epoch: 31   Global Step: 54010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:17:02,281-Speed 9400.90 samples/sec   Loss 0.9468   LearningRate 0.0001   Epoch: 31   Global Step: 54020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:17:28,325-Speed 9436.79 samples/sec   Loss 0.9463   LearningRate 0.0001   Epoch: 31   Global Step: 54030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:17:54,541-Speed 9374.93 samples/sec   Loss 0.9326   LearningRate 0.0001   Epoch: 31   Global Step: 54040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:18:20,754-Speed 9376.00 samples/sec   Loss 0.9448   LearningRate 0.0001   Epoch: 31   Global Step: 54050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:18:46,822-Speed 9428.22 samples/sec   Loss 0.9381   LearningRate 0.0001   Epoch: 31   Global Step: 54060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:19:12,958-Speed 9403.37 samples/sec   Loss 0.9507   LearningRate 0.0001   Epoch: 31   Global Step: 54070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:19:39,082-Speed 9407.94 samples/sec   Loss 0.9385   LearningRate 0.0001   Epoch: 31   Global Step: 54080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:20:05,215-Speed 9404.60 samples/sec   Loss 0.9360   LearningRate 0.0001   Epoch: 31   Global Step: 54090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:20:31,370-Speed 9396.98 samples/sec   Loss 0.9377   LearningRate 0.0001   Epoch: 31   Global Step: 54100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:20:57,473-Speed 9415.59 samples/sec   Loss 0.9372   LearningRate 0.0001   Epoch: 31   Global Step: 54110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:21:23,641-Speed 9391.95 samples/sec   Loss 0.9341   LearningRate 0.0001   Epoch: 31   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:21:49,745-Speed 9414.99 samples/sec   Loss 0.9352   LearningRate 0.0001   Epoch: 31   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:22:15,898-Speed 9397.81 samples/sec   Loss 0.9411   LearningRate 0.0001   Epoch: 31   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:22:42,057-Speed 9395.18 samples/sec   Loss 0.9319   LearningRate 0.0001   Epoch: 31   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:23:08,131-Speed 9426.57 samples/sec   Loss 0.9355   LearningRate 0.0001   Epoch: 31   Global Step: 54160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:23:34,335-Speed 9379.18 samples/sec   Loss 0.9345   LearningRate 0.0001   Epoch: 31   Global Step: 54170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:24:00,591-Speed 9360.76 samples/sec   Loss 0.9390   LearningRate 0.0001   Epoch: 31   Global Step: 54180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:24:26,768-Speed 9388.86 samples/sec   Loss 0.9334   LearningRate 0.0001   Epoch: 31   Global Step: 54190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-06 12:24:52,824-Speed 9432.53 samples/sec   Loss 0.9350   LearningRate 0.0001   Epoch: 31   Global Step: 54200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:25:18,965-Speed 9401.64 samples/sec   Loss 0.9322   LearningRate 0.0001   Epoch: 31   Global Step: 54210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:25:45,031-Speed 9428.78 samples/sec   Loss 0.9363   LearningRate 0.0001   Epoch: 31   Global Step: 54220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:26:11,192-Speed 9394.78 samples/sec   Loss 0.9326   LearningRate 0.0001   Epoch: 31   Global Step: 54230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:26:37,410-Speed 9374.03 samples/sec   Loss 0.9361   LearningRate 0.0001   Epoch: 31   Global Step: 54240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:27:03,512-Speed 9416.13 samples/sec   Loss 0.9325   LearningRate 0.0001   Epoch: 31   Global Step: 54250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:27:29,642-Speed 9405.86 samples/sec   Loss 0.9293   LearningRate 0.0001   Epoch: 31   Global Step: 54260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:27:55,802-Speed 9394.98 samples/sec   Loss 0.9408   LearningRate 0.0001   Epoch: 31   Global Step: 54270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:28:21,901-Speed 9416.90 samples/sec   Loss 0.9340   LearningRate 0.0001   Epoch: 31   Global Step: 54280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:28:48,041-Speed 9402.07 samples/sec   Loss 0.9293   LearningRate 0.0001   Epoch: 31   Global Step: 54290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:29:14,140-Speed 9416.78 samples/sec   Loss 0.9356   LearningRate 0.0001   Epoch: 31   Global Step: 54300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:29:40,285-Speed 9400.18 samples/sec   Loss 0.9305   LearningRate 0.0001   Epoch: 31   Global Step: 54310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:30:06,406-Speed 9409.17 samples/sec   Loss 0.9334   LearningRate 0.0001   Epoch: 31   Global Step: 54320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:30:32,530-Speed 9407.92 samples/sec   Loss 0.9300   LearningRate 0.0001   Epoch: 31   Global Step: 54330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:30:58,726-Speed 9381.81 samples/sec   Loss 0.9391   LearningRate 0.0001   Epoch: 31   Global Step: 54340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:31:24,788-Speed 9430.92 samples/sec   Loss 0.9298   LearningRate 0.0001   Epoch: 31   Global Step: 54350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:31:50,919-Speed 9405.41 samples/sec   Loss 0.9352   LearningRate 0.0001   Epoch: 31   Global Step: 54360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:32:17,026-Speed 9413.85 samples/sec   Loss 0.9324   LearningRate 0.0001   Epoch: 31   Global Step: 54370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:32:43,139-Speed 9411.77 samples/sec   Loss 0.9307   LearningRate 0.0001   Epoch: 31   Global Step: 54380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:33:09,324-Speed 9385.96 samples/sec   Loss 0.9369   LearningRate 0.0001   Epoch: 31   Global Step: 54390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:33:35,390-Speed 9428.75 samples/sec   Loss 0.9350   LearningRate 0.0001   Epoch: 31   Global Step: 54400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:34:01,517-Speed 9406.77 samples/sec   Loss 0.9276   LearningRate 0.0001   Epoch: 31   Global Step: 54410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:34:27,639-Speed 9408.42 samples/sec   Loss 0.9203   LearningRate 0.0001   Epoch: 31   Global Step: 54420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:34:53,787-Speed 9399.36 samples/sec   Loss 0.9339   LearningRate 0.0001   Epoch: 31   Global Step: 54430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:35:19,964-Speed 9388.55 samples/sec   Loss 0.9285   LearningRate 0.0001   Epoch: 31   Global Step: 54440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:35:46,146-Speed 9387.24 samples/sec   Loss 0.9295   LearningRate 0.0001   Epoch: 31   Global Step: 54450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:36:12,345-Speed 9380.59 samples/sec   Loss 0.9310   LearningRate 0.0001   Epoch: 31   Global Step: 54460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:36:38,530-Speed 9386.06 samples/sec   Loss 0.9301   LearningRate 0.0001   Epoch: 31   Global Step: 54470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:37:04,771-Speed 9365.81 samples/sec   Loss 0.9281   LearningRate 0.0001   Epoch: 31   Global Step: 54480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:37:30,983-Speed 9376.26 samples/sec   Loss 0.9172   LearningRate 0.0001   Epoch: 31   Global Step: 54490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:37:57,097-Speed 9411.75 samples/sec   Loss 0.9249   LearningRate 0.0001   Epoch: 31   Global Step: 54500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-06 12:38:23,210-Speed 9411.68 samples/sec   Loss 0.9234   LearningRate 0.0001   Epoch: 31   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-06 12:38:49,323-Speed 9411.94 samples/sec   Loss 0.9299   LearningRate 0.0001   Epoch: 31   Global Step: 54520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:39:15,473-Speed 9398.52 samples/sec   Loss 0.9284   LearningRate 0.0001   Epoch: 31   Global Step: 54530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:39:41,601-Speed 9406.42 samples/sec   Loss 0.9298   LearningRate 0.0001   Epoch: 31   Global Step: 54540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:40:07,691-Speed 9420.28 samples/sec   Loss 0.9256   LearningRate 0.0001   Epoch: 31   Global Step: 54550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:40:33,799-Speed 9413.84 samples/sec   Loss 0.9176   LearningRate 0.0001   Epoch: 31   Global Step: 54560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:40:59,983-Speed 9385.91 samples/sec   Loss 0.9274   LearningRate 0.0001   Epoch: 31   Global Step: 54570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:41:26,127-Speed 9400.89 samples/sec   Loss 0.9267   LearningRate 0.0001   Epoch: 31   Global Step: 54580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:41:52,229-Speed 9415.99 samples/sec   Loss 0.9272   LearningRate 0.0001   Epoch: 31   Global Step: 54590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:42:18,389-Speed 9394.95 samples/sec   Loss 0.9301   LearningRate 0.0001   Epoch: 31   Global Step: 54600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:42:44,441-Speed 9433.83 samples/sec   Loss 0.9312   LearningRate 0.0001   Epoch: 31   Global Step: 54610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:43:10,616-Speed 9389.29 samples/sec   Loss 0.9244   LearningRate 0.0001   Epoch: 31   Global Step: 54620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:43:36,821-Speed 9379.95 samples/sec   Loss 0.9275   LearningRate 0.0001   Epoch: 31   Global Step: 54630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:44:02,965-Speed 9400.89 samples/sec   Loss 0.9278   LearningRate 0.0001   Epoch: 31   Global Step: 54640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:44:29,104-Speed 9402.43 samples/sec   Loss 0.9229   LearningRate 0.0001   Epoch: 31   Global Step: 54650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:44:55,315-Speed 9376.55 samples/sec   Loss 0.9268   LearningRate 0.0001   Epoch: 31   Global Step: 54660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:45:21,454-Speed 9402.58 samples/sec   Loss 0.9229   LearningRate 0.0001   Epoch: 31   Global Step: 54670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:45:47,677-Speed 9372.23 samples/sec   Loss 0.9193   LearningRate 0.0001   Epoch: 31   Global Step: 54680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:46:13,825-Speed 9399.41 samples/sec   Loss 0.9223   LearningRate 0.0001   Epoch: 31   Global Step: 54690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:46:39,974-Speed 9398.59 samples/sec   Loss 0.9205   LearningRate 0.0001   Epoch: 31   Global Step: 54700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:47:06,143-Speed 9391.66 samples/sec   Loss 0.9222   LearningRate 0.0001   Epoch: 31   Global Step: 54710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:47:32,321-Speed 9388.45 samples/sec   Loss 0.9239   LearningRate 0.0001   Epoch: 31   Global Step: 54720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:47:58,444-Speed 9408.32 samples/sec   Loss 0.9230   LearningRate 0.0001   Epoch: 31   Global Step: 54730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:48:24,573-Speed 9405.88 samples/sec   Loss 0.9272   LearningRate 0.0001   Epoch: 31   Global Step: 54740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:48:50,674-Speed 9416.43 samples/sec   Loss 0.9227   LearningRate 0.0001   Epoch: 31   Global Step: 54750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:49:16,811-Speed 9403.11 samples/sec   Loss 0.9234   LearningRate 0.0001   Epoch: 31   Global Step: 54760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:49:42,944-Speed 9404.53 samples/sec   Loss 0.9218   LearningRate 0.0001   Epoch: 31   Global Step: 54770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:50:09,271-Speed 9335.52 samples/sec   Loss 0.9198   LearningRate 0.0001   Epoch: 31   Global Step: 54780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:50:35,422-Speed 9398.26 samples/sec   Loss 0.9154   LearningRate 0.0001   Epoch: 31   Global Step: 54790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:51:01,572-Speed 9398.91 samples/sec   Loss 0.9227   LearningRate 0.0001   Epoch: 31   Global Step: 54800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:51:27,738-Speed 9392.65 samples/sec   Loss 0.9208   LearningRate 0.0001   Epoch: 31   Global Step: 54810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:51:53,894-Speed 9396.14 samples/sec   Loss 0.9267   LearningRate 0.0001   Epoch: 31   Global Step: 54820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:52:20,012-Speed 9410.24 samples/sec   Loss 0.9286   LearningRate 0.0001   Epoch: 31   Global Step: 54830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-06 12:52:46,140-Speed 9406.47 samples/sec   Loss 0.9124   LearningRate 0.0001   Epoch: 31   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:53:12,262-Speed 9408.59 samples/sec   Loss 0.9188   LearningRate 0.0001   Epoch: 31   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-06 12:53:38,557-Speed 9346.98 samples/sec   Loss 0.9206   LearningRate 0.0001   Epoch: 31   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 12:54:04,733-Speed 9389.07 samples/sec   Loss 0.9265   LearningRate 0.0001   Epoch: 31   Global Step: 54870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:54:30,840-Speed 9413.84 samples/sec   Loss 0.9293   LearningRate 0.0001   Epoch: 31   Global Step: 54880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:54:56,958-Speed 9410.00 samples/sec   Loss 0.9224   LearningRate 0.0001   Epoch: 31   Global Step: 54890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:55:23,087-Speed 9406.36 samples/sec   Loss 0.9182   LearningRate 0.0001   Epoch: 31   Global Step: 54900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:55:49,240-Speed 9397.35 samples/sec   Loss 0.9157   LearningRate 0.0001   Epoch: 31   Global Step: 54910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:56:15,380-Speed 9402.34 samples/sec   Loss 0.9098   LearningRate 0.0001   Epoch: 31   Global Step: 54920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:56:41,496-Speed 9410.68 samples/sec   Loss 0.9134   LearningRate 0.0001   Epoch: 31   Global Step: 54930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:57:07,625-Speed 9405.87 samples/sec   Loss 0.9218   LearningRate 0.0001   Epoch: 31   Global Step: 54940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:57:33,813-Speed 9385.04 samples/sec   Loss 0.9326   LearningRate 0.0001   Epoch: 31   Global Step: 54950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:58:00,003-Speed 9384.11 samples/sec   Loss 0.9214   LearningRate 0.0001   Epoch: 31   Global Step: 54960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 12:58:26,183-Speed 9387.76 samples/sec   Loss 0.9221   LearningRate 0.0001   Epoch: 31   Global Step: 54970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 12:58:52,275-Speed 9419.66 samples/sec   Loss 0.9123   LearningRate 0.0001   Epoch: 31   Global Step: 54980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 12:59:18,419-Speed 9400.56 samples/sec   Loss 0.9130   LearningRate 0.0001   Epoch: 31   Global Step: 54990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 12:59:44,584-Speed 9392.77 samples/sec   Loss 0.9163   LearningRate 0.0001   Epoch: 31   Global Step: 55000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:00:10,743-Speed 9395.39 samples/sec   Loss 0.9163   LearningRate 0.0001   Epoch: 31   Global Step: 55010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:00:36,951-Speed 9377.88 samples/sec   Loss 0.9225   LearningRate 0.0001   Epoch: 31   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:01:03,096-Speed 9399.95 samples/sec   Loss 0.9100   LearningRate 0.0001   Epoch: 31   Global Step: 55030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:01:29,217-Speed 9409.06 samples/sec   Loss 0.9173   LearningRate 0.0001   Epoch: 31   Global Step: 55040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:01:55,378-Speed 9394.25 samples/sec   Loss 0.9232   LearningRate 0.0001   Epoch: 31   Global Step: 55050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:02:21,567-Speed 9384.71 samples/sec   Loss 0.9198   LearningRate 0.0001   Epoch: 31   Global Step: 55060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:02:47,784-Speed 9374.72 samples/sec   Loss 0.9170   LearningRate 0.0001   Epoch: 31   Global Step: 55070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-03-06 13:03:13,977-Speed 9382.82 samples/sec   Loss 0.9156   LearningRate 0.0001   Epoch: 31   Global Step: 55080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:03:40,151-Speed 9390.03 samples/sec   Loss 0.9155   LearningRate 0.0001   Epoch: 31   Global Step: 55090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:04:06,295-Speed 9400.70 samples/sec   Loss 0.9109   LearningRate 0.0001   Epoch: 31   Global Step: 55100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:04:32,435-Speed 9402.01 samples/sec   Loss 0.9144   LearningRate 0.0001   Epoch: 31   Global Step: 55110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:04:58,700-Speed 9357.60 samples/sec   Loss 0.9211   LearningRate 0.0001   Epoch: 31   Global Step: 55120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:05:24,861-Speed 9394.27 samples/sec   Loss 0.9130   LearningRate 0.0001   Epoch: 31   Global Step: 55130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:05:51,056-Speed 9382.58 samples/sec   Loss 0.9067   LearningRate 0.0001   Epoch: 31   Global Step: 55140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:06:17,234-Speed 9388.16 samples/sec   Loss 0.9151   LearningRate 0.0001   Epoch: 31   Global Step: 55150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:06:43,428-Speed 9382.66 samples/sec   Loss 0.9124   LearningRate 0.0001   Epoch: 31   Global Step: 55160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:07:09,561-Speed 9404.71 samples/sec   Loss 0.9159   LearningRate 0.0001   Epoch: 31   Global Step: 55170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:07:35,678-Speed 9410.57 samples/sec   Loss 0.9158   LearningRate 0.0001   Epoch: 31   Global Step: 55180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-03-06 13:08:01,973-Speed 9346.77 samples/sec   Loss 0.9097   LearningRate 0.0001   Epoch: 31   Global Step: 55190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:08:28,106-Speed 9404.88 samples/sec   Loss 0.9151   LearningRate 0.0001   Epoch: 31   Global Step: 55200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:08:54,268-Speed 9394.06 samples/sec   Loss 0.9305   LearningRate 0.0000   Epoch: 31   Global Step: 55210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:09:20,431-Speed 9393.89 samples/sec   Loss 0.9198   LearningRate 0.0000   Epoch: 31   Global Step: 55220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:09:46,613-Speed 9387.15 samples/sec   Loss 0.9096   LearningRate 0.0000   Epoch: 31   Global Step: 55230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:10:12,825-Speed 9376.45 samples/sec   Loss 0.9186   LearningRate 0.0000   Epoch: 31   Global Step: 55240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:10:38,966-Speed 9401.81 samples/sec   Loss 0.9130   LearningRate 0.0000   Epoch: 31   Global Step: 55250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:11:05,156-Speed 9384.28 samples/sec   Loss 0.9145   LearningRate 0.0000   Epoch: 31   Global Step: 55260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:11:31,268-Speed 9412.09 samples/sec   Loss 0.9235   LearningRate 0.0000   Epoch: 31   Global Step: 55270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:11:57,408-Speed 9402.09 samples/sec   Loss 0.9141   LearningRate 0.0000   Epoch: 31   Global Step: 55280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:12:23,568-Speed 9394.99 samples/sec   Loss 0.9181   LearningRate 0.0000   Epoch: 31   Global Step: 55290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:12:49,672-Speed 9414.92 samples/sec   Loss 0.9184   LearningRate 0.0000   Epoch: 31   Global Step: 55300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:14:09,731-Speed 3069.78 samples/sec   Loss 0.9126   LearningRate 0.0000   Epoch: 32   Global Step: 55310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:14:35,733-Speed 9451.92 samples/sec   Loss 0.9113   LearningRate 0.0000   Epoch: 32   Global Step: 55320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:15:01,788-Speed 9432.82 samples/sec   Loss 0.9082   LearningRate 0.0000   Epoch: 32   Global Step: 55330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:15:27,842-Speed 9433.13 samples/sec   Loss 0.9124   LearningRate 0.0000   Epoch: 32   Global Step: 55340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:15:53,954-Speed 9412.36 samples/sec   Loss 0.9028   LearningRate 0.0000   Epoch: 32   Global Step: 55350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:16:20,138-Speed 9386.03 samples/sec   Loss 0.8992   LearningRate 0.0000   Epoch: 32   Global Step: 55360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:16:46,236-Speed 9417.25 samples/sec   Loss 0.8999   LearningRate 0.0000   Epoch: 32   Global Step: 55370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:17:12,361-Speed 9407.68 samples/sec   Loss 0.9061   LearningRate 0.0000   Epoch: 32   Global Step: 55380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:17:38,419-Speed 9432.15 samples/sec   Loss 0.9032   LearningRate 0.0000   Epoch: 32   Global Step: 55390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-03-06 13:18:04,527-Speed 9413.60 samples/sec   Loss 0.9075   LearningRate 0.0000   Epoch: 32   Global Step: 55400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-03-06 13:18:30,624-Speed 9417.51 samples/sec   Loss 0.9115   LearningRate 0.0000   Epoch: 32   Global Step: 55410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:18:56,780-Speed 9396.18 samples/sec   Loss 0.9043   LearningRate 0.0000   Epoch: 32   Global Step: 55420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:19:22,922-Speed 9401.50 samples/sec   Loss 0.9113   LearningRate 0.0000   Epoch: 32   Global Step: 55430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:19:49,044-Speed 9408.70 samples/sec   Loss 0.9040   LearningRate 0.0000   Epoch: 32   Global Step: 55440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:20:15,272-Speed 9370.62 samples/sec   Loss 0.8977   LearningRate 0.0000   Epoch: 32   Global Step: 55450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:20:41,426-Speed 9397.26 samples/sec   Loss 0.9109   LearningRate 0.0000   Epoch: 32   Global Step: 55460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:21:07,666-Speed 9366.27 samples/sec   Loss 0.9032   LearningRate 0.0000   Epoch: 32   Global Step: 55470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:21:33,833-Speed 9392.55 samples/sec   Loss 0.9093   LearningRate 0.0000   Epoch: 32   Global Step: 55480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:21:59,916-Speed 9422.58 samples/sec   Loss 0.9043   LearningRate 0.0000   Epoch: 32   Global Step: 55490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:22:26,078-Speed 9394.28 samples/sec   Loss 0.9034   LearningRate 0.0000   Epoch: 32   Global Step: 55500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:22:52,243-Speed 9392.94 samples/sec   Loss 0.9132   LearningRate 0.0000   Epoch: 32   Global Step: 55510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:23:18,370-Speed 9406.77 samples/sec   Loss 0.9152   LearningRate 0.0000   Epoch: 32   Global Step: 55520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:23:44,564-Speed 9382.85 samples/sec   Loss 0.9056   LearningRate 0.0000   Epoch: 32   Global Step: 55530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:24:10,775-Speed 9376.66 samples/sec   Loss 0.9056   LearningRate 0.0000   Epoch: 32   Global Step: 55540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:24:36,928-Speed 9397.47 samples/sec   Loss 0.9084   LearningRate 0.0000   Epoch: 32   Global Step: 55550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:25:03,049-Speed 9408.75 samples/sec   Loss 0.9031   LearningRate 0.0000   Epoch: 32   Global Step: 55560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:25:29,137-Speed 9421.03 samples/sec   Loss 0.9013   LearningRate 0.0000   Epoch: 32   Global Step: 55570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:25:55,252-Speed 9411.13 samples/sec   Loss 0.9073   LearningRate 0.0000   Epoch: 32   Global Step: 55580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:26:21,405-Speed 9397.32 samples/sec   Loss 0.9047   LearningRate 0.0000   Epoch: 32   Global Step: 55590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:26:47,555-Speed 9398.76 samples/sec   Loss 0.9071   LearningRate 0.0000   Epoch: 32   Global Step: 55600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:27:13,692-Speed 9403.31 samples/sec   Loss 0.9074   LearningRate 0.0000   Epoch: 32   Global Step: 55610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:27:39,837-Speed 9399.94 samples/sec   Loss 0.9106   LearningRate 0.0000   Epoch: 32   Global Step: 55620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:28:05,940-Speed 9415.74 samples/sec   Loss 0.9054   LearningRate 0.0000   Epoch: 32   Global Step: 55630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:28:32,108-Speed 9391.76 samples/sec   Loss 0.9052   LearningRate 0.0000   Epoch: 32   Global Step: 55640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:28:58,221-Speed 9411.95 samples/sec   Loss 0.9015   LearningRate 0.0000   Epoch: 32   Global Step: 55650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:29:24,316-Speed 9418.28 samples/sec   Loss 0.9074   LearningRate 0.0000   Epoch: 32   Global Step: 55660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:29:50,462-Speed 9399.79 samples/sec   Loss 0.9049   LearningRate 0.0000   Epoch: 32   Global Step: 55670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:30:16,561-Speed 9417.07 samples/sec   Loss 0.9046   LearningRate 0.0000   Epoch: 32   Global Step: 55680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:30:42,723-Speed 9394.00 samples/sec   Loss 0.9157   LearningRate 0.0000   Epoch: 32   Global Step: 55690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:31:08,870-Speed 9399.90 samples/sec   Loss 0.9132   LearningRate 0.0000   Epoch: 32   Global Step: 55700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:31:35,047-Speed 9388.69 samples/sec   Loss 0.9074   LearningRate 0.0000   Epoch: 32   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:32:01,243-Speed 9382.12 samples/sec   Loss 0.9006   LearningRate 0.0000   Epoch: 32   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:32:27,442-Speed 9381.26 samples/sec   Loss 0.9123   LearningRate 0.0000   Epoch: 32   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:32:53,610-Speed 9391.80 samples/sec   Loss 0.9057   LearningRate 0.0000   Epoch: 32   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:33:19,890-Speed 9352.05 samples/sec   Loss 0.9022   LearningRate 0.0000   Epoch: 32   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:33:46,100-Speed 9377.23 samples/sec   Loss 0.9048   LearningRate 0.0000   Epoch: 32   Global Step: 55760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:34:12,331-Speed 9369.66 samples/sec   Loss 0.9082   LearningRate 0.0000   Epoch: 32   Global Step: 55770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:34:38,589-Speed 9359.62 samples/sec   Loss 0.9012   LearningRate 0.0000   Epoch: 32   Global Step: 55780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:35:04,744-Speed 9396.79 samples/sec   Loss 0.8998   LearningRate 0.0000   Epoch: 32   Global Step: 55790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:35:30,879-Speed 9403.59 samples/sec   Loss 0.8998   LearningRate 0.0000   Epoch: 32   Global Step: 55800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:35:57,114-Speed 9368.39 samples/sec   Loss 0.8993   LearningRate 0.0000   Epoch: 32   Global Step: 55810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-03-06 13:36:23,302-Speed 9384.83 samples/sec   Loss 0.9061   LearningRate 0.0000   Epoch: 32   Global Step: 55820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:36:49,485-Speed 9386.90 samples/sec   Loss 0.9017   LearningRate 0.0000   Epoch: 32   Global Step: 55830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:37:15,577-Speed 9419.43 samples/sec   Loss 0.9010   LearningRate 0.0000   Epoch: 32   Global Step: 55840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:37:41,734-Speed 9396.00 samples/sec   Loss 0.9010   LearningRate 0.0000   Epoch: 32   Global Step: 55850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:38:07,829-Speed 9418.54 samples/sec   Loss 0.9019   LearningRate 0.0000   Epoch: 32   Global Step: 55860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:38:33,961-Speed 9405.01 samples/sec   Loss 0.8989   LearningRate 0.0000   Epoch: 32   Global Step: 55870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:39:00,139-Speed 9389.06 samples/sec   Loss 0.9013   LearningRate 0.0000   Epoch: 32   Global Step: 55880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:39:26,351-Speed 9375.99 samples/sec   Loss 0.8963   LearningRate 0.0000   Epoch: 32   Global Step: 55890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:39:52,516-Speed 9393.34 samples/sec   Loss 0.8962   LearningRate 0.0000   Epoch: 32   Global Step: 55900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:40:18,732-Speed 9375.29 samples/sec   Loss 0.8993   LearningRate 0.0000   Epoch: 32   Global Step: 55910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:40:44,747-Speed 9447.29 samples/sec   Loss 0.8898   LearningRate 0.0000   Epoch: 32   Global Step: 55920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:41:10,960-Speed 9376.03 samples/sec   Loss 0.8981   LearningRate 0.0000   Epoch: 32   Global Step: 55930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:41:37,081-Speed 9409.10 samples/sec   Loss 0.8976   LearningRate 0.0000   Epoch: 32   Global Step: 55940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:42:03,224-Speed 9400.99 samples/sec   Loss 0.8993   LearningRate 0.0000   Epoch: 32   Global Step: 55950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:42:29,330-Speed 9414.43 samples/sec   Loss 0.8985   LearningRate 0.0000   Epoch: 32   Global Step: 55960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:42:55,445-Speed 9410.94 samples/sec   Loss 0.8963   LearningRate 0.0000   Epoch: 32   Global Step: 55970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:43:21,530-Speed 9421.97 samples/sec   Loss 0.8997   LearningRate 0.0000   Epoch: 32   Global Step: 55980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:43:47,708-Speed 9388.80 samples/sec   Loss 0.8996   LearningRate 0.0000   Epoch: 32   Global Step: 55990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:44:13,775-Speed 9428.46 samples/sec   Loss 0.8944   LearningRate 0.0000   Epoch: 32   Global Step: 56000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:44:39,926-Speed 9398.08 samples/sec   Loss 0.8973   LearningRate 0.0000   Epoch: 32   Global Step: 56010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:45:06,029-Speed 9415.46 samples/sec   Loss 0.8975   LearningRate 0.0000   Epoch: 32   Global Step: 56020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:45:32,204-Speed 9389.51 samples/sec   Loss 0.8966   LearningRate 0.0000   Epoch: 32   Global Step: 56030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:45:58,406-Speed 9380.02 samples/sec   Loss 0.8963   LearningRate 0.0000   Epoch: 32   Global Step: 56040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:46:24,646-Speed 9366.29 samples/sec   Loss 0.8947   LearningRate 0.0000   Epoch: 32   Global Step: 56050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:46:50,815-Speed 9391.98 samples/sec   Loss 0.8962   LearningRate 0.0000   Epoch: 32   Global Step: 56060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:47:16,947-Speed 9405.03 samples/sec   Loss 0.8965   LearningRate 0.0000   Epoch: 32   Global Step: 56070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:47:43,055-Speed 9413.69 samples/sec   Loss 0.8918   LearningRate 0.0000   Epoch: 32   Global Step: 56080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:48:09,273-Speed 9374.21 samples/sec   Loss 0.8944   LearningRate 0.0000   Epoch: 32   Global Step: 56090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:48:35,360-Speed 9421.17 samples/sec   Loss 0.9095   LearningRate 0.0000   Epoch: 32   Global Step: 56100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:49:01,554-Speed 9383.19 samples/sec   Loss 0.8980   LearningRate 0.0000   Epoch: 32   Global Step: 56110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:49:27,797-Speed 9365.20 samples/sec   Loss 0.9030   LearningRate 0.0000   Epoch: 32   Global Step: 56120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:49:53,993-Speed 9381.95 samples/sec   Loss 0.8985   LearningRate 0.0000   Epoch: 32   Global Step: 56130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:50:20,190-Speed 9381.67 samples/sec   Loss 0.9070   LearningRate 0.0000   Epoch: 32   Global Step: 56140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:50:46,378-Speed 9385.08 samples/sec   Loss 0.9039   LearningRate 0.0000   Epoch: 32   Global Step: 56150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:51:12,575-Speed 9381.65 samples/sec   Loss 0.8997   LearningRate 0.0000   Epoch: 32   Global Step: 56160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:51:38,769-Speed 9382.90 samples/sec   Loss 0.8920   LearningRate 0.0000   Epoch: 32   Global Step: 56170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-06 13:52:05,045-Speed 9353.55 samples/sec   Loss 0.8895   LearningRate 0.0000   Epoch: 32   Global Step: 56180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:52:31,278-Speed 9369.04 samples/sec   Loss 0.8961   LearningRate 0.0000   Epoch: 32   Global Step: 56190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:52:57,506-Speed 9370.72 samples/sec   Loss 0.8890   LearningRate 0.0000   Epoch: 32   Global Step: 56200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:53:23,727-Speed 9373.27 samples/sec   Loss 0.8923   LearningRate 0.0000   Epoch: 32   Global Step: 56210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-06 13:53:49,889-Speed 9394.36 samples/sec   Loss 0.8907   LearningRate 0.0000   Epoch: 32   Global Step: 56220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 13:54:16,079-Speed 9384.14 samples/sec   Loss 0.8833   LearningRate 0.0000   Epoch: 32   Global Step: 56230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 13:54:42,259-Speed 9387.78 samples/sec   Loss 0.8893   LearningRate 0.0000   Epoch: 32   Global Step: 56240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 13:55:08,547-Speed 9348.97 samples/sec   Loss 0.8914   LearningRate 0.0000   Epoch: 32   Global Step: 56250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 13:55:34,722-Speed 9389.75 samples/sec   Loss 0.8924   LearningRate 0.0000   Epoch: 32   Global Step: 56260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:56:00,918-Speed 9381.83 samples/sec   Loss 0.8910   LearningRate 0.0000   Epoch: 32   Global Step: 56270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:56:27,079-Speed 9394.48 samples/sec   Loss 0.8936   LearningRate 0.0000   Epoch: 32   Global Step: 56280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:56:53,332-Speed 9361.86 samples/sec   Loss 0.8931   LearningRate 0.0000   Epoch: 32   Global Step: 56290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:57:19,530-Speed 9381.06 samples/sec   Loss 0.8947   LearningRate 0.0000   Epoch: 32   Global Step: 56300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:57:45,690-Speed 9395.01 samples/sec   Loss 0.8980   LearningRate 0.0000   Epoch: 32   Global Step: 56310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:58:11,872-Speed 9387.06 samples/sec   Loss 0.8931   LearningRate 0.0000   Epoch: 32   Global Step: 56320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:58:38,091-Speed 9374.20 samples/sec   Loss 0.8907   LearningRate 0.0000   Epoch: 32   Global Step: 56330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:59:04,285-Speed 9382.91 samples/sec   Loss 0.8940   LearningRate 0.0000   Epoch: 32   Global Step: 56340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:59:30,464-Speed 9388.04 samples/sec   Loss 0.8878   LearningRate 0.0000   Epoch: 32   Global Step: 56350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 13:59:56,539-Speed 9425.29 samples/sec   Loss 0.8895   LearningRate 0.0000   Epoch: 32   Global Step: 56360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:00:22,630-Speed 9420.07 samples/sec   Loss 0.8874   LearningRate 0.0000   Epoch: 32   Global Step: 56370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:00:48,760-Speed 9406.61 samples/sec   Loss 0.8873   LearningRate 0.0000   Epoch: 32   Global Step: 56380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:01:14,974-Speed 9375.49 samples/sec   Loss 0.8897   LearningRate 0.0000   Epoch: 32   Global Step: 56390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:01:41,085-Speed 9412.40 samples/sec   Loss 0.8929   LearningRate 0.0000   Epoch: 32   Global Step: 56400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:02:07,258-Speed 9389.88 samples/sec   Loss 0.8908   LearningRate 0.0000   Epoch: 32   Global Step: 56410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:02:33,451-Speed 9383.31 samples/sec   Loss 0.8896   LearningRate 0.0000   Epoch: 32   Global Step: 56420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:02:59,558-Speed 9413.81 samples/sec   Loss 0.8891   LearningRate 0.0000   Epoch: 32   Global Step: 56430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:03:25,793-Speed 9367.86 samples/sec   Loss 0.8920   LearningRate 0.0000   Epoch: 32   Global Step: 56440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:03:51,992-Speed 9381.28 samples/sec   Loss 0.8887   LearningRate 0.0000   Epoch: 32   Global Step: 56450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:04:18,252-Speed 9358.86 samples/sec   Loss 0.8949   LearningRate 0.0000   Epoch: 32   Global Step: 56460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:04:44,495-Speed 9365.49 samples/sec   Loss 0.8939   LearningRate 0.0000   Epoch: 32   Global Step: 56470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:05:10,674-Speed 9388.20 samples/sec   Loss 0.9013   LearningRate 0.0000   Epoch: 32   Global Step: 56480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:05:36,885-Speed 9376.63 samples/sec   Loss 0.8905   LearningRate 0.0000   Epoch: 32   Global Step: 56490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:06:03,014-Speed 9406.22 samples/sec   Loss 0.8881   LearningRate 0.0000   Epoch: 32   Global Step: 56500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:06:29,156-Speed 9401.49 samples/sec   Loss 0.8859   LearningRate 0.0000   Epoch: 32   Global Step: 56510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:06:55,283-Speed 9407.66 samples/sec   Loss 0.8890   LearningRate 0.0000   Epoch: 32   Global Step: 56520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:07:21,389-Speed 9414.26 samples/sec   Loss 0.8832   LearningRate 0.0000   Epoch: 32   Global Step: 56530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:07:47,562-Speed 9390.39 samples/sec   Loss 0.8863   LearningRate 0.0000   Epoch: 32   Global Step: 56540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:08:13,674-Speed 9412.05 samples/sec   Loss 0.8872   LearningRate 0.0000   Epoch: 32   Global Step: 56550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:08:39,808-Speed 9404.08 samples/sec   Loss 0.8876   LearningRate 0.0000   Epoch: 32   Global Step: 56560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:09:05,963-Speed 9396.66 samples/sec   Loss 0.8894   LearningRate 0.0000   Epoch: 32   Global Step: 56570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:09:32,016-Speed 9433.65 samples/sec   Loss 0.8851   LearningRate 0.0000   Epoch: 32   Global Step: 56580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:09:58,103-Speed 9421.02 samples/sec   Loss 0.8837   LearningRate 0.0000   Epoch: 32   Global Step: 56590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:10:24,223-Speed 9409.44 samples/sec   Loss 0.8862   LearningRate 0.0000   Epoch: 32   Global Step: 56600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:10:50,353-Speed 9405.44 samples/sec   Loss 0.8817   LearningRate 0.0000   Epoch: 32   Global Step: 56610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:11:16,442-Speed 9420.71 samples/sec   Loss 0.8876   LearningRate 0.0000   Epoch: 32   Global Step: 56620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:11:42,557-Speed 9411.15 samples/sec   Loss 0.8909   LearningRate 0.0000   Epoch: 32   Global Step: 56630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:12:08,764-Speed 9378.05 samples/sec   Loss 0.8922   LearningRate 0.0000   Epoch: 32   Global Step: 56640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:12:34,918-Speed 9396.71 samples/sec   Loss 0.8871   LearningRate 0.0000   Epoch: 32   Global Step: 56650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:13:01,019-Speed 9416.65 samples/sec   Loss 0.8830   LearningRate 0.0000   Epoch: 32   Global Step: 56660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:13:30,459-Speed 8348.14 samples/sec   Loss 0.8877   LearningRate 0.0000   Epoch: 32   Global Step: 56670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:13:56,540-Speed 9423.77 samples/sec   Loss 0.8806   LearningRate 0.0000   Epoch: 32   Global Step: 56680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:14:22,722-Speed 9387.16 samples/sec   Loss 0.8868   LearningRate 0.0000   Epoch: 32   Global Step: 56690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:14:48,955-Speed 9368.77 samples/sec   Loss 0.8858   LearningRate 0.0000   Epoch: 32   Global Step: 56700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:15:15,138-Speed 9386.67 samples/sec   Loss 0.8820   LearningRate 0.0000   Epoch: 32   Global Step: 56710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:15:41,240-Speed 9415.86 samples/sec   Loss 0.8828   LearningRate 0.0000   Epoch: 32   Global Step: 56720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:16:07,326-Speed 9422.18 samples/sec   Loss 0.8941   LearningRate 0.0000   Epoch: 32   Global Step: 56730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:16:33,445-Speed 9409.69 samples/sec   Loss 0.8882   LearningRate 0.0000   Epoch: 32   Global Step: 56740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:16:59,559-Speed 9411.45 samples/sec   Loss 0.8811   LearningRate 0.0000   Epoch: 32   Global Step: 56750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:17:25,740-Speed 9387.53 samples/sec   Loss 0.8796   LearningRate 0.0000   Epoch: 32   Global Step: 56760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:17:51,945-Speed 9378.85 samples/sec   Loss 0.8854   LearningRate 0.0000   Epoch: 32   Global Step: 56770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:18:18,166-Speed 9373.04 samples/sec   Loss 0.8814   LearningRate 0.0000   Epoch: 32   Global Step: 56780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:18:44,379-Speed 9375.89 samples/sec   Loss 0.8861   LearningRate 0.0000   Epoch: 32   Global Step: 56790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:19:10,494-Speed 9411.03 samples/sec   Loss 0.8824   LearningRate 0.0000   Epoch: 32   Global Step: 56800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:19:36,689-Speed 9382.60 samples/sec   Loss 0.8863   LearningRate 0.0000   Epoch: 32   Global Step: 56810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:20:02,800-Speed 9412.89 samples/sec   Loss 0.8824   LearningRate 0.0000   Epoch: 32   Global Step: 56820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:20:28,963-Speed 9393.75 samples/sec   Loss 0.8769   LearningRate 0.0000   Epoch: 32   Global Step: 56830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:20:55,146-Speed 9386.42 samples/sec   Loss 0.8873   LearningRate 0.0000   Epoch: 32   Global Step: 56840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:21:21,295-Speed 9398.86 samples/sec   Loss 0.8854   LearningRate 0.0000   Epoch: 32   Global Step: 56850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:21:47,505-Speed 9377.35 samples/sec   Loss 0.8865   LearningRate 0.0000   Epoch: 32   Global Step: 56860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:22:13,704-Speed 9380.88 samples/sec   Loss 0.8799   LearningRate 0.0000   Epoch: 32   Global Step: 56870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:22:39,858-Speed 9397.20 samples/sec   Loss 0.8808   LearningRate 0.0000   Epoch: 32   Global Step: 56880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:23:05,937-Speed 9423.96 samples/sec   Loss 0.8829   LearningRate 0.0000   Epoch: 32   Global Step: 56890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:23:32,150-Speed 9375.78 samples/sec   Loss 0.8836   LearningRate 0.0000   Epoch: 32   Global Step: 56900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:23:58,285-Speed 9404.02 samples/sec   Loss 0.8894   LearningRate 0.0000   Epoch: 32   Global Step: 56910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:24:24,434-Speed 9398.86 samples/sec   Loss 0.8911   LearningRate 0.0000   Epoch: 32   Global Step: 56920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:24:50,541-Speed 9414.05 samples/sec   Loss 0.8848   LearningRate 0.0000   Epoch: 32   Global Step: 56930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:25:16,737-Speed 9381.97 samples/sec   Loss 0.8904   LearningRate 0.0000   Epoch: 32   Global Step: 56940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:25:43,003-Speed 9357.03 samples/sec   Loss 0.8805   LearningRate 0.0000   Epoch: 32   Global Step: 56950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:26:09,209-Speed 9378.20 samples/sec   Loss 0.8772   LearningRate 0.0000   Epoch: 32   Global Step: 56960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:26:35,364-Speed 9397.04 samples/sec   Loss 0.8745   LearningRate 0.0000   Epoch: 32   Global Step: 56970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:27:01,510-Speed 9399.94 samples/sec   Loss 0.8785   LearningRate 0.0000   Epoch: 32   Global Step: 56980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-03-06 14:27:27,664-Speed 9396.66 samples/sec   Loss 0.8806   LearningRate 0.0000   Epoch: 32   Global Step: 56990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-03-06 14:27:53,920-Speed 9360.47 samples/sec   Loss 0.8800   LearningRate 0.0000   Epoch: 32   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:28:20,134-Speed 9375.65 samples/sec   Loss 0.8815   LearningRate 0.0000   Epoch: 32   Global Step: 57010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:28:46,344-Speed 9377.00 samples/sec   Loss 0.8847   LearningRate 0.0000   Epoch: 32   Global Step: 57020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:29:12,505-Speed 9394.49 samples/sec   Loss 0.8817   LearningRate 0.0000   Epoch: 32   Global Step: 57030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:30:31,558-Speed 3108.85 samples/sec   Loss 0.8796   LearningRate 0.0000   Epoch: 33   Global Step: 57040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:30:57,529-Speed 9463.35 samples/sec   Loss 0.8733   LearningRate 0.0000   Epoch: 33   Global Step: 57050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:31:23,677-Speed 9399.03 samples/sec   Loss 0.8707   LearningRate 0.0000   Epoch: 33   Global Step: 57060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:31:49,675-Speed 9453.84 samples/sec   Loss 0.8793   LearningRate 0.0000   Epoch: 33   Global Step: 57070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:32:15,734-Speed 9431.07 samples/sec   Loss 0.8737   LearningRate 0.0000   Epoch: 33   Global Step: 57080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:32:41,855-Speed 9409.00 samples/sec   Loss 0.8808   LearningRate 0.0000   Epoch: 33   Global Step: 57090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:33:07,886-Speed 9441.54 samples/sec   Loss 0.8816   LearningRate 0.0000   Epoch: 33   Global Step: 57100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-03-06 14:33:33,965-Speed 9424.10 samples/sec   Loss 0.8689   LearningRate 0.0000   Epoch: 33   Global Step: 57110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:34:00,022-Speed 9432.15 samples/sec   Loss 0.8749   LearningRate 0.0000   Epoch: 33   Global Step: 57120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:34:26,038-Speed 9446.66 samples/sec   Loss 0.8732   LearningRate 0.0000   Epoch: 33   Global Step: 57130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:34:52,081-Speed 9437.49 samples/sec   Loss 0.8777   LearningRate 0.0000   Epoch: 33   Global Step: 57140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:35:18,139-Speed 9431.75 samples/sec   Loss 0.8758   LearningRate 0.0000   Epoch: 33   Global Step: 57150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:35:44,275-Speed 9403.46 samples/sec   Loss 0.8769   LearningRate 0.0000   Epoch: 33   Global Step: 57160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:36:10,303-Speed 9442.56 samples/sec   Loss 0.8689   LearningRate 0.0000   Epoch: 33   Global Step: 57170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:36:36,393-Speed 9420.21 samples/sec   Loss 0.8753   LearningRate 0.0000   Epoch: 33   Global Step: 57180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:37:02,520-Speed 9406.72 samples/sec   Loss 0.8739   LearningRate 0.0000   Epoch: 33   Global Step: 57190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:37:28,673-Speed 9397.13 samples/sec   Loss 0.8707   LearningRate 0.0000   Epoch: 33   Global Step: 57200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:37:54,806-Speed 9404.61 samples/sec   Loss 0.8810   LearningRate 0.0000   Epoch: 33   Global Step: 57210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:38:21,002-Speed 9383.21 samples/sec   Loss 0.8749   LearningRate 0.0000   Epoch: 33   Global Step: 57220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:38:47,181-Speed 9387.80 samples/sec   Loss 0.8768   LearningRate 0.0000   Epoch: 33   Global Step: 57230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:39:13,388-Speed 9377.99 samples/sec   Loss 0.8769   LearningRate 0.0000   Epoch: 33   Global Step: 57240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:39:39,573-Speed 9386.10 samples/sec   Loss 0.8677   LearningRate 0.0000   Epoch: 33   Global Step: 57250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:40:05,727-Speed 9397.02 samples/sec   Loss 0.8805   LearningRate 0.0000   Epoch: 33   Global Step: 57260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:40:31,978-Speed 9362.07 samples/sec   Loss 0.8728   LearningRate 0.0000   Epoch: 33   Global Step: 57270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:40:58,077-Speed 9417.01 samples/sec   Loss 0.8731   LearningRate 0.0000   Epoch: 33   Global Step: 57280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:41:24,166-Speed 9420.59 samples/sec   Loss 0.8737   LearningRate 0.0000   Epoch: 33   Global Step: 57290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:41:50,265-Speed 9416.96 samples/sec   Loss 0.8787   LearningRate 0.0000   Epoch: 33   Global Step: 57300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:42:16,364-Speed 9417.03 samples/sec   Loss 0.8740   LearningRate 0.0000   Epoch: 33   Global Step: 57310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:42:42,434-Speed 9427.09 samples/sec   Loss 0.8734   LearningRate 0.0000   Epoch: 33   Global Step: 57320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:43:08,476-Speed 9437.85 samples/sec   Loss 0.8766   LearningRate 0.0000   Epoch: 33   Global Step: 57330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:43:34,603-Speed 9406.83 samples/sec   Loss 0.8801   LearningRate 0.0000   Epoch: 33   Global Step: 57340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:44:00,817-Speed 9375.57 samples/sec   Loss 0.8748   LearningRate 0.0000   Epoch: 33   Global Step: 57350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:44:26,963-Speed 9399.99 samples/sec   Loss 0.8691   LearningRate 0.0000   Epoch: 33   Global Step: 57360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:44:53,130-Speed 9392.36 samples/sec   Loss 0.8818   LearningRate 0.0000   Epoch: 33   Global Step: 57370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-03-06 14:45:19,211-Speed 9423.22 samples/sec   Loss 0.8745   LearningRate 0.0000   Epoch: 33   Global Step: 57380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:45:48,924-Speed 8271.70 samples/sec   Loss 0.8747   LearningRate 0.0000   Epoch: 33   Global Step: 57390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:46:15,017-Speed 9419.24 samples/sec   Loss 0.8720   LearningRate 0.0000   Epoch: 33   Global Step: 57400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:46:41,145-Speed 9406.04 samples/sec   Loss 0.8710   LearningRate 0.0000   Epoch: 33   Global Step: 57410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:47:07,274-Speed 9406.03 samples/sec   Loss 0.8734   LearningRate 0.0000   Epoch: 33   Global Step: 57420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:47:33,330-Speed 9432.66 samples/sec   Loss 0.8763   LearningRate 0.0000   Epoch: 33   Global Step: 57430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:47:59,403-Speed 9426.25 samples/sec   Loss 0.8777   LearningRate 0.0000   Epoch: 33   Global Step: 57440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:48:25,526-Speed 9408.29 samples/sec   Loss 0.8716   LearningRate 0.0000   Epoch: 33   Global Step: 57450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:48:51,659-Speed 9404.69 samples/sec   Loss 0.8688   LearningRate 0.0000   Epoch: 33   Global Step: 57460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:49:17,717-Speed 9431.45 samples/sec   Loss 0.8800   LearningRate 0.0000   Epoch: 33   Global Step: 57470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:49:43,935-Speed 9374.21 samples/sec   Loss 0.8731   LearningRate 0.0000   Epoch: 33   Global Step: 57480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-03-06 14:50:09,985-Speed 9434.48 samples/sec   Loss 0.8645   LearningRate 0.0000   Epoch: 33   Global Step: 57490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:50:36,087-Speed 9415.86 samples/sec   Loss 0.8647   LearningRate 0.0000   Epoch: 33   Global Step: 57500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:51:02,216-Speed 9406.00 samples/sec   Loss 0.8793   LearningRate 0.0000   Epoch: 33   Global Step: 57510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:51:28,353-Speed 9403.10 samples/sec   Loss 0.8652   LearningRate 0.0000   Epoch: 33   Global Step: 57520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-06 14:51:54,445-Speed 9419.20 samples/sec   Loss 0.8742   LearningRate 0.0000   Epoch: 33   Global Step: 57530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:52:20,535-Speed 9420.50 samples/sec   Loss 0.8666   LearningRate 0.0000   Epoch: 33   Global Step: 57540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-06 14:52:46,611-Speed 9424.91 samples/sec   Loss 0.8688   LearningRate 0.0000   Epoch: 33   Global Step: 57550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:53:12,689-Speed 9424.75 samples/sec   Loss 0.8646   LearningRate 0.0000   Epoch: 33   Global Step: 57560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:53:38,908-Speed 9373.74 samples/sec   Loss 0.8710   LearningRate 0.0000   Epoch: 33   Global Step: 57570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-06 14:54:05,177-Speed 9355.66 samples/sec   Loss 0.8696   LearningRate 0.0000   Epoch: 33   Global Step: 57580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 14:54:31,415-Speed 9367.29 samples/sec   Loss 0.8784   LearningRate 0.0000   Epoch: 33   Global Step: 57590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 14:54:57,645-Speed 9369.78 samples/sec   Loss 0.8624   LearningRate 0.0000   Epoch: 33   Global Step: 57600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 14:55:23,907-Speed 9358.36 samples/sec   Loss 0.8715   LearningRate 0.0000   Epoch: 33   Global Step: 57610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 14:55:50,141-Speed 9368.19 samples/sec   Loss 0.8703   LearningRate 0.0000   Epoch: 33   Global Step: 57620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 14:56:16,366-Speed 9371.99 samples/sec   Loss 0.8687   LearningRate 0.0000   Epoch: 33   Global Step: 57630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 14:56:42,555-Speed 9384.38 samples/sec   Loss 0.8749   LearningRate 0.0000   Epoch: 33   Global Step: 57640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 14:57:08,763-Speed 9377.91 samples/sec   Loss 0.8758   LearningRate 0.0000   Epoch: 33   Global Step: 57650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 14:57:34,866-Speed 9415.65 samples/sec   Loss 0.8665   LearningRate 0.0000   Epoch: 33   Global Step: 57660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 14:58:00,946-Speed 9423.65 samples/sec   Loss 0.8706   LearningRate 0.0000   Epoch: 33   Global Step: 57670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 14:58:28,073-Speed 9060.01 samples/sec   Loss 0.8653   LearningRate 0.0000   Epoch: 33   Global Step: 57680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 14:58:54,227-Speed 9397.20 samples/sec   Loss 0.8691   LearningRate 0.0000   Epoch: 33   Global Step: 57690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 14:59:20,394-Speed 9392.36 samples/sec   Loss 0.8748   LearningRate 0.0000   Epoch: 33   Global Step: 57700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 14:59:46,559-Speed 9393.03 samples/sec   Loss 0.8692   LearningRate 0.0000   Epoch: 33   Global Step: 57710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:00:12,730-Speed 9391.01 samples/sec   Loss 0.8679   LearningRate 0.0000   Epoch: 33   Global Step: 57720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:00:38,997-Speed 9356.69 samples/sec   Loss 0.8705   LearningRate 0.0000   Epoch: 33   Global Step: 57730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:01:05,169-Speed 9390.25 samples/sec   Loss 0.8705   LearningRate 0.0000   Epoch: 33   Global Step: 57740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:01:31,307-Speed 9402.98 samples/sec   Loss 0.8704   LearningRate 0.0000   Epoch: 33   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:01:57,487-Speed 9387.57 samples/sec   Loss 0.8752   LearningRate 0.0000   Epoch: 33   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:02:23,532-Speed 9436.28 samples/sec   Loss 0.8672   LearningRate 0.0000   Epoch: 33   Global Step: 57770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:02:49,685-Speed 9397.58 samples/sec   Loss 0.8684   LearningRate 0.0000   Epoch: 33   Global Step: 57780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:03:15,761-Speed 9424.96 samples/sec   Loss 0.8711   LearningRate 0.0000   Epoch: 33   Global Step: 57790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:03:41,897-Speed 9403.74 samples/sec   Loss 0.8623   LearningRate 0.0000   Epoch: 33   Global Step: 57800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:04:08,108-Speed 9376.81 samples/sec   Loss 0.8666   LearningRate 0.0000   Epoch: 33   Global Step: 57810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:04:34,282-Speed 9389.82 samples/sec   Loss 0.8695   LearningRate 0.0000   Epoch: 33   Global Step: 57820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:05:00,410-Speed 9406.50 samples/sec   Loss 0.8672   LearningRate 0.0000   Epoch: 33   Global Step: 57830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:05:26,726-Speed 9339.27 samples/sec   Loss 0.8569   LearningRate 0.0000   Epoch: 33   Global Step: 57840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:05:52,956-Speed 9369.92 samples/sec   Loss 0.8613   LearningRate 0.0000   Epoch: 33   Global Step: 57850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:06:19,178-Speed 9372.75 samples/sec   Loss 0.8656   LearningRate 0.0000   Epoch: 33   Global Step: 57860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:06:45,386-Speed 9377.69 samples/sec   Loss 0.8636   LearningRate 0.0000   Epoch: 33   Global Step: 57870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:07:11,526-Speed 9401.94 samples/sec   Loss 0.8648   LearningRate 0.0000   Epoch: 33   Global Step: 57880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:07:37,727-Speed 9380.14 samples/sec   Loss 0.8599   LearningRate 0.0000   Epoch: 33   Global Step: 57890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:08:03,857-Speed 9406.00 samples/sec   Loss 0.8628   LearningRate 0.0000   Epoch: 33   Global Step: 57900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:08:30,035-Speed 9389.21 samples/sec   Loss 0.8658   LearningRate 0.0000   Epoch: 33   Global Step: 57910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:08:56,216-Speed 9387.38 samples/sec   Loss 0.8600   LearningRate 0.0000   Epoch: 33   Global Step: 57920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:09:22,373-Speed 9395.70 samples/sec   Loss 0.8567   LearningRate 0.0000   Epoch: 33   Global Step: 57930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:09:48,483-Speed 9412.90 samples/sec   Loss 0.8629   LearningRate 0.0000   Epoch: 33   Global Step: 57940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:10:14,603-Speed 9409.35 samples/sec   Loss 0.8638   LearningRate 0.0000   Epoch: 33   Global Step: 57950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:10:40,693-Speed 9420.18 samples/sec   Loss 0.8628   LearningRate 0.0000   Epoch: 33   Global Step: 57960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:11:06,855-Speed 9394.01 samples/sec   Loss 0.8608   LearningRate 0.0000   Epoch: 33   Global Step: 57970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:11:33,115-Speed 9359.33 samples/sec   Loss 0.8662   LearningRate 0.0000   Epoch: 33   Global Step: 57980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:11:59,268-Speed 9397.65 samples/sec   Loss 0.8547   LearningRate 0.0000   Epoch: 33   Global Step: 57990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:12:25,402-Speed 9404.31 samples/sec   Loss 0.8585   LearningRate 0.0000   Epoch: 33   Global Step: 58000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:12:51,780-Speed 9317.24 samples/sec   Loss 0.8576   LearningRate 0.0000   Epoch: 33   Global Step: 58010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:13:17,947-Speed 9392.24 samples/sec   Loss 0.8580   LearningRate 0.0000   Epoch: 33   Global Step: 58020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:13:44,214-Speed 9356.36 samples/sec   Loss 0.8669   LearningRate 0.0000   Epoch: 33   Global Step: 58030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:14:10,380-Speed 9392.75 samples/sec   Loss 0.8637   LearningRate 0.0000   Epoch: 33   Global Step: 58040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:14:36,540-Speed 9395.02 samples/sec   Loss 0.8641   LearningRate 0.0000   Epoch: 33   Global Step: 58050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:15:02,710-Speed 9391.24 samples/sec   Loss 0.8598   LearningRate 0.0000   Epoch: 33   Global Step: 58060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:15:28,846-Speed 9403.56 samples/sec   Loss 0.8665   LearningRate 0.0000   Epoch: 33   Global Step: 58070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:15:55,158-Speed 9340.41 samples/sec   Loss 0.8614   LearningRate 0.0000   Epoch: 33   Global Step: 58080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:16:21,351-Speed 9383.22 samples/sec   Loss 0.8642   LearningRate 0.0000   Epoch: 33   Global Step: 58090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:16:47,480-Speed 9406.15 samples/sec   Loss 0.8577   LearningRate 0.0000   Epoch: 33   Global Step: 58100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:17:13,706-Speed 9371.33 samples/sec   Loss 0.8603   LearningRate 0.0000   Epoch: 33   Global Step: 58110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:17:39,844-Speed 9402.68 samples/sec   Loss 0.8590   LearningRate 0.0000   Epoch: 33   Global Step: 58120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:18:05,894-Speed 9434.76 samples/sec   Loss 0.8607   LearningRate 0.0000   Epoch: 33   Global Step: 58130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:18:32,027-Speed 9404.94 samples/sec   Loss 0.8608   LearningRate 0.0000   Epoch: 33   Global Step: 58140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:18:58,097-Speed 9427.20 samples/sec   Loss 0.8567   LearningRate 0.0000   Epoch: 33   Global Step: 58150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:19:24,242-Speed 9400.11 samples/sec   Loss 0.8535   LearningRate 0.0000   Epoch: 33   Global Step: 58160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:19:50,413-Speed 9391.05 samples/sec   Loss 0.8561   LearningRate 0.0000   Epoch: 33   Global Step: 58170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:20:16,627-Speed 9375.76 samples/sec   Loss 0.8660   LearningRate 0.0000   Epoch: 33   Global Step: 58180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:20:42,905-Speed 9352.86 samples/sec   Loss 0.8628   LearningRate 0.0000   Epoch: 33   Global Step: 58190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:21:09,024-Speed 9409.65 samples/sec   Loss 0.8564   LearningRate 0.0000   Epoch: 33   Global Step: 58200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:21:35,153-Speed 9406.20 samples/sec   Loss 0.8477   LearningRate 0.0000   Epoch: 33   Global Step: 58210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:22:01,284-Speed 9405.06 samples/sec   Loss 0.8613   LearningRate 0.0000   Epoch: 33   Global Step: 58220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:22:27,540-Speed 9360.68 samples/sec   Loss 0.8594   LearningRate 0.0000   Epoch: 33   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:22:53,684-Speed 9400.96 samples/sec   Loss 0.8608   LearningRate 0.0000   Epoch: 33   Global Step: 58240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:23:19,852-Speed 9391.90 samples/sec   Loss 0.8653   LearningRate 0.0000   Epoch: 33   Global Step: 58250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:23:45,923-Speed 9427.04 samples/sec   Loss 0.8566   LearningRate 0.0000   Epoch: 33   Global Step: 58260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:24:12,103-Speed 9387.48 samples/sec   Loss 0.8587   LearningRate 0.0000   Epoch: 33   Global Step: 58270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:24:38,243-Speed 9402.19 samples/sec   Loss 0.8595   LearningRate 0.0000   Epoch: 33   Global Step: 58280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:25:04,411-Speed 9392.02 samples/sec   Loss 0.8593   LearningRate 0.0000   Epoch: 33   Global Step: 58290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:25:30,487-Speed 9425.43 samples/sec   Loss 0.8591   LearningRate 0.0000   Epoch: 33   Global Step: 58300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:25:56,530-Speed 9436.87 samples/sec   Loss 0.8569   LearningRate 0.0000   Epoch: 33   Global Step: 58310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:26:22,720-Speed 9383.93 samples/sec   Loss 0.8541   LearningRate 0.0000   Epoch: 33   Global Step: 58320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:26:48,846-Speed 9407.55 samples/sec   Loss 0.8540   LearningRate 0.0000   Epoch: 33   Global Step: 58330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:27:15,122-Speed 9353.09 samples/sec   Loss 0.8639   LearningRate 0.0000   Epoch: 33   Global Step: 58340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:27:41,406-Speed 9350.82 samples/sec   Loss 0.8642   LearningRate 0.0000   Epoch: 33   Global Step: 58350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:28:07,547-Speed 9401.53 samples/sec   Loss 0.8597   LearningRate 0.0000   Epoch: 33   Global Step: 58360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:28:33,643-Speed 9417.71 samples/sec   Loss 0.8653   LearningRate 0.0000   Epoch: 33   Global Step: 58370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:28:59,770-Speed 9406.92 samples/sec   Loss 0.8547   LearningRate 0.0000   Epoch: 33   Global Step: 58380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:29:25,908-Speed 9402.82 samples/sec   Loss 0.8568   LearningRate 0.0000   Epoch: 33   Global Step: 58390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:29:52,027-Speed 9409.33 samples/sec   Loss 0.8558   LearningRate 0.0000   Epoch: 33   Global Step: 58400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:30:18,226-Speed 9381.27 samples/sec   Loss 0.8455   LearningRate 0.0000   Epoch: 33   Global Step: 58410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:30:44,484-Speed 9359.75 samples/sec   Loss 0.8560   LearningRate 0.0000   Epoch: 33   Global Step: 58420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:31:10,630-Speed 9400.12 samples/sec   Loss 0.8472   LearningRate 0.0000   Epoch: 33   Global Step: 58430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:31:36,799-Speed 9391.69 samples/sec   Loss 0.8549   LearningRate 0.0000   Epoch: 33   Global Step: 58440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:32:02,980-Speed 9387.27 samples/sec   Loss 0.8512   LearningRate 0.0000   Epoch: 33   Global Step: 58450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:32:29,152-Speed 9390.70 samples/sec   Loss 0.8554   LearningRate 0.0000   Epoch: 33   Global Step: 58460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:32:55,287-Speed 9403.84 samples/sec   Loss 0.8559   LearningRate 0.0000   Epoch: 33   Global Step: 58470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:33:21,583-Speed 9346.47 samples/sec   Loss 0.8552   LearningRate 0.0000   Epoch: 33   Global Step: 58480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:33:47,715-Speed 9405.00 samples/sec   Loss 0.8564   LearningRate 0.0000   Epoch: 33   Global Step: 58490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:34:13,929-Speed 9375.43 samples/sec   Loss 0.8618   LearningRate 0.0000   Epoch: 33   Global Step: 58500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:34:40,148-Speed 9373.70 samples/sec   Loss 0.8544   LearningRate 0.0000   Epoch: 33   Global Step: 58510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:35:06,370-Speed 9372.92 samples/sec   Loss 0.8535   LearningRate 0.0000   Epoch: 33   Global Step: 58520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:35:32,468-Speed 9417.46 samples/sec   Loss 0.8536   LearningRate 0.0000   Epoch: 33   Global Step: 58530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:35:58,603-Speed 9403.67 samples/sec   Loss 0.8601   LearningRate 0.0000   Epoch: 33   Global Step: 58540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:36:24,713-Speed 9412.92 samples/sec   Loss 0.8558   LearningRate 0.0000   Epoch: 33   Global Step: 58550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:36:50,856-Speed 9400.89 samples/sec   Loss 0.8557   LearningRate 0.0000   Epoch: 33   Global Step: 58560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:37:16,996-Speed 9402.09 samples/sec   Loss 0.8538   LearningRate 0.0000   Epoch: 33   Global Step: 58570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:37:43,109-Speed 9412.18 samples/sec   Loss 0.8524   LearningRate 0.0000   Epoch: 33   Global Step: 58580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:38:09,195-Speed 9421.41 samples/sec   Loss 0.8532   LearningRate 0.0000   Epoch: 33   Global Step: 58590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:38:35,259-Speed 9429.36 samples/sec   Loss 0.8546   LearningRate 0.0000   Epoch: 33   Global Step: 58600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:39:01,374-Speed 9411.38 samples/sec   Loss 0.8520   LearningRate 0.0000   Epoch: 33   Global Step: 58610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:39:27,478-Speed 9415.45 samples/sec   Loss 0.8544   LearningRate 0.0000   Epoch: 33   Global Step: 58620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:39:53,597-Speed 9409.58 samples/sec   Loss 0.8495   LearningRate 0.0000   Epoch: 33   Global Step: 58630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:40:19,714-Speed 9410.78 samples/sec   Loss 0.8556   LearningRate 0.0000   Epoch: 33   Global Step: 58640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:40:45,785-Speed 9426.91 samples/sec   Loss 0.8491   LearningRate 0.0000   Epoch: 33   Global Step: 58650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:41:11,902-Speed 9410.27 samples/sec   Loss 0.8514   LearningRate 0.0000   Epoch: 33   Global Step: 58660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:41:37,984-Speed 9423.42 samples/sec   Loss 0.8584   LearningRate 0.0000   Epoch: 33   Global Step: 58670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:42:04,079-Speed 9418.17 samples/sec   Loss 0.8536   LearningRate 0.0000   Epoch: 33   Global Step: 58680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:42:30,274-Speed 9382.20 samples/sec   Loss 0.8567   LearningRate 0.0000   Epoch: 33   Global Step: 58690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:42:56,387-Speed 9411.78 samples/sec   Loss 0.8567   LearningRate 0.0000   Epoch: 33   Global Step: 58700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:43:22,522-Speed 9403.89 samples/sec   Loss 0.8563   LearningRate 0.0000   Epoch: 33   Global Step: 58710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:43:48,608-Speed 9421.84 samples/sec   Loss 0.8507   LearningRate 0.0000   Epoch: 33   Global Step: 58720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-06 15:44:14,795-Speed 9384.85 samples/sec   Loss 0.8494   LearningRate 0.0000   Epoch: 33   Global Step: 58730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:44:40,910-Speed 9411.10 samples/sec   Loss 0.8561   LearningRate 0.0000   Epoch: 33   Global Step: 58740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:45:06,999-Speed 9420.68 samples/sec   Loss 0.8552   LearningRate 0.0000   Epoch: 33   Global Step: 58750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:45:33,162-Speed 9393.66 samples/sec   Loss 0.8561   LearningRate 0.0000   Epoch: 33   Global Step: 58760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:46:52,490-Speed 3098.08 samples/sec   Loss 0.8533   LearningRate 0.0000   Epoch: 34   Global Step: 58770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:47:18,358-Speed 9500.86 samples/sec   Loss 0.8482   LearningRate 0.0000   Epoch: 34   Global Step: 58780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:47:44,317-Speed 9467.66 samples/sec   Loss 0.8522   LearningRate 0.0000   Epoch: 34   Global Step: 58790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:48:10,275-Speed 9468.16 samples/sec   Loss 0.8456   LearningRate 0.0000   Epoch: 34   Global Step: 58800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:48:36,218-Speed 9473.54 samples/sec   Loss 0.8460   LearningRate 0.0000   Epoch: 34   Global Step: 58810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:49:02,170-Speed 9470.00 samples/sec   Loss 0.8463   LearningRate 0.0000   Epoch: 34   Global Step: 58820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:49:28,141-Speed 9463.93 samples/sec   Loss 0.8445   LearningRate 0.0000   Epoch: 34   Global Step: 58830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:49:54,074-Speed 9476.84 samples/sec   Loss 0.8491   LearningRate 0.0000   Epoch: 34   Global Step: 58840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:50:20,040-Speed 9465.18 samples/sec   Loss 0.8462   LearningRate 0.0000   Epoch: 34   Global Step: 58850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:50:46,029-Speed 9456.77 samples/sec   Loss 0.8557   LearningRate 0.0000   Epoch: 34   Global Step: 58860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:51:12,006-Speed 9461.20 samples/sec   Loss 0.8450   LearningRate 0.0000   Epoch: 34   Global Step: 58870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:51:37,977-Speed 9464.10 samples/sec   Loss 0.8522   LearningRate 0.0000   Epoch: 34   Global Step: 58880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:52:04,013-Speed 9439.91 samples/sec   Loss 0.8454   LearningRate 0.0000   Epoch: 34   Global Step: 58890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:52:30,047-Speed 9440.40 samples/sec   Loss 0.8490   LearningRate 0.0000   Epoch: 34   Global Step: 58900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:52:55,958-Speed 9484.96 samples/sec   Loss 0.8393   LearningRate 0.0000   Epoch: 34   Global Step: 58910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:53:22,005-Speed 9435.72 samples/sec   Loss 0.8404   LearningRate 0.0000   Epoch: 34   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-06 15:53:47,987-Speed 9459.06 samples/sec   Loss 0.8513   LearningRate 0.0000   Epoch: 34   Global Step: 58930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-06 15:54:13,982-Speed 9454.82 samples/sec   Loss 0.8506   LearningRate 0.0000   Epoch: 34   Global Step: 58940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:54:39,956-Speed 9461.96 samples/sec   Loss 0.8491   LearningRate 0.0000   Epoch: 34   Global Step: 58950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:55:05,926-Speed 9463.80 samples/sec   Loss 0.8499   LearningRate 0.0000   Epoch: 34   Global Step: 58960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:55:31,949-Speed 9444.25 samples/sec   Loss 0.8491   LearningRate 0.0000   Epoch: 34   Global Step: 58970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:55:57,925-Speed 9461.50 samples/sec   Loss 0.8491   LearningRate 0.0000   Epoch: 34   Global Step: 58980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:56:23,969-Speed 9436.66 samples/sec   Loss 0.8493   LearningRate 0.0000   Epoch: 34   Global Step: 58990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:56:50,007-Speed 9439.06 samples/sec   Loss 0.8428   LearningRate 0.0000   Epoch: 34   Global Step: 59000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:57:16,017-Speed 9449.01 samples/sec   Loss 0.8481   LearningRate 0.0000   Epoch: 34   Global Step: 59010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:57:42,031-Speed 9447.99 samples/sec   Loss 0.8441   LearningRate 0.0000   Epoch: 34   Global Step: 59020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 15:58:08,073-Speed 9437.29 samples/sec   Loss 0.8508   LearningRate 0.0000   Epoch: 34   Global Step: 59030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 15:58:34,123-Speed 9434.55 samples/sec   Loss 0.8441   LearningRate 0.0000   Epoch: 34   Global Step: 59040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 15:59:00,141-Speed 9446.81 samples/sec   Loss 0.8414   LearningRate 0.0000   Epoch: 34   Global Step: 59050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 15:59:26,204-Speed 9429.93 samples/sec   Loss 0.8492   LearningRate 0.0000   Epoch: 34   Global Step: 59060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 15:59:52,265-Speed 9430.25 samples/sec   Loss 0.8474   LearningRate 0.0000   Epoch: 34   Global Step: 59070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:00:18,284-Speed 9445.96 samples/sec   Loss 0.8536   LearningRate 0.0000   Epoch: 34   Global Step: 59080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:00:44,348-Speed 9429.32 samples/sec   Loss 0.8443   LearningRate 0.0000   Epoch: 34   Global Step: 59090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:01:10,424-Speed 9425.21 samples/sec   Loss 0.8440   LearningRate 0.0000   Epoch: 34   Global Step: 59100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:01:36,484-Speed 9430.88 samples/sec   Loss 0.8462   LearningRate 0.0000   Epoch: 34   Global Step: 59110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:02:02,570-Speed 9421.74 samples/sec   Loss 0.8535   LearningRate 0.0000   Epoch: 34   Global Step: 59120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:02:28,701-Speed 9405.29 samples/sec   Loss 0.8450   LearningRate 0.0000   Epoch: 34   Global Step: 59130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-03-06 16:02:54,845-Speed 9400.50 samples/sec   Loss 0.8451   LearningRate 0.0000   Epoch: 34   Global Step: 59140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-03-06 16:03:20,924-Speed 9424.30 samples/sec   Loss 0.8504   LearningRate 0.0000   Epoch: 34   Global Step: 59150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:03:47,051-Speed 9406.70 samples/sec   Loss 0.8491   LearningRate 0.0000   Epoch: 34   Global Step: 59160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:04:13,138-Speed 9421.06 samples/sec   Loss 0.8466   LearningRate 0.0000   Epoch: 34   Global Step: 59170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:04:39,279-Speed 9401.61 samples/sec   Loss 0.8424   LearningRate 0.0000   Epoch: 34   Global Step: 59180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:05:05,459-Speed 9387.94 samples/sec   Loss 0.8464   LearningRate 0.0000   Epoch: 34   Global Step: 59190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:05:31,602-Speed 9401.09 samples/sec   Loss 0.8435   LearningRate 0.0000   Epoch: 34   Global Step: 59200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:05:57,689-Speed 9421.15 samples/sec   Loss 0.8395   LearningRate 0.0000   Epoch: 34   Global Step: 59210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:06:23,891-Speed 9379.89 samples/sec   Loss 0.8425   LearningRate 0.0000   Epoch: 34   Global Step: 59220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:06:49,985-Speed 9418.89 samples/sec   Loss 0.8413   LearningRate 0.0000   Epoch: 34   Global Step: 59230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:07:16,142-Speed 9395.75 samples/sec   Loss 0.8470   LearningRate 0.0000   Epoch: 34   Global Step: 59240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:07:42,288-Speed 9400.19 samples/sec   Loss 0.8419   LearningRate 0.0000   Epoch: 34   Global Step: 59250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-03-06 16:08:08,411-Speed 9408.15 samples/sec   Loss 0.8451   LearningRate 0.0000   Epoch: 34   Global Step: 59260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:08:34,611-Speed 9380.50 samples/sec   Loss 0.8533   LearningRate 0.0000   Epoch: 34   Global Step: 59270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:09:00,824-Speed 9376.09 samples/sec   Loss 0.8436   LearningRate 0.0000   Epoch: 34   Global Step: 59280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:09:27,056-Speed 9369.21 samples/sec   Loss 0.8516   LearningRate 0.0000   Epoch: 34   Global Step: 59290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:09:53,212-Speed 9396.37 samples/sec   Loss 0.8465   LearningRate 0.0000   Epoch: 34   Global Step: 59300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:10:19,335-Speed 9408.09 samples/sec   Loss 0.8445   LearningRate 0.0000   Epoch: 34   Global Step: 59310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:10:45,472-Speed 9403.47 samples/sec   Loss 0.8476   LearningRate 0.0000   Epoch: 34   Global Step: 59320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:11:11,624-Speed 9397.62 samples/sec   Loss 0.8461   LearningRate 0.0000   Epoch: 34   Global Step: 59330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:11:37,725-Speed 9415.95 samples/sec   Loss 0.8497   LearningRate 0.0000   Epoch: 34   Global Step: 59340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:12:03,864-Speed 9402.78 samples/sec   Loss 0.8389   LearningRate 0.0000   Epoch: 34   Global Step: 59350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:12:29,998-Speed 9404.49 samples/sec   Loss 0.8358   LearningRate 0.0000   Epoch: 34   Global Step: 59360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-03-06 16:12:56,130-Speed 9404.93 samples/sec   Loss 0.8471   LearningRate 0.0000   Epoch: 34   Global Step: 59370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-03-06 16:13:22,191-Speed 9430.56 samples/sec   Loss 0.8436   LearningRate 0.0000   Epoch: 34   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:13:48,339-Speed 9399.25 samples/sec   Loss 0.8405   LearningRate 0.0000   Epoch: 34   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:14:14,446-Speed 9414.23 samples/sec   Loss 0.8439   LearningRate 0.0000   Epoch: 34   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:14:40,613-Speed 9392.51 samples/sec   Loss 0.8384   LearningRate 0.0000   Epoch: 34   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:15:06,731-Speed 9410.22 samples/sec   Loss 0.8445   LearningRate 0.0000   Epoch: 34   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:15:32,882-Speed 9398.01 samples/sec   Loss 0.8383   LearningRate 0.0000   Epoch: 34   Global Step: 59430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:15:59,011-Speed 9406.20 samples/sec   Loss 0.8434   LearningRate 0.0000   Epoch: 34   Global Step: 59440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:16:25,220-Speed 9377.32 samples/sec   Loss 0.8441   LearningRate 0.0000   Epoch: 34   Global Step: 59450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:16:54,582-Speed 8370.39 samples/sec   Loss 0.8441   LearningRate 0.0000   Epoch: 34   Global Step: 59460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:17:20,774-Speed 9383.78 samples/sec   Loss 0.8524   LearningRate 0.0000   Epoch: 34   Global Step: 59470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:17:46,980-Speed 9378.42 samples/sec   Loss 0.8433   LearningRate 0.0000   Epoch: 34   Global Step: 59480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:18:13,139-Speed 9395.47 samples/sec   Loss 0.8378   LearningRate 0.0000   Epoch: 34   Global Step: 59490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:18:39,252-Speed 9412.10 samples/sec   Loss 0.8397   LearningRate 0.0000   Epoch: 34   Global Step: 59500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:19:05,433-Speed 9387.45 samples/sec   Loss 0.8377   LearningRate 0.0000   Epoch: 34   Global Step: 59510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:19:31,539-Speed 9414.16 samples/sec   Loss 0.8394   LearningRate 0.0000   Epoch: 34   Global Step: 59520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:19:57,744-Speed 9378.79 samples/sec   Loss 0.8394   LearningRate 0.0000   Epoch: 34   Global Step: 59530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:20:23,941-Speed 9381.69 samples/sec   Loss 0.8424   LearningRate 0.0000   Epoch: 34   Global Step: 59540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:20:50,178-Speed 9367.62 samples/sec   Loss 0.8367   LearningRate 0.0000   Epoch: 34   Global Step: 59550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:21:16,428-Speed 9362.71 samples/sec   Loss 0.8473   LearningRate 0.0000   Epoch: 34   Global Step: 59560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:21:42,533-Speed 9414.79 samples/sec   Loss 0.8415   LearningRate 0.0000   Epoch: 34   Global Step: 59570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:22:08,673-Speed 9402.10 samples/sec   Loss 0.8465   LearningRate 0.0000   Epoch: 34   Global Step: 59580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:22:34,944-Speed 9355.03 samples/sec   Loss 0.8358   LearningRate 0.0000   Epoch: 34   Global Step: 59590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:23:01,107-Speed 9394.11 samples/sec   Loss 0.8364   LearningRate 0.0000   Epoch: 34   Global Step: 59600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:23:27,284-Speed 9388.69 samples/sec   Loss 0.8445   LearningRate 0.0000   Epoch: 34   Global Step: 59610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:23:53,396-Speed 9412.31 samples/sec   Loss 0.8364   LearningRate 0.0000   Epoch: 34   Global Step: 59620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:24:19,630-Speed 9368.26 samples/sec   Loss 0.8400   LearningRate 0.0000   Epoch: 34   Global Step: 59630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:24:45,791-Speed 9394.53 samples/sec   Loss 0.8377   LearningRate 0.0000   Epoch: 34   Global Step: 59640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:25:11,887-Speed 9417.88 samples/sec   Loss 0.8354   LearningRate 0.0000   Epoch: 34   Global Step: 59650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:25:38,026-Speed 9402.55 samples/sec   Loss 0.8333   LearningRate 0.0000   Epoch: 34   Global Step: 59660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:26:04,170-Speed 9400.41 samples/sec   Loss 0.8443   LearningRate 0.0000   Epoch: 34   Global Step: 59670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:26:30,319-Speed 9398.93 samples/sec   Loss 0.8328   LearningRate 0.0000   Epoch: 34   Global Step: 59680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:26:56,478-Speed 9395.20 samples/sec   Loss 0.8406   LearningRate 0.0000   Epoch: 34   Global Step: 59690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:27:22,705-Speed 9370.94 samples/sec   Loss 0.8386   LearningRate 0.0000   Epoch: 34   Global Step: 59700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:27:48,856-Speed 9397.95 samples/sec   Loss 0.8343   LearningRate 0.0000   Epoch: 34   Global Step: 59710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:28:15,002-Speed 9400.30 samples/sec   Loss 0.8370   LearningRate 0.0000   Epoch: 34   Global Step: 59720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:28:41,045-Speed 9437.04 samples/sec   Loss 0.8391   LearningRate 0.0000   Epoch: 34   Global Step: 59730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:29:07,177-Speed 9404.77 samples/sec   Loss 0.8372   LearningRate 0.0000   Epoch: 34   Global Step: 59740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:29:33,322-Speed 9400.50 samples/sec   Loss 0.8316   LearningRate 0.0000   Epoch: 34   Global Step: 59750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:29:59,398-Speed 9425.12 samples/sec   Loss 0.8383   LearningRate 0.0000   Epoch: 34   Global Step: 59760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:30:25,599-Speed 9379.99 samples/sec   Loss 0.8424   LearningRate 0.0000   Epoch: 34   Global Step: 59770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:30:51,844-Speed 9364.62 samples/sec   Loss 0.8399   LearningRate 0.0000   Epoch: 34   Global Step: 59780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:31:17,995-Speed 9398.07 samples/sec   Loss 0.8279   LearningRate 0.0000   Epoch: 34   Global Step: 59790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:31:44,108-Speed 9411.77 samples/sec   Loss 0.8366   LearningRate 0.0000   Epoch: 34   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:32:10,257-Speed 9398.82 samples/sec   Loss 0.8354   LearningRate 0.0000   Epoch: 34   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:32:36,308-Speed 9433.97 samples/sec   Loss 0.8425   LearningRate 0.0000   Epoch: 34   Global Step: 59820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:33:02,369-Speed 9430.98 samples/sec   Loss 0.8404   LearningRate 0.0000   Epoch: 34   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:33:28,525-Speed 9396.34 samples/sec   Loss 0.8254   LearningRate 0.0000   Epoch: 34   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:33:54,575-Speed 9434.42 samples/sec   Loss 0.8325   LearningRate 0.0000   Epoch: 34   Global Step: 59850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:34:20,705-Speed 9406.03 samples/sec   Loss 0.8358   LearningRate 0.0000   Epoch: 34   Global Step: 59860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:34:46,791-Speed 9421.46 samples/sec   Loss 0.8341   LearningRate 0.0000   Epoch: 34   Global Step: 59870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:35:12,907-Speed 9410.79 samples/sec   Loss 0.8279   LearningRate 0.0000   Epoch: 34   Global Step: 59880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:35:38,995-Speed 9421.32 samples/sec   Loss 0.8319   LearningRate 0.0000   Epoch: 34   Global Step: 59890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:36:05,145-Speed 9398.34 samples/sec   Loss 0.8354   LearningRate 0.0000   Epoch: 34   Global Step: 59900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:36:31,271-Speed 9407.00 samples/sec   Loss 0.8374   LearningRate 0.0000   Epoch: 34   Global Step: 59910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:36:57,446-Speed 9389.40 samples/sec   Loss 0.8295   LearningRate 0.0000   Epoch: 34   Global Step: 59920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:37:23,564-Speed 9410.11 samples/sec   Loss 0.8371   LearningRate 0.0000   Epoch: 34   Global Step: 59930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:37:49,674-Speed 9412.94 samples/sec   Loss 0.8327   LearningRate 0.0000   Epoch: 34   Global Step: 59940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:38:15,802-Speed 9406.50 samples/sec   Loss 0.8399   LearningRate 0.0000   Epoch: 34   Global Step: 59950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:38:41,907-Speed 9414.48 samples/sec   Loss 0.8258   LearningRate 0.0000   Epoch: 34   Global Step: 59960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:39:08,023-Speed 9410.63 samples/sec   Loss 0.8333   LearningRate 0.0000   Epoch: 34   Global Step: 59970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:39:34,218-Speed 9382.33 samples/sec   Loss 0.8354   LearningRate 0.0000   Epoch: 34   Global Step: 59980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:40:00,348-Speed 9405.64 samples/sec   Loss 0.8336   LearningRate 0.0000   Epoch: 34   Global Step: 59990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:40:26,446-Speed 9417.33 samples/sec   Loss 0.8342   LearningRate 0.0000   Epoch: 34   Global Step: 60000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:40:52,538-Speed 9419.63 samples/sec   Loss 0.8305   LearningRate 0.0000   Epoch: 34   Global Step: 60010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:41:18,752-Speed 9375.37 samples/sec   Loss 0.8340   LearningRate 0.0000   Epoch: 34   Global Step: 60020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:41:44,880-Speed 9406.62 samples/sec   Loss 0.8351   LearningRate 0.0000   Epoch: 34   Global Step: 60030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:42:11,093-Speed 9376.11 samples/sec   Loss 0.8298   LearningRate 0.0000   Epoch: 34   Global Step: 60040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:42:37,168-Speed 9425.45 samples/sec   Loss 0.8293   LearningRate 0.0000   Epoch: 34   Global Step: 60050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:43:03,305-Speed 9403.12 samples/sec   Loss 0.8245   LearningRate 0.0000   Epoch: 34   Global Step: 60060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:43:29,369-Speed 9429.39 samples/sec   Loss 0.8340   LearningRate 0.0000   Epoch: 34   Global Step: 60070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:43:55,411-Speed 9437.67 samples/sec   Loss 0.8308   LearningRate 0.0000   Epoch: 34   Global Step: 60080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:44:21,551-Speed 9402.06 samples/sec   Loss 0.8440   LearningRate 0.0000   Epoch: 34   Global Step: 60090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:44:47,636-Speed 9421.77 samples/sec   Loss 0.8349   LearningRate 0.0000   Epoch: 34   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:45:13,738-Speed 9415.71 samples/sec   Loss 0.8356   LearningRate 0.0000   Epoch: 34   Global Step: 60110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:45:39,918-Speed 9387.67 samples/sec   Loss 0.8420   LearningRate 0.0000   Epoch: 34   Global Step: 60120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:46:06,081-Speed 9393.91 samples/sec   Loss 0.8355   LearningRate 0.0000   Epoch: 34   Global Step: 60130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:46:32,121-Speed 9438.11 samples/sec   Loss 0.8340   LearningRate 0.0000   Epoch: 34   Global Step: 60140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:46:58,187-Speed 9428.86 samples/sec   Loss 0.8305   LearningRate 0.0000   Epoch: 34   Global Step: 60150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:47:24,267-Speed 9423.82 samples/sec   Loss 0.8337   LearningRate 0.0000   Epoch: 34   Global Step: 60160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:47:50,329-Speed 9429.86 samples/sec   Loss 0.8340   LearningRate 0.0000   Epoch: 34   Global Step: 60170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:48:16,367-Speed 9439.08 samples/sec   Loss 0.8285   LearningRate 0.0000   Epoch: 34   Global Step: 60180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:48:42,457-Speed 9419.98 samples/sec   Loss 0.8335   LearningRate 0.0000   Epoch: 34   Global Step: 60190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:49:08,555-Speed 9417.31 samples/sec   Loss 0.8334   LearningRate 0.0000   Epoch: 34   Global Step: 60200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:49:34,707-Speed 9397.86 samples/sec   Loss 0.8272   LearningRate 0.0000   Epoch: 34   Global Step: 60210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:50:00,829-Speed 9408.50 samples/sec   Loss 0.8322   LearningRate 0.0000   Epoch: 34   Global Step: 60220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:50:27,117-Speed 9349.03 samples/sec   Loss 0.8335   LearningRate 0.0000   Epoch: 34   Global Step: 60230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-06 16:50:53,262-Speed 9400.40 samples/sec   Loss 0.8351   LearningRate 0.0000   Epoch: 34   Global Step: 60240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:51:19,435-Speed 9390.48 samples/sec   Loss 0.8270   LearningRate 0.0000   Epoch: 34   Global Step: 60250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:51:45,577-Speed 9401.29 samples/sec   Loss 0.8310   LearningRate 0.0000   Epoch: 34   Global Step: 60260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:52:11,672-Speed 9418.21 samples/sec   Loss 0.8313   LearningRate 0.0000   Epoch: 34   Global Step: 60270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:52:37,800-Speed 9406.40 samples/sec   Loss 0.8280   LearningRate 0.0000   Epoch: 34   Global Step: 60280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-06 16:53:03,985-Speed 9386.23 samples/sec   Loss 0.8298   LearningRate 0.0000   Epoch: 34   Global Step: 60290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 16:53:30,012-Speed 9442.65 samples/sec   Loss 0.8280   LearningRate 0.0000   Epoch: 34   Global Step: 60300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 16:53:56,111-Speed 9417.16 samples/sec   Loss 0.8330   LearningRate 0.0000   Epoch: 34   Global Step: 60310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 16:54:22,161-Speed 9434.35 samples/sec   Loss 0.8320   LearningRate 0.0000   Epoch: 34   Global Step: 60320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 16:54:48,215-Speed 9432.98 samples/sec   Loss 0.8298   LearningRate 0.0000   Epoch: 34   Global Step: 60330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 16:55:14,285-Speed 9427.42 samples/sec   Loss 0.8321   LearningRate 0.0000   Epoch: 34   Global Step: 60340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 16:55:40,409-Speed 9407.73 samples/sec   Loss 0.8368   LearningRate 0.0000   Epoch: 34   Global Step: 60350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 16:56:06,488-Speed 9424.15 samples/sec   Loss 0.8303   LearningRate 0.0000   Epoch: 34   Global Step: 60360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 16:56:32,714-Speed 9371.18 samples/sec   Loss 0.8329   LearningRate 0.0000   Epoch: 34   Global Step: 60370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 16:56:58,854-Speed 9402.35 samples/sec   Loss 0.8334   LearningRate 0.0000   Epoch: 34   Global Step: 60380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 16:57:24,952-Speed 9416.94 samples/sec   Loss 0.8254   LearningRate 0.0000   Epoch: 34   Global Step: 60390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 16:57:50,943-Speed 9456.04 samples/sec   Loss 0.8280   LearningRate 0.0000   Epoch: 34   Global Step: 60400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 16:58:17,114-Speed 9390.74 samples/sec   Loss 0.8262   LearningRate 0.0000   Epoch: 34   Global Step: 60410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 16:58:43,251-Speed 9403.05 samples/sec   Loss 0.8299   LearningRate 0.0000   Epoch: 34   Global Step: 60420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 16:59:09,367-Speed 9410.86 samples/sec   Loss 0.8357   LearningRate 0.0000   Epoch: 34   Global Step: 60430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 16:59:35,448-Speed 9423.24 samples/sec   Loss 0.8283   LearningRate 0.0000   Epoch: 34   Global Step: 60440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 17:00:01,589-Speed 9402.03 samples/sec   Loss 0.8321   LearningRate 0.0000   Epoch: 34   Global Step: 60450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 17:00:27,668-Speed 9423.97 samples/sec   Loss 0.8245   LearningRate 0.0000   Epoch: 34   Global Step: 60460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 17:00:53,795-Speed 9406.92 samples/sec   Loss 0.8205   LearningRate 0.0000   Epoch: 34   Global Step: 60470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 17:01:20,032-Speed 9367.57 samples/sec   Loss 0.8284   LearningRate 0.0000   Epoch: 34   Global Step: 60480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 17:01:46,337-Speed 9342.87 samples/sec   Loss 0.8314   LearningRate 0.0000   Epoch: 34   Global Step: 60490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-06 17:03:03,993-Speed 3164.79 samples/sec   Loss 0.8246   LearningRate 0.0000   Epoch: 35   Global Step: 60500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:03:30,006-Speed 9448.08 samples/sec   Loss 0.8264   LearningRate 0.0000   Epoch: 35   Global Step: 60510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:03:56,123-Speed 9410.51 samples/sec   Loss 0.8214   LearningRate 0.0000   Epoch: 35   Global Step: 60520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:04:22,406-Speed 9350.84 samples/sec   Loss 0.8243   LearningRate 0.0000   Epoch: 35   Global Step: 60530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:04:48,715-Speed 9341.50 samples/sec   Loss 0.8298   LearningRate 0.0000   Epoch: 35   Global Step: 60540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:05:14,948-Speed 9369.06 samples/sec   Loss 0.8223   LearningRate 0.0000   Epoch: 35   Global Step: 60550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:05:41,844-Speed 9137.76 samples/sec   Loss 0.8327   LearningRate 0.0000   Epoch: 35   Global Step: 60560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:06:08,074-Speed 9369.61 samples/sec   Loss 0.8257   LearningRate 0.0000   Epoch: 35   Global Step: 60570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:06:34,188-Speed 9411.31 samples/sec   Loss 0.8249   LearningRate 0.0000   Epoch: 35   Global Step: 60580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:07:00,278-Speed 9420.58 samples/sec   Loss 0.8242   LearningRate 0.0000   Epoch: 35   Global Step: 60590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:07:26,497-Speed 9373.79 samples/sec   Loss 0.8264   LearningRate 0.0000   Epoch: 35   Global Step: 60600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:07:52,635-Speed 9402.53 samples/sec   Loss 0.8211   LearningRate 0.0000   Epoch: 35   Global Step: 60610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:08:18,764-Speed 9407.30 samples/sec   Loss 0.8261   LearningRate 0.0000   Epoch: 35   Global Step: 60620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:08:44,899-Speed 9403.86 samples/sec   Loss 0.8198   LearningRate 0.0000   Epoch: 35   Global Step: 60630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:09:11,260-Speed 9323.63 samples/sec   Loss 0.8204   LearningRate 0.0000   Epoch: 35   Global Step: 60640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:09:37,399-Speed 9402.62 samples/sec   Loss 0.8221   LearningRate 0.0000   Epoch: 35   Global Step: 60650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:10:03,551-Speed 9397.75 samples/sec   Loss 0.8183   LearningRate 0.0000   Epoch: 35   Global Step: 60660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:10:29,694-Speed 9400.78 samples/sec   Loss 0.8228   LearningRate 0.0000   Epoch: 35   Global Step: 60670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:10:55,845-Speed 9399.13 samples/sec   Loss 0.8219   LearningRate 0.0000   Epoch: 35   Global Step: 60680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:11:22,064-Speed 9373.80 samples/sec   Loss 0.8256   LearningRate 0.0000   Epoch: 35   Global Step: 60690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:11:48,161-Speed 9417.33 samples/sec   Loss 0.8223   LearningRate 0.0000   Epoch: 35   Global Step: 60700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-03-06 17:12:14,267-Speed 9414.40 samples/sec   Loss 0.8279   LearningRate 0.0000   Epoch: 35   Global Step: 60710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-03-06 17:12:40,458-Speed 9383.79 samples/sec   Loss 0.8292   LearningRate 0.0000   Epoch: 35   Global Step: 60720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-03-06 17:13:06,594-Speed 9403.67 samples/sec   Loss 0.8245   LearningRate 0.0000   Epoch: 35   Global Step: 60730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-03-06 17:13:32,760-Speed 9392.94 samples/sec   Loss 0.8259   LearningRate 0.0000   Epoch: 35   Global Step: 60740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:13:58,838-Speed 9424.20 samples/sec   Loss 0.8263   LearningRate 0.0000   Epoch: 35   Global Step: 60750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:14:24,963-Speed 9407.49 samples/sec   Loss 0.8231   LearningRate 0.0000   Epoch: 35   Global Step: 60760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:14:51,054-Speed 9419.96 samples/sec   Loss 0.8239   LearningRate 0.0000   Epoch: 35   Global Step: 60770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:15:17,193-Speed 9402.39 samples/sec   Loss 0.8247   LearningRate 0.0000   Epoch: 35   Global Step: 60780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:15:43,358-Speed 9393.19 samples/sec   Loss 0.8230   LearningRate 0.0000   Epoch: 35   Global Step: 60790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:16:09,487-Speed 9405.84 samples/sec   Loss 0.8239   LearningRate 0.0000   Epoch: 35   Global Step: 60800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:16:35,600-Speed 9412.18 samples/sec   Loss 0.8269   LearningRate 0.0000   Epoch: 35   Global Step: 60810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:17:01,718-Speed 9409.79 samples/sec   Loss 0.8254   LearningRate 0.0000   Epoch: 35   Global Step: 60820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:17:27,817-Speed 9416.91 samples/sec   Loss 0.8211   LearningRate 0.0000   Epoch: 35   Global Step: 60830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:17:53,921-Speed 9415.33 samples/sec   Loss 0.8314   LearningRate 0.0000   Epoch: 35   Global Step: 60840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:18:20,026-Speed 9414.48 samples/sec   Loss 0.8242   LearningRate 0.0000   Epoch: 35   Global Step: 60850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:18:46,085-Speed 9431.47 samples/sec   Loss 0.8217   LearningRate 0.0000   Epoch: 35   Global Step: 60860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:19:12,183-Speed 9417.18 samples/sec   Loss 0.8250   LearningRate 0.0000   Epoch: 35   Global Step: 60870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:19:38,282-Speed 9416.81 samples/sec   Loss 0.8219   LearningRate 0.0000   Epoch: 35   Global Step: 60880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:20:04,451-Speed 9391.77 samples/sec   Loss 0.8219   LearningRate 0.0000   Epoch: 35   Global Step: 60890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:20:30,650-Speed 9380.62 samples/sec   Loss 0.8205   LearningRate 0.0000   Epoch: 35   Global Step: 60900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:20:56,720-Speed 9427.63 samples/sec   Loss 0.8256   LearningRate 0.0000   Epoch: 35   Global Step: 60910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:21:22,853-Speed 9404.33 samples/sec   Loss 0.8250   LearningRate 0.0000   Epoch: 35   Global Step: 60920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:21:48,881-Speed 9443.47 samples/sec   Loss 0.8253   LearningRate 0.0000   Epoch: 35   Global Step: 60930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:22:14,916-Speed 9440.03 samples/sec   Loss 0.8233   LearningRate 0.0000   Epoch: 35   Global Step: 60940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:22:41,038-Speed 9408.75 samples/sec   Loss 0.8308   LearningRate 0.0000   Epoch: 35   Global Step: 60950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:23:07,204-Speed 9392.46 samples/sec   Loss 0.8259   LearningRate 0.0000   Epoch: 35   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:23:33,328-Speed 9407.74 samples/sec   Loss 0.8244   LearningRate 0.0000   Epoch: 35   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:23:59,463-Speed 9403.92 samples/sec   Loss 0.8204   LearningRate 0.0000   Epoch: 35   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:24:25,578-Speed 9411.25 samples/sec   Loss 0.8192   LearningRate 0.0000   Epoch: 35   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:24:51,726-Speed 9399.30 samples/sec   Loss 0.8267   LearningRate 0.0000   Epoch: 35   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:25:17,830-Speed 9414.82 samples/sec   Loss 0.8253   LearningRate 0.0000   Epoch: 35   Global Step: 61010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:25:43,994-Speed 9393.51 samples/sec   Loss 0.8255   LearningRate 0.0000   Epoch: 35   Global Step: 61020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:26:10,112-Speed 9410.17 samples/sec   Loss 0.8188   LearningRate 0.0000   Epoch: 35   Global Step: 61030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:26:36,269-Speed 9396.05 samples/sec   Loss 0.8275   LearningRate 0.0000   Epoch: 35   Global Step: 61040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:27:02,451-Speed 9386.78 samples/sec   Loss 0.8190   LearningRate 0.0000   Epoch: 35   Global Step: 61050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:27:28,652-Speed 9380.54 samples/sec   Loss 0.8149   LearningRate 0.0000   Epoch: 35   Global Step: 61060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:27:54,813-Speed 9394.39 samples/sec   Loss 0.8196   LearningRate 0.0000   Epoch: 35   Global Step: 61070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:28:20,957-Speed 9400.78 samples/sec   Loss 0.8217   LearningRate 0.0000   Epoch: 35   Global Step: 61080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:28:47,062-Speed 9414.84 samples/sec   Loss 0.8186   LearningRate 0.0000   Epoch: 35   Global Step: 61090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:29:13,195-Speed 9404.52 samples/sec   Loss 0.8216   LearningRate 0.0000   Epoch: 35   Global Step: 61100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:29:39,296-Speed 9415.82 samples/sec   Loss 0.8198   LearningRate 0.0000   Epoch: 35   Global Step: 61110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:30:05,455-Speed 9395.62 samples/sec   Loss 0.8209   LearningRate 0.0000   Epoch: 35   Global Step: 61120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:30:31,577-Speed 9408.25 samples/sec   Loss 0.8226   LearningRate 0.0000   Epoch: 35   Global Step: 61130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:30:57,686-Speed 9413.37 samples/sec   Loss 0.8207   LearningRate 0.0000   Epoch: 35   Global Step: 61140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:31:23,843-Speed 9396.04 samples/sec   Loss 0.8192   LearningRate 0.0000   Epoch: 35   Global Step: 61150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:31:50,056-Speed 9375.96 samples/sec   Loss 0.8221   LearningRate 0.0000   Epoch: 35   Global Step: 61160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:32:16,128-Speed 9426.76 samples/sec   Loss 0.8198   LearningRate 0.0000   Epoch: 35   Global Step: 61170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:32:42,355-Speed 9370.83 samples/sec   Loss 0.8117   LearningRate 0.0000   Epoch: 35   Global Step: 61180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:33:08,501-Speed 9399.86 samples/sec   Loss 0.8207   LearningRate 0.0000   Epoch: 35   Global Step: 61190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:33:34,720-Speed 9373.79 samples/sec   Loss 0.8228   LearningRate 0.0000   Epoch: 35   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:34:00,807-Speed 9420.93 samples/sec   Loss 0.8177   LearningRate 0.0000   Epoch: 35   Global Step: 61210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:34:26,891-Speed 9423.53 samples/sec   Loss 0.8260   LearningRate 0.0000   Epoch: 35   Global Step: 61220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:34:52,990-Speed 9416.58 samples/sec   Loss 0.8147   LearningRate 0.0000   Epoch: 35   Global Step: 61230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:35:19,150-Speed 9395.09 samples/sec   Loss 0.8160   LearningRate 0.0000   Epoch: 35   Global Step: 61240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:35:45,197-Speed 9435.62 samples/sec   Loss 0.8209   LearningRate 0.0000   Epoch: 35   Global Step: 61250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:36:11,334-Speed 9403.35 samples/sec   Loss 0.8165   LearningRate 0.0000   Epoch: 35   Global Step: 61260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:36:37,472-Speed 9403.96 samples/sec   Loss 0.8220   LearningRate 0.0000   Epoch: 35   Global Step: 61270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:37:03,610-Speed 9402.65 samples/sec   Loss 0.8196   LearningRate 0.0000   Epoch: 35   Global Step: 61280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:37:29,798-Speed 9384.53 samples/sec   Loss 0.8279   LearningRate 0.0000   Epoch: 35   Global Step: 61290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:37:55,948-Speed 9398.57 samples/sec   Loss 0.8243   LearningRate 0.0000   Epoch: 35   Global Step: 61300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:38:22,159-Speed 9376.46 samples/sec   Loss 0.8117   LearningRate 0.0000   Epoch: 35   Global Step: 61310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:38:48,245-Speed 9421.54 samples/sec   Loss 0.8215   LearningRate 0.0000   Epoch: 35   Global Step: 61320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:39:14,434-Speed 9384.78 samples/sec   Loss 0.8203   LearningRate 0.0000   Epoch: 35   Global Step: 61330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:39:40,574-Speed 9402.11 samples/sec   Loss 0.8167   LearningRate 0.0000   Epoch: 35   Global Step: 61340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:40:06,799-Speed 9371.32 samples/sec   Loss 0.8135   LearningRate 0.0000   Epoch: 35   Global Step: 61350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:40:32,913-Speed 9411.90 samples/sec   Loss 0.8158   LearningRate 0.0000   Epoch: 35   Global Step: 61360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:40:59,024-Speed 9412.31 samples/sec   Loss 0.8238   LearningRate 0.0000   Epoch: 35   Global Step: 61370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:41:25,073-Speed 9435.20 samples/sec   Loss 0.8142   LearningRate 0.0000   Epoch: 35   Global Step: 61380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:41:51,191-Speed 9409.87 samples/sec   Loss 0.8145   LearningRate 0.0000   Epoch: 35   Global Step: 61390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:42:17,284-Speed 9418.94 samples/sec   Loss 0.8155   LearningRate 0.0000   Epoch: 35   Global Step: 61400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:42:43,392-Speed 9413.93 samples/sec   Loss 0.8174   LearningRate 0.0000   Epoch: 35   Global Step: 61410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:43:09,529-Speed 9403.23 samples/sec   Loss 0.8150   LearningRate 0.0000   Epoch: 35   Global Step: 61420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:43:35,708-Speed 9387.79 samples/sec   Loss 0.8105   LearningRate 0.0000   Epoch: 35   Global Step: 61430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:44:01,837-Speed 9406.08 samples/sec   Loss 0.8134   LearningRate 0.0000   Epoch: 35   Global Step: 61440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:44:28,033-Speed 9382.29 samples/sec   Loss 0.8159   LearningRate 0.0000   Epoch: 35   Global Step: 61450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:44:54,154-Speed 9408.84 samples/sec   Loss 0.8176   LearningRate 0.0000   Epoch: 35   Global Step: 61460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:45:20,292-Speed 9403.07 samples/sec   Loss 0.8134   LearningRate 0.0000   Epoch: 35   Global Step: 61470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:45:46,409-Speed 9410.52 samples/sec   Loss 0.8189   LearningRate 0.0000   Epoch: 35   Global Step: 61480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:46:12,526-Speed 9410.38 samples/sec   Loss 0.8177   LearningRate 0.0000   Epoch: 35   Global Step: 61490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:46:38,584-Speed 9431.79 samples/sec   Loss 0.8131   LearningRate 0.0000   Epoch: 35   Global Step: 61500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:47:04,647-Speed 9429.81 samples/sec   Loss 0.8183   LearningRate 0.0000   Epoch: 35   Global Step: 61510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:47:30,716-Speed 9427.90 samples/sec   Loss 0.8101   LearningRate 0.0000   Epoch: 35   Global Step: 61520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:47:56,816-Speed 9416.66 samples/sec   Loss 0.8148   LearningRate 0.0000   Epoch: 35   Global Step: 61530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:48:22,992-Speed 9388.97 samples/sec   Loss 0.8195   LearningRate 0.0000   Epoch: 35   Global Step: 61540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:48:49,113-Speed 9408.86 samples/sec   Loss 0.8111   LearningRate 0.0000   Epoch: 35   Global Step: 61550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:49:15,303-Speed 9384.21 samples/sec   Loss 0.8162   LearningRate 0.0000   Epoch: 35   Global Step: 61560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:49:41,514-Speed 9376.74 samples/sec   Loss 0.8094   LearningRate 0.0000   Epoch: 35   Global Step: 61570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-06 17:50:07,610-Speed 9417.84 samples/sec   Loss 0.8168   LearningRate 0.0000   Epoch: 35   Global Step: 61580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:50:33,751-Speed 9401.65 samples/sec   Loss 0.8151   LearningRate 0.0000   Epoch: 35   Global Step: 61590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:50:59,802-Speed 9434.16 samples/sec   Loss 0.8117   LearningRate 0.0000   Epoch: 35   Global Step: 61600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:51:25,896-Speed 9419.11 samples/sec   Loss 0.8164   LearningRate 0.0000   Epoch: 35   Global Step: 61610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:51:51,957-Speed 9430.51 samples/sec   Loss 0.8116   LearningRate 0.0000   Epoch: 35   Global Step: 61620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:52:18,084-Speed 9406.74 samples/sec   Loss 0.8192   LearningRate 0.0000   Epoch: 35   Global Step: 61630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:52:44,233-Speed 9399.12 samples/sec   Loss 0.8152   LearningRate 0.0000   Epoch: 35   Global Step: 61640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-06 17:53:10,450-Speed 9374.32 samples/sec   Loss 0.8208   LearningRate 0.0000   Epoch: 35   Global Step: 61650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:53:36,579-Speed 9406.47 samples/sec   Loss 0.8127   LearningRate 0.0000   Epoch: 35   Global Step: 61660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:54:02,639-Speed 9431.01 samples/sec   Loss 0.8136   LearningRate 0.0000   Epoch: 35   Global Step: 61670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:54:28,732-Speed 9418.89 samples/sec   Loss 0.8148   LearningRate 0.0000   Epoch: 35   Global Step: 61680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-03-06 17:54:54,827-Speed 9418.21 samples/sec   Loss 0.8113   LearningRate 0.0000   Epoch: 35   Global Step: 61690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:55:20,921-Speed 9418.73 samples/sec   Loss 0.8139   LearningRate 0.0000   Epoch: 35   Global Step: 61700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:55:46,984-Speed 9430.09 samples/sec   Loss 0.8142   LearningRate 0.0000   Epoch: 35   Global Step: 61710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:56:13,057-Speed 9426.03 samples/sec   Loss 0.8097   LearningRate 0.0000   Epoch: 35   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:56:39,139-Speed 9423.37 samples/sec   Loss 0.8085   LearningRate 0.0000   Epoch: 35   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:57:05,291-Speed 9397.37 samples/sec   Loss 0.8172   LearningRate 0.0000   Epoch: 35   Global Step: 61740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:57:31,490-Speed 9381.24 samples/sec   Loss 0.8120   LearningRate 0.0000   Epoch: 35   Global Step: 61750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:57:57,618-Speed 9406.39 samples/sec   Loss 0.8219   LearningRate 0.0000   Epoch: 35   Global Step: 61760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:58:23,713-Speed 9418.52 samples/sec   Loss 0.8142   LearningRate 0.0000   Epoch: 35   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:58:49,815-Speed 9415.65 samples/sec   Loss 0.8139   LearningRate 0.0000   Epoch: 35   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:59:15,940-Speed 9407.28 samples/sec   Loss 0.8175   LearningRate 0.0000   Epoch: 35   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 17:59:41,997-Speed 9432.24 samples/sec   Loss 0.8127   LearningRate 0.0000   Epoch: 35   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:00:08,079-Speed 9422.99 samples/sec   Loss 0.8106   LearningRate 0.0000   Epoch: 35   Global Step: 61810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:00:34,153-Speed 9425.86 samples/sec   Loss 0.8123   LearningRate 0.0000   Epoch: 35   Global Step: 61820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:01:00,224-Speed 9426.97 samples/sec   Loss 0.8157   LearningRate 0.0000   Epoch: 35   Global Step: 61830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:01:26,319-Speed 9418.31 samples/sec   Loss 0.8123   LearningRate 0.0000   Epoch: 35   Global Step: 61840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:01:52,422-Speed 9415.66 samples/sec   Loss 0.8184   LearningRate 0.0000   Epoch: 35   Global Step: 61850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:02:18,520-Speed 9417.41 samples/sec   Loss 0.7997   LearningRate 0.0000   Epoch: 35   Global Step: 61860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:02:44,554-Speed 9440.28 samples/sec   Loss 0.8140   LearningRate 0.0000   Epoch: 35   Global Step: 61870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:03:10,727-Speed 9390.26 samples/sec   Loss 0.8191   LearningRate 0.0000   Epoch: 35   Global Step: 61880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:03:36,877-Speed 9398.80 samples/sec   Loss 0.8155   LearningRate 0.0000   Epoch: 35   Global Step: 61890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:04:02,924-Speed 9435.14 samples/sec   Loss 0.8110   LearningRate 0.0000   Epoch: 35   Global Step: 61900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:04:28,931-Speed 9450.50 samples/sec   Loss 0.8124   LearningRate 0.0000   Epoch: 35   Global Step: 61910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:04:54,968-Speed 9439.25 samples/sec   Loss 0.8146   LearningRate 0.0000   Epoch: 35   Global Step: 61920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:05:21,102-Speed 9404.08 samples/sec   Loss 0.8140   LearningRate 0.0000   Epoch: 35   Global Step: 61930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:05:47,221-Speed 9409.85 samples/sec   Loss 0.8045   LearningRate 0.0000   Epoch: 35   Global Step: 61940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:06:13,300-Speed 9423.83 samples/sec   Loss 0.8097   LearningRate 0.0000   Epoch: 35   Global Step: 61950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:06:39,388-Speed 9420.75 samples/sec   Loss 0.8118   LearningRate 0.0000   Epoch: 35   Global Step: 61960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:07:05,533-Speed 9400.31 samples/sec   Loss 0.8118   LearningRate 0.0000   Epoch: 35   Global Step: 61970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:07:31,597-Speed 9429.69 samples/sec   Loss 0.8141   LearningRate 0.0000   Epoch: 35   Global Step: 61980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:07:57,692-Speed 9418.15 samples/sec   Loss 0.8120   LearningRate 0.0000   Epoch: 35   Global Step: 61990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:08:23,773-Speed 9423.28 samples/sec   Loss 0.8125   LearningRate 0.0000   Epoch: 35   Global Step: 62000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:08:49,883-Speed 9413.62 samples/sec   Loss 0.8079   LearningRate 0.0000   Epoch: 35   Global Step: 62010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:09:16,052-Speed 9391.74 samples/sec   Loss 0.8133   LearningRate 0.0000   Epoch: 35   Global Step: 62020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:09:42,183-Speed 9405.22 samples/sec   Loss 0.8040   LearningRate 0.0000   Epoch: 35   Global Step: 62030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:10:08,351-Speed 9392.21 samples/sec   Loss 0.8165   LearningRate 0.0000   Epoch: 35   Global Step: 62040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:10:34,397-Speed 9435.76 samples/sec   Loss 0.8054   LearningRate 0.0000   Epoch: 35   Global Step: 62050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:11:00,465-Speed 9428.44 samples/sec   Loss 0.8025   LearningRate 0.0000   Epoch: 35   Global Step: 62060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:11:26,558-Speed 9418.91 samples/sec   Loss 0.8099   LearningRate 0.0000   Epoch: 35   Global Step: 62070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:11:52,607-Speed 9435.11 samples/sec   Loss 0.8086   LearningRate 0.0000   Epoch: 35   Global Step: 62080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:12:18,704-Speed 9417.55 samples/sec   Loss 0.8132   LearningRate 0.0000   Epoch: 35   Global Step: 62090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:12:44,793-Speed 9420.22 samples/sec   Loss 0.8114   LearningRate 0.0000   Epoch: 35   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:13:10,924-Speed 9405.61 samples/sec   Loss 0.8123   LearningRate 0.0000   Epoch: 35   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:13:37,052-Speed 9406.20 samples/sec   Loss 0.8165   LearningRate 0.0000   Epoch: 35   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:14:03,063-Speed 9448.78 samples/sec   Loss 0.8118   LearningRate 0.0000   Epoch: 35   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:14:29,134-Speed 9426.68 samples/sec   Loss 0.8160   LearningRate 0.0000   Epoch: 35   Global Step: 62140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:14:55,177-Speed 9437.27 samples/sec   Loss 0.8080   LearningRate 0.0000   Epoch: 35   Global Step: 62150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:15:21,201-Speed 9444.07 samples/sec   Loss 0.8130   LearningRate 0.0000   Epoch: 35   Global Step: 62160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:15:47,277-Speed 9425.20 samples/sec   Loss 0.8112   LearningRate 0.0000   Epoch: 35   Global Step: 62170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:16:13,438-Speed 9394.69 samples/sec   Loss 0.8066   LearningRate 0.0000   Epoch: 35   Global Step: 62180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:16:39,664-Speed 9371.49 samples/sec   Loss 0.8110   LearningRate 0.0000   Epoch: 35   Global Step: 62190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:17:05,835-Speed 9390.86 samples/sec   Loss 0.8102   LearningRate 0.0000   Epoch: 35   Global Step: 62200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:17:31,987-Speed 9398.76 samples/sec   Loss 0.7982   LearningRate 0.0000   Epoch: 35   Global Step: 62210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:18:52,217-Speed 3063.27 samples/sec   Loss 0.8099   LearningRate 0.0000   Epoch: 36   Global Step: 62220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:19:18,125-Speed 9486.41 samples/sec   Loss 0.8064   LearningRate 0.0000   Epoch: 36   Global Step: 62230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:19:44,006-Speed 9496.13 samples/sec   Loss 0.8056   LearningRate 0.0000   Epoch: 36   Global Step: 62240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:20:10,058-Speed 9433.66 samples/sec   Loss 0.8118   LearningRate 0.0000   Epoch: 36   Global Step: 62250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:20:36,032-Speed 9462.20 samples/sec   Loss 0.8075   LearningRate 0.0000   Epoch: 36   Global Step: 62260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:21:01,982-Speed 9470.87 samples/sec   Loss 0.8114   LearningRate 0.0000   Epoch: 36   Global Step: 62270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:21:28,002-Speed 9445.46 samples/sec   Loss 0.8105   LearningRate 0.0000   Epoch: 36   Global Step: 62280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:21:54,070-Speed 9428.42 samples/sec   Loss 0.8067   LearningRate 0.0000   Epoch: 36   Global Step: 62290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:22:20,138-Speed 9427.90 samples/sec   Loss 0.8081   LearningRate 0.0000   Epoch: 36   Global Step: 62300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:22:46,129-Speed 9456.37 samples/sec   Loss 0.8059   LearningRate 0.0000   Epoch: 36   Global Step: 62310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:23:12,217-Speed 9420.74 samples/sec   Loss 0.8024   LearningRate 0.0000   Epoch: 36   Global Step: 62320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:23:38,366-Speed 9399.10 samples/sec   Loss 0.8143   LearningRate 0.0000   Epoch: 36   Global Step: 62330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:24:04,534-Speed 9391.77 samples/sec   Loss 0.8077   LearningRate 0.0000   Epoch: 36   Global Step: 62340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:24:30,621-Speed 9421.10 samples/sec   Loss 0.7980   LearningRate 0.0000   Epoch: 36   Global Step: 62350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-03-06 18:24:56,730-Speed 9413.68 samples/sec   Loss 0.8026   LearningRate 0.0000   Epoch: 36   Global Step: 62360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-03-06 18:25:22,742-Speed 9448.11 samples/sec   Loss 0.8019   LearningRate 0.0000   Epoch: 36   Global Step: 62370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:25:48,901-Speed 9395.20 samples/sec   Loss 0.8160   LearningRate 0.0000   Epoch: 36   Global Step: 62380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:26:14,916-Speed 9447.54 samples/sec   Loss 0.8013   LearningRate 0.0000   Epoch: 36   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:26:40,933-Speed 9446.54 samples/sec   Loss 0.8045   LearningRate 0.0000   Epoch: 36   Global Step: 62400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:27:07,098-Speed 9392.98 samples/sec   Loss 0.8023   LearningRate 0.0000   Epoch: 36   Global Step: 62410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:27:33,147-Speed 9434.78 samples/sec   Loss 0.8082   LearningRate 0.0000   Epoch: 36   Global Step: 62420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:27:59,212-Speed 9429.09 samples/sec   Loss 0.8073   LearningRate 0.0000   Epoch: 36   Global Step: 62430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:28:25,274-Speed 9430.54 samples/sec   Loss 0.8047   LearningRate 0.0000   Epoch: 36   Global Step: 62440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:28:51,440-Speed 9392.44 samples/sec   Loss 0.8041   LearningRate 0.0000   Epoch: 36   Global Step: 62450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:29:17,599-Speed 9395.39 samples/sec   Loss 0.8066   LearningRate 0.0000   Epoch: 36   Global Step: 62460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:29:43,686-Speed 9421.09 samples/sec   Loss 0.8094   LearningRate 0.0000   Epoch: 36   Global Step: 62470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:30:09,792-Speed 9414.04 samples/sec   Loss 0.8031   LearningRate 0.0000   Epoch: 36   Global Step: 62480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:30:35,899-Speed 9413.91 samples/sec   Loss 0.8042   LearningRate 0.0000   Epoch: 36   Global Step: 62490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:31:02,128-Speed 9370.42 samples/sec   Loss 0.8101   LearningRate 0.0000   Epoch: 36   Global Step: 62500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:31:28,180-Speed 9433.63 samples/sec   Loss 0.8023   LearningRate 0.0000   Epoch: 36   Global Step: 62510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:31:54,246-Speed 9428.73 samples/sec   Loss 0.8005   LearningRate 0.0000   Epoch: 36   Global Step: 62520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:32:20,416-Speed 9391.31 samples/sec   Loss 0.8009   LearningRate 0.0000   Epoch: 36   Global Step: 62530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:32:46,525-Speed 9413.46 samples/sec   Loss 0.8077   LearningRate 0.0000   Epoch: 36   Global Step: 62540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:33:12,641-Speed 9410.65 samples/sec   Loss 0.8029   LearningRate 0.0000   Epoch: 36   Global Step: 62550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:33:38,704-Speed 9430.32 samples/sec   Loss 0.8083   LearningRate 0.0000   Epoch: 36   Global Step: 62560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:34:04,917-Speed 9375.83 samples/sec   Loss 0.8079   LearningRate 0.0000   Epoch: 36   Global Step: 62570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:34:30,980-Speed 9429.79 samples/sec   Loss 0.8033   LearningRate 0.0000   Epoch: 36   Global Step: 62580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:34:57,053-Speed 9426.69 samples/sec   Loss 0.8026   LearningRate 0.0000   Epoch: 36   Global Step: 62590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:35:23,202-Speed 9398.68 samples/sec   Loss 0.8041   LearningRate 0.0000   Epoch: 36   Global Step: 62600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:35:49,342-Speed 9402.05 samples/sec   Loss 0.8072   LearningRate 0.0000   Epoch: 36   Global Step: 62610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:36:15,407-Speed 9428.80 samples/sec   Loss 0.7989   LearningRate 0.0000   Epoch: 36   Global Step: 62620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:36:41,618-Speed 9376.91 samples/sec   Loss 0.8087   LearningRate 0.0000   Epoch: 36   Global Step: 62630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:37:07,703-Speed 9421.84 samples/sec   Loss 0.8023   LearningRate 0.0000   Epoch: 36   Global Step: 62640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:37:33,779-Speed 9425.19 samples/sec   Loss 0.8131   LearningRate 0.0000   Epoch: 36   Global Step: 62650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:37:59,934-Speed 9396.97 samples/sec   Loss 0.8048   LearningRate 0.0000   Epoch: 36   Global Step: 62660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:38:26,112-Speed 9388.24 samples/sec   Loss 0.8097   LearningRate 0.0000   Epoch: 36   Global Step: 62670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:38:52,253-Speed 9401.64 samples/sec   Loss 0.8046   LearningRate 0.0000   Epoch: 36   Global Step: 62680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:39:18,370-Speed 9410.41 samples/sec   Loss 0.8108   LearningRate 0.0000   Epoch: 36   Global Step: 62690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:39:44,557-Speed 9385.17 samples/sec   Loss 0.8078   LearningRate 0.0000   Epoch: 36   Global Step: 62700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:40:10,740-Speed 9386.75 samples/sec   Loss 0.8044   LearningRate 0.0000   Epoch: 36   Global Step: 62710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:40:36,902-Speed 9394.19 samples/sec   Loss 0.8073   LearningRate 0.0000   Epoch: 36   Global Step: 62720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:41:03,075-Speed 9390.41 samples/sec   Loss 0.8127   LearningRate 0.0000   Epoch: 36   Global Step: 62730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:41:29,224-Speed 9400.21 samples/sec   Loss 0.8032   LearningRate 0.0000   Epoch: 36   Global Step: 62740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-06 18:41:55,368-Speed 9400.61 samples/sec   Loss 0.8069   LearningRate 0.0000   Epoch: 36   Global Step: 62750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:42:21,481-Speed 9411.90 samples/sec   Loss 0.8055   LearningRate 0.0000   Epoch: 36   Global Step: 62760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:42:47,594-Speed 9411.54 samples/sec   Loss 0.8049   LearningRate 0.0000   Epoch: 36   Global Step: 62770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:43:13,778-Speed 9386.42 samples/sec   Loss 0.8057   LearningRate 0.0000   Epoch: 36   Global Step: 62780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:43:39,877-Speed 9416.75 samples/sec   Loss 0.7999   LearningRate 0.0000   Epoch: 36   Global Step: 62790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:44:06,012-Speed 9403.89 samples/sec   Loss 0.8034   LearningRate 0.0000   Epoch: 36   Global Step: 62800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:44:32,232-Speed 9373.56 samples/sec   Loss 0.8084   LearningRate 0.0000   Epoch: 36   Global Step: 62810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:44:58,294-Speed 9430.14 samples/sec   Loss 0.8019   LearningRate 0.0000   Epoch: 36   Global Step: 62820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:45:24,413-Speed 9409.86 samples/sec   Loss 0.8009   LearningRate 0.0000   Epoch: 36   Global Step: 62830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:45:50,505-Speed 9419.36 samples/sec   Loss 0.8033   LearningRate 0.0000   Epoch: 36   Global Step: 62840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:46:16,669-Speed 9393.36 samples/sec   Loss 0.8018   LearningRate 0.0000   Epoch: 36   Global Step: 62850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:46:42,763-Speed 9418.48 samples/sec   Loss 0.8000   LearningRate 0.0000   Epoch: 36   Global Step: 62860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:47:08,957-Speed 9382.71 samples/sec   Loss 0.7994   LearningRate 0.0000   Epoch: 36   Global Step: 62870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:47:35,140-Speed 9386.71 samples/sec   Loss 0.8075   LearningRate 0.0000   Epoch: 36   Global Step: 62880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:48:01,203-Speed 9430.02 samples/sec   Loss 0.8058   LearningRate 0.0000   Epoch: 36   Global Step: 62890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:48:27,312-Speed 9413.18 samples/sec   Loss 0.8098   LearningRate 0.0000   Epoch: 36   Global Step: 62900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:48:53,470-Speed 9395.72 samples/sec   Loss 0.8034   LearningRate 0.0000   Epoch: 36   Global Step: 62910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:49:19,566-Speed 9417.66 samples/sec   Loss 0.8025   LearningRate 0.0000   Epoch: 36   Global Step: 62920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:49:45,612-Speed 9436.09 samples/sec   Loss 0.8009   LearningRate 0.0000   Epoch: 36   Global Step: 62930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:50:11,768-Speed 9396.47 samples/sec   Loss 0.7979   LearningRate 0.0000   Epoch: 36   Global Step: 62940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:50:37,953-Speed 9385.52 samples/sec   Loss 0.8027   LearningRate 0.0000   Epoch: 36   Global Step: 62950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:51:04,083-Speed 9405.69 samples/sec   Loss 0.8093   LearningRate 0.0000   Epoch: 36   Global Step: 62960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:51:30,243-Speed 9394.84 samples/sec   Loss 0.8048   LearningRate 0.0000   Epoch: 36   Global Step: 62970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:51:56,398-Speed 9396.96 samples/sec   Loss 0.8029   LearningRate 0.0000   Epoch: 36   Global Step: 62980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-06 18:52:22,545-Speed 9399.51 samples/sec   Loss 0.7961   LearningRate 0.0000   Epoch: 36   Global Step: 62990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:52:48,754-Speed 9377.18 samples/sec   Loss 0.8045   LearningRate 0.0000   Epoch: 36   Global Step: 63000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-06 18:53:14,834-Speed 9423.99 samples/sec   Loss 0.8018   LearningRate 0.0000   Epoch: 36   Global Step: 63010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 18:53:41,028-Speed 9382.74 samples/sec   Loss 0.8089   LearningRate 0.0000   Epoch: 36   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 18:54:07,195-Speed 9392.11 samples/sec   Loss 0.7981   LearningRate 0.0000   Epoch: 36   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 18:54:33,384-Speed 9384.52 samples/sec   Loss 0.8056   LearningRate 0.0000   Epoch: 36   Global Step: 63040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 18:54:59,527-Speed 9401.01 samples/sec   Loss 0.8009   LearningRate 0.0000   Epoch: 36   Global Step: 63050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 18:55:25,637-Speed 9413.03 samples/sec   Loss 0.7965   LearningRate 0.0000   Epoch: 36   Global Step: 63060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 18:55:51,791-Speed 9397.11 samples/sec   Loss 0.7975   LearningRate 0.0000   Epoch: 36   Global Step: 63070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 18:56:17,999-Speed 9377.79 samples/sec   Loss 0.7994   LearningRate 0.0000   Epoch: 36   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 18:56:44,051-Speed 9433.78 samples/sec   Loss 0.8000   LearningRate 0.0000   Epoch: 36   Global Step: 63090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 18:57:10,203-Speed 9397.77 samples/sec   Loss 0.8066   LearningRate 0.0000   Epoch: 36   Global Step: 63100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 18:57:39,721-Speed 8326.16 samples/sec   Loss 0.7904   LearningRate 0.0000   Epoch: 36   Global Step: 63110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 18:58:05,831-Speed 9412.71 samples/sec   Loss 0.8024   LearningRate 0.0000   Epoch: 36   Global Step: 63120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 18:58:32,016-Speed 9385.92 samples/sec   Loss 0.7927   LearningRate 0.0000   Epoch: 36   Global Step: 63130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 18:58:58,181-Speed 9393.29 samples/sec   Loss 0.8033   LearningRate 0.0000   Epoch: 36   Global Step: 63140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 18:59:24,346-Speed 9393.09 samples/sec   Loss 0.7964   LearningRate 0.0000   Epoch: 36   Global Step: 63150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 18:59:50,563-Speed 9374.36 samples/sec   Loss 0.8020   LearningRate 0.0000   Epoch: 36   Global Step: 63160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:00:16,780-Speed 9374.50 samples/sec   Loss 0.7991   LearningRate 0.0000   Epoch: 36   Global Step: 63170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:00:42,885-Speed 9414.56 samples/sec   Loss 0.7988   LearningRate 0.0000   Epoch: 36   Global Step: 63180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:01:09,101-Speed 9374.87 samples/sec   Loss 0.7982   LearningRate 0.0000   Epoch: 36   Global Step: 63190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:01:35,238-Speed 9403.13 samples/sec   Loss 0.7975   LearningRate 0.0000   Epoch: 36   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:02:01,345-Speed 9414.12 samples/sec   Loss 0.7989   LearningRate 0.0000   Epoch: 36   Global Step: 63210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:02:27,510-Speed 9393.27 samples/sec   Loss 0.7999   LearningRate 0.0000   Epoch: 36   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:02:53,676-Speed 9392.85 samples/sec   Loss 0.7951   LearningRate 0.0000   Epoch: 36   Global Step: 63230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:03:19,866-Speed 9383.93 samples/sec   Loss 0.8005   LearningRate 0.0000   Epoch: 36   Global Step: 63240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:03:46,079-Speed 9375.86 samples/sec   Loss 0.8025   LearningRate 0.0000   Epoch: 36   Global Step: 63250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:04:12,236-Speed 9396.05 samples/sec   Loss 0.7916   LearningRate 0.0000   Epoch: 36   Global Step: 63260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:04:38,372-Speed 9403.42 samples/sec   Loss 0.8051   LearningRate 0.0000   Epoch: 36   Global Step: 63270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:05:04,520-Speed 9400.12 samples/sec   Loss 0.7951   LearningRate 0.0000   Epoch: 36   Global Step: 63280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:05:30,715-Speed 9382.26 samples/sec   Loss 0.8049   LearningRate 0.0000   Epoch: 36   Global Step: 63290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:05:56,891-Speed 9389.31 samples/sec   Loss 0.8060   LearningRate 0.0000   Epoch: 36   Global Step: 63300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:06:23,154-Speed 9357.99 samples/sec   Loss 0.8040   LearningRate 0.0000   Epoch: 36   Global Step: 63310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:06:49,298-Speed 9400.63 samples/sec   Loss 0.7984   LearningRate 0.0000   Epoch: 36   Global Step: 63320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:07:15,571-Speed 9354.35 samples/sec   Loss 0.7994   LearningRate 0.0000   Epoch: 36   Global Step: 63330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:07:41,869-Speed 9345.71 samples/sec   Loss 0.7982   LearningRate 0.0000   Epoch: 36   Global Step: 63340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:08:07,992-Speed 9407.87 samples/sec   Loss 0.8007   LearningRate 0.0000   Epoch: 36   Global Step: 63350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:08:34,121-Speed 9406.08 samples/sec   Loss 0.7931   LearningRate 0.0000   Epoch: 36   Global Step: 63360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:09:00,356-Speed 9368.23 samples/sec   Loss 0.7942   LearningRate 0.0000   Epoch: 36   Global Step: 63370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:09:26,464-Speed 9413.52 samples/sec   Loss 0.7932   LearningRate 0.0000   Epoch: 36   Global Step: 63380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:09:52,550-Speed 9421.50 samples/sec   Loss 0.7963   LearningRate 0.0000   Epoch: 36   Global Step: 63390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:10:18,714-Speed 9393.64 samples/sec   Loss 0.8026   LearningRate 0.0000   Epoch: 36   Global Step: 63400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:10:44,902-Speed 9384.54 samples/sec   Loss 0.7961   LearningRate 0.0000   Epoch: 36   Global Step: 63410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:11:11,173-Speed 9355.30 samples/sec   Loss 0.8057   LearningRate 0.0000   Epoch: 36   Global Step: 63420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:11:37,335-Speed 9394.11 samples/sec   Loss 0.7879   LearningRate 0.0000   Epoch: 36   Global Step: 63430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:12:03,461-Speed 9407.20 samples/sec   Loss 0.7944   LearningRate 0.0000   Epoch: 36   Global Step: 63440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:12:29,543-Speed 9423.08 samples/sec   Loss 0.7925   LearningRate 0.0000   Epoch: 36   Global Step: 63450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:12:55,659-Speed 9410.64 samples/sec   Loss 0.7976   LearningRate 0.0000   Epoch: 36   Global Step: 63460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:13:21,762-Speed 9415.39 samples/sec   Loss 0.7982   LearningRate 0.0000   Epoch: 36   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:13:47,890-Speed 9406.38 samples/sec   Loss 0.8038   LearningRate 0.0000   Epoch: 36   Global Step: 63480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:14:13,995-Speed 9414.54 samples/sec   Loss 0.7948   LearningRate 0.0000   Epoch: 36   Global Step: 63490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:14:40,126-Speed 9405.32 samples/sec   Loss 0.7945   LearningRate 0.0000   Epoch: 36   Global Step: 63500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:15:06,316-Speed 9384.10 samples/sec   Loss 0.7991   LearningRate 0.0000   Epoch: 36   Global Step: 63510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:15:32,489-Speed 9390.17 samples/sec   Loss 0.8040   LearningRate 0.0000   Epoch: 36   Global Step: 63520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:15:58,571-Speed 9423.06 samples/sec   Loss 0.7956   LearningRate 0.0000   Epoch: 36   Global Step: 63530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:16:24,774-Speed 9379.38 samples/sec   Loss 0.7969   LearningRate 0.0000   Epoch: 36   Global Step: 63540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:16:50,903-Speed 9405.97 samples/sec   Loss 0.7938   LearningRate 0.0000   Epoch: 36   Global Step: 63550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:17:17,021-Speed 9409.92 samples/sec   Loss 0.8007   LearningRate 0.0000   Epoch: 36   Global Step: 63560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:17:43,175-Speed 9396.92 samples/sec   Loss 0.8048   LearningRate 0.0000   Epoch: 36   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:18:09,278-Speed 9415.64 samples/sec   Loss 0.7972   LearningRate 0.0000   Epoch: 36   Global Step: 63580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:18:35,422-Speed 9400.76 samples/sec   Loss 0.7959   LearningRate 0.0000   Epoch: 36   Global Step: 63590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:19:01,571-Speed 9398.86 samples/sec   Loss 0.7954   LearningRate 0.0000   Epoch: 36   Global Step: 63600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:19:27,736-Speed 9392.92 samples/sec   Loss 0.7985   LearningRate 0.0000   Epoch: 36   Global Step: 63610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:19:53,885-Speed 9398.88 samples/sec   Loss 0.7915   LearningRate 0.0000   Epoch: 36   Global Step: 63620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:20:20,074-Speed 9384.39 samples/sec   Loss 0.7923   LearningRate 0.0000   Epoch: 36   Global Step: 63630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:20:46,219-Speed 9400.51 samples/sec   Loss 0.8016   LearningRate 0.0000   Epoch: 36   Global Step: 63640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:21:12,413-Speed 9382.66 samples/sec   Loss 0.7947   LearningRate 0.0000   Epoch: 36   Global Step: 63650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:21:38,591-Speed 9388.59 samples/sec   Loss 0.7991   LearningRate 0.0000   Epoch: 36   Global Step: 63660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:22:04,741-Speed 9398.50 samples/sec   Loss 0.7952   LearningRate 0.0000   Epoch: 36   Global Step: 63670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:22:30,876-Speed 9403.80 samples/sec   Loss 0.7982   LearningRate 0.0000   Epoch: 36   Global Step: 63680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:22:56,955-Speed 9424.39 samples/sec   Loss 0.7989   LearningRate 0.0000   Epoch: 36   Global Step: 63690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:23:23,068-Speed 9411.95 samples/sec   Loss 0.7982   LearningRate 0.0000   Epoch: 36   Global Step: 63700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:23:49,194-Speed 9407.08 samples/sec   Loss 0.7995   LearningRate 0.0000   Epoch: 36   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:24:15,343-Speed 9399.09 samples/sec   Loss 0.8023   LearningRate 0.0000   Epoch: 36   Global Step: 63720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:24:41,566-Speed 9372.09 samples/sec   Loss 0.7974   LearningRate 0.0000   Epoch: 36   Global Step: 63730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:25:07,697-Speed 9405.72 samples/sec   Loss 0.7905   LearningRate 0.0000   Epoch: 36   Global Step: 63740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:25:33,852-Speed 9396.70 samples/sec   Loss 0.7973   LearningRate 0.0000   Epoch: 36   Global Step: 63750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:26:00,003-Speed 9398.35 samples/sec   Loss 0.7989   LearningRate 0.0000   Epoch: 36   Global Step: 63760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:26:26,120-Speed 9410.39 samples/sec   Loss 0.7970   LearningRate 0.0000   Epoch: 36   Global Step: 63770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:26:52,301-Speed 9387.31 samples/sec   Loss 0.7932   LearningRate 0.0000   Epoch: 36   Global Step: 63780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:27:18,509-Speed 9377.46 samples/sec   Loss 0.8012   LearningRate 0.0000   Epoch: 36   Global Step: 63790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-03-06 19:27:44,613-Speed 9415.40 samples/sec   Loss 0.7991   LearningRate 0.0000   Epoch: 36   Global Step: 63800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:28:10,734-Speed 9409.01 samples/sec   Loss 0.7940   LearningRate 0.0000   Epoch: 36   Global Step: 63810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:28:36,878-Speed 9400.89 samples/sec   Loss 0.7934   LearningRate 0.0000   Epoch: 36   Global Step: 63820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:29:03,125-Speed 9363.69 samples/sec   Loss 0.8021   LearningRate 0.0000   Epoch: 36   Global Step: 63830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:29:29,358-Speed 9368.96 samples/sec   Loss 0.7962   LearningRate 0.0000   Epoch: 36   Global Step: 63840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:29:55,504-Speed 9400.08 samples/sec   Loss 0.7937   LearningRate 0.0000   Epoch: 36   Global Step: 63850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:30:21,771-Speed 9356.21 samples/sec   Loss 0.7935   LearningRate 0.0000   Epoch: 36   Global Step: 63860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:30:47,934-Speed 9393.75 samples/sec   Loss 0.7965   LearningRate 0.0000   Epoch: 36   Global Step: 63870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:31:14,226-Speed 9347.80 samples/sec   Loss 0.8011   LearningRate 0.0000   Epoch: 36   Global Step: 63880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:31:40,437-Speed 9376.62 samples/sec   Loss 0.7932   LearningRate 0.0000   Epoch: 36   Global Step: 63890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:32:06,606-Speed 9391.88 samples/sec   Loss 0.7929   LearningRate 0.0000   Epoch: 36   Global Step: 63900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:32:32,784-Speed 9388.57 samples/sec   Loss 0.7914   LearningRate 0.0000   Epoch: 36   Global Step: 63910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:32:58,931-Speed 9399.60 samples/sec   Loss 0.7986   LearningRate 0.0000   Epoch: 36   Global Step: 63920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:33:25,144-Speed 9375.93 samples/sec   Loss 0.7949   LearningRate 0.0000   Epoch: 36   Global Step: 63930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:33:51,308-Speed 9393.72 samples/sec   Loss 0.7966   LearningRate 0.0000   Epoch: 36   Global Step: 63940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:35:10,834-Speed 3090.33 samples/sec   Loss 0.7901   LearningRate 0.0000   Epoch: 37   Global Step: 63950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:35:36,730-Speed 9490.68 samples/sec   Loss 0.7897   LearningRate 0.0000   Epoch: 37   Global Step: 63960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:36:02,716-Speed 9458.27 samples/sec   Loss 0.7895   LearningRate 0.0000   Epoch: 37   Global Step: 63970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:36:28,797-Speed 9423.20 samples/sec   Loss 0.7893   LearningRate 0.0000   Epoch: 37   Global Step: 63980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:36:54,819-Speed 9444.75 samples/sec   Loss 0.7865   LearningRate 0.0000   Epoch: 37   Global Step: 63990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:37:20,859-Speed 9438.21 samples/sec   Loss 0.7958   LearningRate 0.0000   Epoch: 37   Global Step: 64000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:37:46,791-Speed 9477.30 samples/sec   Loss 0.7939   LearningRate 0.0000   Epoch: 37   Global Step: 64010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:38:12,734-Speed 9473.32 samples/sec   Loss 0.7845   LearningRate 0.0000   Epoch: 37   Global Step: 64020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:38:38,703-Speed 9464.05 samples/sec   Loss 0.7928   LearningRate 0.0000   Epoch: 37   Global Step: 64030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:39:04,662-Speed 9467.77 samples/sec   Loss 0.7943   LearningRate 0.0000   Epoch: 37   Global Step: 64040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:39:30,674-Speed 9448.09 samples/sec   Loss 0.7884   LearningRate 0.0000   Epoch: 37   Global Step: 64050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:39:56,664-Speed 9456.85 samples/sec   Loss 0.7969   LearningRate 0.0000   Epoch: 37   Global Step: 64060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:40:22,616-Speed 9470.27 samples/sec   Loss 0.7948   LearningRate 0.0000   Epoch: 37   Global Step: 64070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:40:48,623-Speed 9450.19 samples/sec   Loss 0.7851   LearningRate 0.0000   Epoch: 37   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:41:14,661-Speed 9438.66 samples/sec   Loss 0.7899   LearningRate 0.0000   Epoch: 37   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:41:40,689-Speed 9442.61 samples/sec   Loss 0.7970   LearningRate 0.0000   Epoch: 37   Global Step: 64100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:42:06,700-Speed 9448.69 samples/sec   Loss 0.7899   LearningRate 0.0000   Epoch: 37   Global Step: 64110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:42:32,751-Speed 9434.26 samples/sec   Loss 0.7911   LearningRate 0.0000   Epoch: 37   Global Step: 64120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:42:58,803-Speed 9433.92 samples/sec   Loss 0.7933   LearningRate 0.0000   Epoch: 37   Global Step: 64130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:43:24,929-Speed 9407.31 samples/sec   Loss 0.7913   LearningRate 0.0000   Epoch: 37   Global Step: 64140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:43:50,895-Speed 9464.87 samples/sec   Loss 0.7914   LearningRate 0.0000   Epoch: 37   Global Step: 64150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:44:17,211-Speed 9339.32 samples/sec   Loss 0.7927   LearningRate 0.0000   Epoch: 37   Global Step: 64160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:44:43,283-Speed 9426.85 samples/sec   Loss 0.7900   LearningRate 0.0000   Epoch: 37   Global Step: 64170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:45:09,446-Speed 9393.66 samples/sec   Loss 0.7878   LearningRate 0.0000   Epoch: 37   Global Step: 64180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:45:35,525-Speed 9424.10 samples/sec   Loss 0.7949   LearningRate 0.0000   Epoch: 37   Global Step: 64190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:46:01,686-Speed 9394.63 samples/sec   Loss 0.7945   LearningRate 0.0000   Epoch: 37   Global Step: 64200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:46:27,758-Speed 9426.73 samples/sec   Loss 0.7950   LearningRate 0.0000   Epoch: 37   Global Step: 64210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:46:53,896-Speed 9402.95 samples/sec   Loss 0.7946   LearningRate 0.0000   Epoch: 37   Global Step: 64220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:47:20,127-Speed 9369.32 samples/sec   Loss 0.7957   LearningRate 0.0000   Epoch: 37   Global Step: 64230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:47:46,293-Speed 9392.86 samples/sec   Loss 0.7893   LearningRate 0.0000   Epoch: 37   Global Step: 64240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-06 19:48:12,463-Speed 9391.18 samples/sec   Loss 0.7866   LearningRate 0.0000   Epoch: 37   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:48:38,612-Speed 9398.85 samples/sec   Loss 0.7951   LearningRate 0.0000   Epoch: 37   Global Step: 64260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:49:04,746-Speed 9404.28 samples/sec   Loss 0.8014   LearningRate 0.0000   Epoch: 37   Global Step: 64270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:49:30,992-Speed 9364.16 samples/sec   Loss 0.7986   LearningRate 0.0000   Epoch: 37   Global Step: 64280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:49:57,155-Speed 9393.92 samples/sec   Loss 0.7963   LearningRate 0.0000   Epoch: 37   Global Step: 64290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:50:23,434-Speed 9352.43 samples/sec   Loss 0.7933   LearningRate 0.0000   Epoch: 37   Global Step: 64300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:50:49,655-Speed 9373.53 samples/sec   Loss 0.8001   LearningRate 0.0000   Epoch: 37   Global Step: 64310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:51:15,882-Speed 9370.88 samples/sec   Loss 0.7945   LearningRate 0.0000   Epoch: 37   Global Step: 64320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:51:42,062-Speed 9387.32 samples/sec   Loss 0.7925   LearningRate 0.0000   Epoch: 37   Global Step: 64330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:52:08,268-Speed 9378.82 samples/sec   Loss 0.7930   LearningRate 0.0000   Epoch: 37   Global Step: 64340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:52:34,502-Speed 9368.26 samples/sec   Loss 0.7943   LearningRate 0.0000   Epoch: 37   Global Step: 64350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:53:00,698-Speed 9382.43 samples/sec   Loss 0.8005   LearningRate 0.0000   Epoch: 37   Global Step: 64360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-06 19:53:26,834-Speed 9403.63 samples/sec   Loss 0.7954   LearningRate 0.0000   Epoch: 37   Global Step: 64370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:53:53,074-Speed 9366.04 samples/sec   Loss 0.7937   LearningRate 0.0000   Epoch: 37   Global Step: 64380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:54:19,271-Speed 9381.67 samples/sec   Loss 0.7953   LearningRate 0.0000   Epoch: 37   Global Step: 64390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:54:45,503-Speed 9369.43 samples/sec   Loss 0.7846   LearningRate 0.0000   Epoch: 37   Global Step: 64400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:55:11,658-Speed 9396.59 samples/sec   Loss 0.7876   LearningRate 0.0000   Epoch: 37   Global Step: 64410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:55:37,859-Speed 9380.02 samples/sec   Loss 0.7907   LearningRate 0.0000   Epoch: 37   Global Step: 64420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:56:04,015-Speed 9396.37 samples/sec   Loss 0.7914   LearningRate 0.0000   Epoch: 37   Global Step: 64430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:56:30,227-Speed 9376.39 samples/sec   Loss 0.7934   LearningRate 0.0000   Epoch: 37   Global Step: 64440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:56:56,436-Speed 9377.45 samples/sec   Loss 0.7961   LearningRate 0.0000   Epoch: 37   Global Step: 64450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-03-06 19:57:22,581-Speed 9400.09 samples/sec   Loss 0.7901   LearningRate 0.0000   Epoch: 37   Global Step: 64460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 19:57:48,765-Speed 9386.28 samples/sec   Loss 0.7942   LearningRate 0.0000   Epoch: 37   Global Step: 64470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 19:58:14,915-Speed 9398.53 samples/sec   Loss 0.7909   LearningRate 0.0000   Epoch: 37   Global Step: 64480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 19:58:41,064-Speed 9399.14 samples/sec   Loss 0.7945   LearningRate 0.0000   Epoch: 37   Global Step: 64490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 19:59:07,253-Speed 9384.45 samples/sec   Loss 0.7811   LearningRate 0.0000   Epoch: 37   Global Step: 64500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 19:59:36,100-Speed 8519.62 samples/sec   Loss 0.7937   LearningRate 0.0000   Epoch: 37   Global Step: 64510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:00:02,249-Speed 9398.99 samples/sec   Loss 0.7873   LearningRate 0.0000   Epoch: 37   Global Step: 64520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:00:28,404-Speed 9396.87 samples/sec   Loss 0.7917   LearningRate 0.0000   Epoch: 37   Global Step: 64530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:00:54,552-Speed 9399.32 samples/sec   Loss 0.7849   LearningRate 0.0000   Epoch: 37   Global Step: 64540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:01:20,690-Speed 9402.88 samples/sec   Loss 0.7931   LearningRate 0.0000   Epoch: 37   Global Step: 64550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:01:46,918-Speed 9370.37 samples/sec   Loss 0.7950   LearningRate 0.0000   Epoch: 37   Global Step: 64560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:02:13,069-Speed 9398.42 samples/sec   Loss 0.7913   LearningRate 0.0000   Epoch: 37   Global Step: 64570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:02:39,263-Speed 9382.53 samples/sec   Loss 0.7883   LearningRate 0.0000   Epoch: 37   Global Step: 64580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:03:05,404-Speed 9401.45 samples/sec   Loss 0.7891   LearningRate 0.0000   Epoch: 37   Global Step: 64590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:03:31,632-Speed 9370.87 samples/sec   Loss 0.7861   LearningRate 0.0000   Epoch: 37   Global Step: 64600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:03:57,870-Speed 9366.99 samples/sec   Loss 0.7923   LearningRate 0.0000   Epoch: 37   Global Step: 64610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:04:24,071-Speed 9380.22 samples/sec   Loss 0.7899   LearningRate 0.0000   Epoch: 37   Global Step: 64620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:04:50,280-Speed 9377.10 samples/sec   Loss 0.7930   LearningRate 0.0000   Epoch: 37   Global Step: 64630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:05:16,412-Speed 9404.99 samples/sec   Loss 0.7903   LearningRate 0.0000   Epoch: 37   Global Step: 64640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:05:42,569-Speed 9396.03 samples/sec   Loss 0.7961   LearningRate 0.0000   Epoch: 37   Global Step: 64650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:06:08,759-Speed 9385.25 samples/sec   Loss 0.7878   LearningRate 0.0000   Epoch: 37   Global Step: 64660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:06:34,955-Speed 9381.99 samples/sec   Loss 0.7854   LearningRate 0.0000   Epoch: 37   Global Step: 64670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:07:01,128-Speed 9389.99 samples/sec   Loss 0.7877   LearningRate 0.0000   Epoch: 37   Global Step: 64680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:07:27,298-Speed 9391.53 samples/sec   Loss 0.7884   LearningRate 0.0000   Epoch: 37   Global Step: 64690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:07:53,470-Speed 9390.75 samples/sec   Loss 0.7907   LearningRate 0.0000   Epoch: 37   Global Step: 64700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:08:19,727-Speed 9360.20 samples/sec   Loss 0.7948   LearningRate 0.0000   Epoch: 37   Global Step: 64710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:08:45,881-Speed 9397.08 samples/sec   Loss 0.7951   LearningRate 0.0000   Epoch: 37   Global Step: 64720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:09:12,087-Speed 9378.39 samples/sec   Loss 0.7928   LearningRate 0.0000   Epoch: 37   Global Step: 64730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:09:38,297-Speed 9376.98 samples/sec   Loss 0.7939   LearningRate 0.0000   Epoch: 37   Global Step: 64740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:10:04,435-Speed 9403.54 samples/sec   Loss 0.7910   LearningRate 0.0000   Epoch: 37   Global Step: 64750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:10:30,616-Speed 9387.43 samples/sec   Loss 0.7938   LearningRate 0.0000   Epoch: 37   Global Step: 64760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:10:56,756-Speed 9402.00 samples/sec   Loss 0.7918   LearningRate 0.0000   Epoch: 37   Global Step: 64770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:11:23,004-Speed 9363.40 samples/sec   Loss 0.7850   LearningRate 0.0000   Epoch: 37   Global Step: 64780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:11:49,175-Speed 9391.01 samples/sec   Loss 0.7874   LearningRate 0.0000   Epoch: 37   Global Step: 64790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:12:15,414-Speed 9366.49 samples/sec   Loss 0.7834   LearningRate 0.0000   Epoch: 37   Global Step: 64800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:12:41,586-Speed 9390.38 samples/sec   Loss 0.7832   LearningRate 0.0000   Epoch: 37   Global Step: 64810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-06 20:13:07,808-Speed 9372.67 samples/sec   Loss 0.7918   LearningRate 0.0000   Epoch: 37   Global Step: 64820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:13:34,062-Speed 9361.24 samples/sec   Loss 0.7857   LearningRate 0.0000   Epoch: 37   Global Step: 64830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:14:00,327-Speed 9357.40 samples/sec   Loss 0.7884   LearningRate 0.0000   Epoch: 37   Global Step: 64840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:14:26,536-Speed 9377.57 samples/sec   Loss 0.7859   LearningRate 0.0000   Epoch: 37   Global Step: 64850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:14:52,750-Speed 9375.67 samples/sec   Loss 0.7930   LearningRate 0.0000   Epoch: 37   Global Step: 64860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:15:18,919-Speed 9391.63 samples/sec   Loss 0.7882   LearningRate 0.0000   Epoch: 37   Global Step: 64870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:15:45,147-Speed 9371.09 samples/sec   Loss 0.7845   LearningRate 0.0000   Epoch: 37   Global Step: 64880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:16:11,342-Speed 9382.41 samples/sec   Loss 0.7920   LearningRate 0.0000   Epoch: 37   Global Step: 64890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:16:37,481-Speed 9402.33 samples/sec   Loss 0.7860   LearningRate 0.0000   Epoch: 37   Global Step: 64900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:17:03,682-Speed 9380.15 samples/sec   Loss 0.7871   LearningRate 0.0000   Epoch: 37   Global Step: 64910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:17:29,777-Speed 9418.28 samples/sec   Loss 0.7871   LearningRate 0.0000   Epoch: 37   Global Step: 64920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:17:55,937-Speed 9394.97 samples/sec   Loss 0.7862   LearningRate 0.0000   Epoch: 37   Global Step: 64930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:18:22,062-Speed 9407.46 samples/sec   Loss 0.7888   LearningRate 0.0000   Epoch: 37   Global Step: 64940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:18:48,259-Speed 9381.69 samples/sec   Loss 0.7868   LearningRate 0.0000   Epoch: 37   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:19:14,304-Speed 9436.47 samples/sec   Loss 0.7899   LearningRate 0.0000   Epoch: 37   Global Step: 64960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:19:40,386-Speed 9422.91 samples/sec   Loss 0.7842   LearningRate 0.0000   Epoch: 37   Global Step: 64970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:20:06,555-Speed 9391.36 samples/sec   Loss 0.7862   LearningRate 0.0000   Epoch: 37   Global Step: 64980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:20:32,745-Speed 9384.38 samples/sec   Loss 0.7869   LearningRate 0.0000   Epoch: 37   Global Step: 64990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:20:58,919-Speed 9389.69 samples/sec   Loss 0.7846   LearningRate 0.0000   Epoch: 37   Global Step: 65000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:21:25,024-Speed 9414.54 samples/sec   Loss 0.7942   LearningRate 0.0000   Epoch: 37   Global Step: 65010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:21:51,142-Speed 9410.03 samples/sec   Loss 0.7876   LearningRate 0.0000   Epoch: 37   Global Step: 65020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:22:17,324-Speed 9387.23 samples/sec   Loss 0.7932   LearningRate 0.0000   Epoch: 37   Global Step: 65030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:22:43,435-Speed 9412.45 samples/sec   Loss 0.7919   LearningRate 0.0000   Epoch: 37   Global Step: 65040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:23:09,526-Speed 9419.84 samples/sec   Loss 0.7891   LearningRate 0.0000   Epoch: 37   Global Step: 65050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:23:35,654-Speed 9406.06 samples/sec   Loss 0.7865   LearningRate 0.0000   Epoch: 37   Global Step: 65060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:24:01,878-Speed 9372.03 samples/sec   Loss 0.7886   LearningRate 0.0000   Epoch: 37   Global Step: 65070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:24:28,023-Speed 9400.34 samples/sec   Loss 0.7865   LearningRate 0.0000   Epoch: 37   Global Step: 65080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:24:54,132-Speed 9413.20 samples/sec   Loss 0.7887   LearningRate 0.0000   Epoch: 37   Global Step: 65090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:25:20,335-Speed 9379.86 samples/sec   Loss 0.7822   LearningRate 0.0000   Epoch: 37   Global Step: 65100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:25:46,438-Speed 9415.54 samples/sec   Loss 0.7830   LearningRate 0.0000   Epoch: 37   Global Step: 65110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:26:12,542-Speed 9415.13 samples/sec   Loss 0.7880   LearningRate 0.0000   Epoch: 37   Global Step: 65120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:26:38,782-Speed 9366.75 samples/sec   Loss 0.7840   LearningRate 0.0000   Epoch: 37   Global Step: 65130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:27:05,027-Speed 9364.45 samples/sec   Loss 0.7866   LearningRate 0.0000   Epoch: 37   Global Step: 65140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:27:31,136-Speed 9413.44 samples/sec   Loss 0.7837   LearningRate 0.0000   Epoch: 37   Global Step: 65150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:27:57,285-Speed 9398.83 samples/sec   Loss 0.7908   LearningRate 0.0000   Epoch: 37   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:28:23,432-Speed 9400.04 samples/sec   Loss 0.7859   LearningRate 0.0000   Epoch: 37   Global Step: 65170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-03-06 20:28:49,530-Speed 9417.08 samples/sec   Loss 0.7737   LearningRate 0.0000   Epoch: 37   Global Step: 65180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:29:15,705-Speed 9389.36 samples/sec   Loss 0.7897   LearningRate 0.0000   Epoch: 37   Global Step: 65190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:29:41,921-Speed 9374.95 samples/sec   Loss 0.7862   LearningRate 0.0000   Epoch: 37   Global Step: 65200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:30:08,098-Speed 9388.45 samples/sec   Loss 0.7951   LearningRate 0.0000   Epoch: 37   Global Step: 65210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:30:34,222-Speed 9409.01 samples/sec   Loss 0.7845   LearningRate 0.0000   Epoch: 37   Global Step: 65220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:31:00,368-Speed 9399.78 samples/sec   Loss 0.7864   LearningRate 0.0000   Epoch: 37   Global Step: 65230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:31:26,475-Speed 9413.95 samples/sec   Loss 0.7855   LearningRate 0.0000   Epoch: 37   Global Step: 65240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:31:52,616-Speed 9401.61 samples/sec   Loss 0.7827   LearningRate 0.0000   Epoch: 37   Global Step: 65250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:32:18,850-Speed 9368.52 samples/sec   Loss 0.7797   LearningRate 0.0000   Epoch: 37   Global Step: 65260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:32:45,051-Speed 9380.36 samples/sec   Loss 0.7882   LearningRate 0.0000   Epoch: 37   Global Step: 65270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:33:11,221-Speed 9391.11 samples/sec   Loss 0.7885   LearningRate 0.0000   Epoch: 37   Global Step: 65280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:33:37,437-Speed 9374.92 samples/sec   Loss 0.7834   LearningRate 0.0000   Epoch: 37   Global Step: 65290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:34:03,690-Speed 9361.77 samples/sec   Loss 0.7892   LearningRate 0.0000   Epoch: 37   Global Step: 65300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:34:29,830-Speed 9401.99 samples/sec   Loss 0.7952   LearningRate 0.0000   Epoch: 37   Global Step: 65310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:34:56,036-Speed 9378.42 samples/sec   Loss 0.7786   LearningRate 0.0000   Epoch: 37   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:35:22,150-Speed 9411.26 samples/sec   Loss 0.7866   LearningRate 0.0000   Epoch: 37   Global Step: 65330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:35:48,301-Speed 9398.30 samples/sec   Loss 0.7826   LearningRate 0.0000   Epoch: 37   Global Step: 65340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:36:14,440-Speed 9402.21 samples/sec   Loss 0.7865   LearningRate 0.0000   Epoch: 37   Global Step: 65350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:36:40,679-Speed 9366.87 samples/sec   Loss 0.7823   LearningRate 0.0000   Epoch: 37   Global Step: 65360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:37:06,835-Speed 9396.14 samples/sec   Loss 0.7805   LearningRate 0.0000   Epoch: 37   Global Step: 65370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:37:32,952-Speed 9410.08 samples/sec   Loss 0.7893   LearningRate 0.0000   Epoch: 37   Global Step: 65380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:37:59,199-Speed 9363.91 samples/sec   Loss 0.7856   LearningRate 0.0000   Epoch: 37   Global Step: 65390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:38:25,440-Speed 9365.84 samples/sec   Loss 0.7832   LearningRate 0.0000   Epoch: 37   Global Step: 65400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:38:55,935-Speed 8059.41 samples/sec   Loss 0.7857   LearningRate 0.0000   Epoch: 37   Global Step: 65410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:39:22,067-Speed 9405.78 samples/sec   Loss 0.7890   LearningRate 0.0000   Epoch: 37   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:39:48,235-Speed 9391.63 samples/sec   Loss 0.7810   LearningRate 0.0000   Epoch: 37   Global Step: 65430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:40:14,419-Speed 9386.47 samples/sec   Loss 0.7862   LearningRate 0.0000   Epoch: 37   Global Step: 65440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:40:40,633-Speed 9375.49 samples/sec   Loss 0.7879   LearningRate 0.0000   Epoch: 37   Global Step: 65450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:41:06,786-Speed 9397.55 samples/sec   Loss 0.7844   LearningRate 0.0000   Epoch: 37   Global Step: 65460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:41:32,895-Speed 9413.30 samples/sec   Loss 0.7964   LearningRate 0.0000   Epoch: 37   Global Step: 65470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:41:59,103-Speed 9377.66 samples/sec   Loss 0.7942   LearningRate 0.0000   Epoch: 37   Global Step: 65480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:42:25,329-Speed 9371.17 samples/sec   Loss 0.7852   LearningRate 0.0000   Epoch: 37   Global Step: 65490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:42:51,407-Speed 9424.54 samples/sec   Loss 0.7809   LearningRate 0.0000   Epoch: 37   Global Step: 65500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:43:17,442-Speed 9440.28 samples/sec   Loss 0.7938   LearningRate 0.0000   Epoch: 37   Global Step: 65510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:43:43,462-Speed 9445.44 samples/sec   Loss 0.7881   LearningRate 0.0000   Epoch: 37   Global Step: 65520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:44:09,535-Speed 9425.93 samples/sec   Loss 0.7821   LearningRate 0.0000   Epoch: 37   Global Step: 65530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:44:35,710-Speed 9389.67 samples/sec   Loss 0.7845   LearningRate 0.0000   Epoch: 37   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:45:01,837-Speed 9406.73 samples/sec   Loss 0.7877   LearningRate 0.0000   Epoch: 37   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:45:27,933-Speed 9417.93 samples/sec   Loss 0.7854   LearningRate 0.0000   Epoch: 37   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:45:53,979-Speed 9435.94 samples/sec   Loss 0.7850   LearningRate 0.0000   Epoch: 37   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:46:20,135-Speed 9396.56 samples/sec   Loss 0.7860   LearningRate 0.0000   Epoch: 37   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:46:46,171-Speed 9439.41 samples/sec   Loss 0.7775   LearningRate 0.0000   Epoch: 37   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:47:16,932-Speed 7989.67 samples/sec   Loss 0.7869   LearningRate 0.0000   Epoch: 37   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:47:43,063-Speed 9405.44 samples/sec   Loss 0.7890   LearningRate 0.0000   Epoch: 37   Global Step: 65610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:48:09,163-Speed 9416.29 samples/sec   Loss 0.7840   LearningRate 0.0000   Epoch: 37   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:48:35,186-Speed 9444.53 samples/sec   Loss 0.7877   LearningRate 0.0000   Epoch: 37   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:49:01,282-Speed 9417.96 samples/sec   Loss 0.7786   LearningRate 0.0000   Epoch: 37   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:49:27,365-Speed 9422.73 samples/sec   Loss 0.7885   LearningRate 0.0000   Epoch: 37   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:49:53,439-Speed 9426.11 samples/sec   Loss 0.7896   LearningRate 0.0000   Epoch: 37   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:50:19,571-Speed 9404.75 samples/sec   Loss 0.7853   LearningRate 0.0000   Epoch: 37   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:51:39,482-Speed 3075.46 samples/sec   Loss 0.7889   LearningRate 0.0000   Epoch: 38   Global Step: 65680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:52:05,431-Speed 9471.24 samples/sec   Loss 0.7885   LearningRate 0.0000   Epoch: 38   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:52:31,448-Speed 9446.73 samples/sec   Loss 0.7865   LearningRate 0.0000   Epoch: 38   Global Step: 65700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:52:57,590-Speed 9401.58 samples/sec   Loss 0.7827   LearningRate 0.0000   Epoch: 38   Global Step: 65710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-06 20:53:23,719-Speed 9406.13 samples/sec   Loss 0.7839   LearningRate 0.0000   Epoch: 38   Global Step: 65720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-06 20:53:49,860-Speed 9401.72 samples/sec   Loss 0.7798   LearningRate 0.0000   Epoch: 38   Global Step: 65730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:54:15,961-Speed 9416.41 samples/sec   Loss 0.7855   LearningRate 0.0000   Epoch: 38   Global Step: 65740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:54:41,996-Speed 9439.90 samples/sec   Loss 0.7808   LearningRate 0.0000   Epoch: 38   Global Step: 65750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:55:08,110-Speed 9411.57 samples/sec   Loss 0.7840   LearningRate 0.0000   Epoch: 38   Global Step: 65760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:55:34,202-Speed 9419.20 samples/sec   Loss 0.7834   LearningRate 0.0000   Epoch: 38   Global Step: 65770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:56:00,311-Speed 9413.41 samples/sec   Loss 0.7825   LearningRate 0.0000   Epoch: 38   Global Step: 65780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:56:26,408-Speed 9417.47 samples/sec   Loss 0.7851   LearningRate 0.0000   Epoch: 38   Global Step: 65790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:56:52,548-Speed 9402.28 samples/sec   Loss 0.7783   LearningRate 0.0000   Epoch: 38   Global Step: 65800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:57:18,655-Speed 9413.94 samples/sec   Loss 0.7790   LearningRate 0.0000   Epoch: 38   Global Step: 65810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:57:44,801-Speed 9400.04 samples/sec   Loss 0.7845   LearningRate 0.0000   Epoch: 38   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 20:58:10,863-Speed 9430.09 samples/sec   Loss 0.7816   LearningRate 0.0000   Epoch: 38   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 20:58:36,968-Speed 9414.74 samples/sec   Loss 0.7824   LearningRate 0.0000   Epoch: 38   Global Step: 65840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 20:59:03,132-Speed 9393.19 samples/sec   Loss 0.7853   LearningRate 0.0000   Epoch: 38   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 20:59:29,191-Speed 9432.48 samples/sec   Loss 0.7845   LearningRate 0.0000   Epoch: 38   Global Step: 65860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 20:59:55,244-Speed 9433.59 samples/sec   Loss 0.7760   LearningRate 0.0000   Epoch: 38   Global Step: 65870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:00:21,348-Speed 9414.86 samples/sec   Loss 0.7870   LearningRate 0.0000   Epoch: 38   Global Step: 65880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:00:47,502-Speed 9397.24 samples/sec   Loss 0.7784   LearningRate 0.0000   Epoch: 38   Global Step: 65890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:01:13,602-Speed 9416.54 samples/sec   Loss 0.7849   LearningRate 0.0000   Epoch: 38   Global Step: 65900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:01:39,746-Speed 9400.61 samples/sec   Loss 0.7849   LearningRate 0.0000   Epoch: 38   Global Step: 65910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:02:05,978-Speed 9369.07 samples/sec   Loss 0.7831   LearningRate 0.0000   Epoch: 38   Global Step: 65920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:02:32,087-Speed 9413.31 samples/sec   Loss 0.7832   LearningRate 0.0000   Epoch: 38   Global Step: 65930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:02:58,235-Speed 9399.02 samples/sec   Loss 0.7830   LearningRate 0.0000   Epoch: 38   Global Step: 65940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:03:24,418-Speed 9386.55 samples/sec   Loss 0.7907   LearningRate 0.0000   Epoch: 38   Global Step: 65950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:03:50,494-Speed 9426.27 samples/sec   Loss 0.7768   LearningRate 0.0000   Epoch: 38   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:04:16,681-Speed 9386.27 samples/sec   Loss 0.7791   LearningRate 0.0000   Epoch: 38   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:04:42,901-Speed 9373.11 samples/sec   Loss 0.7836   LearningRate 0.0000   Epoch: 38   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:05:09,001-Speed 9416.39 samples/sec   Loss 0.7829   LearningRate 0.0000   Epoch: 38   Global Step: 65990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:05:35,104-Speed 9415.23 samples/sec   Loss 0.7852   LearningRate 0.0000   Epoch: 38   Global Step: 66000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:06:01,237-Speed 9405.87 samples/sec   Loss 0.7872   LearningRate 0.0000   Epoch: 38   Global Step: 66010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:06:27,440-Speed 9379.51 samples/sec   Loss 0.7762   LearningRate 0.0000   Epoch: 38   Global Step: 66020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:06:53,569-Speed 9406.22 samples/sec   Loss 0.7913   LearningRate 0.0000   Epoch: 38   Global Step: 66030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:07:19,700-Speed 9405.11 samples/sec   Loss 0.7887   LearningRate 0.0000   Epoch: 38   Global Step: 66040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:07:45,832-Speed 9405.11 samples/sec   Loss 0.7865   LearningRate 0.0000   Epoch: 38   Global Step: 66050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:08:11,927-Speed 9418.49 samples/sec   Loss 0.7851   LearningRate 0.0000   Epoch: 38   Global Step: 66060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-03-06 21:08:38,023-Speed 9417.72 samples/sec   Loss 0.7838   LearningRate 0.0000   Epoch: 38   Global Step: 66070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:09:04,099-Speed 9425.28 samples/sec   Loss 0.7866   LearningRate 0.0000   Epoch: 38   Global Step: 66080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:09:30,237-Speed 9402.56 samples/sec   Loss 0.7825   LearningRate 0.0000   Epoch: 38   Global Step: 66090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:09:56,310-Speed 9426.55 samples/sec   Loss 0.7845   LearningRate 0.0000   Epoch: 38   Global Step: 66100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:10:22,488-Speed 9388.29 samples/sec   Loss 0.7869   LearningRate 0.0000   Epoch: 38   Global Step: 66110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:10:48,619-Speed 9405.28 samples/sec   Loss 0.7876   LearningRate 0.0000   Epoch: 38   Global Step: 66120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:11:14,816-Speed 9381.79 samples/sec   Loss 0.7894   LearningRate 0.0000   Epoch: 38   Global Step: 66130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:11:40,934-Speed 9409.85 samples/sec   Loss 0.7866   LearningRate 0.0000   Epoch: 38   Global Step: 66140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:12:07,172-Speed 9367.03 samples/sec   Loss 0.7811   LearningRate 0.0000   Epoch: 38   Global Step: 66150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:12:33,330-Speed 9395.62 samples/sec   Loss 0.7864   LearningRate 0.0000   Epoch: 38   Global Step: 66160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:12:59,487-Speed 9395.80 samples/sec   Loss 0.7838   LearningRate 0.0000   Epoch: 38   Global Step: 66170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-03-06 21:13:25,572-Speed 9421.94 samples/sec   Loss 0.7890   LearningRate 0.0000   Epoch: 38   Global Step: 66180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:13:51,715-Speed 9401.15 samples/sec   Loss 0.7822   LearningRate 0.0000   Epoch: 38   Global Step: 66190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:14:17,879-Speed 9393.66 samples/sec   Loss 0.7825   LearningRate 0.0000   Epoch: 38   Global Step: 66200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:14:43,977-Speed 9416.94 samples/sec   Loss 0.7815   LearningRate 0.0000   Epoch: 38   Global Step: 66210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:15:10,148-Speed 9390.95 samples/sec   Loss 0.7775   LearningRate 0.0000   Epoch: 38   Global Step: 66220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:15:36,217-Speed 9427.56 samples/sec   Loss 0.7803   LearningRate 0.0000   Epoch: 38   Global Step: 66230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:16:02,297-Speed 9423.81 samples/sec   Loss 0.7837   LearningRate 0.0000   Epoch: 38   Global Step: 66240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:16:28,526-Speed 9370.35 samples/sec   Loss 0.7893   LearningRate 0.0000   Epoch: 38   Global Step: 66250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:16:54,570-Speed 9436.42 samples/sec   Loss 0.7781   LearningRate 0.0000   Epoch: 38   Global Step: 66260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:17:20,660-Speed 9420.44 samples/sec   Loss 0.7821   LearningRate 0.0000   Epoch: 38   Global Step: 66270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:17:46,764-Speed 9414.93 samples/sec   Loss 0.7848   LearningRate 0.0000   Epoch: 38   Global Step: 66280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-03-06 21:18:12,826-Speed 9430.34 samples/sec   Loss 0.7869   LearningRate 0.0000   Epoch: 38   Global Step: 66290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:18:38,956-Speed 9405.59 samples/sec   Loss 0.7790   LearningRate 0.0000   Epoch: 38   Global Step: 66300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:19:05,090-Speed 9404.25 samples/sec   Loss 0.7831   LearningRate 0.0000   Epoch: 38   Global Step: 66310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:19:31,172-Speed 9423.17 samples/sec   Loss 0.7807   LearningRate 0.0000   Epoch: 38   Global Step: 66320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:19:57,351-Speed 9387.88 samples/sec   Loss 0.7860   LearningRate 0.0000   Epoch: 38   Global Step: 66330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:20:23,512-Speed 9394.46 samples/sec   Loss 0.7803   LearningRate 0.0000   Epoch: 38   Global Step: 66340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:20:49,697-Speed 9386.22 samples/sec   Loss 0.7837   LearningRate 0.0000   Epoch: 38   Global Step: 66350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:21:15,820-Speed 9408.16 samples/sec   Loss 0.7815   LearningRate 0.0000   Epoch: 38   Global Step: 66360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:21:41,943-Speed 9408.43 samples/sec   Loss 0.7784   LearningRate 0.0000   Epoch: 38   Global Step: 66370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:22:08,155-Speed 9376.38 samples/sec   Loss 0.7897   LearningRate 0.0000   Epoch: 38   Global Step: 66380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:22:34,280-Speed 9407.36 samples/sec   Loss 0.7881   LearningRate 0.0000   Epoch: 38   Global Step: 66390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-03-06 21:23:00,449-Speed 9391.75 samples/sec   Loss 0.7779   LearningRate 0.0000   Epoch: 38   Global Step: 66400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:23:26,599-Speed 9398.52 samples/sec   Loss 0.7707   LearningRate 0.0000   Epoch: 38   Global Step: 66410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:23:52,750-Speed 9398.26 samples/sec   Loss 0.7883   LearningRate 0.0000   Epoch: 38   Global Step: 66420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:24:18,869-Speed 9409.33 samples/sec   Loss 0.7839   LearningRate 0.0000   Epoch: 38   Global Step: 66430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:24:45,052-Speed 9386.72 samples/sec   Loss 0.7832   LearningRate 0.0000   Epoch: 38   Global Step: 66440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:25:11,223-Speed 9391.09 samples/sec   Loss 0.7798   LearningRate 0.0000   Epoch: 38   Global Step: 66450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:25:37,360-Speed 9403.57 samples/sec   Loss 0.7826   LearningRate 0.0000   Epoch: 38   Global Step: 66460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:26:03,473-Speed 9411.68 samples/sec   Loss 0.7834   LearningRate 0.0000   Epoch: 38   Global Step: 66470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:26:29,583-Speed 9412.87 samples/sec   Loss 0.7842   LearningRate 0.0000   Epoch: 38   Global Step: 66480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:26:55,728-Speed 9400.58 samples/sec   Loss 0.7805   LearningRate 0.0000   Epoch: 38   Global Step: 66490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:27:21,843-Speed 9411.34 samples/sec   Loss 0.7829   LearningRate 0.0000   Epoch: 38   Global Step: 66500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-03-06 21:27:47,913-Speed 9427.26 samples/sec   Loss 0.7811   LearningRate 0.0000   Epoch: 38   Global Step: 66510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-03-06 21:28:14,048-Speed 9403.78 samples/sec   Loss 0.7794   LearningRate 0.0000   Epoch: 38   Global Step: 66520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-03-06 21:28:40,191-Speed 9401.16 samples/sec   Loss 0.7803   LearningRate 0.0000   Epoch: 38   Global Step: 66530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-03-06 21:29:06,246-Speed 9432.73 samples/sec   Loss 0.7758   LearningRate 0.0000   Epoch: 38   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:29:32,341-Speed 9418.28 samples/sec   Loss 0.7777   LearningRate 0.0000   Epoch: 38   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:29:58,442-Speed 9416.13 samples/sec   Loss 0.7870   LearningRate 0.0000   Epoch: 38   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:30:24,544-Speed 9415.83 samples/sec   Loss 0.7860   LearningRate 0.0000   Epoch: 38   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:30:50,630-Speed 9421.53 samples/sec   Loss 0.7737   LearningRate 0.0000   Epoch: 38   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:31:16,718-Speed 9420.80 samples/sec   Loss 0.7820   LearningRate 0.0000   Epoch: 38   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:31:42,950-Speed 9369.19 samples/sec   Loss 0.7768   LearningRate 0.0000   Epoch: 38   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:32:09,022-Speed 9426.75 samples/sec   Loss 0.7733   LearningRate 0.0000   Epoch: 38   Global Step: 66610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:32:35,090-Speed 9428.04 samples/sec   Loss 0.7824   LearningRate 0.0000   Epoch: 38   Global Step: 66620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:33:01,270-Speed 9387.68 samples/sec   Loss 0.7806   LearningRate 0.0000   Epoch: 38   Global Step: 66630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:33:27,327-Speed 9431.95 samples/sec   Loss 0.7764   LearningRate 0.0000   Epoch: 38   Global Step: 66640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:33:53,386-Speed 9431.45 samples/sec   Loss 0.7800   LearningRate 0.0000   Epoch: 38   Global Step: 66650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:34:19,499-Speed 9411.71 samples/sec   Loss 0.7798   LearningRate 0.0000   Epoch: 38   Global Step: 66660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:34:45,632-Speed 9404.56 samples/sec   Loss 0.7847   LearningRate 0.0000   Epoch: 38   Global Step: 66670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:35:11,827-Speed 9382.23 samples/sec   Loss 0.7789   LearningRate 0.0000   Epoch: 38   Global Step: 66680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:35:37,928-Speed 9417.27 samples/sec   Loss 0.7847   LearningRate 0.0000   Epoch: 38   Global Step: 66690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:36:04,000-Speed 9426.69 samples/sec   Loss 0.7776   LearningRate 0.0000   Epoch: 38   Global Step: 66700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:36:30,127-Speed 9407.04 samples/sec   Loss 0.7778   LearningRate 0.0000   Epoch: 38   Global Step: 66710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:36:56,220-Speed 9419.14 samples/sec   Loss 0.7812   LearningRate 0.0000   Epoch: 38   Global Step: 66720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:37:22,348-Speed 9406.25 samples/sec   Loss 0.7813   LearningRate 0.0000   Epoch: 38   Global Step: 66730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:37:48,400-Speed 9434.34 samples/sec   Loss 0.7834   LearningRate 0.0000   Epoch: 38   Global Step: 66740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:38:14,476-Speed 9424.93 samples/sec   Loss 0.7789   LearningRate 0.0000   Epoch: 38   Global Step: 66750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:38:40,481-Speed 9450.96 samples/sec   Loss 0.7734   LearningRate 0.0000   Epoch: 38   Global Step: 66760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:39:06,565-Speed 9422.51 samples/sec   Loss 0.7821   LearningRate 0.0000   Epoch: 38   Global Step: 66770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:39:32,704-Speed 9402.39 samples/sec   Loss 0.7813   LearningRate 0.0000   Epoch: 38   Global Step: 66780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:40:01,331-Speed 8585.12 samples/sec   Loss 0.7786   LearningRate 0.0000   Epoch: 38   Global Step: 66790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:40:27,455-Speed 9407.88 samples/sec   Loss 0.7828   LearningRate 0.0000   Epoch: 38   Global Step: 66800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:40:53,498-Speed 9437.34 samples/sec   Loss 0.7788   LearningRate 0.0000   Epoch: 38   Global Step: 66810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:41:19,561-Speed 9429.97 samples/sec   Loss 0.7775   LearningRate 0.0000   Epoch: 38   Global Step: 66820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:41:45,627-Speed 9428.71 samples/sec   Loss 0.7794   LearningRate 0.0000   Epoch: 38   Global Step: 66830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:42:11,765-Speed 9402.94 samples/sec   Loss 0.7753   LearningRate 0.0000   Epoch: 38   Global Step: 66840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:42:37,854-Speed 9420.19 samples/sec   Loss 0.7776   LearningRate 0.0000   Epoch: 38   Global Step: 66850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:43:03,941-Speed 9421.31 samples/sec   Loss 0.7820   LearningRate 0.0000   Epoch: 38   Global Step: 66860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:43:29,949-Speed 9450.08 samples/sec   Loss 0.7814   LearningRate 0.0000   Epoch: 38   Global Step: 66870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:43:55,955-Speed 9450.20 samples/sec   Loss 0.7849   LearningRate 0.0000   Epoch: 38   Global Step: 66880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:44:22,126-Speed 9391.18 samples/sec   Loss 0.7794   LearningRate 0.0000   Epoch: 38   Global Step: 66890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:44:48,196-Speed 9427.48 samples/sec   Loss 0.7829   LearningRate 0.0000   Epoch: 38   Global Step: 66900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:45:14,348-Speed 9397.60 samples/sec   Loss 0.7785   LearningRate 0.0000   Epoch: 38   Global Step: 66910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:45:40,379-Speed 9441.54 samples/sec   Loss 0.7776   LearningRate 0.0000   Epoch: 38   Global Step: 66920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:46:06,626-Speed 9363.86 samples/sec   Loss 0.7793   LearningRate 0.0000   Epoch: 38   Global Step: 66930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:46:32,797-Speed 9390.90 samples/sec   Loss 0.7725   LearningRate 0.0000   Epoch: 38   Global Step: 66940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:46:58,959-Speed 9393.95 samples/sec   Loss 0.7787   LearningRate 0.0000   Epoch: 38   Global Step: 66950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:47:25,029-Speed 9427.56 samples/sec   Loss 0.7762   LearningRate 0.0000   Epoch: 38   Global Step: 66960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-06 21:47:51,201-Speed 9390.49 samples/sec   Loss 0.7840   LearningRate 0.0000   Epoch: 38   Global Step: 66970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:48:17,294-Speed 9419.09 samples/sec   Loss 0.7798   LearningRate 0.0000   Epoch: 38   Global Step: 66980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:48:43,436-Speed 9401.34 samples/sec   Loss 0.7802   LearningRate 0.0000   Epoch: 38   Global Step: 66990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:49:09,593-Speed 9396.15 samples/sec   Loss 0.7864   LearningRate 0.0000   Epoch: 38   Global Step: 67000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:49:35,689-Speed 9417.75 samples/sec   Loss 0.7841   LearningRate 0.0000   Epoch: 38   Global Step: 67010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:50:01,761-Speed 9426.75 samples/sec   Loss 0.7893   LearningRate 0.0000   Epoch: 38   Global Step: 67020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:50:27,835-Speed 9426.07 samples/sec   Loss 0.7781   LearningRate 0.0000   Epoch: 38   Global Step: 67030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:50:53,931-Speed 9417.57 samples/sec   Loss 0.7838   LearningRate 0.0000   Epoch: 38   Global Step: 67040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:51:20,029-Speed 9417.45 samples/sec   Loss 0.7818   LearningRate 0.0000   Epoch: 38   Global Step: 67050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:51:46,111-Speed 9422.87 samples/sec   Loss 0.7813   LearningRate 0.0000   Epoch: 38   Global Step: 67060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-06 21:52:12,206-Speed 9418.69 samples/sec   Loss 0.7765   LearningRate 0.0000   Epoch: 38   Global Step: 67070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:52:38,336-Speed 9405.86 samples/sec   Loss 0.7841   LearningRate 0.0000   Epoch: 38   Global Step: 67080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-06 21:53:04,472-Speed 9403.54 samples/sec   Loss 0.7743   LearningRate 0.0000   Epoch: 38   Global Step: 67090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:53:30,613-Speed 9401.78 samples/sec   Loss 0.7800   LearningRate 0.0000   Epoch: 38   Global Step: 67100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:53:56,696-Speed 9422.83 samples/sec   Loss 0.7791   LearningRate 0.0000   Epoch: 38   Global Step: 67110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:54:22,710-Speed 9447.80 samples/sec   Loss 0.7749   LearningRate 0.0000   Epoch: 38   Global Step: 67120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:54:48,778-Speed 9428.10 samples/sec   Loss 0.7774   LearningRate 0.0000   Epoch: 38   Global Step: 67130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:55:14,913-Speed 9403.96 samples/sec   Loss 0.7735   LearningRate 0.0000   Epoch: 38   Global Step: 67140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:55:41,061-Speed 9399.04 samples/sec   Loss 0.7779   LearningRate 0.0000   Epoch: 38   Global Step: 67150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:56:07,154-Speed 9418.89 samples/sec   Loss 0.7815   LearningRate 0.0000   Epoch: 38   Global Step: 67160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:56:33,229-Speed 9425.61 samples/sec   Loss 0.7769   LearningRate 0.0000   Epoch: 38   Global Step: 67170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:56:59,304-Speed 9425.47 samples/sec   Loss 0.7848   LearningRate 0.0000   Epoch: 38   Global Step: 67180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 21:57:25,372-Speed 9428.23 samples/sec   Loss 0.7802   LearningRate 0.0000   Epoch: 38   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 21:57:51,450-Speed 9424.33 samples/sec   Loss 0.7811   LearningRate 0.0000   Epoch: 38   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 21:58:17,604-Speed 9396.91 samples/sec   Loss 0.7836   LearningRate 0.0000   Epoch: 38   Global Step: 67210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 21:58:43,733-Speed 9406.35 samples/sec   Loss 0.7757   LearningRate 0.0000   Epoch: 38   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 21:59:09,896-Speed 9393.59 samples/sec   Loss 0.7793   LearningRate 0.0000   Epoch: 38   Global Step: 67230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 21:59:35,879-Speed 9458.94 samples/sec   Loss 0.7816   LearningRate 0.0000   Epoch: 38   Global Step: 67240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:00:01,892-Speed 9447.95 samples/sec   Loss 0.7816   LearningRate 0.0000   Epoch: 38   Global Step: 67250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:00:27,931-Speed 9438.46 samples/sec   Loss 0.7764   LearningRate 0.0000   Epoch: 38   Global Step: 67260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:00:53,971-Speed 9438.85 samples/sec   Loss 0.7764   LearningRate 0.0000   Epoch: 38   Global Step: 67270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:01:19,998-Speed 9443.10 samples/sec   Loss 0.7856   LearningRate 0.0000   Epoch: 38   Global Step: 67280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:01:46,002-Speed 9451.09 samples/sec   Loss 0.7784   LearningRate 0.0000   Epoch: 38   Global Step: 67290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:02:12,059-Speed 9432.16 samples/sec   Loss 0.7811   LearningRate 0.0000   Epoch: 38   Global Step: 67300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:02:38,130-Speed 9427.10 samples/sec   Loss 0.7780   LearningRate 0.0000   Epoch: 38   Global Step: 67310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:03:04,119-Speed 9456.58 samples/sec   Loss 0.7749   LearningRate 0.0000   Epoch: 38   Global Step: 67320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:03:30,243-Speed 9407.96 samples/sec   Loss 0.7799   LearningRate 0.0000   Epoch: 38   Global Step: 67330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:03:56,410-Speed 9392.35 samples/sec   Loss 0.7844   LearningRate 0.0000   Epoch: 38   Global Step: 67340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:04:22,474-Speed 9429.66 samples/sec   Loss 0.7814   LearningRate 0.0000   Epoch: 38   Global Step: 67350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:04:48,607-Speed 9404.47 samples/sec   Loss 0.7828   LearningRate 0.0000   Epoch: 38   Global Step: 67360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:05:14,700-Speed 9418.91 samples/sec   Loss 0.7799   LearningRate 0.0000   Epoch: 38   Global Step: 67370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:05:40,832-Speed 9404.87 samples/sec   Loss 0.7804   LearningRate 0.0000   Epoch: 38   Global Step: 67380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:06:06,957-Speed 9407.49 samples/sec   Loss 0.7752   LearningRate 0.0000   Epoch: 38   Global Step: 67390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:06:33,183-Speed 9371.29 samples/sec   Loss 0.7840   LearningRate 0.0000   Epoch: 38   Global Step: 67400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:07:53,152-Speed 3073.25 samples/sec   Loss 0.7821   LearningRate 0.0000   Epoch: 39   Global Step: 67410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:08:19,198-Speed 9435.88 samples/sec   Loss 0.7812   LearningRate 0.0000   Epoch: 39   Global Step: 67420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:08:45,251-Speed 9434.32 samples/sec   Loss 0.7802   LearningRate 0.0000   Epoch: 39   Global Step: 67430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:09:11,257-Speed 9450.23 samples/sec   Loss 0.7756   LearningRate 0.0000   Epoch: 39   Global Step: 67440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:09:37,276-Speed 9446.06 samples/sec   Loss 0.7787   LearningRate 0.0000   Epoch: 39   Global Step: 67450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:10:03,392-Speed 9410.64 samples/sec   Loss 0.7786   LearningRate 0.0000   Epoch: 39   Global Step: 67460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:10:29,445-Speed 9433.27 samples/sec   Loss 0.7805   LearningRate 0.0000   Epoch: 39   Global Step: 67470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:10:55,650-Speed 9378.99 samples/sec   Loss 0.7755   LearningRate 0.0000   Epoch: 39   Global Step: 67480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:11:21,724-Speed 9425.97 samples/sec   Loss 0.7771   LearningRate 0.0000   Epoch: 39   Global Step: 67490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:11:47,767-Speed 9436.90 samples/sec   Loss 0.7753   LearningRate 0.0000   Epoch: 39   Global Step: 67500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:12:13,852-Speed 9422.14 samples/sec   Loss 0.7788   LearningRate 0.0000   Epoch: 39   Global Step: 67510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:12:39,995-Speed 9400.92 samples/sec   Loss 0.7739   LearningRate 0.0000   Epoch: 39   Global Step: 67520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:13:06,033-Speed 9438.99 samples/sec   Loss 0.7814   LearningRate 0.0000   Epoch: 39   Global Step: 67530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:13:32,149-Speed 9410.56 samples/sec   Loss 0.7789   LearningRate 0.0000   Epoch: 39   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:13:58,245-Speed 9418.60 samples/sec   Loss 0.7732   LearningRate 0.0000   Epoch: 39   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:14:24,349-Speed 9414.95 samples/sec   Loss 0.7766   LearningRate 0.0000   Epoch: 39   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:14:50,455-Speed 9414.28 samples/sec   Loss 0.7783   LearningRate 0.0000   Epoch: 39   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:15:16,507-Speed 9433.83 samples/sec   Loss 0.7768   LearningRate 0.0000   Epoch: 39   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:15:42,653-Speed 9400.00 samples/sec   Loss 0.7751   LearningRate 0.0000   Epoch: 39   Global Step: 67590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:16:08,787-Speed 9404.39 samples/sec   Loss 0.7759   LearningRate 0.0000   Epoch: 39   Global Step: 67600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:16:34,910-Speed 9408.18 samples/sec   Loss 0.7777   LearningRate 0.0000   Epoch: 39   Global Step: 67610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:17:01,104-Speed 9382.70 samples/sec   Loss 0.7779   LearningRate 0.0000   Epoch: 39   Global Step: 67620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:17:27,175-Speed 9427.23 samples/sec   Loss 0.7806   LearningRate 0.0000   Epoch: 39   Global Step: 67630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:17:53,394-Speed 9374.06 samples/sec   Loss 0.7800   LearningRate 0.0000   Epoch: 39   Global Step: 67640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:18:19,525-Speed 9405.04 samples/sec   Loss 0.7726   LearningRate 0.0000   Epoch: 39   Global Step: 67650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:18:45,612-Speed 9421.39 samples/sec   Loss 0.7737   LearningRate 0.0000   Epoch: 39   Global Step: 67660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:19:11,731-Speed 9409.38 samples/sec   Loss 0.7816   LearningRate 0.0000   Epoch: 39   Global Step: 67670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:19:37,931-Speed 9380.55 samples/sec   Loss 0.7766   LearningRate 0.0000   Epoch: 39   Global Step: 67680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:20:04,045-Speed 9411.59 samples/sec   Loss 0.7769   LearningRate 0.0000   Epoch: 39   Global Step: 67690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:20:30,223-Speed 9388.34 samples/sec   Loss 0.7797   LearningRate 0.0000   Epoch: 39   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:20:56,388-Speed 9393.12 samples/sec   Loss 0.7898   LearningRate 0.0000   Epoch: 39   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:21:22,527-Speed 9402.60 samples/sec   Loss 0.7765   LearningRate 0.0000   Epoch: 39   Global Step: 67720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:21:48,740-Speed 9375.89 samples/sec   Loss 0.7763   LearningRate 0.0000   Epoch: 39   Global Step: 67730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:22:14,794-Speed 9433.27 samples/sec   Loss 0.7797   LearningRate 0.0000   Epoch: 39   Global Step: 67740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:22:40,869-Speed 9425.40 samples/sec   Loss 0.7797   LearningRate 0.0000   Epoch: 39   Global Step: 67750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:23:06,966-Speed 9417.60 samples/sec   Loss 0.7799   LearningRate 0.0000   Epoch: 39   Global Step: 67760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:23:33,039-Speed 9426.34 samples/sec   Loss 0.7689   LearningRate 0.0000   Epoch: 39   Global Step: 67770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:23:59,164-Speed 9407.52 samples/sec   Loss 0.7786   LearningRate 0.0000   Epoch: 39   Global Step: 67780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:24:25,332-Speed 9391.98 samples/sec   Loss 0.7796   LearningRate 0.0000   Epoch: 39   Global Step: 67790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:24:51,464-Speed 9404.97 samples/sec   Loss 0.7832   LearningRate 0.0000   Epoch: 39   Global Step: 67800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:25:17,583-Speed 9409.88 samples/sec   Loss 0.7822   LearningRate 0.0000   Epoch: 39   Global Step: 67810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:25:43,658-Speed 9425.34 samples/sec   Loss 0.7790   LearningRate 0.0000   Epoch: 39   Global Step: 67820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:26:09,751-Speed 9419.38 samples/sec   Loss 0.7784   LearningRate 0.0000   Epoch: 39   Global Step: 67830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:26:35,886-Speed 9403.68 samples/sec   Loss 0.7819   LearningRate 0.0000   Epoch: 39   Global Step: 67840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-06 22:27:02,027-Speed 9401.68 samples/sec   Loss 0.7761   LearningRate 0.0000   Epoch: 39   Global Step: 67850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:27:28,113-Speed 9421.62 samples/sec   Loss 0.7767   LearningRate 0.0000   Epoch: 39   Global Step: 67860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:27:54,163-Speed 9434.78 samples/sec   Loss 0.7818   LearningRate 0.0000   Epoch: 39   Global Step: 67870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:28:20,344-Speed 9387.40 samples/sec   Loss 0.7796   LearningRate 0.0000   Epoch: 39   Global Step: 67880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:28:46,440-Speed 9418.05 samples/sec   Loss 0.7791   LearningRate 0.0000   Epoch: 39   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:29:12,560-Speed 9409.55 samples/sec   Loss 0.7783   LearningRate 0.0000   Epoch: 39   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:29:38,742-Speed 9386.77 samples/sec   Loss 0.7775   LearningRate 0.0000   Epoch: 39   Global Step: 67910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:30:04,803-Speed 9430.87 samples/sec   Loss 0.7839   LearningRate 0.0000   Epoch: 39   Global Step: 67920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:30:30,938-Speed 9404.02 samples/sec   Loss 0.7800   LearningRate 0.0000   Epoch: 39   Global Step: 67930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:30:57,043-Speed 9414.93 samples/sec   Loss 0.7827   LearningRate 0.0000   Epoch: 39   Global Step: 67940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:31:23,132-Speed 9420.48 samples/sec   Loss 0.7793   LearningRate 0.0000   Epoch: 39   Global Step: 67950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:31:49,241-Speed 9413.27 samples/sec   Loss 0.7835   LearningRate 0.0000   Epoch: 39   Global Step: 67960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:32:15,324-Speed 9422.45 samples/sec   Loss 0.7780   LearningRate 0.0000   Epoch: 39   Global Step: 67970   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:32:41,508-Speed 9386.34 samples/sec   Loss 0.7759   LearningRate 0.0000   Epoch: 39   Global Step: 67980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:33:07,602-Speed 9419.12 samples/sec   Loss 0.7728   LearningRate 0.0000   Epoch: 39   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:33:33,691-Speed 9420.45 samples/sec   Loss 0.7779   LearningRate 0.0000   Epoch: 39   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:33:59,824-Speed 9404.60 samples/sec   Loss 0.7783   LearningRate 0.0000   Epoch: 39   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:34:25,925-Speed 9416.16 samples/sec   Loss 0.7770   LearningRate 0.0000   Epoch: 39   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:34:53,373-Speed 8953.98 samples/sec   Loss 0.7715   LearningRate 0.0000   Epoch: 39   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:35:19,448-Speed 9425.63 samples/sec   Loss 0.7732   LearningRate 0.0000   Epoch: 39   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:35:45,574-Speed 9406.82 samples/sec   Loss 0.7850   LearningRate 0.0000   Epoch: 39   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:36:11,706-Speed 9404.87 samples/sec   Loss 0.7795   LearningRate 0.0000   Epoch: 39   Global Step: 68060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:36:37,883-Speed 9389.65 samples/sec   Loss 0.7744   LearningRate 0.0000   Epoch: 39   Global Step: 68070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:37:04,048-Speed 9393.18 samples/sec   Loss 0.7796   LearningRate 0.0000   Epoch: 39   Global Step: 68080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:37:30,158-Speed 9413.12 samples/sec   Loss 0.7820   LearningRate 0.0000   Epoch: 39   Global Step: 68090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:37:56,299-Speed 9401.71 samples/sec   Loss 0.7751   LearningRate 0.0000   Epoch: 39   Global Step: 68100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:38:22,553-Speed 9361.32 samples/sec   Loss 0.7774   LearningRate 0.0000   Epoch: 39   Global Step: 68110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:38:48,800-Speed 9363.63 samples/sec   Loss 0.7776   LearningRate 0.0000   Epoch: 39   Global Step: 68120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:39:15,064-Speed 9357.73 samples/sec   Loss 0.7767   LearningRate 0.0000   Epoch: 39   Global Step: 68130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:39:41,298-Speed 9368.40 samples/sec   Loss 0.7817   LearningRate 0.0000   Epoch: 39   Global Step: 68140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:40:07,438-Speed 9402.25 samples/sec   Loss 0.7826   LearningRate 0.0000   Epoch: 39   Global Step: 68150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:40:33,570-Speed 9404.75 samples/sec   Loss 0.7818   LearningRate 0.0000   Epoch: 39   Global Step: 68160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:40:59,720-Speed 9398.34 samples/sec   Loss 0.7719   LearningRate 0.0000   Epoch: 39   Global Step: 68170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:41:25,905-Speed 9385.90 samples/sec   Loss 0.7721   LearningRate 0.0000   Epoch: 39   Global Step: 68180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:41:52,100-Speed 9382.97 samples/sec   Loss 0.7745   LearningRate 0.0000   Epoch: 39   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:42:18,247-Speed 9399.20 samples/sec   Loss 0.7795   LearningRate 0.0000   Epoch: 39   Global Step: 68200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:42:44,438-Speed 9384.04 samples/sec   Loss 0.7773   LearningRate 0.0000   Epoch: 39   Global Step: 68210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:43:10,588-Speed 9398.74 samples/sec   Loss 0.7753   LearningRate 0.0000   Epoch: 39   Global Step: 68220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:43:36,687-Speed 9416.78 samples/sec   Loss 0.7768   LearningRate 0.0000   Epoch: 39   Global Step: 68230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:44:02,871-Speed 9386.43 samples/sec   Loss 0.7705   LearningRate 0.0000   Epoch: 39   Global Step: 68240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:44:28,958-Speed 9420.97 samples/sec   Loss 0.7799   LearningRate 0.0000   Epoch: 39   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:44:55,091-Speed 9405.04 samples/sec   Loss 0.7744   LearningRate 0.0000   Epoch: 39   Global Step: 68260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:45:21,256-Speed 9393.24 samples/sec   Loss 0.7797   LearningRate 0.0000   Epoch: 39   Global Step: 68270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:45:47,435-Speed 9387.70 samples/sec   Loss 0.7779   LearningRate 0.0000   Epoch: 39   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:46:13,583-Speed 9399.46 samples/sec   Loss 0.7747   LearningRate 0.0000   Epoch: 39   Global Step: 68290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:46:39,803-Speed 9373.29 samples/sec   Loss 0.7720   LearningRate 0.0000   Epoch: 39   Global Step: 68300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:47:05,928-Speed 9407.79 samples/sec   Loss 0.7758   LearningRate 0.0000   Epoch: 39   Global Step: 68310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:47:31,999-Speed 9426.70 samples/sec   Loss 0.7790   LearningRate 0.0000   Epoch: 39   Global Step: 68320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-03-06 22:47:58,066-Speed 9428.44 samples/sec   Loss 0.7799   LearningRate 0.0000   Epoch: 39   Global Step: 68330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:48:24,260-Speed 9382.84 samples/sec   Loss 0.7767   LearningRate 0.0000   Epoch: 39   Global Step: 68340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:48:50,420-Speed 9394.66 samples/sec   Loss 0.7771   LearningRate 0.0000   Epoch: 39   Global Step: 68350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:49:16,481-Speed 9430.61 samples/sec   Loss 0.7763   LearningRate 0.0000   Epoch: 39   Global Step: 68360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:49:42,635-Speed 9397.39 samples/sec   Loss 0.7800   LearningRate 0.0000   Epoch: 39   Global Step: 68370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:50:08,798-Speed 9393.74 samples/sec   Loss 0.7718   LearningRate 0.0000   Epoch: 39   Global Step: 68380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:50:34,968-Speed 9391.08 samples/sec   Loss 0.7763   LearningRate 0.0000   Epoch: 39   Global Step: 68390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:51:01,096-Speed 9406.30 samples/sec   Loss 0.7804   LearningRate 0.0000   Epoch: 39   Global Step: 68400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:51:29,739-Speed 8581.89 samples/sec   Loss 0.7734   LearningRate 0.0000   Epoch: 39   Global Step: 68410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:51:55,895-Speed 9396.44 samples/sec   Loss 0.7779   LearningRate 0.0000   Epoch: 39   Global Step: 68420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:52:22,002-Speed 9414.20 samples/sec   Loss 0.7761   LearningRate 0.0000   Epoch: 39   Global Step: 68430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-06 22:52:48,183-Speed 9387.44 samples/sec   Loss 0.7789   LearningRate 0.0000   Epoch: 39   Global Step: 68440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 22:53:14,281-Speed 9417.33 samples/sec   Loss 0.7794   LearningRate 0.0000   Epoch: 39   Global Step: 68450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 22:53:40,506-Speed 9371.53 samples/sec   Loss 0.7740   LearningRate 0.0000   Epoch: 39   Global Step: 68460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 22:54:06,678-Speed 9390.53 samples/sec   Loss 0.7745   LearningRate 0.0000   Epoch: 39   Global Step: 68470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 22:54:32,772-Speed 9418.94 samples/sec   Loss 0.7724   LearningRate 0.0000   Epoch: 39   Global Step: 68480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 22:54:58,973-Speed 9380.28 samples/sec   Loss 0.7744   LearningRate 0.0000   Epoch: 39   Global Step: 68490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 22:55:25,010-Speed 9439.27 samples/sec   Loss 0.7840   LearningRate 0.0000   Epoch: 39   Global Step: 68500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 22:55:51,048-Speed 9439.30 samples/sec   Loss 0.7712   LearningRate 0.0000   Epoch: 39   Global Step: 68510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:56:17,302-Speed 9361.25 samples/sec   Loss 0.7696   LearningRate 0.0000   Epoch: 39   Global Step: 68520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:56:43,344-Speed 9437.75 samples/sec   Loss 0.7732   LearningRate 0.0000   Epoch: 39   Global Step: 68530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:57:09,466-Speed 9408.57 samples/sec   Loss 0.7796   LearningRate 0.0000   Epoch: 39   Global Step: 68540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:57:35,624-Speed 9395.63 samples/sec   Loss 0.7775   LearningRate 0.0000   Epoch: 39   Global Step: 68550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:58:01,819-Speed 9382.19 samples/sec   Loss 0.7773   LearningRate 0.0000   Epoch: 39   Global Step: 68560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:58:27,922-Speed 9415.43 samples/sec   Loss 0.7785   LearningRate 0.0000   Epoch: 39   Global Step: 68570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:58:54,058-Speed 9403.81 samples/sec   Loss 0.7750   LearningRate 0.0000   Epoch: 39   Global Step: 68580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:59:20,257-Speed 9380.87 samples/sec   Loss 0.7847   LearningRate 0.0000   Epoch: 39   Global Step: 68590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 22:59:46,427-Speed 9391.52 samples/sec   Loss 0.7843   LearningRate 0.0000   Epoch: 39   Global Step: 68600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:00:12,568-Speed 9401.89 samples/sec   Loss 0.7818   LearningRate 0.0000   Epoch: 39   Global Step: 68610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:00:38,798-Speed 9369.80 samples/sec   Loss 0.7731   LearningRate 0.0000   Epoch: 39   Global Step: 68620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:01:05,044-Speed 9364.11 samples/sec   Loss 0.7791   LearningRate 0.0000   Epoch: 39   Global Step: 68630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:01:31,229-Speed 9385.94 samples/sec   Loss 0.7739   LearningRate 0.0000   Epoch: 39   Global Step: 68640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:01:57,352-Speed 9408.18 samples/sec   Loss 0.7806   LearningRate 0.0000   Epoch: 39   Global Step: 68650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:02:23,484-Speed 9405.13 samples/sec   Loss 0.7772   LearningRate 0.0000   Epoch: 39   Global Step: 68660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:02:49,738-Speed 9361.23 samples/sec   Loss 0.7767   LearningRate 0.0000   Epoch: 39   Global Step: 68670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:03:15,839-Speed 9416.27 samples/sec   Loss 0.7820   LearningRate 0.0000   Epoch: 39   Global Step: 68680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:03:41,975-Speed 9403.18 samples/sec   Loss 0.7789   LearningRate 0.0000   Epoch: 39   Global Step: 68690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:04:08,109-Speed 9404.53 samples/sec   Loss 0.7738   LearningRate 0.0000   Epoch: 39   Global Step: 68700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:04:34,217-Speed 9413.48 samples/sec   Loss 0.7732   LearningRate 0.0000   Epoch: 39   Global Step: 68710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:05:00,405-Speed 9385.00 samples/sec   Loss 0.7767   LearningRate 0.0000   Epoch: 39   Global Step: 68720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:05:26,538-Speed 9404.54 samples/sec   Loss 0.7723   LearningRate 0.0000   Epoch: 39   Global Step: 68730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:05:52,604-Speed 9429.03 samples/sec   Loss 0.7786   LearningRate 0.0000   Epoch: 39   Global Step: 68740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:06:18,765-Speed 9394.54 samples/sec   Loss 0.7790   LearningRate 0.0000   Epoch: 39   Global Step: 68750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:06:44,830-Speed 9429.11 samples/sec   Loss 0.7810   LearningRate 0.0000   Epoch: 39   Global Step: 68760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:07:10,906-Speed 9425.49 samples/sec   Loss 0.7760   LearningRate 0.0000   Epoch: 39   Global Step: 68770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:07:37,062-Speed 9396.33 samples/sec   Loss 0.7794   LearningRate 0.0000   Epoch: 39   Global Step: 68780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:08:03,149-Speed 9421.08 samples/sec   Loss 0.7755   LearningRate 0.0000   Epoch: 39   Global Step: 68790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:08:29,236-Speed 9421.27 samples/sec   Loss 0.7787   LearningRate 0.0000   Epoch: 39   Global Step: 68800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:08:55,319-Speed 9422.64 samples/sec   Loss 0.7805   LearningRate 0.0000   Epoch: 39   Global Step: 68810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:09:21,449-Speed 9405.90 samples/sec   Loss 0.7784   LearningRate 0.0000   Epoch: 39   Global Step: 68820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:09:47,542-Speed 9419.17 samples/sec   Loss 0.7817   LearningRate 0.0000   Epoch: 39   Global Step: 68830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:10:13,671-Speed 9406.44 samples/sec   Loss 0.7804   LearningRate 0.0000   Epoch: 39   Global Step: 68840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:10:39,837-Speed 9392.67 samples/sec   Loss 0.7740   LearningRate 0.0000   Epoch: 39   Global Step: 68850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:11:05,974-Speed 9403.11 samples/sec   Loss 0.7755   LearningRate 0.0000   Epoch: 39   Global Step: 68860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:11:32,034-Speed 9431.02 samples/sec   Loss 0.7756   LearningRate 0.0000   Epoch: 39   Global Step: 68870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:11:58,104-Speed 9427.48 samples/sec   Loss 0.7768   LearningRate 0.0000   Epoch: 39   Global Step: 68880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:12:24,226-Speed 9408.34 samples/sec   Loss 0.7773   LearningRate 0.0000   Epoch: 39   Global Step: 68890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:12:50,380-Speed 9397.17 samples/sec   Loss 0.7810   LearningRate 0.0000   Epoch: 39   Global Step: 68900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:13:16,492-Speed 9412.16 samples/sec   Loss 0.7749   LearningRate 0.0000   Epoch: 39   Global Step: 68910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:13:42,606-Speed 9411.46 samples/sec   Loss 0.7774   LearningRate 0.0000   Epoch: 39   Global Step: 68920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:14:08,724-Speed 9409.74 samples/sec   Loss 0.7786   LearningRate 0.0000   Epoch: 39   Global Step: 68930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:14:34,943-Speed 9373.96 samples/sec   Loss 0.7717   LearningRate 0.0000   Epoch: 39   Global Step: 68940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:15:01,065-Speed 9408.90 samples/sec   Loss 0.7746   LearningRate 0.0000   Epoch: 39   Global Step: 68950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:15:27,255-Speed 9384.09 samples/sec   Loss 0.7720   LearningRate 0.0000   Epoch: 39   Global Step: 68960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:15:53,376-Speed 9408.91 samples/sec   Loss 0.7760   LearningRate 0.0000   Epoch: 39   Global Step: 68970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:16:19,461-Speed 9422.02 samples/sec   Loss 0.7796   LearningRate 0.0000   Epoch: 39   Global Step: 68980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:16:45,655-Speed 9382.88 samples/sec   Loss 0.7747   LearningRate 0.0000   Epoch: 39   Global Step: 68990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:17:11,822-Speed 9392.05 samples/sec   Loss 0.7783   LearningRate 0.0000   Epoch: 39   Global Step: 69000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:17:37,991-Speed 9392.00 samples/sec   Loss 0.7785   LearningRate 0.0000   Epoch: 39   Global Step: 69010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:18:04,269-Speed 9352.67 samples/sec   Loss 0.7741   LearningRate 0.0000   Epoch: 39   Global Step: 69020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-06 23:18:30,557-Speed 9349.07 samples/sec   Loss 0.7782   LearningRate 0.0000   Epoch: 39   Global Step: 69030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:18:56,679-Speed 9408.79 samples/sec   Loss 0.7796   LearningRate 0.0000   Epoch: 39   Global Step: 69040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:19:22,745-Speed 9428.85 samples/sec   Loss 0.7809   LearningRate 0.0000   Epoch: 39   Global Step: 69050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:19:48,841-Speed 9418.13 samples/sec   Loss 0.7767   LearningRate 0.0000   Epoch: 39   Global Step: 69060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-06 23:20:14,911-Speed 9427.00 samples/sec   Loss 0.7794   LearningRate 0.0000   Epoch: 39   Global Step: 69070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-06 23:20:41,045-Speed 9404.34 samples/sec   Loss 0.7818   LearningRate 0.0000   Epoch: 39   Global Step: 69080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-06 23:21:07,078-Speed 9440.52 samples/sec   Loss 0.7765   LearningRate 0.0000   Epoch: 39   Global Step: 69090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-06 23:21:33,136-Speed 9431.79 samples/sec   Loss 0.7808   LearningRate 0.0000   Epoch: 39   Global Step: 69100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-06 23:21:59,235-Speed 9416.93 samples/sec   Loss 0.7772   LearningRate 0.0000   Epoch: 39   Global Step: 69110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-06 23:22:25,339-Speed 9415.14 samples/sec   Loss 0.7787   LearningRate 0.0000   Epoch: 39   Global Step: 69120   Fp16 Grad Scale: 16384   Required: -0 hours
Training: 2022-03-06 23:22:51,452-Speed 9411.79 samples/sec   Loss 0.7833   LearningRate 0.0000   Epoch: 39   Global Step: 69130   Fp16 Grad Scale: 16384   Required: -0 hours